UArizona issues¶
Some strings are triple-encoded with UTF-8¶
Rows affected: 4994
Sample row:
CatalogNumberNumeric | Collector | Country | StateProvince | Locality |
266048 | E.A. Mearns | Mexico | México | Oro Blanco, Picacho, [boundary], Mex. . |
Error: StateProvince field contains extra Unicode characters
Fix: decode the value three times with a UTF-8 decoder
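The repeated decoding can be sketched in Python. This is a minimal illustration, not the procedure actually used for this dataset: the function name `decode_utf8_layers` is hypothetical, and it assumes each extra encoding round misread the UTF-8 bytes as Latin-1 before re-encoding them (real data may have gone through cp1252 or another single-byte encoding instead).

```python
def decode_utf8_layers(raw: bytes, layers: int = 3, intermediate: str = "latin-1") -> str:
    """Undo `layers` stacked rounds of UTF-8 encoding.

    Assumption: between rounds, the bytes were misinterpreted as `intermediate`
    (Latin-1 here) and then encoded to UTF-8 again.
    """
    data = raw
    for _ in range(layers - 1):
        # Strip one UTF-8 layer, then map the resulting characters back to bytes.
        data = data.decode("utf-8").encode(intermediate)
    # The last decode yields the intended text.
    return data.decode("utf-8")


# Build a triple-encoded value from the intended text to demonstrate the round trip.
triple = "México".encode("utf-8")
for _ in range(2):
    triple = triple.decode("latin-1").encode("utf-8")

print(decode_utf8_layers(triple))
```

If a value was encoded fewer than three times, one of the intermediate steps will usually raise `UnicodeDecodeError` or `UnicodeEncodeError`, which can serve as a signal to leave that row unchanged.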
Some dates are missing a month¶
Rows affected: 1+
Errors:
SyntaxException: Invalid XML function syntax: ValueError: month must be in 1..12 function: <_date><date>29 1999</date></_date> row #: 10587
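Rows like this one could be flagged before import with a simple check. The sketch below is hypothetical (the helper `has_month` is not part of the import pipeline) and assumes months are written as English names or abbreviations of at least three letters, as in the sample errors on this page.

```python
import re

# English month names; a token counts as a month if it is a prefix of one of them.
MONTHS = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)


def has_month(verbatim_date: str) -> bool:
    """Return True if the string contains a recognizable month name or abbreviation."""
    for token in re.findall(r"[A-Za-z]+", verbatim_date):
        word = token.lower()
        if len(word) >= 3 and any(month.startswith(word) for month in MONTHS):
            return True
    return False


print(has_month("29 1999"))     # no alphabetic token at all
print(has_month("7 May 1997"))  # contains "May"
```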
Some dates seem to contain three different days of the month¶
Rows affected: 3+
Errors:
DataError: time zone displacement out of range: "18 May 18-19 1975"
DataError: time zone displacement out of range: "24 July , 18-22 1913"
SyntaxException: Invalid XML function syntax: ValueError: unknown string format function: <_date><date>26 27 28 Septem 1913</date></_date> row #: 4057
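One way to find such rows ahead of time is to count the numbers in each date string that could be a day of the month. This is only a sketch under stated assumptions: `day_candidates` and `needs_review` are hypothetical helpers, and any one- or two-digit number from 1 to 31 is treated as a possible day.

```python
import re
from typing import List


def day_candidates(verbatim_date: str) -> List[int]:
    """Return every standalone 1-2 digit number that could be a day of the month."""
    # The lookarounds keep digits inside longer numbers (such as years) from matching.
    numbers = re.findall(r"(?<!\d)\d{1,2}(?!\d)", verbatim_date)
    return [int(n) for n in numbers if 1 <= int(n) <= 31]


def needs_review(verbatim_date: str) -> bool:
    """Flag dates that carry more than one possible day value."""
    return len(day_candidates(verbatim_date)) > 1


for sample in ("18 May 18-19 1975", "24 July , 18-22 1913", "26 27 28 Septem 1913"):
    print(sample, day_candidates(sample), needs_review(sample))
```

Flagged rows still need a human decision, since the string alone does not say whether the extra numbers describe a date range or a data-entry error.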
The staging CSV import process interpreted backslashes before quotes as escape characters, when they should be treated literally: FIXED by importing directly from CSV¶
Rows affected: 3
Sample row:
Invalid CSV | CatalogNumberNumeric | Collector | CollectorNumber | FieldNumber | YearCollected | MonthCollected | DayCollected | CollectedDate | TimeOfDay | VerbatimCollectingDate |
"Larry Hendrickson\","841","841" * |
206666 | Larry Hendrickson","841 | 841 | 1997 | 5 | 7 | 0 | NULL | 7 May 1997 | NULL |
Invalid CSV | CatalogNumberNumeric | FieldNotes | County | Locality | DecimalLatitude |
arroyo.\","","Mexico:Sonora: |
205646 | Uncommon small tree in tropical eciduous forest on sloper above arroyo.",","Mexico:Sonora: | 26.85 | -108.91667 | 0 |
Invalid CSV | CatalogNumberNumeric | Remarks |
"\"\n |
212836 | "\n"Herbarium:ARIZ:dbsn212926 |
* Note that the second "841" is the FieldNumber, which happens to be the same as the CollectorNumber
Error: Fields contain CSV formatting; columns are shifted to the left, and some values are pulled in from the next row
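The effect can be reproduced with Python's `csv` module, which treats backslashes literally by default and only interprets them as escapes when `escapechar` is set. This is an illustration of the failure mode, not the staging importer itself; the CSV line below is reconstructed from the first sample row and is an assumption about the original file's layout.

```python
import csv
import io

# Reconstructed from the first sample row: the Collector value ends in a literal backslash.
line = '"206666","Larry Hendrickson\\","841","841"\n'

# Default dialect: backslash is an ordinary character, so the quote closes the field.
literal_row = next(csv.reader(io.StringIO(line)))

# Backslash treated as an escape character: the closing quote is swallowed,
# the field runs on past the comma, and later columns shift left.
shifted_row = next(csv.reader(io.StringIO(line), escapechar="\\"))

print(len(literal_row), literal_row)
print(len(shifted_row), shifted_row)
```

The exact contents of the merged field depend on the parser, so the shifted values here need not match the staging import's output character for character; the point is that the row ends up with fewer columns than the header.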