UArizona issues

Some strings are triple-encoded with UTF-8

Rows affected: 4994

Sample row:

CatalogNumberNumeric  Collector    Country  StateProvince  Locality
266048                E.A. Mearns  Mexico   México         Oro Blanco, Picacho, [boundary], Mex. .

Error: StateProvince field contains extra Unicode characters

Fix: decode the affected strings three times with a UTF-8 decoder
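
A minimal sketch of the repeated decode, assuming each extra layer came from UTF-8 bytes being misread as Latin-1 and then re-encoded (the actual intermediate encoding is not recorded here and could be something else, such as Windows-1252). The function name `undo_utf8_layers` and the `max_layers` parameter are hypothetical.

```python
def undo_utf8_layers(text, max_layers=3):
    """Peel off up to max_layers of accidental UTF-8 re-encoding.

    Assumption: each layer was produced by decoding UTF-8 bytes as Latin-1.
    """
    for _ in range(max_layers):
        try:
            fixed = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # The text is no longer valid mis-decoded UTF-8, so stop here.
            break
        if fixed == text:
            # Pure ASCII: nothing left to undo.
            break
        text = fixed
    return text
```

The loop stops as soon as a further decode would fail, so strings that are already clean (or only double-encoded) pass through without damage. A string that legitimately contains a sequence such as "Ã©" would be altered, so this is best applied only to the affected rows.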

Some dates are missing a month

Rows affected: 1+

Errors:

SyntaxException: Invalid XML function syntax: ValueError: month must be in 1..12 
function:
<_date><date>29  1999</date></_date>
row #: 10587
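
To find the remaining affected rows before the import fails on them, a rough pre-check like the following could be run over the date strings. This is only a sketch: `missing_month` is a hypothetical helper, and it assumes a month appears either as a name of at least three letters or as a third numeric token.

```python
import re

MONTH_NAME_RE = re.compile(r"[A-Za-z]{3,}")

def missing_month(date_text):
    """Return True when the string has no month name and fewer than three numbers."""
    if MONTH_NAME_RE.search(date_text):
        return False
    numbers = re.findall(r"\d+", date_text)
    return len(numbers) < 3
```

Note that a year-only string is also reported, since it has no month either.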

Some dates seem to contain three different days of the month

Rows affected: 3+

Errors:

DataError: time zone displacement out of range: "18 May 18-19 1975" 
DataError: time zone displacement out of range: "24 July , 18-22 1913" 
SyntaxException: Invalid XML function syntax: ValueError: unknown string format 
function:
<_date><date>26 27 28 Septem 1913</date></_date>
row #: 4057
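
One possible way to pull these strings apart before date parsing is sketched below. `split_multi_day` is a hypothetical helper, not part of the import code; it assumes a four-digit year, an alphabetic month token, and one- or two-digit days, and it makes no attempt to decide which of the days is the intended one.

```python
import re

def split_multi_day(date_text):
    """Split a verbatim date into year, month token, and all day numbers found."""
    year = re.search(r"\b(\d{4})\b", date_text)
    month = re.search(r"[A-Za-z]+", date_text)
    # One- or two-digit numbers that are not part of a longer number (the year).
    days = sorted({int(d) for d in re.findall(r"(?<!\d)\d{1,2}(?!\d)", date_text)})
    if not (year and month and days):
        return None
    return {"year": int(year.group(1)), "month": month.group(0), "days": days}
```

The month token is returned as written (for example "Septem"), so normalizing abbreviations is left to a later step.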

The staging CSV import process interpreted backslashes before quotes as escape characters when they should have been treated literally (FIXED by importing directly from CSV)

Rows affected: 3

Sample row:

Invalid CSV CatalogNumberNumeric Collector CollectorNumber FieldNumber YearCollected MonthCollected DayCollected CollectedDate TimeOfDay VerbatimCollectingDate
"Larry Hendrickson\","841","841" * 206666 Larry Hendrickson","841 841 1997 5 7 0 NULL 7 May 1997 NULL
Invalid CSV CatalogNumberNumeric FieldNotes County Locality DecimalLatitude
arroyo.\","","Mexico:Sonora: 205646 Uncommon small tree in tropical eciduous forest on sloper above arroyo.",","Mexico:Sonora: 26.85 -108.91667 0
Invalid CSV CatalogNumberNumeric Remarks
"\"\n 212836 "\n"Herbarium:ARIZ:dbsn212926

 * Note that the second "841" is the FieldNumber, which happens to be the same as the CollectorNumber

Error: Fields contain CSV formatting; columns are shifted to the left (sometimes taken from the next row)
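
The difference between the two interpretations can be reproduced with Python's standard `csv` module, which treats backslashes literally unless an `escapechar` is set. This is only an illustration built from the first sample above; it is not the staging import tool itself, and the `parse` helper is hypothetical.

```python
import csv
import io

# First three fields of the first sample row, as they appear in the CSV file.
SAMPLE = '"Larry Hendrickson\\","841","841"\n'

def parse(text, **reader_options):
    """Parse a single CSV line and return its fields."""
    return next(csv.reader(io.StringIO(text, newline=""), **reader_options))

# Default dialect: the backslash is ordinary data and the quote closes the field.
literal = parse(SAMPLE)

# With a backslash escape character the quote is swallowed into the field,
# so the following columns are merged and shifted.
escaped = parse(SAMPLE, escapechar="\\")

print(literal)
print(escaped)
```

With the default settings the Collector field keeps its trailing backslash and the three fields stay separate, which matches the behavior of importing directly from CSV.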