VegBIEN mappings: distance fields: Remove units
xml_func.py: _units: Allow value to be NULL
xml_func.py: _units: Use new format.cleanup_units() to do units parsing
format.py: Added clean_numeric(), str2int(), str2float(). Added units-related functions. Added documentation labels to each section.
Added filter_errors to filter `map` error messages
Renamed bin/errors_filter_* to filter_errors_* to sound more natural and to have a different prefix than error_stats so that both can easily be tab-completed at the command line
README.TXT: Testing: Added instructions for testing just mapping process, just map spreadsheet generation, and everything
root Makefile: Added test-all for most complete coverage. Removed extraneous ";" at the end of the prerequisites line of rules with a recipe.
mappings/Makefile: Use new ci_map to make DwC.cs-VegBIEN.specimens.csv case-insensitive
Added ci_map to make a map spreadsheet case-insensitive.
mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.
inputs/SpeciesLink: Switched to using flat files instead of DB
inputs/MO: Switched to using flat files instead of DB
input.Makefile: Mapping: Support multiple segments of a source table flat file. Use with_cat_csv if flat file segment(s) are available; otherwise use the input file in $+ or the input database, if any. Don't look for an explicit CSV header file because it can now be handled as the first segment if appropriately named.
Added with_cat_csv
with_cat: Added support for custom cat command in env var
cat_csv: Abort if output stream closed instead of exiting with an IOError
cat_csv: Ignore any duplicated headers instead of requiring each CSV to have a header identical to the first. Rewrote to pass the CSVs through as lines rather than parsing each row. Because the CSVs are no longer parsed, check that all CSVs have the same dialect.
csvs.py: Added modifications to the csv module to support comparing Dialect instances
util.py: Added classes_eq()
csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().
util.py: Added NamedTuple
csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
Renamed inputs/NYBG to inputs/NY to match herbarium code
Renamed inputs/UNC-NCSC to inputs/NCU-NCSC to match herbarium code
Renamed inputs/UArizona to inputs/ARIZ to match herbarium code
Regenerated inputs/MO/maps/src.join.specimens.csv
Renamed inputs/MOBOT to inputs/MO to match herbarium code
Regenerated vegbien.ERD exports
vegbien.sql: taxonoccurrence: Added cultivatedbasis
vegbien.sql: Moved all accessioncode fields to the bottom of their tables. vegbien.ERD.mwb: Adjusted lines to remove overlaps.
vegbien.sql: taxonoccurrence: Added iscultivated, isnative. Moved accessioncode to bottom.
vegbien.sql: Changed taxonoccurrence.growthform type to more specific growthform
vegbien.sql: Added growthform and establishmentmeans_dwc enums using values from taxonclass. Documented that taxonclass is growthform + establishmentmeans_dwc + some other values.
VegBIEN: Moved aggregateoccurrence.growthform to taxonoccurrence
Added inputs/UNC-NCSC/maps/src.join.specimens.csv
VegBIEN: Merged aggregateoccurrence.verbatimcollectorname and specimenreplicate.verbatimcollectorname into taxonoccurrence
xml_func.py: parse_range(): Handle negative numbers by treating them as not a range
Added inputs/UNC-NCSC/test with initial accepted test outputs
Added inputs/UNC-NCSC/maps
xml_func.py: _replace: Fixed bug where value entry was not unpacked
Added inputs/UNC-NCSC
Added inputs/MOBOT/test with initial accepted test outputs
Added inputs/MOBOT/maps
Added inputs/MOBOT
VegX mappings: Updated plot place mappings to VegX 1.5.3 method of place type-tagged place names. This removes the userdef fields in plot.
VegX mappings: Changed userdef xPosition, yPosition to /relativePlotPosition/relativeX, /relativePlotPosition/relativeY
Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv
bin/map: map_table(): wrap_row(): Use util.list_as_length() to handle CSV rows of different lengths
util.py: Added list_as_length(). Documented that list_set_length() takes a list, not a tuple. Documented that ListDict must have len(list_) == len(keys).
util.py: Added list_set_length(). Changed list_set() to use list_set_length().
mappings/DwC2-VegBIEN.specimens.csv: Added empty *_id/taxonoccurrence attr to primary keys to ensure that a taxonoccurrence is always created for the specimenreplicate
xml_func.py: _label: Use ustr instead of str when checking types
csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default
Merged inputs/NYBG-CSV into NYBG
Merged inputs/UArizona-CSV into UArizona
Added inputs/SpeciesLink/test
Added inputs/SpeciesLink/maps
xml_func.py: range-related funcs: Made inputs optional in case they get set to NULL by _nullIf
mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
bin/map: Added support for getting columns with an optional prefix list for DB/CSV inputs
bin/map: Factored out code common to DB and CSV inputs into map_table()
bin/map: Parse any prefixes in map input column name. They will later be used to check for versions of columns with a prefix added when processing CSV/DB inputs.
strings.py: Added split(), remove_prefix(), remove_suffix(), and remove_prefixes(). Added section comments.
mappings/DwC2-VegBIEN.specimens.csv: minimumElevationInMeters: Handle embedded ranges using _rangeStart and _rangeEnd
xml_func.py: Added _rangeStart and _rangeEnd
xpath.py: parse(): Split paths: Raise a SyntaxException if a split path can't be attached because there is no parent element to attach to
Parser.py: Renamed _syntax_err() to syntax_err() to make it a public method
mappings/DwC2-VegBIEN.specimens.csv: Mapped fieldNotes and taxonRemarks to description using _merge. inputs/UArizona*/maps/DwC.specimens.csv: Mapped Remarks to taxonRemarks, which now has a VegBIEN mapping.
Added inputs/GBIF/src with small files that can be under version control
input.Makefile: svn_props: Ignore everything in the src/ subdir that hasn't been explicitly checked in
Added inputs/GBIF/test with accepted test outputs
Added inputs/GBIF/maps
Regenerated inputs/UArizona*/maps VegBIEN maps
bin/map: Use new csvs.reader_and_header() to support CSVs/TSVs with dialects other than the default Excel dialect
Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line
join: Don't append suffix to empty output mappings, so that they stay empty ("NULL")
input.Makefile: Added tsv to $(exts). Strip extra whitespace from $(inputs) so that it's the empty string if $(<in) (and $(<in).header) don't exist, and can be used in $(if ...).
input.Makefile: Fixed bug in inputFiles wildcard where extensions were manually listed instead of dynamically determined from the $(exts) config var
README.TXT: Tell user to `disown -h 1` after running `make import x%x` so that it won't be sent a SIGHUP if the user logs out
input.Makefile: Prepend separate CSV header when available
input.Makefile: Use with_cat in map to later support prepending separate CSV headers
Added with_cat to run a command, taking input from the concatenation of files
input.Makefile: Set mapEnv if $(dbEngine) is set, to eventually support pre-existing DB connections
input.Makefile: Changed $(dbFile) to $(dbExport) to make it unambiguous that it refers to a SQL export, not a pre-existing DB, which will be supported later
input.Makefile: Added .txt to list of input file extensions
Added inputs/SpeciesLink
root Makefile: python-Linux: Added pymetrics
bin/map: Consider \N to be None
util.py: none_if(): Allow multiple none_vals using varargs
Added inputs/GBIF
exc.py: Fixed bug in traceback-saving mechanism that didn't deal with nested Exceptions (such as Exceptions with causes in ExceptionWithCause). Renamed add_exc_info() to add_traceback() since we really only need to store the traceback.
dates.py: parse_date_range(): Fixed bug where the date parts were not joined back together into a string for each date range element. Use strings.single_space() after the date has been split into range parts so that whitespace around the range separator is removed instead of being replaced with a single space.
xml_func.py: process(): Also catch XML func internal errors to assist in debugging. Use new exc.add_exc_info() to save traceback in case later code throws exception, overwriting exc_info().
exc.py: str_(): Add the traceback at the end of the exception string. Added add_exc_info() and get_exc_info() for providing traceback info for str_().
mappings/DwC2-VegBIEN.specimens.csv: eventDate, dateIdentified: Use _dateRangeStart and _dateRangeEnd
xml_func.py: Added _dateRangeStart and _dateRangeEnd
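
Illustration for the cat_csv and csvs.py entries above (not code from the repository): a minimal Python 3 sketch of how line-based CSV concatenation with dialect detection might look. The names sniff_dialect, cat_csv and COMMON_DELIMS, the choice of delimiters, and the delimiter-only dialect check are all assumptions made for this example. The actual scripts may differ in every detail.

```python
import csv
import io

# Assumed set of "common" delimiters; restricting the sniffer to these keeps
# frequently occurring letters from being mistaken for delimiters.
COMMON_DELIMS = ",\t;|"


def sniff_dialect(first_line):
    """Guess the dialect from a stream's first line (hypothetical helper)."""
    dialect = csv.Sniffer().sniff(first_line, delimiters=COMMON_DELIMS)
    # The sniffer only enables doublequote when it sees a doubled quote in
    # the sample, so turn it on explicitly.
    dialect.doublequote = True
    return dialect


def cat_csv(streams, out):
    """Concatenate CSV streams as raw lines, keeping only the first header.

    Later streams may repeat the header (it is dropped) or start directly
    with data (the first line is passed through). Rows are never parsed, so
    the delimiter of each stream is checked against the first one instead.
    """
    header = None
    delimiter = None
    for stream in streams:
        first_line = stream.readline()
        if not first_line:
            continue  # empty stream: nothing to copy
        dialect = sniff_dialect(first_line)
        if header is None:
            header, delimiter = first_line, dialect.delimiter
            out.write(first_line)
        else:
            if dialect.delimiter != delimiter:
                raise ValueError("CSV segments use different delimiters")
            if first_line != header:
                out.write(first_line)  # data line, not a duplicated header
        for line in stream:
            out.write(line)


if __name__ == "__main__":
    segments = [
        io.StringIO("x,y\n1,2\n"),
        io.StringIO("x,y\n3,4\n"),  # duplicated header is ignored
        io.StringIO("5,6\n"),       # headerless segment
    ]
    result = io.StringIO()
    cat_csv(segments, result)
    print(result.getvalue(), end="")
```

Passing lines through unparsed avoids the cost of re-quoting every row, which is why the sketch checks the dialect up front rather than relying on a CSV reader to normalize the input.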