csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().
util.py: Added NamedTuple
csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
Renamed inputs/NYBG to inputs/NY to match herbarium code
Renamed inputs/UNC-NCSC to inputs/NCU-NCSC to match herbarium code
Renamed inputs/UArizona to inputs/ARIZ to match herbarium code
Regenerated inputs/MO/maps/src.join.specimens.csv
Renamed inputs/MOBOT to inputs/MO to match herbarium code
Regenerated vegbien.ERD exports
vegbien.sql: taxonoccurrence: Added cultivatedbasis
vegbien.sql: Moved all accessioncode fields to the bottom of their tables. vegbien.ERD.mwb: Adjusted lines to remove overlaps.
vegbien.sql: taxonoccurrence: Added iscultivated, isnative. Moved accessioncode to bottom.
vegbien.sql: Changed taxonoccurrence.growthform type to more specific growthform
vegbien.sql: Added growthform and establishmentmeans_dwc enums using values from taxonclass. Documented that taxonclass is growthform + establishmentmeans_dwc + some other values.
VegBIEN: Moved aggregateoccurrence.growthform to taxonoccurrence
Added inputs/UNC-NCSC/maps/src.join.specimens.csv
VegBIEN: Merged aggregateoccurrence.verbatimcollectorname and specimenreplicate.verbatimcollectorname into taxonoccurrence
xml_func.py: parse_range(): Handle negative numbers by treating them as not a range
Added inputs/UNC-NCSC/test with initial accepted test outputs
Added inputs/UNC-NCSC/maps
xml_func.py: _replace: Fixed bug where value entry was not unpacked
Added inputs/UNC-NCSC
Added inputs/MOBOT/test with initial accepted test outputs
Added inputs/MOBOT/maps
Added inputs/MOBOT
VegX mappings: Updated plot place mappings to VegX 1.5.3 method of place type-tagged place names. This removes the userdef fields in plot.
VegX mappings: Changed userdef xPosition, yPosition to /relativePlotPosition/relativeX, /relativePlotPosition/relativeY
Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv
bin/map: map_table(): wrap_row(): Use util.list_as_length() to handle CSV rows of different lengths
util.py: Added list_as_length(). Documented that list_set_length() takes a list, not a tuple. Documented that ListDict must have len(list_) == len(keys).
util.py: Added list_set_length(). Changed list_set() to use list_set_length().
mappings/DwC2-VegBIEN.specimens.csv: Added empty *_id/taxonoccurrence attr to primary keys to ensure that a taxonoccurrence is always created for the specimenreplicate
xml_func.py: _label: Use ustr instead of str when checking types
csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default
Merged inputs/NYBG-CSV into NYBG
Merged inputs/UArizona-CSV into UArizona
Added inputs/SpeciesLink/test
Added inputs/SpeciesLink/maps
xml_func.py: range-related funcs: Made inputs optional in case they get set to NULL by _nullIf
mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
bin/map: Added support for getting columns with an optional prefix list for DB/CSV inputs
bin/map: Factored out code common to DB and CSV inputs into map_table()
bin/map: Parse any prefixes in map input column name. They will later be used to check for versions of columns with a prefix added when processing CSV/DB inputs.
strings.py: Added split(), remove_prefix(), remove_suffix(), and remove_prefixes(). Added section comments.
mappings/DwC2-VegBIEN.specimens.csv: minimumElevationInMeters: Handle embedded ranges using _rangeStart and _rangeEnd
xml_func.py: Added _rangeStart and _rangeEnd
xpath.py: parse(): Split paths: Raise a SyntaxException if a split path can't be attached because there is no parent element to attach it to
Parser.py: Renamed _syntax_err() to syntax_err() to make it a public method
mappings/DwC2-VegBIEN.specimens.csv: Mapped fieldNotes and taxonRemarks to description using _merge. inputs/UArizona*/maps/DwC.specimens.csv: Mapped Remarks to taxonRemarks, which now has a VegBIEN mapping.
Added inputs/GBIF/src with small files that can be under version control
input.Makefile: svn_props: Ignore everything in the src/ subdir that hasn't been explicitly checked in
Added inputs/GBIF/test with accepted test outputs
Added inputs/GBIF/maps
Regenerated inputs/UArizona*/maps VegBIEN maps
bin/map: Use new csvs.reader_and_header() to support CSVs/TSVs with dialects other than the default Excel dialect
Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line
join: Don't append suffix to empty output mappings, so that they stay empty ("NULL")
input.Makefile: Added tsv to $(exts). Strip extra whitespace from $(inputs) so that it's the empty string if $(<in) (and $(<in).header) don't exist, and can be used in $(if ...).
input.Makefile: Fixed bug in inputFiles wildcard where extensions were manually listed instead of dynamically determined from the $(exts) config var
README.TXT: Tell user to `disown -h 1` after running `make import x%x` so that it won't be sent a SIGHUP if the user logs out
input.Makefile: Prepend separate CSV header when available
input.Makefile: Use with_cat in map to later support prepending separate CSV headers
Added with_cat to run a command, taking input from the concatenation of files
input.Makefile: Set mapEnv if $(dbEngine) is set, to eventually support pre-existing DB connections
input.Makefile: Changed $(dbFile) to $(dbExport) to make it unambiguous that it refers to a SQL export, not a pre-existing DB, which will be supported later
input.Makefile: Added .txt to list of input file extensions
Added inputs/SpeciesLink
root Makefile: python-Linux: Added pymetrics
bin/map: Consider \N to be None
util.py: none_if(): Allow multiple none_vals using varargs
Added inputs/GBIF
exc.py: Fixed bug in traceback-saving mechanism that didn't deal with nested Exceptions (such as Exceptions with causes in ExceptionWithCause). Renamed add_exc_info() to add_traceback() since we really only need to store the traceback.
dates.py: parse_date_range(): Fixed bug where the date parts were not joined back together into a string for each date range element. Use strings.single_space() after the date has been split into range parts so that whitespace around the range separator is removed instead of being replaced with a single space.
xml_func.py: process(): Also catch XML func internal errors to assist in debugging. Use new exc.add_exc_info() to save traceback in case later code throws exception, overwriting exc_info().
exc.py: str_(): Add the traceback at the end of the exception string. Added add_exc_info() and get_exc_info() for providing traceback info for str_().
mappings/DwC2-VegBIEN.specimens.csv: eventDate, dateIdentified: Use _dateRangeStart and _dateRangeEnd
xml_func.py: Added _dateRangeStart and _dateRangeEnd
dates.py: Added parse_date_range() and helper funcs could_be_year() and could_be_day()
strings.py: Added single_space()
inputs/UArizona*: Map the ScientificNameAuthor to the binomial instead since it contains the binomial in addition to the authority
Added inputs/UArizona-CSV/test
input.Makefile: Use .PRECIOUS to save outputs of failed tests so they can be accepted (needed now that .DELETE_ON_ERROR is turned on globally)
bin/map: Moved string-cleanup code from get_value() to cleanup(), called by process_row(). process_row() now cleans up the string before checking if it's None, because cleanup() uses none_if() to map "" to None.
util.py: Added do_ignore_none()
Added inputs/UArizona-CSV/verify
Added inputs/UArizona-CSV/maps
mappings/DwC2-VegBIEN.specimens.csv: Mapped coordinateUncertaintyInMeters to the same place as coordinatePrecision (input sources generally use only one of these columns, which is most likely the accuracy regardless of what it's named)
join: In error message when map column names don't match, include the actual column names
Makefiles: Added .DELETE_ON_ERROR to delete target if recipe fails
VegBIEN mappings: plantnames: Nest taxons hierarchically using plantname.parent_id. Mappings using _forEach: Append a "," to the `in` list so that mappings will sort from shortest to longest `in` list ("]" comes after "," in ASCII, causing this not to happen without the trailing ",").
xpath.py: parse(): _paths(): Remove trailing ","
xpath_func.py: _forEach: Made syntax more natural-looking by using values instead of names for string args and attrs instead of branches for array args
xpath.py: parse(): Fixed bug in _paths() where empty lists would be parsed as a list containing a single empty path, instead of as an empty list
VegBIEN mappings: Place names: Use _forEach to simplify XPaths for recursively nested places
bin/map: In debug mode, print output XPaths
xpath_func.py: _forEach: Fixed to support _val replacements anywhere, by doing a string-based search-and-replace on a quoted XPath instead of a list-based search-and-replace on an already-parsed XPath
xpath_func.py: Renamed _for to _forEach. Finished implementing _forEach.
xpath.py: Import xpath_func after defining XpathElem because xpath_func depends on XpathElem and it hasn't yet been factored into a separate file
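
Illustration for the csvs.py entries above (dialect detection from the header line, delimiters restricted to common ones, doublequote forced on): a minimal sketch, not the actual csvs.py code. The function names stream_info() and reader_and_header() and the fields header_line and dialect come from the log. Everything else is an assumption: the COMMON_DELIMS list, the StreamInfo class name, the exact return shapes, and the use of Python 3.

```python
import csv
import io
from typing import NamedTuple

# Assumption: the real list of "common" delimiters in csvs.py is not in the log.
COMMON_DELIMS = ',;\t|'


class StreamInfo(NamedTuple):
    """Hypothetical name for the {header_line, dialect} tuple."""
    header_line: str
    dialect: type  # the csv.Dialect subclass returned by csv.Sniffer


def stream_info(stream):
    """Read the header line and sniff the dialect from it."""
    header_line = stream.readline()
    # Passing a delimiters string keeps frequent letters from being
    # picked as the delimiter.
    dialect = csv.Sniffer().sniff(header_line, COMMON_DELIMS)
    # Sniffer leaves doublequote off when the sample has no quoted
    # fields, so turn it on explicitly.
    dialect.doublequote = True
    return StreamInfo(header_line, dialect)


def reader_and_header(stream):
    """Return (reader over the remaining rows, parsed header row)."""
    info = stream_info(stream)
    header = next(csv.reader([info.header_line], info.dialect))
    return csv.reader(stream, info.dialect), header


if __name__ == '__main__':
    data = 'name\tnotes\tyear\nQuercus\t"said ""hi"""\t1999\n'
    reader, header = reader_and_header(io.StringIO(data))
    print(header)
    for row in reader:
        print(row)
```

Returning the raw header line together with the dialect is what would let a later tool such as cat_csv compare or reuse headers without sniffing the stream again.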