bin/map: Use new csvs.reader_and_header() to support CSVs/TSVs that use a dialect other than the default Excel one
Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line
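The header-based dialect detection described in the csvs.py entry could look roughly like the sketch below. This is not the actual csvs.py code: only the name reader_and_header() comes from the log, and its body, the use of the standard library's csv.Sniffer, and the restriction to comma and tab delimiters are all assumptions made for illustration.

```python
import csv
import io


def reader_and_header(stream):
    """Return (reader, header) for a delimited text stream.

    Hypothetical sketch: the dialect is guessed from the header line only.
    """
    header_line = stream.readline()
    try:
        # Assumption: only comma and tab are considered as delimiters.
        dialect = csv.Sniffer().sniff(header_line, delimiters=",\t")
    except csv.Error:
        # No delimiter found (e.g. a single-column file): use the Excel default.
        dialect = csv.excel
    header = next(csv.reader([header_line], dialect))
    # The remaining lines of the stream are parsed with the same dialect.
    return csv.reader(stream, dialect), header


# Usage with an in-memory TSV; a real caller would pass an open file.
tsv_reader, tsv_header = reader_and_header(io.StringIO("a\tb\tc\n1\t2\t3\n"))
tsv_rows = list(tsv_reader)
```

Sniffing only the header keeps the detection cheap and leaves the data rows unread until the caller iterates over the returned reader.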
join: Don't append suffix to empty output mappings, so that they stay empty ("NULL")
input.Makefile: Added tsv to $(exts). Strip extra whitespace from $(inputs) so that it's the empty string if $(<in) (and $(<in).header) don't exist, and can be used in $(if ...).
input.Makefile: Fixed bug in inputFiles wildcard where extensions were manually listed instead of dynamically determined from the $(exts) config var
README.TXT: Tell user to `disown -h 1` after running `make import x%x` so that it won't be sent a SIGHUP if the user logs out
input.Makefile: Prepend separate CSV header when available
input.Makefile: Use with_cat in map to later support prepending separate CSV headers
Added with_cat to run a command, taking input from the concatenation of files
input.Makefile: Set mapEnv if $(dbEngine) is set, to eventually support pre-existing DB connections
input.Makefile: Changed $(dbFile) to $(dbExport) to make it unambiguous that it refers to a SQL export, not a pre-existing DB, which will be supported later
input.Makefile: Added .txt to list of input file extensions
Added inputs/SpeciesLink
root Makefile: python-Linux: Added pymetrics
bin/map: Consider \N to be None
util.py: none_if(): Allow multiple none_vals using varargs
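A minimal sketch of what a varargs none_if() might look like. The name none_if() and the idea of passing several none_vals come from the log; the exact signature and body here are assumptions, not the real util.py code.

```python
def none_if(val, *none_vals):
    """Return None if val equals any of none_vals, otherwise val unchanged.

    Assumed signature: the log only says that multiple none_vals are
    accepted via varargs.
    """
    return None if val in none_vals else val


# Treat both the empty string and the \N export marker as missing values.
cleaned = [none_if(v, "", r"\N") for v in ["Quercus", "", r"\N"]]
```

With varargs, a caller such as bin/map could handle "" and \N in a single call instead of chaining two.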
Added inputs/GBIF
exc.py: Fixed bug in traceback-saving mechanism that didn't deal with nested Exceptions (such as Exceptions with causes in ExceptionWithCause). Renamed add_exc_info() to add_traceback() since we really only need to store the traceback.
dates.py: parse_date_range(): Fixed bug where the date parts were not joined back together into a string for each date range element. Use strings.single_space() after the date has been split into range parts so that whitespace around the range separator is removed instead of being replaced with a single space.
xml_func.py: process(): Also catch XML func internal errors to assist in debugging. Use new exc.add_exc_info() to save traceback in case later code throws exception, overwriting exc_info().
exc.py: str_(): Add the traceback at the end of the exception string. Added add_exc_info() and get_exc_info() for providing traceback info for str_().
mappings/DwC2-VegBIEN.specimens.csv: eventDate, dateIdentified: Use _dateRangeStart and _dateRangeEnd
xml_func.py: Added _dateRangeStart and _dateRangeEnd
dates.py: Added parse_date_range() and helper funcs could_be_year() and could_be_day()
strings.py: Added single_space()
inputs/UArizona*: Map the ScientificNameAuthor to the binomial instead since it contains the binomial in addition to the authority
Added inputs/UArizona-CSV/test
input.Makefile: Use .PRECIOUS to save outputs of failed tests so they can be accepted (needed now that .DELETE_ON_ERROR is turned on globally)
bin/map: Moved string-cleanup code from get_value() to cleanup(), called by process_row(). process_row() now cleans up the string before checking if it's None, because cleanup() uses none_if() to map "" to None.
util.py: Added do_ignore_none()
Added inputs/UArizona-CSV/verify
Added inputs/UArizona-CSV/maps
mappings/DwC2-VegBIEN.specimens.csv: Mapped coordinateUncertaintyInMeters to the same place as coordinatePrecision (input sources generally use only one of these columns, which is most likely the accuracy regardless of what it's named)
join: In error message when map column names don't match, include the actual column names
Makefiles: Added .DELETE_ON_ERROR to delete target if recipe fails
VegBIEN mappings: plantnames: Nest taxons hierarchically using plantname.parent_id. Mappings using _forEach: Append a "," to the `in` list so that mappings will sort from shortest to longest `in` list ("]" comes after "," in ASCII, causing this not to happen without the trailing ",").
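The sort-order effect of the trailing "," can be checked with plain string sorting. The bracketed strings below are simplified stand-ins with lowercase element names, not real mapping XPaths.

```python
# "]" is 0x5D and "," is 0x2C, so "," sorts before "]" in ASCII.
without_comma = ["[a]", "[a,b]", "[a,b,c]"]
with_comma = ["[a,]", "[a,b,]", "[a,b,c,]"]

# Without the trailing ",", a longer list sorts first: at the position where
# the shorter list has "]", the longer one has ",".
sorted_without = sorted(without_comma)

# With the trailing ",", the shorter list's "]" is compared against the
# first letter of the next element, and "]" sorts before lowercase letters.
sorted_with = sorted(with_comma)
```

Here sorted_without comes out longest first and sorted_with shortest first. The second result relies on the element names starting with lowercase letters, which sort after "]".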
xpath.py: parse(): _paths(): Remove trailing ","
xpath_func.py: _forEach: Made syntax more natural-looking by using values instead of names for string args and attrs instead of branches for array args
xpath.py: parse(): Fixed bug in _paths() where empty lists would be parsed as a list containing a single empty path, instead of as an empty list
VegBIEN mappings: Place names: Use _forEach to simplify XPaths for recursively nested places
bin/map: In debug mode, print output XPaths
xpath_func.py: _forEach: Fixed to support _val replacements anywhere, by doing a string-based search-and-replace on a quoted XPath instead of a list-based search-and-replace on an already-parsed XPath
xpath_func.py: Renamed _for to _forEach. Finished implementing _forEach.
xpath.py: Import xpath_func after defining XpathElem because xpath_func depends on XpathElem and it hasn't yet been factored into a separate file
util.py: Added list_replace()
xpath_func.py: Changed XPath function signature to take arguments (args, path), and process() to parse out the args. Implemented basic _for that repeats its `do` arg once for each element of its `in` arg.
xpath.py: parse(): Run xpath_func.process() on the parsed XPath
Added xpath_func.py for XPath "function" elements that transform their subpaths
VegBIEN mappings: Removed no longer needed taxondetermination.determinationtype values, because they can be determined from the new role closed list
filter_ERD.csv: Removed no longer needed references to role
Regenerated vegbien.ERD exports
VegBIEN: Changed role table to a closed list
PostgreSQL-MySQL.csv: custom types: Consider everything except a set of accepted types to be a custom type
VegBIEN: taxonrank enum: Made values lowercase to match case convention in other enums
vegbien.sql: Renamed plantconceptscope to plantnamescope because it's now attached to plantname
vegbien.sql: Moved parent_id from plantconcept to plantname, since plantnames themselves are unique according to their parent taxons (a species under one genus is not the same as a species under another genus)
vegbien.ERD.mwb: Fixed lines
vegbien.sql: Moved scope_id from plantconcept to plantname, since plantnames themselves are scoped, not just the plantconcepts that use them (e.g. "sp. 1" has different meanings in different scopes, so it should not be shared between scopes). plantname: Added accessioncode.
vegbien.sql: Moved plantconcept parent_id from plantstatus to plantconcept. plantconcept: Removed datasource-specific fields to make it globally unique (one plantconcept for each assigned parent taxon of a plantname, of which there will usually be just one)
vegbien.sql: plantname: Removed datasource-specific fields to make this a globally-unique table (the datasource-specific fields belong in plantconcept)
Added inputs/UArizona/verify
mappings/verify.specimens.sql: Updated for schema changes
vegbien.sql: placerank enum: Added "village"
VegBIEN mappings: lat/long locationdetermination: Removed [!namedplace_id] key so that it's merged into the namedplace locationdetermination
VegBIEN mappings: Changed namedplace mappings to use new nested format for storing place containment relationships
xml_func.py: Added _simplifyPath
xpath.py: Added get_1()
vegbien.sql: namedplace: Removed parent_id from unique constraint because some data might be missing intervening links (e.g. state for a county, country), but the place (e.g. county) should still be attached to the existing place of the same name and rank (which will hopefully already have the correct parent_id link)
vegbien.sql: namedplace: Made rank required
vegbien.sql: namedplace: Removed no longer needed placesystem, which has been replaced by rank closed list
VegBIEN mappings: Map namedplaces using new rank field
vegbien.sql: namedplace: Added rank. Do duplicate elimination using rank and parent_id instead of placesystem
vegbien.sql: placerank: Standardized names to DwC/GML
vegbien.sql: Added placerank enum
vegbien.sql: namedplace: Removed VegBank internal fields and datasource scoping fields (namedplaces are globally unique). Added parent_id to point to containing namedplace.
xml_func.py: Added _dateRangePart with partial implementation (only works on strings with no range)
DwC mappings: Moved the _date filter outside _alt so that it runs only on the string that was actually chosen, and does not produce date format errors when a pre-parsed year/month/day is already available
xml_func.py: _date: Map date with only empty fields to NULL (occurs when all fields were e.g. 0 and were filtered to NULL by _nullIf)
xml_func.py: _date: Removed mapping year/month/day of 0 to NULL because that is now handled on a case-by-case basis in the mappings
mappings/DwC1-DwC2.specimens.csv: Map year/month/day of 0 to NULL
inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Fixed syntax error in growthForm map
inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Removed input values from growthForm map that Brad said were invalid
xml_func.py: _map: Added option to make map a closed list
mappings/DwC2-VegBIEN.specimens.csv: Fixed waterdepth mappings to use _avg
mappings/verify.specimens.sql: Use ORDER BY ... NULLS FIRST to match MySQL
input.Makefile: verify: Time the verification since it can take a long time
specimens verification: Added duplicate catalog numbers test
map: On nimoy, use bien2_staging unless otherwise specified
specimens verification: Added # counties test
specimens verification: Added collection codes and # catalog numbers tests
inputs/SALVIAS/maps/VegX.organisms.csv: Mapped custom Habit values not listed in the SALVIAS data dictionary
strings.py: Added unicode_reader for later use in handling Unicode characters in map spreadsheets
xpath.py: Removed unnecessary copy.deepcopy()'s and instead changed set_value() and set_id() to make copies of any elements they change. This should result in up to a 17% speed increase in the import, because deepcopy() was taking a lot of time. Added documentation to set_value() and set_id() that caller must make a shallow copy of the path to prevent modifications from propagating to other copies of the path. (Previously, a deep copy was needed, but there was no comment specifying this.)
mappings/VegX-VegBIEN.organisms.csv: Removed unneeded lookahead assertions from stemtag mappings. They relied on a bug ("feature"?) in the XPath engine that made the value of the lookahead assertion's path the same as the value of the main path, even though the value is set after the path is parsed.
xml_func.py: _date: For year/month/day dates, require the year (it would not make sense to default to a particular year)
inputs/UArizona: Added test outputs
mappings/DwC1-DwC2.specimens.csv: Fixed to allow datasource to define custom date mappings that don't pass through the default date mapping