xml_func.py: XML functions that assume their last argument is a value (_map, etc.): Use new helper function pop_value() to retrieve this value. Return None if value is None because this indicates the input is empty.
xml_func.py: _date: Use format.str2int instead of int to convert date parts to int so that strange formatting will be parsed correctly
format.py: clean_numeric(): Also fix some OCR errors
filter_errors: Default to outputting only the first match
xpath.py: Added append() to recursively append subpath to every leaf of a path tree. parse(): Use append() to fix bug in split path parsing where subpath was not added to every leaf of the tree, only the main leaf of the main branch and the main leaves of the other branches of the last element.
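A minimal sketch of the "append to every leaf" idea, assuming a simplified tree in which each node has a name and a list of branches. The Node class and leaf_paths() helper are hypothetical and are not the xpath.py data structures; only the name append() comes from the entry above.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical path-tree node: a name plus zero or more child branches."""
    name: str
    branches: List["Node"] = field(default_factory=list)


def append(node: Node, subpath: Node) -> None:
    """Attach a copy of subpath to every leaf under node (in place)."""
    if not node.branches:
        # Leaf: attach an independent copy so leaves do not share state.
        node.branches.append(copy.deepcopy(subpath))
        return
    for branch in node.branches:
        append(branch, subpath)


def leaf_paths(node: Node, prefix: str = "") -> List[str]:
    """Return the slash-joined path to each leaf, for inspection."""
    path = prefix + "/" + node.name
    if not node.branches:
        return [path]
    paths = []
    for branch in node.branches:
        paths.extend(leaf_paths(branch, path))
    return paths


# A split path: "a" has two leaves and "b" is a leaf of another branch.
tree = Node("plot", [Node("a", [Node("a1"), Node("a2")]), Node("b")])
append(tree, Node("id"))
print(leaf_paths(tree))
```

Recursing into every branch, rather than following only the main branch, is what ensures that no leaf is skipped.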
exc.py: Changed to store multiple tracebacks in an exception, in case an exception is caught and re-raised inside an ExceptionWithCause wrapper. This preserves more of the traceback in this situation, because you get the ExceptionWithCause's traceback as well.
input.Makefile: import: Removed verbose=1 because verbose mode is now automatically on (except in test mode)
bin/map: verbose mode defaults to off in test mode and on otherwise
bin/map: In verbose mode, print which input rows will be processed
bin/map: n option: Defaults to 1 in test mode. Empty string "" is interpreted as None (previously n had to be unset to specify None).
bin/map: Added section comments to env var config retrieval. Reordered env var config retrieval to put DB config last, since these options are input-type specific and complex, and putting them first hides the more general other options.
inputs/SALVIAS*/maps/VegX.plots.csv: Updated _units for % -> decimal conversion to use new syntax
xml_func.py: _units: If value can't be converted to float, wrap the ValueError in a SyntaxException
units.py: convert(): Added support for unit conversions. Added initial unit conversion for % -> unitless. str2quantity(): Fixed regexp to match % as units. Set Quantity.__repr__ to quantity2str.
units.py: convert(): Put "units == None" test after "quantity.units == units" test because a destination of no units might require a conversion for some input units (e.g. % -> unitless requires a division by 100)
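A rough sketch of how a % -> unitless conversion could work, assuming a Quantity of (value, units) with None meaning "no units". The CONVERSIONS table, the regular expression, and the function bodies are illustrative assumptions, not the units.py code. The equality test runs first and a destination of None still goes through the conversion lookup, which is one reading of the ordering described above.

```python
import re
from collections import namedtuple

Quantity = namedtuple("Quantity", ["value", "units"])

# Hypothetical conversion table: (from_units, to_units) -> function.
# None stands for "no units".
CONVERSIONS = {
    ("%", None): lambda value: value / 100.0,
}


def str2quantity(text):
    """Parse strings such as "25%", "3.5 m" or "7" into a Quantity."""
    match = re.match(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(\S+)?\s*$", text)
    if match is None:
        raise ValueError("not a quantity: %r" % text)
    value, units = match.groups()
    return Quantity(float(value), units)  # units is None if absent


def convert(quantity, units):
    """Convert quantity to the given units (None = unitless)."""
    if quantity.units == units:
        return quantity  # already in the requested units
    try:
        func = CONVERSIONS[(quantity.units, units)]
    except KeyError:
        raise ValueError(
            "no conversion from %r to %r" % (quantity.units, units))
    return Quantity(func(quantity.value), units)


print(convert(str2quantity("25%"), None))
```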
inputs/SALVIAS*/maps/VegX.organisms.csv: Habit: Ignore invalid values instead of generating a SyntaxException
xml_dom.py: minidom modifications: Escape as many text strings as we use directly. This still leaves the tagName used by xml.dom.minidom.Element.writexml: It uses 'writer.write(indent+"<" + self.tagName)' and doesn't escape the tagName.
xml_func.py: Made everything Unicode-safe by using strings.ustr instead of str
schemas/tree_cross-links.sql: Added comment for how to get the namedplace trigger from the provided plantname trigger
vegbien.sql: Fixed bug in tree cross-link algorithm where recursion to descendants' ancestors did not use new to refer to the current node's plantname_id
vegbien.sql: Fixed bug in tree cross-link algorithm to also insert ancestors for top-level nodes, because they now need an ancestor entry for themselves
Added separate SQL file for tree cross-links code. A link to this can be e-mailed to people to review.
vegbien.sql: Modified tree cross-link algorithm to add an "ancestor" for this node. This is useful for queries, because you don't have to separately test if the leaf node is the one you're looking for, in addition to that leaf node's ancestors.
README.TXT: Added instructions on how to stop all running imports
vegbien.sql: Added namedplace_update_ancestors and plantname_update_ancestors triggers to populate ancestor cross-links in new namedplace_ancestor and plantname_ancestor tables
sql.py: insert() (and try_insert()): Added optional returning param to provide name of an inserted column (usually pkey) to return
env_password: Print Usage message if run without initial "."
Added bin/stop_imports to stop all running imports
import_all: Print Usage message if run without initial "."
Renamed import-all to import_all to match convention of using underscores
inputs/CTFS: Added remaining non-data src files
Added CTFS data dictionary inputs/CTFS/src/ctfs-comments_worksheet.xls
import-all: Fixed to display the datasource name in the job name instead of 'make ${input}import &'
import-all: disown each new import process to ignore SIGHUP
Added jobspecs to extract jobspecs (%#) from (possibly filtered) `jobs` output
README.TXT: Changed `make import &` to `. bin/import-all`
main Makefile: import: Before running imports, print message that `. bin/import-all` can be used to import all inputs at once
Added import-all to import all inputs at once
mappings/DwC2-VegBIEN.specimens.csv: Mapped establishmentMeans, which contains growthform, iscultivated, isnative, etc. combined
inputs/SALVIAS-CSV/maps/VegX.organisms.csv: habit: Updated mapping to match equivalent SALVIAS mapping
xml_func.py: _map: Instead of _closed special entry, make all maps closed by default and open them if special entry "*=*" is present. Support using a _map to filter values by interpreting special entry "*=" as removing all values not explicitly specified, and by interpreting special value "*" as keeping input value the same.
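A sketch of the _map semantics described above, assuming the map entries have already been parsed into a dict of input value -> output value. The apply_map() name, the use of ValueError for a miss in a closed map, and the treatment of an empty output as None are assumptions for illustration, not the xml_func.py implementation.

```python
def apply_map(mapping, value):
    """Map value through mapping, honoring the special "*" entries.

    mapping: dict of input value -> output value, where
      "*" as a key   is the fallback for values not listed explicitly
      "*" as a value means "keep the input value unchanged"
      ""  as a value means "remove the value" (returns None)
    """
    if value in mapping:
        result = mapping[value]
    elif "*" in mapping:
        result = mapping["*"]
    else:
        # Closed map (the default): unlisted values are rejected.
        raise ValueError("value %r not in closed map" % value)
    if result == "*":
        return value
    if result == "":
        return None
    return result


closed = {"T": "tree", "S": "shrub"}
opened = {"T": "tree", "*": "*"}        # contains the entry "*=*"
filtering = {"tree": "*", "shrub": "*", "*": ""}  # contains the entry "*="

print(apply_map(closed, "T"))
print(apply_map(opened, "liana"))
print(apply_map(filtering, "unknown"))
```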
xml_func.py: _date: On error "month must be in 1..12", try swapping month and day
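A minimal sketch of the month/day swap using the standard datetime module. The parse_date() name is hypothetical, and this version checks the month range directly instead of matching the error message text, which differs from the approach named in the entry above.

```python
import datetime


def parse_date(year, month, day):
    """Build a date, retrying with month and day swapped if month > 12."""
    try:
        return datetime.date(year, month, day)
    except ValueError:
        # Only swap when the month is out of range and the day could
        # plausibly be a month; otherwise re-raise the original error.
        if month > 12 and 1 <= day <= 12:
            return datetime.date(year, day, month)
        raise


print(parse_date(2011, 25, 12))
```

If the swapped date is also invalid, the ValueError from the second attempt propagates to the caller.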
row: Support getting multiple rows. Document that it does not handle embedded newlines.
mappings/Makefile: Removed no longer needed DwC-VegBIEN.specimens.no_empty.csv
input.Makefile: Removed no longer needed $(join) command
input.Makefile: Removed no longer needed src join maps
input.Makefile: Generate VegBIEN maps from full via maps in order to include all input columns if a src map was provided. This causes the VegBIEN join process to produce all the "No join mapping" errors for that datasource, not just those for fields in the (non-full) via map. maps/src.join.*.csv should no longer be needed for producing "No join mapping" errors.
mappings/Makefile: Generate DwC-VegBIEN.specimens.csv from new intermediate DwC.ci-VegBIEN.specimens.csv using $(removeEmpty) so that "No join mapping" errors will be reported when maps are joined to it. Deprecate DwC-VegBIEN.specimens.no_empty.csv because it's now identical to DwC-VegBIEN.specimens.csv.
Added inputs/NY/maps/src.specimens.csv
Added reverse_join to inner-join two map spreadsheets in the reverse of the order in which they are specified
input.Makefile: Intersect the generated VegBIEN and full via maps with the src map, if it exists. This reduces the size of the autogen maps significantly by including only the entries used by the datasource.
intersect: Compare columns based on specified compare_col_nums, just like subtract
input.Makefile: Use var $(selfMap) instead of spelling out $(bin)/cols 0 0
mappings/DwC2-VegBIEN.specimens.csv: Mapped continent
inputs/SpeciesLink/maps/DwC.specimens.csv: Mapped remaining fields
inputs/SpeciesLink/maps/src.specimens.csv: Fixed bug where prefixes had not been removed from fields, which prevented join mappings from being found for any of the fields
main Makefile: Added missing_joins to determine which input fields are missing join mappings
xml_func.py: SyntaxException: Inherit from exc.ExceptionWithCause so the traceback will be populated with the cause's traceback instead of the SyntaxException wrapper's traceback
Added inputs/UNCC/test with accepted test outputs
Added inputs/UNCC/maps
xml_func.py: _date: month: Convert month names to numbers before casting everything to int
xml_func.py: _date: Refactored to convert items to dict right away, and use iteritems() for later type conversion. This will enable month names to be converted before casting everything to int.
mappings/Makefile: Sort mappings/DwC.self.specimens.csv so that entries can more easily be found when using it as a DwC terms reference
Added inputs/UNCC
Added inputs/U/test with accepted test outputs
inputs/U/maps/DwC.specimens.csv: Mapped most of the remaining fields
input.Makefile: Clean up via maps when they change by subtracting the via format's self map from the via map (the comments column is ignored in determining which entries are redundant, and empty entries with a matching input column are also removed)
subtract: Fixed bug where entries were removed even if maps were not combinable and ignore was off
union: Fixed bug where combinable was not saved for use in deciding whether to add entries in map 1 that weren't already defined
inputs/U/maps: Set svn props
subtract: Also remove nonexplicit empty mappings whose input col is in map 1
maps.py: Added is_nonexplicit_empty_mapping()
subtract: Use new maps.combinable() to compare column headers, which allows more flexibility in combining maps
union: Use new maps.combinable()
maps.py: Added col_label() and combinable()
union: Use new strings.overlaps()
strings.py: Added overlaps()
vegbien.sql: Fixed syntax error in taxonclass enum: missing comma at end of element
inputs/*/maps/DwC.specimens.csv: Ran through `cols *` to standardize CSV format to that generated by Python
cols: If a column number of "*" is given, get all columns
bin/subtract: If no compare columns given, compare on all columns instead of column 0
util.py: list_subset(): Support special idxs value None, which returns entire list
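A sketch of what list_subset() with a None idxs value might look like. Only the function name and the None behavior come from the entry above; the rest of the body is an assumption.

```python
def list_subset(list_, idxs):
    """Return the items of list_ at the given indexes, in that order.

    idxs=None is a special value meaning "the entire list".
    """
    if idxs is None:
        return list_
    return [list_[idx] for idx in idxs]


row = ["id", "name", "comments"]
print(list_subset(row, [0, 2]))
print(list_subset(row, None))
```

A caller such as cols could then translate a "*" argument into None to select all columns.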
cat_csv: Added support for using - to cat stdin
Added inputs/U/maps
Added inputs/U
Put inputs/REMIB/src/remib_raw.0.header.specimens.txt under version control
Added inputs/REMIB/test with accepted test outputs
Added inputs/REMIB/maps
inputs/NCU-NCSC/maps/DwC.specimens.csv: Removed State->StateProvince mapping because that is now in mappings/DwC1-DwC2.specimens.csv
mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
Added inputs/REMIB
bin/map: Deal with fields that may be in the dataset under more than one prefix by getting all fields and coalesce()ing them (e.g. SpeciesLink has dwcore* and darwin1* columns for the same DwC field)
util.py: Added coalesce()
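A minimal sketch of a coalesce() helper in the SQL COALESCE sense, returning the first value that is not None. The signature and the example column names are assumptions, not the util.py code.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if there is none."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical row in which the same DwC field appears under two prefixes.
row = {"dwcore.Country": None, "darwin1.Country": "Brazil"}
print(coalesce(row["dwcore.Country"], row["darwin1.Country"]))
```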
xpath_func.py: process(): Fixed bug where XPath elem's other_branches were not also processed