inputs/*/maps/DwC.specimens.csv: Ran through `cols *` to standardize CSV format to that generated by Python
cols: If column number of "*" given, get all columns
bin/subtract: If no compare columns given, compare on all columns instead of column 0
util.py: list_subset(): Support special idxs value None, which returns entire list
cat_csv: Added support for using - to cat stdin
Added inputs/U/maps
Added inputs/U
Put inputs/REMIB/src/remib_raw.0.header.specimens.txt under version control
Added inputs/REMIB/test with accepted test outputs
Added inputs/REMIB/maps
inputs/NCU-NCSC/maps/DwC.specimens.csv: Removed State->StateProvince mapping because that is now in mappings/DwC1-DwC2.specimens.csv
mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
Added inputs/REMIB
bin/map: Deal with fields that may be in the dataset under more than one prefix by getting all fields and coalesce()ing them (e.g. SpeciesLink has dwcore* and darwin1* columns for the same DwC field)
util.py: Added coalesce()
xpath_func.py: process(): Fixed bug where XPath elem's other_branches were not also processed
row: Don't prepend header row because this feature prevents the program from being used in a pipeline. Sheets may be constructed in a pipeline if multiple segments need to be joined, e.g. with cat_csv.
Added row to get a row of a spreadsheet, preceded by the header row
bin programs: Fixed bug in Usage message where program name was not printed because unset variable $self was used instead of $0
xml_func.py: _nullIf: types_by_name: Use strings.ustr instead of str to support Unicode values
xml_func.py: _nullIf: If value is not convertible, return it unchanged, because it can't equal null. Refactored to store types by name in a dict instead of using if statements.
units.py: convert(): raise MissingUnitsException if quantity doesn't have units. MissingUnitsException: Take Quantity input instead of str.
inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.
xml_func.py: _map: empty map entry means None
xml_func.py: _avg: Support empty inputs by returning None. Moved _range after _rangeStart/_rangeEnd since it's less frequently used.
units.py: Restructured to use a Quantity object for the units-tagged value and conversion functions quantity2str() and str2quantity() to convert between that and a raw string. Added convert() with basic support for removing units and passing through matching units. xml_func.py: _units: Added "to" attr. VegBIEN mappings: Remove units using new _units "to" attr instead of temporary workaround in _units.
xml_func.py: _units: default units attr renamed to default to clarify that it's not the units you're converting to
xml_func.py: Added documentation labels to each section of XML functions
Moved units-related functions from format.py to new units.py
lib/*.py: Removed svn:executable property to turn execute bit off
vegbien.sql: growthform (and taxonclass) enum: Added options suggested by Michael Lee. Removed "woody". establishmentmeans_dwc (and taxonclass) enum: Reordered to match order of taxonoccurrence boolean fields, and to place each option next to its opposite. taxonclass enum: Moved "woody" to bottom because it's no longer part of growthform.
VegBIEN mappings: distance fields: Remove units
xml_func.py: _units: Allow value to be NULL
xml_func.py: _units: Use new format.cleanup_units() to do units parsing
format.py: Added clean_numeric(), str2int(), str2float(). Added units-related functions. Added documentation labels to each section.
Added filter_errors to filter `map` error messages
Renamed bin/errors_filter_* to filter_errors_* to sound more natural and to have a different prefix than error_stats so that both can easily be tab-completed at the command line
README.TXT: Testing: Added instructions for testing just mapping process, just map spreadsheet generation, and everything
root Makefile: Added test-all for most complete coverage. Removed extraneous ";" at the end of the prerequisites line of rules with a recipe.
mappings/Makefile: Use new ci_map to make DwC.cs-VegBIEN.specimens.csv case-insensitive
Added ci_map to make a map spreadsheet case-insensitive.
mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.
inputs/SpeciesLink: Switched to using flat files instead of DB
inputs/MO: Switched to using flat files instead of DB
input.Makefile: Mapping: Support multiple segments of a source table flat file. Use with_cat_csv if flat file segment(s) are available; otherwise use the input file in $+ or the input database, if any. Don't look for an explicit CSV header file because it can now be handled as the first segment if appropriately named.
Added with_cat_csv
with_cat: Added support for custom cat command in env var
cat_csv: Abort if output stream closed instead of exiting with an IOError
cat_csv: Ignore any duplicated headers instead of requiring each CSV to have a header identical to the first. Rewrote to pass the CSVs through as lines rather than parsing each row. Because the CSVs are not parsed, check that all CSVs have the same dialect.
csvs.py: Added csv modifications to compare Dialect instances
util.py: Added classes_eq()
csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().
util.py: Added NamedTuple
csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
Renamed inputs/NYBG to inputs/NY to match herbarium code
Renamed inputs/UNC-NCSC to inputs/NCU-NCSC to match herbarium code
Renamed inputs/UArizona to inputs/ARIZ to match herbarium code
Regenerated inputs/MO/maps/src.join.specimens.csv
Renamed inputs/MOBOT to inputs/MO to match herbarium code
Regenerated vegbien.ERD exports
vegbien.sql: taxonoccurrence: Added cultivatedbasis
vegbien.sql: Moved all accessioncode fields to the bottom of their tables. vegbien.ERD.mwb: Adjusted lines to remove overlaps.
vegbien.sql: taxonoccurrence: Added iscultivated, isnative. Moved accessioncode to bottom.
vegbien.sql: Changed taxonoccurrence.growthform type to more specific growthform
vegbien.sql: Added growthform and establishmentmeans_dwc enums using values from taxonclass. Documented that taxonclass is growthform + establishmentmeans_dwc + some other values.
VegBIEN: Moved aggregateoccurrence.growthform to taxonoccurrence
Added inputs/UNC-NCSC/maps/src.join.specimens.csv
VegBIEN: Merged aggregateoccurrence.verbatimcollectorname and specimenreplicate.verbatimcollectorname into taxonoccurrence
xml_func.py: parse_range(): Handle negative numbers by treating them as not a range
Added inputs/UNC-NCSC/test with initial accepted test outputs
Added inputs/UNC-NCSC/maps
xml_func.py: _replace: Fixed bug where value entry was not unpacked
Added inputs/UNC-NCSC
Added inputs/MOBOT/test with initial accepted test outputs
Added inputs/MOBOT/maps
Added inputs/MOBOT
VegX mappings: Updated plot place mappings to VegX 1.5.3 method of place type-tagged place names. This removes the userdef fields in plot.
VegX mappings: Changed userdef xPosition, yPosition to /relativePlotPosition/relativeX, /relativePlotPosition/relativeY
Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv
bin/map: map_table(): wrap_row(): Use util.list_as_length() to handle CSV rows of different lengths
util.py: Added list_as_length(). Documented that list_set_length() takes a list, not a tuple. Documented that ListDict must have len(list_) == len(keys).
util.py: Added list_set_length(). Changed list_set() to use list_set_length().
mappings/DwC2-VegBIEN.specimens.csv: Added empty *_id/taxonoccurrence attr to primary keys to ensure that a taxonoccurrence is always created for the specimenreplicate
xml_func.py: _label: Use ustr instead of str when checking types
csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default
Merged inputs/NYBG-CSV into NYBG
Merged inputs/UArizona-CSV into UArizona
Added inputs/SpeciesLink/test
Added inputs/SpeciesLink/maps
xml_func.py: range-related funcs: Made inputs optional in case they get set to NULL by _nullIf
bin/map: Added support for getting columns with an optional prefix list for DB/CSV inputs
bin/map: Factored out code common to DB and CSV inputs into map_table()
bin/map: Parse any prefixes in map input column name. They will later be used to check for versions of columns with a prefix added when processing CSV/DB inputs.
strings.py: Added split(), remove_prefix(), remove_suffix(), and remove_prefixes(). Added section comments.
mappings/DwC2-VegBIEN.specimens.csv: minimumElevationInMeters: Handle embedded ranges using _rangeStart and _rangeEnd
xml_func.py: Added _rangeStart and _rangeEnd
xpath.py: parse(): Split paths: Raise a SyntaxException if can't attach a split path because there is no parent element to attach to
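
Illustration only: several entries above name small util.py helpers (coalesce(), list_set_length(), list_as_length(), list_subset()). The sketch below shows one way such helpers could behave, inferred from the log messages alone. The signatures, the `fill` parameter, and the choice to treat only None as "missing" are assumptions, not the repository's actual code.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if there is none.

    Assumption: only None counts as missing; empty strings are kept.
    """
    for value in values:
        if value is not None:
            return value
    return None


def list_set_length(list_, length, fill=None):
    """Pad or truncate list_ in place so that len(list_) == length.

    Takes a list, not a tuple, because it mutates its argument.
    The fill parameter is hypothetical.
    """
    if len(list_) < length:
        list_.extend([fill] * (length - len(list_)))
    else:
        del list_[length:]


def list_as_length(list_, length, fill=None):
    """Return a new list copied from any sequence, padded or truncated."""
    new_list = list(list_)
    list_set_length(new_list, length, fill)
    return new_list


def list_subset(list_, idxs):
    """Return the items at idxs; an idxs value of None returns the entire list."""
    if idxs is None:
        return list_
    return [list_[i] for i in idxs]
```

Under these assumptions, a CSV row shorter than its header could be padded with `list_as_length(row, len(header))`, and the same DwC field found under several column prefixes could be reduced to one value with `coalesce(*candidates)`.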