input.Makefile: Removed no longer needed src join maps
input.Makefile: Generate VegBIEN maps from full via maps in order to include all input columns if a src map was provided. This causes the VegBIEN join process to produce all the "No join mapping" errors for that datasource, not just those for fields in the (non-full) via map. maps/src.join.*.csv should no longer be needed for producing "No join mapping" errors.
mappings/Makefile: Generate DwC-VegBIEN.specimens.csv from new intermediate DwC.ci-VegBIEN.specimens.csv using $(removeEmpty) so that "No join mapping" errors will be reported when maps are joined to it. Deprecate DwC-VegBIEN.specimens.no_empty.csv because it's now identical to DwC-VegBIEN.specimens.csv.
Added inputs/NY/maps/src.specimens.csv
Added reverse_join to inner-join two map spreadsheets in the reverse of the order in which they are specified
input.Makefile: Intersect the generated VegBIEN and full via maps with the src map, if it exists. This reduces the size of the autogen maps significantly by including only the entries used by the datasource.
intersect: Compare columns based on specified compare_col_nums, just like subtract
input.Makefile: Use var $(selfMap) instead of spelling out $(bin)/cols 0 0
mappings/DwC2-VegBIEN.specimens.csv: Mapped continent
inputs/SpeciesLink/maps/DwC.specimens.csv: Mapped remaining fields
inputs/SpeciesLink/maps/src.specimens.csv: Fixed bug where prefixes had not been removed from fields, which prevented join mappings from being found for any of the fields
main Makefile: Added missing_joins to determine which input fields are missing join mappings
xml_func.py: SyntaxException: Inherit from exc.ExceptionWithCause so the traceback will be populated with the cause's traceback instead of the SyntaxException wrapper's traceback
Added inputs/UNCC/test with accepted test outputs
Added inputs/UNCC/maps
xml_func.py: _date: month: Convert month names to numbers before casting everything to int
xml_func.py: _date: Refactored to convert items to dict right away, and use iteritems() for later type conversion. This will enable month names to be converted before casting everything to int.
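The two _date changes above (convert the items to a dict first, then translate month names before the int cast) can be sketched roughly as follows. This is a minimal illustration, not the code in xml_func.py: normalize_date_items() and MONTH_ABBRS are hypothetical names, and matching on the first three letters of an English month name is an assumption.

```python
# Hypothetical sketch: names and matching rule are assumptions, not xml_func.py code.
MONTH_ABBRS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
               'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

def normalize_date_items(items):
    """Convert (name, value) date items to a dict of ints.

    A month given as a name is translated to its number before
    every value is cast to int.
    """
    items = dict(items)  # convert to dict right away
    month = items.get('month')
    if month is not None and not month.strip().isdigit():
        # Match on the first three letters; raises ValueError if unknown
        items['month'] = MONTH_ABBRS.index(month.strip().lower()[:3]) + 1
    # Later type conversion: cast everything to int
    return {name: int(value) for name, value in items.items()}

parsed = normalize_date_items([('year', '2011'), ('month', 'March'), ('day', '5')])
```

Translating the month first means the final int cast never sees a non-numeric string for that item.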
mappings/Makefile: Sort mappings/DwC.self.specimens.csv so that entries can more easily be found when using it as a DwC terms reference
Added inputs/UNCC
Added inputs/U/test with accepted test outputs
inputs/U/maps/DwC.specimens.csv: Mapped most of the remaining fields
input.Makefile: Clean up via maps when they change by subtracting the via format's self map from the via map (the comments column is ignored in determining which entries are redundant, and empty entries with a matching input column are also removed)
subtract: Fixed bug where entries were removed even if maps were not combinable and ignore was off
union: Fixed bug where combinable was not saved for use in deciding whether to add entries in map 1 that weren't already defined
inputs/U/maps: Set svn props
subtract: Also remove nonexplicit empty mappings whose input col is in map 1
maps.py: Added is_nonexplicit_empty_mapping()
subtract: Use new maps.combinable() to compare column headers, which allows more flexibility in combining maps
union: Use new maps.combinable()
maps.py: Added col_label() and combinable()
union: Use new strings.overlaps()
strings.py: Added overlaps()
vegbien.sql: Fixed syntax error in taxonclass enum: missing comma at end of element
inputs/*/maps/DwC.specimens.csv: Ran through `cols *` to standardize CSV format to that generated by Python
cols: If column number of "*" given, get all columns
bin/subtract: If no compare columns given, compare on all columns instead of column 0
util.py: list_subset(): Support special idxs value None, which returns entire list
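A minimal sketch of the list_subset() behavior described above, assuming idxs is otherwise an iterable of integer positions. The body is an illustration only; what the real util.py function does with out-of-range indexes is not stated in this log, so this sketch simply lets them raise.

```python
def list_subset(list_, idxs):
    """Return the elements of list_ at the given indexes.

    The special value idxs=None selects the entire list.
    (Sketch only; the real util.py implementation may differ.)
    """
    if idxs is None:
        return list_
    return [list_[idx] for idx in idxs]

row = ['a', 'b', 'c', 'd']
all_cols = list_subset(row, None)
some_cols = list_subset(row, [0, 2])
```

This is the kind of helper that lets a caller such as cols treat "*" as "all columns" by passing None instead of an explicit index list.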
cat_csv: Added support for using - to cat stdin
Added inputs/U/maps
Added inputs/U
Put inputs/REMIB/src/remib_raw.0.header.specimens.txt under version control
Added inputs/REMIB/test with accepted test outputs
Added inputs/REMIB/maps
inputs/NCU-NCSC/maps/DwC.specimens.csv: Removed State->StateProvince mapping because that is now in mappings/DwC1-DwC2.specimens.csv
mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
Added inputs/REMIB
bin/map: Deal with fields that may be in the dataset under more than one prefix by getting all fields and coalesce()ing them (e.g. SpeciesLink has dwcore* and darwin1* columns for the same DwC field)
util.py: Added coalesce()
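A minimal sketch of a coalesce() helper in the usual SQL COALESCE sense, assuming it returns the first argument that is not None. The actual util.py signature is not shown in this log, and the dwcore/darwin1 column names below are hypothetical examples of one field appearing under two prefixes.

```python
def coalesce(*args):
    """Return the first argument that is not None, or None if there is none.

    (Sketch only; assumed to mirror SQL COALESCE semantics.)
    """
    for arg in args:
        if arg is not None:
            return arg
    return None

# Hypothetical row in which the same DwC field appears under two prefixes
row = {'dwcore:Country': None, 'darwin1:Country': 'Brazil'}
country = coalesce(row.get('dwcore:Country'), row.get('darwin1:Country'))
```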
xpath_func.py: process(): Fixed bug where XPath elem's other_branches were not also processed
row: Don't prepend header row because this feature prevents the program from being used in a pipeline. Sheets may be constructed in a pipeline if multiple segments need to be joined, e.g. with cat_csv.
Added row to get a row of a spreadsheet, preceded by the header row
bin programs: Fixed bug in Usage message where program name was not printed because unset variable $self was used instead of $0
xml_func.py: _nullIf: types_by_name: Use strings.ustr instead of str to support Unicode values
xml_func.py: _nullIf: If the value is not convertible to the target type, return it unchanged, because it can't equal the null value. Refactored to store types by name in a dict instead of using if statements.
units.py: convert(): raise MissingUnitsException if quantity doesn't have units. MissingUnitsException: Take Quantity input instead of str.
inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.
xml_func.py: _map: empty map entry means None
xml_func.py: _avg: Support empty inputs by returning None. Moved _range after _rangeStart/_rangeEnd since it's less frequently used.
units.py: Restructured to use a Quantity object for the units-tagged value and conversion functions quantity2str() and str2quantity() to convert between that and a raw string. Added convert() with basic support for removing units and passing through matching units. xml_func.py: _units: Added "to" attr. VegBIEN mappings: Remove units using new _units "to" attr instead of temporary workaround in _units.
xml_func.py: _units: default units attr renamed to default to clarify that it's not the units you're converting to
xml_func.py: Added documentation labels to each section of XML functions
Moved units-related functions from format.py to new units.py
lib/*.py: Removed svn:executable property to turn execute bit off
vegbien.sql: growthform (and taxonclass) enum: Added options suggested by Michael Lee. Removed "woody". establishmentmeans_dwc (and taxonclass) enum: Reordered to match order of taxonoccurrence boolean fields, and to place each option next to its opposite. taxonclass enum: Moved "woody" to bottom because it's no longer part of growthform.
VegBIEN mappings: distance fields: Remove units
xml_func.py: _units: Allow value to be NULL
xml_func.py: _units: Use new format.cleanup_units() to do units parsing
format.py: Added clean_numeric(), str2int(), str2float(). Added units-related functions. Added documentation labels to each section.
Added filter_errors to filter `map` error messages
Renamed bin/errors_filter_* to filter_errors_* to sound more natural and to have a different prefix than error_stats so that both can easily be tab-completed at the command line
README.TXT: Testing: Added instructions for testing just mapping process, just map spreadsheet generation, and everything
root Makefile: Added test-all for most complete coverage. Removed extraneous ";" at the end of the prerequisites line of rules with a recipe.
mappings/Makefile: Use new ci_map to make DwC.cs-VegBIEN.specimens.csv case-insensitive
Added ci_map to make a map spreadsheet case-insensitive
mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.
inputs/SpeciesLink: Switched to using flat files instead of DB
inputs/MO: Switched to using flat files instead of DB
input.Makefile: Mapping: Support multiple segments of a source table flat file. Use with_cat_csv if flat file segment(s) are available; otherwise use the input file in $+ or the input database, if any. Don't look for an explicit CSV header file because it can now be handled as the first segment if appropriately named.
Added with_cat_csv
with_cat: Added support for custom cat command in env var
cat_csv: Abort if output stream closed instead of exiting with an IOError
cat_csv: Ignore any duplicated headers instead of requiring each CSV to have a header identical to the first. Rewrote to pass the CSVs through as lines rather than parsing each row. Because the CSVs are not parsed, check that all CSVs have the same dialect.
csvs.py: Added csv modifications to compare Dialect instances
util.py: Added classes_eq()
csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().
util.py: Added NamedTuple
csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
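One way to restrict delimiter detection as described above is the delimiters argument of the standard library's csv.Sniffer.sniff(). Whether csvs.py does it this way is an assumption, and both the COMMON_DELIMITERS set and the sniff_dialect() name are hypothetical.

```python
import csv

# Assumed set of "common" delimiters; the set used in csvs.py is not given here
COMMON_DELIMITERS = ',;\t|'

def sniff_dialect(sample):
    """Guess the CSV dialect of sample, considering only common delimiters.

    Without the restriction, a letter that occurs equally often on
    every line could be mistaken for the delimiter.
    """
    return csv.Sniffer().sniff(sample, delimiters=COMMON_DELIMITERS)

sample = 'name,age,city\nann,30,oslo\nbob,41,rome\n'
dialect = sniff_dialect(sample)
```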
Renamed inputs/NYBG to inputs/NY to match herbarium code
Renamed inputs/UNC-NCSC to inputs/NCU-NCSC to match herbarium code
Renamed inputs/UArizona to inputs/ARIZ to match herbarium code
Regenerated inputs/MO/maps/src.join.specimens.csv
Renamed inputs/MOBOT to inputs/MO to match herbarium code
Regenerated vegbien.ERD exports
vegbien.sql: taxonoccurrence: Added cultivatedbasis
vegbien.sql: Moved all accessioncode fields to the bottom of their tables. vegbien.ERD.mwb: Adjusted lines to remove overlaps.
vegbien.sql: taxonoccurrence: Added iscultivated, isnative. Moved accessioncode to bottom.
vegbien.sql: Changed taxonoccurrence.growthform type to more specific growthform