xml_func.py: Added _units
vegbien.sql: soilobs: Converted user-defined fields to first-class. Labeled appropriate fields as "fraction".
VegBIEN mappings: Changed tableRecord_ID to tablerecord_id to match PostgreSQL field name
DwC2-VegBIEN mapping: Adjusted user-defined mappings
vegbien.sql: userdefined: Made userdefinedname NOT NULL. userdefined, definedvalue: Added unique constraints.
VegX-VegBIEN mapping: Mapped userdefined fields to new first-class fields
xml_func.py: Added _map and _replace
Regenerated vegbien.ERD exports
vegbien.ERD.mwb: Fixed lines. Expanded truncated tables where there was room.
vegbien.sql: locationevent: Added temperature and precipitation
vegbien.sql: aggregateoccurrence: Added growthform
vegbien.ERD.mwb: Reversed the locations of soiltaxon and soilobs to give soilobs room to add new fields
vegbien.sql: Removed embargo table and emb_* fields because we're using a central field, location.confidentialitystatus, for embargo information and coordinate fuzzing
vegbien.sql: stemobservation: Added heightfirstbranch
vegbien.sql: stemobservation: Added diameteraccuracy. Reordered fields.
VegBIEN: stemobservation: Renamed diameter to diameterbreastheight to be more accurate
vegbien.ERD.mwb: Expanded tables where there was room
DwC mappings: Fixed user-defined field mappings according to Brad Boyle's changes
vegbien.sql: Changed specimenreplicate_unique_collectionnumber constraint to include verbatimcollectorname because collection number is assigned by collector
VegBIEN: Moved taxonoccurrence.verbatimcollectorname to specimenreplicate and aggregateoccurrence so that it can be used in specimenreplicate duplicate elimination
mappings/DwC1-DwC2.specimens.csv: Notes mapping: Removed extraneous /_merge/1
input.Makefile: svn_props: Removed no longer needed items from input dir svn:ignore
input.Makefile: verify: Fixed bug for inputs without a .ref where $(wildcard) wouldn't recheck the file after verify/%.out is run, so the verify output wasn't printed
input.Makefile: Moved verify files into separate subdir
bin/map: Changed root label data format convention to datasrc[data_format] so that datasource names containing hyphens do not have the part after the hyphen treated as the data format
inputs maps: Changed input root labels to match dir names since verify expects these to be the same
input.Makefile: verify: Fixed bug where datasource name was not set for non-DB inputs
input.Makefile: Removed no longer needed default verify action for dirs with no verify.ref's
input.Makefile: verify: Made verifications table-specific
input.Makefile: import: Merged import and import-all because they do the same thing
input.Makefile: verify: Started rearranging to allow different verifies for each table
Moved verify.sql to mappings since it's mapping-related
input.Makefile: Changed option nolog to log so that options aren't specified in the negative
input.Makefile: svn ignore .trace files
input.Makefile: Profile imports into a .trace file unless env var profile=""
xml_func.py: _alt: On empty input, return None instead of raising SyntaxException because empty input should be OK
xml_func.py: _alt: Fixed bug where not specifying any item would crash the program instead of raising a SyntaxException
Factored verify.sql out into schemas dir
input.Makefile: verify: Print diff in two columns if verbose=1
inputs/SALVIAS/verify.sql: When filtering by datasource name, use an AND clause in the JOIN party's ON condition instead of a separate WHERE clause, so that the datasource filtering code is all on the same line
inputs/SALVIAS/verify.sql: Use new :datasource variable instead of literal 'SALVIAS'
input.Makefile: Provide the verify.sql script a :datasource variable set to the datasource name (in quotes)
vegbien.ERD.mwb: Re-marked aggregateoccurrence:plantobservation relationship as 1:1 in the ERD
bin/map: DB, CSV inputs: Use column indexes instead of column names to look up each field (optimization to avoid repeated dict lookups of the same key)
util.py: ListDict: str(): Print each entry on its own line, in the order the keys were provided
NYBG-DwC maps: Filter out MinimumElevation = "."
xml_dom.py: NodeTextEntryIter: Filter out empty entries (instead of producing an entry with an explicit None value, which causes problems with XML funcs that can't handle Nones)
NYBG-DwC maps: Map to input fields with XML func appended whenever possible (DwC1->DwC2 translation is done by DwC-VegBIEN.specimens.csv)
vegbien.sql: Renamed methodtaxonclass.description to methodtaxonclass.taxonclass and changed it to a closed list (enum taxonclass). method.description can still be used for freeform taxonclass inclusions/exclusions.
DwC1-DwC2.specimens.csv: Removed no longer needed /_alt/2 XML func from date mappings (you will only ever map either the full date or the year/month/day)
DwC mappings: Moved DwC1's CoordinatePrecision /_noCV/value XML func suffix to DwC2-VegBIEN.specimens.csv
mappings: Removed mappings for XML func suffixes of a path because join now creates them automatically using a heuristic
join: Added heuristic search for a match on a parent path, so that every XML func suffix of a path doesn't need its own mapping
vegbien.sql: Added method.pointsperline. Rearranged ERD after removing role fkeys.
filter_ERD.csv: Remove role fkeys
vegbien.sql: aggregateoccurrence: Added linecover
vegbien.sql: methodtaxonclass: Added description comment with list of values (which may become a closed list)
vegbien.sql: Changed lengthunits to m in all comments
vegbien.sql: method: Added subplotspacing and subplotmethod_id
vegbien.sql: method: Removed lengthunits and instead require all length- or area-related measurements throughout VegBIEN to be converted to SI base units, e.g. cm -> m, ha -> m^2. Adjusted ERD to avoid some densely packed lines.
vegbien.sql: methodtaxonclass: Added description field for taxon classes that don't fit well into a plantconcept. Made at least one of plantconcept_id or description required. Added unique constraint.
SALVIAS verifications: Use count(DISTINCT) instead of nested SELECT DISTINCT
VegBIEN verifications: Select only the records for the datasource being verified
SALVIAS verifications: Fixed to exclude subplots from locations/location events and uniqify locations based on coords
inputs/SALVIAS/verify.sql: Updated for schema changes
vegbien.ERD.mwb: Re-marked aggregateoccurrence:plantobservation relationship as 1:1 in the ERD. (I think this will need to be manually re-marked whenever either of those tables is updated.)
vegbien.sql: Removed methodgrowthform and growthform, since growthforms can be accommodated by plantconcept in a similar way as higher-order taxonomic ranks
vegbien.sql: methodgrowthform, methodtaxonclass: Removed "included" default value so it's always obvious whether the author intended the classes to be inclusions or exclusions
vegbien.sql: aggregateoccurrence: Removed unneeded fields. Added aggregateoccurrence->coverindex fkey.
vegbien.sql: Added constraint to enforce 1:1 aggregateoccurrence:plantobservation relationship
vegbien.sql: Added plantname unique constraint
bin/map: Use new util.ListDict and util.WrapIter to simplify getting rows by column name instead of index, and to enable a row to be printed with its column names in error messages
util.py: Added WrapIter to wrap an iterator and ListDict to view a list as a dict
bin/map: Use new util.list_flip()
util.py: Added list_flip()
env_password: Fixed to set the environment variable in the calling shell. Do this by cc-ing the tty only on messages before the "Enter password" prompt, because the redirect creates a subshell which causes the env var to only be set within that subshell.
inputs/NYBG-CSV/maps/DwC.specimens.csv: Removed mappings that are already present in mappings/DwC1-DwC2.specimens.csv. This map now contains only the mappings where NYBG-CSV differs from standard DwC1.
inputs/NYBG/maps/DwC.specimens.csv: Removed mappings that are already present in mappings/DwC1-DwC2.specimens.csv. This map now contains only the mappings where NYBG differs from standard DwC1.
Remove accidentally-committed temp file inputs/NYBG/DwC.specimens2.csv
mappings/Makefile: Generate DwC.self.specimens.csv from DwC-VegBIEN.specimens.csv for use in creating full via maps for inputs
input.Makefile: Generate full via maps from input via maps by appending mappings from the via format to itself when available
inputs/NYBG/maps/DwC.specimens.csv: Changed label to "NYBG-DwC" to take advantage of automatic filling in of DwC mappings not specified in the NYBG map
subtract: Support custom column numbers to compare on (instead of just input col). Added ignore option to continue even if input columns don't match.
bin/map: DB inputs: Get all rows in one query (hopefully a significant optimization). Allow maps to contain entries for columns that are not in the DB table.
sql.py: select(): Select all fields if fields == None. Replaced col(cur, idx) with col_names(cur) because an iterator is easier to use than getting by index.
bin/map: Fixed bug in previous implementation of allowing maps for CSV inputs to contain entries for columns that are not in the CSV file
bin/map: Allow maps for CSV inputs to contain entries for columns that are not in the CSV file
Use new sort_map instead of manually specifying the sort order
Added sort_map to sort a map spreadsheet in the standard order
Removed no longer needed join_passthru, because join_union_sort now serves its purpose
Don't generate mappings/for_review/DwC-VegBIEN.specimens.csv because it's a derived map with lots of duplicated mappings for the various DwC versions
mappings/Makefile: Generate DwC-VegBIEN.specimens.csv directly from DwC1-DwC2 and DwC2-VegBIEN mappings by using join_union_sort with header_num=1, rather than via intermediate DwC1-VegBIEN.specimens.csv
union: Added header_num option to select which map's header to use as the output header
Rename join_sort to join_union_sort and have it run union in ignore mode. This will automatically append the joined map when the input map is a derivative of the joined map, such as for NYBG-DwC.