join: Fixed bug in "No join mapping" error generation where rows with no existing comments column would cause an IndexError
util.py: Added list_set() and list_setdefault()
inputs/UArizona/maps/DwC.specimens.csv: Merged FieldNotes and Remarks
inputs/UArizona/maps/DwC.specimens.csv: Finished mappings
inputs/UArizona/maps/DwC.specimens.csv: Removed fields already present in DwC mappings
inputs/NYBG-CSV/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping
inputs/NYBG/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping
mappings/DwC1-DwC2.specimens.csv: Removed fields already present in DwC2.ci-VegBIEN.specimens.csv
Makefiles: Moved remake into main Makefile. Fixed remake to run `make all` in a new make process so that the cache of existing files is reset. Have main remake run clean and then all instead of forwarding remake to subdirs, so that everything is cleaned before anything is remade.
input.Makefile: maps: maps/$(via).%.full.csv: Fixed bug where $(selfMap) would be ignored if it had not yet been made
mappings/Makefile: Reorganized into DwC and VegX sections
Added autogenerated mappings/DwC2.ci-VegBIEN.specimens.csv. Use it to include DwC2 fields with first letter uppercased in the full DwC mapping, so that datasources that use DwC2 terms with a different case can still use the DwC2 mapping.
inputs/UArizona/maps/DwC.specimens.csv: Mapped CollectedDate to eventDate/_alt/2 even though it's not used, because other datasources might copy these mappings and want it already filled in
Added ucase_first to uppercase the first character of columns in a spreadsheet
Added inputs/UArizona/maps/DwC.specimens.csv autogen maps
inputs/UArizona/maps/DwC.specimens.csv: Mapped more fields
mappings/DwC1-DwC2.specimens.csv: Removed date -> date/_alt/2 mappings because they prevent the original DwC2 date field from being mapped to without an extra /_alt/2 appended
xml_func.py: Use new dates.strtotime(). When component date parts specified, year defaults to dates.epoch.year.
dates.py: Added strtotime() to wrap dateutil.parser.parse() with its default argument defaulting to epoch, so that e.g. a month with the day missing defaults to day 1 instead of the current day of the month
mappings/DwC1-DwC2.specimens.csv: Map eventDate and dateIdentified using /_alt/2 and year/month/day using /_alt/1 so that inputs with both a date and date parts will select between the two
input.Makefile: Added comment that self map must be made first if it's needed for maps/$(via).%.full.csv
Makefiles: Use .SECONDARY with no prerequisites instead of setting a .PRECIOUS for each intermediate, to simplify turning off automatic deletion of intermediate files
inputs/UArizona: Added initial maps/DwC.specimens.csv
DwC mappings: Map datasource name via institutionID to avoid conflicting with existing institutionCode fields that many DwC data sources have
input.Makefile: Don't profile by default because it appears to slow things down significantly on long imports
Added inputs/UArizona/maps
Makefile: python-Linux: Added python-profiler
specimens verification: Added # binomials test
vegbien.sql: specimenreplicate: Removed specimenreplicate_unique_collectionnumber index because the collectionnumber (NYBG FieldNumber) is not always unique within a collector, even though it should be. Changed specimenreplicate_unique_catalognumber to only operate on rows with no sourceaccessioncode (of which there are 8 in NYBG).
mappings/verify.specimens.sql: # species test: Fixed to join separately on taxondeterminations for genus and species. # genera test: Removed no longer needed join on party.
vegbien.sql: specimenreplicate: Added fki index on taxonoccurrence_id
vegbien.sql: plantname: Added index on rank to speed up specimens verifications, where the query planner insists on joining from plantname to specimenreplicate instead of the other way around (which takes much longer without the index)
mappings/verify.*: Use nested SELECT instead of JOIN on party to get datasource_id, so that party will not be joined on after other joins have already occurred (which slows things down)
vegbien.sql: party: Changed party_unique_name to ignore NULL values and the organizationname (a first(+middle)+last name is considered unique)
vegbien.sql: party: Added party_unique_organizationname constraint
Specimens verification: Added # genera and # species
input.Makefile: verify: Create target dir if it doesn't exist
inputs/NYBG: Added verify/specimens.ref.sql
Added mappings/verify.specimens.sql
Added inputs/NYBG-CSV/verify/
Makefile: Print done message after verify
VegX-VegBIEN mapping: Use new lookup-only element syntax to ensure that stemtag 1 is not created if it doesn't exist when stemtag 2 tries to set its iscurrent status to false. This should fix the 136 "NullValueException: columns: tag" errors in the SALVIAS organisms import.
xpath.py: get(): Added support for lookup-only elements which are not created if they don't exist
xpath.py: parse(): Added support for lookup-only elements which are not created if they don't exist
VegX-VegBIEN mapping: Map stemtags using [] instead of :[] for attrs that are really keys
Regenerated vegbien.ERD exports
VegX-VegBIEN mapping: Handle user-defined field voucherType (SALVIAS DetType) by mapping specimenreplicates for voucherTypes other than direct via voucher
xml_func.py: Added _if and _eq. Added cast(), which throws SyntaxException if the value can't be cast, and used it in conv_items(). _merge: Check types of input using conv_items(strings.ustr, items).
util.py: Added all_not_none() and bool2str()
strings.py: Added ustr() (like built-in str() but converts to a unicode object)
PostgreSQL-MySQL.csv: Fixed bug in removal of casts of default values, which treated NOT NULL as part of the datatype
VegBIEN: soilobs: Added default value for horizon. Adjusted mappings to remove now-unnecessary horizon value.
repl: Removed automatic case-insensitivity because Python apparently only supports turning on case-insensitivity via (?i) but not off via (?-i) (as Java does)
VegBIEN: soilobs: Removed soil* prefix from fields
VegX-VegBIEN mapping: Map to new soilobs fields
SALVIAS inputs: Use new _units:[units="%"] on soil fields that are percents. Replace "<..." values with 0.
xml_func.py: Added _units
vegbien.sql: soilobs: Converted user-defined fields to first-class. Labeled appropriate fields as "fraction".
VegBIEN mappings: Changed tableRecord_ID to tablerecord_id to match PostgreSQL field name
DwC2-VegBIEN mapping: Adjusted user-defined mappings
vegbien.sql: userdefined: Made userdefinedname NOT NULL. userdefined, definedvalue: Added unique constraints.
VegX-VegBIEN mapping: Mapped userdefined fields to new first-class fields
xml_func.py: Added _map and _replace
vegbien.ERD.mwb: Fixed lines. Expanded truncated tables where there was room.
vegbien.sql: locationevent: Added temperature and precipitation
vegbien.sql: aggregateoccurrence: Added growthform
vegbien.ERD.mwb: Reversed the locations of soiltaxon and soilobs to give soilobs room to add new fields
vegbien.sql: Removed embargo table and emb_* fields because we're using a central field, location.confidentialitystatus, for embargo information and coordinate fuzzing
vegbien.sql: stemobservation: Added heightfirstbranch
vegbien.sql: stemobservation: Added diameteraccuracy. Reordered fields.
VegBIEN: stemobservation: Renamed diameter to diameterbreastheight to be more accurate
vegbien.ERD.mwb: Expanded tables where there was room
DwC mappings: Fixed user-defined field mappings according to Brad Boyle's changes
vegbien.sql: Changed specimenreplicate_unique_collectionnumber constraint to include verbatimcollectorname because the collection number is assigned by the collector
VegBIEN: Moved taxonoccurrence.verbatimcollectorname to specimenreplicate and aggregateoccurrence so that it can be used in specimenreplicate duplicate elimination
mappings/DwC1-DwC2.specimens.csv: Notes mapping: Removed extraneous /_merge/1
input.Makefile: svn_props: Removed no longer needed items from input dir svn:ignore
input.Makefile: verify: Fixed bug for inputs without a .ref where $(wildcard) wouldn't recheck the file after verify/%.out is run, so the verify output wasn't printed
input.Makefile: Moved verify files into separate subdir
bin/map: Changed root label data format convention to datasrc[data_format] so datasource names containing hyphens would not have the part after the - treated as the data format
inputs maps: Changed input root labels to match dir names since verify expects these to be the same
input.Makefile: verify: Fixed bug where datasource name was not set for non-DB inputs
input.Makefile: Removed no longer needed default verify action for dirs with no verify.ref's
input.Makefile: verify: Made verifications table-specific
input.Makefile: import: Merged import and import-all because they do the same thing
input.Makefile: verify: Started rearranging to allow different verifies for each table
Moved verify.sql to mappings since it's mapping-related
input.Makefile: Changed option nolog to log so that options aren't specified in the negative
input.Makefile: svn ignore .trace files
input.Makefile: Profile imports into a .trace file unless env var profile=""
xml_func.py: _alt: On empty input, return None instead of raising SyntaxException because empty input should be OK
xml_func.py: _alt: Fixed bug where not specifying any item would crash the program instead of raising a SyntaxException
Factored verify.sql out into schemas dir
input.Makefile: verify: Print diff in two columns if verbose=1
inputs/SALVIAS/verify.sql: When filtering by datasource name, use an AND clause in the JOIN party's ON condition instead of a separate WHERE clause, so that the datasource filtering code is all on the same line
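
Note on the util.py list_set() and list_setdefault() entry: the log names these helpers but does not show them. The following is a minimal sketch only, assuming they pad a too-short list before writing to it, which would avoid the kind of IndexError described in the join fix for rows with no existing comments column. The signatures, the `default` parameter, and the padding behavior are assumptions for illustration, not the actual util.py code.

```python
def _pad(list_, idx, default=None):
    """Grow list_ in place with `default` until index idx exists (hypothetical helper)."""
    if idx < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    while len(list_) <= idx:
        list_.append(default)


def list_set(list_, idx, value, default=None):
    """Assign list_[idx] = value, padding with `default` if the list is too short.

    Assumed behavior: unlike plain item assignment, this never raises
    IndexError for a non-negative index past the end of the list.
    """
    _pad(list_, idx, default)
    list_[idx] = value
    return list_


def list_setdefault(list_, idx, default=None):
    """Return list_[idx], first padding the list with `default` if idx is missing.

    Assumed to mirror dict.setdefault(): an existing value is returned unchanged.
    """
    _pad(list_, idx, default)
    return list_[idx]


if __name__ == "__main__":
    row = ["a", "b"]  # a row with no comments column at index 3
    list_set(row, 3, "No join mapping")
    print(row)
```

With plain assignment, `row[3] = ...` on a two-element row raises IndexError; the padded version fills the missing cells with the default first.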