inputs/NY, ARIZ: FieldNumber: Remapped to recordNumber because term usage was inconsistent with the DwC definition. Datasources sometimes misuse this term because it looks like the collection number, but it is actually the author code for the event (VegBank's authorObsCode).
schemas/vegbank.ERD.pdf: Restored to VegBank ERD, which had gotten overwritten when the vegbien.ERD exports were regenerated
mappings/DwC1-DwC2.specimens.csv: Removed Source column and source-related comments because this information is now maintained in mappings/Veg+.terms.csv
mappings/DwC2-VegBIEN.specimens.csv: Removed Source column because this information is now maintained in mappings/Veg+.terms.csv
mappings/VegCSV-VegBIEN.specimens.csv: Removed Source column and source-related comments because this information is now maintained in mappings/Veg+.terms.csv
Added mappings/Veg+.terms.csv, which lists all available terms with their sources. This removes the need to store the sources in the mappings, where they are out of place and difficult to maintain during refactoring.
mappings/VegX-VegCSV.stems.csv: Removed Comments and Source columns because this information is now maintained in mappings/VegCSV-VegBIEN.specimens.csv. This will simplify later VegCSV refactoring, because the Comments and Source columns will not need to be changed along with the VegCSV column.
mappings/VegCSV-VegBIEN.specimens.csv: Removed Comments and Source columns because this information is now maintained in mappings/VegCSV-VegBIEN.specimens.csv. This will simplify later VegCSV refactoring, because the Comments and Source columns will not need to be changed along with the VegCSV column.
mappings/VegCSV-VegBIEN.specimens.csv: Changed plotID to locationID and parentPlotID to parentLocationID to use DwC-related terms
mappings/DwC2-VegBIEN.specimens.csv: collectionID: Fixed mapping to point to collectioncode_dwc instead of collectionnumber, as this is an ID of the collection rather than within it
inputs/import.stats.xls: Updated with stats from latest import
schemas: Renamed vegbien.ERD.pdf to vegbien.ERD.1_pg.pdf since it's not the primary PDF that should be used, due to its slow load time
Regenerated vegbien.ERD exports
schemas/vegbien.sql: specimenreplicate: specimenreplicate_plantobservation_1_to_1: Only apply when sourceaccessioncode and catalognumber_dwc are NULL, in order to support multiple specimenreplicates for one plant in plots data. specimenreplicate_unique_catalognumber: Added plantobservation_id, so that catalognumber_dwc (a sort of authorSpecimenCode for plots data) only needs to be unique within a plant. Eventually, we will want to migrate the mappings so that collectionnumber is used for this purpose instead.
schemas/vegbien.sql: specimenreplicate: Made plantobservation_id optional again, since indirect vouchers do create specimenreplicates without a parent plantobservation. schemas/vegbien.ERD.mwb: Fixed lines.
schemas/vegbien.sql: specimenreplicate: Made plantobservation_id required, since that is now the parent table fkey
schemas/vegbien.ERD.mwb: Fixed lines
schemas/vegbien.ERD.mwb: Adjusted lines. Adjusted position of locationdetermination to put location directly next to locationevent. Expanded location to fill newly-available space.
schemas/vegbien.sql: locationevent: Renamed authorlocationcode to authoreventcode to be consistent with the table name. Note that for our current datasources, the plot = the plot event, so the authoreventcode and authorlocationcode/authorPlotCode will be the same.
mappings/VegCSV-VegBIEN.specimens.csv: Changed VegCSV term fieldNumber (from DwC) to recordNumber to be consistent with the TDWG meaning of fieldNumber, which is the author code for the event (what VegBIEN calls the authorlocationcode and VegBank calls the authorObsCode), not for the organism
mappings/VegCSV-VegBIEN.specimens.csv: Comments: Removed no longer applicable comments about XPath syntax added to affect sort order
mappings/VegCSV-VegBIEN.specimens.csv: height: Removed mapping to plantobservation.overallheight, since the height is a stem field rather than a plant field. Note that a height in the organisms table will be mapped to the height in a single stemobservation for that plant, with NULL sourceaccessioncode and authorstemcode. Note also that this change is possible because no mapped datasource yet provides a valid overallheight for a plant with multiple stems, or one that differs from its single stem's height. (Although SALVIAS sometimes provides both a stem height and an organism height, either the two heights are the same or the organism height is invalid. See <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Some-organisms-have-one-stem-but-different-heights-in-the-organisms-and-stems-tables>.)
mappings/DwC2-VegBIEN.specimens.csv: establishmentMeans: Removed obsolete mapping to growthform, since growthforms and cultivated/native information are no longer merged into one field in VegBIEN (which they were when this mapping was created)
mappings/VegCSV-VegBIEN.specimens.csv: decimalLatitude/decimalLongitude: Added _nullIf suffix for mergeability with VegCSV-VegBIEN.specimens.csv
mappings/VegCSV-VegBIEN.specimens.csv: coordinateUncertaintyInMeters: Added _noCV suffix for mergeability with VegCSV-VegBIEN.specimens.csv
mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Added _if wrapper for mergeability with VegCSV-VegBIEN.specimens.csv
mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber direct voucher _if statement: Changed @name to "if indirect voucher", so that it's logically consistent with the else branch following it. It was previously "if direct voucher" because the _if statement only contained a case for direct vouchers, and the else branch was being used in place of a _not() function.
mappings/roots: plots roots: Default to using VegCSV instead of VegX for new plots datasources
mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber _if statements: Changed @names to more descriptive comments. This also prevents the @name from looking confusingly like the condition of the _if statement, which is actually supplied through the cond param and is usually located in a separate mapping.
mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Split _if apart into separate _ifs for the indirect and direct voucher cases. Moved direct voucher _if inwards so it is just wrapping catalognumber_dwc itself. This will enable this mapping to be used for specimens data, which is always considered a direct voucher and will always have this _if return true. Also moved indirect voucher _if inwards in the same way, so that a future SQL function implementation of _if only needs to concern itself with returning one value or another, not with handling entire XML subtrees. Note that if the indirect voucher _if returns false, NOT NULL and CHECK constraint violations will cause the intervening voucher and specimenreplicate elements to be deleted, thus having the same effect. Use new @name syntax for distinguishing _if statements.
mappings: Removed no longer used for_review/VegBIEN-DwC2.specimens.csv
xml_func.py: _if(): Changed documentation about name param for distinguishing separate _if statements to use @name attribute instead, so that the XML/SQL function mechanism doesn't have to deal with code that's solely for XPath merging
schemas/filter_ERD.csv: Removed no longer applicable specimenreplicate inheritance filters
inputs/import.stats.xls: Updated with stats from latest import. Note that the import now includes additional date parsing on all date fields, which adds 1/2-1 hour to the import time. Eventually, we will want to translate _date() to PL/pgSQL and only use extra date processing if PostgreSQL's cast to timestamp doesn't work, which should greatly reduce this time.
schemas/vegbien.sql: Removed inheritance link between specimenreplicate and taxonoccurrence, which is not needed now that specimenreplicate is mapped via plantobservation. mappings/DwC2-VegBIEN.specimens.csv: As part of this change, moved mappings to specimenreplicate fields inherited from taxonoccurrence to go directly to taxonoccurrence.
schemas/vegbien.ERD.mwb: Synced with schema
mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Default to mapping via plantobservation rather than via voucher when no voucherType is specified, in order to be consistent with the specimens data mapping for catalogNumber
Regenerated mappings/for_review/VegX-VegCSV.stems.csv. Note that running `make mappings/` did not change mappings/VegX-VegCSV.stems.csv, because all changes were deletions of lines.
mappings/VegX-VegCSV.stems.csv: Removed no longer used user-defined terms (simpleUserdefined). Note that CTFS does use user-defined terms, but these are all defined in its own map spreadsheet.
mappings: Removed no longer needed VegX-VegBIEN mappings
mappings/Makefile: Made VegCSV-VegBIEN.specimens.csv a non-derived map, since the VegX-VegCSV mapping is no longer used. This causes automatic creation of a for_review file.
plots inputs: Removed maps/.VegX.*.csv.last_cleanup
plots inputs: Remapped all VegX via maps to VegCSV. See steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegX-%3EVegCSV>.
join: Added map_1_core_only option that uses only columns 0 and 1 of map_1. This is useful for one-time refactoring joins where the Source column, mappings comments, etc. shouldn't be part of the datasource's via map (although they will be part of the autogenerated VegBIEN map)
join: Use opts.env_usage() for usage message
mappings: Made VegCSV-VegBIEN.{plots,organisms,stems}.csv symlinks to VegCSV-VegBIEN.specimens.csv
mappings/Makefile: VegCSV-VegBIEN.specimens.csv: Commented out combining with DwC2-VegBIEN mappings, because merging DwC and VegX/VegCSV into one map is a lower priority than replacing all datasource VegX mappings with VegCSV (which does not require the merging but does require XPaths that don't collide, which is not yet the case)
lib/xml_func.py: _if(): Made then param optional, so that the user can just map to the else branch as a shortcut for logically inverting the condition. (Note that a _not() XML function does not exist yet, so this is also a workaround.)
VegBIEN mappings: Wrapped dates in _date() and _dateRangeStart()/_dateRangeEnd(), to assist in importing date and date range values that PostgreSQL cannot parse. This will increase the import time, but hopefully also decrease the number of invalid values in the errors tables. (These functions can later be optimized to reduce the impact on import time.)
sql_io.py: put_table(): is_literals: is_function: Fixed bug where function call needed to be recreated in each iteration of the main loop, because the arguments to the function, which are based on mapping, may change as the result of error handling replacing invalid values with NULL
sql_io.py: put_table(): is_literals: Fixed bug where sql.select() that calls the function needed to be run recoverably, to auto-rollback errors. Made sql.select() cacheable because SQL functions are immutable, so it should be idempotent.
mappings/DwC2-VegBIEN.specimens.csv: Remapped taxonRemarks to taxondetermination.notes because http://rs.tdwg.org/dwc/terms/#taxonRemarks indicates that these notes are "about the taxon", not the specimen/plant in general
mappings/DwC2-VegBIEN.specimens.csv: Remapped eventDate to new aggregateoccurrence.collectiondate, which is a more accurate place than locationevent.obsstartdate/obsenddate because the date refers to a specific specimen. This also makes eventDate compatible with plots data.
mappings/DwC2-VegBIEN.specimens.csv: Moved sex user-defined mapping to plantobservation because it's a property of the plant rather than the specimen, and so that it can also apply to plots data
mappings: Remapped specimenreplicate.description to new aggregateoccurrence.notes because the notes don't necessarily refer specifically to the specimen, especially for plots data
schemas/vegbien.sql: aggregateoccurrence: Added notes, to serve the purpose that specimenreplicate.description previously did. specimenreplicate.description is not appropriate for plots data, and often not appropriate even for specimens data, which uses fieldNotes as a general notes field rather than a description of the specimen.
schemas/vegbien.sql: aggregateoccurrence: Reordered linecover so it's near cover instead of at the end
schemas/vegbien.sql: Moved collectiondate from specimenreplicate to aggregateoccurrence because it's actually the SALVIAS census_date, which is the date the plant was sampled, rather than the DwC eventDate, which is the date the specimen was collected
mappings/DwC2-VegBIEN.specimens.csv: Mapped specimenreplicate via plantobservation for consistency with plots data. (This change is required for VegCSV table merging to work properly.) This is also a more accurate way of representing the data, because a specimen in fact comes from a plant, and it's natural to place the plant-related data (measurements, etc.) in the plantobservation table.
mappings/VegX-VegCSV.stems.csv: Remapped stem notes to new stemNotes term, and mapped new organism notes VegX XPath to now-available DwC fieldNotes
inputs/SALVIAS/maps/VegX.organisms.csv: Map organism notes to different place than stem notes, because these are separate fields
mappings/Makefile: VegCSV-VegBIEN.specimens.csv: Temporarily sort by input column rather than output column, to assist in finding terms that map to different places in the DwC- and VegX-VegBIEN mappings
mappings/Makefile: VegCSV-VegBIEN.specimens.csv: Use new all option to union, in order to manually review inputs which appear in both maps but map to different places
union: Added full flag to turn off merging mappings that are in both maps, in order to review inputs which appear in both maps but map to different places
mappings/Makefile: Merged .VegX-VegCSV.stems.csv.last_cleanup into .%.last_cleanup, since VegX-VegCSV.stems.csv now uses the same cleanup operations as the other non-derived maps. Note that this automatically creates a file in for_review for VegX-VegCSV.stems.csv, which is currently identical to it.
mappings/Makefile: .%.last_cleanup: Removed simplify_xpath because non-derived maps will now have VegX XPaths in their Source column URLs, which should not be modified
mappings/Makefile: VegX-VegCSV.stems.csv: Removed autogeneration command because once file has been generated, regeneration is no longer needed
mappings/Makefile: Fixed bug where VegX-VegCSV.stems.csv needed to be removed from $(vegcsvMaps) so it wouldn't be deleted on `make clean`
mappings/VegX-VegCSV.stems.csv: Source: Put URLs in the order their terms appear in the VegCSV term name
mappings/VegX-VegCSV.stems.csv: Comments: Changed "Table name" to "Table" to be concise
mappings/VegX-VegCSV.stems.csv: Mapped VegX community fields
mappings/VegX-VegCSV.stems.csv: Mapped VegX cover-related fields
mappings/VegX-VegCSV.stems.csv: Changed authorPlantCode to the associated DwC term fieldNumber
mappings/VegX-VegCSV.stems.csv: Changed locationNarrative to the associated DwC term locality
mappings/VegX-VegCSV.stems.csv: Changed collectedDate to the associated DwC term eventDate
mappings/VegX-VegCSV.stems.csv: Added plot prefix to eventStartDate/eventEndDate to distinguish them from the DwC eventDate, which is the date the specimen was collected
mappings/VegX-VegCSV.stems.csv: Order within table: Updated order #s for salvias_plots terms that got changed to SALVIAS data dictionary terms
mappings/VegX-VegCSV.stems.csv: Changed collector name parts to the associated DwC term recordedBy
mappings/VegX-VegCSV.stems.csv: Mapped SALVIAS voucher type
mappings/VegX-VegCSV.stems.csv: Mapped collector name parts
mappings/VegX-VegCSV.stems.csv: Table names ("." prefixes) merged into name where possible, for consistency. computer taxonomic elements have not been merged because the field part should exactly match the corresponding DwC term.
mappings/VegX-VegCSV.stems.csv: Order within table: If Source has multiple URLs, ensure each source has its own order
mappings/VegX-VegCSV.stems.csv: Order within table: Separate orders of multiple elements with "," instead of ";", for consistency with the Source column
mappings/VegX-VegCSV.stems.csv: Changed authorPlotCode terms to a variation of VegX's plotName, for standardization with VegX
mappings/VegX-VegCSV.stems.csv: Changed uniqueIDs with table names to the table name + "ID", for standardization
mappings/VegX-VegCSV.stems.csv: Changed terms with table names to DwC terms where possible
mappings/VegX-VegCSV.stems.csv: Removed comments about alternate names, as these will be included in a separate "VegCSV-alt" mapping to "VegCSV-core" terms
mappings/VegX-VegCSV.stems.csv: Clarified comments about the inclusion of the table name
mappings/VegX-VegCSV.stems.csv: Mapped plotObservation user-defined terms
mappings/VegX-VegCSV.stems.csv: Mapped VegX plotObservation fields
mappings/VegX-VegCSV.stems.csv: Corrected sources of DwC terms to point to the actual DwC term, where needed. eventDate parts: Added source for VegBank field used as named suffix.
mappings/VegX-VegCSV.stems.csv: Corrected sources of VegX names to point to the actual VegX field name, where needed
mappings/VegX-VegCSV.stems.csv: Mapped SALVIAS stem tags