xml_func.py: _name(): Fixed bug where needed to pass None values through and handle no name parts to properly support NULL propagation
xml_dom.py: value(), set_value(): Use new strings.isspace_none_str as sentinel None equivalent, to support cloning text nodes containing a sentinel None
strings.py: Added isspace_none_str to support clone-safe sentinel str values that pass isspace()
xml_dom.py: is_whitespace(): Also consider empty text nodes to be whitespace
xml_dom.py: is_whitespace(): Support text nodes whose value() is None by using .nodeValue instead
xml_dom.py: set_value(): Don't set the value of a text node to None by removing it, because this prevents the node from being reused. Instead use a sentinel string value to denote None, and map to and from it.
strings.py: Added none_str and helper class NonInternedStr to support sentinel str values
xml_dom.py: set_value(): Support setting the value of a text node to None, by removing it
Removed trailing whitespace on non-empty lines
sql_io.py: put_table(): DuplicateKeyException: is_literals: Fixed bug where sql.select() needed to select on just the join_cols, not the whole mapping
xml_func.py: process(): Removed support for no longer used structural functions
xml_func.py: Removed no longer used structural functions
mappings/for_review/DwC2-VegBIEN.specimens.fields.csv: input root: Removed DwC XML path info since DwC is now a CSV schema
mappings/DwC2-VegBIEN.specimens.csv: eventDate: Also map to obsstartdate/obsenddate, since the collectiondate is also the event date for specimens data, and for mergability with VegCSV
mappings/VegCSV-VegBIEN.specimens.csv: eventDate: Added mappings to obsstartdate/obsenddate, since users of this field (currently SALVIAS census_date) intend it as the plot event's date. Keep the mapping to collectiondate because a non-range plot event date is also the collectiondate of all organisms in that plot event.
schemas/py_functions.sql: parse_date_range(): Always return a value for end date, even if string is not a date range. This enables using _dateRangeEnd() as a filter function on anything intended as an end date.
mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: eventDate: collectiondate mapping: Removed _dateRangeStart filter because the eventDate (obsstartdate) is only valid as the date the specimen was collected if it is a single date, not a date range. (It is still valid as the obsstartdate/obsenddate if it's a range.)
mappings/Veg+.terms.csv: Added dateCollected
input via maps: Removed _date/date filter from date fields because the main mappings now have _date around all dates, so this filter is redundant
inputs/SALVIAS-CSV/maps/VegCSV.organisms.csv: census_date: Don't map directly to the year, as this field is allowed to be a full date even though our data sample contains only years. Note that _date/date will automatically detect plain years and treat them as years, and so will casts to timestamp.
inputs/SALVIAS*/maps/VegCSV.organisms.csv: census_date: Documented that this is for the subplot, not the organism, as all organisms in a subplot have the same value for it
mappings/DwC2-VegBIEN.specimens.csv: verbatimLatitude/verbatimLongitude: Fixed mappings to use _alt/2 instead of _alt/1 to avoid collisions with decimalLatitude/decimalLongitude
schemas/functions.sql: _merge(): Changed sort_orders to match the $-variable name instead of the function parameter name, so each line of the VALUES clause would use the same number for both
schemas/functions.sql: _merge(): Filter out NULL values as optimization so DISTINCT ON only has to consider non-NULL values
schemas/functions.sql: join_strs(): Return NULL if all strings were NULL or ''. This fixes unexpected behavior in _merge() where all elements are NULL but the return value is non-NULL.
schemas/functions.sql: Added join_strs_transform_preserve_empty() and use it in join_strs_transform_fold_empty()
schemas/functions.sql: Renamed join_strs_() to join_strs_transform_fold_empty() for clarity and to indicate that it's for use by the join_strs() aggregate
mappings/DwC2-VegBIEN.specimens.csv: recordNumber: Added VegCSV mappings for it
mappings/DwC2-VegBIEN.specimens.csv: occurrenceID: Added VegCSV mappings for it
mappings/DwC2-VegBIEN.specimens.csv: mappings to /location/sourceaccessioncode: Added _alt to prioritize them properly
inputs/UNCC/maps/DwC.specimens.csv: herbarium: Fixed mapping to go to institutionCode instead of collectionCode
mappings/DwC2-VegBIEN.specimens.csv: Remapped institutionCode/collectionCode/catalogNumber location mappings to location.authorlocationcode
schemas/vegbien.ERD.mwb: Reset methodtaxonclass lines so that only one needs to be repositioned after syncing with the schema
mappings/VegCSV-VegBIEN.specimens.csv: locationID: Removed mapping to locationevent.sourceaccessioncode, because locationID relates to the plot, not the plot event. (The locationevent is scoped by the location when the sourceaccessioncode and authoreventcode are not specified, so duplicate elimination will still occur correctly.)
mappings/DwC2-VegBIEN.specimens.csv: Mapped locationID, for mergability with VegCSV
mappings/VegCSV-VegBIEN.specimens.csv: plotName: Removed authoreventcode mapping because plotName relates to the plot, not the plot event. (The locationevent is scoped by the location when the authoreventcode is not specified, so duplicate elimination will still occur correctly.) Instead map only authoreventcode-related fields (currently CVS's authorObsCode) to authoreventcode, via DwC's (confusingly-named) fieldNumber ("An identifier given to the event in the field").
schemas/vegbien.sql: locationevent: locationevent_unique_within_location: Added authoreventcode to index. It was already in the locationevent_unique_within_*parent*_by_authoreventcode index, but also needed to be in the no-parent (non-subplot) index. This fixes locationevent duplicate elimination when a locationevent sourceaccessioncode is not specified.
schemas/vegbien.sql: location: location_unique_within_datasource unique index: Added COALESCE and `WHERE sourceaccessioncode IS NOT NULL` now that sourceaccessioncode is nullable. Renamed location_unique_within_datasource and location_unique_authorlocationcode to location_unique_within_datasource_by_... to show that both are alternatives for globally unique keys. schemas/vegbien.ERD.mwb: Moved elements slightly to reduce the number of lines that need to be repositioned after syncing with the schema.
mappings/DwC2-VegBIEN.specimens.csv: Mapped verbatimElevation and samplingProtocol, for mergability with VegCSV
inputs/import.stats.xls: Updated with stats from latest import
mappings/VegCSV-VegBIEN.specimens.csv: location unique keys: Map to a new parent location for the location, instead of a parent locationevent for the locationevent. This much simpler mapping (which does not require _alt or _merge) is possible now that the necessary unique indexes have been set up.
Regenerated vegbien.ERD exports, now including both pages in vegbien.ERD.core.pdf. Renamed schemas/vegbien.ERD.core.pdf to vegbien.ERD.pdf because it now includes the full schema.
schemas/filter_ERD.csv: Removed extraneous lines to improve readability. schemas/vegbien.ERD.mwb: Reconfigured elements to put only the most important ones in the core subset (the top page).
schemas/vegbien.sql: location: Made sourceaccessioncode optional if authorlocationcode is specified, since either of these fields can now serve as the unique key
mappings/VegCSV-VegBIEN.specimens.csv: Map to new location.authorlocationcode
schemas/vegbien.sql: location: Support uniquely specifying a location by its authorlocationcode
schemas/vegbien.sql: location: Added authorlocationcode to unique indexes
schemas/vegbien.sql: location: Added authorlocationcode
schemas/vegbien.sql: location: Added location_unique_within_parent_by_coords unique index that uses COALESCE, replacing location_unique_subplot_coords unique constraint
mappings/VegCSV-VegBIEN.specimens.csv: maximumElevationInMeters: Fixed bug where _rangeEnd filter needed to be removed because this only works on a field which can be either a range or the start of a range, such as minimumElevationInMeters (on an end-of-range field, a single value will be removed completely). Added _alt for mergeability with DwC. minimumElevationInMeters: Added elevationrange-to mapping using _rangeEnd for mergeability with DwC.
mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: minimum/maximumElevationInMeters, minimum/maximumDepthInMeters: Remove any "ca." prefix from value. Doing this on all elevation/depth fields will make the DwC and VegCSV mappings mergeable.
mappings/VegCSV-VegBIEN.specimens.csv: locality: Mapped using same XPath as DwC, to enable merging
mappings/DwC2-VegBIEN.specimens.csv: Mapped individualCount. This will enable merging with VegCSV.
mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up. This still needs to be run manually with `make mappings/` because the derived maps are symlinks rather than make targets, so make never touches the non-derived map and doesn't run its recipe in the automated tests
mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxondetermination mappings: Removed iscurrent=true because it is not the role of the mappings to specify which taxondetermination is the current one. Eventually, the order of the determinations will need to be specified using a sort # or similar, and the DB will select the current one for queries to use. Ensure all mappings have :[isoriginal=true] so that they match up between DwC and VegCSV.
mappings/DwC2-VegBIEN.specimens.csv, VegCSV-VegBIEN.specimens.csv: taxondetermination mappings: Ensure all mappings have :[iscurrent=true] or equivalent so that they sort together, and match up between DwC and VegCSV
mappings/VegCSV-VegBIEN.specimens.csv: individualCount: Disambiguated alternate meaning as stem count by changing stem count fields to map to new stemCount term, which maps to plantobservation.stemcount
mappings/Veg+.terms.csv: Added stemCount
mappings/VegCSV-VegBIEN.specimens.csv: Cleaned up
mappings/DwC2-VegBIEN.specimens.csv: Mapped identificationQualifier. This will enable merging with VegCSV.
mappings/VegCSV-VegBIEN.specimens.csv: identificationQualifier (taxon fit): Removed mapping to prefix of binomial field, since that field should just contain what the datasource said was the binomial. It's TNRS's job to concatenate the taxon fit, etc. with the binomial and other name parts for name resolution.
mappings/DwC2-VegBIEN.specimens.csv: fieldNumber: Remapped to authoreventcode because this is (confusingly) the author code for the event, according to the DwC definition
inputs/NY, ARIZ: FieldNumber: Remapped to recordNumber because term usage was inconsistent with DwC definition. Datasources sometimes confuse this term, because it seems like the collection number, but is actually the author code for the event (VegBank's authorObsCode).
schemas/vegbank.ERD.pdf: Restored to VegBank ERD, which had gotten overwritten when the vegbien.ERD exports were regenerated
mappings/DwC1-DwC2.specimens.csv: Removed Source column and source-related comments because this information is now maintained in mappings/Veg+.terms.csv
mappings/DwC2-VegBIEN.specimens.csv: Removed Source column because this information is now maintained in mappings/Veg+.terms.csv
mappings/VegCSV-VegBIEN.specimens.csv: Removed Source column and source-related comments because this information is now maintained in mappings/Veg+.terms.csv
Added mappings/Veg+.terms.csv, which will serve the purpose of listing all available terms with their source. This will remove the need to store the sources in the mappings, where they are out of place and difficult to maintain during refactoring.
mappings/VegX-VegCSV.stems.csv: Removed Comments and Source columns because this information is now maintained in mappings/VegCSV-VegBIEN.specimens.csv. This will simplify later VegCSV refactoring, because the Comments and Source columns will not need to be changed along with the VegCSV column.
mappings/VegCSV-VegBIEN.specimens.csv: Removed Comments and Source columns because this information is now maintained in mappings/VegCSV-VegBIEN.specimens.csv. This will simplify later VegCSV refactoring, because the Comments and Source columns will not need to be changed along with the VegCSV column.
mappings/VegCSV-VegBIEN.specimens.csv: Changed plotID to locationID and parentPlotID to parentLocationID to use DwC-related terms
mappings/DwC2-VegBIEN.specimens.csv: collectionID: Fixed mapping to point to collectioncode_dwc instead of collectionnumber, as this is an ID of the collection rather than within it
schemas: Renamed vegbien.ERD.pdf to vegbien.ERD.1_pg.pdf since it's not the primary PDF that should be used, due to its slow load time
Regenerated vegbien.ERD exports
schemas/vegbien.sql: specimenreplicate: specimenreplicate_plantobservation_1_to_1: Only apply when sourceaccessioncode and catalognumber_dwc are NULL, in order to support multiple specimenreplicates for one plant in plots data. specimenreplicate_unique_catalognumber: Added plantobservation_id, so that catalognumber_dwc (a sort of authorSpecimenCode for plots data) only needs to be unique within a plant. Eventually, we will want to migrate the mappings so that collectionnumber is used for this purpose instead.
schemas/vegbien.sql: specimenreplicate: Made plantobservation_id optional again, since indirect vouchers do create specimenreplicates without a parent plantobservation. schemas/vegbien.ERD.mwb: Fixed lines.
schemas/vegbien.sql: specimenreplicate: Made plantobservation_id required, since that is now the parent table fkey
schemas/vegbien.ERD.mwb: Fixed lines
schemas/vegbien.ERD.mwb: Adjusted lines. Adjusted position of locationdetermination to put location directly next to locationevent. Expanded location to fill newly-available space.
schemas/vegbien.sql: locationevent: Renamed authorlocationcode to authoreventcode to be consistent with the table name. Note that for our current datasources, the plot = the plot event, so the authoreventcode and authorlocationcode/authorPlotCode will be the same.
mappings/VegCSV-VegBIEN.specimens.csv: Changed VegCSV term fieldNumber (from DwC) to recordNumber to be consistent with the TDWG meaning of fieldNumber, which defines it as the author code for the event, not the organism (what VegBIEN calls the authorlocationcode and VegBank calls the authorObsCode)
mappings/VegCSV-VegBIEN.specimens.csv: Comments: Removed no longer applicable comments about XPath syntax added to affect sort order
mappings/VegCSV-VegBIEN.specimens.csv: height: Removed mapping to plantobservation.overallheight, since the height is a stem field rather than a plant field. Note that a height in the organisms table will be mapped to the height in a single stemobservation for that plant, with NULL sourceaccessioncode and authorstemcode. Note also that this change is possible because no mapped datasource yet provides a valid overallheight with multiple stems or that differs from its single stem's height. (Although SALVIAS sometimes provides both a stem height and an organism height, that height is always either the same, or the organism height is invalid. See <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/SALVIAS_issues#Some-organisms-have-one-stem-but-different-heights-in-the-organisms-and-stems-tables>.)
mappings/DwC2-VegBIEN.specimens.csv: establishmentMeans: Removed obsolete mapping to growthform, since growthforms and cultivated/native information are no longer merged into one field in VegBIEN (which they were when this mapping was created)
mappings/VegCSV-VegBIEN.specimens.csv: decimalLatitude/decimalLongitude: Added _nullIf suffix for mergability with VegCSV-VegBIEN.specimens.csv
mappings/VegCSV-VegBIEN.specimens.csv: coordinateUncertaintyInMeters: Added _noCV suffix for mergability with VegCSV-VegBIEN.specimens.csv
mappings/DwC2-VegBIEN.specimens.csv: catalogNumber: Added _if wrapper for mergability with VegCSV-VegBIEN.specimens.csv
mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber direct voucher _if statement: Changed @name to "if indirect voucher", so that it's logical consistent with the else branch following it. It was previously "if direct voucher" because the _if statement only contained a case for direct vouchers, and the else branch was being used in place of a _not() function.
mappings/roots: plots roots: Default to using VegCSV instead of VegX for new plots datasources
mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber _if statements: Changed @names to more descriptive comments. This also prevents the @name from looking confusingly like the condition of the _if statement, which is actually supplied through the cond param and is usually located in a separate mapping.
mappings/VegCSV-VegBIEN.specimens.csv: catalogNumber: Split _if apart into separate _ifs for the indirect and direct voucher cases. Moved direct voucher _if inwards so it is just wrapping catalognumber_dwc itself. This will enable this mapping to be used for specimens data, which is always considered a direct voucher and will always have this _if return true. Also moved indirect voucher _if inwards in the same way, so that a future SQL function implementation of _if only needs to concern itself with returning one value or another, not with handling entire XML subtrees. Note that if the indirect voucher _if returns false, NOT NULL and CHECK constraint violations will cause the intervening voucher and specimenreplicate elements to be deleted, thus having the same effect. Use new @name syntax for distinguishing _if statements.
mappings: Removed no longer used for_review/VegBIEN-DwC2.specimens.csv
xml_func.py: _if(): Changed documentation about name param for distinguishing separate _if statements to use @name attribute instead, so that the XML/SQL function mechanism doesn't have to deal with code that's solely for XPath merging
schemas/filter_ERD.csv: Removed no longer applicable specimenreplicate inheritance filters
inputs/import.stats.xls: Updated with stats from latest import. Note that the import now includes additional date parsing on all date fields, which adds 1/2-1 hour to the import time. Eventually, we will want to translate _date() to PL/pgSQL and only use extra date processing if PostgreSQL's cast to timestamp doesn't work, which should greatly reduce this time.