bugfix: inputs/SALVIAS/plotMetadata_/map.csv: things mapped to project_participant: remapped to event__participant because these actually relate to the event, not the project, even though they seem like project-related fields
fix: inputs/SALVIAS/plotMetadata_/map.csv, inputs/Madidi/LocationObservation/map.csv: things mapped to communityID: remapped to communityName, which is what's used in analytical_stem (communityID is for numeric IDs)
inputs/SALVIAS/plotMetadata_/create.sql, map.csv: expanded plot_administrator:party_code_party_ and mapped plot_administrator_name to a 2nd project_participant
mappings/VegCore-VegBIEN.csv: project_participant: use [!...] negative lookahead assertion so that multiple project_participant columns will properly map to separate projectcontributor rows
inputs/SALVIAS/plotMetadata_/map.csv: mapped PrimOwnerID_name->project_participant
inputs/SALVIAS/plotMetadata_/create.sql: added join to PrimOwnerID:party_code_party_
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this significant change to the mappings did not change the semantics of the import of taxonoccurrences.
inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
inputs/*/*/map.csv: added distinguishing #... suffix (e.g. UNUSED#institutionID) to the special terms OMIT, PRIVATE, UNUSED (VegCore.vegpath.org#Special-terms) to avoid creating a collision in the staging table renaming
inputs/input.Makefile: SVN: add, %/add: */logs: also svn:ignore *.gz, used for compressed log files
mappings/VegCore-VegBIEN.csv: subplotID,subplot -> location.sourceaccessioncode: Fixed bug where /_first was needed to handle the case where both subplotID and subplot are provided
inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
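A minimal sketch of the line-ending normalization described above, assuming the step only needs to turn \r\n (and any stray \r) into \n. The function name normalize_line_endings is hypothetical; the actual fix_line_endings script may work differently.

```python
def normalize_line_endings(text: str) -> str:
    r"""Convert \r\n and lone \r line endings to \n."""
    # Replace \r\n first so that the lone-\r pass does not produce doubled newlines
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

With this, a row ending in \r\n and a multiline cell that uses \n internally both end up with \n only, so a later edit in a text editor should no longer produce line-ending-only diffs.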
mappings/VegCore-VegBIEN.csv: locationRemarks: Remapped to locationnarrative because location.notespublic is a boolean field
inputs/SALVIAS/: Regenerated salvias_*.schema.sql from the MySQL version, to take advantage of my2pg improvements. The placeholder *_index columns which take the place of MySQL's inline index definitions have now been replaced by no-op CHECK constraints, so that there are no longer lots of dummy *_index columns in the map spreadsheets.
mappings/VegCore-VegBIEN.csv: _avg(): Use numeric param names to work with SQL functions
inputs/SALVIAS/plotMetadata_/create.sql: Added primary key
mappings/VegCore-VegBIEN.csv, inputs/*/*/map.csv: Applied term renamings from the new dynamically generated Veg+-VegCore.csv, which reflects the current state of the data dictionary. (Permanently switching to the new Veg+-VegCore.csv will be a separate change.) Updates to VegCore term names that have occurred since the data dictionary was created are now able to take effect, which involves remapping and inferring units on several fields.
mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.
mappings/VegCore.csv: Regenerated from wiki
inputs/SALVIAS/*/test.xml.ref: Restored SALVIAS* inserted row counts, which had gotten auto-accepted from a test run on a non-empty DB
mappings/VegCore-VegBIEN.csv: Removed TNRS input taxonlabels meant to cross-link to taxonlabels added by the TNRS import, because TNRS taxondeterminations are now created instead
mappings/VegCore-VegBIEN.csv: locationID/locationName + subplot -> location.sourceaccessioncode mapping: Fixed bug where subplot was incorrectly being mapped to this field even when there was no location*. (This field can only be populated if both location* and subplot are specified.) Also only map locationID for this, to avoid inconsistencies where one table supplies locationID+subplot, while another table supplies locationName+subplot, but they both get mapped to the same field, preventing plots from being matched up with their observations when creating the analytical_stem.
mappings/VegCore-VegBIEN.csv: authortaxoncode mappings: Only use authorTaxonCode if there is no plant ID, because an individual plant gets its own taxonoccurrence and thus needs the taxonoccurrence's IDs to be unique to the plant, regardless of what the author designates as the taxonoccurrence code
mappings/VegCore-VegBIEN.csv: Mapped authorTaxonCode
mappings/VegCore.csv: Terms: Removed namespace prefixes (dcterms:), because VegCore terms are globally unique within VegCore and there should not be multiple versions of the same VegCore term with different namespaces. Provenance is instead indicated in the Sources column, which contains not just a namespace but a full URL to each source term.
mappings/VegCore.csv: Term names: Changed special characters to _ because Redmine doesn't support special characters in HTML anchors (it removes everything except letters, numbers, _, and -)
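The renaming rule above can be sketched as follows, assuming that "letters" and "numbers" mean ASCII letters and digits. The function name to_anchor_safe is hypothetical and is not part of the repository.

```python
import re


def to_anchor_safe(term: str) -> str:
    """Replace every character other than ASCII letters, digits, _ and - with _."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", term)
```

Under this assumption, a term such as "dcterms:accessRights" would become "dcterms_accessRights", while a term that already uses only allowed characters is left unchanged.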
mappings/VegCore.csv: Renamed plotName to locationName because this term also applies to the location of a specimen. This replaces CTFS's definition of locationName as locality.
schemas/vegbien.sql: Renamed soilobs to soilsample per working group discussion
mappings/VegCore-VegBIEN.csv: Mapped locality description fields to location.iscultivated using _locationnarrative_is_cultivated()
db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names
mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env()
mappings/VegCore-VegBIEN.csv: Mapped acceptedCounty, county to the matched place
inputs/SALVIAS/plotMetadata_/map.csv: AccessCode: Removed _map entry for 4, which does not apply to plots
inputs/SALVIAS/plotMetadata_/map.csv: Mapped AccessCode to dcterms:accessRights with appropriate _map filter
mappings/VegCore-VegBIEN.csv: Removed _date/date, because _date using a string date argument is no longer supported under plpython3u (dateutil is missing). Note that PostgreSQL's own date parsing is sufficient for most dates, so this use of _date is not strictly necessary and removing it will improve import times.
schemas/vegbien.sql: Renamed reference -> source to make this table more broadly applicable, and because this now stores the datasource metadata
inputs/SALVIAS/: Updated to new salvias_plots export on nimoy, which has a different schema
inputs/SALVIAS/: Mapped unmapped fields with a VegCore/VegBIEN equivalent. plotMetadata_/: Remapped life_zone to communityID because it is now alt-ed together with vegetation*, and thus not just a description with life_zone_code as its globally unique name.
inputs/SALVIAS/plotMetadata_/map.csv: Mapped lookup_MethodCode_Description to new observationMeasure
inputs/SALVIAS/salvias_plots.~.clean_up.sql: Rename lookup_MethodCode.Description to lookup_MethodCode_Description to make it globally unique when joined with plotMetadata
mappings/VegCore-VegBIEN.csv: matched place's coordinates: Fixed bug where coordinates entry itself needed to have its datasource (reference) set to geoscrub, in addition to the place entry that uses it, in order to match up properly with geoscrub's corresponding input place (whose coordinates as well as place are owned by the geoscrub datasource)
mappings/VegCore-VegBIEN.csv: matched place's coordinates: Fixed bug where coordinates mappings with and without matched_place_id=0 need to sort together in order to be merged, by prepending ".," to the place attrs list
inputs/SALVIAS/plotMetadata/: LEFT JOINed with lookup_MethodCode to create plotMetadata_
mappings/VegCore-VegBIEN.csv: subplot locationevent: Only populate parent locationevent's location unique IDs if a subplot #/subplotID is actually specified. (The lack of a location unique ID will cause the parent locationevent's location to be removed, as well as the parent locationevent itself if there is no parent locationevent unique ID.) This fixes a bug where top-level plots in datasources that provide a nullable subplot #/subplotID were incorrectly getting connected to parent locationevents.
mappings/VegCore-VegBIEN.csv: subplots: Also complete the locationevent/location diamond (subplot event -> {subplot location, parent plot event} -> parent plot location) when an eventDate or range is specified, as this is also an identifying field for locationevent. This fixes a bug where subplot data without explicit plot events (such as SALVIAS and TEAM) was not being connected to the appropriate parent plot event as well as parent plot location. This should fix the SALVIAS verification # location events, which should include only parent plots' locationevents to correspond with # locations, which only includes parent plots' locations, and uses locationevent.parent_id being NULL to determine what is a parent plot event.
mappings/VegCore-VegBIEN.csv: decimalLatitude/Longitude->geoscrub input coordinates: Also set to NULL if 0 here, not just for the coordinates linked to the datasource's place instance
mappings/VegCore-VegBIEN.csv: matched place: Also map verbatim place's geoscrub-related fields to the matched place, to link up with geoscrub's corresponding input place
mappings/VegCore-VegBIEN.csv: Mapped acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude. Mapped decimalLatitude/Longitude to matched place's coordinates when acceptedDecimalLatitude/Longitude not provided (as is the case for the geoscrub table).
mappings/VegCore-VegBIEN.csv: Map locationID to place.placecode instead when geovalidation columns are provided
mappings/VegCore-VegBIEN.csv: Remapped latitude/longitude to new coordinates table
schemas/vegbien.sql: Renamed placepath to place since this contains primary information about the place, including the reference to the canonical place
mappings/VegCore-VegBIEN.csv: location: Populate sourceaccessioncode with locationID + subplot when subplot is unique only within the parent plot, so that location always has a sourceaccessioncode to use as the plotCode in analytical_db_view
mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode: Only populate if needed to distinguish the taxonoccurrence within a plot
inputs/*/*/map.csv: Prefix a * to every term that's not in Veg+ for easy identification of unmapped terms when editing map.csv. Note that canon will remove the * when it finds a matching Veg+ term.
inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values
inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Removed no longer needed old-style _units filter, now that unit conversion is handled by mappings/VegCore-VegBIEN.csv using _percent_to_fraction
mappings/VegCore.csv: Soil component measurements: Added default units of percent (cmol_kg for cationExchangeCapacity). This involves translating the names everywhere and adding a _percent_to_fraction conversion in mappings/VegCore-VegBIEN.csv.
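A minimal sketch of the kind of conversion that _percent_to_fraction implies, assuming it divides by 100 and passes NULLs through unchanged. The Python function below is illustrative only; the real function is referenced from mappings/VegCore-VegBIEN.csv and its actual implementation is not shown here.

```python
from typing import Optional


def percent_to_fraction(value: Optional[float]) -> Optional[float]:
    """Convert a percentage (0-100) to a fraction (0-1), keeping None as None."""
    if value is None:
        # Assumption: a missing measurement stays missing rather than becoming 0
        return None
    return value / 100.0
```

Doing the conversion once in the VegCore-to-VegBIEN mappings means that per-datasource maps only need to state that their values are percentages.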
mappings/VegCore-VegBIEN.csv: Remapped verbatimLatitude/Longitude to locationcoords.verbatimlatitude/longitude because these fields now contain only non-decimal coordinates. This involves removing the _alt suffix on decimalLatitude/Longitude, which causes the VegBIEN.csvs to change.
input.Makefile: Maps validation: %/new_terms.csv: Include the entire map spreadsheet row, so that each new term is listed together with its mapping. This facilitates adding new mappings to mappings/Veg+-VegCore.csv directly from any new_terms.csv. Note that the use of `sort -u` (in lib/mappings.Makefile) causes multiline comments to be separated, leading to spurious lines for each multiline comment line.
mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode alternatives: Use _first instead of _alt because when one of these fields is present, it can be used directly even if it's sometimes NULL, without needing to spend a lot of time _alting together fields that won't be used. Datasources where the authortaxoncode is sometimes NULL usually have a separate sourceaccessioncode for the taxonoccurrence. (In the rare case that they don't, they should map a non-NULL field to recordNumber or tag to ensure that taxonoccurrences can be uniquely identified.)
mappings/VegCore-VegBIEN.csv: Mapped tag to taxonoccurrence.authortaxoncode when the record is an organism, in case there is no other ID for the taxonoccurrence. This fixes a bug in FIA and TEAM data where all organisms in a plot used the same taxonoccurrence because taxonoccurrence was not properly constrained, causing the loss of individual taxondeterminations on each organism.
mappings/VegCore-VegBIEN.csv: Removed _rangeStart and _rangeEnd filters from fields which should contain decimal values. These filters should be added on a per-datasource basis instead.
mappings/VegCore-VegBIEN.csv: Removed /_units:[default=m,to=m,to=]/value filter from fields. It should be added on a per-datasource basis instead.
mappings/VegCore-VegBIEN.csv: Removed /_replace:["\bca\.?"=]/value filter from fields. It should be added on a per-datasource basis instead.
inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Remapped elev_m from verbatimElevation to elevationInMeters, since the values are all decimals. Note that the units of SALVIAS Elev were provided by a comment from Brad (and can also be assumed to be the same as SALVIAS-CSV elev_m).
schemas/vegbien.sql: Changed _frac units suffix to _fraction for clarity and for consistency with _percent (which is spelled out), as used by SALVIAS (http://salvias.net/Documents/salvias_data_dictionary.html) and elsewhere
mappings/VegCore-VegBIEN.csv: Renamed plotAreaHa to plotArea_ha for consistency with VegBIEN units suffixing convention, which includes an "_"
inputs/*/*/map.csv: Remapped applicable plotArea fields to plotAreaHa
mappings/VegCore-VegBIEN.csv: Mapped plotAreaHa
schemas/vegbien.sql: Added units suffix to additional VegBIEN fields that have units
schemas/vegbien.sql: Added units suffix to all core VegBIEN fields that have units. It is the responsibility of the mappings to ensure that all units are properly translated.
schemas/vegbien.sql: Added placepath (analogous to taxonpath), and point locationplace to it instead of directly to namedplace
schemas/vegbien.sql: Split locationdetermination into locationcoords and locationplace, so that coordinate determinations can be made separately from place determinations
inputs/*/*/map.csv: Changed output column header from Veg+ to VegCore because the names will be VegCore names after automapping. This is possible now that we're using new automapping scripts that do not require a particular column header.
input.Makefile: Maps validation: $(newTerms): Fixed bug where header needed to be removed before running filter_out_ci because filter_out_ci only removes the header if it matches the vocabulary's header. Removing the header afterward can cause the first row to be removed instead if the header was already removed.
inputs/*/*/map.csv: Moved filter suffixes to separate filter column to enable automapping to work on those mappings' terms, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Move-filter-suffixes-to-separate-filter-column>. Note that the only changes to VegBIEN.csvs are the (now automapped) names of terms in "No join mapping" comments.
inputs/*/*/map.csv: Added Filter column to contain any suffix added after the term, so that the automapping mechanism does not have to deal with the filter expressions
inputs/*/*/map.csv: Removed no longer needed [Veg+] suffix in root, because the input column is no longer used by old-style map utilities such as union that needed this
filter_out_ci: Filter header instead of passing it through, in order to properly support CSVs without a header, such as the unmapped_terms.csv and new_terms.csv files. For CSVs with a header, the header of the vocabulary should be removed before passing it to filter_out_ci.
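The header handling described above can be sketched as follows. This assumes that filter_out_ci drops every input row whose first column matches, case-insensitively, the first column of some vocabulary row, and that the header gets no special treatment. The function below is a hypothetical stand-in, not the actual script.

```python
from typing import List, Sequence


def filter_out_ci(rows: Sequence[List[str]], vocab: Sequence[List[str]]) -> List[List[str]]:
    """Return the rows whose first column is not in vocab, ignoring case."""
    # Lowercase the vocabulary terms once for case-insensitive lookup
    known = {entry[0].lower() for entry in vocab if entry}
    # Every row is treated alike; empty rows are dropped in this sketch
    return [row for row in rows if row and row[0].lower() not in known]
```

Because every row is treated the same way, a headerless list such as new_terms.csv loses no data row, and a header row is filtered only if the vocabulary passed in still contains a matching header.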
input.Makefile: Maps building: Removed no longer used %/src.csv, because it is no longer needed to generate map.full.csv from map.csv
input.Makefile: Maps building: Removed no longer used %/map.full.csv
input.Makefile: Maps building: %/map.full.csv: Generate by copying map.csv, because the content of these files now differs only in the sort order of the names
inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.
inputs/*/*/map.csv: Added back automapped mappings to map.csv, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>
inputs/: Added [Veg+] to via map roots to indicate that the datasource and Veg+ vocabularies are combinable. This is possible now that automapped entries are no longer subtracted when this is in the map root, so there is no concern of losing comments on subtracted automapped rows. Note that this change turns on old-style automapping for these datasources, causing SALVIAS plotMetadata to acquire additional mappings.
mappings/VegCore-VegBIEN.csv: Removed no longer needed /_simplifyPath:[next=parent_id]/path expressions in specific paths because parent_id forwarding is now set globally for all paths in the map root
mappings/VegCore-VegBIEN.csv: Added /_simplifyPath:[next=parent_id]/path to root so the returned subplot location will be its parent location if there is no subplot name or ID (indicating that that particular plot did not have subplots). Note that this also causes the parent_id forwarding effect to occur for all other tables containing parent_id, which will help prevent similar issues with subplot events, etc. This will hopefully fix the SALVIAS.plotObservations bug where some organisms did not have a subplot #, causing the subplot location to become NULL and causing the corresponding locationevent rows not to match the locationevent_unique_within_location index filter condition (which requires a parent_id), which caused multiple output table pkeys to be returned for those rows, violating the locationevent_pkeys temp table's primary key.
mappings/VegCore-VegBIEN.csv: namedplace elements: _simplifyPath() calls: Removed no longer needed `require` arg, and removed no longer needed table suffix from `next` arg
Regenerated/modified inputs/*/*/src.csv to use the self-mapping format used by the new automapping mechanism
input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Remove the CSV header from the terms lists so that multiple terms lists can easily be appended together
input.Makefile: Maps building: %/.map.csv.last_cleanup: Generate reports on new and unmapped terms in map.csv
input.Makefile: Maps building: %/.map.csv.last_cleanup: Translate map.csv using $(mappings)/$(via)-VegCore.csv
mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.
mappings/VegCore-VegBIEN.csv: startDate, endDate mappings: Removed _dateRangeStart/_dateRangeEnd filters because these are assumed to already be start and end dates of a range. (eventDate should be used for concatenated date ranges.)
schemas/vegbien.sql: project: Don't require projectname to be specified when sourceaccessioncode is provided
inputs/SALVIAS/plotMetadata/map.csv: Remapped project_id to project.sourceaccessioncode
mappings/VegCore-VegBIEN.csv: if subplot: Also forward locationID and plotName to the location of the parent locationevent (in addition to the parent location of the location), in order to "complete the diamond" connecting subplot locationevent -> (parent plot locationevent, subplot location) -> parent plot location
inputs/SALVIAS/: Switched to using the DB export's staging tables instead of the exported CSVs