moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
inputs/VegBank/plot/postprocess.sql: locality: include the site name (authorlocation), because this is part of the unique specification of the place that was sampled
bugfix: inputs/VegBank/: need to remove inter-datasource duplicates from plot instead of the left-joined plot_ table, because the fkeys needed to do the cascading deletes are all to the plot table. this requires doing the column-renaming and postprocessing on plot before it's left-joined.
inputs/VegBank/plot_/create.sql: updated runtime (5 s) for previous bugfix
bugfix: inputs/VegBank/plot_/postprocess.sql: coordinateUncertaintyInMeters__from_fuzzing: need to convert km to m in the fuzzing radii. updated derived cols runtimes.
inputs/VegBank/plot_/postprocess.sql: remove duplicated CVS plots (2323 of 7079 CVS plots are removed by this)
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
inputs/VegBank/plot_/create.sql: documented runtime (5 min)
inputs//: don't import joined tables, because they are now imported in the taxon_observation.** left-join instead
inputs/VegBank/: prepended the table name to each column name to prevent column collisions, using the steps at http://wiki.vegpath.org/Left-joining_a_datasource
inputs/VegBank/: switched to new-style import, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource
bugfix: inputs/VegBank/plot_/postprocess.sql: coordinateUncertaintyInMeters: need to use GREATEST instead of _alt() to handle cases where the coordinate uncertainty is > than the fuzzing uncertainty, where you wouldn't want to just use the smaller fuzzing uncertainty
inputs/VegBank/plot_/: translated multi-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns
inputs/VegBank/plot_/postprocess.sql: map_*() derived cols: updated runtime
inputs/VegBank/plot_/: translated single-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#Translating-filters-to-postprocessing-derived-columns
inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix
inputs/VegBank/*/postprocess.sql: added primary keys to derived tables
inputs/VegBank/plot_/map.csv: area|country|territory, region|state|province (from place table): remapped to DUPLICATE, since these have the same data as, and are populated less often than, their country/stateprovince couterparts
bugfix: inputs/VegBank/plot_/create.sql: need to join place.*plot_id* to plot.plot_id instead of plotplace_id. this is the cause of the "State is wrong, not Wyoming, but Tennessee" and "County is incorrect (not Powell, but Orange)" bugs in the VegBank spot-checking (wiki.vegpath.org/Spot-checking#Great-Smoky-Mountains-National-Park).
inputs/VegBank/plot_/map.csv: elevation: documented that it has only 5 decimal places of precision, with only 9s and random #s after that
inputs/VegBank/plot_/test.xml.ref: update rowcount
inputs/*/*/map.csv: added distinguishing #... suffix (e.g. UNUSED#institutionID) to the special terms OMIT, PRIVATE, UNUSED (VegCore.vegpath.org#Special-terms) to avoid creating a collision in the staging table renaming
inputs/*/*/logs: updated svn:ignore
mappings/VegCore-VegBIEN.csv: subplotID,subplot -> location.sourceaccessioncode: Fixed bug where need /_first to handle the case where both subplotID and subplot are provided
inputs/input.Makefile: %/.map.csv.last_cleanup: Run fix_line_endings after canon/translate to standardize Python's \r\n line endings back to \n. This prevents issues with mixed line endings because LibreOffice (and probably Excel) treat all cell-internal line endings as \n but row line endings as whatever the file had, while text editors like jEdit translate all line endings to whatever the autodetected line ending is. (This creates spurious line ending diffs when a map spreadsheet containing multiline cells is edited in a text editor.)
mappings/VegCore-VegBIEN.csv: _avg(): Use numeric param names to work with SQL functions
mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.
mappings/VegCore.csv: Regenerated from wiki
input.Makefile: Maps validation: %/new_terms.csv: Filter out terms that map to UNUSED, because these are not mappings that are useful as VegCore synonyms
mappings/VegCore-VegBIEN.csv: locationID/locationName + subplot -> location.sourceaccessioncode mapping: Fixed bug where subplot was incorrectly being mapped to this field even when there was no location*. (This field can only be populated if both location* and subplot are specified.) Also only map locationID for this, to avoid inconsistencies where one table supplies locationID+subplot, while another table supplies locationName+subplot, but they both get mapped to the same field, preventing plots from being matched up with their observations when creating the analytical_stem.
mappings/VegCore-VegBIEN.csv: authortaxoncode mappings: Only use authorTaxonCode if there is no plant ID, because an individual plant gets its own taxonoccurrence and thus needs the taxonoccurrence's IDs to be unique to the plant, regardless of what the author designates as the taxonoccurrence code
mappings/VegCore-VegBIEN.csv: Mapped authorTaxonCode
mappings/VegCore.csv: Terms: Removed namespace prefixes (dcterms:), because VegCore terms are globally unique within VegCore and there should not be multiple versions of the same VegCore term with different namespaces. Provenance is instead indicated in the Sources column, which contains not just a namespace but a full URL to each source term.
mappings/VegCore.csv: Term names: Changed special characters to _ because Redmine doesn't support special characters in HTML anchors (it removes everything except letters, numbers, _, and -)
inputs/VegBank/plot_/map.csv: Mapped confidentialitystatus to coordinateUncertaintyInMeters, overriding locationaccuracy when the confidentialitystatus indicates fuzzing
mappings/VegCore.csv: Renamed plotName to locationName because this term also applies to the location of a specimen. This replaces CTFS's definition of locationName as locality.
mappings/VegCore-VegBIEN.csv: Mapped locality description fields to location.iscultivated using _locationnarrative_is_cultivated()
db_xml.py: put(): _setDefault(): Support setting multiple col_defaults at once by using the param names themselves as the column names
mappings/VegCore-VegBIEN.csv: Set the source_id col_default to the datasource name using the new _setDefault() built-in function and _env()
mappings/VegCore-VegBIEN.csv: Mapped acceptedCounty, county to the matched place
mappings/VegCore-VegBIEN.csv: Removed mapping for coordinatePrecision, which is not the same as coordsaccuracy_m. coordinatePrecision is instead "the precision of the coordinates" themselves in degrees (<http://rs.tdwg.org/dwc/terms/#coordinatePrecision>).
schemas/vegbien.sql: coordinates: Changed coordinates.coordsaccuracy_deg units to m
inputs/VegBank/plot_/map.csv: confidentialitystatus filter: Merged mappings for 0 with other public-equivalent fields. Note that fuzzed plots are still public, because the private columns have been removed.
inputs/VegBank/plot_/map.csv: Mapped confidentialitystatus to dcterms:accessRights with an appropriate _map filter
schemas/vegbien.sql: Renamed reference -> source to make this table more broadly applicable, and because this now stores the datasource metadata
mappings/VegCore-VegBIEN.csv: matched place's coordinates: Fixed bug where coordinates entry itself needed to have its datasource (reference) set to geoscrub, in addition to the place entry that uses it, in order to match up properly with geoscrub's corresponding input place (whose coordinates as well as place are owned by the geoscrub datasource)
mappings/VegCore-VegBIEN.csv: matched place's coordinates: Fixed bug where coordinates mappings with and without matched_place_id=0 need to sort together in order to be merged, by prepending ".," to the place attrs list
inputs/VegBank/plot_/test.xml.ref: Updated inserted row count
inputs/VegBank/vegbank.~.clean_up.sql: Remove private columns (plot.reallatitude, reallongitude) that should not be publicly visible
mappings/VegCore-VegBIEN.csv: subplot locationevent: Only populate parent locationevent's location unique IDs if a subplot #/subplotID is actually specified. (The lack of a location unique ID will cause the parent locationevent's location to be removed, as well as the parent locationevent itself if there is no parent locationevent unique ID.) This fixes a bug where top-level plots in datasources that provide a nullable subplot #/subplotID were incorrectly getting connected to parent locationevents.
mappings/VegCore-VegBIEN.csv: subplots: Also complete the locationevent/location diamond (subplot event -> {subplot location, parent plot event} -> parent plot location) when an eventDate or range is specified, as this is also an identifying field for locationevent. This fixes a bug where subplots data without explicit plot events (such as SALVIAS and TEAM) was not being connected to the appropriate parent plot event as well as parent plot location. This should fix the SALVIAS verification # location events, which should include only parent plots' locationevents to correspond with # locations, which only includes parent plots' locations, and uses locationevent.parent_id being NULL to determine what is a parent plot event.
mappings/VegCore-VegBIEN.csv: decimalLatitude/Longitude->geoscrub input coordinates: Also set to NULL if 0 here, not just for the coordinates linked to the datasource's place instance
mappings/VegCore-VegBIEN.csv: matched place: Also map verbatim place's geoscrub-related fields to the matched place, to link up with geoscrub's corresponding input place
mappings/VegCore-VegBIEN.csv: Mapped acceptedCountry, acceptedStateProvince, acceptedDecimalLatitude/Longitude. Mapped decimalLatitude/Longitude to matched place's coordinates when acceptedDecimalLatitude/Longitude not provided (as is the case for the geoscrub table).
mappings/VegCore-VegBIEN.csv: Map locationID to place.placecode instead when geovalidation columns are provided
mappings/VegCore-VegBIEN.csv: Remapped latitude/longitude to new coordinates table
schemas/vegbien.sql: Renamed placepath to place since this contains primary information about the place, including the reference to the canonical place
mappings/VegCore-VegBIEN.csv: location: Populate sourceaccessioncode with locationID + subplot when subplot is unique only within the parent plot, so that location always has a sourceaccessioncode to use as the plotCode in analytical_db_view
mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode: Only populate if needed to distinguish the taxonoccurrence within a plot
input.Makefile: Staging tables installation: `%/install: %/create.sql`: Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs)
inputs/*/*/map.csv: Prefix a * to every term that's not in Veg+ for easy identification of unmapped terms when editing map.csv. Note that canon will remove the * when it finds a matching Veg+ term.
mappings/VegCore-VegBIEN.csv: Remapped verbatimLatitude/Longitude to locationcoords.verbatimlatitude/longitude because these fields now contain only non-decimal coordinates. This involves removing the _alt suffix on decimalLatitude/Longitude, which causes the VegBIEN.csvs to change.
inputs/*/*/map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".
input.Makefile: Maps validation: %/new_terms.csv: Include the entire map spreadsheet row, so that each new term is listed together with its mapping. This facilitates adding new mappings to mappings/Veg+-VegCore.csv directly from any new_terms.csv. Note that the use of `sort -u` (in lib/mappings.Makefile) causes multiline comments to be separated, leading to spurious lines for each multiline comment line.
inputs/VegBank/plot_/: Automapped with new parentPlotID term, which now has a join mapping in mappings/VegCore-VegBIEN.csv
Regenerated unmapped_terms.csv, new_terms.csv
mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode alternatives: Use _first instead of _alt because when one of these fields is present, it can be used directly even if it's sometimes NULL, without needing to spend a lot of time _alting together fields that won't be used. Datasources where the authortaxoncode is sometimes NULL usually have a separate sourceaccessioncode for the taxonoccurrence. (In the rare case that they don't, they should map a non-NULL field to recordNumber or tag to ensure that taxonoccurrences can be uniquely identified.)
mappings/VegCore-VegBIEN.csv: Mapped tag to taxonoccurrence.authortaxoncode when the record is an organism, in case there is no other ID for the taxonoccurrence. This fixes a bug in FIA and TEAM data where all organisms in a plot used the same taxonoccurrence because taxonoccurrence was not properly constrained, causing the loss of individual taxondeterminations on each organism.
inputs/*/*/map.csv: Remapped all unused terms to special value UNUSED. Remapped all private terms to special value PRIVATE. Remapped all deliberately unmapped terms to special value OMIT.
inputs/VegBank/plot_/map.csv: Remapped elevation from verbatimElevation to elevationInMeters, since the values are all decimals. The units come from the data dictionary.
mappings/Veg+-VegCore.csv: Mapped realLatitude, realLongitude to OMIT because private data should not be placed in a public database
inputs/VegBank/plot_/map.csv: Documented that elevationrange is unused
schemas/vegbien.sql: Changed _frac units suffix to _fraction for clarity and for consistency with _percent (which is spelled out), as used by SALVIAS (http://salvias.net/Documents/salvias_data_dictionary.html) and elsewhere
inputs/*/*/map.csv: Remapped applicable plotArea fields to plotArea_m2
schemas/vegbien.sql: Added units suffix to all core VegBIEN fields that have units. It is the responsibility of the mappings to ensure that all units are properly translated.
schemas/vegbien.sql: Added placepath (analogous to taxonpath), and point locationplace to it instead of directly to namedplace
schemas/vegbien.sql: Split locationdetermination into locationcoords and locationplace, so that coordinate determinations can be made separately from place determinations
inputs/*/*/map.csv: Changed output column header from Veg+ to VegCore because the names will be VegCore names after automapping. This is possible now that we're using new automapping scripts that do not require a particular column header.
input.Makefile: Maps validation: $(newTerms): Fixed bug where header needed to be removed before running filter_out_ci because filter_out_ci only removes the header if it matches the vocabulary's header. Removing the header afterward can cause the first row to be removed instead if the header was already removed.
inputs/*/*/map.csv: Added Filter column to contain any suffix added after the term, so that the automapping mechanism does not have to deal with the filter expressions
inputs/*/*/map.csv: Removed no longer needed [Veg+] suffix in root, because the input column is no longer used by old-style map utilities such as union that needed this
filter_out_ci: Filter header instead of passing it through, in order to properly support CSVs without a header, such as the unmapped_terms.csv and new_terms.csv files. For CSVs with a header, the header of the vocabulary should be removed before passing it to filter_out_ci.
input.Makefile: Maps building: Removed no longer used %/src.csv, because it is no longer needed to generate map.full.csv from map.csv
input.Makefile: Maps building: Removed no longer used %/map.full.csv
input.Makefile: Maps building: %/map.full.csv: Generate by copying map.csv, because the content of these files now differs only in the sort order of the names
inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.
inputs/*/*/map.csv: Added back automapped mappings to map.csv, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>
mappings/VegCore-VegBIEN.csv: Removed no longer needed /_simplifyPath:[next=parent_id]/path expressions in specific paths because parent_id forwarding is now set globally for all paths in the map root
mappings/VegCore-VegBIEN.csv: Added /_simplifyPath:[next=parent_id]/path to root so the returned subplot location will be its parent location if there is no subplot name or ID (indicating that that particular plot did not have subplots). Note that this also causes the parent_id forwarding effect to occur for all other tables containing parent_id, which will help prevent similar issues with subplot events, etc. This will hopefully fix the SALVIAS.plotObservations bug where some organisms did not have a subplot #, causing the subplot location to become NULL and causing the corresponding locationevent rows not to match the locationevent_unique_within_location index filter condition (which requires a parent_id), which caused multiple output table pkeys to be returned for those rows, violating the locationevent_pkeys temp table's primary key.
mappings/VegCore-VegBIEN.csv: namedplace elements: _simplifyPath() calls: Removed no longer needed `require` arg, and removed no longer needed table suffix from `next` arg
Regenerated/modified inputs/*/*/src.csv to use the self-mapping format used by the new automapping mechanism
input.Makefile: Maps building: %/.map.csv.last_cleanup: $(newTerms): Remove the CSV header from the terms lists so that multiple terms lists can easily be appended together
input.Makefile: Maps building: %/.map.csv.last_cleanup: Generate reports on new and unmapped terms in map.csv
input.Makefile: Maps building: %/.map.csv.last_cleanup: Translate map.csv using $(mappings)/$(via)-VegCore.csv
mappings/VegCore-VegBIEN.csv: Mapped min/max SlopeAspect/SlopeGradient. Note that this allows the min/maxSlopeAspect values to bypass the additional _compass filter that is applied to slopeAspect.
inputs/VegBank/plot_/map.csv: Omit reallatitude/reallongitude because private data should not be placed in a public database
inputs/VegBank/: Added plot_/