tnrs_client: Support parsing multiple taxons at once, by specifying each as a command-line argument. Increased the max_pause to 10 min to support large batches. Limited the batch size to 5000 names, using the limit at <http://tnrs.iplantcollaborative.org/TNRSapp.html>. Note that when using xargs to pass many names, xargs will by default split its arguments into chunks of 5000. You can change this using the -n option.
inputs/import.stats.xls: Updated import times
Added tnrs_client. Note that obtaining an actual CSV requires four (!) steps: submit, retrieve, prepare download, and download. The output of the retrieve step is unusable because the array has different lengths depending on the taxonomic ranks present in the provided taxon name. This initial version runs one name at a time, but could later be expanded to batch process because TNRS can run multiple names at once.
streams.py: Line iteration: Added read_all()
inputs/Madidi/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values
sql_io.py: null_strs: Added '-'
sql_io.py: cleanup_table(): Fixed bug where each column name needed to be converted to Unicode before being concatenated with other strings, to support non-ASCII characters
inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Documented that units are assumed to be % based on the range of values
inputs/SALVIAS/plotMetadata/map.csv, inputs/SALVIAS-CSV/Plot/map.csv: Soil component measurements: Removed no longer needed old-style _units filter, now that unit conversion is handled by mappings/VegCore-VegBIEN.csv using _percent_to_fraction
inputs/VegBank/observation_/map.csv: soilObs fields: Cited data dictionary source of units
mappings/Veg+-VegCore.csv: Soil component measurements: Added unitless terms that automap to all alternatives of units
mappings/VegCore.csv: Added term with *_fraction units for every *_percent term
mappings/VegCore.csv: Soil component measurements: Added default units of percent (cmol_kg for cationExchangeCapacity). This involves translating the names everywhere and adding a _percent_to_fraction conversion in mappings/VegCore-VegBIEN.csv.
mappings/VegCore-VegBIEN.csv: Remapped verbatimLatitude/Longitude to locationcoords.verbatimlatitude/longitude because these fields now contain only non-decimal coordinates. This involves removing the _alt suffix on decimalLatitude/Longitude, which causes the VegBIEN.csvs to change.
inputs/*/*/map.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
inputs/SpeciesLink/Specimen/map.csv: Documented that dwc_geospatial_VerbatimLatitude/Longitude contain a mix of DMS and other verbatim coordinates
inputs/QMOR/Specimen/map.csv: Remapped verbatimLatitude/verbatimLongitude to latitude_DMS/longitude_DMS since these fields contain DMS values
inputs/Madidi/Plot/map.csv: Remapped Latitude/Longitude (DMS) to new latitude_DMS/longitude_DMS
mappings/VegCore-VegBIEN.csv: Mapped latitude_DMS, longitude_DMS
mappings/VegCore.csv: Added latitude_DMS, longitude_DMS
inputs/REMIB/Specimen/map.csv: Remapped lat_deg/long_deg to decimalLatitude/Longitude because these values are (integer) degrees suitable for decimalLatitude/Longitude. Note that the other DMS fields are not yet translated to decimal degrees.
mappings/Veg+-VegCore.csv: Remapped latitude/longitude to decimalLatitude/Longitude because these fields almost always have units of decimal degrees
mappings/VegCore-VegBIEN.csv: Added empty mappings for special values (OMIT, etc.), so that they don't show up in **/unmapped_terms.csv. Note that the VegBIEN.csvs only change because the "No join mapping" errors change to "No non-empty join mapping".
input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Don't automatically regenerate the aggregated unmapped_terms.csv, new_terms.csv because this almost doubles the remake time when a mappings/ prerequisite changes (41s -> 75s)
inputs/GBIF/Specimen/map.csv: Remapped VerbatimLatitude/Longitude to decimalLatitude/Longitude because DecimalLatitude/Longitude just contains VerbatimLatitude/Longitude cast to a low-resolution float, which created spurious repeating decimals
mappings/Makefile: .VegCore-VegBIEN.csv.last_cleanup: Generate VegCore-VegBIEN.unsourced_terms.csv whenever VegCore-VegBIEN.csv changes, to track VegCore terms that are mapped to VegBIEN but not documented in VegCore.csv. Note that this file is not svn:ignored, so it will show up with a ? when the user runs `svn st` if there are any unsourced terms.
mappings/Makefile: Changed catch-all `.%.last_cleanup: %` target to a specific target for VegCore-VegBIEN.csv, because it's the only file that uses this target
mappings/: Don't generate a for_review version of Veg+-VegCore.csv, because it is identical to the machine-readable Veg+-VegCore.csv (there are no output XPaths to simplify)
mappings/: Don't generate a for_review version of VegX-VegCore.csv, because it is identical to the machine-readable VegX-VegCore.csv (there are no output XPaths to simplify)
mappings/: Removed Veg+.unmapped_terms.csv because these terms are found in each datasource's new_terms.csv, which are updated regularly, while this file isn't, and which exist for every datasource, while this file only contained terms from a few datasources
inputs/ARIZ/Specimen/map.csv: Remapped VerbatimLatitude, VerbatimLongitude to UNUSED
Regenerated root unmapped_terms.csv, new_terms.csv
lib/mappings.Makefile: unmapped_terms.csv, new_terms.csv: Only remake if newer than existing %/unmapped_terms.csv, %/new_terms.csv which haven't been autoremoved. This avoids always remaking every unmapped_terms.csv, new_terms.csv whenever `make missing_mappings` is run. Note that these files will automatically be remade whenever their corresponding map.csv changes, so it is not necessary to actually remake %/unmapped_terms.csv, %/new_terms.csv; they are prerequisites only so that their modification time may be checked to determine whether unmapped_terms.csv, new_terms.csv needs to be remade.
input.Makefile: Maps validation: %/unmapped_terms.csv, %/new_terms.csv: Automatically regenerate aggregated unmapped_terms.csv, new_terms.csv when a subdir's corresponding file changes
inputs/: Regenerated aggregated unmapped_terms.csv, new_terms.csv
inputs/REMIB/: Moved nodes.make into Specimen.src/ so it's with the data it generates
inputs/TEAM/: Regenerated */new_terms.csv
inputs/TEAM/: Obtained new download of TEAM data. (Note that the new download has a slightly different schema.) Archived old data in _archive/. Added tables to import_order.txt. Renamed TeamPlotMetaData/ to TEAM_Sites/ to correspond with the section header in Vegetation-Tree-and-Liana-Metadata-1.5.pdf. Fixed TEAM_Sites mappings: Remapped CollectionDate to eventDate because it relates to the plot, not the organism. Mapped Name to plotName so TEAM_Sites data will match up with VL, VT data.
inputs/TEAM/VL, VT: Split concatenated flat files apart into separate parts each time a header is duplicated, so that the header would be autoremoved by cat_csv. Changed modified BIEN2 flat file headers back to original headers (the duplicated headers) so the headers of all part files would match up. (This is required for cat_csv header autoremoval to work properly.) This results in changes to the input column names in */map.csv.
sql_io.py: null_strs: Added 'nulo' (used by REMIB)
mappings/Veg+-VegCore.csv: DBH: Removed diameterBreastHeight_m alternative because datasources that don't append units to DBH almost always have units of cm or in
inputs/TEAM/*/map.csv: Remapped dbh from diameterBreastHeight_m to diameterBreastHeight_cm, using the units defined in Vegetation-Metadata-1.4.pdf
inputs/TEAM/: Added TeamPlotMetaData
inputs/TEAM/_src/: Added ci-team_extract/Vegetation-Metadata-1.4.pdf and symlink to it in the _src subdir
inputs/: Added aggregated unmapped_terms.csv, new_terms.csv which were not already under version control
inputs/SALVIAS-CSV/Organism/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension
inputs/SALVIAS/stems/map.csv: Remapped stem_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for plotObservations.intercept_cm, which measures the same dimension
inputs/SALVIAS/plotObservations/map.csv: Remapped temp_dbh from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the units for intercept_cm, which measures the same dimension
inputs/Madidi/Organism/map.csv: Remapped Diameter from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units based on the range and precision of values
inputs/FIA/Organism/map.csv: DBH: Changed units comment to include that assumption was also based on location inside the U.S., because some data outside the U.S. also uses fractional DBHs, but these are not likely to be inch measurements
inputs/FIA/Organism/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_in, assuming units based on the range and precision of values
inputs/CTFS/StemObservation/map.csv: DBH: Changed units comment to include that assumption was also based on the precision of values, because fractional DBHs sometimes indicate units of inches
mappings/VegCore.csv: Added diameterBreastHeight_in
schemas/functions.sql: Added _in_to_m()
mappings/Veg+-VegCore.csv: Remapped DBH from no longer existing term diameterBreastHeight to diameterBreastHeight_cm, diameterBreastHeight_m (both terms will be listed in the map spreadsheet after automapping, and the user can then choose one)
inputs/CTFS/StemObservation/map.csv: Remapped DBH from diameterBreastHeight_m to diameterBreastHeight_cm, assuming units are cm based on the range of values
mappings/VegCore.csv: Added diameterBreastHeight_cm
mappings/VegCore.csv: Added stemID, which was only in mappings/VegCore-VegBIEN.csv
input.Makefile: Maps validation: Inline $(unmappedTerms) because it's only used once
input.Makefile: Maps validation: %/new_terms.csv: Include the entire map spreadsheet row, so that each new term is listed together with its mapping. This facilitates adding new mappings to mappings/Veg+-VegCore.csv directly from any new_terms.csv. Note that the use of `sort -u` (in lib/mappings.Makefile) causes multiline comments to be separated, leading to spurious lines for each multiline comment line.
inputs/: Added unmapped_terms.csv, new_terms.csv which were not already under version control
inputs/VegBank/plot_/: Automapped with new parentPlotID term, which now has a join mapping in mappings/VegCore-VegBIEN.csv
Regenerated unmapped_terms.csv, new_terms.csv
mappings/Veg+-VegCore.csv: Added parentPlotID
mappings/VegCore-VegBIEN.csv: Added parentLocationID, parentPlotName, which always map directly to the parent location, regardless of whether any subplot ID is present
mappings/Veg+.unmapped_terms.csv: Removed vague term volumeCanopy, which has no definition in VegX
mappings/Makefile: .VegCore.csv.last_cleanup: Fixed bug where needed to change sorting columns to match new column order
mappings/VegCore.csv: Reordered columns to put Comments first, which matches mappings/Veg+-VegCore.csv
mappings/Veg+-VegCore.csv: Removed redundant stem_id->stemID mapping
mappings/Veg+-VegCore.csv: Standardized the capitalization of names, by camel-casing each name except for acronyms and "ID", which are made all uppercase
mappings/VegCore.csv: Renamed diameterBreastHeight to diameterBreastHeight_m to assert units matching the VegBIEN field
mappings/VegCore.csv: Removed duplicates
input.Makefile: Maps building: Use new mappings/VegCore.csv as the VegCore vocabulary to canonicalize on, in order to also canonicalize VegCore terms which are not yet mapped to VegBIEN. This results in several DwC terms getting their case standardized according to http://rs.tdwg.org/dwc/terms/. Continue to determine unmapped terms using mappings/VegCore-VegBIEN.csv, because a term should not be considered mapped until it has been mapped all the way through to VegBIEN.
mappings/VegCore.csv: Removed trailing spaces from terms
mappings/Veg+.unmapped_terms.csv: Removed duplicates of VegCore terms
mappings/: Split Veg+.terms.csv into VegCore.csv and Veg+.unmapped_terms.csv
mappings/Veg+.terms.csv: Removed terms that are in mappings/Veg+-VegCore.csv
mappings/Veg+-VegCore.csv: Added sources where missing
mappings/Veg+-VegCore.csv: Added Source and Comments columns from mappings/Veg+.terms.csv. Reordered columns to put Comments first.
mappings/Veg+.terms.csv: Removed duplicate entries for stem_id/stemID, collector
inputs/REMIB/Specimen/: Filter out invalid, frameshifted rows so they don't produce errors in the import or anomalies like thousands of taxondeterminations for one taxonoccurrence. This involves moving the CSVs to Specimen.src and using a create.sql to create the filtered table.
mappings/VegCore-VegBIEN.csv: Forward occurrenceID to taxonoccurrence.sourceaccessioncode when there is no other taxonoccurrence.sourceaccessioncode, to ensure that taxonoccurrence is uniquely identified so that there is one taxonoccurrence per organism
mappings/VegCore-VegBIEN.csv: taxonoccurrence.authortaxoncode alternatives: Use _first instead of _alt because when one of these fields is present, it can be used directly even if it's sometimes NULL, without needing to spend a lot of time _alting together fields that won't be used. Datasources where the authortaxoncode is sometimes NULL usually have a separate sourceaccessioncode for the taxonoccurrence. (In the rare case that they don't, they should map a non-NULL field to recordNumber or tag to ensure that taxonoccurrences can be uniquely identified.)
mappings/VegCore-VegBIEN.csv: Mapped tag to taxonoccurrence.authortaxoncode when the record is an organism, in case there is no other ID for the taxonoccurrence. This fixes a bug in FIA and TEAM data where all organisms in a plot used the same taxonoccurrence because taxonoccurrence was not properly constrained, causing the loss of individual taxondeterminations on each organism.
input.Makefile: Testing: %/test.by_col.xml: Do abort tester if by-column test fails. There are no longer small rowcount differences between row-based and column-based import on some datasources, so this is now possible.
schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Added tag so that a stemobservation can be scoped by its tag when no other ID is specified
schemas/vegbien.sql: stemobservation: stemobservation_unique_within_plantobservation unique index: Fixed bug where filter condition underconstrained stemobservation when neither sourceaccessioncode nor authorstemcode was specified, by making sure that at least one *_unique index always applies
mappings/VegCore-VegBIEN.csv: Remapped tag to new stemobservation.tag
schemas/vegbien.sql: stemobservation: Added tag, tags
mappings/VegCore-VegBIEN.csv: tag: Removed no longer applicable comment
mappings/VegCore-VegBIEN.csv: Removed no longer used previousTag and the complex mapping logic that attempts to place both tags in VegBIEN in the correct order but does not work for column-based import. tag: Removed iscurrent=true because there is now only one tag field.
inputs/SALVIAS/*/map.csv: Remapped all versions of stem and tree tags to tag, with the second tag superceding the first, to avoid the complex VegCore-VegBIEN mapping logic that attempts to place both tags in VegBIEN in the correct order but does not work for column-based import. inputs/SALVIAS-CSV/Organism/map.csv: stem and tree tags: Made the stem tag supercede the tree tag instead of vice versa, to have as specific of a tag as possible.
inputs/SALVIAS/stems/map.csv: Copied Brad's comments on plotObservations.tag1, tag2 to stem_tag1, stem_tag2
mappings/VegCore-VegBIEN.csv: Removed _rangeStart and _rangeEnd filters from fields which should contain decimal values. These filters should be added on a per-datasource basis instead.
inputs/ARIZ/Specimen/map.csv: Documented that MinimumElevationInMeters, MinimumElevationInMeters contain some verbatim values, including ranges and units