Regenerated vegbien.ERD exports
schemas/vegbien.sql: Renamed plantconcept to taxonpath for consistency with DwC's Taxon category and to emphasize that the table stores taxonomic paths
schemas/vegbien.sql: Renamed plantname to taxon for consistency with DwC's Taxon category
schemas/vegbien.sql: plantname: Renamed plantname field to taxonname for consistency with DwC's Taxon category
Updated aggregated unmapped_terms.csv, new_terms.csv. This removes terms that contained a filter (which is now in a separate column) and moves new terms that are unmapped from new_terms.csv to unmapped_terms.csv. Note that the majority of unmapped terms are from VegBank's huge tables, and are not part of the core fields needed for the analytical DB.
schemas/vegbien.sql: taxonrank: Switched to using extended taxonomic ranks list derived from VegX at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegBIEN_taxonomic_schema#Extended>. This renames *division to *phylum and splits up 'cultivar/forma'.
schemas/vegbien.sql: taxonrank: Removed 'authority', which doesn't belong as a taxonomic rank
schemas/vegbien.sql: plantname: Added authority so each taxonomic level can have its own authority (author). Include it in the plantname_unique unique index because plantname is a globally scoped table.
schemas/vegbien.sql: taxonrank: Removed 'binomial', which doesn't belong as a taxonomic rank
schemas/vegbien.sql: Changed analytical_db_view to use new denormalized taxonomic names in plantconcept, which significantly reduces the number of joins. Note that changing the tables used by a view which depends on other tables will cause those tables to be reordered in dependency order to appear before the view, causing things to be moved around in the svn diff.
inputs/Madidi/Organism/map.csv: Remapped Specie+autor to new scientificNameWithAuthorship. Mapped Species and morphotypes to now-available scientificName.
mappings/VegCore-VegBIEN.csv: Moved scientificNameWithAuthorship before scientificName in taxonoccurrence.authortaxoncode's _alts
mappings/VegCore-VegBIEN.csv: Mapped scientificNameWithAuthorship as an _alt of taxonoccurrence.authortaxoncode
mappings/VegCore-VegBIEN.csv: Mapped scientificNameWithAuthorship
mappings/Veg+.terms.csv: Added scientificNameWithAuthorship
mappings/VegCore-VegBIEN.csv: Taxonomic names: Remapped to new denormalized fields in plantconcept
schemas/vegbien.sql: plantname: Added comment documenting how to include a taxon name at a rank with no explicit column, by using the plantname table as an ordered linked list linked together using parent_id. (This method of using a linked list is one way of storing an ordered list of user-defined data. It is similar to using locationevent.previous_id to link successive reobservations of the same location together.) Note that plantname can store both the official tree of life and the data provider's own custom tree of life (or a subset thereof), with the two being distinguished by whether the data provider's or TNRS's taxondeterminations point to them.
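The linked-list idea described in the entry above can be sketched roughly as follows. This is only an illustration, not the schema's actual code: the in-memory `plantnames` dict, the `name` and `rank` keys, and the `taxon_path()` helper are hypothetical stand-ins for rows of the plantname table; only `parent_id` comes from the entry itself.

```python
# Hypothetical in-memory stand-in for plantname rows, keyed by id.
# Only parent_id is taken from the log entry; "name" and "rank" are assumed.
plantnames = {
    1: {"parent_id": None, "rank": "family", "name": "Rosaceae"},
    2: {"parent_id": 1, "rank": "subfamily", "name": "Rosoideae"},
    3: {"parent_id": 2, "rank": "genus", "name": "Rosa"},
}


def taxon_path(rows, plantname_id):
    """Follow parent_id links up to the root and return names from root to leaf."""
    path = []
    seen = set()
    while plantname_id is not None:
        if plantname_id in seen:
            # Guard against malformed data that links back to itself.
            raise ValueError("cycle in parent_id links")
        seen.add(plantname_id)
        row = rows[plantname_id]
        path.append(row["name"])
        plantname_id = row["parent_id"]
    path.reverse()  # collected leaf-first, so flip to root-first order
    return path


rosa_path = taxon_path(plantnames, 3)
```

Because each row points only at its parent, a name at any intermediate rank can be inserted by adding one row and repointing one parent_id, without adding a column.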
schemas/vegbien.sql: plantname: Added verbatimrank to store ranks of custom taxonomic levels, such as rosids. Note that even if you specify a custom verbatimrank, you must also specify a closest-match rank from the taxonrank closed list. This ensures that every taxonomic name is placed in the correct relative order in the taxonomic hierarchy.
schemas/vegbien.sql: plantconcept: Made plantname_id optional because the datasource's plantconcepts do not need to be placed in the recursive plantname hierarchy
schemas/vegbien.sql: plantconcept: Added datasource_id and appropriate unique indexes to enable scoping by datasource. Moved plantcode right after datasource_id because it will be used for the sourceaccessioncode (if any).
schemas/vegbien.sql: Moved plantconcept.plantdescription to plantname and renamed it to description, so that a taxon of any rank can have a description
schemas/vegbien.sql: plantconcept: Added denormalized taxonomic ranks from <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/VegBIEN_taxonomic_schema#Primary> and concatenated scientific name fields
Removed no longer used ucase_first
Removed no longer used bin/union
Removed no longer used join_union_sort
Removed no longer used ci_map, because all relevant mapping scripts are now case-insensitive
mappings/Makefile: Inline $(review_) because it's only used once
mappings/Makefile: Removed no longer used $(review)
mappings/Makefile: Don't set $(SHELL) to /bin/bash because this is no longer needed
mappings/Makefile: Removed empty VegCSV section. mappings/Makefile's only functionality is now to clean up (sort) the core maps whenever they change and create human-readable maps from them.
mappings/Makefile: Removed no longer used self maps, because the new automapping mechanism does not use them
input.Makefile: Existing maps discovery: Substituted Veg+ for $(via) because it's now only used once
mappings/VegCore-VegBIEN.csv: Changed input column header from VegCore[Veg+] to VegCore because this is more accurate. This is possible now that we're using new automapping scripts that do not require a particular column header.
inputs/*/*/map.csv: Changed _merge to _join everywhere because _merge's (slower) duplicate elimination functionality is not needed (the combined columns do not both contain the same value, so they can simply be concatenated)
schemas/functions.sql: _label(): Accept params of any type, in order to support types other than text (which come from staging tables that are imported directly from a SQL export). This fixes a bug in SALVIAS.plotMetadata's column-based import.
schemas/functions.sql: _label(): Support NULL labels by not prepending a label
mappings/Veg+-VegCore.csv: Changed output column header from Veg+ to VegCore because this is more accurate. This is possible now that we're using new automapping scripts that do not require a particular column header. Note that this change now requires the map.csvs to use VegCore as their output column header, because otherwise the Veg+ header will get automapped to VegCore. (The header replacing is a feature to support changing the header when the schema of the column's terms changes.)
mappings/root.sh: Changed output column header from Veg+ to VegCore because this is more accurate following the initial automapping
inputs/*/*/map.csv: Changed output column header from Veg+ to VegCore because the names will be VegCore names after automapping. This is possible now that we're using new automapping scripts that do not require a particular column header.
inputs/import.stats.xls: Copied the Change factor formula to all rows (it displays an empty string for rows that don't have both a row-based and a column-based import)
README.TXT: Data import: Added steps to record the import times in inputs/import.stats.xls
inputs/import.stats.xls: Updated with stats from latest import
Added import_times
mappings/root.sh: Removed no longer needed $in_root_suffix
src_map: Upgraded to match new map format by adding Filter column
input.Makefile: $(viaMaps): Fixed bug: it must not be wrapped in $(wildcard), because that would prevent map.csv from being created when a new datasource or new subdir is added
input.Makefile: $(viaMaps): Removed extra addition of */map.csv, which is already included because all $(tables) have or will get a map.csv
mappings/: Removed no longer used derived file Veg+.vocab.csv
input.Makefile: Removed no longer used $(vocab)
input.Makefile: Maps validation: %/new_terms.csv: Filter out $(coreMap) and $(dict) successively instead of $(vocab), to avoid requiring intermediate mapping files not edited by the user
input.Makefile: Maps validation: $(newTerms): Don't hardcode the caller's first filter_out_ci by prerequisite position; instead allow them to specify the command (including the var name) themselves
input.Makefile: Maps validation: $(newTerms): For simplicity, subset the columns before running filter_out_ci
mappings/: Removed no longer used Veg+-VegBIEN.csv and derived autogen Veg+.self.csv
input.Makefile: Maps building: %/unmapped_terms.csv: Use $(coreMap) instead of $(vocab) because the terms should already be translated to VegCore terms, rather than still being Veg+
input.Makefile: Maps validation: $(newTerms): Fixed bug by removing the header before running filter_out_ci, because filter_out_ci only removes the header if it matches the vocabulary's header. Removing the header afterward can remove the first data row instead if filter_out_ci had already removed the header.
cols: Support CSVs without a header, such as intermediates that become unmapped_terms.csv, new_terms.csv
inputs/: Regenerated unmapped_terms.csv, new_terms.csv
input.Makefile: %/.map.csv.last_cleanup: Removed no longer used prerequisite $(vocab)
input.Makefile: %/.map.csv.last_cleanup: Canonicalize separately on $(coreMap) and $(dict), instead of requiring them to be combined in $(vocab)
input.Makefile: Use mappings/VegCore-VegBIEN.csv instead of mappings/Veg+-VegBIEN.csv as the core map, because the automapper now takes care of Veg+ -> VegCore translation
inputs/*/*/map.csv: Moved filter suffixes to separate filter column to enable automapping to work on those mappings' terms, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Move-filter-suffixes-to-separate-filter-column>. Note that the only changes to VegBIEN.csvs are the (now automapped) names of terms in "No join mapping" comments.
inputs/*/*/map.csv: Added Filter column to contain any suffix added after the term, so that the automapping mechanism does not have to deal with the filter expressions
Added cat_cols
Added ins_col
input.Makefile: Maps building: %/.map.csv.last_cleanup: Reference fixed prerequisites by name instead of by position in the prerequisites list
Removed no longer used intersect
inputs/*/*/map.csv: Removed no longer needed [Veg+] suffix in root, because the input column is no longer used by old-style map utilities such as union that needed this
translate: Translate the column header instead of passing it through, in order to properly support CSVs without a header and to support renaming the header when the column's contents change to a different schema or vocabulary
canon: Canonicalize the column header instead of passing it through, in order to properly support CSVs without a header
filter_out_ci: Filter header instead of passing it through, in order to properly support CSVs without a header, such as the unmapped_terms.csv and new_terms.csv files. For CSVs with a header, the header of the vocabulary should be removed before passing it to filter_out_ci.
autoremove: `svn rm`: Fixed bug by adding --force, which is needed in case the file had already been modified before being autoremoved
input.Makefile: Maps building: Removed no longer used $(createOnlyMaps)
input.Makefile: Maps building: Removed no longer used %/src.csv, because it is no longer needed to generate map.full.csv from map.csv
input.Makefile: Maps building: %/map.csv: If it doesn't exist, generate directly using $(mkSrcMap) instead of by copying %/src.csv, in order to eventually avoid the need to create a separate src.csv at all. Note that this avoids the need to run make twice when the table is first created to properly bootstrap all maps.
autoremove: Try `svn rm` first in case the file is in svn
input.Makefile: Maps building: Removed no longer used %/map.full.csv
input.Makefile: Maps building: %/VegBIEN.csv: Use %/map.csv directly because %/map.full.csv is now a copy of it
input.Makefile: Maps building: %/map.full.csv: Generate by copying map.csv, because the content of these files now differs only in the sort order of the names
inputs/*/*/map.csv: Changed empty mappings to self mappings, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Change-empty-mappings-to-self-mappings>. Note that in map.full.csv and VegBIEN.csv, lines that have changed are always the result of the input field's case being changed to match the case of the datasource's actual column name.
join: passthru mode: Fixed bug by setting the output field of the right-hand row to the output field of the left-hand row for empty join mappings, which maps.merge_mappings() needs in order to work properly
inputs/*/*/map.csv: Added back automapped mappings to map.csv, using the steps at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>
inputs/VegBank/taxonobservation_/map.csv: Updated with new renamings of colliding join columns
join: When a join mapping exists but is empty, still include any additional columns from that mapping in the combined row
inputs/SpeciesLink/Specimen/src.csv, inputs/XAL/Specimen/src.csv: Use input term as the initial Veg+ term, so the src.csv can be used with the Add back automapped mappings process at <https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Map_refactoring#Add-back-automapped-mappings-to-mapcsv>
inputs/XAL/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix
inputs/SpeciesLink/Specimen/src.csv, map.csv: Switched from using root prefixes to full column names, because the namespace mapping functionality can be handled much better by treating each namespace-qualified term as its own term rather than as a term and a prefix
inputs/SpeciesLink/Specimen/map.csv: Removed no longer needed duplicate entries for each first letter case, which cause duplicate output mappings now that join is case- and punctuation-insensitive. Note that the `svn diff` hides _alt entry 0, which contains one of the removed duplicate columns that appears in the diff.
inputs/SpeciesLink/Specimen/src.csv, inputs/XAL/Specimen/src.csv: Added Comments column for consistency with autogenerated src.csv format
join: Added new passthru mode which passes through terms with no input mapping or no join mapping
inputs/: Added [Veg+] to via map roots to indicate that the datasource and Veg+ vocabularies are combinable. This is possible now that automapped entries are no longer subtracted when this is in the map root, so there is no concern of losing comments on subtracted automapped rows. Note that this change turns on old-style automapping for these datasources, causing SALVIAS plotMetadata to acquire additional mappings.
canon, translate, filter_out_ci: Support vocabularies/dictionaries that have extra columns beyond the functional column(s) used by the program. These columns can contain comments, etc. This was not originally supported because Python 2's iterable unpacking only supports "an iterable with the same number of items as there are targets in the target list" (http://docs.python.org/reference/simple_stmts.html#assignment-statements). We now use numeric array indexes instead to get around this limitation, and for consistency with other map-manipulation scripts.
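A minimal sketch of the unpacking limitation mentioned in the entry above, and of the index-based workaround. The sample row and the two helper functions are hypothetical and are not taken from canon, translate, or filter_out_ci.

```python
# Hypothetical vocabulary row: two functional columns plus a comment column.
row = ["Genus", "genus", "free-text comment"]


def unpack_pair(r):
    """Fixed-length unpacking: only works when the row has exactly two items."""
    term, translation = r
    return term, translation


def index_pair(r):
    """Numeric indexing: ignores any columns after the first two."""
    return r[0], r[1]


try:
    unpack_pair(row)
    unpack_failed = False
except ValueError:
    # Three items cannot be unpacked into two targets.
    unpack_failed = True

pair = index_pair(row)
```

Indexing reads only the functional columns, so comment columns can be added to a vocabulary without breaking the scripts that consume it.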
Removed no longer used subtract (use filter_out_ci instead)
input.Makefile: Maps building: %/.map.csv.last_cleanup: Removed no longer needed subtraction of automapped entries, because information about unmapped and new terms is now available in unmapped_terms.csv and new_terms.csv
README.TXT: Data import: `make backups/download`: Removed '&' because running the command in the background prevents rsync from providing a continuously updating progress indication (because a backgrounded process's stdout is not a TTY)
mappings/VegCore-VegBIEN.csv: Removed no longer needed /_simplifyPath:[next=parent_id]/path expressions in specific paths because parent_id forwarding is now set globally for all paths in the map root
mappings/VegCore-VegBIEN.csv: Added /_simplifyPath:[next=parent_id]/path to root so the returned subplot location will be its parent location if there is no subplot name or ID (indicating that the particular plot did not have subplots). Note that this also causes the parent_id forwarding effect to occur for all other tables containing parent_id, which will help prevent similar issues with subplot events, etc. This will hopefully fix the SALVIAS.plotObservations bug where some organisms did not have a subplot number. In that case the subplot location became NULL, so the corresponding locationevent rows did not match the locationevent_unique_within_location index filter condition (which requires a parent_id). As a result, multiple output table pkeys were returned for those rows, violating the locationevent_pkeys temp table's primary key.
mappings/VegCore-VegBIEN.csv: namedplace elements: _simplifyPath() calls: Removed no longer needed `require` arg, and removed no longer needed table suffix from `next` arg