my2pg.data: Translate indefinite dates (dates with 0 as the month or day)
my2pg: Use my2pg.data to perform data-only replacements, instead of duplicating them in both my2pg and my2pg.data
my2pg: named UNIQUE KEYs: Comment out the name because PostgreSQL requires it to be globally unique, but MySQL only requires it to be unique within the table
my2pg: Translate UNIQUE KEYs instead of removing them
my2pg*: Removed KEYs: Comment out the definition rather than removing it
my2pg*: Remove FOREIGN KEYs because MySQL does not dump tables in dependency order, which prevents PostgreSQL from creating tables whose fkeys refer to a later table
my2pg*: Replacing invalid table elements to remove them: Use a dummy CHECK constraint instead of a boolean field to avoid adding fields to the table. The elements can't always simply be removed because sed can't remove the trailing comma of the previous element, and removing the following comma doesn't work for the last element in the table.
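The dummy-constraint replacement described above can be sketched in Python as follows. This is only an illustration: the actual my2pg scripts use sed, and the pattern, the `CHECK (true)` text, the trailing `--` comment, and the name `replace_key_element` are all assumptions rather than the script's real contents. The point is that the element is replaced rather than deleted, so the commas around it stay valid.

```python
import re

# A plain (non-UNIQUE, non-PRIMARY) KEY element on its own line,
# with an optional trailing comma. (Hypothetical pattern.)
KEY_ELEMENT = re.compile(r"^(\s*)(KEY .*?)(,?)$")

def replace_key_element(line):
    """Swap a KEY element for a dummy CHECK, keeping the original as a comment.

    The trailing comma (if any) is preserved, so the element list stays
    well-formed whether or not this was the last element in the table.
    """
    return KEY_ELEMENT.sub(r"\1CHECK (true)\3 -- \2", line)
```

Lines that are not plain KEY elements (for example PRIMARY KEY or UNIQUE KEY definitions) do not match the pattern and pass through unchanged.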
my2pg*: Replace '0000-00-00 00:00:00' with '-infinity'
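A minimal Python sketch of the zero-timestamp replacement above, assuming the dump is processed one line at a time. The real scripts use sed; the function name `translate_zero_timestamps` is hypothetical.

```python
import re

# MySQL's quoted "zero" timestamp literal, as it appears in a dump.
ZERO_TIMESTAMP = re.compile(r"'0000-00-00 00:00:00'")

def translate_zero_timestamps(line):
    """Replace MySQL zero timestamps with PostgreSQL's '-infinity' literal."""
    return ZERO_TIMESTAMP.sub("'-infinity'", line)
```

PostgreSQL accepts `'-infinity'` as a timestamp input value, which is why it can stand in for a date that MySQL stores as all zeros.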
my2pg: Replace datetime with timestamp
my2pg: Remove COLLATE field attribute
lib/MySQL.*.sql.make: Documented that $server user/host are for ssh, not the DB
lib/MySQL.*.sql.make: Documented that $server can also contain a username (which will be used by ssh)
my2pg_export: Use the --quick option to facilitate exporting large tables (it avoids retrieving all rows before outputting any of them)
README.TXT: Datasource setup: Added instructions for MS Access databases
README.TXT: Datasource setup: MySQL inputs: Added instruction to skip the "Add input data for each table" section
inputs/NY/: Added SQL export for refresh
mappings/VegCore.htm: Regenerated from wiki. Brad's new DwC ID terms spreadsheet has now been added, and a number of the ID terms have been clarified, disambiguated, and recategorized. In particular, institutionCode has now been split into custodialInstitutions and collectingInstitution, to distinguish the institution that holds the specimen from the one that stamped it. This distinction is important because the catalogNumber, stamped on the specimen, is only unique within the collectingInstitution. Most datasources don't unambiguously specify which institution their institutionCode refers to, so it has been assumed to be custodialInstitutions unless a data dictionary says otherwise (as is the case for UNCC). In addition, a MatchedTaxonDetermination table has been added with the *_matched fields from TNRS.
inputs/CVS/observation_/map.csv: baseSaturation: Resolved ambiguous term
mappings/Makefile: VegCore.vocab.csv: Ignore leading ? when sorting so that ambiguous terms sort alphabetically with other terms. This prevents terms from moving from their previous location when they become ambiguous.
Added sort_ci to sort a spreadsheet, ignoring leading punctuation
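The sorting behavior described above (ambiguous terms with a leading ? sorting alongside the other terms) might look roughly like this in Python. This is a sketch, not the sort_ci script itself: treating "ci" as case-insensitive and stripping all ASCII punctuation are assumptions, and `sort_key` and `sort_terms` are hypothetical names.

```python
import string

def sort_key(term):
    """Sort key that ignores leading punctuation and letter case."""
    return term.lstrip(string.punctuation).lower()

def sort_terms(terms):
    """Return the terms sorted as if leading punctuation were absent."""
    return sorted(terms, key=sort_key)
```

Because only the key ignores the punctuation, the terms themselves keep their ? prefix in the output, so a term stays in the same place when it becomes ambiguous.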
mappings/VegCore.vocab.csv: Changed line endings to \r\n in preparation for having a Python script run on it (which changes the line endings)
mappings/Makefile: VegCore.vocab.csv: Added back ambiguous terms, so that the vocabulary contains all terms defined by VegCore, regardless of whether they are ambiguous or unambiguous terms
mappings/Makefile: VegCore.vocab.csv: Added back synonyms, so that the vocabulary contains all terms defined by VegCore, regardless of whether they are synonyms or primary terms. This also prevents VegCore.vocab.csv from losing entries when terms are renamed, which made it difficult to verify that no terms were lost when refactoring.
inputs/MO/Specimen/postprocess.sql: Remove frameshifted rows by detecting InstitutionCodes without any letters
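The frameshift check above can be illustrated in Python. The actual filter lives in postprocess.sql; this sketch assumes rows are dicts with an `InstitutionCode` key, assumes that only ASCII letters count, and assumes empty codes are left alone. The function names are hypothetical.

```python
import re

HAS_LETTER = re.compile(r"[A-Za-z]")

def is_frameshifted(institution_code):
    """A non-empty InstitutionCode with no letters suggests shifted columns."""
    return bool(institution_code) and not HAS_LETTER.search(institution_code)

def drop_frameshifted(rows):
    """Keep only rows whose InstitutionCode looks like a real code."""
    return [row for row in rows if not is_frameshifted(row["InstitutionCode"])]
```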
inputs/ARIZ/Specimen/map.csv: CollectorNumber/FieldNumber: Use /_first to map these identical fields to the same location
inputs/ARIZ/Specimen/map.csv: Fixed bug where the column names for InstitutionCode and CollectionCode were reversed in the source data
inputs/*/Specimen/map.csv for Canadensys sources: Remapped institutionID to UNUSED
mappings/VegCore.htm: Regenerated from wiki. The original*, accepted*, and verbatim* Taxon fields have now been moved to separate OriginalTaxonDetermination, AcceptedTaxonDetermination, and TaxonVerbatim tables.
mappings/VegCore.htm: Regenerated from wiki
README.TXT: Maintenance: VegCore data dictionary: Replaced VegCore.*.csv with VegCore.htm because now that VegCore.*.csv are sorted alphabetically, they generally don't change when VegCore.htm changes
mappings/VegCore.*.csv: Regenerated from wiki. A plain text label is now used for Replace with, which fixes a bug where the PRIVATE permalink pointed to its Replace with in realLatitude instead of its definition.
redmine_synonyms: Support plain text labels other than Alternative, such as Replace with
mappings/VegCore.*.csv: Regenerated from wiki. Alternatives now contain the "Alternative" label as plain text rather than as an image title, thus avoiding an HTML anchor conflict with the definition and allowing ambiguous terms to be placed before their alternatives as well as after.
README.TXT: Maintenance: VegCore data dictionary: Updated VegCore.csv filename to VegCore.*.csv
redmine_synonyms: Support alternatives which contain the "Alternative" label as plain text rather than as an image title. This is done to include the "Alternative" label in the HTML anchor and thus prevent the anchor from conflicting with the actual definition of the alternative (which would otherwise have the same anchor text). This allows ambiguous terms to be placed before their alternatives as well as after, because there won't be anchor conflicts that need to be resolved with careful ordering.
mappings/VegCore.csv: Regenerated from wiki. Taxon terms with prefixes for other TaxonDeterminations now indicate the analogous term in an "analogous to" label next to the term
mappings/VegCore-VegBIEN.csv: datasourceRecordID: Fixed bug where also need to add datasourceRecordID next to occurrenceID for an institutionCode remap switch
inputs/bien_web/observation/test.xml.ref: Regenerated
inputs/import.stats.xls: Updated import times using the import_times bugfix for times longer than a day
import_times: times(): Fixed bug where need to match whitespace in times, in order to match times with days
inputs/*/Specimen/map.csv: Remapped ID to datasourceRecordID
mappings/VegCore-VegBIEN.csv: Mapped datasourceRecordID
inputs/import.stats.xls: Updated import times
inputs/FIA/_src/_README.TXT: Documented that the refresh is missing some PLT_CN values present in the original version
inputs/FIA/import_order.txt: Reverted to using FIA_COND_unique instead of COND_unique because the PLT_CN IDs in the refresh don't match the PLT_CN IDs in the original version, making COND_unique and Organism incompatible
inputs/FIA/import_order.txt: Removed FIA_COND_unique, which is superseded by COND_unique
inputs/FIA/import_order.txt: Fixed bug where need to import COND_unique before Organism because the plot entries need to be created before they can be linked to by organisms
redmine_synonyms: sed pattern: Match <h# directly at the beginning of the line rather than after ^.*, which greatly speeds up the pattern matching because the first character is a literal character. (If <h# were not located at the left margin, the ^.* would unfortunately still be needed because the beginning of the line needs to be matched in order to be removed by the replacement operation.)
mappings/VegCore.csv: Regenerated from wiki. Alternatives are now able to use h3 instead of h4 (which had display problems). realLatitude/Longitude no longer needs the ? prefix to have its replacement (PRIVATE) interpreted as an alternative, and thus is properly able to be included in the vocabulary.
mappings/Makefile: VegCore.vocab.csv: Use the term's type label instead of its header level to determine if it's a synonym or alternative. This allows header levels to be chosen for presentational reasons rather than being constrained by being parsable.
redmine_synonyms: Don't require ambiguous terms to start with ?, because the ambiguous term for an alternative can be identified simply by choosing the last term that didn't have a type label (previously, this would have been the last term that wasn't h3 or h4)
redmine_synonyms: Use the term's type label instead of its header level to determine if it's a synonym or alternative. This allows header levels to be chosen for presentational reasons rather than being constrained by being parsable.
mappings/VegCore.csv: Regenerated from wiki. The data dictionary has been reformatted to be much more vertically compact, by placing the term type (Synonym, Alternative, etc.) and sources (From:) on the same line as the term. Note that globalUniqueIdentifier_SpeciesLink has been removed from the vocabulary because a definition entry has been added for it (when this entry is missing, the term is incorrectly identified as a primary term).
mappings/Makefile, redmine_synonyms: Updated for new VegCore data dictionary format, which prefixes the term type (Synonym, Alternative, etc.) to the term instead of including it as a section label. This ensures that the term type of a non-primary term is shown next to the term when it is visited via a permalink, which causes the term header to appear at the top of the screen and obscures the section header containing the type.
mappings/Makefile: VegCore.thesaurus.csv: removal of tables: ignore errors if grep found no match
Renamed mappings/VegCore.csv to VegCore.vocab.csv and Veg+-VegCore.csv to VegCore.thesaurus.csv for clarity
mappings/Makefile, input.Makefile: Renamed $(dict) to $(thesaurus) because Veg+-VegCore.csv is actually a thesaurus, not a dictionary
mappings/Makefile: Replaced occurrences of VegCore.csv with $(vocab) and Veg+-VegCore.csv with $(dict)
README.TXT: Maintenance: VegCore data dictionary: When moving terms, check that no terms were lost: Updated steps now that VegCore.csv and Veg+-VegCore.csv are sorted by name, so that a comparison of added/deleted counts is not necessary and a simple `svn di` can be used
mappings/Makefile: Veg+-VegCore.csv: Sort terms by name so that reordering terms in the VegCore data dictionary does not cause Veg+-VegCore.csv to change. This makes it much easier to identify synonyms and ambiguous terms that were accidentally deleted during a data dictionary refactoring. (Note that these are no longer included in VegCore.csv, so this is required in addition to sorting VegCore.csv by name.)
mappings/Makefile: VegCore.csv: Sort terms by name so that reordering terms in the VegCore data dictionary does not cause VegCore.csv to change. This makes it much easier to identify terms that were accidentally deleted during a data dictionary refactoring.
mappings/VegCore.csv: Regenerated from wiki. This adds cf_aff.
mappings/Makefile: VegCore.csv: Filter out namespaces by matching only terms whose header links within the data dictionary
mappings/VegCore.csv: Regenerated from wiki. This causes TNRS's Annotations (cf/aff) to be mapped into VegBIEN.
mappings/VegCore-VegBIEN.csv: matched*Fit_fraction: Remapped to taxonconfidence instead of taxonfit
mappings/Makefile: VegCore.csv: Fixed bug where need to remove duplicates, which are no longer supported by canon, by removing alternatives of ambiguous terms when these occur separately from their definitions
mappings/Makefile: VegCore.csv: Removed synonyms and ambiguous terms, since the canonicalization of them is handled by Veg+-VegCore.csv. This also reduces the time it takes canon to build the in-memory Python dict of replacements, which scales to all inputs and should speed up the build/test command.
mappings/Makefile: VegCore.csv: Removed synonyms, since the canonicalization of them is handled by Veg+-VegCore.csv
mappings/Makefile: VegCore.csv: Match terms by header # instead of matching all anchors, in order to include the leading ? before an ambiguous term
mappings/Makefile: Veg+-VegCore.csv: Generate dynamically from VegCore.htm, which allows the VegCore thesaurus to be automatically kept up to date. More importantly, it allows terms in all map spreadsheets to be updated simultaneously when a term is renamed (e.g. by replacing a term with one of its synonyms).
mappings/VegX-VegCore.csv: Applied term renamings from the new dynamically generated Veg+-VegCore.csv. Updates to VegCore term names that have occurred since the data dictionary was created are now able to take effect, which involves remapping several fields.
mappings/VegCore-VegBIEN.csv, inputs/*/*/map.csv: Applied term renamings from the new dynamically generated Veg+-VegCore.csv, which reflects the current state of the data dictionary. (Permanently switching to the new Veg+-VegCore.csv will be a separate change.) Updates to VegCore term names that have occurred since the data dictionary was created are now able to take effect, which involves remapping and inferring units on several fields.
mappings/VegCore-VegBIEN.csv: Mapped basalDiameter_in
mappings/VegCore-VegBIEN.csv: Mapped diameterBreastHeightGentry_cm, basalDiameter_cm, precipitation_mm
schemas/vegbien.sql: Added _mm_to_m()
mappings/Makefile: Veg+-VegCore.csv: Fixed bugs where also need to filter out ambiguous tables, but shouldn't filter out acronyms (which are regular fields)
mappings/VegCore-VegBIEN.csv: locationID->location.sourceaccessioncode: Removed restriction that this mapping can't occur if geovalidation information is present. The locationID is no longer mapped to the place.sourceaccessioncode, so this filter is not necessary.
mappings/VegCore.csv: Regenerated from wiki
mappings/Makefile: Veg+-VegCore.csv: Fixed bug where need to filter out table names to avoid applying table replacements to fields which have the same name as a table
inputs/Madidi/map.csv: Fixed bug where needed to remove duplicate input names, now that translate doesn't allow them
mappings/Makefile: VegX-VegCore.csv: Sort by the input column instead of the output column to keep the sort order stable across VegCore term renames
mappings/Makefile: Veg+-VegCore.csv: Before running collapse_multimap, canonicalize alternatives of ambiguous terms using unambiguous mappings. This ensures that the alternatives lists contain only canonical VegCore terms rather than synonyms.
mappings/VegCore.csv: Regenerated from wiki. All synonyms are now hyperlinked, allowing them to be matched by redmine_synonyms.
mappings/Veg+-VegCore.csv: Removed Sources, Definition columns because source information is now in the VegCore data dictionary
mappings/VegCore.csv: Regenerated from wiki. Ambiguous terms newly available to redmine_synonyms due to the bugfix now have multiple alternatives.
redmine_synonyms: Ambiguous terms: Fixed bug where need to use header # instead of term name to determine whether a term is an alternative, because some alternatives (e.g. verbatimElevation) don't follow the units-suffix naming convention.
mappings/VegCore.csv: Regenerated from wiki. All ambiguous terms now have multiple alternatives, preventing them from being automapped to a single alternative without prompting the user for confirmation
mappings/Makefile: Veg+-VegCore.csv: translate: Fixed bug where need to run on $@ instead of $<
mappings/VegCore.csv: Regenerated from wiki. All mappings/Veg+-VegCore.csv terms are now added as synonyms or separate terms.
mappings/VegCore.csv: Regenerated from wiki. Most ambiguous terms are now split into alternatives, and most mappings/Veg+-VegCore.csv terms are now added as synonyms.
canon: Raise an error if two input terms map to the same simplified string
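The collision check above can be sketched as follows. The simplification rule used here (lowercase, then drop everything except letters and digits) is an assumption, as are the names `simplify` and `build_replacements`; canon's actual rule may differ.

```python
import re

def simplify(term):
    """Reduce a term to a comparison key (assumed rule: lowercase alphanumerics)."""
    return re.sub(r"[^a-z0-9]", "", term.lower())

def build_replacements(pairs):
    """Build a simplified-term -> replacement dict, rejecting collisions."""
    replacements = {}
    original = {}  # simplified key -> the input term that produced it
    for term, replacement in pairs:
        key = simplify(term)
        if key in original:
            raise ValueError(
                "%r and %r both simplify to %r" % (original[key], term, key))
        original[key] = term
        replacements[key] = replacement
    return replacements
```

Raising here, rather than letting the later entry silently overwrite the earlier one, makes duplicate input terms visible when the map is loaded.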
translate: Changed dictionary to thesaurus, since the map used actually has synonyms rather than definitions
mappings/Makefile: Veg+-VegCore.csv: Translate the thesaurus's output terms using itself in order to map a synonym of an ambiguous term directly to its alternatives list rather than only to the ambiguous term itself
mappings/Makefile: Veg+-VegCore.csv: Run collapse_multimap on the generated map so that all alternatives are included, rather than just the first alternative, when translate maps an ambiguous term
redmine_synonyms: Fixed bug where need to output a CSV rather than TSV to be usable by other programs that use map spreadsheets
Added collapse_multimap, which collapses multimap entries in a spreadsheet dictionary
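A rough Python sketch of what collapsing multimap entries could mean for a two-column map: rows that share an input term are merged into one row that lists every output. The ", " separator, the (input, output) tuple layout, and the example terms are assumptions for illustration, not the script's actual format.

```python
def collapse_multimap(rows, sep=", "):
    """Merge rows with the same input term into one row listing all outputs."""
    collapsed = {}  # dicts preserve insertion order (Python 3.7+)
    for term, output in rows:
        outputs = collapsed.setdefault(term, [])
        if output not in outputs:  # skip exact duplicate mappings
            outputs.append(output)
    return [(term, sep.join(outputs)) for term, outputs in collapsed.items()]
```

With the entries collapsed, a lookup of an ambiguous term yields all of its alternatives instead of only the first one encountered.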