2012-09-07 conference call

Schema changes

Flattening the taxonomic and placename hierarchies

  • The large number of joins required to flatten these hierarchies causes the server to run out of disk space when generating the analytical DB
  • A flattened table with a column for each rank would be easier to join, query, filter, and map to
  • We want to preserve the data provider's own names, but normalizing unscrubbed names into a "tree of life" is not useful
  • We propose to add the standard DwC and VegBank taxonomic columns to the taxondetermination table or a separate taxonverbatim table
    • The namedplace hierarchy will be flattened to denormalized columns in locationdetermination
  • The normalized "tree of life" hierarchy will be used just to store the scrubbed names
    • The scrubbed names may additionally be stored in taxondetermination
  • taxonrank will become an open picklist?
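As a rough illustration of the flattening proposed above, the sketch below walks a normalized parent-pointer hierarchy and emits one row with a column per rank. The `RANKS` list, the `TAXA` structure, and the `flatten` function are all hypothetical names chosen for this example, not the actual schema; the taxon names are borrowed from the sample record in these notes.

```python
# Hypothetical rank columns; the notes do not fix the exact list.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Hypothetical normalized hierarchy: id -> (parent_id, rank, name).
TAXA = {
    1: (None, "kingdom", "Algae"),
    2: (1, "phylum", "Charophyta"),
    3: (2, "class", "Charophyceae"),
    4: (3, "order", "Charales"),
    5: (4, "family", "Characeae"),
    6: (5, "genus", "Tolypella"),
    7: (6, "species", "comosa"),
}


def flatten(taxon_id, taxa=TAXA, ranks=RANKS):
    """Walk parent pointers from taxon_id to the root and return one row
    with a column per rank (None where the taxon has no ancestor at that rank)."""
    row = {rank: None for rank in ranks}
    seen = set()
    current = taxon_id
    while current is not None:
        if current in seen:
            # Guard against malformed data that would otherwise loop forever.
            raise ValueError(f"cycle in hierarchy at id {current}")
        seen.add(current)
        parent_id, rank, name = taxa[current]
        if rank in row:
            row[rank] = name
        current = parent_id
    return row


flat = flatten(7)
print(flat)
```

Once each record carries a row like this, filtering by family or genus is a plain column comparison rather than a chain of self-joins, which is the motivation given in the bullets above.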

Sample records showing data provider's higher classifications

Also shows data that we might want to filter out for the analysis

ScientificName | Kingdom | Phylum | Class | Order | Family | Genus | Species | Subspecies | ScientificNameAuthor
Tolypella comosa Allen | Algae | Charophyta | Charophyceae (01) | Charales | Characeae | Tolypella | comosa | | Allen
Tolypella intertexta Allen | Algae | Charophyta | Charophyceae (01) | Charales | Characeae | Tolypella | intertexta | | Allen

Query:

SELECT "ScientificName", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species", "Subspecies", "ScientificNameAuthor"
FROM "NY"."Specimen"
WHERE "Phylum" IS NOT NULL AND "Class" IS NOT NULL
LIMIT 2;

Validations

Workflow

*Brad and John's workflow documents*

  1. Geovalidation
    • Comes first, but both it and TNRS are needed
    • UTF-8/ASCII conversion causes 80-90% of names to match
    • How countries are partitioned: see GADM.org
  2. TNRS
    • Removes higher-order taxa and leaves just genus and below (using family for disambiguation)
  3. Higher classifications
  4. Flag cultivated records
  • Analytical DB depends on validations because it requires scrubbed names
    • Will shelve analytical DB until done with validations
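The notes do not say how the UTF-8/ASCII conversion mentioned under Geovalidation is performed. One common approach, shown here only as a sketch under that assumption, is to decompose accented characters with Unicode normalization and drop the combining marks before comparing names. The function names `to_ascii` and `names_match` are hypothetical.

```python
import unicodedata


def to_ascii(name):
    """Decompose accented characters (NFKD), then drop every non-ASCII
    code point, including the combining marks left by the decomposition."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def names_match(a, b):
    """Compare two names after ASCII folding, ignoring case."""
    return to_ascii(a).casefold() == to_ascii(b).casefold()


print(to_ascii("Córdoba"))
print(names_match("São Paulo", "sao paulo"))
```

Letters that have no Unicode decomposition, such as "ø", are dropped entirely by this approach, so a real pipeline would likely need an explicit transliteration table for them.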

To do

  • Finish loading for next week
  • Approve schema changes (above)
  • Validations

Availability

  • Bob will be on the call next week