2012-09-07 conference call¶
Schema changes¶
Flattening the taxonomic and placename hierarchies¶
- The large number of joins required to flatten these hierarchies causes the server to run out of disk space when generating the analytical DB
- A flattened table with a column for each rank would be easier to join, query, filter, and map to
- We want to preserve the data provider's own names, but normalizing unscrubbed names into a "tree of life" is not useful
- We propose to add the standard DwC and VegBank taxonomic columns to the taxondetermination table or a separate taxonverbatim table
- The namedplace hierarchy will be flattened to denormalized columns in locationdetermination
- The normalized "tree of life" hierarchy will be used just to store the scrubbed names
- The scrubbed names may additionally be stored in taxondetermination
- taxonrank will become an open picklist?
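As a rough illustration of the flattening proposed above, the sketch below walks a parent-pointer hierarchy once per record and emits a single row with one column per rank. The node layout, the `RANKS` list, and the `flatten` helper are hypothetical and are not taken from the actual schema; the names in the toy hierarchy come from the sample records in these notes.

```python
# Hypothetical rank columns, for illustration only (not the real schema).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def flatten(node_id, nodes):
    """Walk parent pointers from node_id up to the root and return one flat
    row with a column per rank (None where no node exists at that rank)."""
    row = {rank: None for rank in RANKS}
    seen = set()
    while node_id is not None:
        if node_id in seen:
            raise ValueError(f"cycle detected at node {node_id}")
        seen.add(node_id)
        parent_id, rank, name = nodes[node_id]
        if rank in row:
            row[rank] = name
        node_id = parent_id
    return row

# Toy hierarchy: id -> (parent_id, rank, name). Some ranks are deliberately
# missing to show that the flat row simply leaves those columns empty.
nodes = {
    1: (None, "kingdom", "Algae"),
    2: (1, "phylum", "Charophyta"),
    3: (2, "family", "Characeae"),
    4: (3, "genus", "Tolypella"),
    5: (4, "species", "comosa"),
}
flat = flatten(5, nodes)
```

Once the ranks are stored as columns like this, filtering or joining on a rank is a plain column comparison rather than a chain of self-joins, which is the motivation given above.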
Sample records showing data provider's higher classifications¶
Also shows data that we might want to filter out for the analysis
ScientificName | Kingdom | Phylum | Class | Order | Family | Genus | Species | Subspecies | ScientificNameAuthor |
Tolypella comosa Allen | Algae | Charophyta | Charophyceae (01) | Charales | Characeae | Tolypella | comosa | Allen | |
Tolypella intertexta Allen | Algae | Charophyta | Charophyceae (01) | Charales | Characeae | Tolypella | intertexta | Allen | |
query:
SELECT "ScientificName", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species", "Subspecies", "ScientificNameAuthor"
FROM "NY"."Specimen"
WHERE "Phylum" IS NOT NULL AND "Class" IS NOT NULL
LIMIT 2
Validations¶
Workflow¶
*Brad and John's workflow documents*
- Geovalidation
- Comes first, but both it and TNRS are needed
- UTF-8/ASCII conversion causes 80-90% of names to match
- how countries are partitioned: GADM.org
- TNRS
- removes higher-order taxa and leaves just genus and below (using family for disambiguation)
- Higher classifications
- Flag cultivated records
- Analytical DB depends on validations because it requires scrubbed names
- Will shelve analytical DB until done with validations
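The notes do not say how the UTF-8/ASCII conversion mentioned under Geovalidation is performed. The sketch below shows one common way to do it, assuming Unicode decomposition followed by dropping non-ASCII characters; `to_ascii` and `names_match` are hypothetical helper names, not part of the actual workflow.

```python
import unicodedata

def to_ascii(name):
    """Decompose accented characters, then drop anything without an ASCII form.
    Assumption: this is one possible conversion, not necessarily the one used."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def names_match(a, b):
    """Compare two names after ASCII folding, case folding, and whitespace cleanup."""
    def norm(s):
        return " ".join(to_ascii(s).casefold().split())
    return norm(a) == norm(b)

folded = to_ascii("Côte d'Ivoire")
same = names_match("  RÉUNION", "Reunion")
```

Letters that have no decomposition (for example "ø") are dropped entirely by this approach, so a real conversion step would probably need a transliteration table for such cases.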
To do¶
- Finish loading for next week
- Approve schema changes (above)
- Validations
Availability¶
- Bob will be on the call next week