Project

General

Profile

2013-03-06 conference call with Brad

Analytical DB

analytical_* views

  • add specimen fields to analytical_plot so that analytical_plot is a superset of analytical_specimen
  • derive analytical_specimen from analytical_plot? no, because this would not work with the sync_analytical_stem_to_view() function

TNRS

  • just include scientificName_verbatim, rather than taxonName_verbatim and scientificNameAuthorship_verbatim
    • when scientificName itself is not provided, form this by concatenating taxonName and scientificNameAuthorship
  • don't include parsed name, just matched and verbatim names
    • in parse-only mode, TNRS returns parsed names in the family, genus, etc. columns
      in match mode (the default), TNRS returns matched names in these columns
  • note that TNRS uses GNI, which parses name components before matching to a list of known names
    this can lead to parsing problems when, e.g., the capitalization is wrong or a family appears in the name twice
  • don't allow genera in the family field
  • don't send the family's authority to TNRS, because it can't parse it
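
A minimal sketch of the scientificName_verbatim fallback described above, assuming the input fields arrive as optional strings. The function name, the treatment of blank strings as NULL, and the choice to return nothing when only an authorship is present are illustrative assumptions, not decisions from the call.

```python
from typing import Optional


def _clean(value: Optional[str]) -> Optional[str]:
    """Strip whitespace and treat empty strings like NULL (assumption)."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def scientific_name_verbatim(
    scientific_name: Optional[str],
    taxon_name: Optional[str],
    authorship: Optional[str],
) -> Optional[str]:
    """Pick the single verbatim name string to send to TNRS."""
    provided = _clean(scientific_name)
    if provided is not None:
        # scientificName was supplied by the datasource: use it as-is.
        return provided
    taxon = _clean(taxon_name)
    if taxon is None:
        # Assumption: an authorship without a taxon name is not a usable name.
        return None
    author = _clean(authorship)
    # Otherwise concatenate taxonName and scientificNameAuthorship.
    return taxon if author is None else f"{taxon} {author}"


print(scientific_name_verbatim(None, "Quercus alba", "L."))
```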

cultivated

  • can only be true or NULL, not false
    • datasource-provided false values must be mapped to NULL (e.g. in ARIZ)
  • populate cultivatedBasis
    • use a composite type (a struct) to keep the reason together with the boolean value
    • unpack the struct into two columns in the analytical DB itself
    • when from a datasource-provided value, use "flagged by provider"
  • add cultivated_verbatim which stores the input string
  • parse cultivated from the locality desc in the analytical DB creation, not when the datasource is imported
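
The true-or-NULL rule and the struct holding the reason could be sketched as follows. This is only an illustration in Python, not the composite type itself: the names Cultivated, from_provider, and build_columns, the set of strings counted as true, and the non_null_means_true switch (for a datasource such as UNCC, where any non-NULL value is taken as cultivated) are all assumptions.

```python
from typing import NamedTuple, Optional

PROVIDER_BASIS = "flagged by provider"
# Assumption: which provider strings count as an explicit "true".
TRUE_STRINGS = {"1", "t", "true", "y", "yes"}


class Cultivated(NamedTuple):
    """Keeps the reason together with the boolean value."""
    value: Optional[bool]  # True or None, never False
    basis: Optional[str]


def from_provider(raw: Optional[str], non_null_means_true: bool = False) -> Cultivated:
    """Map a datasource-provided string to (True, basis) or (None, None)."""
    text = (raw or "").strip().lower()
    if not text:
        # NULL or blank input (blank treated as NULL by assumption).
        return Cultivated(None, None)
    if non_null_means_true or text in TRUE_STRINGS:
        return Cultivated(True, PROVIDER_BASIS)
    # Provider-supplied false values are mapped to NULL, not False.
    return Cultivated(None, None)


def build_columns(raw: Optional[str], non_null_means_true: bool = False) -> dict:
    """Unpack the struct into two columns and keep the input string alongside."""
    parsed = from_provider(raw, non_null_means_true)
    return {
        "cultivated": parsed.value,
        "cultivatedBasis": parsed.basis,
        "cultivated_verbatim": raw,
    }


print(build_columns("false"))
print(build_columns("yes"))
```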

phenology

  • add boolean flowers, fruit fields to VegCore
  • when boolean flowers, fruit values are provided, also append "flowers" and/or "fruit" to the phenology field
    • done for UNCC, which is the only datasource that provides flowers, fruit separated out
  • parse flowers, fruit from specimenDescription
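
A rough sketch of appending the boolean flags to the phenology field, as described above. The function name merge_phenology and the "; " separator are assumptions; the notes do not say how the terms should be joined.

```python
from typing import Optional


def merge_phenology(
    phenology: Optional[str],
    flowers: Optional[bool],
    fruit: Optional[bool],
    sep: str = "; ",  # assumption: separator between phenology terms
) -> Optional[str]:
    """Append "flowers" and/or "fruit" to phenology when the flags are true."""
    parts = []
    if phenology and phenology.strip():
        parts.append(phenology.strip())
    for flag, label in ((flowers, "flowers"), (fruit, "fruit")):
        # Only an explicit True adds the term; False and NULL add nothing.
        if flag is True:
            parts.append(label)
    # Return NULL (None) rather than an empty string when nothing is known.
    return sep.join(parts) or None


print(merge_phenology("leafing", True, None))
```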

Data validation

Madidi

  • need to send reexport with plot fields

UNCC

  • cultivated: for our purposes, anything non-NULL should be assumed to be cultivated
  • flowers, fruit
  • ignore leaves, roots

CTFS

  • Brad will validate

Data loading

Denormalize-first method

  • FullOccurrence will be the denormalized table
    or maybe FullOccurrenceRaw with the mapped data and FullOccurrenceScrubbed with the analytical info added?
    • not the same as the analytical DB, which will be a subset of the FullOccurrence (FullOccurrenceScrubbed) columns