2013-02-28 conference call

Data validation

ARIZ

  • add remaining fields to analytical DB and VegBIEN
  • HorizontalDatum is scrubbed CoordinateSystem
    • != OriginalCoordinateSystem (verbatim CoordinateSystem)

BRIT

  • Notes_Plant sometimes contains a footer naming the donor herbarium, which breaks cultivated flag parsing
  • if there is no Locality_Description, ignore Notes_Plant because it's likely malformed
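
The Locality_Description rule above could be applied as a small guard before any cultivated flag parsing. This is only a sketch: the function name and the dict-per-row representation are assumptions, and the donor herbarium footer is not stripped here because its format is not recorded in these notes.

```python
def notes_for_cultivated_parsing(row):
    """Return the Notes_Plant text that is safe to parse, or None.

    `row` is assumed to be a dict keyed by BRIT column name.
    """
    locality = (row.get("Locality_Description") or "").strip()
    if not locality:
        # No locality: the row is likely malformed, so ignore Notes_Plant.
        return None
    notes = (row.get("Notes_Plant") or "").strip()
    return notes or None
```

A row with an empty Locality_Description yields None, so downstream parsing never sees its notes.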

UNCC

  • Bob will validate this weekend (or Brad can also validate)

Madidi

  • Brad will validate

General

  • rows in the extracts should be resized so that embedded lines are visible

Data refresh

FIA

  • work on mapping

GBIF

  • send Brad the list of institution names that have animal specimens
  • whitelist IH herbaria
  • for non-IH herbaria, exclude those with < 80% plants
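
The GBIF filtering rules above can be written as a single predicate. The following is a minimal sketch, assuming per-institution specimen counts are already available; the function name, its parameters, and the handling of institutions with zero specimens are assumptions, not part of the agreed rules.

```python
def keep_institution(code, plant_count, total_count, ih_codes,
                     min_plant_fraction=0.8):
    """Return True if specimens from this institution should be kept."""
    if code in ih_codes:
        # IH (Index Herbariorum) herbaria are whitelisted unconditionally.
        return True
    if total_count == 0:
        # Assumption: with no specimens to judge by, exclude.
        return False
    # Non-IH herbaria are excluded when fewer than 80% of specimens are plants.
    return plant_count / total_count >= min_plant_fraction
```

For example, with `ih_codes = {"MO"}`, `keep_institution("MO", 10, 100, ih_codes)` is kept despite its low plant fraction, while a non-IH institution with 79 plants out of 100 is excluded.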

MO

  • refresh mapped
  • don't wait for refresh to do analytical DB and range modeling
  • we can just reimport it whenever it's available
  • Brad will validate

Schema changes

Strata

  • add stratumName (ID column), stratumDescription (as stratumName) to analytical DB and VegBIEN
  • unlike subplots, strata are overlapping
  • each TaxonOccurrence has exactly one stratum, which is between it and the LocationObservation
    • stratumName defaults to "plot" when no explicit stratum specified
  • strata are often nested within one another
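
The default described above (every TaxonOccurrence gets exactly one stratum, falling back to "plot") might look like this. The helper name is hypothetical, and treating blank strings as "no explicit stratum" is an assumption.

```python
DEFAULT_STRATUM = "plot"

def stratum_name(raw):
    """Return the stratumName for an occurrence, defaulting to "plot"."""
    name = (raw or "").strip()
    # Blank or missing values count as "no explicit stratum specified".
    return name or DEFAULT_STRATUM
```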

Nested plots

  • some subplots are nested within other subplots
    • e.g. in CVS
  • make each subplot's ID globally unique so that it is independent of the hierarchy of plots that encloses it
  • inherit plot attributes from the enclosing plot
  • analytical DB just needs outermost and innermost plots' IDs (plus the path that identifies the stratum)
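
One way to picture the nested plot handling above: with globally unique IDs and a parent pointer per plot, the outermost and innermost IDs and the inherited attributes fall out of a single walk up the hierarchy. This is a sketch only; the `plots` structure and the function name are assumptions, as is the choice to let an inner plot's own attributes override inherited ones.

```python
def resolve_plot(plot_id, plots):
    """Walk from a (sub)plot up to its outermost enclosing plot.

    `plots` maps a globally unique plot ID to
    {"parent": <enclosing plot ID or None>, "attrs": {...}}.
    Returns (outermost_id, innermost_id, merged_attrs).
    """
    chain = []
    current = plot_id
    while current is not None:
        chain.append(current)
        current = plots[current]["parent"]
    attrs = {}
    # Apply outermost first so that inner plots override inherited values.
    for pid in reversed(chain):
        attrs.update(plots[pid]["attrs"])
    return chain[-1], chain[0], attrs
```

Because each ID is globally unique, the lookup needs no knowledge of where in the hierarchy the starting subplot sits.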

identificationQualifier, etc.

  • make identificationQualifier a separate field from taxonFit
  • make match scores separate fields from matchedTaxonConfidence (taxonConfidence)
  • the identificationQualifier + taxonName is the QualifiedTaxonName
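
As a small illustration of the last point, the QualifiedTaxonName can be derived from the two separate fields. The function name is hypothetical, and the sketch uses plain concatenation in the order stated above; it does not attempt other placements of the qualifier within the name.

```python
def qualified_taxon_name(taxon_name, identification_qualifier=None):
    """Join identificationQualifier and taxonName, skipping blank parts."""
    parts = [identification_qualifier, taxon_name]
    return " ".join(p.strip() for p in parts if p and p.strip())
```

With no qualifier, the result is just the taxonName.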

occurrenceRemarks, etc.

  • add separate specimen description field

Analytical DB

  • start focusing on this next week

Cultivated flag

  • parse all notes fields for this
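
Scanning every notes field for the flag could be sketched as below. The keyword list is purely illustrative (the actual terms were not discussed on the call), and the function name and the dict-per-row representation are assumptions.

```python
import re

# Hypothetical keyword list; the real terms are not specified in these notes.
CULTIVATED_RE = re.compile(r"\b(cultivated|planted|garden)\b", re.IGNORECASE)

def is_cultivated(row, notes_fields):
    """Return True if any of the given notes fields mentions cultivation."""
    return any(CULTIVATED_RE.search(row.get(field) or "")
               for field in notes_fields)
```

The word boundaries keep "uncultivated" from matching "cultivated".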

Data loading

Denormalize-first method

  • easier to combine denormalized tables and then normalize, rather than normalizing first and then denormalizing
    • Brad said this approach is OK
    • we can start using this approach for GBIF
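
A toy version of the denormalize-first idea: rows from several sources are first concatenated in their flat form, and only then split into a locations table and an occurrences table. This is a sketch under stated assumptions; the column names, the surrogate `id` scheme, and the function name are all hypothetical.

```python
def normalize(rows, location_keys):
    """Split combined denormalized rows into (locations, occurrences)."""
    location_ids = {}  # tuple of location values -> surrogate ID
    occurrences = []
    for row in rows:
        key = tuple(row.get(k) for k in location_keys)
        # Reuse the ID if this location was already seen, else assign the next one.
        loc_id = location_ids.setdefault(key, len(location_ids) + 1)
        rest = {k: v for k, v in row.items() if k not in location_keys}
        occurrences.append({"location_id": loc_id, **rest})
    locations = [dict(zip(location_keys, key), id=loc_id)
                 for key, loc_id in location_ids.items()]
    return locations, occurrences

# Combine the flat tables first, then normalize once.
source_a = [{"plot": "P1", "country": "US", "taxon": "Quercus alba"}]
source_b = [{"plot": "P1", "country": "US", "taxon": "Acer rubrum"},
            {"plot": "P2", "country": "US", "taxon": "Pinus taeda"}]
locations, occurrences = normalize(source_a + source_b, ("plot", "country"))
```

Because deduplication happens after the sources are combined, a location shared by two sources gets a single ID without any cross-source matching step.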