2011 working group Fr BIEN Implementation

  • start out w/ VegBank, VegX, or DwC?
  • specimen data very uniform, so only need one dataset
  • park service db has plots data
  • sample of SALVIAS plots
  • take data, metadata and put into VegBank clone
  • get as close to the source as possible
  • work w/ Brad Boyle on loading SALVIAS data
  • do a MOBOT file
  • FIA is simple dataset
  • SALVIAS has individual level, observation level data
  • MBG, FIA extracts on nimoy
    • FIA organized by state (48 states)
  • materialized views vs. single tables: raw data
  • SALVIAS not on nimoy
  • start w/ MBG, FIA
  • load the actual data from the source, not data already modified to fit the staging data
  • how FIA, MOBOT were obtained:
    • MOBOT from Brian Enquist, who got it from Jay
  • for herbaria: need strict DwC
  • DwC mismatches: load to first adjustment of schema
  • NYBG: NY Botanical Garden
  • MBG: automated process (web service) exposes data
    • publicly available?
  • /home/bien_shared/raw_data/ny/: DwC
  • DIGIR servers provide data from the source
  • DwC archives
  • GBIF now uses CSV files to index things w/ metadata
  • does BIEN 2 deal with specimens? need to add fields
  • cultivated specimens in DwC? Ariz has them
  • really need specimen desc field in DwC
  • isCultivated is boolean (nullable?); has text desc field to explain reason
  • no schema spec, so handled differently by each institution
  • isCultivated is interpretation; may change in the future
  • should isCultivated be required?
  • FIA is problematic
    • need orig source of plot data
    • distribute CD of data
    • extract data from Access DB
  • compute aggregates
  • load FIA from the source
  • start w/ NYBG dataset
  • stress testing the model
  • primarily a learning exercise
  • see if we can get SALVIAS data
  • develop in pipelines and workflows
  • "press the button"-type of solution
  • map oddities of each db to VegX vs. directly to VegBank
  • don't focus entirely on single-push model
  • SALVIAS is static
  • complexity depends on amount of schema modifications
  • real plots in CSVs, but uniform and standardized
  • simple plot dataset as training tool
  • step 2: map SALVIAS
  • spreadsheet is CSV: one for aggregates and one for plot attrs
    • 3 representative CSVs
  • choose example datasets
  • VegBank deals w/ TurboVeg? no direct communication but could export as CSV -> import
  • Brad will post or e-mail current version of requirements doc
  • get actual plot data
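
The isCultivated points above (nullable boolean, text description for the reason, handled differently by each institution) could be sketched roughly as follows. This is a minimal sketch, not an agreed design: the accepted spellings, the `CultivatedStatus` type, the `parse_cultivated` function, and the `reason` field name are all hypothetical, and unknown values are kept as null rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical spellings; each institution may encode this differently.
_TRUE = {"1", "true", "t", "yes", "y", "cultivated"}
_FALSE = {"0", "false", "f", "no", "n", "wild"}


@dataclass
class CultivatedStatus:
    is_cultivated: Optional[bool]  # None = unknown / not stated
    reason: Optional[str]          # free-text explanation, if supplied


def parse_cultivated(raw: Optional[str],
                     reason: Optional[str] = None) -> CultivatedStatus:
    """Map an institution-specific value to a nullable boolean plus reason."""
    # Treat empty or whitespace-only reasons as missing.
    reason = (reason or "").strip() or None
    if raw is None:
        return CultivatedStatus(None, reason)
    value = raw.strip().lower()
    if value in _TRUE:
        return CultivatedStatus(True, reason)
    if value in _FALSE:
        return CultivatedStatus(False, reason)
    # Unrecognized or empty value: keep as unknown rather than guess.
    return CultivatedStatus(None, reason)
```

Keeping the value nullable leaves room for the interpretation to change later without rewriting records that never stated it.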