2011 working group Fr BIEN Implementation
- start out w/ VegBank, VegX, or DwC?
- specimen data very uniform, so only need one dataset
- park service db has plots data
- sample of SALVIAS plots
- take data, metadata and put into VegBank clone
- get as close to the source as possible
- work w/ Brad Boyle on loading SALVIAS data
- do a MOBOT file
- FIA is simple dataset
- SALVIAS has individual level, observation level data
- MBG, FIA extracts on nimoy
- FIA organized by state (48 states)
- materialized views vs. single tables: raw data
- SALVIAS not on nimoy
- start w/ MBG, FIA
- load the actual data from the source, not data already modified to fit the staging data
- how FIA, MOBOT were obtained:
- MOBOT from Brian Enquist, who got it from Jay
- for herbaria: need strict DwC
- DwC mismatches: load to first adjustment of schema
- NYBG: New York Botanical Garden
- MBG: automated process (web service) exposes data
- publicly available?
- /home/bien_shared/raw_data/ny/
- DwC: DiGIR servers provide data from the source
- DwC archives
- GBIF now uses CSV files to index things w/ metadata
- does BIEN 2 deal with specimens? need to add fields
- cultivated specimens in DwC? Ariz has them
- really need specimen desc field in DwC
- isCultivated is boolean (nullable?); has text desc field to explain reason
- no schema spec, so handled differently by each institution
- isCultivated is interpretation; may change in the future
- should isCultivated be required?
- FIA is problematic
- need orig source of plot data
- distribute CD of data
- extract data from Access DB
- compute aggregates
- load FIA from the source
- start w/ NYBG dataset
- stress testing the model
- primarily a learning exercise
- see if we can get SALVIAS data
- develop in pipelines and workflows
- "press the button"-type of solution
- map oddities of each db to VegX vs. directly to VegBank
- don't focus entirely on single-push model
- SALVIAS is static
- complexity depends on amount of schema modifications
- real plots in CSVs, but uniform and standardized
- simple plot dataset as training tool
- step 2: map SALVIAS
- spreadsheet is CSV: one for aggregates and one for plot attrs
- 3 representative CSVs
- choose example datasets
- VegBank deals w/ TurboVeg? no direct communication but could export as CSV -> import
- Brad will post or e-mail current version of requirements doc
- get actual plot data
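
A minimal sketch of two points from the notes above: mapping each source's column oddities onto one common set of fields, and treating isCultivated as a nullable boolean with a free-text reason. The source names, column names, the `cultivatedReason` field, and the accepted true/false spellings are all hypothetical, chosen only for illustration; they do not come from any actual BIEN, DwC, or VegX schema.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-source mappings: source column name -> common field name.
COLUMN_MAPS = {
    "source_a": {
        "Name": "scientificName",
        "Cultivated": "isCultivated",
        "CultNotes": "cultivatedReason",
    },
    "source_b": {
        "sci_name": "scientificName",
        "is_cult": "isCultivated",
    },
}

# Assumed spellings; each institution may encode the flag differently.
TRUE_VALUES = {"1", "true", "t", "yes", "y"}
FALSE_VALUES = {"0", "false", "f", "no", "n"}


def parse_nullable_bool(raw: Optional[str]) -> Optional[bool]:
    """Return True/False, or None when the source gives no usable value."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return None  # blank or unrecognized: keep as unknown rather than guess


@dataclass
class Specimen:
    scientificName: Optional[str] = None
    isCultivated: Optional[bool] = None
    cultivatedReason: Optional[str] = None


def load_specimens(csv_text: str, source: str) -> list:
    """Read raw CSV text from one source and map it onto the common fields."""
    mapping = COLUMN_MAPS[source]
    specimens = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rename source columns; columns the source lacks stay None.
        fields = {target: row.get(src_col) for src_col, target in mapping.items()}
        fields["isCultivated"] = parse_nullable_bool(fields.get("isCultivated"))
        reason = (fields.get("cultivatedReason") or "").strip()
        fields["cultivatedReason"] = reason or None
        specimens.append(Specimen(**fields))
    return specimens
```

Keeping the mapping in a per-source table means the raw file is loaded as delivered, and adding another source only requires a new mapping entry.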