2011 working group We BIEN tools
- UI to enter data
- diagram in iPlant proposal
- arrow 4-5: analytical views of data
- use iPlant's data discovery env?
- feedback to data providers
- need to decide on confed db: BIEN 2 or VegBank-based?
- BIEN 2 is analytical db
- optimize core db for transactions
- separate datastore from analytical cache
- key issue is confed: how lossless is confed schema?
- VegX addresses lossiness
- XML Schema difficult to work w/ b/c doesn't merge well with ER model
- still need to map into RDBMS
- complexity of VegX is problematic
- how to maintain integrity of original data?
- go back to raw data or VegX
- translate input dbs' schemas into VegX
- need for scientists to have more confed schema now
- get stuff into confed db
- make more robust
- staging db: SQL-to-SQL point
- need confed schema itself
- how to accommodate changes in VegX itself?
- go with VegBank, which has reverse-tweaks from VegX?
- use VegBank out of the box?
- maintain VegBank as service: official repo for nat'l veg classification
- technology underlying VegBank: Java/Spring, not standard LAMP stack
- GeoDjango? Python becoming more accessible, standard
- keep schema, reimplement underlying technology
- help sustain VegBank effort
- adjustments to VegBank model
- lossless mapping VegBank to BIEN 3
- VegBranch interface: generates VegBank XML, replace with VegX?
- need a few DwC fields
- new VegBank env: migrating out of Java framework
- maintain intended VegBank or build duplicate with updated technology?
- platform-oriented architecture
- VegBank has Postgres, but app framework is old
- migrating Java/Spring Postgres framework takes a long time
- Java not common in sci community
- LAMP stack more sustainable
- db will be Postgres
- Django
- 3 yrs to develop VegBank
- Java Play is newest Java framework
- confed resource for a large # of scientists to use
- get data together into queryable framework
- data in -> framework -> analyt dbs
- VegBank data not usually updated
- automated updating of taxonomy
- version the analytical database for each analysis
- confed database will be source record for some institutions: need editing capability
- revision record in VegBank for users to edit data
- Peter's db is record-by-record edits, homogeneous data
- dataset-level import from Missouri
- replacing an entire plot
- this confed db is not primary repo for any data
- provider still has own data
- don't have each herbarium have its master database as BIEN
- snapshot of BIEN referenced
- replicability
- analytical db not updated frequently, old versions get archived
- satisfy conditions to expose records
- most queries on geospatial, taxonomic: expose convenient axes in main db
- researchers want raw data down to finest cell
- detailed stem-level data spread across several tables
- analyzed, summarized data can't be constrained
- focus on confed db
- analytical db is denormalized export of db every 3-6 months
- analytical db could be done in R
- range models from db: pre-build in db?
- analytical db an instance or an abstraction?
- does analytical db have all the data?
- generalize for camera trapping community
- data entry tool works for variety of plots
- coverage complicates things
- stems
- not same as CTFS
- mortality: "dead codes" for trees: sprouts on tree after it dies, stump left, cut, missing, fallen, snapped
- VegBank vocab: trying to anticipate what people will use
- plants shorter than breast height: dbh?
- VegX attrs all over the map
- stem dies, but individual tree still alive
- get access to geoscrub db
- geoscrub scripts in bien_shared
- gazetteer solutions for geoscrubbing
- make sure lat/long falls in polygon
- but names in field misspelled, shapefile names not standardized
- unconverted UTF-8 and Latin-1
- currently does everything short of fuzzy matching
- 1 month to run scripting of polygons
- build in a utility
- name resolution scripts work fine
- w/ fuzzy matching, can recover a lot more data
- Yahoo has place names API web service
- doesn't resolve misspelled/garbled place names
- spreadsheets don't handle accents well
- if find web service, use that
- validation steps: redone or not?
- use cases
- trait use case from Brian
- phylo use case
- Brad will send use cases
- Yahoo! GeoPlanet
- BIEN traits on Plone site
- VegBank w/ modifications
- don't need to stick w/ VegBank platform
- versioning: to be decided
- don't allow users to do detailed edits of data
- total refresh of dataset
- data entry tool not in scope: nice to have
- deciding on VegBank, but modify to meet requirements
- don't need to keep technology behind VegBank
- add new fields to VegBank
- model built by copying VegX schema, adjusting to suit needs
- load legacy data into db, then build VegX-based loader
- but also load from DwC, VegBranch
- VegBank has own XML format
- transform VegX XML to VegBank XML, then import
- tool produces VegX file, then mapped into BIEN backend
- decouple tools from db
- dataset level refresh: people maintain data with own tools
- BIEN is data aggregator: puts together data sources and runs validation
- for expediency, identify major existing plot resources and push into VegX (modified for BIEN): "VegBIEN"
- raw plots data
- structure lost from raw plots
- modify VegBIEN to accommodate existing stack of plots to address use cases
- fix VegBIEN to import VegX documents
- modify VegBIEN schema to get things sci can use
- SALVIAS data not dynamic, so just import once
- other sources need dynamic script
- data entry tools: Steve developing one for CTFS
- loads raw data into CTFS
- data loader into CTFS a VegX creation tool?
- VegX
- first task is to get existing stack of raw data into VegBIEN (modification of VegBank), after creating VegBIEN
- make changes needed to get VegBIEN to work
- don't do VegX pipeline validation b/c data already in relational model
- importing data directly into existing VegBIEN db
- load data we already have, then start working on VegX loader
- NVS-BIEN-VegBank-SALVIAS-TurboVeg all communicate with one another
- different plot dbs -> one format: VegX
- Brad, Bob, Matt, IT, field people
- VegX meeting following year, smaller group
- BIEN is main VegX user
- initial wrappers for VegES(?)
- mapping to VegBank via VegX: Martin Kleikamp
- 2010 climate change meeting in Hamburg
- barrier at uptake end, researchers focused on own datasets
- individuals not interested in VegX because complicates spreadsheet
- concepts around measuring vegetation plot data
- VegX components
- plot has plot observation (specific in time and space)
- taxon concept: pub. taxonomic unit
- taxon name: pub. nomenclatural unit
- get figure of VegX relationships
- XML schema is work in progress, draft
- formalize, standards track? up to end users
- IAVS strongly in favor of VegX, will/have formally endorsed
- int'l vegetation scientists org (International Association for Vegetation Science)
- high-level VegX elements optional
- well structured plot data
- top level has range of high-level elements
- plot refers to plot obs
- indiv organisms, observations
- attributes, methods, protocols
- collection of records any time of plot
- attributes of most elements
- id: identifier
- plot name sometimes also unique identifier; stem unique names
- plotName vs plotUniqueIdentifier
- quadrats in terms of subplots
- subplot has reference to parent plot (this determines it's a subplot)
- relative to point of origin or corner of plot
- VegBank has 3 growthFormType fields: what about #4? need to normalize
- fully normalized vs can't anticipate -> flat, easy to use
- aggregated, indiv observations: aka taxon
- nested schema difficult to work with programmatically
- XML too difficult for average user
- can you flatten VegX so average user can work with it?
- averageValue.value -> averageValueValue
- a lot of VegX comes from EML
- attribute
- ordinal vs non-ordinal data
- units, precision, etc.
- transmit vocab for fields
- enumerated codes
- tag for every tree
- can't parse other business rules
- some attrs don't have constraints, so datatypes don't match VegBIEN
- uncontrolled value field
- qualitative attrs don't have constraints vocab
- specimen collected is the individual, or representative of all individuals in a plot?
- voucher is one obs linked to another obs, both with names
- name transferred by voucher?
- taxonRelationshipAssertion: determination/identification event
- mult for different opinions
- published name assoc with referenced taxonomy
- most herbaria don't have a name attached to a specimen; instead a determination table
- DwC uses just latest name
- reference taxonomic concept: TCS
- has GUID, associated names?
- published name of taxon
- structure of botanical names conveys info
- resolve names to TNRS, which includes all the info
- don't store year of pub of name
- stem tags labels: stemCode in VegBank
- relationship to whole individual
- mult records for tree, stem
- each stem in its own record?
- VegBank counting stems rather than plants
- mult stems with codes: same genetic individual?
- relatedItem: one-way relationship
- table relating stems to an individual
- NVS model
- multiple measurements per stem
- vouchering: fundamental piece of data needing to be supported
- stem: according to VegBank: breast height (4.5 ft/1.37 m above ground), splits below 0.5 m along stem
- across time, stem grows and might change from mult stems to branches
- plots on shrubs: same rules used
- shrubs vs trees
- VegBank uses % cover
- do all stems with same rules
- growth form classification separate from size
- trees, lianas
- how to fit peyote into group?
- what constitutes a stem?: need a rule
- cover, stratum, diameter count optional
- taxonObservation populated for any plant
- nameless plant: into taxonObservation w/o name link? need to write unknown or blank
- taxonInterpretation attached to taxonObs
- light blue tables: observation has census events for plots
- stems can have own taxonDetermination
- stems w/ coords go into stemLocation; stems w/ counts go into stemCount
- reference_ID for ?
- 3 stems w/ same morphospecies: enter 3x unknown, or unknown w/ count of 3?
- traits: diameter, height (for individuals)
- traits attached to indiv, taxon, plot
- TraitNet cares about all 3 levels of traits
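
Re the geoscrubbing note "make sure lat/long falls in polygon": a minimal sketch of that check, using a plain ray-casting test. This is not the bien_shared geoscrub code. The function name and the square test polygon are hypothetical, and coordinates are treated as planar, which is an assumption that only holds roughly for small regions. In a Postgres setup, PostGIS `ST_Contains` would be the more likely tool.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices (hypothetical helper)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical 10 x 10 degree square standing in for a country or province outline.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
inside_square = point_in_polygon(5, 5, square)
outside_square = point_in_polygon(15, 5, square)
```

Points exactly on an edge are not handled specially here; a real check would need a rule for those.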
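
Re "w/ fuzzy matching, can recover a lot more data" and the accent/encoding notes: a rough sketch of how misspelled or accented place names might be matched against a standardized list. It uses only Python's standard `difflib` and `unicodedata`. The gazetteer entries, the function names, and the 0.8 cutoff are all assumptions for illustration, not what the existing scripts do.

```python
import difflib
import unicodedata


def normalize(name):
    """Strip accents and case so 'Perú' and 'peru' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


def resolve_place(name, gazetteer, cutoff=0.8):
    """Return the best gazetteer entry for name, or None if nothing is close enough."""
    lookup = {normalize(entry): entry for entry in gazetteer}
    matches = difflib.get_close_matches(
        normalize(name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Hypothetical standardized names, e.g. taken from shapefile attributes.
gazetteer = ["Peru", "Bolivia", "Ecuador", "Colombia"]
accented = resolve_place("Perú", gazetteer)
misspelled = resolve_place("Bolvia", gazetteer)
garbled = resolve_place("Xyzzy", gazetteer)
```

A lower cutoff recovers more records but risks wrong matches, so anything resolved this way probably needs a flag for review.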
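
Re "can you flatten VegX so average user can work with it?" and `averageValue.value -> averageValueValue`: a small sketch of that renaming applied to a nested record. The record is represented here as a plain Python dict rather than parsed VegX XML, and `plotName`/`units` in the sample are illustrative field names only.

```python
def flatten(record, prefix=""):
    """Collapse nested dicts into one level, joining key names in camelCase."""
    flat = {}
    for key, value in record.items():
        # Nested keys get their first letter capitalized and appended to the
        # parent name: averageValue + value -> averageValueValue.
        name = prefix + key[:1].upper() + key[1:] if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested record; only averageValue.value comes from the notes.
nested = {"plotName": "P1", "averageValue": {"value": 12.5, "units": "cm"}}
flat = flatten(nested)
```

This gives one flat row per record, which fits a spreadsheet or a denormalized table, at the cost of losing repeated child elements (those would need a separate rule).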