2011 working group We BIEN tools

  • UI to enter data
  • diagram in iPlant proposal
    • arrow 4-5: analytical views of data
    • use iPlant's data discovery env?
    • feedback to data providers
  • need to decide on confed db: BIEN 2 or VegBank-based?
    • BIEN2 is analytical db
  • optimize core db for transactions
  • separate datastore from analytical cache
  • key issue is confed: how lossless is confed schema?
  • VegX addresses lossiness
  • XML Schema difficult to work w/ b/c doesn't merge well with ER model
    • still need to map into RDBMS
  • complexity of VegX is problematic
  • how to maintain integrity of original data?
  • go back to raw data or VegX
  • translate input dbs' schemas into VegX
  • need for scientists to have more confed schema now
  • get stuff into confed db
  • make more robust
  • staging db: SQL-to-SQL point
  • need confed schema itself
  • how to accommodate changes in VegX itself?
  • go with VegBank, which has reverse-tweaks from VegX?
  • use VegBank out of the box?
  • maintain VegBank as service: official repo for nat'l veg classification
  • technology underlying VegBank: Java/Spring, not standard LAMP stack
  • GeoDjango? Python becoming more accessible, standard
  • keep schema, reimplement underlying technology
  • help sustain VegBank effort
  • adjustments to VegBank model
  • lossless mapping VegBank to BIEN 3
  • VegBranch interface: generates VegBank XML, replace with VegX?
  • need a few DwC fields
  • new VegBank env: migrating out of Java framework
  • maintain intended VegBank or build duplicate with updated technology?
  • platform-oriented architecture
  • VegBank has Postgres, but app framework is old
  • migrating Java/Spring Postgres framework takes a long time
  • Java not common in sci community
  • LAMP stack more sustainable
  • db will be Postgres
  • Django
  • 3 yrs to develop VegBank
  • Java Play is newest Java framework
  • confed resource for a large # of scientists to use
  • get data together into queryable framework
  • data in -> framework -> analyt dbs
  • VegBank data not usually updated
  • automated updating of taxonomy
  • version the analytical database for each analysis
  • confed database will be source record for some institutions: need editing capability
  • revision record in VegBank for users to edit data
  • Peter's db is record-by-record edits, homogeneous data
  • dataset-level import from Missouri
  • replacing an entire plot
  • this confed db is not primary repo for any data
  • provider still has own data
  • don't have each herbarium have its master database as BIEN
  • snapshot of BIEN referenced
  • replicability
  • analytical db not updated frequently, old versions get archived
  • satisfy conditions to expose records
  • most queries on geospatial, taxonomic: expose convenient axes in main db
  • researchers want raw data down to finest cell
  • detailed stem-level data spread across several tables
  • analyzed, summarized data can't be constrained
  • focus on confed db
  • analytical db is denormalized export of db every 3-6 months
  • analytical db could be done in R
  • range models from db: pre-build in db?
  • analytical db an instance or an abstraction?
  • does analytical db have all the data?
  • generalize for camera trapping community
  • data entry tool works for variety of plots
    • coverage complicates things
  • stems
  • not same as CTFS
  • mortality: "dead codes" for trees: sprouts on tree after it dies, stump left, cut, missing, fallen, snapped
  • VegBank vocab: trying to anticipate what people will use
  • plants shorter than breast height: dbh?
  • VegX attrs all over the map
  • stem dies, but individual tree still alive
  • get access to geoscrub db
    • geoscrub scripts in bien_shared
  • gazetteer solutions for geoscrubbing
  • make sure lat/long falls in polygon
    • but names in field misspelled, shapefile names not standardized
  • unconverted UTF-8 and Latin-1
  • currently does everything short of fuzzy matching
  • 1 month to run scripting of polygons
  • build in a utility
  • name resolution scripts work fine
  • w/ fuzzy matching, can recover a lot more data
  • Yahoo has place names API web service
    • doesn't resolve misspelled/garbled place names
  • spreadsheets don't handle accents well
  • if find web service, use that
  • validation steps: redone or not?
  • use cases
    • trait use case from Brian
    • phylo use case
  • Brad will send use cases
  • Yahoo! GeoPlanet
  • BIEN traits on Plone site
  • VegBank w/ modifications
  • don't need to stick w/ VegBank platform
  • versioning: to be decided
  • don't allow users to do detailed edits of data
  • total refresh of dataset
  • data entry tool not in scope: nice to have
  • deciding on VegBank, but modify to meet requirements
    • don't need to keep technology behind VegBank
  • add new fields to VegBank
  • model built by copying VegX schema, adjusting to suit needs
  • load legacy data into db, then build VegX-based loader
  • but also load from DwC, VegBranch
  • VegBank has own XML format
  • transform VegX XML to VegBank XML, then import
  • tool produces VegX file, then mapped into BIEN backend
  • decouple tools from db
  • dataset level refresh: people maintain data with own tools
  • BIEN is data aggregator: puts together data sources and runs validation
  • for expediency, identify major existing plot resources and push into VegX (modified for BIEN): "VegBIEN"
  • raw plots data
  • structure lost from raw plots
  • modify VegBIEN to accommodate existing stack of plots to address use cases
  • fix VegBIEN to import VegX documents
  • modify VegBIEN schema to get things sci can use
  • SALVIAS data not dynamic, so just import once
  • other sources need dynamic script
  • data entry tools: Steve developing one for CTFS
    • loads raw data into CTFS
  • data loader into CTFS a VegX creation tool?
  • VegX
  • first task is to get existing stack of raw data into VegBIEN (modification of VegBank), after creating VegBIEN
  • make changes needed to get VegBIEN to work
  • don't do VegX pipeline validation b/c data already in relational model
  • importing data directly into existing VegBIEN db
  • load data we already have, then start working on VegX loader
  • NVS-BIEN-VegBank-SALVIAS-TurboVeg all communicate with one another
  • different plot dbs -> one format: VegX
    • Brad, Bob, Matt, IT, field people
  • VegX meeting following year, smaller group
  • BIEN is main VegX user
  • initial wrappers for VegES(?)
  • mapping to VegBank via VegX: Martin Kleikamp
  • 2010 climate change meeting in Hamburg
  • barrier at uptake end, researchers focused on own datasets
  • individuals not interested in VegX because complicates spreadsheet
  • concepts around measuring vegetation plot data
  • VegX components
    • plot has plot observation (specific in time and space)
  • taxon concept: pub. taxonomic unit
  • taxon name: pub. nomenclatural unit
  • get figure of VegX relationships
  • XML schema is work in progress, draft
  • formalize, standards track? up to end users
  • IAVS strongly in favor of VegX, will/have formally endorsed
    • International Association for Vegetation Science
  • high-level VegX elements optional
  • well structured plot data
  • top level has range of high-level elements
  • plot refers to plot obs
  • indiv organisms, observations
  • attributes, methods, protocols
  • collection of records for any type of plot
  • attributes of most elements
    • id: identifier
  • plot name sometimes also unique identifier; stem unique names
  • plotName vs plotUniqueIdentifier
  • quadrats in terms of subplots
  • subplot has reference to parent plot (this determines it's a subplot)
    • relative to point of origin or corner of plot
  • VegBank has 3 growthFormType fields: what about #4? need to normalize
  • fully normalized vs can't anticipate -> flat, easy to use
  • aggregated, indiv observations: aka taxon
  • nested schema difficult to work with programmatically
    • XML too difficult for average user
  • can you flatten VegX so average user can work with it?
    • averageValue.value -> averageValueValue
  • a lot of VegX comes from EML
  • attribute
    • ordinal vs non-ordinal data
    • units, precision, etc.
  • transmit vocab for fields
  • enumerated codes
  • tag for every tree
  • can't parse other business rules
  • some attrs don't have constraints, so datatypes don't match VegBIEN
  • uncontrolled value field
  • qualitative attrs don't have constraints vocab
  • specimen collected is the individual, or representative of all individuals in a plot?
  • voucher is one obs linked to another obs, both with names
    • name transferred by voucher?
  • taxonRelationshipAssertion: determination/identification event
    • mult for different opinions
  • published name assoc with referenced taxonomy
  • most herbaria don't have a name attached to a specimen; instead a determination table
    • DwC uses just latest name
  • reference taxonomic concept: TCS
    • has GUID, associated names?
  • published name of taxon
  • structure of botanical names conveys info
  • resolve names to TNRS, which includes all the info
  • don't store year of pub of name
  • stem tags labels: stemCode in VegBank
    • relationship to whole individual
  • mult records for tree, stem
  • each stem in its own record?
  • VegBank counting stems rather than plants
  • mult stems with codes: same genetic individual?
  • relatedItem: one-way relationship
  • table relating stems to an individual
  • NVS model
  • multiple measurements per stem
  • vouchering: fundamental piece of data needing to be supported
  • stem: according to VegBank: breast height (4.5 ft/1.37 m above ground), splits below 0.5 m along stem
  • across time, stem grows and might change from mult stems to branches
  • plots on shrubs: same rules used
  • shrubs vs trees
  • VegBank uses % cover
  • do all stems with same rules
  • growth form classification separate from size
    • trees, lianas
  • how to fit peyote into group?
  • what constitutes a stem?: need a rule
  • cover, stratum, diameter count optional
  • taxonObservation populated for any plant
  • nameless plant: into taxonObservation w/o name link? need to write unknown or blank
  • taxonInterpretation attached to taxonObs
  • light blue tables: observation has census events for plots
  • stems can have own taxonDetermination
  • stems w/ coords go into stemLocation; stems w/ counts go into stemCount
  • reference_ID for ?
  • 3 stems w/ same morphospecies: enter 3x unknown, or unknown w/ count of 3?
  • traits: diameter, height (for individuals)
  • traits attached to indiv, taxon, plot
  • TraitNet cares about all 3 levels of traits
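
Sketch for the flattening question raised above (averageValue.value -> averageValueValue): one possible way to collapse a nested VegX-like record into flat, camelCase column names that an average user could load into a spreadsheet or a single table. This is only an illustration, not part of VegX or VegBIEN. The function name flatten_record and every field other than averageValue.value are hypothetical, and repeated elements (lists) are passed through unchanged, which a real loader would have to handle.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts into one level, joining keys in camelCase.

    Hypothetical helper: {"averageValue": {"value": 1}} becomes
    {"averageValueValue": 1}. Non-dict values (including lists) are
    copied as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Capitalize the first letter of a child key before appending it
        # to the parent name; top-level keys are kept unchanged.
        name = (prefix + key[:1].upper() + key[1:]) if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    # Made-up example record; only averageValue.value comes from the notes.
    nested = {
        "plotName": "P1",
        "averageValue": {"value": 12.5, "units": "cm"},
    }
    print(flatten_record(nested))
```

The trade-off is the one noted above (fully normalized vs flat, easy to use): the flat form is simpler to query, but the nesting can only be recovered if the key names stay unambiguous.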