2011 working group We BIEN tools

UI to enter data
diagram in iPlant proposal
- arrow 4-5: analytical views of data
- use iPlant's data discovery env?
- feedback to data providers
need to decide on confed db: BIEN 2 or VegBank-based?
- BIEN 2 is an analytical db
optimize core db for transactions
separate datastore from analytical cache
key issue is confed: how lossless is confed schema?
VegX addresses lossiness
XML Schema difficult to work w/ b/c it doesn't map well onto ER (entity-relationship) models
- still need to map into RDBMS
complexity of VegX is problematic
how to maintain integrity of original data?
go back to raw data or VegX
translate input dbs' schemas into VegX
need for scientists to have more confed schema now
get stuff into confed db
make more robust
staging db: SQL-to-SQL point
need confed schema itself
how to accommodate changes in VegX itself?
go with VegBank, which has reverse-tweaks from VegX?
use VegBank out of the box?
maintain VegBank as service: official repo for nat'l veg classification
technology underlying VegBank: Java/Spring, not standard LAMP stack
GeoDjango? Python becoming more accessible, standard
keep schema, reimplement underlying technology
help sustain VegBank effort
adjustments to VegBank model
lossless mapping VegBank to BIEN 3
VegBranch interface: generates VegBank XML, replace with VegX?
need a few DwC fields
new VegBank env: migrating out of Java framework
maintain intended VegBank or build duplicate with updated technology?
platform-oriented architecture
VegBank has Postgres, but app framework is old
migrating Java/Spring Postgres framework takes a long time
Java not common in sci community
LAMP stack more sustainable
db will be Postgres
Django
3 yrs to develop VegBank
Java Play is newest Java framework
confed resource for a large # of scientists to use
get data together into queryable framework
data in -> framework -> analyt dbs
VegBank data not usually updated
automated updating of taxonomy
version the analytical database for each analysis
confed database will be source record for some institutions: need editing capability
revision record in VegBank for users to edit data
Peter's db is record-by-record edits, homogeneous data
dataset-level import from Missouri
replacing an entire plot
this confed db is not primary repo for any data
provider still has own data
don't have each herbarium use BIEN as its master database
snapshot of BIEN referenced
replicability
analytical db not updated frequently; old versions get archived
satisfy conditions to expose records
most queries on geospatial, taxonomic: expose convenient axes in main db
researchers want raw data down to finest cell
detailed stem-level data spread across several tables
analyzed, summarized data can't be constrained
focus on confed db
analytical db is denormalized export of db every 3-6 months
analytical db could be done in R
range models from db: pre-build in db?
analytical db an instance or an abstraction?
does analytical db have all the data?
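One way to read "denormalized export every 3-6 months" together with "version the analytical database" is to materialize a flat, versioned table from the normalized core tables at each release. The sketch below assumes a hypothetical two-table schema (`plot`, `taxon_observation`) and a hypothetical naming scheme `analytical_<version>`. It uses sqlite3 as a stand-in for the Postgres core db and is not the project's actual export.

```python
import re
import sqlite3


def build_analytical_snapshot(conn: sqlite3.Connection, version: str) -> str:
    """Materialize a flat, versioned copy of the normalized plot data.

    Hypothetical schema: plot(plot_id, latitude, longitude) and
    taxon_observation(plot_id, taxon_name, stem_count).
    """
    # The version label is spliced into a table name, so only allow safe characters.
    if not re.fullmatch(r"[A-Za-z0-9_]+", version):
        raise ValueError(f"unsafe version label: {version!r}")
    table = f"analytical_{version}"
    # One wide, read-only table per release; older tables stay for replicability.
    conn.execute(
        f"""
        CREATE TABLE {table} AS
        SELECT p.plot_id, p.latitude, p.longitude, o.taxon_name, o.stem_count
        FROM plot AS p
        JOIN taxon_observation AS o ON o.plot_id = p.plot_id
        """
    )
    conn.commit()
    return table
```

Keeping each snapshot as its own table means an analysis can cite the exact version it ran against, while the core db stays normalized for transactions.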

generalize for camera trapping community
data entry tool works for variety of plots
- coverage complicates things
stems
not same as CTFS
mortality: "dead codes" for trees: sprouts on tree after it dies, stump left, cut, missing, fallen, snapped
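The mortality ("dead") codes listed above work best as a small controlled vocabulary, so that field-entered values can be validated on import. This is a minimal sketch. The enum name and the string values are hypothetical and are not VegBank's actual code list.

```python
from enum import Enum


class DeadCode(Enum):
    """Mortality codes from the notes; string values are hypothetical."""
    SPROUTS = "sprouts"  # tree has died but is resprouting
    STUMP = "stump"      # only a stump is left
    CUT = "cut"
    MISSING = "missing"
    FALLEN = "fallen"
    SNAPPED = "snapped"


def parse_dead_code(raw: str) -> DeadCode:
    """Normalize a field-entered code; raises ValueError for codes not in the vocabulary."""
    return DeadCode(raw.strip().lower())
```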
VegBank vocab: trying to anticipate what people will use
plants shorter than breast height: dbh?
VegX attrs all over the map
stem dies, but individual tree still alive
get access to geoscrub db
- geoscrub scripts in bien_shared
gazetteer solutions for geoscrubbing
make sure lat/long falls in polygon
- but names in field misspelled, shapefile names not standardized
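The "lat/long falls in polygon" check is a point-in-polygon test. This sketch uses the standard ray-casting (even-odd) rule and treats coordinates as planar (lon, lat) pairs, so it ignores the antimeridian and polygons with holes. In a Postgres/PostGIS backend the same check would normally be done with `ST_Contains`. The function name is illustrative.

```python
def point_in_polygon(lon: float, lat: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: count how many polygon edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # The edge straddles the point's latitude.
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing toggles inside/outside
    return inside
```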
text in a mix of unconverted UTF-8 and Latin-1 encodings
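A common workaround for fields that mix UTF-8 and Latin-1 is to try a strict UTF-8 decode first and fall back to Latin-1. This is a sketch of that heuristic, not the geoscrub scripts' actual logic. Latin-1 decoding never fails, so the fallback can silently mis-decode other single-byte encodings such as Windows-1252.

```python
def decode_field(raw: bytes) -> str:
    """Decode a text field of unknown encoding: UTF-8 first, Latin-1 as a fallback."""
    try:
        # Valid UTF-8 multi-byte sequences are rarely produced by accident.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this always succeeds.
        return raw.decode("latin-1")
```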
currently does everything short of fuzzy matching
polygon-matching scripts take 1 month to run
build in a utility
name resolution scripts work fine
w/ fuzzy matching, can recover a lot more data
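Fuzzy matching of misspelled place names against a gazetteer can be prototyped with the standard library's `difflib.get_close_matches`. This is a minimal sketch. The gazetteer list and the 0.8 cutoff are assumptions, and a production geoscrub step would more likely use a dedicated gazetteer service or a trigram index.

```python
import difflib
from typing import Optional


def resolve_place_name(name: str, gazetteer: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest gazetteer entry to a garbled field name, or None if nothing is close."""
    # Compare case-insensitively, but return the gazetteer's own spelling.
    by_key = {g.casefold(): g for g in gazetteer}
    matches = difflib.get_close_matches(name.strip().casefold(), list(by_key), n=1, cutoff=cutoff)
    return by_key[matches[0]] if matches else None
```

Raising the cutoff trades recovered records for fewer false matches, so matched names should still be flagged for review.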
Yahoo has place names API web service
- doesn't resolve misspelled/garbled place names
spreadsheets don't handle accents well
if find web service, use that
validation steps: redone or not?
use cases
- trait use case from Brian
- phylo use case
Brad will send use cases
Yahoo! GeoPlanet
BIEN traits on Plone site
VegBank w/ modifications
don't need to stick w/ VegBank platform
versioning: to be decided
don't allow users to do detailed edits of data
total refresh of dataset
data entry tool not in scope: nice to have
deciding on VegBank, but modify to meet requirements
- don't need to keep technology behind VegBank
add new fields to VegBank
model built by copying VegX schema, adjusting to suit needs
load legacy data into db, then build VegX-based loader
but also load from DwC, VegBranch
VegBank has own XML format
transform VegX XML to VegBank XML, then import
tool produces VegX file, then mapped into BIEN backend
decouple tools from db
dataset level refresh: people maintain data with own tools
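A dataset-level refresh (as opposed to record-by-record edits) amounts to deleting every row from one source and reloading it inside a single transaction, so a failed load leaves the previous version intact. This sketch assumes a hypothetical `observation(dataset_id, plot_code, taxon_name)` table and uses sqlite3 in place of Postgres.

```python
import sqlite3


def refresh_dataset(conn: sqlite3.Connection, dataset_id: str, rows: list[tuple[str, str]]) -> None:
    """Replace every record of one source dataset atomically."""
    with conn:  # commits on success, rolls back if anything below raises
        conn.execute("DELETE FROM observation WHERE dataset_id = ?", (dataset_id,))
        conn.executemany(
            "INSERT INTO observation (dataset_id, plot_code, taxon_name) VALUES (?, ?, ?)",
            [(dataset_id, plot_code, taxon) for plot_code, taxon in rows],
        )
```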
BIEN is data aggregator: puts together data sources and runs validation
for expediency, identify major existing plot resources and push into VegX (modified for BIEN): "VegBIEN"
raw plots data
structure lost from raw plots
modify VegBIEN to accommodate existing stack of plots to address use cases
fix VegBIEN to import VegX documents
modify VegBIEN schema to get things sci can use
SALVIAS data not dynamic, so just import once
other sources need dynamic script
data entry tools: Steve developing one for CTFS
- loads raw data into CTFS
data loader into CTFS a VegX creation tool?
VegX
first task is to get existing stack of raw data into VegBIEN (modification of VegBank), after creating VegBIEN
make changes needed to get VegBIEN to work
don't do VegX pipeline validation b/c data already in relational model
importing data directly into existing VegBIEN db
load data we already have, then start working on VegX loader
NVS-BIEN-VegBank-SALVIAS-TurboVeg all communicate with one another
different plot dbs -> one format: VegX
- Brad, Bob, Matt, IT, field people
VegX meeting following year, smaller group
BIEN is main VegX user
initial wrappers for VegES(?)
mapping to VegBank via VegX: Martin Kleikampf
2010 climate change meeting in Hamburg
barrier at uptake end, researchers focused on own datasets
individuals not interested in VegX because it complicates their spreadsheets
concepts around measuring vegetation plot data
VegX components
- plot has plot observation (specific in time and space)
taxon concept: pub. taxonomic unit
taxon name: pub. nomenclatural unit
get figure of VegX relationships
XML schema is work in progress, draft
formalize, standards track? up to end users
IAVS strongly in favor of VegX, will/have formally endorsed
- int'l vegetation scientists org
high-level VegX elements optional
well structured plot data
top level has range of high-level elements
plot refers to plot obs
indiv organisms, observations
attributes, methods, protocols
collection of records any time of plot
attributes of most elements
- id: identifier
plot name sometimes also unique identifier; stem unique names
plotName vs plotUniqueIdentifier
quadrats in terms of subplots
subplot has reference to parent plot (this determines it's a subplot)
- relative to point of origin or corner of plot
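The subplot rule above (a plot is a subplot because it references a parent, positioned relative to the parent's origin corner) can be sketched as a self-referencing record. The field names are hypothetical and are not VegX's element names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plot:
    plot_id: str
    parent_id: Optional[str] = None  # set only for subplots
    offset_x_m: float = 0.0          # position relative to the parent's origin corner
    offset_y_m: float = 0.0

    @property
    def is_subplot(self) -> bool:
        # Having a parent reference is what makes a plot a subplot.
        return self.parent_id is not None


def absolute_offset(plot: Plot, plots: dict[str, Plot]) -> tuple[float, float]:
    """Sum offsets up the parent chain to get a position relative to the top-level plot."""
    x, y = plot.offset_x_m, plot.offset_y_m
    while plot.parent_id is not None:
        plot = plots[plot.parent_id]
        x += plot.offset_x_m
        y += plot.offset_y_m
    return x, y
```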
VegBank has 3 growthFormType fields: what about #4? need to normalize
fully normalized vs can't anticipate -> flat, easy to use
aggregated, indiv observations: aka taxon
nested schema difficult to work with programmatically
- XML too difficult for average user
can you flatten VegX so average user can work with it?
- averageValue.value -> averageValueValue
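The flattening idea (`averageValue.value` becomes `averageValueValue`) can be sketched as a recursive walk that joins nested keys in camelCase. This assumes the VegX element has already been parsed into nested dicts. Repeated elements (lists) are not handled here and would need an index or a separate table.

```python
def flatten(element: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into camelCase keys, e.g. averageValue.value -> averageValueValue."""
    flat = {}
    for key, value in element.items():
        # Capitalize the child key and append it to the parent's name.
        name = prefix + key[0].upper() + key[1:] if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat
```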
a lot of VegX comes from EML
attribute
- ordinal vs non-ordinal data
- units, precision, etc.
transmit vocab for fields
enumerated codes
tag for every tree
can't parse other business rules
some attrs don't have constraints, so datatypes don't match VegBIEN
uncontrolled value field
qualitative attrs don't have constraints vocab
specimen collected is the individual, or representative of all individuals in a plot?
voucher is one obs linked to another obs, both with names
- name transferred by voucher?
taxonRelationshipAssertion: determination/identification event
- mult for different opinions
published name assoc with referenced taxonomy
most herbaria don't have a name attached to a specimen; instead a determination table
- DwC uses just latest name
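A determination table keeps every identification event, and a DwC-style export reduces that history to the latest name. This is a minimal sketch with hypothetical field names. It assumes determinations are dated and that the most recent one is the accepted one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Determination:
    taxon_name: str
    determined_by: str
    determined_on: date


def current_name(determinations: list[Determination]) -> Optional[str]:
    """DwC-style view: keep only the most recent determination's name."""
    if not determinations:
        return None
    return max(determinations, key=lambda d: d.determined_on).taxon_name
```

Storing the full list and deriving the latest name on export keeps differing opinions available, which a single name column would lose.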
reference taxonomic concept: TCS
- has GUID, associated names?
published name of taxon
structure of botanical names conveys info
resolve names to TNRS, which includes all the info
don't store year of pub of name
stem tag labels: stemCode in VegBank
- relationship to whole individual
mult records for tree, stem
each stem in its own record?
VegBank counting stems rather than plants
mult stems with codes: same genetic individual?
relatedItem: one-way relationship
table relating stems to an individual
NVS model
multiple measurements per stem
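The "table relating stems to an individual" plus "multiple measurements per stem" describes two one-to-many links: individual to stems, and stem to measurements. This sketch uses plain tuples with hypothetical stem codes and a hypothetical DBH field to show both lookups. It is not the NVS or VegBank layout.

```python
from collections import defaultdict


def stems_per_individual(stem_links: list[tuple[str, str]]) -> dict[str, int]:
    """stem_links holds (stem_code, individual_id) rows; count stems per individual."""
    counts: dict[str, int] = defaultdict(int)
    for _stem_code, individual_id in stem_links:
        counts[individual_id] += 1
    return dict(counts)


def dbh_history(measurements: list[tuple[str, int, float]], stem_code: str) -> list[tuple[int, float]]:
    """measurements holds (stem_code, census_year, dbh_cm) rows; return one stem's series by year."""
    return sorted((year, dbh) for code, year, dbh in measurements if code == stem_code)
```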
vouchering: fundamental piece of data needing to be supported
stem: according to VegBank: breast height (4.5 ft/1.37 m above ground), splits below 0.5 m along stem
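If the split rule is read as "a fork below 0.5 m produces a separate stem, while a fork higher up is a branch", stem counting for one individual reduces to a threshold test. That reading is an assumption, as is treating every fork as a two-way split. The sketch only illustrates the kind of explicit rule the notes call for.

```python
SPLIT_THRESHOLD_M = 0.5  # forks below this height are counted as separate stems (assumed reading)


def count_stems(fork_heights_m: list[float]) -> int:
    """Count stems for one individual from the heights (m) of forks along its main axis.

    Each fork below the threshold adds one stem; forks at or above it are branches.
    """
    return 1 + sum(1 for h in fork_heights_m if h < SPLIT_THRESHOLD_M)
```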
across time, stem grows and might change from mult stems to branches
plots on shrubs: same rules used
shrubs vs trees
VegBank uses % cover
do all stems with same rules
growth form classification separate from size
- trees, lianas
how to fit peyote into group?
what constitutes a stem?: need a rule
cover, stratum, diameter count optional
taxonObservation populated for any plant
nameless plant: into taxonObservation w/o name link? need to write unknown or blank
taxonInterpretation attached to taxonObs
light blue tables: observation has census events for plots
stems can have own taxonDetermination
stems w/ coords go into stemLocation; stems w/ counts go into stemCount
reference_ID for ?
3 stems w/ same morphospecies: enter 3x unknown, or unknown w/ count of 3?
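The two options for repeated unknowns (three "unknown" rows vs one "unknown" row with a count of 3) are interconvertible when the rows carry nothing else. This sketch shows the round trip with hypothetical morphospecies labels. Collapsing is lossy as soon as per-plant data such as coordinates differ between rows.

```python
from collections import Counter


def collapse(names: list[str]) -> list[tuple[str, int]]:
    """One row per plant -> (name, count) rows, sorted by name."""
    return sorted(Counter(names).items())


def expand(counted: list[tuple[str, int]]) -> list[str]:
    """(name, count) rows -> one row per plant."""
    return [name for name, count in counted for _ in range(count)]
```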
traits: diameter, height (for individuals)
traits attached to indiv, taxon, plot
TraitNet cares about all 3 levels of traits
