2011 working group Tu BIEN database
- time constraint: 1 year
- what do we want at end of year?
- audience for mapping tool
- additional data not in 2.0 that we want in 3.0
- identify primary science questions to be done w/ BIEN 3
- complexity of plots
- primary users
- educate people on domain
- need MS Access databases for loading data
- plot collection
- present plot data mgmt interface
- kinds of data
- occurrence data of organism w/ coords
- plot data
- 50 million herbaria in the world
- digital records of herbarium data
- same species put together
- undescribed species
- metadata: same species in different plots refer to different things
- specimen is just a presence, but a plot has absences of species, too
- every occurrence/recorded indiv associated with a specific area
- eco plots where only studying a subset of species
- cover if doing all taxa
- safe way to store data in archives
- plot
- place that doesn't change (geocoordinates)
- subplots
- plot observations
- most plots have one obs
- mult obs if come back to same plot mult times
- lump observations into projects
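
A minimal sketch of the plot hierarchy noted above (a fixed place, optional subplots, one or more observations, observations lumped into projects), assuming Python dataclasses. The class and field names are hypothetical and are not VegBank's actual table names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Plot:
    """A fixed place; its geocoordinates do not change between visits."""
    plot_id: str
    latitude: float
    longitude: float
    parent: Optional["Plot"] = None  # set only for subplots


@dataclass
class PlotObservation:
    """One visit to a plot; most plots have exactly one."""
    plot: Plot
    observed_on: date


@dataclass
class Project:
    """Groups the observations collected under one effort."""
    name: str
    observations: List[PlotObservation] = field(default_factory=list)

    def revisited_plots(self) -> List[str]:
        """Return ids of plots observed more than once in this project."""
        counts: Dict[str, int] = {}
        for obs in self.observations:
            counts[obs.plot.plot_id] = counts.get(obs.plot.plot_id, 0) + 1
        return sorted(pid for pid, n in counts.items() if n > 1)
```

Keeping the coordinates on `Plot` rather than on `PlotObservation` reflects the note that the place does not change, while each return visit adds a new observation.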
- community classifications
- US nat'l veg classification
- taxon obs vs individual obs
- each occurrence of a taxon labeled as a particular kind(s)
- VegBank tables fit into PPT diagram boxes
- VegBank can store as dataset, export to VegBranch
- VegBranch is an MS Access tool
- maximize flavors of plots that can go in
- different vertical strata defined in different ways
- easy to search, cite, link to, export, import, annotate (errors, community, types of organisms)
- multiple labels: original, current, used in pub
- particular plant might be observed in multiple events
- one name -> multiple specimens, but multiple names/specimen: many-to-many
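
The many-to-many link between names and specimens, with a label kind on each link, could be sketched as a join table. This is only an illustration using Python's built-in `sqlite3`; the table names, column names, and the placeholder taxon names are assumptions, not the VegBank or BIEN schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE taxon_name (name_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- Join table: one row per (specimen, name, label kind).
CREATE TABLE determination (
    specimen_id INTEGER NOT NULL REFERENCES specimen(specimen_id),
    name_id     INTEGER NOT NULL REFERENCES taxon_name(name_id),
    label_kind  TEXT NOT NULL
                CHECK (label_kind IN ('original', 'current', 'publication')),
    PRIMARY KEY (specimen_id, name_id, label_kind)
);
-- Placeholder data for illustration only.
INSERT INTO specimen VALUES (1, 'S-001'), (2, 'S-002');
INSERT INTO taxon_name VALUES (1, 'Genus alpha'), (2, 'Genus beta');
INSERT INTO determination VALUES
    (1, 1, 'original'), (1, 2, 'current'), (2, 2, 'current');
""")


def names_for(specimen_id):
    """All (label_kind, name) pairs attached to one specimen."""
    return conn.execute(
        "SELECT d.label_kind, n.name FROM determination d "
        "JOIN taxon_name n ON n.name_id = d.name_id "
        "WHERE d.specimen_id = ? ORDER BY d.label_kind",
        (specimen_id,),
    ).fetchall()


def specimens_for(name):
    """All specimen codes that carry a given name under any label."""
    rows = conn.execute(
        "SELECT DISTINCT s.code FROM determination d "
        "JOIN taxon_name n ON n.name_id = d.name_id "
        "JOIN specimen s ON s.specimen_id = d.specimen_id "
        "WHERE n.name = ? ORDER BY s.code",
        (name,),
    ).fetchall()
    return [r[0] for r in rows]
```

Here specimen `S-001` carries two names (original and current), and the name `Genus beta` is attached to two specimens, which is the many-to-many case from the notes.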
- three databases
- plots
- nomenclature
- communities
- plug all of VegBank into BIEN 3 or just subset of 7 tables?
- what to add to fit BIEN 3 requirements?
- traits, other than stem diameter
- leaf area
- get from outside b/c linked to taxon name
- plot has locality info
- user-defined attributes
- TROPICOS defines fields: measurements, sla, morphological description
- attach additional variables
- traitValue, traitName, traitUnits
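
One way to picture the `traitValue`, `traitName`, `traitUnits` triple is as a generic attribute record that can be attached to any observation without adding a column per trait. The sketch below assumes numeric trait values; the `Trait` class, the `group_traits` helper, and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Trait:
    """One user-defined measurement; field names follow the notes."""
    traitName: str
    traitValue: float
    traitUnits: str


def group_traits(traits: List[Trait]) -> Dict[str, List[Trait]]:
    """Index a flat list of traits by trait name."""
    grouped = defaultdict(list)
    for t in traits:
        grouped[t.traitName].append(t)
    return dict(grouped)


# Made-up values for illustration only.
sample = [
    Trait("leaf area", 12.5, "cm2"),
    Trait("stem diameter", 30.0, "cm"),
    Trait("leaf area", 14.0, "cm2"),
]
by_name = group_traits(sample)
```

Unused properties simply have no rows, which matches the note about leaving unused properties empty.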
- plot properties don't change
- leave empty unused props
- location: latitude, accuracy (m or km), confidentiality, public view
- confidential plots b/c on private land: gov't regulate or people take plants
- fuzzing of location
- cover, strata methods
- metadata will apply to plot level
- embargo rules: period of time or until change status
- real lat vs lat: lat is public view; real is what we know
- coordinate geom: UTM coords converted
- BIEN: might not want to manage private data
- lat/long: origin: corner or center? usually center
- sometimes multiple for same plot
- reference subplots as pos within plot
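
The split between the real location and the public view, together with fuzzing and embargo, could look roughly like the sketch below. It snaps coordinates to a coarse grid, which is one possible fuzzing method and not necessarily the one VegBank or BIEN uses. The `PlotLocation` fields, `public_coords`, and the figure of about 111 km per degree of latitude are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

KM_PER_DEG_LAT = 111.0  # approximate; good enough for fuzzing


@dataclass
class PlotLocation:
    real_lat: float                       # what we know
    real_lon: float
    fuzz_km: float = 0.0                  # 0 means exact coords are public
    embargo_until: Optional[date] = None  # None means no embargo


def _snap(value: float, step: float) -> float:
    """Round a coordinate to the nearest multiple of step."""
    return round(value / step) * step


def public_coords(loc: PlotLocation, today: date) -> Optional[Tuple[float, float]]:
    """Return the (lat, lon) for public view, or None while embargoed."""
    if loc.embargo_until is not None and today < loc.embargo_until:
        return None
    if loc.fuzz_km <= 0:
        return (loc.real_lat, loc.real_lon)
    step_lat = loc.fuzz_km / KM_PER_DEG_LAT
    # Degrees of longitude shrink with latitude, so widen the step.
    step_lon = step_lat / max(math.cos(math.radians(loc.real_lat)), 1e-6)
    return (_snap(loc.real_lat, step_lat), _snap(loc.real_lon, step_lon))
```

Grid snapping is deterministic, so repeated requests for the same plot always return the same public point and cannot be averaged to recover the real one.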
- synonyms: mult obs of same thing
- another name for duplicate
- party boxes: who took data
- get rid of duplicates? but might be slightly different: near synonyms
- one synonym tagged as current
- how synonyms generated
- some data in CVS database
- additional context to add
- digitization feature like
- data should be identical except for digitization differences
- e.g. data enterer didn't know that record already in database
- keep inactive records to reestablish link later
- same entity
- cleaned taxa to be consistent with different authority
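
The handling of duplicate records ("synonyms") sketched in the notes, where one record is tagged as current and the others are kept but inactive, might be modeled as follows. The `Record` class and both functions are hypothetical names, and this is only one reading of the notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    record_id: int
    entity_id: int          # records sharing this id describe the same entity
    is_current: bool = False
    active: bool = True


def set_current(records: List[Record], record_id: int) -> None:
    """Tag one record as current and deactivate its near-synonyms.

    Nothing is deleted, so a link can be re-established later.
    """
    target = next(r for r in records if r.record_id == record_id)
    for r in records:
        if r.entity_id == target.entity_id:
            r.is_current = r.record_id == record_id
            r.active = r.is_current


def current_record(records: List[Record], entity_id: int) -> Optional[Record]:
    """Return the record tagged as current for an entity, if any."""
    return next(
        (r for r in records if r.entity_id == entity_id and r.is_current),
        None,
    )
```

Because near-synonyms may differ slightly, the sketch only flips flags and never merges or drops rows.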
- observation table has some measurements of place (plot level obs): soil obs, height of tree layer (avg height of canopy), height of shrub layer
- taxon obs are obs of things
- note VegBank diagram legend
- importance values pertain to taxon
- sometimes taxa assigned to vertical strata in taxon observations nest
- stratum is vertical slice of plot by bands of foliage (shrub layer, etc.) or by height
- taxon attributes: cover, biomass
- some sizes done in bins
- taxon observation has name for kind of organism
- count of different-size stems
- record whether tree is leaning; hurricane damage in user-def field
- stem count: # stems in stem size class
- taxon importance
- coord of stem
- CTFS plot just has x, y, height, diameter
- tropical forest plots: taxon name, diameter, height
- but in Carolinas, don't do x, y of every stem: just tally by size class and species
- can do CTFS, cover plots, bins
- lots of column names
- stemCode: tag number
- area overnormalized
- stem x, y position: pertains to one stem or multiple? only for one stem
- VegBank ERD compromise to accommodate multiple protocols; more complicated than necessary
- tree with multiple stems: as same individual: linked together
- strata: merge tables?
- stem counts are aggregated: what is the aggregation when the aggregate is one?
- tuple: species
- people divide between aggregate or individual observations
- indiv vs avg measurements in analysis: throw out obs w/ count >1
- species, # indiv, stem sizes: don't know how indivs map to stems
- can mix indiv measurements with plot-level measurements
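
The distinction between individual stems and size-class tallies, and the rule of discarding observations with a count above one when individual measurements are needed, can be sketched like this. The `StemObservation` fields and both helpers are hypothetical names; a stem count of 1 is assumed to mean an individually measured stem.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StemObservation:
    taxon: str
    stem_count: int = 1                                # stems this row stands for
    diameter_cm: Optional[float] = None                # exact, for one stem
    size_class_cm: Optional[Tuple[float, float]] = None  # (min, max) for tallies


def individual_only(obs: List[StemObservation]) -> List[StemObservation]:
    """Keep rows that describe exactly one stem (drop aggregated tallies)."""
    return [o for o in obs if o.stem_count == 1]


def stems_per_taxon(obs: List[StemObservation]) -> Dict[str, int]:
    """Total stems per taxon, counting both individuals and tallies."""
    totals: Dict[str, int] = {}
    for o in obs:
        totals[o.taxon] = totals.get(o.taxon, 0) + o.stem_count
    return totals


# Made-up rows: one individually measured stem and one binned tally.
rows = [
    StemObservation("Taxon A", diameter_cm=23.4),
    StemObservation("Taxon A", stem_count=5, size_class_cm=(10.0, 20.0)),
    StemObservation("Taxon B", stem_count=2, size_class_cm=(0.0, 10.0)),
]
```

Storing both kinds of row in one structure lets stem-mapped protocols (x, y, diameter per stem) and tally protocols (counts per size class and species) share a table, at the cost of filtering before analysis.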
- get VegBank data loading scripts from Steve Dolins
- get MS Access databases for loading data
- get TROPICOS schema
- need desktop data entry tool for off-the-grid access
- SQLite, embedded MySQL instance
- autocomplete taxonomic names
- modify existing CTFS data entry tool (under development)?
- talk to Brad
- Steve has previous implementation
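
For the offline data entry tool, taxon name autocomplete on top of SQLite could be as small as a prefix query. The sketch below uses Python's built-in `sqlite3` module; the table name, the function names, and the in-memory database are assumptions, and a real tool would load names from whatever taxonomic source is chosen.

```python
import sqlite3


def build_name_index(names):
    """Create an in-memory table of taxon names (a file path also works)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon_name (name TEXT COLLATE NOCASE PRIMARY KEY)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO taxon_name VALUES (?)", [(n,) for n in names]
    )
    return conn


def autocomplete(conn, prefix, limit=10):
    """Return up to `limit` names starting with `prefix`, case-insensitively."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT name FROM taxon_name "
        "WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (escaped + "%", limit),
    )
    return [row[0] for row in rows]
```

A usage example: `autocomplete(build_name_index(["Acer rubrum", "Acer saccharum", "Quercus alba"]), "acer")` returns the two Acer names.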
- focus on confederated schema
- modify VegBank based on BIEN 3 analytical db
- check that all repos are there
- takes 4-6 hours to load all data (down from 3 days)
- takes 4 days to load TNRS data b/c querying MOBOT web service 1 name at a time
- Atrium project has online data entry tool
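
On the TNRS loading time noted above: if many records share the same name, resolving each distinct name only once should cut the number of web service calls. The sketch below shows that idea with a caller-supplied `resolve_batch` function. Whether the MOBOT service accepts more than one name per request is not stated in these notes, so `batch_size` is an assumption (set it to 1 to keep one name per call), and `fake_resolver` is a stand-in that makes no network request.

```python
from typing import Callable, Dict, Iterable, List


def resolve_names(
    raw_names: Iterable[str],
    resolve_batch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 100,
) -> Dict[str, str]:
    """Resolve each distinct, non-empty name once, in batches."""
    unique = sorted({n.strip() for n in raw_names if n.strip()})
    resolved: Dict[str, str] = {}
    for start in range(0, len(unique), batch_size):
        resolved.update(resolve_batch(unique[start:start + batch_size]))
    return resolved


# Stand-in resolver for illustration only; records how often it is called.
calls: List[List[str]] = []


def fake_resolver(batch: List[str]) -> Dict[str, str]:
    calls.append(list(batch))
    return {name: name.capitalize() for name in batch}


result = resolve_names(
    ["acer rubrum", "acer rubrum ", "quercus alba", ""],
    fake_resolver,
    batch_size=1,
)
```

In this example four input strings lead to only two resolver calls, because the duplicate and the empty string are dropped before any lookup.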