2011 working group Tu BIEN database

  • time constraint: 1 year
  • what do we want at end of year?
  • audience for mapping tool
  • additional data not in 2.0 that we want in 3.0
  • identify primary science questions to be done w/ BIEN 3
  • complexity of plots
  • primary users
  • educate people on domain
  • need MS Access databases for loading data
  • plot collection
  • present plot data mgmt interface
  • kinds of data
    • occurrence data of organism w/ coords
    • plot data
  • 50 million herbaria in the world
  • digital records of herbarium data
  • same species put together
  • undescribed species
  • metadata: same species in different plots refer to different things
  • specimen is just a presence, but a plot has absences of species, too
  • every occurrence/recorded indiv associated with a specific area
  • eco plots where only studying a subset of species
  • cover if doing all taxa
  • safe way to store data in archives
  • plot
    • place that doesn't change (geocoordinates)
    • subplots
  • plot observations
    • most plots have one obs
    • mult obs if come back to same plot mult times
  • lump observations into projects
  • community classifications
    • US nat'l veg classification
  • taxon obs vs individual obs
  • each occurrence of a taxon labeled as a particular kind(s)
  • VegBank tables fit into PPT diagram boxes
  • VegBank can store as dataset, export to VegBranch
  • VegBranch is an MS Access tool
  • maximize flavors of plots that can go in
    • different vertical strata defined in different ways
    • easy to search, cite, link to, export, import, annotate (errors, community, types of organisms)
  • multiple labels: original, current, used in pub
  • particular plant might be observed in multiple events
  • one name -> multiple specimens, but multiple names/specimen: many-to-many
  • three databases
    • plots
    • nomenclature
    • communities
  • plug all of VegBank into BIEN 3 or just subset of 7 tables?
  • what to add to fit BIEN 3 requirements?
    • traits, other than stem diameter
      • leaf area
      • get from outside b/c linked to taxon name
  • plot has locality info
  • user-defined attributes
  • TROPICOS defines fields: measurements, sla, morphological description
    • attach additional variables
    • traitValue, traitName, traitUnits
  • plot properties don't change
    • leave empty unused props
    • location: latitude, accuracy (m or km), confidentiality, public view
  • confidential plots b/c on private land: gov't regulate or people take plants
    • fuzzing of location
  • cover, strata methods
  • metadata will apply to plot level
  • embargo rules: period of time or until change status
  • real lat vs lat: lat is public view; real is what we know
  • coordinate geom: UTM coords converted
  • BIEN: might not want to manage private data
  • lat/long: origin: corner or center? usually center
    • sometimes multiple for same plot
  • reference subplots as pos within plot
  • synonyms: mult obs of same thing
    • another name for duplicate
  • party boxes: who took data
  • get rid of duplicates? but might be slightly different: near synonyms
    • one synonym tagged as current
  • how synonyms generated
  • some data in CVS database
  • additional context to add
  • digitization feature like
  • data should be identical except for digitization differences
    • e.g. data enterer didn't know that record already in database
  • keep inactive records to reestablish link later
  • same entity
  • cleaned taxa to be consistent with different authority
  • observation table has some measurements of place (plot level obs): soil obs, height of tree layer (avg height of canopy), height of shrub layer
  • taxon obs are obs of things
  • note VegBank diagram legend
  • importance values pertain to taxon
  • sometimes taxa assigned to vertical strata in taxon observations nest
  • stratum is vertical slice of plot by bands of foliage (shrub layer, etc.) or by height
  • taxon attributes: cover, biomass
  • some sizes done in bins
  • taxon observation has name for kind of organism
  • count of different-size stems
  • record whether tree is leaning; hurricane damage in user-def field
  • stem count: # stems in stem size class
  • taxon importance
  • coord of stem
  • CTFS plot just has x, y, height, diameter
  • tropical forest plots: taxon name, diameter, height
  • but in Carolinas, don't do x, y of every stem: just tally by size class and species
  • can do CTFS, cover plots, bins
  • lots of column names
  • stemCode: tag number
  • area overnormalized
  • stem x, y position: pertains to one stem or multiple? only for one stem
  • VegBank ERD compromise to accommodate multiple protocols; more complicated than necessary
  • tree with multiple stems: as same individual: linked together
  • strata: merge tables?
  • stem counts are aggregated: what is aggregation when aggregate is one?
  • tuple: species
  • people divide between aggregate or individual observations
  • indiv vs avg measurements in analysis: throw out obs w/ count >1
  • species, # indiv, stem sizes: don't know how indivs map to stems
    • can mix indiv measurements with plot-level measurements
  • get VegBank data loading scripts from Steve Dolins
  • get MS Access databases for loading data
  • get TROPICOS schema
  • need desktop data entry tool for off-the-grid access
    • SQLite, embedded MySQL instance
    • autocomplete taxonomic names
    • modify existing CTFS data entry tool (under development)?
    • talk to Brad
    • Steve has previous implementation
  • focus on confederated schema
  • modify VegBank based on BIEN 3 analytical db
  • check that all repos are there
  • takes 4-6 hours to load all data (down from 3 days)
  • takes 4 days to load TNRS data b/c querying MOBOT web service 1 name at a time
  • Atrium project has online data entry tool
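
The notes on confidential plots ("real lat vs lat", "fuzzing of location") could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the confidentiality level names, and the choice to fuzz by rounding decimal degrees are all hypothetical and are not taken from VegBank or BIEN. Rounding to 2 decimal places gives roughly 1 km of latitude resolution, and 1 decimal place roughly 11 km.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical confidentiality levels mapped to the number of decimal
# places kept in the public view (None means no fuzzing at all).
PUBLIC_DECIMALS = {"public": None, "approx_1km": 2, "approx_10km": 1}


@dataclass
class PlotLocation:
    """Stores the real coordinates; the public view is derived on demand."""
    real_lat: float
    real_lon: float
    confidentiality: str = "public"

    def public_coords(self) -> Optional[Tuple[float, float]]:
        # "hidden" is an assumed level for plots that expose no location.
        if self.confidentiality == "hidden":
            return None
        decimals = PUBLIC_DECIMALS[self.confidentiality]
        if decimals is None:
            return (self.real_lat, self.real_lon)
        # Fuzz by rounding, so the public point snaps to a coarse grid.
        return (round(self.real_lat, decimals), round(self.real_lon, decimals))


# Made-up coordinates, for demonstration only.
demo = PlotLocation(35.9132, -79.0558, "approx_10km")
print(demo.public_coords())
```

Keeping only the real coordinates and deriving the public ones means an embargo or status change needs just the confidentiality value updated, with no second set of coordinates to keep in sync.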
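
The plot / plot observation / taxon observation / stem hierarchy discussed above, including the question of mixing individual stems with size-class tallies, might look something like the sketch below. It uses SQLite because the notes mention it for the desktop tool. The table and column names are simplified, hypothetical stand-ins and not the actual VegBank schema, and all rows are made-up demo data. One stem table holds both cases: an individual stem is a row with a count of 1, and a tally is a row with a count greater than 1 and no tag or position.

```python
import sqlite3

# Hypothetical, simplified schema; not the real VegBank tables.
SCHEMA = """
CREATE TABLE plot (
    plot_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE plot_observation (
    obs_id   INTEGER PRIMARY KEY,
    plot_id  INTEGER NOT NULL REFERENCES plot(plot_id),
    obs_date TEXT NOT NULL
);
CREATE TABLE taxon_observation (
    taxon_obs_id INTEGER PRIMARY KEY,
    obs_id       INTEGER NOT NULL REFERENCES plot_observation(obs_id),
    taxon_name   TEXT NOT NULL
);
CREATE TABLE stem_record (
    stem_id      INTEGER PRIMARY KEY,
    taxon_obs_id INTEGER NOT NULL REFERENCES taxon_observation(taxon_obs_id),
    diameter_cm  REAL,
    n_stems      INTEGER NOT NULL DEFAULT 1,  -- 1 = individual, >1 = tally
    stem_code    TEXT,                        -- tag number, individuals only
    x_m          REAL,                        -- position, individuals only
    y_m          REAL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO plot VALUES (1, 'demo plot', 35.9, -79.1)")
    conn.execute("INSERT INTO plot_observation VALUES (1, 1, '2011-01-01')")
    conn.execute("INSERT INTO taxon_observation VALUES (1, 1, 'Acer rubrum')")
    conn.executemany(
        "INSERT INTO stem_record "
        "(taxon_obs_id, diameter_cm, n_stems, stem_code, x_m, y_m) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 12.5, 1, "T001", 3.0, 4.5),  # one tagged, mapped stem
            (1, 5.0, 7, None, None, None),   # tally of 7 stems in a size class
        ],
    )
    return conn


def individual_stems(conn: sqlite3.Connection) -> list:
    """Keep only individual measurements, i.e. drop rows with count > 1."""
    return conn.execute(
        "SELECT t.taxon_name, s.stem_code, s.diameter_cm "
        "FROM stem_record s JOIN taxon_observation t USING (taxon_obs_id) "
        "WHERE s.n_stems = 1 ORDER BY s.stem_code"
    ).fetchall()


def total_stems(conn: sqlite3.Connection) -> int:
    """Count all stems, individual and tallied alike."""
    return conn.execute("SELECT SUM(n_stems) FROM stem_record").fetchone()[0]
```

With this layout, an analysis that needs individual measurements filters on `n_stems = 1`, while abundance summaries can sum `n_stems` across both kinds of row.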