2011 working group, Monday: technical challenges

  • finalize schema details
  • choose database/web framework
  • clarify services and interfaces
    • services: how researchers and automated systems interact w/ database
    • optimize interfaces for visual presentation and performance
  • workflow
    • heterogeneous raw sources
    • some go straight into confed database
    • data supplemented w/ lat/long in db
    • nomenclature, taxonomy issues
    • expose confed resource so it can talk to other frameworks
  • finalizing schema details
    • independent researchers should also be able to contribute
    • how lossy can it be? what is the minimal info that data coming into the framework must have?
      • e.g. Latin binomial, spatial reference
    • VegBank, CTFS, SALVIAS
    • VegX: is semantic scope what we want?
      • needs revisions or good enough as is?
      • how does it play together with DwC?
    • specimens and occurrences, TCS, DwC, TraitNet (plant traits)
      • semantics approach to storing trait information
    • how to reference traits stored in trait db
      • omics community wants to reference traits
  • VegBank
    • fully fledged ER model w/ full implementation for storing plot info
    • recent hardware refresh
    • implications of deciding we like VegBank
  • choosing a framework
    • VegBank (Java/Postgres), BIEN 2 (PHP/MySQL), VegX (XML Schema), Python-Django (PostGIS)
    • locational info about jellyfish: Django on Postgres
    • Python gaining traction in geospatial community: GeoDjango
  • link to
    • taxonomic info: TCS, TNRS
    • geospatial: GNRS, MoL, SOS
    • traits: TraitNet, ontologies (earth science, biology, phenotypes, life sciences)
    • RDF/OWL, triplestores
    • LAMP stacks
    • take less conventional web app approach?
    • how web services are implemented in the framework
  • services and interfaces
    • VegBank home page (Bob designed): simple/advanced plot search
    • well-documented db: tutorial, instructional info
    • is BIEN 3 intended to be a highly accessible framework?
    • simple plot search interface streamlined
    • discuss how interfaces should look
  • data acquisition
    • TEAM: alliance of institutions; vegetation protocol
    • MBG: how to integrate taxonomic side of things
    • how to get data into system
    • more sophisticated data acquisition tool
    • Eric, Sandy Andelman?
    • data provider for repos
    • VegBranch, TurboVeg load data directly into db
    • now in scope to make progress on data acq
  • objectives
    • decide confed schema
    • decide tech framework
    • brainstorm linkages to other services: GBIF exposes resources through services
    • Brad, Jim Regetz, Aaron involved with technical side
    • borrow from ALA?
    • iPlant/NCEAS: dedicated development workforce
  • more issues
    • access control granularity
    • differentiate between atomic occurrence and aggregated view of data
      • % coverage vs determining % from indiv occurrences w/in plot
    • inadequate metadata
    • allow people to choose from methodologies for aggregating data
    • handle specimen and occurrence data better
    • time series/repeated measurements need special flagging and treatment
      • VegBank: observations nested in a plot, multiple data observations nested in the first one
      • potential responses of species to climate change: need old observations
      • CTFS data is occurrence data
    • as we develop services (TNRS) and find something wrong w/ the data -> provide feedback to the data provider
    • when John is using the BIEN 2 framework and experiences a problem, capture the issues
    • iterative next gen of BIEN
      • get pieces accessible to operate on
      • research perspective
  • technical challenges
    • see Redmine wiki: changes on a day-to-day basis
    • reporting/tracking what's going on
  • time dimension is important
  • how users will add data
  • earlier iterations of BIEN: wanted to simplify schema so eliminated fields
    • now, throw nothing away because some fields are critical metadata (DwC elements)
    • specimens: every element of DwC that pertains to plants should be in there
  • SALVIAS static, users now using BIEN
  • CTFS -> BIEN 3
  • what would benefit from becoming part of BIEN
  • need samples of data/schema so we can see types of attribute info being collected
    • can VegBank capture everything? VegX?
  • each subgroup: responsibility includes writing a use case (due Wednesday afternoon, 1 pm)
    • BIEN 3 group will use them to define BIEN 3
    • example use cases
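The notes raise two linked schema questions: what minimal information an incoming record must carry (a Latin binomial and a spatial reference are the examples given), and how to keep atomic occurrences separate from aggregated views so that users can choose an aggregation method. The following is a minimal Python sketch under loudly stated assumptions. The `Occurrence` record, its field names, the simple two-word binomial check, and "percent of individuals per plot" as the aggregation method are all hypothetical illustrations, not the BIEN 3, VegBank, or VegX schema. Field-estimated % cover is a different quantity and would need its own method. Rejected records come back with a list of problems that could be returned to the data provider as feedback.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


# Hypothetical minimal occurrence record (assumption, not an agreed schema):
# only a plot id, a Latin binomial, and a lat/long spatial reference.
@dataclass
class Occurrence:
    plot_id: str
    scientific_name: str  # expected as a Latin binomial, e.g. "Quercus alba"
    latitude: Optional[float]
    longitude: Optional[float]


def missing_minimal_info(rec: Occurrence) -> list[str]:
    """Return the problems found; an empty list means the record is acceptable."""
    problems = []
    parts = rec.scientific_name.split()
    # Crude binomial check: exactly two words, capitalized genus, lowercase epithet.
    if len(parts) != 2 or not parts[0][:1].isupper() or not parts[1].islower():
        problems.append("scientific_name is not a Latin binomial")
    if rec.latitude is None or rec.longitude is None:
        problems.append("missing spatial reference")
    elif not (-90 <= rec.latitude <= 90 and -180 <= rec.longitude <= 180):
        problems.append("coordinates out of range")
    return problems


def relative_abundance(records: list[Occurrence]) -> dict[str, dict[str, float]]:
    """Aggregate atomic occurrences into % of individuals per species, per plot.

    The atomic records are left untouched; this is one derived view among
    several possible aggregation methods.
    """
    counts: dict[str, Counter] = {}
    for rec in records:
        counts.setdefault(rec.plot_id, Counter())[rec.scientific_name] += 1
    result = {}
    for plot, species_counts in counts.items():
        total = sum(species_counts.values())
        result[plot] = {name: 100.0 * n / total for name, n in species_counts.items()}
    return result


if __name__ == "__main__":
    records = [
        Occurrence("P1", "Quercus alba", 35.9, -79.0),
        Occurrence("P1", "Quercus alba", 35.9, -79.0),
        Occurrence("P1", "Acer rubrum", 35.9, -79.0),
        Occurrence("P1", "quercus", None, None),
    ]
    for rec in records:
        if problems := missing_minimal_info(rec):
            print("feedback for provider:", rec, problems)
    valid = [r for r in records if not missing_minimal_info(r)]
    print(relative_abundance(valid))
```

Keeping validation and aggregation as separate functions mirrors the distinction in the notes: the database stores what was observed, and each aggregated view is computed on demand by a method the user picks.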