Requirements

Working group BIEN 3.0 requirements outline

  • Stores large amounts of species occurrence data in database
    • "Global scale, extensible, confederated database: Here we will create a web-based framework with data communication protocols and exchange schema" (BIEN iPlant grant proposal p. 8)
  • Integrates existing databases
    • "Herbarium specimens will supply large samples of presence data needed for modeling species distributions and ranges sizes; ecological inventories will provide data on co-occurrence, diversity, and demography and traits" (BI Charter deliverable 4)
    • "combine plot data over extensive spatial and temporal gradients in order to perform analyses and make predictions of vegetation change and dynamics at local and global scales" (Veg-X – an exchange standard for plot-based vegetation data)
    • "Veg-X will be our container for getting things in and out" (e-mail from Brad Boyle on 2011-10-12)
  • Integrates specimen and plot data
    • "BIEN (and it's predecessor, SALVIAS) is unique among biological databases in that it combines specimens and ecological inventories (not to mention traits). This comprehensiveness presents many challenges, and is the reason why we will require two loading schemas to load external data sources into BIEN3.0 (VegX and DwC; have a look at section 1.0 of the BIEN3 architecture powerpoint: "Data mapping and provider services")." (e-mail from Brad Boyle on 2011-10-18)
  • Populates analytical databases
  • Allows queries to be run on this data
    • "Web-accessible end-user resource: Next, we propose to create a flexible and logical interface for powerful data querying, discovery and analysis, with download formats readily usable by all field of plant biology research" (BIEN iPlant grant proposal p. 8)
  • Answers scientific questions
  • Time series data and repeated measurements over time
    • Notion of an individual entity and the time and place it is in
    • Time usually represented as a timestamp or datetime, but can also be an index for a repeated measurement
    • Have different times in different columns (time1, time2, ...)?
    • Sometimes need to transpose data to put times on the x-axis
  • Sufficient metadata
    • Any column that has the same value in every row is metadata
    • Add columns for metadata to a dataset when integrating it with other datasets
    • EML fields as attributes in metadata table
    • Darwin Core: data or metadata?
  • Specimen and occurrence info
  • Integrates different types of data (Outline of possible BIEN 3.0 white paper p. 1):
    • Environmental layers
    • Taxonomic services
      • Taxonomic concept resolution
      • Support for alternative taxonomic standards
      • Taxon concept relationship mapper
      • Taxon agglomeration tool
    • Trait data
    • Geospatial discovery tool
    • Data discovery for equivalent data housed elsewhere?
    • IPToL and other phylogeny resources
  • Validates and standardizes taxonomic info
    • TNRS: Taxonomic Name Resolution Service
    • Different spellings of species should be corrected
    • Some tables (e.g. traits) include higher-order taxa, such as family, which should be standardized
    • A species determines all of its higher-order taxa, so these should be the same everywhere that species is used
    • Taxonomic relationships are sometimes revised, and the revisions should be reflected
  • Validates latitude/longitude
  • Data is validated by humans
  • Captures expert knowledge
  • Gives feedback to data providers on correctness and completeness of their data
    • Corrections, expansions of content, missing data, preferred synonyms
  • Is accessible to the average scientist
  • Records source of data (provenance)
    • Attribution of the databank that each dataset was imported from
  • Stores different versions of data
    • Can be multiple determinations of what something is: initial, expert, genetic sequencing
  • Allows annotation of data
  • Delivers data as a web service
  • Uses authentication and access controls to limit user actions
    • Who can modify and view each table
    • LDAP-based single sign-on?
    • OpenID?
    • See DataONE authentication
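
Sketch for the time series requirement: the outline asks whether different times should live in different columns (time1, time2, ...) and notes that data sometimes has to be transposed to put times on the x-axis. The following is only an illustration of that transposition, not a proposed BIEN 3.0 design. It assumes measurements arrive as (entity, time, value) records, where time may be a timestamp or a repeated-measurement index. The function name long_to_wide and the sample identifiers are hypothetical.

```python
def long_to_wide(records):
    """Pivot (entity, time, value) records into one row per entity.

    Returns the sorted list of distinct times (the "columns") and a dict
    mapping each entity to {time: value}. A time at which an entity was
    not measured is filled with None.
    """
    times = sorted({time for _, time, _ in records})
    wide = {}
    for entity, time, value in records:
        # Create the entity's row on first sight, with every time set to None.
        row = wide.setdefault(entity, dict.fromkeys(times))
        row[time] = value
    return times, wide


# Hypothetical repeated measurements: time is a measurement index here.
records = [
    ("tree_1", 1, 10.2),
    ("tree_1", 2, 10.9),
    ("tree_2", 1, 7.5),
]
times, wide = long_to_wide(records)
```

Storing the long form and deriving the wide form on demand avoids fixing the number of time columns in the schema, at the cost of a pivot step before plotting or analysis.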
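
Sketch for the metadata requirement: the outline states that any column with the same value in every row is metadata, and that metadata columns should be added to a dataset when it is integrated with others. A minimal illustration of both steps follows, assuming a dataset is a list of dicts that all share the same keys. The function names and the sample values are hypothetical. Note that under this rule every column of a single-row dataset counts as metadata, so a real implementation would likely need extra criteria.

```python
def split_metadata(rows):
    """Separate constant columns (metadata) from varying columns (data)."""
    if not rows:
        return {}, []
    first = rows[0]
    # A column is metadata if every row holds the same value as the first row.
    metadata = {
        col: first[col]
        for col in first
        if all(row[col] == first[col] for row in rows)
    }
    data = [
        {col: val for col, val in row.items() if col not in metadata}
        for row in rows
    ]
    return metadata, data


def add_metadata_columns(data, metadata):
    """Re-attach metadata as ordinary columns before merging datasets."""
    return [{**metadata, **row} for row in data]


# Hypothetical dataset: "source" is constant, so it is treated as metadata.
rows = [
    {"source": "herbarium_a", "species": "Quercus alba", "latitude": 35.1},
    {"source": "herbarium_a", "species": "Acer rubrum", "latitude": 36.4},
]
metadata, data = split_metadata(rows)
restored = add_metadata_columns(data, metadata)
```

Re-attaching the metadata before integration keeps each row traceable to its source once rows from several datasets share one table.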