Solutions

Technical challenges

Integrating plots and specimens

  • Enter each specimen as a species count with abundance = 1 and plot size = coordinate precision?
  • Flag species counts and specimens as different types of data
  • Dealing with relevés from TurboVeg
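
The first two options above can be pictured as a single record type that carries a type flag. This is a minimal sketch, not the BIEN 3 design: the names `Observation`, `kind`, `from_specimen`, and `from_plot_count` are hypothetical, and using coordinate precision as a stand-in for plot size is exactly the open question raised above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One species record; every field name here is hypothetical."""
    species: str
    abundance: int                       # number of individuals counted
    kind: str                            # "specimen" or "species_count"
    plot_size_m: Optional[float] = None  # plot extent, or coordinate precision for a specimen


def from_specimen(species: str, coord_precision_m: Optional[float] = None) -> Observation:
    # A specimen is stored as a species count of one individual,
    # but keeps its own type flag so it can still be told apart later.
    return Observation(species, 1, "specimen", coord_precision_m)


def from_plot_count(species: str, count: int, plot_size_m: float) -> Observation:
    # An aggregate count from a plot inventory.
    return Observation(species, count, "species_count", plot_size_m)
```

Keeping the flag means the two kinds of data share one table without being confused in later analyses.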

Obtaining data from each databank

  • Very few databanks have a unified, exportable format
  • Even fewer databanks have a web service
  • Some databanks require a login or a request form to access data

Merging widely varying formats of data

  • Common formats: XML, CSV, HTML tables, map images
  • Ideally we would have access to the actual database behind each databank, so that we could extract the stored information directly, but this may be difficult to obtain
  • Each data source will need a potentially complex script to convert it to the main BIEN 3 format
  • The BIEN 3 format will need to be very complex to contain all the data in existing data formats
  • A one-time export from each databank is not enough if we want to capture future changes to the databanks
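
For the simplest case, a CSV export, a per-source conversion script might look like the sketch below. The column names in `COLUMN_MAP` and the target field names are invented for illustration; a real source would need its own mapping and probably more cleanup than this.

```python
import csv
import io

# Hypothetical mapping from one source's column names to common field names.
COLUMN_MAP = {"ScientificName": "species", "Lat": "latitude", "Long": "longitude"}


def convert_csv(text, column_map):
    """Yield one dict per source row, keyed by the common field names.

    Empty or missing values become None so that gaps stay visible.
    """
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            target: (row.get(source) or "").strip() or None
            for source, target in column_map.items()
        }
```

Re-running such a script on a schedule, rather than once, is one way to pick up later changes to a databank.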

Creating a user interface that has the features of existing databanks but uses the BIEN 3 schema

  • Existing UIs are tied to their associated schemas and couldn't easily be re-used unless the BIEN 3 schema is very similar
  • This would make it difficult to build on e.g. VegBank unless the schema is also built on VegBank
  • The alternative is to create a UI from scratch, but duplicating all the features of existing databanks would take much longer

Supporting normalized and denormalized data needs

  • If the BIEN 3 database is highly normalized, denormalized views would need to be maintained for common selections and projections of data for CSVs
  • If the BIEN 3 database is denormalized, data would be more difficult to store, maintain, and update
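
The trade-off can be illustrated with a small in-memory SQLite database: normalized tables hold each value once, and a view presents a flat row per observation for export. The table and column names are hypothetical and far simpler than any real BIEN 3 schema would be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plot (plot_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, species TEXT);
    CREATE TABLE observation (
        obs_id INTEGER PRIMARY KEY,
        plot_id INTEGER REFERENCES plot(plot_id),
        taxon_id INTEGER REFERENCES taxon(taxon_id),
        abundance INTEGER
    );
    -- Denormalized view: one flat row per observation, ready for CSV export.
    CREATE VIEW flat_observation AS
        SELECT p.name AS plot, t.species AS species, o.abundance AS abundance
        FROM observation o
        JOIN plot p ON p.plot_id = o.plot_id
        JOIN taxon t ON t.taxon_id = o.taxon_id;
    INSERT INTO plot VALUES (1, 'Plot A');
    INSERT INTO taxon VALUES (1, 'Quercus alba');
    INSERT INTO observation VALUES (1, 1, 1, 12);
""")

# A correction touches a single row in the normalized table...
conn.execute("UPDATE taxon SET species = 'Quercus rubra' WHERE taxon_id = 1")
# ...and the flat view picks it up the next time it is queried.
flat_rows = conn.execute(
    "SELECT plot, species, abundance FROM flat_observation"
).fetchall()
```

The view's data stays current on its own, but its definition still has to be kept in step with the schema, which is the maintenance cost noted above.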

Database

  • One row per observation
  • Each scientist deletes the rows and columns they're not interested in
    • select: which rows
    • project: which columns
  • Validate all taxonomic info with TNRS
  • Differentiate between individual specimen locations and aggregate species abundances in a plot
    • But could enter an ecological inventory (say, a count of plants of a particular species in a plot) as a specimen with a quantity, and enter an individual specimen with a quantity of one
  • Include tables for species traits and the taxonomic hierarchy
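
With one row per observation, the select and project steps above reduce to a row filter and a column filter. A minimal sketch follows; the function name `select_project` and the sample records are invented for illustration.

```python
def select_project(rows, predicate, columns):
    """Keep the rows matching predicate (select), then only the named columns (project)."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


# Illustrative records only, not real data.
observations = [
    {"species": "Quercus alba", "plot": "A", "abundance": 12},
    {"species": "Acer rubrum", "plot": "B", "abundance": 3},
]

# Select the rows from plot A, then project onto two columns.
subset = select_project(
    observations,
    lambda row: row["plot"] == "A",
    ["species", "abundance"],
)
```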

Hierarchical, normalized tables

  • similar to VegBank entity relationship diagram
    • several dozen tables
  • "BIEN3 should be a highly normalized database more similar to the vegBank" (e-mail from Brad Boyle on 2011-10-17)
  • updates only need to change one value
  • but joins are complex and potentially slow
  • views to include calculated values
  • expand VegBank schema with fields for specimens and time series data

Flat, denormalized table

  • easily extract subset of rows and columns
  • no joins needed
  • but each update must change every occurrence of a value
  • triggers to fill in calculated values
    • e.g. abundance in a plot when individual specimens are provided
    • run at query time (on select) so that totals are not recalculated after each row insert
  • expand viewFullOccurrence to include columns from other data formats
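
The idea of calculating plot abundance only when the data is read can be sketched in SQLite. SQLite triggers fire on INSERT, UPDATE, and DELETE but not on SELECT, so this sketch swaps in a view, which is evaluated at query time. The table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per individual specimen.
    CREATE TABLE specimen (plot TEXT, species TEXT);
    -- Abundance is derived when queried, not stored at insert time.
    CREATE VIEW plot_abundance AS
        SELECT plot, species, COUNT(*) AS abundance
        FROM specimen
        GROUP BY plot, species;
""")

# Inserting rows does no aggregation work.
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?)",
    [("Plot A", "Quercus alba"), ("Plot A", "Quercus alba"), ("Plot A", "Acer rubrum")],
)

# Totals are computed only here, when the view is read.
abundances = conn.execute(
    "SELECT plot, species, abundance FROM plot_abundance ORDER BY plot, species"
).fetchall()
```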

User interface

Adapt University of Arizona Herbarium code to BIEN 3 schema

  • Brad is using this code to create a UI for the BIEN 2 database
  • Add additional search and query capabilities

Create an entirely new database similar to existing ones

  • Use schema based on VegBank, Veg-X
  • Could be modeled after VegBank
  • Use a newer Java framework like Play

Expand VegBank to include data from other continents and additional observation info

  • VegBank import uses XML, so it can store any data that can be converted to VegBank XML
  • Data from other databases can be imported into VegBank
    • This process can be automated to regularly add new data entered into the source databases
  • Cite source databank to address intellectual property concerns about storing external data
    • The Atlas of Living Australia currently uses external datasets in a similar way
  • Disadvantage: VegBank uses Java Spring, which is outdated

Create search tool that draws from existing databases but doesn't itself store data

  • Needs a web service or web page scraper for each database from which information is pulled
  • A central server could cache data from users' queries in a local database
  • Alternatively, a program could run on the user's machine (Java applet, etc.) to avoid burdening a server with all the queries
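
The central-server variant could be structured roughly as follows: one fetch function per source database, with results cached by query. This is a sketch under assumptions: `CachingSearch` and the fetcher interface are hypothetical, the cache is an in-memory dict rather than a local database, and it never expires entries.

```python
class CachingSearch:
    """Query several sources and remember the combined result per query."""

    def __init__(self, fetchers):
        # fetchers: source name -> callable(query) returning a list of dicts.
        # Each callable would wrap a web service or a page scraper.
        self._fetchers = fetchers
        self._cache = {}

    def search(self, query):
        if query not in self._cache:
            results = []
            for name, fetch in self._fetchers.items():
                # Tag each record with its source so it can be cited.
                results.extend({"source": name, **record} for record in fetch(query))
            self._cache[query] = results
        return self._cache[query]
```

A repeated query is answered from the cache, so the source databases are contacted only once per distinct query.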

Effect of climate

Add climate metadata to plots

  • Fields: temperature, rainfall, etc.
  • Use plot coordinates to pull this data from other data sources that have it: Climate data
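
Pulling climate values by plot coordinates could be as simple as a nearest-grid-cell lookup. The sketch below assumes the climate source has already been loaded into a dict keyed by cell-center coordinates. The function name, the field names, and the sample values are hypothetical, and comparing distances in raw degrees is a simplification.

```python
import math


def nearest_climate(lat, lon, grid):
    """Return the climate record of the grid cell closest to (lat, lon).

    grid maps (lat, lon) cell centers to dicts of climate fields.
    Distance is measured in plain degrees, a rough assumption that
    ignores the curvature of the Earth.
    """
    def distance(cell):
        return math.hypot(cell[0] - lat, cell[1] - lon)

    return grid[min(grid, key=distance)]


# Placeholder values for illustration only.
climate_grid = {
    (35.0, -80.0): {"temperature_c": 15.5, "rainfall_mm": 1100},
    (36.0, -79.0): {"temperature_c": 14.0, "rainfall_mm": 1150},
}

plot_climate = nearest_climate(35.9, -79.1, climate_grid)
```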

Overlay climate maps with species location maps

  • Shows trends visually