Solutions¶

Technical challenges¶

Technical Challenges for BIEN 3

Integrating plots and specimens¶

Enter each specimen as a species count with abundance = 1 and plot size = coordinate precision?
Flag species counts and specimens as different types of data
Dealing with relevés from TurboVeg
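The first two ideas above could be sketched as follows. This is only an illustration: the `Observation` record, its field names, and the `kind` flag values are hypothetical and are not part of any existing BIEN schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One row of a hypothetical unified observation table."""
    species: str
    abundance: int
    latitude: float
    longitude: float
    plot_size_m: Optional[float]  # plot size, or coordinate precision for a specimen
    kind: str                     # "specimen" or "plot_count" (assumed flag values)


def specimen_to_observation(species, latitude, longitude, coord_precision_m):
    """Store a single specimen as a species count of one."""
    return Observation(species, 1, latitude, longitude, coord_precision_m, "specimen")


def plot_count_to_observation(species, count, latitude, longitude, plot_size_m):
    """Store an aggregate species count from a plot."""
    return Observation(species, count, latitude, longitude, plot_size_m, "plot_count")
```

Keeping the `kind` flag means both types of data can share one table while analyses can still filter specimens out of abundance calculations.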

Obtaining data from each databank¶

Very few databanks have a unified, exportable format
Even fewer databanks have a web service
Some databanks require a login or a request form to access data

Merging widely varying formats of data¶

Common formats: XML, CSV, HTML tables, map images
We really want access to the actual database behind each databank, so that we can directly extract the information it's storing, but this may be difficult to obtain
Each data source will need a potentially complex script to convert it to the main BIEN 3 format
The BIEN 3 format will need to be very complex to contain all the data in existing data formats
- see ERDs and XML tree diagrams
A one-time export from each databank is not enough if we want to capture future changes to the databanks
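A per-source conversion script might look roughly like the sketch below for a CSV export. The source column names (`SciName`, `Lat`, `Long`, `N`) and the target field names are invented for illustration; a real converter would need one such mapping per databank and would probably be far more involved.

```python
import csv
import io

# Hypothetical mapping from one source's column names to common field names.
FIELD_MAP = {
    "SciName": "species",
    "Lat": "latitude",
    "Long": "longitude",
    "N": "abundance",
}


def convert_csv(text, field_map):
    """Rename the columns of a CSV export to the common field names."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # Columns absent from the mapping are dropped in this sketch.
        record = {target: row[source]
                  for source, target in field_map.items() if source in row}
        records.append(record)
    return records


sample = "SciName,Lat,Long,N\nQuercus alba,35.9,-79.0,4\n"
records = convert_csv(sample, FIELD_MAP)
```

Values stay as strings here; type conversion and validation would be a separate step.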

Creating a user interface that has the features of existing databanks but uses the BIEN 3 schema¶

Existing UIs are tied to their associated schemas and couldn't easily be re-used unless the BIEN 3 schema is very similar
This would make it difficult to build off of e.g. VegBank unless we build the schema off of VegBank, too
The alternative is to create a UI from scratch, but duplicating all the features of existing databanks would take much longer

Supporting normalized and denormalized data needs¶

If the BIEN 3 database is highly normalized, denormalized views would need to be maintained for common selections and projections of data for CSVs
If the BIEN 3 database is denormalized, data would be more difficult to store, maintain, and update

Database¶

Modifications to VegBank to get VegBIEN
Database will be PostgreSQL
UI will be Django?

One row per observation
Each scientist deletes the rows and columns they're not interested in
- select: which rows
- project: which columns
Validate all taxonomic info with TNRS
Differentiate between individual specimen locations and aggregate species abundances in a plot
- But could enter an ecological inventory (say, a count of plants of a particular species in a plot) as a specimen with a quantity, and enter an individual specimen with a quantity of one
Include tables for species traits and the taxonomic hierarchy
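The select/project workflow above can be illustrated with a small sketch over one-row-per-observation data. The helper names `select_project` and `to_csv` and the sample rows are hypothetical.

```python
import csv
import io


def select_project(rows, predicate, columns):
    """Select: keep rows matching predicate. Project: keep only the given columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def to_csv(rows, columns):
    """Write the projected rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample data, one row per observation.
observations = [
    {"plot": "Plot A", "species": "Quercus alba", "abundance": 4},
    {"plot": "Plot A", "species": "Acer rubrum", "abundance": 2},
]
subset = select_project(observations,
                        lambda r: r["species"] == "Quercus alba",
                        ["species", "abundance"])
```

Calling `to_csv(subset, ["species", "abundance"])` then produces the file a scientist would download.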

Hierarchical, normalized tables¶

similar to VegBank entity relationship diagram
- several dozen tables
"BIEN3 should be a highly normalized database more similar to the vegBank" (e-mail from Brad Boyle on 2011-10-17)
updates only need to change one value
but joins complex, potentially slow
views to include calculated values
expand VegBank schema with fields for specimens and time series data
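As a rough illustration of the trade-off above, the sketch below builds three normalized tables and a flattening view. It uses Python's built-in `sqlite3` module as a stand-in for Postgres, and the table and column names are hypothetical (the real VegBank schema has several dozen tables).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, scientific_name TEXT NOT NULL);
CREATE TABLE plot (plot_id INTEGER PRIMARY KEY, plot_name TEXT NOT NULL);
CREATE TABLE observation (
    observation_id INTEGER PRIMARY KEY,
    plot_id INTEGER NOT NULL REFERENCES plot(plot_id),
    taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
    abundance INTEGER NOT NULL
);
-- The view hides the joins from users who want one row per observation.
CREATE VIEW flat_observation AS
SELECT o.observation_id, p.plot_name, t.scientific_name, o.abundance
FROM observation AS o
JOIN plot AS p ON p.plot_id = o.plot_id
JOIN taxon AS t ON t.taxon_id = o.taxon_id;

INSERT INTO taxon VALUES (1, 'Quercus alba L');
INSERT INTO plot VALUES (1, 'Plot A'), (2, 'Plot B');
INSERT INTO observation VALUES (1, 1, 1, 4), (2, 2, 1, 7);
""")

# Correcting the name touches one row, yet every observation sees the change.
conn.execute("UPDATE taxon SET scientific_name = 'Quercus alba' WHERE taxon_id = 1")
rows = conn.execute(
    "SELECT plot_name, scientific_name, abundance "
    "FROM flat_observation ORDER BY observation_id"
).fetchall()
```

The cost is that every read of `flat_observation` performs two joins, which is where the potential slowness comes from.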

Flat, denormalized table¶

easily extract subset of rows and columns
no joins needed
but updates must update every occurrence of a value
triggers to fill in calculated values
- e.g. abundance in a plot when individual specimens are provided
- compute on select (PostgreSQL triggers fire only on data changes, so this would likely be a view) so that totals are not recalculated after each row insert
expand viewFullOccurrence to include columns from other data formats
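One way to get plot abundance from individual specimen rows at query time, instead of recalculating after every insert, is a grouping view. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Postgres; the `occurrence` table and its columns are hypothetical and much simpler than viewFullOccurrence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE occurrence (plot TEXT, species TEXT, quantity INTEGER);
-- Totals are computed only when the view is queried, not on insert.
CREATE VIEW plot_abundance AS
SELECT plot, species, SUM(quantity) AS abundance
FROM occurrence
GROUP BY plot, species;
""")

# Individual specimens have quantity 1; an inventory count has a larger quantity.
conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", [
    ("Plot A", "Quercus alba", 1),
    ("Plot A", "Quercus alba", 1),
    ("Plot A", "Acer rubrum", 1),
    ("Plot B", "Quercus alba", 3),
])

totals = conn.execute(
    "SELECT plot, species, abundance FROM plot_abundance ORDER BY plot, species"
).fetchall()
```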

User interface¶

Adapt University of Arizona Herbarium code to BIEN 3 schema¶

Brad is using this code to create a UI for the BIEN 2 database
Add additional search and query capabilities

Create entirely new database similar to existing ones¶

Use schema based on VegBank, Veg-X
Could be modeled after VegBank
Use a newer Java framework like Play

Expand VegBank to include data from other continents and additional observation info¶

VegBank import uses XML, so it can store any data that can be converted to VegBank XML
Data from other databases can be imported into VegBank
- This process can be automated to regularly add new data entered into the source databases
Cite source databank to address intellectual property concerns about storing external data
- The Atlas of Living Australia currently uses external datasets in a similar way
Disadvantage: VegBank uses Java Spring, which is outdated
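The automated, repeated import with source citation described above could work roughly as in this sketch. The record layout (a `modified` timestamp per record), the `source` tag, and the function name are assumptions; it does not use VegBank's actual XML import.

```python
from datetime import datetime


def import_new_records(source_name, source_records, store, last_sync):
    """Copy records modified after last_sync into store, citing their source.

    Returns the timestamp to use as last_sync on the next run.
    """
    newest = last_sync
    for rec in source_records:
        modified = rec["modified"]
        if modified > last_sync:
            # Tag every imported record with the databank it came from.
            store.append({**rec, "source": source_name})
            newest = max(newest, modified)
    return newest


# Made-up example data.
store = []
source_records = [
    {"id": 1, "modified": datetime(2011, 1, 1)},
    {"id": 2, "modified": datetime(2011, 6, 1)},
]
last_sync = import_new_records("DemoBank", source_records, store, datetime(2011, 3, 1))
```

Running the function again with the returned timestamp imports nothing new, so it is safe to schedule regularly.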

Create search tool that draws from existing databases but doesn't itself store data¶

Needs web service or web page scraper for each database that information is pulled from
A central server could cache data from users' queries in a local database
Alternatively, a program could run on the user's machine (Java applet, etc.) to avoid burdening a server with all the queries
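The central-server variant with a local cache might be sketched as below. Each entry in `fetchers` is a hypothetical callable standing in for a web service client or page scraper, and the cache is an in-memory SQLite table rather than a production database.

```python
import json
import sqlite3


class CachingSearch:
    """Query several source databases, caching each result locally."""

    def __init__(self, fetchers):
        # fetchers: dict mapping source name -> callable(query) -> JSON-serializable data
        self.fetchers = fetchers
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE cache (source TEXT, query TEXT, result TEXT, "
            "PRIMARY KEY (source, query))"
        )

    def search(self, query):
        results = {}
        for source, fetch in self.fetchers.items():
            row = self.db.execute(
                "SELECT result FROM cache WHERE source = ? AND query = ?",
                (source, query),
            ).fetchone()
            if row is None:
                # Cache miss: ask the source, then remember the answer.
                data = fetch(query)
                self.db.execute(
                    "INSERT INTO cache VALUES (?, ?, ?)",
                    (source, query, json.dumps(data)),
                )
                results[source] = data
            else:
                results[source] = json.loads(row[0])
        return results
```

This sketch never expires cached entries; a real tool would need some refresh policy so that changes in the source databases eventually show up.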

Effect of climate¶

Add climate metadata to plots¶

Fields: temperature, rainfall, etc.
Use plot coordinates to pull this data from other data sources that have it: Climate data
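Pulling climate values by plot coordinates could look like the following sketch, which picks the nearest cell of a gridded climate dataset. The grid, its values, and the field names are made up, and distance is measured in plain degrees, which is a simplification that ignores the Earth's curvature.

```python
import math

# Hypothetical climate grid: (lat, lon) of cell centre -> climate values (made-up numbers).
CLIMATE_GRID = {
    (35.5, -79.5): {"mean_temp_c": 15.0, "annual_rain_mm": 1100.0},
    (36.5, -79.5): {"mean_temp_c": 14.0, "annual_rain_mm": 1050.0},
}


def nearest_climate(lat, lon, grid):
    """Return the climate values of the grid cell closest to (lat, lon)."""
    key = min(grid, key=lambda cell: math.hypot(cell[0] - lat, cell[1] - lon))
    return grid[key]


def add_climate(plot, grid):
    """Return a copy of the plot record with climate fields attached."""
    return {**plot, **nearest_climate(plot["latitude"], plot["longitude"], grid)}


plot = {"name": "Plot A", "latitude": 35.9, "longitude": -79.0}
enriched = add_climate(plot, CLIMATE_GRID)
```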

Overlay climate maps with species location maps¶

Shows trends visually
