Solutions¶
Technical challenges¶
Integrating plots and specimens¶
- Enter specimen as species count with abundance = 1 and plot size = coordinates precision?
- Flag species counts and specimens as different types of data
- Dealing with relevés from TurboVeg
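The first two options above could be combined roughly as follows. This is only a sketch: the `Observation` class, its field names, and the helper functions are hypothetical and not part of any existing BIEN schema.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One record in a hypothetical unified table (all names are illustrative)."""
    species: str
    abundance: int      # number of individuals represented by this record
    plot_size: float    # plot size, or coordinate precision for a specimen
    record_type: str    # flag: "specimen" or "species_count"


def from_specimen(species: str, coordinate_precision: float) -> Observation:
    # A specimen is stored as a species count of exactly one individual,
    # with the coordinate precision standing in for the plot size.
    return Observation(species, 1, coordinate_precision, "specimen")


def from_plot_count(species: str, count: int, plot_size: float) -> Observation:
    # A plot inventory keeps its real count and plot size.
    return Observation(species, count, plot_size, "species_count")


records = [
    from_specimen("Quercus alba", 30.0),
    from_plot_count("Quercus alba", 12, 20.0),
]
total = sum(r.abundance for r in records)
```

Keeping the `record_type` flag means the two kinds of data can share one table without losing the distinction between them.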
Obtaining data from each databank¶
- Very few databanks have a unified, exportable format
- Even fewer databanks have a web service
- Some databanks require a login or a request form to access data
Merging widely varying formats of data¶
- Common formats: XML, CSV, HTML tables, map images
- We really want access to the actual database behind each databank, so that we can directly extract the information it's storing, but this may be difficult to obtain
- Each data source will need a potentially complex script to convert it to the main BIEN 3 format
- The BIEN 3 format will need to be very complex to contain all the data in existing data formats
- A one-time export from each databank is not enough if we want to capture future changes to the databanks
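A per-source conversion script might look something like the sketch below. The target columns, the source column names (`ScientificName`, `Lat`, `Long`, `Count`), and the sample data are all made up for illustration; the real BIEN 3 format is not defined here.

```python
import csv
import io

# Hypothetical target columns, standing in for the eventual BIEN 3 format.
TARGET_FIELDS = ["source", "species", "latitude", "longitude", "abundance"]


def convert_example_csv(text: str, source_name: str) -> list[dict]:
    """Map one made-up source CSV layout onto the target columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "source": source_name,
            "species": raw["ScientificName"].strip(),
            "latitude": float(raw["Lat"]),
            "longitude": float(raw["Long"]),
            # Rows without a count are treated as single specimens.
            "abundance": int(raw.get("Count") or 1),
        })
    return rows


sample = (
    "ScientificName,Lat,Long,Count\n"
    "Pinus ponderosa ,35.2,-111.6,4\n"
    "Acer rubrum,42.1,-72.5,\n"
)
converted = convert_example_csv(sample, "example_databank")
```

Each databank would need its own function of this kind, and re-running it on a schedule is one way to pick up later changes to the source.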
Creating a user interface that has the features of existing databanks but uses the BIEN 3 schema¶
- Existing UIs are tied to their associated schemas and couldn't easily be re-used unless the BIEN 3 schema is very similar
- This would make it difficult to build off of e.g. VegBank unless we build the schema off of VegBank, too
- The alternative is to create a UI from scratch, but duplicating all the features of existing databanks this way will take much longer
Supporting normalized and denormalized data needs¶
- If the BIEN 3 database is highly normalized, denormalized views would need to be maintained for the common selections and projections of data that are exported as CSVs
- If the BIEN 3 database is denormalized, data would be more difficult to store, maintain, and update
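One way to serve both needs is to keep the stored tables normalized and expose a flat view for CSV export. The sketch below uses SQLite from Python only so that it runs without setup; the table, column, and view names are hypothetical, and the same idea would carry over to Postgres.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plot (plot_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, species TEXT);
CREATE TABLE observation (
    obs_id INTEGER PRIMARY KEY,
    plot_id INTEGER REFERENCES plot(plot_id),
    taxon_id INTEGER REFERENCES taxon(taxon_id),
    abundance INTEGER
);
-- Denormalized view: one flat row per observation, ready for CSV export.
CREATE VIEW observation_flat AS
    SELECT o.obs_id, p.name AS plot, t.species, o.abundance
    FROM observation o
    JOIN plot p ON p.plot_id = o.plot_id
    JOIN taxon t ON t.taxon_id = o.taxon_id;
INSERT INTO plot VALUES (1, 'Plot A');
INSERT INTO taxon VALUES (1, 'Quercus alba'), (2, 'Acer rubrum');
INSERT INTO observation VALUES (1, 1, 1, 12), (2, 1, 2, 3);
""")

# Write the view out as CSV, header row first.
cursor = conn.execute("SELECT * FROM observation_flat ORDER BY obs_id")
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([col[0] for col in cursor.description])
writer.writerows(cursor)
csv_text = buffer.getvalue()
```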
Database¶
- Modifications to VegBank to get VegBIEN
- db will be Postgres
- UI will be Django?
- One row per observation
- Each scientist deletes the rows and columns they're not interested in
- select: which rows
- project: which columns
- Validate all taxonomic info with TNRS
- Differentiate between individual specimen locations and aggregate species abundances in a plot
- But could enter an ecological inventory (say, a count of plants of a particular species in a plot) as a specimen with a quantity, and enter an individual specimen with a quantity of one
- Include tables for species traits and the taxonomic hierarchy
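The select and project operations mentioned above can be illustrated on a handful of one-row-per-observation records. The column names and values here are invented for the example.

```python
# One row per observation; values are made up.
rows = [
    {"species": "Quercus alba", "plot": "A", "abundance": 12, "year": 2009},
    {"species": "Acer rubrum", "plot": "A", "abundance": 3, "year": 2009},
    {"species": "Quercus alba", "plot": "B", "abundance": 1, "year": 2010},
]


def select(rows, predicate):
    """Select: keep only the rows the scientist is interested in."""
    return [r for r in rows if predicate(r)]


def project(rows, columns):
    """Project: keep only the columns the scientist is interested in."""
    return [{c: r[c] for c in columns} for r in rows]


subset = project(
    select(rows, lambda r: r["plot"] == "A"),
    ["species", "abundance"],
)
```

In SQL terms, `select` corresponds to the `WHERE` clause and `project` to the column list of a query.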
Hierarchical, normalized tables¶
- similar to VegBank entity relationship diagram
- several dozen tables
- "BIEN3 should be a highly normalized database more similar to the vegBank" (e-mail from Brad Boyle on 2011-10-17)
- updates only need to change one value
- but joins are complex and potentially slow
- views to include calculated values
- expand VegBank schema with fields for specimens and time series data
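The update trade-off can be made concrete by correcting a misspelled species name in a normalized layout and in a flat copy of the same data. This is a sketch in SQLite with hypothetical table names, not the VegBank schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: the species name is stored once and referenced by id.
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, species TEXT);
CREATE TABLE observation (obs_id INTEGER PRIMARY KEY, taxon_id INTEGER, plot TEXT);
-- Flat: the species name is repeated on every row.
CREATE TABLE observation_flat (obs_id INTEGER PRIMARY KEY, species TEXT, plot TEXT);
INSERT INTO taxon VALUES (1, 'Quercus albba');
INSERT INTO observation VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'C');
INSERT INTO observation_flat VALUES
    (1, 'Quercus albba', 'A'), (2, 'Quercus albba', 'B'), (3, 'Quercus albba', 'C');
""")

# Fix the spelling in both layouts and count the rows each update touches.
normalized_rows = conn.execute(
    "UPDATE taxon SET species = 'Quercus alba' WHERE species = 'Quercus albba'"
).rowcount
flat_rows = conn.execute(
    "UPDATE observation_flat SET species = 'Quercus alba' "
    "WHERE species = 'Quercus albba'"
).rowcount

# Reading the normalized data back requires a join.
joined = conn.execute(
    "SELECT o.obs_id, t.species FROM observation o "
    "JOIN taxon t USING (taxon_id) ORDER BY o.obs_id"
).fetchall()
```

The normalized update changes one row, while the flat update changes one row per observation; the price is the join needed to read the data back.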
Flat, denormalized table¶
- easily extract subset of rows and columns
- no joins needed
- but updates must update every occurrence of a value
- triggers to fill in calculated values
- e.g. abundance in a plot when individual specimens are provided
- run on select so that totals are not calculated after each row insert
- expand viewFullOccurrence to include columns from other data formats
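Calculating abundance at query time rather than on every insert could be approximated with a view, as in the sketch below. A view is swapped in here for the trigger mentioned above, because SQLite has no triggers that fire on select; the `occurrence` table and `plot_abundance` view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE occurrence (plot TEXT, species TEXT, specimen_id TEXT);
-- Abundance is derived when the view is queried, not when rows are inserted.
CREATE VIEW plot_abundance AS
    SELECT plot, species, COUNT(*) AS abundance
    FROM occurrence
    GROUP BY plot, species;
""")

# Individual specimens, one row each.
conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", [
    ("A", "Quercus alba", "s1"),
    ("A", "Quercus alba", "s2"),
    ("A", "Acer rubrum", "s3"),
])

abundance = {
    (plot, species): n
    for plot, species, n in conn.execute(
        "SELECT plot, species, abundance FROM plot_abundance"
    )
}
```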
User interface¶
Adapt University of Arizona Herbarium code to BIEN 3 schema¶
- Brad is using this code to create a UI for the BIEN 2 database
- Add additional search and query capabilities
Create entirely new database similar to existing ones¶
- Use schema based on VegBank, Veg-X
- Could be modeled after VegBank
- Use a newer Java framework like Play
Expand VegBank to include data from other continents and additional observation info¶
- VegBank import uses XML, so it can store any data that can be converted to VegBank XML
- Data from other databases can be imported into VegBank
- This process can be automated to regularly add new data entered into the source databases
- Cite source databank to address intellectual property concerns about storing external data
- The Atlas of Living Australia currently uses external datasets in a similar way
- Disadvantage: VegBank uses Java Spring, which is outdated
Create search tool that draws from existing databases but doesn't itself store data¶
- Needs web service or web page scraper for each database that information is pulled from
- A central server could cache data from users' queries in a local database
- Alternatively, a program could run on the user's machine (Java applet, etc.) to avoid burdening a server with all the queries
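The central-server variant with a local cache might be sketched as follows. The `CachingSearch` class and the `fake_databank` fetcher are hypothetical stand-ins; a real fetcher would call a databank's web service or scrape its pages.

```python
import json
import sqlite3


class CachingSearch:
    """Query several sources and cache each answer in a local database."""

    def __init__(self, fetchers):
        # fetchers maps a source name to a function(species) -> list of records
        self.fetchers = fetchers
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE cache (source TEXT, query TEXT, result TEXT, "
            "PRIMARY KEY (source, query))"
        )

    def search(self, species):
        results = []
        for source, fetch in self.fetchers.items():
            row = self.db.execute(
                "SELECT result FROM cache WHERE source = ? AND query = ?",
                (source, species),
            ).fetchone()
            if row is None:
                # Cache miss: ask the source, then remember the answer.
                records = fetch(species)
                self.db.execute(
                    "INSERT INTO cache VALUES (?, ?, ?)",
                    (source, species, json.dumps(records)),
                )
            else:
                records = json.loads(row[0])
            results.extend(records)
        return results


calls = []


def fake_databank(species):
    # Stand-in for a web service call or page scraper.
    calls.append(species)
    return [{"species": species, "plot": "A"}]


search = CachingSearch({"fake": fake_databank})
first = search.search("Quercus alba")
second = search.search("Quercus alba")  # served from the cache
```

A real cache would also need an expiry policy so that changes in the source databanks eventually show up.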
Effect of climate¶
Add climate metadata to plots¶
- Fields: temperature, rainfall, etc.
- Use plot coordinates to pull this data from other data sources that have it (see Climate data)
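Attaching climate fields to a plot by its coordinates could work along these lines. The grid, its values, and the field names are invented for the example, and distance is measured in plain degrees, which is a simplification.

```python
import math

# Hypothetical climate grid: (latitude, longitude) of each cell centre
# mapped to made-up climate values.
CLIMATE_GRID = {
    (35.0, -112.0): {"temperature_c": 9.5, "rainfall_mm": 550},
    (35.0, -111.0): {"temperature_c": 11.0, "rainfall_mm": 480},
    (36.0, -112.0): {"temperature_c": 8.0, "rainfall_mm": 600},
}


def nearest_climate(lat, lon):
    """Return the climate values of the grid cell closest to the point."""
    key = min(
        CLIMATE_GRID,
        key=lambda cell: math.hypot(cell[0] - lat, cell[1] - lon),
    )
    return CLIMATE_GRID[key]


plot = {"name": "Plot A", "latitude": 35.2, "longitude": -111.6}
# Copy the climate fields onto the plot record as metadata.
plot.update(nearest_climate(plot["latitude"], plot["longitude"]))
```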
Overlay climate maps with species location maps¶
- Shows trends visually