2014-02-24 working group
Brian E's notes:
decisions
BIEN3 beta release
- release BIEN3 to BIEN group in April(?)
data provider feedback
- want to improve this (Peter J)
new data
- want a "perpetual data machine" (Bob)
- want to be able to add new datasources easily
to do for Aaron
documentation
- create tutorials about work done
- create tutorial for how to add specimens and plots data
traits
- refresh traits data from bien2.trait_staging and reimport to VegBIEN once updated data from Cyrille is available
new data
- make it easy to add new datasources
other to dos
new data
- need to find people who will add data after October (Martha)
analytical database
- develop specific use cases for it
responses to questions asked
Action Items
BIEN committees
(1) Data Attribution group
For each data set, identify public, private, or intermediate use
We have already done this, and the results have been compiled in a *Google spreadsheet*.
Main BIEN3 Database comments/suggestions
Bob
How to get quicker turnaround in future?
Automating and simplifying key parts of the *mapping* and *import process* would help enormously with this.
Main short term goals before BIEN3 release
1. Ability to grow the database
Add new data sources reasonably quickly
(see above)
5. Data access
Direct access to database
We have a page with *instructions on how to access the database*. It is also possible to *log in without a password* to view just the database structure.
Detailed notes of the meeting discussion
Work so far has been delayed
main issue is getting bogged down in data validations
We could speed things up by allowing the wider BIEN group (or even the wider ecological community) to help test the database. This would involve making an alpha release, which scientists and data providers could then use and test, with the caveat that there may be errors in the data. (However, such errors are not likely to be any more significant than the errors in BIEN2.)
Data have been loaded since early 2013?
Yes, that's correct. We have been running the full-database import regularly since 3/2012 [1], and have had all the BIEN2 datasources loaded since 9/2012 [2].
[1] *import stats* > current tab > leftmost ("By row") import on 2012-3-22 (with 9 datasources)
[2] *import stats* > 2012-6~9 tab > 2012-9-7 import (with 19 datasources)
BIEN3 – public release
We need to discuss how this public release will look
I would suggest releasing the *denormalized, full_occurrence-style table*, as this will be in the most useful format for scientists.
Future of BIEN3? Where will it be?
Move over to iPlant? Or stay at NCEAS?
It would be good to have a mirror of the database at iPlant, for faster access by iPlant/UArizona scientists. It would also be a good idea to have an off-site backup on the iPlant servers.
Discussion about the importance of training the next generation of informaticians! Can’t just rely on Brad and Aaron for BIEN development.
To avoid a dependency on a small # of people for all development, we should train iPlant personnel on how to maintain and add data to the database.
We would like Aaron to capture the essence of the work done by creating ‘Tutorials’: documents that walk clearly and easily through how to understand what he has done.
We already have some of these kinds of tutorials:
- Adding a normalized flat-file datasource
- Mapping a new table in a normalized SQL datasource
- Import steps
- *Full database import*
BIEN 3, what are the new results? BIEN3 has filled in many of our data holes.
There are approximately 350K(?) species in BIEN3 post-scrubbing.
Note that this includes no-opinion names.
BIEN 3 Data release
Need a list of data sources that are public, not public, or intermediate?
We have already done this, and the results have been compiled in a *Google spreadsheet*.
BIEN Feedback to data providers
We could be a data cleaning service
Even better, we could be a data standardization service, which puts custom data into a standard exchange schema (VegCore).
BIEN3 pipeline . . . still somewhat modular
Geovalidation was done at iPlant
Actually, the geovalidation is done on the NCEAS servers in the same VM as the import. The confusion may be because some work was done on the geovalidation pipeline by an iPlant developer.