
2014-02-24 working group

Brian E's notes:

decisions

BIEN3 beta release

  • release BIEN3 to BIEN group in April(?)

data provider feedback

  • want to improve this (Peter J)

new data

  • want a "perpetual data machine" (Bob)
  • want to be able to add new datasources easily

to do for Aaron

documentation

  • create tutorials about work done
  • create tutorial for how to add specimens and plots data

traits

  • refresh traits data from bien2.trait_staging and reimport to VegBIEN, once updated data from Cyrille is available

new data

  • make it easy to add new datasources

other to-dos

new data

  • need to find people who will add data after October (Martha)

analytical database

  • develop specific use cases for it

responses to questions asked

Action Items

BIEN committees

(1) Data Attribution group

For each dataset, identify public, private, or intermediate use

We have already done this, and the results have been compiled in a *Google spreadsheet*.

Main BIEN3 Database comments/suggestions

Bob

How to get quicker turnaround in future?

Automating and simplifying key parts of the *mapping* and *import process* would help enormously with this.

Main short term goals before BIEN3 release

1. Ability to grow the database

Add new data sources reasonably quickly

(see above)

5. Data access

Direct access to database

We have a page with *instructions on how to access the database*. It is also possible to *log in without a password* to view just the database structure.

Detailed notes of the meeting discussion

Work so far has been delayed

Main issue: getting bogged down in data validations

We could speed things up by allowing the wider BIEN group (or even the wider ecological community) to help test the database. This would involve making an alpha release, which scientists and data providers could then use and test, with the caveat that there may be errors in the data. (However, such errors are not likely to be any more significant than the errors in BIEN2.)

Data have been loaded since early 2013?

Yes, that's correct. We have been running the full-database import regularly since 3/2012 [1], and have had all the BIEN2 datasources loaded since 9/2012 [2].

[1] *import stats* > current tab > leftmost ("By row") import on 2012-3-22 (with 9 datasources)

[2] *import stats* > 2012-6~9 tab > 2012-9-7 import (with 19 datasources)

BIEN3 – public release

We need to discuss how this public release will look

I would suggest releasing the *denormalized, full_occurrence-style table*, as this will be in the most useful format for scientists.

Future of BIEN3? Where will it be?

Move over to iPlant? Or stay at NCEAS?

It would be good to have a mirror of the database at iPlant, for faster access by iPlant/UArizona scientists. It would also be a good idea to have an off-site backup on the iPlant servers.

Discussion about the importance of training the next generation of informaticians! We can't just rely on Brad and Aaron for BIEN development.

To avoid depending on a small number of people for all development, we should train iPlant personnel to maintain the database and add data to it.

We would like Aaron to capture the essence of the work done by creating tutorials: documents that walk clearly and easily through what he has done, so that others can understand it.

We already have some of these kinds of tutorials:

BIEN3: what are the new results? BIEN3 has filled in many of our data holes.

There are approximately 350K(?) species in BIEN3 after scrubbing.

Note that this includes no-opinion names.

BIEN3 data release

Need a list of data sources that are public, not public, or intermediate?

We have already done this, and the results have been compiled in a *Google spreadsheet*.

BIEN feedback to data providers

We could be a data cleaning service

Even better, we could be a data standardization service, which puts custom data into a standard exchange schema (VegCore).
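As a rough illustration of what "standardization" means here, the Python sketch below renames one provider's custom column names to shared exchange-schema terms and sets aside any columns it cannot map, so they can be reviewed rather than silently dropped. The provider column names, the schema term names, and the `standardize_row` function are all hypothetical; they are not taken from the actual VegCore schema or the BIEN import code.

```python
# Hypothetical mapping from one provider's column names to standard
# exchange-schema terms. The term names on the right are illustrative
# placeholders, NOT the actual VegCore term names.
PROVIDER_TO_STANDARD = {
    "Species": "scientificName",
    "Lat": "latitude",
    "Long": "longitude",
    "CollDate": "collectionDate",
}


def standardize_row(row, mapping=PROVIDER_TO_STANDARD):
    """Rename a provider's columns to standard terms (hypothetical helper).

    Columns with no entry in the mapping are collected under an
    'unmapped' key so that nothing is silently dropped.
    """
    standardized = {}
    unmapped = {}
    for column, value in row.items():
        if column in mapping:
            standardized[mapping[column]] = value
        else:
            unmapped[column] = value
    if unmapped:
        standardized["unmapped"] = unmapped
    return standardized


if __name__ == "__main__":
    raw = {"Species": "Quercus alba", "Lat": "35.1", "Long": "-80.2", "Habitat": "forest"}
    print(standardize_row(raw))
```

Keeping the mapping as data rather than code means that adding a new provider would, in this sketch, only require writing a new mapping table.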

BIEN3 pipeline... still somewhat modular

Geovalidation was done at iPlant

Actually, the geovalidation is done on the NCEAS servers in the same VM as the import. The confusion may be because some work was done on the geovalidation pipeline by an iPlant developer.