Deliverables for 2013-10-31

with modifications by Mark

underlined items are tasks; crossed out tasks have been completed

Our goal is to produce a useful information resource that vegetation ecologists can use to study broad-scale, even global, patterns in plant distribution and abundance. To achieve this goal, at this advanced stage in the BIEN DB's development it is most important that we assess the accuracy of the database; perform usability testing to ensure that the database contains information that meets scientists' research needs; and verify that those data can be delivered in formats accessible to the researchers [requires delivering an extract to each scientist in the format requested, to test the full extract-generation pipeline].

In the upcoming weeks, then, we must identify scientists who are knowledgeable about BIEN3 data and need to use it, and ask them both which attributes they need for their analyses and which attributes, if any, are missing from the confederated data sources. To be clear, our confederation schema cannot accommodate every attribute of every ingested dataset: attempting to do so led to continual refactoring of the schema, with attendant delays for coding and reloading the data. We must, however, be confident that we have captured the essential details of all the datasources currently merged into BIEN.

By the 10/31 deadline, we hope to have identified several scientists who will participate in usefulness/usability¹ testing and (depending on their availability) to have received specific lists of the attributes they need from BIEN for their analyses. We also plan to have captured by that time any critical attributes from individual datasources that have been identified in our weekly discussions, such as the geovalid filter and the globally unique occurrenceID [needs DB reload].

The attribute lists will not only inform the usefulness/usability¹ testing, but will also help focus the datasource validations on just those columns that are most important. Because datasource validations are particularly time-consuming, we would need to prioritize them carefully in order to make useful progress by the 10/31 deadline. Key priorities would include fixing already-identified issues (in VegBank, Madidi, ARIZ, U, and TEX) and fulfilling those validation feature requests that relate to a specific analysis from the usability testing. These should take priority over preparing first-round extracts for datasources that have not yet been validated (MO, which needs revalidation after its refresh; GBIF; FIA; and CVS), because we would not be able to find and fix issues in those datasources in time for the 10/31 deadline [Martha and Brad changed the priorities for data validation so that this ordering no longer applies].

¹ usability: whether users can get what they want (an objective question)
  usefulness: the intrinsic value of what is provided (a subjective question)