Grant deliverables
remaining
- making the data and database publicly accessible
- services for uploading (and re-uploading) the data
- improvements to services for accessing the data
- speed up querying: don't count the number of rows in the result set, because counting requires the entire result set to be materialized (very slow) just to display the first few rows (very fast)
- query canceling when user presses the Stop button
  - currently, you need to use the process list and pg_cancel_backend() to kill orphaned queries (similar to how it works in phpMyAdmin for MySQL)
- query timeouts
- validation
- not mentioned specifically in the grant; more likely something that specific users of the DB would request for specific columns, once the DB is released
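
The querying speed-up listed under remaining can be sketched as follows. This is a minimal illustration, not the project's implementation: it uses Python's built-in sqlite3 module as a stand-in for the actual database, and the `observation` table and the `fetch_first_page` helper are hypothetical names. Instead of counting all rows, it requests one row more than the page size; the presence of that extra row shows that more results exist.

```python
import sqlite3


def fetch_first_page(conn, sql, params=(), page_size=10):
    """Return (rows, has_more) without counting the full result set."""
    # Ask for one extra row: if it comes back, more rows exist,
    # so no COUNT(*) over the whole result set is needed.
    limited_sql = f"SELECT * FROM ({sql}) AS q LIMIT ?"
    cur = conn.execute(limited_sql, (*params, page_size + 1))
    rows = cur.fetchall()
    return rows[:page_size], len(rows) > page_size


# Hypothetical demo table standing in for a large observation table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (id INTEGER PRIMARY KEY, taxon TEXT)")
conn.executemany(
    "INSERT INTO observation (taxon) VALUES (?)",
    [(f"taxon_{i}",) for i in range(25)],
)

rows, has_more = fetch_first_page(
    conn, "SELECT id, taxon FROM observation ORDER BY id", page_size=10
)
print(len(rows), has_more)
```

The interface can then show a "more results" link instead of a total count. Queries that need a full sort or aggregate may still read every row, so this sketch only removes the cost of counting and transferring the complete result set.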
from the *iPlant grant description*
crossed out tasks have been completed and underlined tasks are left
- create a -web-accessible database- [however, iPlant most likely also intended the database to be publicly accessible] in support of the Botanical Information and Ecology Network project. The envisioned database must accommodate multiple-millions of records of plant biodiversity data [we have over 80 million plant observations], including information about their taxonomic identity [scrubbed name and ranks], geospatial location [placename and coordinates], time of sampling [date collected], as well as potentially related information regarding co-occurrence with other taxa [the plot ID of each sampled taxon can be used to find co-occurrences], sampling methodologies [sampling protocol, plot area], functional traits [diameter, height, growth form, custom traits], and associated environmental measurements [temperature, precipitation]
- developing an integrated information resource by merging several well-established plant occurrence information resources, including specimen data from various natural history and botanical museums, as well as plots data;
  - we have merged 30 datasources (22 with specimens and 8 with plots) into a unified schema
- creating useful and appealing web interfaces and services for uploading and -accessing these data- [see also example query results; desired improvements listed above under remaining] for quantitative investigations of plant biodiversity;
- merging this information resource with other services and tools under development within the iPlant cyberinfrastructure, such as for taxonomic name resolution or geospatial quality control, as well as related efforts;
  - we have integrated directly with the TNRS web service for taxonomic name resolution. we have our own geovalidation tools, so there is no need to integrate with an iPlant service for this.
- planning for the architecture of this framework to be compatible with emerging data confederations in the earth and life sciences, such as DataONE and/or the Data Conservancy
  - we are published in the KNB, which is a DataONE node
- enabling this resource to be extensible to accommodate the growing array of relevant information useful for biodiversity research, including but not limited to information about geospatial and environmental context, plant phylogenies, and associated genomic and functional trait data.
  - the database is reloadable, which allows the schema to be expanded
- The DBD activities will also coordinate activities with the complementary activities in plant sciences [we have integrated several existing data aggregators into our database, such as GBIF and VegBank], including especially developments in geospatial intelligence technologies underway through the Environments & Organisms Working Group collaboration between iPlant and NCEAS [we use geovalidation tools from Jim Regetz, who also worked on the Environments & Organisms project]
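
The note above that the plot ID of each sampled taxon can be used to find co-occurrences can be illustrated with a small sketch. It is not taken from the BIEN code: the `co_occurrences` helper, the plot IDs, and the sample taxa are all hypothetical, and the records are assumed to be simple (plot ID, scrubbed taxon name) pairs.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrences(observations):
    """Map each unordered taxon pair to the set of plot IDs where both occur."""
    # Group the taxa recorded in each plot.
    taxa_by_plot = defaultdict(set)
    for plot_id, taxon in observations:
        taxa_by_plot[plot_id].add(taxon)

    # Every pair of taxa sharing a plot co-occurs in that plot.
    pairs = defaultdict(set)
    for plot_id, taxa in taxa_by_plot.items():
        for a, b in combinations(sorted(taxa), 2):
            pairs[(a, b)].add(plot_id)
    return dict(pairs)


# Hypothetical sample records: (plot ID, scrubbed taxon name).
observations = [
    ("plot1", "Quercus alba"),
    ("plot1", "Acer rubrum"),
    ("plot2", "Acer rubrum"),
    ("plot2", "Quercus alba"),
    ("plot2", "Pinus strobus"),
    ("plot3", "Pinus strobus"),
]

result = co_occurrences(observations)
for pair, plots in sorted(result.items()):
    print(pair, sorted(plots))
```

Sorting the taxa before pairing keeps each pair in a single canonical order, so ("Acer rubrum", "Quercus alba") and its reverse are not counted separately.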
additional deliverables from Mark
- validation
- "strong enough confidence that there are no egregious errors in how the various datasources are confederated within BIEN3" (Mark)
- materialized views
  - later, maybe add a separate view for just the plot taxa
- attribution