2011 working group, Thursday: BIEN Components
- different classes of validation
- required formats for data to be enterable into the database: db constraints (see the constraint sketch below)
- use cases related to analytical database: what does analyst want to get out of db
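A rough illustration of the db-constraints idea; this is not the actual BIEN schema, the table, column names, and formats are hypothetical, and SQLite stands in for the real backend so the example is self-contained:

```python
# Hypothetical sketch: enforcing "required formats" as database constraints,
# so malformed records are rejected at load time (not real BIEN schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE occurrence (
    occurrence_id   INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,                       -- a name must be present
    latitude        REAL CHECK (latitude  BETWEEN -90  AND 90),
    longitude       REAL CHECK (longitude BETWEEN -180 AND 180),
    event_date      TEXT CHECK (event_date IS NULL OR
                                event_date GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')
);
""")

# A record violating the latitude constraint is rejected by the database itself.
try:
    conn.execute("INSERT INTO occurrence VALUES (1, 'Quercus alba', 123.0, -71.1, '2011-05-14')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```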
- user interface in scope?
- more interest from sci community w/ interface
- different from having a web map server serve shapefiles
- huge pipeline to John's map-generating iPlant component
- easy to use interface -> more interest -> more data
- users -> API -> data -> API -> tools, users
- decouple impl from database
- robust core db w/ clean APIs
- UI: tools themselves?
- web-based search tool?
- map to look at locations
- UI implements API
- UI for data upload
- direct MySQL access vs. an API
- need public access point in some form
- API abstracts database backend: whether it's MySQL or Postgres, etc.
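A minimal sketch of what decoupling the public access point from the storage engine could look like; the class and function names are hypothetical, and SQLite stands in for whatever backend (MySQL, Postgres, etc.) is actually used:

```python
# Hypothetical sketch: the API layer depends on an interface, not on a
# specific database engine, so the backend can be swapped without touching
# UIs or tools built on the API.
import sqlite3
from abc import ABC, abstractmethod

class OccurrenceStore(ABC):
    """What the API layer depends on -- not a specific database."""
    @abstractmethod
    def occurrences_by_species(self, name):
        ...

class SQLiteStore(OccurrenceStore):
    def __init__(self, conn):
        self.conn = conn

    def occurrences_by_species(self, name):
        rows = self.conn.execute(
            "SELECT latitude, longitude FROM occurrence WHERE scientific_name = ?",
            (name,))
        return [{"latitude": lat, "longitude": lon} for lat, lon in rows]

def api_get_occurrences(store, species):
    """Public endpoint: talks only to the interface, never to MySQL/Postgres directly."""
    return store.occurrences_by_species(species)

# Demo wiring with an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurrence (scientific_name TEXT, latitude REAL, longitude REAL)")
conn.execute("INSERT INTO occurrence VALUES ('Quercus alba', 42.3, -71.1)")
print(api_get_occurrences(SQLiteStore(conn), "Quercus alba"))
```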
- range maps: already run and produced as an endpoint
- cron job to produce the analytical db regularly (behind the API)
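A possible shape for that scheduled rebuild, assuming hypothetical table names and an SQLite stand-in for the real backend; cron would simply invoke a script like this:

```python
# Hypothetical sketch of a nightly job that regenerates the analytical
# (reporting) db behind the API. Not BIEN code; table names are made up.
# example crontab line:  0 2 * * *  python3 refresh_analytical.py
import sqlite3
from datetime import date

def refresh_analytical(conn):
    # Rebuild the reporting table from the core data...
    conn.executescript("""
        DROP TABLE IF EXISTS species_summary;
        CREATE TABLE species_summary AS
            SELECT scientific_name,
                   COUNT(*)      AS n_occurrences,
                   MIN(latitude) AS min_lat,
                   MAX(latitude) AS max_lat
            FROM occurrence
            GROUP BY scientific_name;
    """)
    # ...and stamp the build date so each regeneration is an identifiable version.
    conn.execute("CREATE TABLE IF NOT EXISTS analytical_build (built_on TEXT)")
    conn.execute("INSERT INTO analytical_build VALUES (?)", (date.today().isoformat(),))
    conn.commit()
```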
- use cases are what we want to retrieve from db
- timestamping and versioning
- analytical dbs have versions generated at regular intervals
- timestamp and archive download
- continuous taxonomic updating of core db: track changes?
- timestamp as much as possible, but sometimes data is dynamic (GBIF query)
- if people are only getting data through the endpoint, don't need minute-to-minute versioning
- reporting db is from day before (generated daily)
- don't keep old versions of it
- requirement: data used in papers must have stamped versions
- each refresh of the endpoint creates a new version
- do users need to wait to see new data entered?
- allow querying both the live and the snapshotted db (snapshot sketch below)
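One way the live-vs-snapshot idea could work, sketched with hypothetical table names: each refresh can also write a dated copy that stays fixed and citable while the live table keeps changing:

```python
# Hypothetical sketch: keep a citable, timestamped snapshot alongside the
# live data so a paper can reference the exact version it used.
import sqlite3
from datetime import datetime, timezone

def snapshot_table(conn, table="species_summary"):
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    snap = "{}_{}".format(table, stamp)                    # e.g. species_summary_20111015
    conn.execute("CREATE TABLE {} AS SELECT * FROM {}".format(snap, table))
    conn.commit()
    return snap                                            # this dated name is what gets cited

# Queries then target the live table for the newest data, or a snapshot table
# for repeatable, citable results.
```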
- version data points rather than whole db
- e.g. species lat/longs
- track record-by-record changes
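A hypothetical schema sketch for versioning individual data points (e.g. a corrected lat/long) instead of the whole db: every revision of a record is kept, and the current one is flagged:

```python
# Hypothetical record-level versioning sketch (not the BIEN schema):
# corrections append a new version; old versions are never overwritten.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE occurrence_version (
        occurrence_id INTEGER NOT NULL,
        version       INTEGER NOT NULL,
        latitude      REAL,
        longitude     REAL,
        changed_on    TEXT,
        is_current    INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (occurrence_id, version)
    )""")

def correct_coordinates(conn, occ_id, lat, lon, changed_on):
    # Retire the current version, then append the correction as a new version.
    conn.execute("UPDATE occurrence_version SET is_current = 0 "
                 "WHERE occurrence_id = ? AND is_current = 1", (occ_id,))
    (next_ver,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM occurrence_version "
        "WHERE occurrence_id = ?", (occ_id,)).fetchone()
    conn.execute("INSERT INTO occurrence_version VALUES (?, ?, ?, ?, ?, 1)",
                 (occ_id, next_ver, lat, lon, changed_on))
    conn.commit()

correct_coordinates(conn, 1, 42.3, -71.1, "2011-10-15")   # initial record
correct_coordinates(conn, 1, 42.4, -71.2, "2011-11-02")   # correction; old row kept
```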
- refresh dataset -> auto refresh endpoints
- mirror of core db to query vs products put out every quarter
- need to cite exact version in paper
- having real-time queries to other data sources?
- bandwidth problems
- piping in other data sources challenging if dynamic
- build TNRS into database?
- names can be validated on the fly but then names change from query to query
- sometimes want repeatability, but then can only use snapshotable data
- key elements
  - core db
  - loading modules
  - validation
  - analytical database
  - public access point
  - versioning
- analytical end products are views of db
- not directly in raw data
- data summaries/end products
- raw data vs calculated values
- normalization, aggregation
- derived data products range from raw data to highly derived analytical products (e.g. range maps); see the view sketch below
- user just needs traits, range map as products
- identify commonly desired end products
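A sketch of a derived product kept as a view over the core data rather than stored in the raw records; the table, view, and column names are made up for illustration:

```python
# Hypothetical sketch: an analytical end product as a database view, so the
# normalization/aggregation lives in the db and the raw data stays untouched.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trait_measurement (scientific_name TEXT, trait TEXT, value REAL)")
conn.execute("""
    CREATE VIEW species_trait_means AS
        SELECT scientific_name,
               trait,
               AVG(value) AS mean_value,
               COUNT(*)   AS n_measurements
        FROM trait_measurement
        GROUP BY scientific_name, trait
""")
# A user who "just needs traits" queries the view; the derived values are
# always computed from the raw measurements, never stored in them.
```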
- reasons for derived products
  - versioning
  - performance (range maps take a long time on a personal computer, vs. ~6 hrs on a high-performance machine)
  - convenience
  - repeatability
  - simplifies data distribution UIs
- query builder
  - single table to pick and choose search criteria for what to download
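A sketch of the query-builder idea: the user picks criteria from one flat table of fields and a parameterized query is assembled from them; the field names are hypothetical:

```python
# Hypothetical query-builder sketch: each selectable criterion maps to a SQL
# fragment; unknown fields are ignored so arbitrary text never reaches the SQL.
CRITERIA = {
    "scientific_name": "scientific_name = ?",
    "country":         "country = ?",
    "min_latitude":    "latitude >= ?",
    "max_latitude":    "latitude <= ?",
}

def build_query(selected):
    clauses, params = [], []
    for field, value in selected.items():
        if field in CRITERIA:
            clauses.append(CRITERIA[field])
            params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return "SELECT * FROM occurrence" + where, params

# e.g. build_query({"country": "Peru", "min_latitude": -20})
# -> ('SELECT * FROM occurrence WHERE country = ? AND latitude >= ?', ['Peru', -20])
```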
- relationships among data elements that are not inherent in the data
- info, algs, software
- assembly of info creates more info than component parts
- platform doesn't matter as long as doesn't become obsolete/blocker
- e.g. if MySQL can't do geo, switch
- e.g. TNRS is a scrubbing algorithm
- some of the additional info comes from TNRS and validation: combining existing info with external info
- what to do to ensure user can get range maps
- is validation in scope?
- validation is something data passes through on the way from core data to analytical data
- validation, range mapping are processes applied to data on the way out
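A sketch of validation and name scrubbing as a pass applied on the way from core to analytical data; resolve_name() is a stand-in for a TNRS-style lookup, not the real TNRS API:

```python
# Hypothetical sketch: scrubbing/validation applied to records on their way
# out to the analytical db, leaving the raw core data untouched.
def resolve_name(raw_name):
    """Placeholder for a TNRS-style name resolution; here it only normalizes
    whitespace and capitalization."""
    return " ".join(raw_name.split()).capitalize()

def coordinates_valid(lat, lon):
    return (lat is not None and lon is not None
            and -90 <= lat <= 90 and -180 <= lon <= 180)

def to_analytical(core_records):
    """Yield scrubbed copies of core records; invalid ones are dropped
    (or could be flagged instead)."""
    for rec in core_records:
        if not coordinates_valid(rec.get("latitude"), rec.get("longitude")):
            continue
        yield {**rec, "scientific_name": resolve_name(rec["scientific_name"])}

# e.g. list(to_analytical([{"scientific_name": "  quercus   alba",
#                           "latitude": 42.3, "longitude": -71.1}]))
```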
- priority workflow/timeline diagram: where we are, what plan to produce and when
- TNRS, GNRS also useful for data providers: get something in return for giving us data
- mechanism to give the data provider complete access to their own data
- timeline
- FIA has an FTP site where all their data and metadata can be downloaded
- reacquiring data from data sources is in scope
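A sketch of pulling a provider's published files over FTP; the host, directory, and file names below are placeholders, not FIA's actual addresses:

```python
# Hypothetical sketch of reacquiring a provider's data from a public FTP site.
from ftplib import FTP

def fetch_provider_file(host, remote_dir, filename, local_path):
    with FTP(host) as ftp:                      # anonymous login
        ftp.login()
        ftp.cwd(remote_dir)
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + filename, out.write)

# e.g. fetch_provider_file("ftp.example.org", "/pub/provider_data", "TREE.csv", "TREE.csv")
```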
- use cases -> need metadata