2011 working group Th BIEN Components

  • different classes of validation
    • required formats to be enterable into database: db constraints
  • use cases related to analytical database: what does analyst want to get out of db
  • user interface in scope?
    • more interest from sci community w/ interface
  • different from having a web map server serve shapefiles
  • huge pipeline to John's map generating iPlant component
  • easy to use interface -> more interest -> more data
  • users -> API -> data -> API -> tools, users
  • decouple impl from database
  • robust core db w/ clean APIs
  • UI: tools themselves?
  • web-based search tool?
  • map to look at locations
  • UI implements API
  • UI for data upload
  • MySQL in vs API
  • need public access point in some form
  • API abstracts database backend: whether it's MySQL or Postgres, etc.
  • range maps: already run and produced endpoint
  • cron job to produce analyt db regularly (behind API)
  • use cases are what we want to retrieve from db
  • timestamping and versioning
    • analyt dbs have versions generated at regular intervals
    • timestamp and archive download
  • continuous taxonomic updating of core db: track changes?
  • timestamp as much as possible, but sometimes data is dynamic (GBIF query)
  • if people only get data through endpoint, don't need minute-to-minute versioning
  • reporting db is from day before (generated daily)
    • don't keep old versions of it
  • requirement to use data in papers to have stamped versions
  • each time refresh endpoint, new version
  • do users need to wait to see new data entered?
    • allow querying both live and snapshotted db
  • version data points rather than whole db
    • e.g. species lat/longs
  • user record by record changes
  • refresh dataset -> auto refresh endpoints
  • mirror of core db to query vs products put out every quarter
  • need to cite exact version in paper
  • having real-time queries to other data sources?
    • bandwidth problems
    • piping in other data sources challenging if dynamic
  • build TNRS into database?
  • names can be validated on the fly but then names change from query to query
  • sometimes want repeatability, but then can only use snapshotable data
  • key elements
    • core db
    • loading modules
    • validation
    • analytical database
    • public access point
    • versioning
  • analytical end products are views of db
    • not directly in raw data
  • data summaries/end products
  • raw data vs calculated values
  • normalization, aggregation
  • derived data products range from raw data to highly derived analyt products (e.g. range maps)
  • user just needs traits, range map as products
  • identify commonly desired end products
  • reasons for derived products
    • versioning
    • performance (range maps take a long time on personal computer, but 6 hrs on high-performance machine)
    • convenience
    • repeatability
    • simplifies data distribution UIs
  • query builder
  • single table to pick and choose search criteria for what to download
  • relationships among data elements that are not inherent in the data
  • info, algs, software
  • assembly of info creates more info than component parts
  • platform doesn't matter as long as doesn't become obsolete/blocker
    • e.g. if MySQL can't do geo, switch
  • e.g. TNRS is a scrubbing alg
  • some of additional info comes from TNRS, validation: combines existing info with external info
  • what to do to ensure user can get range maps
  • is validation in scope?
  • validation is something that data passes through on way from core data to analyt data
  • validation, range mapping are processes applied to data on the way out
  • priority workflow/timeline diagram: where we are, what plan to produce and when
  • TNRS, GNRS also useful for data providers: get something in return for giving us data
  • mech to give data provider complete access to own data
  • timeline
  • FIA has FTP site where can get all their data, metadata
  • reacquiring data from data sources in scope
  • use cases -> need metadata
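
Sketch of two points from the notes above: the API hiding which database sits behind it, and queries going to either the live db or a stamped snapshot that can be cited in a paper. This is only an illustration, not a design decision from the meeting. All names (Backend, InMemoryBackend, AccessPoint, the "2011-Q4" label, the sample species and coordinates) are hypothetical, and the in-memory backend stands in for a real MySQL or Postgres adapter.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical storage interface; a MySQL or Postgres adapter would implement it."""

    @abstractmethod
    def occurrences(self, species):
        """Return a list of (lat, lon) tuples for one species."""

    @abstractmethod
    def copy(self):
        """Return an independent copy of the current state, used for snapshots."""


class InMemoryBackend(Backend):
    """Stand-in backend so the sketch runs without a database server."""

    def __init__(self, rows):
        # Copy the lists so later inserts do not leak into earlier snapshots.
        self._rows = {name: list(points) for name, points in rows.items()}

    def occurrences(self, species):
        return list(self._rows.get(species, []))

    def add(self, species, lat, lon):
        self._rows.setdefault(species, []).append((lat, lon))

    def copy(self):
        return InMemoryBackend(self._rows)


class AccessPoint:
    """Public access point: callers never see which backend is in use."""

    def __init__(self, live):
        self._live = live
        self._snapshots = {}  # version label -> frozen Backend

    def snapshot(self, version):
        # Freeze the current state under a citable version label.
        self._snapshots[version] = self._live.copy()

    def occurrences(self, species, version=None):
        # version=None queries the live db; otherwise the named snapshot.
        backend = self._live if version is None else self._snapshots[version]
        return backend.occurrences(species)


live = InMemoryBackend({"Quercus alba": [(35.9, -79.0)]})
api = AccessPoint(live)
api.snapshot("2011-Q4")                 # stamped version for citation
live.add("Quercus alba", 40.1, -83.0)   # new record arrives after the snapshot

snapshot_points = api.occurrences("Quercus alba", version="2011-Q4")
live_points = api.occurrences("Quercus alba")
```

Here the snapshot query stays repeatable while the live query already sees the new record. Swapping the backend would only mean writing another Backend subclass.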