Project

General

Profile

2012-11-29 prioritization UI group

  • cost, dependencies

UI

  1. tracking provenance, data providers (1)
  2. authentication (2)
    • users/passwords
    • federated authentication system
  3. content access control
    • limit access to controlled datasets (3a)
      • access to authenticated content
      • logging IP addresses
    • reporting details of data access to data owner (3b)
      • point to text log, send e-mail, send digest e-mail
      • setting that provider can control: opt-in/out
    • itemize the controls on content access
  4. control access of data by owner
    • receive automated requests for data access
    • receive invitations for co-authorship
  5. non-authenticated content access (3d)?
    • providers want control of data
    • so people will commit to providing info to BIEN
    • make summaries of data access available to data owners
    • how complex is the notification that someone has requested your data?
      • e-mail link to approve person
      • automated request generation
      • picklists of datasets, users
      • don't need to install policing mechanism
    • what will be publicly visible
    • embargo completely hidden data
    • make maps available online after window expires (to MOBG?)
      • maps are highly digested products, make available to everyone?
    • push maps to Map of Life after window expires
    • range maps, IUCN threat levels
  6. data loading
    • who's the gatekeeper?
    • whom to accept plot data from?
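
A minimal sketch of the access rules prioritized above; the `Dataset` fields and the decision logic are assumptions for illustration, not the actual BIEN schema:

```python
# Sketch of the access priorities above: controlled datasets (3a), embargoed
# data (5), authentication (2), and owner-approved requests (4).
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class Dataset:
    owner: str
    controlled: bool = False          # 3a: limit access to controlled datasets
    embargoed: bool = False           # 5: embargo completely hidden data
    approved_users: Set[str] = field(default_factory=set)  # 4: owner-approved

def can_access(ds: Dataset, user: Optional[str]) -> bool:
    """Return True if `user` (None = not logged in) may read `ds`."""
    if ds.embargoed:
        return user == ds.owner       # hidden from everyone but the owner
    if not ds.controlled:
        return True                   # 5: non-authenticated content access
    if user is None:
        return False                  # 2: controlled data requires authentication
    return user == ds.owner or user in ds.approved_users
```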

Notification mechanism

  • build in messaging within application
  • craigslist model: only an anonymized relay e-mail address of the person is visible
  • display the user; need a way to find a user among a long list
    • search on personal info?
  • pulled back from exposing personal data
  • permissions granted
  • if asking for data, agree to reveal personal info
    • people will not give data to anonymous user
  • when a person moves between institutions, track the changes
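
The craigslist-style idea above could look like this sketch: the requester sees only an opaque relay address, never the owner's real e-mail. The secret key and relay domain are placeholders:

```python
# Derive a stable, non-reversible relay address for a real e-mail, so
# messaging stays inside the application without exposing personal data.
import hashlib
import hmac

SECRET_KEY = b"server-side-secret"   # assumed: kept on the server only

def relay_address(real_email: str, domain: str = "relay.example.org") -> str:
    """Map a real e-mail to an opaque relay address via a keyed hash."""
    digest = hmac.new(SECRET_KEY, real_email.encode(), hashlib.sha256).hexdigest()[:12]
    return f"user-{digest}@{domain}"
```

A server-side table would map the relay address back to the real inbox, so replies work without revealing it.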

Data loading (5)

  • can get data in, but more difficult to get data out

HTML mapping tool

  • comprehensive UI tool
  • or series of instructions w/ file format, mapping file
  • then provide mapping to BIEN
  • who are our users?
    • novice? differing levels of ability
  • map against our term for field
  • complex to build interface to cover all scenarios
  • where to upload data?
  • mediated by website, not person
  • UI, series of templates?
  • VegX mapping tool to map spreadsheet data
  • handcraft VegX? but that's not the intent of VegX
  • plots that needed to be connected to previous records
    • tree measurements at different times connected together
    • match on tree tag, ID
      • history of tags
  • each tree gets unique number
  • number is unique at scope of plot
  • measure individual trunks
  • connect remeasurements together
  • locate individual tree, identified by subplot, ind. ID
  • different fields to identify individual
  • template scenario: correct set of unique identifiers
  • file template based on published schema, with mapping instructions
  • user maps data to template
  • error reports on import
  • DataUP: California Digital Library received a grant
    • validation, metadata plugin
    • open spreadsheet, validate each column
    • like Google Refine: transforms data into the needed format
    • DataONE node
    • convert to CSV, stores it
    • lightweight model
    • transformations on data
  • units
  • save customized schema/mapping for re-use
  • load spreadsheet -> do mappings, transformations
  • access to mappings via repository
  • managing mappings
  • zip archive that gets extracted
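
The mapping-file workflow above ("user maps data to template", "save customized schema/mapping for re-use") could be as small as a two-column map.csv applied row by row; the column and term names here are illustrative, not actual VegCore terms:

```python
# Apply a two-column mapping file (user field -> schema term) to data rows.
import csv
import io

def load_mapping(map_csv_text: str) -> dict:
    """Read a map.csv with header `from,to` into a rename dict."""
    reader = csv.DictReader(io.StringIO(map_csv_text))
    return {row["from"]: row["to"] for row in reader}

def apply_mapping(rows, mapping):
    """Rename each row's fields per the mapping; unmapped fields pass through."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]
```

The same map.csv can be stored in a repository and re-applied to later uploads from the same provider.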

UI

  • user logs in, exports file to CSV
  • instructions for format, how to transform to CSVs
  • downloadable mapping template
  • user's fields stay the same
  • most important: store the map.csv files
  • tomorrow: review VegCore
  • Nick, Susan, Aaron
  • harmonize with VegCSV terms
  • metadata terms
  • strategy

Structural changes to BIEN

  • data upload adjusted to allow partial updates
  • do TNRS on each incoming dataset
  • geovalidation, TNRS on live data
  • errors that prevent loading
  • duplication of processes, adding new records
  • granular editing tools
  • data management tool: complex job
  • process is automated
  • process used to be: add records -> rescrub
  • some providers will start correcting at the source
  • data upload (4a)
  • data refresh (4c)
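
A sketch of the partial-update idea above: incoming records upsert by key instead of forcing a full reload. The key name and dict-backed table are assumptions for illustration:

```python
# Merge incoming records into an existing table by primary key, updating
# existing rows in place and adding new ones (partial update, 4a/4c).
def upsert(table: dict, records, key: str = "record_id") -> dict:
    """Update-or-insert each record into `table`, keyed by `key`."""
    for rec in records:
        table.setdefault(rec[key], {}).update(rec)
    return table
```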

Error reporting (4b)

  • how to report data back to user
  • digest of error reports
    • join to original rows
  • original records, issues in data
  • download error log/CSV table
  • report back to user via log file?
  • generate errors report
  • would the error report provide the necessary info for the provider?
  • placename -> records containing that placename
  • human-readable digest
  • HTML display to read text error report
  • e-mail link to report
  • download error table, join to their table
  • give data provider something to work from
  • display list of errors, used as checklist
  • status of upload: in import log
    • exposed by website
  • summary of data -> PDF
  • HTML report or PDF summary
  • user profile->folder with past reports
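
The "download error table, join to their table" step above could work like this sketch: errors keyed by the provider's original row number get attached back to their rows. Field names are hypothetical:

```python
# Join an error table back to the provider's original rows so the report
# can be used as a checklist against the source data.
def join_errors(original_rows, errors):
    """Annotate each original row (0-indexed) with its error messages."""
    by_row = {}
    for err in errors:
        by_row.setdefault(err["row"], []).append(err["message"])
    return [dict(row, errors=by_row.get(i, [])) for i, row in enumerate(original_rows)]
```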

Download tracking

  • track each download as unique event
  • details of all imports
  • status of import
  • success/failure
  • human moderator needed?
  • uploaded->staging table
  • monitoring of upload status (initial validation, staging, core, complete)
  • given date/time -> versioned schema
  • management tool for admin
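
The upload-status stages listed above can be sketched as a tiny state machine; the stage names come from the notes, the rest is illustrative:

```python
# Track each import through the monitoring stages named above.
STAGES = ["initial validation", "staging", "core", "complete"]

def advance(status: str) -> str:
    """Move an import to the next stage; 'complete' is terminal."""
    i = STAGES.index(status)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```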

Search/discovery

  • query interfaces: API (5a), UI (HTML) (5b)
  • lower priority to make available to public?
  • Brian McGill's API
  • SQL statement -> API URL that performs the SQL request
  • UI, data people
  • almost no query logic in UI
  • UI just knows how to talk to API, not directly to DB
  • level of separation
  • separation of concerns
  • augment API w/o breaking website
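
One way the "SQL statement -> API URL" idea above could look, keeping all query logic out of the UI; the endpoint is a placeholder, not Brian McGill's actual API:

```python
# The UI encodes a query into a request URL for the API layer instead of
# talking to the DB directly (separation of concerns).
from urllib.parse import urlencode

def query_url(sql: str, base: str = "https://api.example.org/query") -> str:
    """Encode a SQL query into an API request URL."""
    return f"{base}?{urlencode({'q': sql})}"
```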

Backups

  • need data on server
  • expanded schema to support users, data access levels, profiles, user input

Schema changes

  • authentication table
  • user-driven uploads
  • some points of pipeline need intervention
  • stop at part of pipeline
  • ontological soil schema
  • metadata for plot
  • storing soil data differs for every plot schema
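
A minimal sketch of the authentication-table change, using SQLite for illustration; the column names are assumptions, not the BIEN schema:

```python
# Create a user/authentication table supporting access levels and profiles.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE app_user (
        user_id       INTEGER PRIMARY KEY,
        email         TEXT UNIQUE NOT NULL,
        password_hash TEXT NOT NULL,        -- never store plaintext passwords
        access_level  TEXT NOT NULL DEFAULT 'public'
    )
""")
```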

Traits

  • DB of values
  • like in taxonomy
  • on every revision, all info is redone instead of raw observation data being stored
  • trait DB model to store actual measurements for recombining
  • when info is synthesized, information is lost
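
A sketch of the trait-DB model above: keep the raw measurements and recombine them on demand, so nothing is lost to synthesis. The trait and field names are illustrative:

```python
# Store raw per-individual measurements; synthesized values are recomputed
# from them whenever needed, rather than being the only thing stored.
from statistics import mean

def synthesize(records, taxon: str, trait: str):
    """Recompute a synthesized trait value from raw measurements on demand."""
    vals = [r["value"] for r in records
            if r["taxon"] == taxon and r["trait"] == trait]
    return mean(vals) if vals else None
```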

To do

  • override map spreadsheet name using dir name
  • hierarchy of projects