2012-11-29 web interface breakout group

  • how to walk away from BIEN so it runs itself
  • web service
  • user interface needs to use API
  • HTML form that calls API
  • TROPICOS has web interface to create SQL queries
  • users familiar w/ command line, users who understand content
  • ordering
  • website, core requirements
  • factual website
  • visualizations
  • data requests
  • data uploads
  • data architecture
  • series of use cases
  • high-level user story for purpose of website
  • similar to ecommerce shopping site
  • gives the data, not analysis
  • passive interface
  • also interface to do something to data
  • data entry tool to update data in BIEN
  • provenance issue: how to get correct data back to original provider
  • expert users of BIEN who are allowed to manipulate data
  • plot datasets are more complex, but smaller, than specimen datasets
  • mechanism to collect thousands of plots on Excel spreadsheets
  • contracts: must make data publicly available
  • role of VegBank?
  • NVS broader than VegBank
  • VegCore can accommodate these changes
  • implications for components the website would have
  • who are the users? what products/analyses they need?
  • should BIEN become data repo for plots data in the U.S./the world?
  • make BIEN modular? each organization has data in empty schema
  • BIEN is method, not data
  • Africa with BIEN structure
  • beyond sci community or researchers, who has interest in BIEN data?
    • general public? consultants?
    • scientific method not in consulting
  • assessment: what's potentially there in terms of species
  • various Latin American repos have started charging consultants
  • stopblock
  • what mechanisms? companies make donation
  • NBG (NY?) has contracts with mining companies
  • other things BIEN produces: plots are input, ranges are output
  • horticultural community: what could grow in person's area
  • get and contribute plot data
  • native plant society-type groups
  • agriculture: iPlant funding because useful for crop science
  • plant groups, education
  • package data in simple way for students: modules
  • IUCN classification
  • NGOs: range models, raw occurrence data
  • select interfaces to get data
  • challenge is interfaces that change data
  • TROPICOS experience: takes years for user community to be happy with forms, steps
  • data entry interfaces are highly programmer-intensive, lower priority
  • TROPICOS has had a group of programmers building web interfaces for 4-5 years
  • Eric Fegraus (Conservation International): unified schema
  • BIEN not involved in interface development, CI would do that
  • independent data entry tool which can push data into BIEN
  • continued funding for UI development?
  • expose web interface for uploading data, but not data entry tools

Download tracking

  • track who downloads data
  • can't just make all data public, because some of it has access restrictions
  • not data entry/correction interface
  • SALVIAS a good model for interface
  • monitor who downloads the data
  • don't need graphical interface
  • datasets in SALVIAS can be tagged in 3 access ways: totally hidden, metadata-only, public
  • logging of downloads
  • who user was, IP address
  • users tagged as belonging to dataset
  • mechanism to send someone an e-mail when someone downloads their data (opt-out)
  • providing this through a web service
  • anonymous downloads
  • capture timestamp, IP address, username if logged in
  • authorization of access to level 2 data
  • owner grants access to datasets
  • peer-to-peer access mediated by database
  • SALVIAS maintains itself
  • if we build infrastructure that supports this, other things come with it:
    • can tell data providers who downloaded their data
  • particular functionality that repo should support
  • requirements
  • potentially have a RESTful API
  • a URL to do any action you want, then a UI on top of that
  • request access to dataset
  • grant API keys
  • authenticate access
  • infrastructure exposed to people who don't know RESTful APIs
  • interface issue
  • data entry and correction: nice to have
  • download/logging: required
  • control of data access by owner
    • avoid TRY's headaches of having to mediate this
  • e-mail changes
  • window after which dataset goes public: 5 years or 10 unanswered requests
  • build in e-mail pinger
  • SALVIAS has dead-end e-mails
  • data that's not fully public->ensure no data spills with minimal future work needed
  • identify visibility of records
  • what gets coalesced back into analytical DB
  • accesslevel field in analytical DB
  • track provenance in analytical DB
  • plot data has species name, place
  • elements assembled
  • access at owner/plot/date level
  • queries could bypass levels in core DB
  • one challenge is fuzziness
  • hierarchy of top-level and ultimate data provider
  • allow for fuzziness in identifying data provider
  • who owns plot data in public repo?
  • proximal entity
  • Conabio/REMIB
    • were open to sharing with TROPICOS?
  • error reports, range maps
  • another user community: data providers
  • data providers serve range maps created by BIEN
  • estimate of cost?
  • developing TROPICOS DB: paid developer
  • TNRS paid developer for a year
  • 4000-5000 rare species names to exclude
  • web service mechanism to request data products
  • provenance functionality, data ownership
  • data exploration: Brian's web service
  • put user interface over web service
  • more user-friendly
  • HTML form with picklists, GIS maps
  • built into web interface
  • web interface and web service would match
  • different groups doing different services, need to collaborate: "eat your own dogfood"
  • every group provides data to other groups via API
  • RESTful API
  • work to make interface more robust: security, authentication
  • BIEN-specific requests->queries
  • translate higher level request to SQL query
  • cached queries
  • indexes on analytical DB
  • index every field in analytical DB
  • TROPICOS reporting DB regenerated nightly
  • capture administrative data: additional schema elements
  • SALVIAS: when user signs up, add name, e-mail
    • user (human/institution) linked to data
  • NVS has party concept to manage ownership and participation on plots
  • application and permissions
  • external authentication
  • federated security
  • Google sign-in
  • Shibboleth
  • everyone else: needs new account
  • using DataONE
  • if we do something, prefer to use out-of-the-box security
  • identity research
  • use own credentials to sign in on another site
  • ad-hoc user needs own account
  • complex model
  • need authentication of some kind
  • iPlant has approaches?
  • what is procedure to get an account?
  • sign up link
  • verifying that not a bot
  • passive interface that doesn't require human approval
  • need to be identified somehow
  • need user to track downloads
  • also internal mechanism for data access
  • require log in
  • anonymous user -> access public data
  • this is just for read transactions
  • log IPs to determine hits/user
  • data packages: how many times read?
  • logging table for Python table
  • straightforward?
  • what about update mechanism
  • takes a month and many e-mails to load data for other DBs
  • what is a mechanism to upload data?
  • published schema to use?
  • CSV file like on TDWG site?
  • increasingly automate pipeline
  • need human being to be comfortable that incoming data meets DB's quality standards
  • compare to global jellyfish (JEDI)
  • upload CSV: potentially VegCSV
  • spec of what upload needs to look like
  • datatypes
  • if import fails, provide feedback to user
  • mechanism to send data: drop box, harvester, etc.
  • managed to staging system
  • data validations
  • feedback to provider about data quality, valid mappings
  • data w/ frequent updates (active datasets)
  • immediate feedback
  • balance between strict vs. loose VegCSV
    • where possible, use well-known standards
    • but also allow similar data
  • metadata catalog
  • GIVD: items that didn't apply, e.g. # of releves
  • vocabularies w/ common elements
  • core elements that everyone recognizes
  • optional elements
  • minimal required elements
  • weakly-typed table->define datatypes
  • once user's data is good, PDF report generated
  • successful upload->moderation queue
  • who submitted, when
  • table with unique ID associated with upload
  • deletion of inserted records if error
  • custom mapping saved in background
  • track submission as bundle
  • how to know when to delete something?
  • TNRS model: don't rebuild whole database, grows or contracts on dataset basis
  • versioning database: rollback to previous version
  • TNRS fkey walk: ON DELETE CASCADE
  • but leaves NCBI taxonomy
  • embed as much within database structure as possible
  • which are shared, which are unique keys
  • validations, reporting
  • frequency of plots
  • sparkline things
  • class of data captured by NVS
  • upload data->analysis of internal quality
  • 2nd-level validations
  • how data compares to population statistics
  • put in taxonomic name
  • early validations->flag for user to check
  • meet back at 11:15am
  • end-of-pipeline crowdsourcing and user feedback issues
  • corrections to data
  • once maps visible, find issues
  • part of DB to store user feedback, to filter/improve data
    • exclude data marked as wrong
  • 3-4 categories to tag data with
  • human/automated layers to filter data
  • hide incorrect maps?
  • how to track all range maps
  • API level, how to store bits of feedback, annotations on content
  • annotations on data object in BIEN, even if data itself is not in BIEN
  • click range map point to mark as incorrect
  • users mark specimens as cultivated
  • visual interactions with data
  • collections as cultivated
  • every herbarium from Index Herbariorum, mark w/in radius as cultivated
  • shapefile with boundaries of botanical gardens
  • weighting by population (cities)
  • geospatial component, but lose info corrected around small cities
  • web interface: view data record, add specific comment
  • look at range map->validate, star rating, which are incorrect
    • rating a book
  • occurrence, point data
  • cluster of points->flag as questionable
  • simple interface->develop more w/ feedback
  • people request datasets, find issues
  • feedback as dataset comment
  • talked about versioning downloads
  • snapshot of BIEN
  • comments about data within snapshot
  • remind user to give feedback on downloaded data
  • complete the loop on how data used
  • collate answer
  • don't build large infrastructure, start small
  • way to get feedback on invalid records
  • first year we have range map data for all species
  • BIEN 2 data->BIEN 3
  • which species to rerun
  • frameworks so don't need to build infrastructure
  • one-click flagging of points
  • feedback about record
  • original list
  • automatic downloads: downloading portions of the data
  • mechanism to associate user with provider
  • log IP address
  • search for things
  • data provider flips switch to make data public
  • data exports in same format
  • user interface for visualizing the data
  • hard to gauge what user community would want
  • needs-driven
  • CVS data exploration visualization
  • data discovery
  • hardening the range modeling algorithms
  • layer that sits on top of range modeling applications
  • things that people can say
  • filter on mapped areas, things, species
  • select specific fields that the user wants to look at
  • picklist
  • something much more detailed: shapefile of Nat'l Park
    • would be very nice app
  • Java
  • what are main search/discovery axes?
    • country, spatial, temporal, taxonomic, trait, plot size, size range, habit
    • BIEN2 doesn't have temporal data, because old collections are handwritten->range of years
  • need start/end date for collected date
  • date ranges: good thing about VegBank
  • TROPICOS also has start/end dates
  • fields for D/M/Y->display date
  • legacy data
  • spatial query: what's at a point
  • family or habit
  • flag what is co-occurrence data
  • level of granularity
  • how many axes to subset
    • one for each column
  • also support ANDs/ORs
  • filter on axes
  • rainfall > x
  • climate filters
  • TROPICOS query builder
  • different levels of access through web interfaces
  • SELECT access only
  • interface only queries analytical database
  • NVS interface doesn't query core DB, instead analytical DB and metadata
  • avoid need to e-mail Brad to request extract
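
Several of the points above (filtering on axes with ANDs/ORs, translating a higher-level request to a SQL query, SELECT-only access to the analytical DB, an accesslevel field, and logging timestamp/IP/username per download) could fit together roughly as in the sketch below. This is only an illustration, not an agreed design: the table names `analytical_plot` and `download_log`, the column names, the `AXES` whitelist, and the sample rows are all made up for the example, and per-owner access grants are left out, so only records tagged public are returned.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical whitelist of filterable axes and the operators allowed on each.
AXES = {
    "country": {"="},
    "family": {"="},
    "habit": {"="},
    "rainfall_mm": {">", "<", ">=", "<="},
}


def build_query(filters, combine="AND"):
    """Translate [(column, operator, value), ...] into a parameterized SELECT."""
    if combine not in ("AND", "OR"):
        raise ValueError("combine must be AND or OR")
    clauses, params = [], []
    for column, op, value in filters:
        # Only whitelisted names reach the SQL text; values are always bound.
        if column not in AXES or op not in AXES[column]:
            raise ValueError(f"unsupported filter: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = f" {combine} ".join(clauses) or "1=1"
    sql = (
        "SELECT species, country, family, habit, rainfall_mm "
        f"FROM analytical_plot WHERE ({where}) AND accesslevel = 'public'"
    )
    return sql, params


def run_and_log(conn, filters, combine="AND", username=None, ip=None):
    """Run a read-only request and record who downloaded how many rows."""
    sql, params = build_query(filters, combine)
    rows = conn.execute(sql, params).fetchall()
    conn.execute(
        "INSERT INTO download_log (ts, username, ip, n_rows) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), username, ip, len(rows)),
    )
    return rows


def demo_db():
    """In-memory database with invented sample rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE analytical_plot (
            species TEXT, country TEXT, family TEXT, habit TEXT,
            rainfall_mm REAL, accesslevel TEXT
        );
        CREATE TABLE download_log (ts TEXT, username TEXT, ip TEXT, n_rows INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO analytical_plot VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("Quercus alba", "US", "Fagaceae", "tree", 1100.0, "public"),
            ("Inga edulis", "PE", "Fabaceae", "tree", 2500.0, "public"),
            ("Inga vera", "PE", "Fabaceae", "tree", 2300.0, "hidden"),
        ],
    )
    return conn
```

A request such as `run_and_log(demo_db(), [("country", "=", "PE"), ("rainfall_mm", ">", 2000)], ip="192.0.2.1")` would then go through the same path whether it came from an HTML form or directly from the web service, and an anonymous user (no username) still leaves a log entry with timestamp and IP.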