2012-11-29 web interface breakout group
- how to walk away from BIEN so it runs itself
- web service
- user interface needs to use API
- HTML form that calls API
- TROPICOS has web interface to create SQL queries
- users familiar w/ command line, users who understand content
- ordering
- website, core requirements
- factual website
- visualizations
- data requests
- data uploads
- data architecture
- series of use cases
- high-level user story for purpose of website
- similar to ecommerce shopping site
- gives the data, not analysis
- passive interface
- also interface to do something to data
- data entry tool to update data in BIEN
- provenance issue: how to get correct data back to original provider
- expert users of BIEN who are allowed to manipulate data
- plots datasets are more complex but smaller than specimens datasets
- mechanism to collect thousands of plots on Excel spreadsheets
- contracts: must make data publicly available
- role of VegBank?
- NVS broader than VegBank
- VegCore can accommodate these changes
- implications for components the website would have
- who are the users? what products/analyses they need?
- should BIEN become data repo for plots data in the U.S./the world?
- make BIEN modular? each organization puts its data in its own empty copy of the schema
- BIEN is method, not data
- Africa with BIEN structure
- beyond sci community or researchers, who has interest in BIEN data?
- general public? consultants?
- scientific method not in consulting
- assessment: what's potentially there in terms of species
- various Latin American repos have started charging consultants
- stopblock
- what mechanisms? companies make donation
- NBG (NY?) has contracts with mining companies
- other things BIEN produces: plots are input, ranges are output
- horticultural community: what could grow in person's area
- get and contribute plot data
- native plant society-type groups
- agriculture: iPlant funding because useful for crop science
- plant groups, education
- package data in simple way for students: modules
- IUCN classification
- NGOs: range models, raw occurrence data
- select interfaces to get data
- challenge is interfaces that change data
- TROPICOS experience: takes years for user community to be happy with forms, steps
- data entry interfaces are highly programmer-intensive, lower priority
- TROPICOS has had a group of programmers doing web interfaces for 4-5 years
- Eric Fegraus (Conservation International): unified schema
- BIEN not involved in interface development, CI would do that
- independent data entry tool which can push data into BIEN
- continued funding for UI development?
- expose web interface for uploading data, but not data entry tools
Download tracking
- track who downloads data
- can't just make all data public, because some of it has access restrictions
- not data entry/correction interface
- SALVIAS a good model for interface
- monitor who downloads the data
- don't need graphical interface
- datasets in SALVIAS can be tagged in 3 access ways: totally hidden, metadata-only, public
- logging of downloads
- who user was, IP address
- users tagged as belonging to dataset
- mechanism to send someone an e-mail when someone downloads their data (opt-out)
- providing this through a web service
- anonymous downloads
- capture timestamp, IP address, username if logged in
- authorization of access to level 2 data
- owner grants access to datasets
- peer-to-peer access mediated by database
- SALVIAS maintains itself
- if we build infrastructure that supports this, other things come with it:
- can tell the data provider who downloaded their data
- particular functionality that repo should support
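The logging items above (timestamp, IP address, username if logged in, anonymous downloads allowed) could be sketched roughly as follows. The table name `download_log`, its columns, and the helper functions are assumptions made for illustration; they are not taken from SALVIAS or BIEN.

```python
import sqlite3
from datetime import datetime, timezone


def create_log_table(conn):
    # Hypothetical logging table; username is NULL for anonymous downloads.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS download_log (
               id INTEGER PRIMARY KEY,
               dataset_id TEXT NOT NULL,
               ip_address TEXT NOT NULL,
               username TEXT,
               downloaded_at TEXT NOT NULL
           )"""
    )


def log_download(conn, dataset_id, ip_address, username=None):
    # Capture a UTC timestamp, the IP address, and the username if logged in.
    timestamp = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO download_log (dataset_id, ip_address, username, downloaded_at) "
        "VALUES (?, ?, ?, ?)",
        (dataset_id, ip_address, username, timestamp),
    )
    conn.commit()


def downloads_for(conn, dataset_id):
    # Rows a data provider could be shown: who downloaded their data, and when.
    cur = conn.execute(
        "SELECT username, ip_address, downloaded_at FROM download_log "
        "WHERE dataset_id = ? ORDER BY id",
        (dataset_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
create_log_table(conn)
log_download(conn, "plots-demo", "192.0.2.10", username="alice")
log_download(conn, "plots-demo", "192.0.2.11")  # anonymous download
```

The same `downloads_for` result could feed the e-mail notification to the data owner mentioned earlier.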
- requirements
- potentially have a RESTful API
- a URL to do any action you want, then a UI on top of that
- request access to dataset
- grant API keys
- authenticate access
- infrastructure exposed to people who don't know RESTful APIs
- interface issue
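As a minimal sketch of the "grant API keys" and "authenticate access" requirements, assuming keys are random tokens and only their hashes are stored. The in-memory `_key_hashes` store and both function names are hypothetical; a real service would persist the hashes in the database.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory store: username -> SHA-256 hex digest of the issued key.
_key_hashes = {}


def grant_api_key(username):
    # Issue a random key and keep only its hash, so the stored value
    # cannot be used directly if the store leaks.
    key = secrets.token_urlsafe(32)
    _key_hashes[username] = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return key


def authenticate(username, presented_key):
    stored = _key_hashes.get(username)
    if stored is None:
        return False
    presented = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(stored, presented)
```

A UI for people who do not know RESTful APIs would call the same two functions behind an HTML form.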
- data entry and correction: nice to have
- download/logging: required
- control of data access by owner
- avoid TRY's headaches of having to mediate this
- e-mail changes
- window after which dataset goes public: 5 years or 10 unanswered requests
- build in e-mail pinger
- SALVIAS has dead-end e-mails
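The "5 years or 10 unanswered requests" rule can be written as a small predicate. This sketch assumes the five-year window starts when the dataset was loaded as restricted; the notes do not say when the clock starts, and the function and constant names are illustrative.

```python
from datetime import date

EMBARGO_YEARS = 5              # from the notes
MAX_UNANSWERED_REQUESTS = 10   # from the notes


def goes_public(restricted_since, unanswered_requests, today):
    # Assumption: the window is counted from the date the dataset was
    # loaded with restricted access.
    target_year = restricted_since.year + EMBARGO_YEARS
    try:
        deadline = restricted_since.replace(year=target_year)
    except ValueError:
        # Feb 29 start date and a non-leap target year: fall back to Feb 28.
        deadline = restricted_since.replace(year=target_year, day=28)
    return today >= deadline or unanswered_requests >= MAX_UNANSWERED_REQUESTS
```

An e-mail pinger would increment the unanswered-request count each time an access request to the owner gets no reply.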
- data that's not fully public->ensure no data spills with minimal future work needed
- identify visibility of records
- what gets coalesced back into analytical DB
- accesslevel field in analytical DB
- track provenance in analytical DB
- plot data has species name, place
- elements assembled
- access at owner/plot/date level
- queries could bypass levels in core DB
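One way to keep restricted records from spilling is to route every read through a single visibility check on the `accesslevel` field. The sketch below reuses the three SALVIAS access levels mentioned earlier; the record layout, the `grants` set, and the function name are assumptions for illustration.

```python
HIDDEN, METADATA_ONLY, PUBLIC = "hidden", "metadata-only", "public"

# Hypothetical set of fields treated as metadata.
METADATA_FIELDS = ("dataset", "owner", "accesslevel")


def visible_view(record, user=None, grants=frozenset()):
    """Return the part of `record` that `user` may see, or None if nothing.

    `grants` is a set of (username, dataset) pairs approved by data owners.
    """
    level = record["accesslevel"]
    has_full_access = (
        level == PUBLIC
        or user == record["owner"]
        or (user, record["dataset"]) in grants
    )
    if has_full_access:
        return dict(record)
    if level == METADATA_ONLY:
        return {field: record[field] for field in METADATA_FIELDS}
    return None  # totally hidden
```

Because queries against the core DB could bypass this check, the sketch only makes sense if the interface reads from the analytical DB alone.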
- one challenge is fuzziness
- hierarchy of top-level and ultimate data provider
- allow for fuzziness in identifying data provider
- who owns plot data in public repo?
- proximal entity
- Conabio/REMIB
- were open to sharing with TROPICOS?
- error reports, range maps
- another user community: data providers
- data providers serve range maps created by BIEN
- estimate of cost?
- developing TROPICOS DB: paid developer
- TNRS paid developer for a year
- 4000-5000 rare species names to exclude
- web service mechanism to request data products
- provenance functionality, data ownership
- data exploration: Brian's web service
- put user interface over web service
- more user-friendly
- HTML form with picklists, GIS maps
- built into web interface
- web interface and web service would match
- different groups doing different services, need to collaborate: "eat your own dogfood"
- every group provides data to other groups via API
- RESTful API
- work to make interface more robust: security, authentication
- BIEN-specific requests->queries
- translate higher level request to SQL query
- cached queries
- indexes on analytical DB
- index every field in analytical DB
- TROPICOS reporting DB regenerated nightly
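A rough sketch of translating a higher-level request into a parameterized SQL query, with an index and a cache in front of it. The table `analytical_occurrence`, its columns, and the sample rows are invented for the example; SQLite stands in for whatever the analytical DB actually runs on.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analytical_occurrence (species TEXT, country TEXT)")
# "index every field in analytical DB"
conn.execute("CREATE INDEX idx_occ_species ON analytical_occurrence (species)")
conn.execute("CREATE INDEX idx_occ_country ON analytical_occurrence (country)")
conn.executemany(
    "INSERT INTO analytical_occurrence VALUES (?, ?)",
    [("Quercus alba", "USA"), ("Quercus alba", "Canada"), ("Acer rubrum", "USA")],
)
conn.commit()


@lru_cache(maxsize=128)
def countries_for_species(species):
    # Higher-level request "where does this species occur?" -> SQL query.
    rows = conn.execute(
        "SELECT DISTINCT country FROM analytical_occurrence "
        "WHERE species = ? ORDER BY country",
        (species,),
    ).fetchall()
    return tuple(row[0] for row in rows)
```

If the analytical DB is regenerated (nightly, as with the TROPICOS reporting DB), the cache would need to be dropped at that point with `countries_for_species.cache_clear()`.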
- capture administrative data: additional schema elements
- SALVIAS: when user signs up, add name, e-mail
- user (human/institution) linked to data
- NVS has party concept to manage ownership and participation on plots
- application and permissions
- external authentication
- federated security
- Google sign-in
- Shibboleth
- everyone else: needs new account
- using DataONE
- if we do something, prefer to use out-of-the-box security
- identity research
- use own credentials to sign in on another site
- ad-hoc user needs own account
- complex model
- need authentication of some kind
- iPlant has approaches?
- what is procedure to get an account?
- sign up link
- verifying that not a bot
- passive interface that doesn't require human approval
- need to be identified somehow
- need user to track downloads
- also internal mechanism for data access
- require log in
- anonymous user -> access public data
- this is just for read transactions
- log IPs to determine hits/user
- data packages: how many times read?
- logging table for Python table
- straightforward?
- what about update mechanism
- takes a month and many e-mails to load data for other DBs
- what is a mechanism to upload data?
- published schema to use?
- CSV file like on TDWG site?
- increasingly automate pipeline
- need human being to be comfortable that incoming data meets DB's quality standards
- compare to global jellyfish (JEDI)
- upload CSV: potentially VegCSV
- spec of what upload needs to look like
- datatypes
- if import fails, provide feedback to user
- mechanism to send data: drop box, harvester, etc.
- managed to staging system
- data validations
- feedback to provider about data quality, valid mappings
- data w/ frequent updates (active datasets)
- immediate feedback
- balance between strict vs. loose VegCSV
- where possible, use well-known standards
- but also allow similar data
- metadata catalog
- GIVD: items that didn't apply, e.g. # of releves
- vocabularies w/ common elements
- core elements that everyone recognizes
- optional elements
- minimal required elements
- weakly-typed table->define datatypes
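The upload spec, datatypes, required vs. optional elements, and feedback on failed imports could fit together roughly like this. The column names in `SPEC` are placeholders, not the actual VegCSV terms, and the validator assumes one record per physical line.

```python
import csv
import io

# Hypothetical spec: column name -> (datatype converter, required?)
SPEC = {
    "plot_id": (str, True),
    "species": (str, True),
    "cover_percent": (float, False),
}


def validate_csv(text, spec=SPEC):
    """Return a list of human-readable problems; an empty list means valid."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col, (_, required) in spec.items()
               if required and col not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    errors = []
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for col, (convert, required) in spec.items():
            value = (row.get(col) or "").strip()
            if not value:
                if required:
                    errors.append(f"line {line_no}: {col} is required")
                continue
            try:
                convert(value)
            except ValueError:
                errors.append(
                    f"line {line_no}: {col}={value!r} is not a valid "
                    f"{convert.__name__}"
                )
    return errors
```

Columns not listed in the spec are ignored here, which is one way to strike the "strict vs. loose" balance: required core elements are enforced, similar extra data is allowed through.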
- once user's data is good, PDF report generated
- successful upload->moderation queue
- who submitted, when
- table with unique ID associated with upload
- deletion of inserted records if error
- custom mapping saved in background
- track submission as bundle
- how to know when to delete something?
- TNRS model: don't rebuild whole database, grows or contracts on dataset basis
- versioning database: rollback to previous version
- TNRS fkey walk: ON DELETE CASCADE
- but leaves NCBI taxonomy
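Tracking a submission as a bundle and deleting it through a foreign-key walk can be shown with a small SQLite example. The `upload` and `plot_record` tables are made up for the sketch; only the `ON DELETE CASCADE` idea comes from the notes.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """CREATE TABLE upload (
           upload_id TEXT PRIMARY KEY,
           submitter TEXT NOT NULL,
           submitted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
       )"""
)
conn.execute(
    """CREATE TABLE plot_record (
           id INTEGER PRIMARY KEY,
           upload_id TEXT NOT NULL
               REFERENCES upload(upload_id) ON DELETE CASCADE,
           species TEXT NOT NULL
       )"""
)


def insert_bundle(submitter, species_list):
    # One unique ID per upload; every inserted record carries it.
    upload_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute(
            "INSERT INTO upload (upload_id, submitter) VALUES (?, ?)",
            (upload_id, submitter),
        )
        conn.executemany(
            "INSERT INTO plot_record (upload_id, species) VALUES (?, ?)",
            [(upload_id, species) for species in species_list],
        )
    return upload_id


def delete_bundle(upload_id):
    # Deleting the upload row cascades to all of its records.
    with conn:
        conn.execute("DELETE FROM upload WHERE upload_id = ?", (upload_id,))
```

Shared reference tables that do not point at `upload` (the notes give NCBI taxonomy as the example) are untouched by the cascade, so the database grows or contracts one dataset at a time.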
- embed as much within database structure as possible
- which are shared, which are unique keys
- validations, reporting
- frequency of plots
- sparkline things
- class of data captured by NVS
- upload data->analysis of internal quality
- 2nd-level validations
- how data compares to population statistics
- put in taxonomic name
- early validations->flag for user to check
- meet back at 11:15am
- end-of-pipeline crowdsourcing and user feedback issues
- corrections to data
- once maps visible, find issues
- part of DB to store user feedback, to filter/improve data
- exclude data marked as wrong
- 3-4 categories to tag data with
- human/automated layers to filter data
- hide incorrect maps?
- how to track all range maps
- API level, how to store bits of feedback, annotations on content
- annotations on data object in BIEN, even if data itself is not in BIEN
- click range map point to mark as incorrect
- users mark specimens as cultivated
- visual interactions with data
- collections as cultivated
- every herbarium from Index Herbariorum, mark w/in radius as cultivated
- shapefile with boundaries of botanical gardens
- weighting by population (cities)
- geospatial component, but lose info corrected around small cities
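The "mark within radius as cultivated" idea might look like the following. The 3 km default radius is an arbitrary placeholder, the dictionary keys are assumptions, and the great-circle distance uses a spherical Earth, which is only approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points given in decimal degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_possibly_cultivated(occurrences, herbaria, radius_km=3.0):
    # Returns copies of the occurrences with a boolean flag added;
    # the originals are not modified.
    flagged = []
    for occ in occurrences:
        near = any(
            haversine_km(occ["lat"], occ["lon"], h["lat"], h["lon"]) <= radius_km
            for h in herbaria
        )
        flagged.append({**occ, "possibly_cultivated": near})
    return flagged
```

A flag rather than a deletion keeps the record available for human review, in line with the layered human/automated filtering discussed above. A botanical-garden shapefile would replace the fixed radius with a point-in-polygon test.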
- web interface: view data record, add specific comment
- look at range map->validate, star rating, which are incorrect
- rating a book
- occurrence, point data
- cluster of points->flag as questionable
- simple interface->develop more w/ feedback
- people request datasets, find issues
- feedback as dataset comment
- talked about versioning downloads
- snapshot of BIEN
- comments about data within snapshot
- remind user to give feedback on downloaded data
- complete the loop on how data used
- collate answer
- don't build large infrastructure, start small
- way to get feedback on invalid records
- first year we have range map data for all species
- BIEN 2 data->BIEN 3
- which species to rerun
- frameworks so don't need to build infrastructure
- one-click flagging of points
- feedback about record
- original list
- automatic downloads: downloading portions of the data
- mechanism to associate user with provider
- log IP address
- search for things
- data provider flips switch to make data public
- data exports in same format
- user interface for visualizing the data
- hard to gauge what user community would want
- needs-driven
- CVS data exploration visualization
- data discovery
- hardening the range modeling algorithms
- layer that sits on top of range modeling applications
- things that people can say
- filter on mapped areas, things, species
- select specific fields that user wants to look at
- picklist
- something much more detailed: shapefile of Nat'l Park
- would be very nice app
- Java
- what are main search/discovery axes?
- country, spatial, temporal, taxonomic, trait, plot size, size range, habit
- BIEN2 doesn't have temporal data, because old collections are handwritten->range of years
- need start/end date for collected date
- date ranges: good thing about VegBank
- TROPICOS also has start/end dates
- fields for D/M/Y->display date
- legacy data
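A small sketch of deriving start/end dates from separate day/month/year fields, including the "range of years" case for old handwritten labels. Function and parameter names are illustrative and not taken from VegBank or TROPICOS.

```python
import calendar
from datetime import date


def collected_date_range(year, month=None, day=None, end_year=None):
    """Expand a possibly partial collection date into (start, end) dates."""
    if end_year is not None:
        # Legacy label that only gives a range of years.
        return date(year, 1, 1), date(end_year, 12, 31)
    if month is None:
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    exact = date(year, month, day)
    return exact, exact


def display_date(start, end):
    # Show a single date when the range collapses to one day.
    if start == end:
        return start.isoformat()
    return f"{start.isoformat()} to {end.isoformat()}"
```

Storing both ends makes temporal filters uniform: a query for a given year matches any record whose range overlaps it, whether the original label had a full date or only a decade.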
- spatial query: what's at a point
- family or habit
- flag what is co-occurrence data
- level of granularity
- how many axes to subset
- one for each column
- also support ANDs/ORs
- filter on axes
- rainfall > x
- climate filters
- TROPICOS query builder
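Filtering on axes with ANDs/ORs could be expressed as a nested structure that is turned into a parameterized `WHERE` clause. The column whitelist below is a guess at analytical DB field names, and this is not how the TROPICOS query builder works; it is only one possible shape.

```python
# Hypothetical whitelist of filterable axes in the analytical DB.
ALLOWED_COLUMNS = {"country", "family", "habit", "rainfall_mm"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def build_where(node):
    """Turn a filter tree into (sql, params).

    A node is either ("and" | "or", [child, ...]) or (column, op, value).
    """
    if node[0] in ("and", "or"):
        joiner, children = node
        if not children:
            raise ValueError("empty AND/OR group")
        parts, params = [], []
        for child in children:
            sql, child_params = build_where(child)
            parts.append(sql)
            params.extend(child_params)
        return "(" + f" {joiner.upper()} ".join(parts) + ")", params

    column, op, value = node
    # Only whitelisted identifiers reach the SQL text; values stay as
    # bound parameters.
    if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
        raise ValueError(f"unsupported filter: {column!r} {op!r}")
    return f"{column} {op} ?", [value]
```

For example, "rainfall > x AND (family OR habit)" becomes `("and", [("rainfall_mm", ">", 1000), ("or", [("family", "=", "Fagaceae"), ("habit", "=", "tree")])])`. The resulting clause would be appended to a `SELECT` against the analytical database only, matching the SELECT-only access level.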
- different levels of access through web interfaces
- SELECT access only
- interface only queries analytical database
- NVS interface doesn't query core DB, instead analytical DB and metadata
- avoid need to e-mail Brad to request extract