2012-11-09 conference call

Upcoming

  • We will be meeting again next Friday 11/16
  • Brad is sending out a questionnaire to BIEN members to see what information they need in the analytical database

To do

  1. Talk to Mike and Bob about where datasource metadata is stored in VegBank
    • usr, userdataset, party, observationcontributor tables?
    • need to cite VegBank providers as VegBIEN subproviders
  2. Figure out which VegBank records are embargoed (not just fuzzed): there are 288 embargo entries whose embargo stop dates are in the future (footnote 1)
  3. Refactor VegBIEN to store BIEN2 datasource information
    1. obtain BIEN2 datasource schema (ERD) from Brad: see attached bien_web_datasource_schema.sql, bien_web_datasource_schema.mwb
    2. Add copyright table to VegBIEN: added accesslevel, accessconditions to reference
      • stores datasource access restrictions and coauthorship requirements
  4. Determine the BIEN2 datasources' access restrictions
    • separate meeting?: done
  5. Obtain SALVIAS providers' access restrictions
    • public, metadata-only, completely hidden
    • not in the salvias_plots export Brad put on nimoy: actually, this information is in plotMetadata.AccessCode
  6. Determine precision of coordinates
    • Determine fuzz factor applied to location-embargoed data
    • Store in coordinates.coordsaccuracy_deg (VegBank's plot.locationaccuracy)

Footnote 1, on vegbiendev:

SELECT count(*) FROM "VegBank".embargo WHERE embargostop > now()

VegBank provider metadata

Table              Metadata stored
party              contact info for each party associated with a plot, etc.
usr                stores only names and email addresses, which might be all the contact info there is
usercertification  stores credentials, not quite the same thing as contact info
userdataset        datasetsharing private/public flag
embargo            plot-specific, and stores only restrictions on viewing the data, not on redistributing the data
userpermission     empty (and likely applies only to user access to VegBank data)

Note: VegBank fields not in the ERD are not populated

Geovalidation

  • Jim has scripts that regenerate the geoscrub table
  • Jim will e-mail out the meaning of each numerical code and summarizations of the results

Steps

  1. clean names
    1. decode UTF-8 that was mis-encoded as Latin-1
    2. expand HTML entities
    3. country: match 3-char, 2-char ISO country codes
    4. county: remove "Co." after county name
    5. city: remove "city of"/"municipio de" before city name
  2. match names to geonames.org hierarchy
  3. translate geonames.org names to GADM human-readable names at country level
    • fewer matches below country level, e.g. Madagascar reorganization
  4. point-in-polygon
    • result for each level stored in flag field with different codes for the validation status
    • need to combine the validation codes into a boolean is-valid field
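
A minimal sketch of the name-cleaning substeps in step 1, written in Python for illustration only. It is not Jim's actual script: the function names and the tiny ISO lookup table are hypothetical, and a real run would need the full ISO 3166-1 code list.

```python
import html
import re

# Hypothetical, deliberately tiny lookup; a real table would cover all of ISO 3166-1.
ISO_CODES = {
    "MG": "Madagascar", "MDG": "Madagascar",
    "US": "United States", "USA": "United States",
}


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mis-encoded (or not recoverable): leave the text unchanged.
        return text


def clean_name(text: str) -> str:
    """Substeps 1-2: repair the encoding, then expand HTML entities."""
    return html.unescape(fix_mojibake(text)).strip()


def clean_country(value: str) -> str:
    """Substep 3: map 2- and 3-character ISO codes to a country name."""
    value = clean_name(value)
    return ISO_CODES.get(value.upper(), value)


def clean_county(value: str) -> str:
    """Substep 4: remove a trailing "Co." after the county name."""
    return re.sub(r"\s+Co\.$", "", clean_name(value))


def clean_city(value: str) -> str:
    """Substep 5: remove a leading "city of" / "municipio de"."""
    return re.sub(r"^(city of|municipio de)\s+", "", clean_name(value),
                  flags=re.IGNORECASE)


if __name__ == "__main__":
    print(clean_country("mdg"))
    print(clean_county("Orange Co."))
    print(clean_city("Municipio de Quito"))
```

The encoding repair runs before entity expansion, matching the order of the substeps above. Text that is already correct UTF-8 fails the round trip and is returned unchanged.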

Results

  • 1.7 million input locations
  • 1.48 million inputs have country
  • all but 7,000 country values can be matched
    • unmatched countries: 5,000 are the word "Caribbean"; the others are islands, etc.
  • 1/6 of records have 3rd-level name
  • 50% of counties matched
  • takes 2 hours (?) to parse names
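
For scale, the counts above can be turned into rough shares. This is only a back-of-the-envelope sketch in Python: the inputs are the rounded figures from the call, and it assumes the 7,000 unmatched countries are a subset of the 1.48 million inputs that have a country.

```python
# Rounded figures from the call notes above.
total_locations = 1_700_000
with_country = 1_480_000
unmatched_countries = 7_000   # assumed to be a subset of with_country
caribbean = 5_000

share_with_country = with_country / total_locations
share_unmatched = unmatched_countries / with_country
share_caribbean = caribbean / unmatched_countries

print(f"inputs with a country:           {share_with_country:.1%}")
print(f"country values left unmatched:   {share_unmatched:.1%}")
print(f"unmatched that are 'Caribbean':  {share_caribbean:.1%}")
```

So roughly 87% of inputs carry a country, under 1% of those fail to match, and about 71% of the failures are the single value "Caribbean".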