2012-11-09 conference call
Upcoming
- We will be meeting again next Friday 11/16
- Brad is sending out a questionnaire to BIEN members to see what information they need in the analytical database
To do
Talk to Mike and Bob about where datasource metadata is stored in VegBank: usr, userdataset, party, observationcontributor tables?
- need to cite VegBank providers as VegBIEN subproviders
Figure out which VegBank records are embargoed (not just fuzzed): there are 288 embargo entries whose embargo stop dates are in the future [1]
Refactor VegBIEN to store BIEN2 datasource information
- obtain BIEN2 datasource schema (ERD) from Brad: see attached bien_web_datasource_schema.sql, bien_web_datasource_schema.mwb
Add copyright table to VegBIEN: added accesslevel, accessconditions to reference
- stores datasource access restrictions and coauthorship requirements
- Determine the BIEN2 datasources' access restrictions
separate meeting?: done
Obtain SALVIAS providers' access restrictions: public, metadata-only, completely hidden
- not in the salvias_plots export Brad put on nimoy: actually, this information is in plotMetadata.AccessCode
Determine precision of coordinates
- Determine fuzz factor applied to location-embargoed data
- Store in coordinates.coordsaccuracy_deg (VegBank's plot.locationaccuracy)
[1] on vegbiendev:
SELECT count(*) FROM "VegBank".embargo WHERE embargostop > now()
VegBank provider metadata
| Table | Metadata stored |
|---|---|
| party | contact info for each party associated with a plot, etc. |
| usr | stores only names and email addresses, which might be all the contact info there is |
| usercertification | stores credentials, not quite the same thing as contact info |
| userdataset | datasetsharing private/public flag |
| embargo | plot-specific, and stores only restrictions on viewing the data, not on redistributing the data |
| userpermission | empty (and likely applies only to user access to VegBank data) |
Note: VegBank fields not in the ERD are not populated
Geovalidation
- Jim has scripts that regenerate the geoscrub table
- Jim will e-mail out the meaning of each numerical code and summarizations of the results
Steps
- clean names
- decode UTF-8 that was mis-encoded as Latin-1
- expand HTML entities
- country: match 3-char, 2-char ISO country codes
- county: remove "Co." after county name
- city: remove "city of"/"municipio de" before city name
- match names to geonames.org hierarchy
- translate geonames.org names to GADM human-readable names at country level
- fewer matches below country level, e.g. because of the Madagascar reorganization
- point-in-polygon
- result for each level is stored in a flag field, with different codes for the validation status
- need to combine the validation codes into a boolean is-valid field
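The name-cleaning steps and the is-valid flag could look roughly like the sketch below. This is not Jim's actual script: the function names, the tiny `ISO_COUNTRIES` lookup, and the `ok_codes` values are all hypothetical placeholders, since the real validation codes have not been circulated yet.

```python
import html
import re

# Hypothetical lookup; a real run would load the full ISO 3166-1 code list.
ISO_COUNTRIES = {
    "US": "United States", "USA": "United States",
    "MG": "Madagascar", "MDG": "Madagascar",
}

def fix_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # already correct, or not recoverable this way

def clean_name(text):
    """Repair the encoding, expand HTML entities, trim whitespace."""
    return html.unescape(fix_mojibake(text)).strip()

def clean_country(name):
    """Expand 2-char and 3-char ISO codes to a country name."""
    name = clean_name(name)
    return ISO_COUNTRIES.get(name.upper(), name)

def clean_county(name):
    """Remove a trailing "Co." after the county name."""
    return re.sub(r"\s+Co\.$", "", clean_name(name), flags=re.IGNORECASE)

def clean_city(name):
    """Remove a leading "city of" / "municipio de" before the city name."""
    return re.sub(r"^(city of|municipio de)\s+", "", clean_name(name),
                  flags=re.IGNORECASE)

def is_valid(level_flags, ok_codes):
    """Collapse per-level validation codes into one boolean.

    level_flags maps a level name to its code; ok_codes is the set of
    codes treated as passing (hypothetical until the codes are documented).
    """
    return all(code in ok_codes for code in level_flags.values())
```

For example, `clean_county("Orange Co.")` gives `"Orange"`, and `is_valid({"country": 1, "state": 1}, ok_codes={1})` is true under the made-up convention that code 1 means "matched".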
Results
- 1.7 million input locations
- 1.48 million inputs have country
- all but 7,000 country values can be matched
- unmatched countries: 5,000 are the word "Caribbean"; the others are islands, etc.
- 1/6 of records have 3rd-level name
- 50% of counties matched
- takes 2 hours (?) to parse names
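As a quick check on the counts above, the shares work out as follows. This assumes the 7,000 unmatched country values are counted out of the 1.48 million inputs that have a country, which the notes do not state explicitly.

```python
# Counts as reported in the notes (rounded figures).
total_inputs = 1_700_000
with_country = 1_480_000
unmatched_countries = 7_000  # assumed to be a subset of with_country

share_with_country = with_country / total_inputs
share_matched = (with_country - unmatched_countries) / with_country

print(f"inputs with a country: {share_with_country:.1%}")
print(f"countries matched:     {share_matched:.1%}")
```

That is roughly 87% of inputs with a country and about 99.5% of those countries matched.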