2013-10-25 conference call

Martha's notes

Upcoming

  • call next week at the usual time (Thu 9am PT and Tucson / 12pm ET)

Decisions made

validation order

  • can't mark datasource as done until fixes have been re-checked by the data provider (Martha)
  • plots before specimens (Brad)
  • better to do VegBank/CVS while they're fresh in Bob/Mike Lee's memory
  1. plots
    1. VegBank
    2. CVS
      1. VegBank-related fixes (plus fixes for new VegBank issues, once available)
      2. reload extract from Mike Lee
    3. Madidi (Peter Jorgensen)
    4. FIA
  2. specimens
    1. MO (Peter Jorgensen)
    2. ARIZ, U, TEX (Brad)
    3. GBIF, UNCC

VegBank validation

  • 3 feature requests were changed to issues:
    • add slopeAspect, slopeGradient to denormalized view
    • map taxoninterpretation.party_id to identifiedBy
    • add stemCount to denormalized view
  • 3 new feature requests:
    • 2 related to selecting the current taxondetermination (1 complete)
    • 1 related to inter-datasource deduplication

VegBank deduplication

SALVIAS deduplication

  • SALVIAS contains some records that are in other plots datasources
  • the initial SALVIAS export on nimoy had these removed, but the refresh may still contain the duplicates
  • Brad will e-mail which SALVIAS projects are duplicates

_from Brad:_

I've modified the list of projects to exclude. Here they are:

mysql> select project_id, project_name from projects where project_id IN  (8,9,11,16,18,14,17,23);
+------------+------------------------------------------------+
| project_id | project_name                                   |
+------------+------------------------------------------------+
|          8 | Inventarios de Bosques en Ecuador              |
|          9 | Inventarios de Bosques de la Costa del Ecuador |
|         11 | INW Vegetation Plots                           |
|         14 | Madidi Transects                               |
|         16 | nsf_example                                    |
|         17 | Madidi Permanent Plots                         |
|         18 | SERBO Selva Seca Oaxaca                        |
|         23 | Madidi Savana Line Transects                   |
+------------+------------------------------------------------+
8 rows in set (0.00 sec)

Here's why they need to be excluded:

- Projects 8, 9, and 18 are hidden within the SALVIAS database and are not accessible for download. SALVIAS has agreed not to re-distribute these datasets.
- Project 11 is a duplicate of data in VegBank
- Project 16 is made-up example data for an old NSF proposal
- Projects 14, 17 and 23 should be duplicates of the Madidi data we are receiving directly from Peter Jorgensen

I made a mistake the last time by asking you to exclude the RAINFOR plots (project 5). I had forgotten that we will not be getting a new dump from Oliver Phillips. Please be sure to include project 5 from SALVIAS.
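
Brad's exclusion list can be kept in one place and turned into a filter. The following is a minimal shell sketch, not the import's actual code: the names EXCLUDED_PROJECT_IDS, exclusion_clause, and is_excluded are hypothetical, and it assumes the filter is applied as a SQL condition on the projects table queried above.

```shell
#!/bin/sh
# Sketch only: how the list is applied downstream is an assumption.
# IDs from Brad's list; project 5 (RAINFOR) is deliberately NOT listed.
EXCLUDED_PROJECT_IDS="8 9 11 14 16 17 18 23"

# exclusion_clause: print a SQL condition that drops the excluded projects.
exclusion_clause() {
    ids=$(printf '%s' "$EXCLUDED_PROJECT_IDS" | tr ' ' ',')
    printf 'project_id NOT IN (%s)\n' "$ids"
}

# is_excluded ID: succeed if ID is on the exclusion list.
is_excluded() {
    case " $EXCLUDED_PROJECT_IDS " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a query that lists the projects that remain after exclusion.
echo "SELECT project_id, project_name FROM projects WHERE $(exclusion_clause);"
```

Keeping the IDs in a single variable means a later correction, such as the one for project 5, is made in only one place.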

To do for Brad

SALVIAS deduplication

  • e-mail the list of SALVIAS projects that are also in other datasources

To do for Paul

most important:

  1. provide the psql commands to run numbered steps that are .sql scripts instead of .sh shell scripts:
    1. step 5. geonames.sql
      psql -e --set ON_ERROR_STOP=1 -d geoscrub < geonames.sql # run as DB superuser
    2. step 6. geovalidate.sql
      psql -e --set ON_ERROR_STOP=1 -d geoscrub < geovalidate.sql # run as DB superuser
    3. step 3. geonames-to-gadm.sql
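
The two commands given above share one invocation, so a small helper could wrap it. This is a minimal shell sketch under stated assumptions: run_sql_step and the DRY_RUN switch are hypothetical names, the database name geoscrub is taken from the commands above, and step 3 is left out because the notes do not yet give its command.

```shell
#!/bin/sh
# Sketch of a helper for the .sql steps; not the project's actual script.
DB_NAME="${DB_NAME:-geoscrub}"   # database name from the commands above
DRY_RUN="${DRY_RUN:-1}"          # hypothetical switch: 1 = only print the command

# run_sql_step FILE: run one .sql step, stopping at the first SQL error.
# Must be run as DB superuser, as noted for steps 5 and 6.
run_sql_step() {
    sql_file="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "psql -e --set ON_ERROR_STOP=1 -d $DB_NAME < $sql_file"
    else
        psql -e --set ON_ERROR_STOP=1 -d "$DB_NAME" < "$sql_file"
    fi
}

# Step 5, then step 6; "&&" skips step 6 if step 5 fails.
run_sql_step geonames.sql && run_sql_step geovalidate.sql
```

With ON_ERROR_STOP set, psql exits with a nonzero status when a script fails, so the `&&` chain stops before the next step. Running with DRY_RUN=0 executes the commands instead of printing them.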

also:

  1. add any in-progress scripts to svn
  2. explain which steps the new scripts below are used for (answer: they are called by the new headless scripts):
    • geonames-to-gadm.*.sql
    • update.*.sql
  3. create shell scripts to run steps 1-3 and 4-6

To do for Aaron

validation

VegBank deduplication

  • remove CVS plots from VegBank by authorplotcode
    • use list of authorplotcodes in CVS extract, because not all plots use SSN format
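
Matching on the explicit list of codes, rather than on a code format, can be sketched as an anti-join. This is an illustration only: the file names, the tab-separated layout, and the sample authorplotcodes are all made up, and keep_non_cvs is a hypothetical helper, not part of the import scripts.

```shell
#!/bin/sh
# Sketch only: file names, layout, and sample codes below are made up.
workdir=$(mktemp -d)

# Hypothetical authorplotcodes from the CVS extract, one per line.
# The second code shows a plot that does not use the SSN-style format.
printf '%s\n' '123-01-0456' 'PLOT-A7' > "$workdir/cvs_authorplotcodes.txt"

# Hypothetical VegBank plot listing: authorplotcode <TAB> plot name.
printf '%s\t%s\n' \
    '123-01-0456' 'plot one' \
    'PLOT-A7'     'plot two' \
    'VB-0009'     'plot three' > "$workdir/vegbank_plots.tsv"

# keep_non_cvs CODES PLOTS: print the PLOTS rows whose first column does
# not appear in CODES (exact match on the full authorplotcode).
# Assumes CODES is non-empty; awk's NR == FNR idiom relies on that.
keep_non_cvs() {
    awk -F '\t' 'NR == FNR { cvs[$1] = 1; next } !($1 in cvs)' "$1" "$2"
}

# Prints only the row for VB-0009, the one plot not in the CVS list.
keep_non_cvs "$workdir/cvs_authorplotcodes.txt" "$workdir/vegbank_plots.tsv"

rm -r "$workdir"
```

A pattern match on the SSN-style format would have missed PLOT-A7 here, which is why the list from the CVS extract is used instead.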

SALVIAS deduplication

  • remove Brad's list of duplicated projects from SALVIAS