2013-10-25 conference call¶
Martha's notes¶
Upcoming¶
- call next week at the usual time (Thu 9am PT & Tucson / 12pm ET)
Availability¶
- See the *Google spreadsheet* (and please add your availability for future weeks once it's known).
Decisions made¶
validation order¶
- can't mark datasource as done until fixes have been re-checked by the data provider (Martha)
- plots before specimens (Brad)
- better to do VegBank/CVS while they're fresh in Bob/Mike Lee's memory
- plots
  - VegBank
    - 2nd-round fixes
    - plus feature requests that were changed to issues
    - and new feature requests? (some)
  - CVS
    - VegBank-related fixes (plus fixes for new VegBank issues, once available)
    - reload extract from Mike Lee
  - Madidi (Peter Jorgensen)
  - FIA
- specimens
  - MO (Peter Jorgensen)
  - ARIZ, U, TEX (Brad)
  - GBIF, UNCC
VegBank validation¶
- 3 feature requests were changed to issues:
  - add slopeAspect, slopeGradient to denormalized view
  - map taxoninterpretation.party_id to identifiedBy
  - add stemCount to denormalized view
- 3 new feature requests:
  - 2 related to selecting the current taxondetermination (1 complete)
  - 1 related to inter-datasource deduplication
VegBank deduplication¶
need to remove CVS records from VegBank to avoid duplication
SALVIAS deduplication¶
- SALVIAS contains some records that are in other plots datasources
- the initial SALVIAS export on nimoy had these removed, but the refresh may still have the duplicates in
- Brad will e-mail which SALVIAS projects are duplicates
I've modified the list of projects to exclude. Here they are:
mysql> select project_id, project_name from projects where project_id IN (8,9,11,16,18,14,17,23);
+------------+------------------------------------------------+
| project_id | project_name                                   |
+------------+------------------------------------------------+
|          8 | Inventarios de Bosques en Ecuador              |
|          9 | Inventarios de Bosques de la Costa del Ecuador |
|         11 | INW Vegetation Plots                           |
|         14 | Madidi Transects                               |
|         16 | nsf_example                                    |
|         17 | Madidi Permanent Plots                         |
|         18 | SERBO Selva Seca Oaxaca                        |
|         23 | Madidi Savana Line Transects                   |
+------------+------------------------------------------------+
8 rows in set (0.00 sec)
Here's why they need to be excluded:
- Projects 8, 9, and 18 are hidden within the SALVIAS database and are not accessible for download. SALVIAS has agreed not to re-distribute these datasets.
- Project 11 is a duplicate of data in VegBank
- Project 16 is made-up example data for an old NSF proposal
- Projects 14, 17 and 23 should be duplicates of the Madidi data we are receiving directly from Peter Jorgensen
I made a mistake the last time by asking you to exclude the RAINFOR plots (project 5). I had forgotten that we will not be getting a new dump from Oliver Phillips. Please be sure to include project 5 from SALVIAS.
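The exclusion rule in the message above can be sketched as a simple filter. This is only an illustration, not the actual import code: the names `EXCLUDED_PROJECT_IDS`, `keep_project` and `filter_rows`, and the dict-shaped rows with a `project_id` key, are assumptions. Only the project IDs themselves come from the list above.

```python
# Project IDs to exclude from the SALVIAS refresh, taken from the list above.
# RAINFOR (project 5) is deliberately NOT in this set, so it stays included.
EXCLUDED_PROJECT_IDS = frozenset({8, 9, 11, 14, 16, 17, 18, 23})


def keep_project(project_id):
    """Return True if a SALVIAS project should stay in the import."""
    return project_id not in EXCLUDED_PROJECT_IDS


def filter_rows(rows):
    """Keep only rows whose (hypothetical) 'project_id' key is not excluded."""
    return [row for row in rows if keep_project(row["project_id"])]


if __name__ == "__main__":
    # Sample rows using only project IDs mentioned in the notes.
    sample = [{"project_id": pid} for pid in (5, 8, 11, 23)]
    print([row["project_id"] for row in filter_rows(sample)])
```

Keeping the IDs in one named set means a later correction, like the one for project 5, is a one-line change.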
To do for Brad¶
SALVIAS deduplication¶
e-mail the list of SALVIAS projects that are also in other datasources
To do for Paul¶
most important:
- provide the psql commands to run numbered steps that are .sql scripts instead of .sh shell scripts:
  - step 5. geonames.sql
    psql -e --set ON_ERROR_STOP=1 -d geoscrub < geonames.sql # run as DB superuser
  - step 6. geovalidate.sql
    psql -e --set ON_ERROR_STOP=1 -d geoscrub < geovalidate.sql # run as DB superuser
  - step 3. geonames-to-gadm.sql
also:
- add any in-progress scripts to svn
- explain which steps the new scripts are used for:
  - they are called by the new headless scripts
  - geonames-to-gadm.*.sql
  - update.*.sql
- create shell scripts to run steps 1-3 and 4-6
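The psql invocations listed for steps 5 and 6 share one pattern, which a small wrapper could capture. The following is a minimal sketch in Python rather than the shell scripts requested above. The function names `psql_argv`, `run_sql_script` and `run_steps` are hypothetical; the flags and the `geoscrub` database name are taken from the commands in these notes.

```python
import subprocess


def psql_argv(database="geoscrub"):
    """Build the psql argument list used for the .sql steps in these notes."""
    return ["psql", "-e", "--set", "ON_ERROR_STOP=1", "-d", database]


def run_sql_script(path, database="geoscrub"):
    """Feed one .sql script to psql on stdin, like `psql ... < script.sql`.

    Must be run as a DB superuser, as the notes require.
    """
    with open(path, "rb") as script:
        # check=True raises CalledProcessError if psql exits non-zero,
        # which ON_ERROR_STOP=1 makes it do on the first SQL error.
        subprocess.run(psql_argv(database), stdin=script, check=True)


def run_steps(paths, database="geoscrub", runner=run_sql_script):
    """Run the given scripts in order, stopping at the first failure."""
    for path in paths:
        runner(path, database)


# Example (not executed here): the two steps whose commands are given above.
# run_steps(["geonames.sql", "geovalidate.sql"])
```

The `runner` parameter exists only so the ordering logic can be exercised without a database.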
To do for Aaron¶
validation¶
- see validation order above
VegBank deduplication¶
remove CVS plots from VegBank by authorplotcode
- use list of authorplotcodes in CVS extract, because not all plots use SSN format
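Removing CVS plots by authorplotcode amounts to an anti-join against the list of codes in the CVS extract. A minimal sketch, assuming plots are dict-shaped rows with an `authorplotcode` key and that codes match exactly (no case or whitespace normalization); the function name and the sample codes are made up for illustration.

```python
def remove_cvs_plots(vegbank_plots, cvs_authorplotcodes):
    """Drop VegBank plots whose authorplotcode also appears in the CVS extract."""
    cvs_codes = set(cvs_authorplotcodes)  # set gives O(1) membership checks
    return [p for p in vegbank_plots if p["authorplotcode"] not in cvs_codes]


if __name__ == "__main__":
    # Hypothetical sample data, not real plot codes.
    vegbank = [
        {"authorplotcode": "A-001"},
        {"authorplotcode": "B-002"},
        {"authorplotcode": "C-003"},
    ]
    cvs_extract_codes = ["B-002"]
    print(remove_cvs_plots(vegbank, cvs_extract_codes))
```

Matching against the explicit list from the extract, rather than a code-format pattern, follows the note that not all plots use SSN format.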
SALVIAS deduplication¶
remove Brad's list of duplicated projects from SALVIAS