Project

General

Profile

GBIF validation

source data: *on vegbiendev in MySQL* (log in as bien_read with the BIEN password)

2014-10-x

input-to-output query:

SELECT *
FROM      "GBIF".raw_occurrence_record_plants
LEFT JOIN view_full_occurrence_individual ON
    view_full_occurrence_individual.datasource = 'GBIF'
AND view_full_occurrence_individual.custodial_institution_codes = raw_occurrence_record_plants.specimen_duplicate_institutions
AND view_full_occurrence_individual.collection_code = raw_occurrence_record_plants.collection
AND view_full_occurrence_individual.catalog_number = raw_occurrence_record_plants."accessionNumber" 
ORDER BY raw_occurrence_record_plants."*row_num" 
LIMIT 100

output-to-input query:

SELECT *
FROM      view_full_occurrence_individual
LEFT JOIN "GBIF".raw_occurrence_record_plants ON
    view_full_occurrence_individual.custodial_institution_codes = raw_occurrence_record_plants.specimen_duplicate_institutions
AND view_full_occurrence_individual.collection_code = raw_occurrence_record_plants.collection
AND view_full_occurrence_individual.catalog_number = raw_occurrence_record_plants."accessionNumber" 
WHERE view_full_occurrence_individual.datasource = 'GBIF'
/* no ORDER BY because the view_full_occurrence_individual rows are not in any
particular order (the hash joins that produce it do not sort their output) */
LIMIT 100

2013-12-10

*GBIF.2013-12-10.100_rows.xls*
(input and output data are in separate tabs. refer to the VegCore data dictionary for column definitions.)

  • KNOWN NON-ISSUE: stateProvince is not populated, because the herbarium in the validation view does not provide it

subset import command:

time (export log= version=GBIF_VegBIEN; make schemas/$version/reinstall; make inputs/GBIF/import by_col=1 n=100; bin/make_analytical_db) # runtime: 1 min ("0m39.965s") @starscream; 2 min ("1m38.950s") @vegbiendev

query:

SELECT *
FROM      "GBIF".raw_occurrence_record_plants
LEFT JOIN "GBIF_VegBIEN".analytical_specimen ON
    analytical_specimen."datasource" = 'GBIF'
AND analytical_specimen."specimenHolderInstitutions" = raw_occurrence_record_plants."specimenHolderInstitutions" 
AND analytical_specimen."collection" = raw_occurrence_record_plants."collection" 
AND analytical_specimen."accessionNumber" = raw_occurrence_record_plants."accessionNumber" 
ORDER BY raw_occurrence_record_plants."*row_num" 
LIMIT 100