moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: rerun time: noted that this is only fast after manual vacuuming of the table (to remove the deleted rows from the index). autovacuum apparently does not run, although it should.
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: CREATE INDEX ... specimenHolderInstitutions: documented runtime (45 min)
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented runtime (3.5 min)
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: # duplicates: added revision #
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented that there are 4.5 million duplicates (59,998,354 rows before - 55,417,646 rows after = 4,580,708)
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: added rerun time (~0 thanks to index, so no problem doing the DELETE each time postprocess.sql is run)
bugfix: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: updated column names to match the renamings in map.csv, which are now performed on the staging table itself
bugfix: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: institution_code index: create it idempotently using create_if_not_exists() and an explicit index name, so that a duplicate index doesn't get added each time postprocess.sql is run
inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: add util to the search_path so that postprocess.sql will also work when run by inputs/input.Makefile, which only puts the datasource (GBIF) in the search_path
inputs/GBIF/raw_occurrence_record/: renamed to raw_occurrence_record_plants because it's actually only the plants in raw_occurrence_record, not all of raw_occurrence_record. also, this will allow us to create a separate raw_occurrence_record_plants view whose name matches the folder and does not collide with the raw_occurrence_record table.
added inputs/GBIF/raw_occurrence_record/postprocess.sql, which removes institutions that we have direct data for