# Date Author Comment
14762 09/26/2014 12:36 AM Aaron Marcuse-Kubitza

fix: lib/sh/util.sh: already_exists_msg(): changed calling convention to avoid it seeming like `return 0` is run if already_exists_msg() throws an error, when in fact already_exists_msg() is just a command that should be run before returning/errexiting

14075 07/15/2014 09:35 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: updated

13965 07/10/2014 12:17 PM Aaron Marcuse-Kubitza

inputs/GBIF/_MySQL/.rsync_ignore: don't exclude GBIFPortalDB-*.data.sql.gz, even though this is an intermediate file, because it's better to have a backup of it locally. this was excluded in r13316 (2014-4-24) to free up disk space on the local machine.

13401 05/03/2014 02:03 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: add: verify/: also svn:ignore *.log

13316 04/24/2014 05:29 PM Aaron Marcuse-Kubitza

inputs/GBIF/_MySQL/.rsync_ignore: added GBIFPortalDB-*.data.sql.gz, because these are intermediate files

12988 03/30/2014 05:41 PM Aaron Marcuse-Kubitza

added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format

12985 03/30/2014 05:11 PM Aaron Marcuse-Kubitza

added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run

12968 03/29/2014 04:06 AM Aaron Marcuse-Kubitza

*{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)

12967 03/29/2014 03:58 AM Aaron Marcuse-Kubitza

lib/sh/make.sh: self_make(): renamed to with_rm() for clarity, since this is used only to propagate $rm, and does not also invoke a command with the same name as the current function, as the name might suggest

12886 03/24/2014 05:35 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate.institution_id: renamed to duplicate_institutions_sourcelist_id, as decided in the conference calls (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2)

12879 03/24/2014 01:49 AM Aaron Marcuse-Kubitza

mappings/VegCore.htm: regenerated from wiki: rename specimenHolderInstitutions to specimen_duplicate_institutions, as decided in the 2014-03-13 conference call (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2). note that most schema changes (such as this one) involve mappings changes, which are handled automatically by `inputs/run postprocess; yes|make inputs/{NVS,SALVIAS,TEAM}/test`.

12873 03/23/2014 11:43 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/table.run: switched to using lib/runscripts/table.run instead of mysql.table.run because some subdirs (Source/) need the regular table.run to work properly. mysql.table.run should instead be used directly by subdirs that use the MySQL install.

12779 03/20/2014 07:58 PM Aaron Marcuse-Kubitza

*{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`

12516 02/27/2014 01:27 PM Aaron Marcuse-Kubitza

bugfix: *.sql: public.source_by_shortname(): need to wrap it in a nested SELECT because Postgres incorrectly does not constant-fold (inline) it, leading to a slowdown when it is therefore run many times. this is done using the steps at wiki.vegpath.org/Postgres_queries#wrap-function-call-in-nested-SELECT .
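
The effect of the nested-SELECT wrapper can be sketched outside Postgres. The following is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Postgres, the `occurrence` table is hypothetical, and `source_by_shortname` here is a stand-in that merely counts how often it is invoked. It is not the project's actual function or the exact steps from the wiki page.

```python
import sqlite3

calls = {"n": 0}

def source_by_shortname(shortname):
    """Hypothetical stand-in for public.source_by_shortname(); counts its calls."""
    calls["n"] += 1
    return 1 if shortname == "GBIF" else None

conn = sqlite3.connect(":memory:")
# Registered as non-deterministic (the default), so the engine does not fold it.
conn.create_function("source_by_shortname", 1, source_by_shortname)
conn.execute("CREATE TABLE occurrence (id INTEGER PRIMARY KEY, source_id INTEGER)")
conn.executemany("INSERT INTO occurrence (source_id) VALUES (?)", [(1,)] * 5)

def count_calls(sql):
    """Run the query to completion and return how often the function was called."""
    calls["n"] = 0
    conn.execute(sql).fetchall()
    return calls["n"]

# Direct call in the WHERE clause: evaluated while scanning the rows.
direct_calls = count_calls(
    "SELECT id FROM occurrence WHERE source_id = source_by_shortname('GBIF')"
)
# Same call wrapped in an uncorrelated scalar subquery.
nested_calls = count_calls(
    "SELECT id FROM occurrence "
    "WHERE source_id = (SELECT source_by_shortname('GBIF'))"
)
print("direct:", direct_calls, "nested:", nested_calls)
```

In SQLite the direct form invokes the function for each scanned row, while the nested form evaluates the subquery once and reuses the result. Postgres planner behaviour differs in detail, so this only illustrates why the wrapper reduces repeated calls.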

12018 02/02/2014 12:49 AM Aaron Marcuse-Kubitza

inputs/input.Makefile: add!: verify/: also svn:ignore *.tsv, *.txt

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11888 12/10/2013 06:35 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/map.csv: row_num: remapped to plain *row_num, like the other datasources that have this field

11887 12/10/2013 06:31 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: rerun time: noted that this is only fast after manual vacuuming of the table (to remove the deleted rows from the index). autovacuum apparently does not run, although it should.

11881 12/09/2013 07:24 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: reran test, which added yearCollected/monthCollected/dayCollected

11869 12/09/2013 02:43 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: updated import() runtime (same), documented table cleanup runtime (1.5 h)

11868 12/09/2013 02:38 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: CREATE INDEX ... specimenHolderInstitutions: documented runtime (45 min)

11867 12/09/2013 02:28 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented runtime (3.5 min)

11788 11/26/2013 11:11 PM Aaron Marcuse-Kubitza

**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)

11705 11/21/2013 12:24 AM Aaron Marcuse-Kubitza

copyright scrub: inputs/: removed data provider-owned schema and documentation files, which are not BIEN copyright and should not be part of what is submitted for open-sourcing. these files will remain accessible via the web interface (fs.vegpath.org), but will not be in the repository.

11658 11/14/2013 02:17 AM Aaron Marcuse-Kubitza

added inputs/GBIF/_src/0001000-131106143450413.zip.md5, GBIFPortalDB-2013-09-10.dump.gz.md5

11654 11/14/2013 12:49 AM Aaron Marcuse-Kubitza

inputs/GBIF/_src/GBIFPortalDB-2013-09-10.dump.gz.url: documented download time (5.5 h for an 18 GB file)

11653 11/14/2013 12:40 AM Aaron Marcuse-Kubitza

inputs/GBIF/_src/0001000-131106143450413.zip.url: documented download time (only 2 h for an 18 GB file)

11650 11/13/2013 07:14 PM Aaron Marcuse-Kubitza

added inputs/GBIF/_src/0001000-131106143450413.zip.url (DwC-A export), GBIFPortalDB-2013-09-10.dump.gz.url (raw data), portal_26_feb_2013.war.url (raw data portal)

11648 11/13/2013 04:16 PM Aaron Marcuse-Kubitza

inputs/GBIF/: added LOA files: _src/use_conditions/LetterOfAgreement_template.doc, BIEN LoA agreement annex.docx

11396 10/21/2013 07:14 PM Aaron Marcuse-Kubitza

fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error

11107 09/29/2013 08:58 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: nest all taxonoccurrences inside a stratum event, so that the parent locationevent is always fully populated before child locationevents point to it. (previously, a stub parent event was created when the child event was imported first, which blocked the fully-populated parent event from being inserted later on.) this uses auto-folding (for VegBank/CVS) and auto-forwarding (for other datasources) to prune empty stratum events for taxonoccurrences that don't have strata. (see wiki.vegpath.org/Auto-folding, wiki.vegpath.org/Auto-forwarding for more info about these normalization techniques.) note that the inserted row counts stay exactly the same for all datasources except VegBank (which was being fixed), indicating that this significant change to the mappings did not change the semantics of the import of taxonoccurrences.

10866 09/04/2013 11:06 PM Aaron Marcuse-Kubitza

inputs/*/*/test.xml.ref: updated source.shortname for new datasource name, which now starts out with .new suffix

10443 07/26/2013 05:58 PM Aaron Marcuse-Kubitza

inputs/{.,}*/*.schema.sql: regenerated using the instructions in bin/my2pg. this primarily replaces timestamp with text/*timestamp*/ (to preserve indefinite dates).

10425 07/25/2013 07:34 PM Aaron Marcuse-Kubitza

bugfix: inputs/*/*/map.csv for specimen tables: remapped eventDate,day,month,year to *Collected, because a general date always applies to the observation itself rather than to any parent event (specimens don't have a parent event)

10270 07/14/2013 01:26 AM Aaron Marcuse-Kubitza

bugfix: inputs/*/*/map.csv (e.g. inputs/GBIF/raw_occurrence_record_plants/map.csv): remapped author to scientificNameAuthorship rather than authors, which it had gotten incorrectly automapped to. note that the VegCore term authors has now been renamed to data_authors to avoid ambiguity, but incorrect automappings resulting from it had not yet been fixed.

10269 07/14/2013 12:54 AM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record_plants/run: updated herbaria.ih column names for staging table column renaming

10268 07/14/2013 12:33 AM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/table.run: need to include lib/runscripts/mysql.table.run instead of table.run (table.run was accidentally substituted when inputs/.NCBI/table.run was copied to all new-style datasources)

10242 07/10/2013 10:07 PM Aaron Marcuse-Kubitza

inputs/*/Source/VegBIEN.csv: regenerated for new-style import, which uses a symlink to mappings/VegCore-VegBIEN.csv instead of a custom mapping using the original column names

10209 07/10/2013 02:32 AM Aaron Marcuse-Kubitza

inputs/*/*/map.csv for CSV tables with a row_num column: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table

10199 07/09/2013 04:44 PM Aaron Marcuse-Kubitza

bugfix: inputs/*/Source/map.csv: added missing row_num entry, which is needed by the staging table column renaming to make the order of the map.csv columns match the order in the staging table. the staging table column renaming is now used by all Source tables.

10179 07/06/2013 05:39 PM Aaron Marcuse-Kubitza

inputs/*/: added table.run for use by the table subdirs in new-style import. datasources without table subdirs do not need this.

10174 07/06/2013 03:55 PM Aaron Marcuse-Kubitza

bugfix: inputs/input.Makefile: %/VegBIEN.csv: for new-style datasources, use a symlink to mappings/VegCore-VegBIEN.csv directly instead of prefiltering VegCore-VegBIEN.csv to include only the columns in map.csv. prefiltering used to be performed as part of mapping the map.csv VegCore output terms to VegBIEN using bin/join, but is no longer needed because the staging table columns are now VegCore terms. instead, the full VegCore-VegBIEN.csv is needed so that derived columns added in stage I or II validations are detected by bin/map (rather than just the original source columns in map.csv).

10166 07/06/2013 11:29 AM Aaron Marcuse-Kubitza

bugfix: inputs/*/Source/data.csv for new-style datasources: need to include a blank row (plus a blank header) so that the metadata values are imported at least once instead of zero times, now that there is an installed staging table that will be iterated over. the blank row was not previously necessary, because db_xml.put_table() has a special case for metadata-only tables with no installed table, which avoids iterating over the table's rows.

10163 07/03/2013 10:20 PM Aaron Marcuse-Kubitza

inputs/*/Source/ for new-style datasources: use an actual staging table instead of a metadata-only table, so that metadata values can be stored in the staging table instead of the map.csv (as will be required by new-style import)

10089 06/27/2013 12:20 PM Aaron Marcuse-Kubitza

added inputs/GBIF/_archive/

10088 06/27/2013 12:18 PM Aaron Marcuse-Kubitza

removed inputs/GBIF/Specimen/, which has been replaced by the refresh in raw_occurrence_record_plants/

10087 06/27/2013 12:17 PM Aaron Marcuse-Kubitza

added inputs/GBIF/map.csv, used to regenerate inputs/GBIF/raw_occurrence_record_plants/map.csv when raw_occurrence_record_plants is resubset

10051 06/26/2013 07:55 AM Aaron Marcuse-Kubitza

inputs/GBIF/run: inherit from lib/runscripts/datasrc_dir.run, which uses import_order.txt to forward calls to the subdirs

10050 06/26/2013 07:54 AM Aaron Marcuse-Kubitza

added blank runscripts inputs/GBIF/Source/run, Specimen/run because they are in import_order.txt (used by lib/runscripts/datasrc_dir.run)

10036 06/25/2013 03:31 PM Aaron Marcuse-Kubitza

added inputs/GBIF/_src/.rsync_filter.upload,download to prevent old versions of GBIFPortalDB-*.dump.gz from being downloaded to the local machine, while keeping them on jupiter. this avoids the need to store these files in ~/Documents/BIEN/large_files/ with symlinks from inputs/GBIF/_src/ to exclude them from the sync.

10008 06/23/2013 03:47 PM Aaron Marcuse-Kubitza

added inputs/GBIF/raw_occurrence_record_plants/.rsync_ignore with filters that have previously needed to be manually added whenever `make inputs/upload` was run

10007 06/23/2013 03:46 PM Aaron Marcuse-Kubitza

added inputs/GBIF/_MySQL/.rsync_ignore with filters from /README.TXT > Maintenance > to synchronize vegbiendev, jupiter, and your local machine. these filters will now be used with bin/sync_upload in addition to the periodic backup commands.

9927 06/19/2013 10:17 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: genus->taxonlabel.taxonomicname: use new _filter_genus() (see r9882)

9885 06/12/2013 11:26 AM Aaron Marcuse-Kubitza

added inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.0.preamble.sql

9882 06/12/2013 10:49 AM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: genus->taxonlabel.taxonomicname: filter out genera that contain numbers (using new _filter_genus()), which break TNRS and prevent it from matching any other parts of the name. later, these genera can instead be moved to the end of the name, where TNRS will correctly match them as Unmatched_terms.
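
A minimal sketch of the filtering rule described above, assuming it amounts to "drop any genus that contains a digit". The Python function name `filter_genus` and its None-for-rejected convention are illustrative assumptions. The real `_filter_genus()` is a database function whose exact definition is not shown in this log.

```python
import re
from typing import Optional

def filter_genus(genus: Optional[str]) -> Optional[str]:
    """Return the genus unchanged, or None if it contains any digit.

    Hypothetical illustration of the rule in the commit message; a None
    result means the genus is left out of the name that is sent to TNRS.
    """
    if genus is None:
        return None
    return None if re.search(r"\d", genus) else genus

for candidate in ["Quercus", "Sp12", None]:
    print(repr(candidate), "->", repr(filter_genus(candidate)))
```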

9877 06/12/2013 10:05 AM Aaron Marcuse-Kubitza

added inputs/GBIF/raw_occurrence_record_plants/table.tsv.md5

9876 06/12/2013 09:51 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: regenerated. updated for new staging table input columns, which are now the same as the output columns.

9875 06/12/2013 09:41 AM Aaron Marcuse-Kubitza

bugfix: inputs/input.Makefile: %/VegBIEN.csv: use header from map.csv instead of the new columns, so that source.shortname is set to GBIF instead of VegCore

9874 06/12/2013 09:24 AM Aaron Marcuse-Kubitza

inputs/input.Makefile: %/VegBIEN.csv: when a runscript is available, instead map the output columns of map.csv to VegBIEN, because the columns have been renamed in the staging table

9873 06/12/2013 08:32 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/VegBIEN.csv: regenerated, which adds row_num input col

9864 06/12/2013 06:35 AM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/import_order.txt, run: updated raw_occurrence_record/ to raw_occurrence_record_plants/

9858 06/12/2013 04:47 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: import() runtime: specified that this does not include table.tsv.gz/make()

9857 06/12/2013 04:07 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: # duplicates: added revision #

9856 06/12/2013 04:07 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented that there are 4.5 million duplicates (59,998,354 rows before - 55,417,646 rows after = 4,580,708)

9855 06/12/2013 03:49 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: added rerun time (~0 thanks to index, so no problem doing the DELETE each time postprocess.sql is run)

9854 06/12/2013 03:25 AM Aaron Marcuse-Kubitza

*{.sh,run}: use simpler .rel() instead of `. "$(dirname "${BASH_SOURCE[0]}")"/...` for relative includes

9851 06/12/2013 02:48 AM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/_MySQL/MySQL_schema, MySQL_data: sed: put {} commands on their own line to work on Mac

9845 06/11/2013 06:40 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: updated column names to match the renamings in map.csv, which are now performed on the staging table itself

9828 06/11/2013 03:29 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: institution_code index: create it idempotently using create_if_not_exists() and an explicit index name, so that a duplicate index doesn't get added each time postprocess.sql is run
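
The idempotency idea can be illustrated with a short sketch. It assumes SQLite (via Python's sqlite3 module) and its `CREATE INDEX IF NOT EXISTS` syntax as a stand-in for the project's `create_if_not_exists()` helper, whose definition is not shown here. The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_occurrence_record_plants (institution_code TEXT)")

def create_institution_code_index(conn):
    """Create the index only if an index with this explicit name is absent."""
    # The explicit (hypothetical) name is what makes reruns recognizable:
    # an auto-generated name could differ each time and add a duplicate.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS "
        "raw_occurrence_record_plants_institution_code_idx "
        "ON raw_occurrence_record_plants (institution_code)"
    )

# Simulate running the postprocessing step several times.
for _ in range(3):
    create_institution_code_index(conn)

index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'raw_occurrence_record_plants'"
    )
]
print(index_names)
```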

9826 06/11/2013 03:22 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: add util to the search_path so that postprocess.sql will also work when run by inputs/input.Makefile, which only puts the datasource (GBIF) in the search_path

9823 06/11/2013 09:04 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: added import() runtime (5 h)

9822 06/10/2013 11:58 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: table.tsv.gz/make() runtime: noted that this excludes the upload time

9821 06/10/2013 11:58 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: added table.tsv.gz/upload() runtime (15 min)

9820 06/10/2013 11:48 PM Aaron Marcuse-Kubitza

added lib/runscripts/mysql.table.run (general to all MySQL datasources) and use it in inputs/GBIF/table.run

9819 06/10/2013 11:13 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: table.tsv/make(): to view runtime when using `screen`: keys used to scroll: added Ctrl-B/Ctrl-F for page-at-a-time scrolling (there are a lot of pages of output for the import() target!)

9818 06/09/2013 09:21 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/table.run: table.tsv.gz/make(): don't run table.tsv.gz/upload in test mode, to avoid clobbering the backup of a full table.tsv with a partial, testing table.tsv

9816 06/09/2013 09:08 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/table.run: table.tsv.gz/upload(): don't use inplace mode because it leaves a newer mtime when aborted, causing rsync to think that the partial upload is actually newer than the source. note that rsync's --partial-dir mode is just as capable of resuming an aborted upload (it will just use a file in .rsync-tmp instead). inplace mode is primarily designed for fixed-offset files which don't change much between edits, but this is not true for exports (or the gzips of them), which will change the file offsets of most data if even one row or column is added or removed.

9815 06/09/2013 09:01 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/table.run: table.tsv.gz/make(): run table.tsv.gz/upload here instead of in table.tsv/make() because it should not run until table.tsv.gz is finished being made, which is not the case in table.tsv/make() because table.tsv.gz/make is run in the background

9814 06/09/2013 08:59 PM Aaron Marcuse-Kubitza

inputs/GBIF/table.run: table.tsv.gz/upload(): moved before table.tsv.gz/make() so it can be used by it

9813 06/09/2013 08:39 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/table.run: table.tsv.gz/upload(): need overwrite=1 because the mtime of an aborted inplace upload is newer

9812 06/09/2013 08:32 PM Aaron Marcuse-Kubitza

inputs/GBIF/table.run: table.tsv*/upload(): renamed to table.tsv.gz/upload() to upload only table.tsv.gz, not table.tsv, in order to save bandwidth

9807 06/09/2013 07:00 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/table.run: table.tsv*/upload(): need to run put in live mode (live=1)

9803 06/09/2013 06:30 PM Aaron Marcuse-Kubitza

inputs/GBIF/table.run: table.tsv/make(): run table.tsv*/upload when the file make is done so that the file is backed up to jupiter

9802 06/09/2013 06:29 PM Aaron Marcuse-Kubitza

inputs/GBIF/table.run: added table.tsv*/upload()

9781 06/09/2013 11:13 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: table.tsv/make(): documented how to view the runtime when using `screen` (press Ctrl-A [ , use up-arrow, and then press Esc to leave copy mode)

9780 06/09/2013 11:12 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: herbaria_filter/make(): use new ih_herbarium table instead of the herbaria_filter.ih.csv_ file directly

9779 06/08/2013 12:23 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: added ih_herbarium/make(), which stores the IH herbaria

9778 06/08/2013 11:50 AM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record_plants/run: table/make(): also filter out rows with a non-plant family (as described at http://vegpath.org/wiki/2013-06-06_conference_call#GBIF-subsetting-fix-raw_occurrence_record-filter-formula), since some institutions have both animal and plant rows, even though they are in IH or in the 80% list. (note that NULL families are OK.)
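
The "NULL families are OK" condition needs an explicit `IS NULL` branch, because a NULL compared against a list of families is neither true nor false and the row would otherwise be dropped. The sketch below shows this in SQLite via Python's sqlite3 module. The `plant_family` lookup table and the sample rows are hypothetical, and the actual filter formula is the one on the linked wiki page, not this one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_occurrence_record (id INTEGER, family TEXT)")
conn.executemany(
    "INSERT INTO raw_occurrence_record VALUES (?, ?)",
    [(1, "Fagaceae"), (2, "Felidae"), (3, None)],
)
# Hypothetical lookup table of families known to be plants.
conn.execute("CREATE TABLE plant_family (family TEXT)")
conn.execute("INSERT INTO plant_family VALUES ('Fagaceae')")

# Keep rows whose family is a plant family, and also rows with no family.
kept_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM raw_occurrence_record "
        "WHERE family IS NULL "
        "   OR family IN (SELECT family FROM plant_family) "
        "ORDER BY id"
    )
]
print(kept_ids)
```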

9777 06/08/2013 04:12 AM Aaron Marcuse-Kubitza

*{.sh,run}: use mysql instead of mysql_ANSI because mysql is now an alias to mysql_ANSI (since ANSI mode still supports key MySQL features, like `` quotes)

9776 06/08/2013 04:09 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: table.tsv/make(): documented that incremental output is provided right away with --quick (unbuffered), but takes a while to become visible in Macfusion sshfs. this can be tested with `while true; do stat inputs/GBIF/raw_occurrence_record_plants/table.tsv; sleep 2; done` running concurrently with `./inputs/GBIF/raw_occurrence_record_plants/run table.tsv/make` on vegbiendev:/home/bien/svn .

9775 06/08/2013 04:00 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: table.tsv/make(): use new raw_occurrence_record_plants view from table/make()

9774 06/08/2013 03:15 AM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record_plants/run: table/make(): added make of prerequisites

9773 06/08/2013 03:14 AM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record_plants/run: table/make(): don't reset $table to plant_fraction_for_herbaria_filter for commands that use $table

9772 06/08/2013 03:10 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: added table/make(), which makes the filter view

9771 06/08/2013 02:14 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/: renamed to raw_occurrence_record_plants because it's actually only the plants in raw_occurrence_record, not all of raw_occurrence_record. also, this will allow us to create a separate raw_occurrence_record_plants view whose name matches the folder and does not collide with the raw_occurrence_record table.

9770 06/08/2013 12:44 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: herbaria_filter/make(): added runtime, which is ~0 since it just needs to do CSV import and index scans

9769 06/08/2013 12:43 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: herbaria_filter/make(): time the population of herbaria_filter

9768 06/07/2013 11:47 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: plant_fraction/make(): updated runtime. added rows affected count to runtime so if the number of rows it's related to (in this case, institution_code) changes, the runtime can be expected to change accordingly.

9766 06/06/2013 04:49 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record/run: plant_fraction/make(): plant_fraction column: COUNT counts non-NULL rather than true values (which counter-intuitively includes false, because it's non-NULL), so need to add NULLIF around the boolean expression to turn it into a NULL-or-not expression. see http://vegpath.org/wiki/2013-06-06_conference_call#GBIF-subsetting-fix-plant_fraction-SQL-bug .
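
The COUNT pitfall is easy to reproduce. The sketch below assumes SQLite via Python's sqlite3 module, where a boolean expression evaluates to 1 or 0, so the fix is written as `NULLIF(expr, 0)`. In a database with a real boolean type the second argument would be `false`. The `kingdom` column and the sample rows are hypothetical, and the actual plant_fraction query is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_occurrence_record (institution_code TEXT, kingdom TEXT)"
)
conn.executemany(
    "INSERT INTO raw_occurrence_record VALUES (?, ?)",
    [("A", "Plantae"), ("A", "Plantae"), ("A", "Animalia"), ("A", "Animalia")],
)

buggy_fraction, fixed_fraction = conn.execute(
    "SELECT "
    # Bug: COUNT counts every non-NULL value, so false (0) rows are counted too.
    "  COUNT(kingdom = 'Plantae') * 1.0 / COUNT(*), "
    # Fix: NULLIF turns false into NULL, so only true rows are counted.
    "  COUNT(NULLIF(kingdom = 'Plantae', 0)) * 1.0 / COUNT(*) "
    "FROM raw_occurrence_record "
    "GROUP BY institution_code"
).fetchone()
print(buggy_fraction, fixed_fraction)
```

With two plant rows out of four, the unfixed expression reports a fraction of 1.0 and the NULLIF version reports 0.5.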

9755 06/06/2013 08:09 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: table.tsv.gz/make(): documented runtime (35 min)