Activity - BIEN 3 - NCEAS Projects

Activity

From 11/16/2013 to 12/15/2013

12/15/2013

05:30 PM Revision 11911: updated inputs/datasource_release_status.xlsx: Aaron Marcuse-Kubitza
05:27 PM Revision 11910: added inputs/datasource_release_status.xlsx, export of Google spreadsheet at https://docs.google.com/spreadsheet/ccc?key=0ArZXrTAXd-TYdDRRb2RxYi11TWZrQVh5bVdKOURCeFE: Aaron Marcuse-Kubitza

12/12/2013

08:57 AM Revision 11909: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza
08:35 AM Revision 11908: bugfix: schemas/vegbien.sql: location: use the place_id from the parent location when no place_id is specified. this fixes a bug in analytical_stem_view where the parent location's place_id was used because it was sometimes missing from the sublocation, but the parent place_id *itself* was sometimes missing instead if sublocations each had their own place information. this way, it is always available directly in the sublocation, populated from the parent location if needed.: Aaron Marcuse-Kubitza
08:27 AM Revision 11907: bugfix: schemas/vegbien.sql: location: added place_id which is autopopulated from the current locationplace. join on this in plot.**, to avoid a 1:many join when a location has multiple locationplaces.: Aaron Marcuse-Kubitza

12/11/2013

11:10 PM Revision 11906: bugfix: schemas/vegbien.sql: locationevent_unique_within_parent_by_location unique index: need COALESCE() around location_id since it's nullable: Aaron Marcuse-Kubitza
10:54 PM Revision 11905: fix: inputs/CVS/^taxon_observation.**.sample/: added _no_import because this table duplicates part of what's imported from taxon_observation.**: Aaron Marcuse-Kubitza
10:42 PM Revision 11904: bugfix: inputs/VegBank/plot/: added _no_import because this table is left-joined and should not be imported separately: Aaron Marcuse-Kubitza
10:40 PM Revision 11903: bugfix: inputs/{.NCBI,CTFS}/*.src/: added _no_import because these tables are left-joined and should not be imported separately: Aaron Marcuse-Kubitza
09:56 PM Revision 11902: inputs/import.stats.xls: removed table names from datasources where only one table is imported: Aaron Marcuse-Kubitza
09:52 PM Revision 11901: fix: inputs/import.stats.xls: removed deleted tables from current import: Aaron Marcuse-Kubitza
09:51 PM Revision 11900: inputs/import.stats.xls: updated import times: Aaron Marcuse-Kubitza
07:56 PM Revision 11899: updated backups/TNRS.backup.md5: Aaron Marcuse-Kubitza
07:56 PM Revision 11898: added backups/vegbien.r11786.backup.md5: Aaron Marcuse-Kubitza
07:53 PM Revision 11897: /README.TXT: Full database import: backups: added step to download backup to local machine: Aaron Marcuse-Kubitza
07:45 PM Revision 11896: bugfix: /Makefile: install: need to run inputs/download in live mode so that the flat files are actually downloaded: Aaron Marcuse-Kubitza
07:43 PM Revision 11895: lib/common.Makefile: added %/live, for use with `make inputs/download`: Aaron Marcuse-Kubitza

12/10/2013

07:44 AM Revision 11894: planning/timeline/timeline.2013.xls: rescheduled tasks: Aaron Marcuse-Kubitza
07:40 AM Revision 11893: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza
07:36 AM Revision 11892: /README.TXT: Full database import: In PostgreSQL: documented that the tables to check are located in the *r# schema*, not public: Aaron Marcuse-Kubitza
07:32 AM Revision 11891: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza
07:32 AM Revision 11890: planning/timeline/timeline.2013.xls: datasource validations: reordered datasources according to Brian Enquist's new validation order (wiki.vegpath.org/Spot-checking_validation_order): Aaron Marcuse-Kubitza
07:10 AM Revision 11889: fix: schemas/vegbien.sql: analytical_specimen: added specimens-related columns that are in analytical_plot: Aaron Marcuse-Kubitza
06:35 AM Revision 11888: inputs/GBIF/raw_occurrence_record_plants/map.csv: row_num: remapped to plain *row_num, like the other datasources that have this field: Aaron Marcuse-Kubitza
06:31 AM Revision 11887: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: rerun time: noted that this is only fast *after* manual vacuuming of the table (to remove the deleted rows from the index). autovacuum apparently does not run, although it should.: Aaron Marcuse-Kubitza
05:18 AM Revision 11886: planning/timeline/timeline.2013.xls: hid previous weeks: Aaron Marcuse-Kubitza
05:18 AM Revision 11885: planning/timeline/timeline.2013.xls: added timespan dots ◦ for supertasks: Aaron Marcuse-Kubitza
05:15 AM Revision 11884: planning/timeline/timeline.2013.xls: legend: changed to movable text box to avoid needing to erase and repopulate the header columns with the legend cells: Aaron Marcuse-Kubitza
05:03 AM Revision 11883: planning/timeline/timeline.2013.xls: crossed out and hid completed tasks: Aaron Marcuse-Kubitza
04:58 AM Revision 11882: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza

12/09/2013

07:24 PM Revision 11881: inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: reran test, which added yearCollected/monthCollected/dayCollected: Aaron Marcuse-Kubitza
07:23 PM Revision 11880: inputs/CVS/plantConcept_/create.sql: documented runtime (3 min): Aaron Marcuse-Kubitza
06:59 PM Revision 11879: inputs/CTFS/*.src/: added test.xml.ref: Aaron Marcuse-Kubitza
06:58 PM Revision 11878: inputs/CTFS/*.src/: added VegBIEN.csv: Aaron Marcuse-Kubitza
06:56 PM Revision 11877: bugfix: inputs/CTFS/TaxonOccurrence*/map.csv: things mapped to taxonObservationID: remapped to taxonOccurrenceID since taxonObservationID is not mapped to anything in VegBIEN (denormalized VegCore doesn't distinguish between taxon occurrences and taxon observations of them): Aaron Marcuse-Kubitza
05:46 PM Revision 11876: bugfix: inputs/ARIZ/~.clean_up.sql: prevent "column already exists" errors when there is an input column of the same name as an output column: Aaron Marcuse-Kubitza
05:44 PM Revision 11875: bugfix: lib/runscripts/datasrc_dir.run: import(): don't run `sql/install` if the schema already exists, because this will try to rerun all the schema-creation queries. note that this idempotent functionality was *not* provided by the `make .../install` target that was previously used (idempotency is new with new-style import).: Aaron Marcuse-Kubitza
05:26 PM Revision 11874: bugfix: schemas/vegbien.sql: updated for renamed county_centroids column names: Aaron Marcuse-Kubitza
04:16 PM Revision 11873: inputs/.geoscrub/import_order.txt: added county_centroids so that it would be installed by new-style import: Aaron Marcuse-Kubitza
03:54 PM Revision 11872: bugfix: lib/runscripts/datasrc_dir.run: import(): can't run `datasrc_make reinstall` anymore because this now defers to the runscript for new-style import datasources (which was done so that `make .../install` properly reinstalls all the datasources). instead, call the applicable make targets manually (there are just 2 of them).: Aaron Marcuse-Kubitza
03:37 PM Revision 11871: inputs/FIA/TREE/run: documented import() runtime (1.5 h), which includes table cleanup runtime (1 h): Aaron Marcuse-Kubitza
03:09 PM Revision 11870: bugfix: bin/pg_dump_limit: support errexit by ignoring the nonzero exit status that grep returns when it doesn't match anything: Aaron Marcuse-Kubitza
02:43 PM Revision 11869: inputs/GBIF/raw_occurrence_record_plants/run: updated import() runtime (same), documented table cleanup runtime (1.5 h): Aaron Marcuse-Kubitza
02:38 PM Revision 11868: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: CREATE INDEX ... specimenHolderInstitutions: documented runtime (45 min): Aaron Marcuse-Kubitza
02:28 PM Revision 11867: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented runtime (3.5 min): Aaron Marcuse-Kubitza
02:27 PM Revision 11866: /README.TXT: Datasource setup: added steps to back up e-mails: Aaron Marcuse-Kubitza
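
The errexit fix to bin/pg_dump_limit (revision 11870) rests on a general shell behavior: grep exits with status 1 when no line matches, and errexit treats any nonzero status as a failure. The following is a minimal sketch of that pattern, not the actual bin/pg_dump_limit code; the helper name grep_allow_no_match is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from bin/pg_dump_limit): run grep, but treat
# "no lines matched" (exit status 1) as success. Real errors (status 2 or
# higher, e.g. an unreadable file) are still passed back to the caller.
grep_allow_no_match() {
    local status=0
    grep "$@" || status=$?   # the || keeps errexit from firing here
    if [ "$status" -eq 1 ]; then
        return 0             # no match: benign
    fi
    return "$status"
}

# Demo in a subshell so that errexit does not leak into the calling shell.
(
    set -o errexit
    printf 'alpha\nbeta\n' | grep_allow_no_match gamma   # no match, no abort
    echo "still running"
)
```

Only status 1 is swallowed, so a genuine grep failure still stops an errexit script.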

12/06/2013

07:46 AM Revision 11865: bugfix: inputs/CTFS/import_order.txt: added *.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in */create.sql. note that VegCore taxonOccurrenceID has been renamed to taxonObservationID since this was last run.: Aaron Marcuse-Kubitza
06:56 AM Revision 11864: inputs/.geoscrub/run: documented import() runtime (20 min): Aaron Marcuse-Kubitza
06:12 AM Revision 11863: bugfix: inputs/.NCBI/import_order.txt: added nodes.src, names.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in nodes/create.sql.: Aaron Marcuse-Kubitza
06:01 AM Revision 11862: fix: /Makefile: inputs/reinstall: commented out to avoid a cascade of "overriding commands for target" warnings. this will revert to the default uninstall, install sequence for this target rather than the simultaneous-reinstall optimization (which can still be invoked manually).: Aaron Marcuse-Kubitza
05:52 AM Revision 11861: lib/sh/local.sh: public_schema_exists(): use a higher log_level for pg_schema_exists, to avoid all the verbose output involved in running the query: Aaron Marcuse-Kubitza
05:44 AM Revision 11860: bugfix: lib/sh/local.sh: public_schema_exists(): can no longer use psql_script_vegbien for this, because using `SET search_path` (called by psql_script_vegbien) with a schema that does not exist no longer produces an error. instead, use new pg_schema_exists(), which uses a different command that does produce an error if the schema does not exist.: Aaron Marcuse-Kubitza
05:38 AM Revision 11859: lib/sh/db.sh: added pg_require_schema(): Aaron Marcuse-Kubitza
05:37 AM Revision 11858: lib/sh/util.sh: stderr2stdout(): documented that this redirects fd 2->1 and log_fd (but *not* back to 2): Aaron Marcuse-Kubitza
05:34 AM Revision 11857: bugfix: lib/sh/util.sh: stderr2stdout(): use `command` before tee, which re-filters log_fd so that stderr itself is also filtered. this allows log-filtering out an otherwise-confusing benign error when using e.g. stderr_matches().: Aaron Marcuse-Kubitza
04:31 AM Revision 11856: lib/sh/util.sh: added not(), for use in prefixing wrapped commands: Aaron Marcuse-Kubitza
04:14 AM Revision 11855: lib/sh/db.sh: added pg_schema_exists(): Aaron Marcuse-Kubitza
04:10 AM Revision 11854: lib/sh/util.sh: added stderr_matches(): Aaron Marcuse-Kubitza
03:59 AM Revision 11853: lib/sh/util.sh: documented that fds 2x/3x should not be used because *we* use these, as opposed to 1x which is used by the shell internally: Aaron Marcuse-Kubitza
03:57 AM Revision 11852: lib/sh/util.sh: added stdout_contains(): Aaron Marcuse-Kubitza
03:34 AM Revision 11851: lib/sh/util.sh: added stderr2stdout(): Aaron Marcuse-Kubitza
02:52 AM Revision 11850: fix: lib/sh/db.sh: pg_table_exists(): usage: documented that $table is actually required for this function: Aaron Marcuse-Kubitza
02:44 AM Revision 11849: bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource): Aaron Marcuse-Kubitza
01:57 AM Revision 11848: bugfix: /Makefile: moved inputs/reinstall to end so it overrides the corresponding subdir forwarding target: Aaron Marcuse-Kubitza
12:51 AM Revision 11847: bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource): Aaron Marcuse-Kubitza
12:27 AM Revision 11846: bugfix: /Makefile: inputs/install: don't run bin/reinstall_all here, because /install targets are supposed to be idempotent, forward-only actions that don't first remove existing data: Aaron Marcuse-Kubitza
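
The helpers added to lib/sh/util.sh on this date (not(), stderr_matches()) are named in the log but their bodies are not shown. The sketch below is one plausible shape for such helpers, written from the names alone; it is an assumption, not the repository's implementation, and fake_query is a hypothetical stand-in for a command that reports an error on stderr.

```shell
#!/usr/bin/env bash
# ASSUMED implementations, inferred only from the function names in the log.

# not CMD...: succeed only if CMD fails.
not() { ! "$@"; }

# stderr_matches PATTERN CMD...: succeed if CMD's stderr contains a line
# matching PATTERN. CMD's stdout is discarded and its exit status is ignored.
stderr_matches() {
    local pattern=$1 err
    shift
    err=$("$@" 2>&1 >/dev/null) || true   # capture stderr only
    grep -q -e "$pattern" <<<"$err"
}

# Hypothetical stand-in for a query against a schema that does not exist.
fake_query() {
    echo "result row"
    echo "ERROR: schema does not exist" >&2
    return 1
}

if stderr_matches 'does not exist' fake_query; then
    echo "schema missing"
fi
if not stderr_matches 'permission denied' fake_query; then
    echo "no permission error"
fi
```

Capturing stderr into a variable before matching keeps the check independent of whether the caller has enabled pipefail.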

12/05/2013

11:55 PM Revision 11845: bugfix: /Makefile: postgres-Darwin: don't prepend $(MAKE) to $(postgresReload-Darwin), because this is now a list of commands: Aaron Marcuse-Kubitza
11:52 PM Revision 11844: bugfix: /Makefile: config: ignore errors if ~/bin/make exists: Aaron Marcuse-Kubitza
11:38 PM Revision 11843: inputs/FIA/COND/postprocess.sql: filtering formula: documented that this was created by Brad, and provided the URL to it on nimoy: Aaron Marcuse-Kubitza
12:27 PM Revision 11842: inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible: Aaron Marcuse-Kubitza
12:19 PM Revision 11841: inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible: Aaron Marcuse-Kubitza
08:38 AM Revision 11840: bin/make_analytical_db: don't regenerate family_higher_plant_group from the NCBI data because the lookup table is now prepopulated as part of the schema: Aaron Marcuse-Kubitza
08:37 AM Revision 11839: bin/import_all: don't import NCBI because the lookup table is now prepopulated as part of the schema: Aaron Marcuse-Kubitza
08:35 AM Revision 11838: schemas/vegbien.sql: include the family_higher_plant_group lookup table values so that these don't need to be regenerated from the NCBI nodes whenever the DB is reloaded: Aaron Marcuse-Kubitza
07:58 AM Revision 11837: schemas/vegbien.sql: taxonlabel_update_ancestors(): don't do an index scan if the value being scanned for is NULL, to support testing this function without the indexes in place, without extra full-table scans for NULL values affecting things. this can be used to determine if the function is actually using the indexes, by turning them off and seeing if the runtime changes.: Aaron Marcuse-Kubitza
07:03 AM Revision 11836: schemas/util.sql: explain2table(): documented usage: `PERFORM util.explain2table($$ query $$);`: Aaron Marcuse-Kubitza
05:52 AM Revision 11835: schemas/util.sql: explain2table(): by default, use the util.explain table: Aaron Marcuse-Kubitza
05:49 AM Revision 11834: schemas/util.sql: added explain table: Aaron Marcuse-Kubitza
05:47 AM Revision 11833: schemas/util.sql: added explain2notice(): Aaron Marcuse-Kubitza
05:44 AM Revision 11832: schemas/util.sql: added explain2str(): Aaron Marcuse-Kubitza
05:33 AM Revision 11831: schemas/util.sql: added explain2table(): Aaron Marcuse-Kubitza
05:23 AM Revision 11830: schemas/util.sql: added explain(): Aaron Marcuse-Kubitza
01:31 AM Revision 11829: schemas/vegbien.sql: taxonlabel_update_ancestors(): don't create a performance-intensive nested transaction (EXCEPTION block) for each INSERT, because there should no longer be duplicate ancestors, so it's OK to abort the whole transaction if this assertion fails: Aaron Marcuse-Kubitza
01:03 AM Revision 11828: bugfix: schemas/vegbien.sql: taxonlabel_update_ancestors_on_{insert,update}(): only use *either* the matched taxon's ancestors *or* the parent's ancestors, to avoid issues related to duplication between these two ancestors lists. this also fixes a bug where the 2nd taxonlabel_update_ancestors() call assumes that the existing ancestors are for the old *parent*, when in fact they have actually just been set to those for the new *matched taxon* (which horribly confuses taxonlabel_update_ancestors()).: Aaron Marcuse-Kubitza

12/04/2013

10:06 PM Revision 11827: schemas/vegbien.sql: _taxonlabel_set_parent_id(): just use a plain UPDATE statement, to avoid the significant parsing and stringification overhead of EXECUTE and quote_nullable(). it is not clear that EXECUTE is actually necessary to avoid caching the query plan, because the cache should be invalidated automatically when the table's ANALYZE statistics are regenerated.: Aaron Marcuse-Kubitza
10:00 PM Revision 11826: schemas/vegbien.sql: removed unused function _taxonlabel_set_matched_label_id(), which refers to obsolete fields: Aaron Marcuse-Kubitza
09:58 PM Revision 11825: schemas/vegbien.sql: synced to DB (the view renderer apparently changed the text of a view): Aaron Marcuse-Kubitza
09:44 PM Revision 11824: backups/TNRS.backup: saved copy backups/TNRS.2013-11-18.backup: Aaron Marcuse-Kubitza
07:26 PM Revision 11823: bugfix: bin/import_all: run in errexit mode, so that if the user cancels reinstalling of the import schema, the script will then abort instead of continuing and using the wrong schema: Aaron Marcuse-Kubitza
06:56 PM Revision 11822: bugfix: schemas/Makefile: %/uninstall: always confirm before removing an existing schema, not just for public and r*, because an auxiliary schema might also be used as $version and reinstalled by bin/import_all: Aaron Marcuse-Kubitza
06:04 PM Revision 11821: schemas/vegbien.sql: analytical_stem_view: scrubbed_author: removed empty COALESCE() around value (left over from when multiple values needed to be combined for many TNRS fields): Aaron Marcuse-Kubitza
04:57 PM Revision 11820: inputs/CVS/^taxon_observation.**.sample/create.sql: uncommented identifiedBy since this is now part of taxonObservation_: Aaron Marcuse-Kubitza
04:08 PM Revision 11819: fix: inputs/CVS/observation_community/create.sql: communityName: populate from commConcept.commName instead, because commInterpretation.commname is not always populated. this requires left-joining to commConcept.: Aaron Marcuse-Kubitza
03:58 PM Revision 11818: inputs/CVS/observation_community/map.csv: updated output column names to new input column names, to avoid later output column collisions: Aaron Marcuse-Kubitza
03:42 PM Revision 11817: inputs/CVS/observation_community/header.csv, map.csv: updated input column names for cvs.~.clean_up.sql renamings: Aaron Marcuse-Kubitza
03:12 PM Revision 11816: schemas/vegbien.sql: provider_count_view: source totals: use the much faster query developed for Brad (wiki.vegpath.org/VegBIEN_FAQ#from-Brad-on-2013-12-4), which avoids the need to do a GROUP BY on all of analytical_stem. eventually, we will want to apply the same optimization to the first publisher subtotals.: Aaron Marcuse-Kubitza
04:19 AM Revision 11815: inputs/CVS/cvs.~.clean_up.sql: commClass, commConcept fields: prepend table name to avoid inter-table collisions upon join: Aaron Marcuse-Kubitza
03:43 AM Revision 11814: added inputs/CVS/observation_community/, as for VegBank: Aaron Marcuse-Kubitza
03:32 AM Revision 11813: inputs/CVS/cvs.~.clean_up.sql: commClass.dba_src_ID: prepend table name to avoid inter-table collisions upon join: Aaron Marcuse-Kubitza

12/03/2013

04:32 PM Revision 11812: added inputs/CVS/observationContributor_/, which adds the people collecting the plot: Aaron Marcuse-Kubitza
04:02 PM Revision 11811: inputs/CVS/cvs.~.clean_up.sql: observationContributor.dba_src_ID: prepended table name to avoid collision when left-joining to party: Aaron Marcuse-Kubitza
03:44 PM Revision 11810: bugfix: inputs/input.Makefile: %/header.csv: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`: Aaron Marcuse-Kubitza
02:57 PM Revision 11809: fix: inputs/CVS/taxonObservation_/create.sql: mapped identifiedBy, which involves joining to party: Aaron Marcuse-Kubitza
02:35 PM Revision 11808: inputs/CVS/cvs.~.clean_up.sql: don't rename taxonInterpretation.PARTY_ID, so that this can be USING-joined to party in inputs/CVS/taxonObservation_/create.sql: Aaron Marcuse-Kubitza
01:47 PM Revision 11807: schemas/vegbien.ERD.mwb: regenerated exports: Aaron Marcuse-Kubitza
08:58 AM Revision 11806: bin/map: support param start="", which indicates the default value. this fixes a bug in inputs/input.Makefile $(restart_row), which outputs "" if an explicit starting row is not found.: Aaron Marcuse-Kubitza
08:25 AM Revision 11805: inputs/CVS/^taxon_observation.**.sample/map.csv: synced output columns to input columns (which removes the extra *s): Aaron Marcuse-Kubitza
08:00 AM Revision 11804: fix: inputs/CVS/plot_/postprocess.sql: locality: include the site name (authorLocation), because this is part of the unique specification of the place that was sampled, and Bob wants this to be included in VegBIEN: Aaron Marcuse-Kubitza
07:58 AM Revision 11803: inputs/CVS/^taxon_observation.**.sample/create.sql: removed parentLocationID, since this is unused in CVS: Aaron Marcuse-Kubitza
07:45 AM Revision 11802: bugfix: inputs/input.Makefile: `%/install: %/create.sql`: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`: Aaron Marcuse-Kubitza
06:51 AM Revision 11801: inputs/VegBank/plot/postprocess.sql: locality: include the site name (authorlocation), because this is part of the unique specification of the place that was sampled: Aaron Marcuse-Kubitza
06:27 AM Revision 11800: bugfix: /README.TXT: Full database import: To restart an aborted import for a specific table: run the two commands in errexit mode so that the datasource does not incorrectly have the temp suffix removed if the import command exited with an error: Aaron Marcuse-Kubitza
05:19 AM Revision 11799: fix: inputs/CVS/taxon_observation.**/map.csv: omit authorPlantName because it is not specific to the taxonInterpretation row (this is in a separate taxonInterpretation for the original determination instead): Aaron Marcuse-Kubitza
04:59 AM Revision 11798: web/links/index.htm: updated to Firefox bookmarks. PostgreSQL: added links for troubleshooting out-of-memory errors, which show up (cryptically) as "The database system is in recovery mode" errors in processes running at the time the out-of-memory condition occurred.: Aaron Marcuse-Kubitza
02:31 AM Revision 11797: schemas/postgresql.conf: work_mem: documented that this seemingly small # is *multiplied* by max_connections, i.e. 256 MB * 100 = *26 GB*, which approaches total memory (32 GB): Aaron Marcuse-Kubitza
01:21 AM Task #831 (New): make tests use their own public schema: * enables the automated tests to be run on vegbiendev
* allows adding a new datasource directly on vegbiendev, witho... Aaron Marcuse-Kubitza
12:58 AM Task #501: find out which datasources won't allow their data to be publicly accessible: see [[Datasource conditions of use]] Aaron Marcuse-Kubitza
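
Revisions 11810 and 11802 note that making a piped Makefile command stop on error "requires `set -o pipefail`". The reason is that a pipeline's exit status is normally that of its last stage only, so errexit never sees a failure earlier in the pipe. The sketch below illustrates this with a generic `false | cat` pipeline; it does not reproduce the actual commands in inputs/input.Makefile, and pipeline_outcome is a hypothetical name.

```shell
#!/usr/bin/env bash
# Report whether a failing first pipeline stage aborts a script under errexit.
# Each case runs in its own bash process so the options do not leak into the
# caller. $1 is an optional extra command to run before the pipeline.
pipeline_outcome() {
    if bash -c "set -o errexit; $1 false | cat; echo continued" >/dev/null
    then
        echo "continued"
    else
        echo "aborted"
    fi
}

echo "errexit only:       $(pipeline_outcome '')"
echo "errexit + pipefail: $(pipeline_outcome 'set -o pipefail;')"
```

With errexit alone the script carries on past the failed stage; adding pipefail makes the pipeline's status nonzero, which lets errexit abort.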

12/02/2013

02:46 PM Revision 11796: fix: inputs/CVS/plot_/map.csv: PARENT_ID: remapped to UNUSED, to clarify that subplots are *not* implemented through this field: Aaron Marcuse-Kubitza

11/27/2013

11:16 PM Revision 11795: bugfix: /README.TXT: Full database import: To restart an aborted import for a specific table: added command to remove the temp suffix from the source table entry, which is *not* automatic for importing a specific table (only for importing the entire datasource, at the end of which the datasource is considered completely imported and ready to overwrite any previous import): Aaron Marcuse-Kubitza
11:04 PM Revision 11794: inputs/input.Makefile: scrub: clarified that using & (background process) *also* ignores TNRS errors (the primary purpose of &, of course, is to run asynchronously): Aaron Marcuse-Kubitza
10:42 PM Revision 11793: bugfix: schemas/Makefile: $(confirmRmPublicSchema): only prompt to delete the schema if it actually exists. this avoids prompting to remove a non-existent schema at the beginning of bin/import_all, which requires user attention. since bin/import_all is often run with a delayed start (e.g. to wait for a staging table reinstall to complete), the user may not be at the terminal when this message is displayed, and without this fix, the import would be prevented from running until they return.: Aaron Marcuse-Kubitza
09:24 PM Revision 11792: inputs/.geoscrub/geoscrub_output/run: import() runtime: added starscream runtime (20 min): Aaron Marcuse-Kubitza
08:48 PM Revision 11791: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza
08:33 PM Revision 11790: inputs/.geoscrub/geoscrub_output/run: documented import() runtime (15 min): Aaron Marcuse-Kubitza

11/26/2013

11:18 PM Revision 11789: inputs/.geoscrub/Source/map.csv: source__modified_date: updated for current run: Aaron Marcuse-Kubitza
11:11 PM Revision 11788: **/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`): Aaron Marcuse-Kubitza
11:10 PM Revision 11787: /README.TXT: Full database import: documented that `make schemas/reinstall` requires sudo access: Aaron Marcuse-Kubitza
11:07 PM Revision 11786: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: updated upload time (30 s): Aaron Marcuse-Kubitza
11:00 PM Revision 11785: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: export_(): updated runtime (25 s): Aaron Marcuse-Kubitza
10:58 PM Revision 11784: lib/sh/util.sh: import_vars: don't overwrite vars that are already defined, to allow the caller to specify their own values for the vars to create. callers that relied on the overwriting behavior must now reverse the order in which they run use_* commands, so that the higher-precedence use_* is applied first and the other one only fills in defaults for vars the first did not set.: Aaron Marcuse-Kubitza
10:03 PM Revision 11783: derived/biengeo/README.txt: updated geoscrub.sh runtime: Aaron Marcuse-Kubitza
09:57 PM Revision 11782: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): derived/biengeo/geoscrub.sh: documented runtime (2.5 h): Aaron Marcuse-Kubitza
09:45 PM Revision 11781: inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: don't connect to DB as the root user, because this is not needed now that the geoscrub schema is owned by the bien user. this avoids a sudo password prompt at the end of the geoscrubbing run.: Aaron Marcuse-Kubitza
07:38 PM Revision 11780: planning/timeline/timeline.2013.xls: rescheduled tasks: Aaron Marcuse-Kubitza
06:51 PM Revision 11779: planning/timeline/timeline.2013.xls: rescheduled tasks: Aaron Marcuse-Kubitza
06:41 PM Revision 11778: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza
02:23 PM Revision 11777: bugfix: inputs/input.Makefile: $(import): except in a full-database import, errexit so that the import will stop on an error and not let it scroll by: Aaron Marcuse-Kubitza
01:55 PM Revision 11776: added inputs/CVS/^taxon_observation.**.sample/, used for the extract. note that the column list is slightly different than for VegBank.: Aaron Marcuse-Kubitza
01:42 PM Revision 11775: inputs/CVS/taxonObservation_/map.csv: removed taxonObservation_-- prefix from terms that do not need to be table-specific (like for VegBank): Aaron Marcuse-Kubitza
01:32 PM Revision 11774: fix: inputs/CVS/taxonObservation_/map.csv: plantConcept_ columns: synced input and output column names to their names in plantConcept_: Aaron Marcuse-Kubitza
01:30 PM Revision 11773: fix: inputs/CVS/taxonObservation_/map.csv: plantConcept_ columns: synced input and output column names to their names in plantConcept_: Aaron Marcuse-Kubitza
01:26 PM Revision 11772: inputs/CVS/plantConcept_/map.csv: removed plantConcept_-- prefix from terms that do not need to be table-specific (like for VegBank): Aaron Marcuse-Kubitza
01:22 PM Revision 11771: lib/sh/db.sh: pg_table_exists(): use `SELECT NULL` instead of `SELECT *` to avoid a long column list cluttering up the log output: Aaron Marcuse-Kubitza
12:47 PM Revision 11770: lib/runscripts/table.run: table_make_install(): simplified the setting of $noclobber since there no longer needs to be a different command for when the log exists: Aaron Marcuse-Kubitza
12:08 PM Revision 11769: bugfix: lib/runscripts/table.run: need to errexit the make target, so that errors in the SQL install scripts are not suppressed. this requires pre-checking if the table exists (using new pg_table_exists), so that the install target's errexit does not then need to be suppressed for cases when the table already exists.: Aaron Marcuse-Kubitza
12:01 PM Revision 11768: lib/sh/db.sh: added pg_table_exists(): Aaron Marcuse-Kubitza
06:48 AM Revision 11767: planning/timeline/timeline.2013.xls: added timespan dots ◦ for supertasks: Aaron Marcuse-Kubitza
06:46 AM Revision 11766: planning/timeline/timeline.2013.xls: crossed out and hid completed tasks: Aaron Marcuse-Kubitza
06:20 AM Revision 11765: planning/timeline/timeline.2013.xls: hid previous weeks: Aaron Marcuse-Kubitza
06:18 AM Revision 11764: planning/timeline/timeline.2013.xls: consolidated legend to take up fewer columns and avoid repeating labels: Aaron Marcuse-Kubitza
06:09 AM Revision 11763: bugfix: inputs/CVS/import_order.txt: added taxon_observation.**. rescheduled tasks.: Aaron Marcuse-Kubitza
06:05 AM Revision 11762: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza
05:56 AM Revision 11761: bugfix: inputs/CVS/import_order.txt: added taxon_observation.**: Aaron Marcuse-Kubitza
05:54 AM Revision 11760: inputs/CVS/: don't import joined tables, because they are now imported in the taxon_observation.** left-join instead: Aaron Marcuse-Kubitza
05:53 AM Revision 11759: inputs/CVS/: added taxon_observation.** left-join of the tables, using the steps at http://wiki.vegpath.org/Left-joining_a_datasource. this involves renaming taxonOccurrenceID->taxonOccurrenceID__overall_plot so that it can then be joined together with aggregateOrganismObservationID to create the full taxonOccurrenceID (as in VegBank).: Aaron Marcuse-Kubitza
05:46 AM Revision 11758: inputs/CVS/stemCount_/map.csv: remapped stratum_ID->*STRATUM_ID so it would match up with stratum.*STRATUM_ID: Aaron Marcuse-Kubitza
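
The import_vars change in revision 11784 switches from "last assignment wins" to "first assignment wins". The sketch below shows that ordering rule in isolation. It is not the lib/sh/util.sh code: set_defaults, db_user and db_schema are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the behavior described in r11784: assign each
# NAME=VALUE pair, skipping any variable the caller has already defined
# (including one defined as the empty string).
set_defaults() {
    local _sd_pair _sd_name
    for _sd_pair in "$@"; do
        _sd_name=${_sd_pair%%=*}
        # ${!_sd_name+set} expands to "set" only if that variable is defined
        if [ -z "${!_sd_name+set}" ]; then
            printf -v "$_sd_name" '%s' "${_sd_pair#*=}"
        fi
    done
}

db_user=alice                          # higher-precedence value, applied first
set_defaults db_user=bien db_schema=public
echo "$db_user $db_schema"             # prints "alice public"
```

Because existing values are never overwritten, the higher-precedence source has to run first and later calls can only fill gaps, which is the order reversal the log entry describes.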

11/25/2013

10:14 PM Revision 11757: inputs/CVS/taxonObservation_/map.csv: mapped TAXONINTERPRETATION_ID to identificationID: Aaron Marcuse-Kubitza
10:03 PM Revision 11756: added inputs/CVS/stratum/: Aaron Marcuse-Kubitza
10:02 PM Revision 11755: added inputs/CVS/stratumType/: Aaron Marcuse-Kubitza
09:43 PM Revision 11754: inputs/CVS/: prepended the table name to each column name to prevent column collisions, using the steps at http://wiki.vegpath.org/Left-joining_a_datasource: Aaron Marcuse-Kubitza
08:07 PM Revision 11753: bugfix: inputs/CVS/plantConcept_/map.csv: PLANTCONCEPT_ID: remapped without * prefix so that the USING join in inputs/CVS/taxonObservation_/create.sql would continue to work: Aaron Marcuse-Kubitza
08:05 PM Revision 11752: inputs/CVS/taxonObservation_/header.csv, map.csv: updated to use plantConcept_ renamed columns: Aaron Marcuse-Kubitza
08:03 PM Revision 11751: bugfix: inputs/CVS/plantConcept_/map.csv: PLANTCONCEPT_ID: remapped without * prefix so that the USING join in inputs/CVS/taxonObservation_/create.sql would continue to work: Aaron Marcuse-Kubitza
07:59 PM Revision 11750: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza
07:52 PM Revision 11749: inputs/CVS/: switched to new-style import, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource: Aaron Marcuse-Kubitza
07:32 PM Revision 11748: inputs/CVS/taxonObservation_/map.csv: updated for CVS refresh: Aaron Marcuse-Kubitza
07:17 PM Revision 11747: inputs/CVS/taxonObservation_/map.csv: updated input column names to plantConcept_ renamings: Aaron Marcuse-Kubitza
07:06 PM Revision 11746: inputs/CVS/plantConcept_/header.csv, map.csv: updated for CVS refresh: Aaron Marcuse-Kubitza
06:51 PM Revision 11745: fix: inputs/CVS/plot_/map.csv: removed filter-less collisions. note that the name county_ is assigned in plot_/create.sql, not cvs.~.clean_up.sql as one might expect, because this is a generated column.: Aaron Marcuse-Kubitza
06:42 PM Revision 11744: fix: inputs/CVS/plot_/map.csv: removed filter-less collisions: Aaron Marcuse-Kubitza
06:41 PM Revision 11743: fix: inputs/CVS/plot_/map.csv: removed filter-less collisions: Aaron Marcuse-Kubitza
05:32 PM Revision 11742: fix: inputs/CVS/taxonObservation_/map.csv: moved inherited derived columns to right after the other columns, because for this table, these are actually real input columns rather than appended derived columns. the column order must match header.csv to avoid mis-renamings.: Aaron Marcuse-Kubitza
04:51 PM Revision 11741: inputs/CVS/taxonObservation_/map.csv: removed filter functions, which are now performed in plantConcept_: Aaron Marcuse-Kubitza
04:43 PM Revision 11740: inputs/CVS/taxonObservation_/postprocess.sql: added _parent index to facilitate joins: Aaron Marcuse-Kubitza
04:24 PM Revision 11739: fix: inputs/CVS/taxonObservation_/header.csv, map.csv: updated for CVS refresh and addition of plantConcept_ derived columns: Aaron Marcuse-Kubitza
03:22 PM Revision 11738: inputs/CVS/stemCount_/: translated filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#1-Translate-filters-to-postprocessing-derived-columns. note that the inserted row count changes, because there is now a primary key (which the table is auto-sorted by) where previously there was none.: Aaron Marcuse-Kubitza
02:58 PM Revision 11737: web/links/index.htm: updated to Firefox bookmarks. added API writing links, including the best quotes from a Google developer's PowerPoint on the topic.: Aaron Marcuse-Kubitza
12:59 AM Revision 11736: schemas/vegbien.sql: collected_dates: documented runtime (2.5 min): Aaron Marcuse-Kubitza
12:57 AM Revision 11735: schemas/vegbien.sql: collected_date_min: replaced with collected_dates view that lists all dates we have, so that we can determine which of these may be valid. it turns out that we have data collected from very far back (to the year 1), which are not merely 2-digit years because PostgreSQL will only parse early years when there are 4 digits.: Aaron Marcuse-Kubitza
12:26 AM Revision 11734: added planning/publication/KNB/submission.published.old_site.maff, submission.published.eml.xml from old KNB site: Aaron Marcuse-Kubitza
12:18 AM Revision 11733: added planning/publication/KNB/submission.*: Aaron Marcuse-Kubitza

11/24/2013

11:48 PM Revision 11732: bugfix: schemas/vegbien.sql: collected_date_min: exclude invalid dates < 1000-01-01: Aaron Marcuse-Kubitza
11:41 PM Revision 11731: bugfix: schemas/vegbien.sql: collected_date_min: exclude -infinity: Aaron Marcuse-Kubitza
11:13 PM Revision 11730: schemas/vegbien.sql: added collected_date_min view: Aaron Marcuse-Kubitza
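
Note on revisions 11731 and 11732: the two collected_date_min bugfixes amount to filtering out sentinel and implausible values before taking the minimum. The following is a minimal Python sketch of that filtering, not the actual view definition in schemas/vegbien.sql. The function name, the use of None to stand in for PostgreSQL's -infinity, and the sample dates are all assumptions made for illustration.

```python
from datetime import date
from typing import Iterable, Optional

# Cutoff taken from the r11732 commit message (dates < 1000-01-01 are excluded).
MIN_VALID_DATE = date(1000, 1, 1)


def collected_date_min(dates: Iterable[Optional[date]]) -> Optional[date]:
    """Return the earliest plausible date, or None if there is none.

    None stands in for PostgreSQL's -infinity (an assumption of this sketch,
    since Python's date type has no infinite value).
    """
    valid = [d for d in dates if d is not None and d >= MIN_VALID_DATE]
    return min(valid) if valid else None


# Hypothetical sample: one "-infinity", one year-1 date, two plausible dates.
sample = [None, date(1, 1, 1), date(1850, 6, 15), date(2013, 11, 24)]
earliest = collected_date_min(sample)
print(earliest)
```

Revision 11735 later replaced this single minimum with a collected_dates view listing every date, so that the plausible range can be inspected rather than assumed.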

11/21/2013

05:20 PM Revision 11729: inputs/CVS/plot_/: translated column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#1-Translate-filters-to-postprocessing-derived-columns: Aaron Marcuse-Kubitza
04:59 PM Revision 11728: /README.TXT: Full database import: verifying import: In PostgreSQL: don't include current values of the datasource counts, etc., because these may change and should always be re-checked at wiki.vegpath.org/VegBIEN_contents: Aaron Marcuse-Kubitza
04:27 PM Revision 11727: inputs/CVS/plot_/postprocess.sql: added pkey from the primary joined table: Aaron Marcuse-Kubitza
04:11 PM Revision 11726: inputs/CVS/plot_/map.csv: documented assumptions about the units of fields: Aaron Marcuse-Kubitza
03:52 PM Revision 11725: inputs/CVS/plot_/map.csv: documented assumptions about the units and meaning of numeric codes for fields: Aaron Marcuse-Kubitza
03:01 PM Revision 11724: inputs/CVS/plantConcept_/: translated multi-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#1-Translate-filters-to-postprocessing-derived-columns: Aaron Marcuse-Kubitza
02:54 PM Revision 11723: inputs/CVS/plantConcept_/: translated multi-column filters to postprocessing derived columns, using the steps at http://wiki.vegpath.org/Adding_new-style_import_to_a_datasource#1-Translate-filters-to-postprocessing-derived-columns: Aaron Marcuse-Kubitza
01:59 PM Revision 11722: web/links/index.htm: updated to Firefox bookmarks. BIEN: added DataONE compatibility links.: Aaron Marcuse-Kubitza
01:58 PM Revision 11721: inputs/CVS/plantConcept_/postprocess.sql: added pkey from the primary joined table: Aaron Marcuse-Kubitza
01:11 PM Revision 11720: inputs/CVS/observation_/postprocess.sql: added pkey from the primary joined table. added _parent index to facilitate joins.: Aaron Marcuse-Kubitza
01:08 PM Revision 11719: fix: inputs/input.Makefile: $(svnFilesGlob): removed schema and PDF files, since these are owned by the data provider and should not be in the repository that gets open-sourced: Aaron Marcuse-Kubitza
01:01 PM Revision 11718: bugfix: inputs/CVS/observation_/create.sql: only include one soilObs for each observation (using DISTINCT ON), rather than just left-joining them: Aaron Marcuse-Kubitza
11:59 AM Revision 11717: inputs/: removed SALVIAS-CSV, because this is a sample datasource which was only there to test the mapping process. it should not be adding records that duplicate SALVIAS, nor should it take up maintenance effort (switching to new-style import, updating to match SALVIAS, etc.).: Aaron Marcuse-Kubitza
11:52 AM Revision 11716: planning/timeline/timeline.2013.xls: removed the weeks of 12/23, 12/30 because these are during winter break. rescheduled tasks.: Aaron Marcuse-Kubitza
11:08 AM Revision 11715: inputs/.TNRS/schema.sql: updated runtime (30 min) and rowcount (+2 million): Aaron Marcuse-Kubitza
10:23 AM Revision 11714: planning/timeline/timeline.2013.xls: rescheduled tasks: Aaron Marcuse-Kubitza
10:16 AM Revision 11713: planning/timeline/timeline.2013.xls: crossed out and hid completed tasks: Aaron Marcuse-Kubitza
10:14 AM Revision 11712: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza
09:04 AM Revision 11711: fix: inputs/.TNRS/schema.sql: tnrs_populate_fields(): is_valid_match: set this to false if Taxonomic_status is Invalid: Aaron Marcuse-Kubitza
08:53 AM Revision 11710: schemas/vegbien.sql: analytical_stem_view: added taxonomic_status. notice that PostgreSQL 9.3 puts each view column on a separate line, making it *much* easier to review the svn diff!: Aaron Marcuse-Kubitza
08:49 AM Revision 11709: inputs/.TNRS/schema.sql: added map_taxonomic_status(): Aaron Marcuse-Kubitza
08:48 AM Revision 11708: inputs/.TNRS/schema.sql, data.sql: updated for PostgreSQL 9.3: Aaron Marcuse-Kubitza
08:26 AM Revision 11707: bugfix: inputs/CVS/stemCount_/map.csv: ensure that aggregateoccurrence.sourceaccessioncode is always populated, because this is a required field when using sourceaccessioncodes. without it, the import cannot deduplicate rows that lack a value in this field and excludes them, which drops large numbers of occurrences. this shows up when comparing provider_count to the input table's row count, and produces the following error in the .errors table:
ERROR: duplicate key value violates unique constraint "aggregateoccurrence_taxonoccurrence_1_to_1"
DETAIL: Key ... Aaron Marcuse-Kubitza
07:40 AM Revision 11706: fix: schemas/vegbien.sql: taxon_trait_view: include only TNRS-valid names: Aaron Marcuse-Kubitza
12:24 AM Revision 11705: copyright scrub: inputs/: removed data provider-owned schema and documentation files, which are not BIEN copyright and should not be part of what is submitted for open-sourcing. these files will remain accessible via the web interface (fs.vegpath.org), but will not be in the repository.: Aaron Marcuse-Kubitza
12:02 AM Revision 11704: added inputs/TEAM/_src/data_cart.tsv, containing the content extracted from data_cart.maff: Aaron Marcuse-Kubitza
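
Note on revision 11718: replacing a plain left join with DISTINCT ON means each observation is paired with at most one soilObs row, so the joined table keeps one row per observation. A minimal Python sketch of that effect follows. It is not the SQL in inputs/CVS/observation_/create.sql. The column names (observation_id, soilobs_id, ph), the sample rows, and the choice to keep the lowest soilobs_id are hypothetical; in PostgreSQL, the row kept by DISTINCT ON is determined by the query's ORDER BY.

```python
from typing import Dict, List, Optional


def one_soil_obs_per_observation(soil_obs: List[dict]) -> Dict[int, dict]:
    """Keep a single soilObs row per observation_id (the DISTINCT ON effect)."""
    kept: Dict[int, dict] = {}
    # Sort first so the row kept for each observation is deterministic.
    ordered = sorted(soil_obs, key=lambda r: (r["observation_id"], r["soilobs_id"]))
    for row in ordered:
        kept.setdefault(row["observation_id"], row)
    return kept


def left_join(observations: List[dict], soil_by_obs: Dict[int, dict]) -> List[dict]:
    """Left-join each observation to at most one soilObs, preserving row count."""
    joined = []
    for obs in observations:
        soil: Optional[dict] = soil_by_obs.get(obs["observation_id"])
        joined.append({**obs, "soil": soil})
    return joined


# Hypothetical data: observation 1 has two soilObs rows, observation 3 has none.
observations = [{"observation_id": 1}, {"observation_id": 2}, {"observation_id": 3}]
soil_obs = [
    {"soilobs_id": 11, "observation_id": 1, "ph": 5.5},
    {"soilobs_id": 10, "observation_id": 1, "ph": 6.1},
    {"soilobs_id": 12, "observation_id": 2, "ph": 7.0},
]
joined = left_join(observations, one_soil_obs_per_observation(soil_obs))
print(len(joined), "joined rows for", len(observations), "observations")
```

With a plain left join, observation 1 would appear twice in the output; here the output has exactly one row per observation.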

11/20/2013

11:38 PM Revision 11703: web/links/index.htm: updated to Firefox bookmarks. BIEN: open-sourcing: added UArizona and iPlant IP policies, which are relevant to Brad's numerous documentation and schema-modeling contributions in our repository (most done while he was an iPlant employee).: Aaron Marcuse-Kubitza
10:49 PM Revision 11702: removed inputs/TEAM/_src/data_cart.pdf since this does not contain all the info in data_cart.maff: Aaron Marcuse-Kubitza
01:21 PM Revision 11701: added planning/legal/open-sourcing/request_to_open_source_software.orig.docx.url: Aaron Marcuse-Kubitza
01:18 PM Revision 11700: added planning/legal/open-sourcing/, which will contain the "request to open source software" form (this cannot be under version control due to copyright limitations stated in the form): Aaron Marcuse-Kubitza

11/19/2013

09:21 PM Revision 11699: web/links/index.htm: updated to Firefox bookmarks. BIEN: open-sourcing: added potential licenses we could use (public domain/CC0, BSD, GNU Verbatim Copying License, *not* CC-BY because incompatible w/ GPL).: Aaron Marcuse-Kubitza
08:31 PM Revision 11698: web/links/index.htm: updated to Firefox bookmarks. BIEN: added links related to open-sourcing it, including the "Request to Open Source Software" form, the funding sources that need to be included in it, and part of the delegation of authority chain (from the UC Regents) that authorizes the open-sourcing.: Aaron Marcuse-Kubitza

11/18/2013

11:38 PM Revision 11697: backups/TNRS.backup.md5: updated: Aaron Marcuse-Kubitza
10:50 PM Task #816 (New): re-run TNRS on mis-scrubbed names: * temporary workaround for names with an accepted name:
use @Accepted_name_family@ when @Name_matched_accepted_famil... Aaron Marcuse-Kubitza
05:40 PM Revision 11696: schemas/vegbien.sql: sync_analytical_stem_to_view(): use new util.force_recreate() instead of manually dropping and re-creating every view that uses this. this avoids the need to add several lines to this function every time we add a new scientific view (of which we expect to have many), because force_recreate()'s error parsing handles this automatically. this makes it possible for a non-expert user to add scientific views without compromising the ability to add columns to analytical_stem_view, because they don't need to understand Postgres's dependency error messages when updating analytical_stem with this function.: Aaron Marcuse-Kubitza
05:32 PM Revision 11695: schemas/util.sql: added force_recreate(), for use by sync_analytical_stem_to_view(). this uses the new `GET STACKED DIAGNOSTICS` in PostgreSQL 9.3 to access the DETAIL section of the dependent_objects_still_exist error.: Aaron Marcuse-Kubitza
12:10 PM Revision 11694: web/links/index.htm: updated to Firefox bookmarks. upgrading to PostgreSQL 9.3: added Linux pg_upgrade steps and install instructions. added Mac PostGIS, psycopg2 install steps. added note that after installing, you need to restore config values that the upgrade reset: in pgAdmin > Preferences > Query tool > Query editor, set Max characters per column back to -1 (to avoid cells being truncated). (this is *not* a bug in PostgreSQL, only in pgAdmin, and does *not* signal a need to downgrade.): Aaron Marcuse-Kubitza
06:52 AM Revision 11693: planning/timeline/timeline.2013.xls: hid previous weeks: Aaron Marcuse-Kubitza
06:51 AM Revision 11692: planning/timeline/timeline.2013.xls: rescheduled tasks: Aaron Marcuse-Kubitza
06:45 AM Revision 11691: planning/timeline/timeline.2013.xls: added timespan checkmarks: Aaron Marcuse-Kubitza
06:44 AM Revision 11690: planning/timeline/timeline.2013.xls: hid completed tasks: Aaron Marcuse-Kubitza
06:43 AM Revision 11689: planning/timeline/timeline.2013.xls: updated for progress: Aaron Marcuse-Kubitza
06:23 AM Revision 11688: inputs/CVS/run: `make .../reinstall`: documented vegbiendev runtime (45 min): Aaron Marcuse-Kubitza
05:35 AM Revision 11687: removed inputs/CVS/cvs-archive-2012-12-04.schema.sql, which has been replaced by cvs-eep-archive-2013-10-22-VegBIEN.schema.sql: Aaron Marcuse-Kubitza
05:05 AM Revision 11686: bugfix: /README.TXT: to backup files not in Time Machine: PostgreSQL: need to run with `overwrite=1` so removed files are also deleted: Aaron Marcuse-Kubitza
05:02 AM Revision 11685: /README.TXT: to backup files not in Time Machine: PostgreSQL: only stop PostgreSQL after all files have been copied, to minimize the time that the PostgreSQL server is down (the final copy just copies concurrent changes): Aaron Marcuse-Kubitza
05:02 AM Revision 11684: /README.TXT: to backup files not in Time Machine: PostgreSQL: only stop PostgreSQL after all files have been copied, to minimize the time that the PostgreSQL server is down (the final copy just copies concurrent changes): Aaron Marcuse-Kubitza
04:59 AM Revision 11683: /README.TXT: updated to PostgreSQL 9.3: Aaron Marcuse-Kubitza
04:54 AM Revision 11682: added inputs/CVS/_src/cvs-eep-archive-2013-10-22-VegBIEN.zip.url: Aaron Marcuse-Kubitza
04:54 AM Revision 11681: added inputs/CVS/cvs-eep-archive-2013-10-22-VegBIEN.schema.sql: Aaron Marcuse-Kubitza
04:52 AM Revision 11680: inputs/CVS/run: documented `make .../reinstall` runtime (25 min): Aaron Marcuse-Kubitza
04:27 AM Revision 11679: inputs/VegBank/stemlocation_/header.csv: updated from reinstalling stemlocation_: Aaron Marcuse-Kubitza
04:26 AM Revision 11678: added inputs/CVS/_src/cvs-eep-archive-2013-10-22-VegBIEN.schema.sql: Aaron Marcuse-Kubitza
04:23 AM Revision 11677: added inputs/CVS/_src/cvs-eep-archive-2013-10-22-VegBIEN.schema.sql.run, which makes the SQL suitable for PostgreSQL: Aaron Marcuse-Kubitza
03:52 AM Revision 11676: bugfix: inputs/input.Makefile: sql/install: exit on error by using `set -o pipefail`: Aaron Marcuse-Kubitza
12:43 AM Revision 11675: fix: /Makefile: $(macPostgresLibs): added libpq.5, which is needed by PostgreSQL 9.3: Aaron Marcuse-Kubitza
12:29 AM Revision 11674: fix: /Makefile: postgres-Darwin: also need to install psycopg2: Aaron Marcuse-Kubitza
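
Note on revisions 11695 and 11696: force_recreate() is described as reading the DETAIL section of the dependent_objects_still_exist error to find out which views must be dropped and re-created. The following Python sketch illustrates only the parsing step. The real function lives in schemas/util.sql and may work differently. The regular expression, the function name, and the view names in the sample message are hypothetical, and the "view X depends on view Y" wording of the DETAIL lines is assumed here rather than taken from the repository.

```python
import re
from typing import List

# Matches DETAIL lines of the assumed form "view <name> depends on ...".
DEPENDENT_VIEW_RE = re.compile(r"^view (\S+) depends on ", re.MULTILINE)


def dependent_views(detail: str) -> List[str]:
    """Return the names of the views listed as dependents in an error DETAIL."""
    return DEPENDENT_VIEW_RE.findall(detail)


# Hypothetical DETAIL text for an attempt to drop a view that others depend on.
detail = (
    "view sci_view_a depends on view base_view\n"
    "view sci_view_b depends on view base_view"
)
views = dependent_views(detail)
print(views)
```

Deriving the list of dependents from the error itself is what removes the need to hard-code each new scientific view in sync_analytical_stem_to_view().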

11/17/2013

11:27 PM Revision 11673: /Makefile: postgres-Linux: add the PostgreSQL 9.2 apt-src in case we ever need to downgrade to it: Aaron Marcuse-Kubitza
10:57 PM Revision 11672: bugfix: /Makefile: postgres-Linux: ignore errors if `sudo apt-get update` returns a non-zero exit status due to unreachable apt sources (which are likely unrelated to PostgreSQL, and should not prevent PostgreSQL configuration from continuing): Aaron Marcuse-Kubitza
10:54 PM Revision 11671: bugfix: /Makefile: postgres-Linux: fixed command to create /etc/apt/sources.list.d/pgdg.list: Aaron Marcuse-Kubitza
