/ - Changes - BIEN 3 - NCEAS Projects

root @ 11873

#	Date	Author	Comment
11873	12/09/2013 04:16 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/import_order.txt: added county_centroids so that it would be installed by new-style import
11872	12/09/2013 03:54 PM	Aaron Marcuse-Kubitza	bugfix: lib/runscripts/datasrc_dir.run: import(): can't run `datasrc_make reinstall` anymore because this now defers to the runscript for new-style import datasources (which was done so that `make .../install` properly reinstalls all the datasources). instead, call the applicable make targets manually (there are just 2 of them).
11871	12/09/2013 03:37 PM	Aaron Marcuse-Kubitza	inputs/FIA/TREE/run: documented import() runtime (1.5 h), which includes table cleanup runtime (1 h)
11870	12/09/2013 03:09 PM	Aaron Marcuse-Kubitza	bugfix: bin/pg_dump_limit: support errexit by ignoring the nonzero exit status that grep returns when it doesn't match anything
11869	12/09/2013 02:43 PM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record_plants/run: updated import() runtime (same), documented table cleanup runtime (1.5 h)
11868	12/09/2013 02:38 PM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: CREATE INDEX ... specimenHolderInstitutions: documented runtime (45 min)
11867	12/09/2013 02:28 PM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented runtime (3.5 min)
11866	12/09/2013 02:27 PM	Aaron Marcuse-Kubitza	/README.TXT: Datasource setup: added steps to backup e-mails
11865	12/06/2013 07:46 AM	Aaron Marcuse-Kubitza	bugfix: inputs/CTFS/import_order.txt: added .src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in /create.sql. note that VegCore taxonOccurrenceID has been renamed to taxonObservationID since this was last run.
11864	12/06/2013 06:56 AM	Aaron Marcuse-Kubitza	inputs/.geoscrub/run: documented import() runtime (20 min)
11863	12/06/2013 06:12 AM	Aaron Marcuse-Kubitza	bugfix: inputs/.NCBI/import_order.txt: added nodes.src, names.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in nodes/create.sql.
11862	12/06/2013 06:01 AM	Aaron Marcuse-Kubitza	fix: /Makefile: inputs/reinstall: commented out to avoid a cascade of "overriding commands for target" warnings. this will revert to the default uninstall, install sequence for this target rather than the simultaneous-reinstall optimization (which can still be invoked manually).
11861	12/06/2013 05:52 AM	Aaron Marcuse-Kubitza	lib/sh/local.sh: public_schema_exists(): use a higher log_level for pg_schema_exists, to avoid all the verbose output involved in running the query
11860	12/06/2013 05:44 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/local.sh: public_schema_exists(): can no longer use psql_script_vegbien for this, because using `SET search_path` (called by psql_script_vegbien) with a schema that does not exist no longer produces an error. instead, use new pg_schema_exists(), which uses a different command that does produce an error if the schema does not exist.
11859	12/06/2013 05:38 AM	Aaron Marcuse-Kubitza	lib/sh/db.sh: added pg_require_schema()
11858	12/06/2013 05:37 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: stderr2stdout(): documented that this redirects fd 2->1 and log_fd (but not back to 2)
11857	12/06/2013 05:34 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: stderr2stdout(): use `command` before tee, which re-filters log_fd so that stderr itself is also filtered. this allows log-filtering out an otherwise-confusing benign error when using e.g. stderr_matches().
11856	12/06/2013 04:31 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added not(), for use in prefixing wrapped commands
11855	12/06/2013 04:14 AM	Aaron Marcuse-Kubitza	lib/sh/db.sh: added pg_schema_exists()
11854	12/06/2013 04:10 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added stderr_matches()
11853	12/06/2013 03:59 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: documented that fds 2x/3x should not be used because we use these, as opposed to 1x which is used by the shell internally
11852	12/06/2013 03:57 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added stdout_contains()
11851	12/06/2013 03:34 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added stderr2stdout()
11850	12/06/2013 02:52 AM	Aaron Marcuse-Kubitza	fix: lib/sh/db.sh: pg_table_exists(): usage: documented that $table is actually required for this function
11849	12/06/2013 02:44 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource)
11848	12/06/2013 01:57 AM	Aaron Marcuse-Kubitza	bugfix: /Makefile: moved inputs/reinstall to end so it overrides the corresponding subdir forwarding target
11847	12/06/2013 12:51 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource)
11846	12/06/2013 12:27 AM	Aaron Marcuse-Kubitza	bugfix: /Makefile: inputs/install: don't run bin/reinstall_all here, because /install targets are supposed to be idempotent, forward-only actions that don't first remove existing data
11845	12/05/2013 11:55 PM	Aaron Marcuse-Kubitza	bugfix: /Makefile: postgres-Darwin: don't prepend $(MAKE) to $(postgresReload-Darwin), because this is now a list of commands
11844	12/05/2013 11:52 PM	Aaron Marcuse-Kubitza	bugfix: /Makefile: config: ignore errors if ~/bin/make exists
11843	12/05/2013 11:38 PM	Aaron Marcuse-Kubitza	inputs/FIA/COND/postprocess.sql: filtering formula: documented that this was created by Brad, and provided the URL to it on nimoy
11842	12/05/2013 12:27 PM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible
11841	12/05/2013 12:19 PM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible
11840	12/05/2013 08:38 AM	Aaron Marcuse-Kubitza	bin/make_analytical_db: don't regenerate family_higher_plant_group from the NCBI data because the lookup table is now prepopulated as part of the schema
11839	12/05/2013 08:37 AM	Aaron Marcuse-Kubitza	bin/import_all: don't import NCBI because the lookup table is now prepopulated as part of the schema
11838	12/05/2013 08:35 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: include the family_higher_plant_group lookup table values so that these don't need to be regenerated from the NCBI nodes whenever the DB is reloaded
11837	12/05/2013 07:58 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel_update_ancestors(): don't do an index scan if the value being scanned for is NULL, to support testing this function without the indexes in place, without extra full-table scans for NULL values affecting things. this can be used to determine if the function is actually using the indexes, by turning them off and seeing if the runtime changes.
11836	12/05/2013 07:03 AM	Aaron Marcuse-Kubitza	schemas/util.sql: explain2table(): documented usage: PERFORM util.explain2table($$ query $$);
11835	12/05/2013 05:52 AM	Aaron Marcuse-Kubitza	schemas/util.sql: explain2table(): by default, use the util.explain table
11834	12/05/2013 05:49 AM	Aaron Marcuse-Kubitza	schemas/util.sql: added explain table
11833	12/05/2013 05:47 AM	Aaron Marcuse-Kubitza	schemas/util.sql: added explain2notice()
11832	12/05/2013 05:44 AM	Aaron Marcuse-Kubitza	schemas/util.sql: added explain2str()
11831	12/05/2013 05:33 AM	Aaron Marcuse-Kubitza	schemas/util.sql: added explain2table()
11830	12/05/2013 05:23 AM	Aaron Marcuse-Kubitza	schemas/util.sql: added explain()
11829	12/05/2013 01:31 AM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: taxonlabel_update_ancestors(): don't create a performance-intensive nested transaction (EXCEPTION block) for each INSERT, because there should no longer be duplicate ancestors, so it's OK to abort the whole transaction if this assertion fails
11828	12/05/2013 01:03 AM	Aaron Marcuse-Kubitza	bugfix: schemas/vegbien.sql: taxonlabel_update_ancestors_on_{insert,update}(): only use either the matched taxon's ancestors or the parent's ancestors, to avoid issues related to duplication between these two ancestors lists. this also fixes a bug where the 2nd taxonlabel_update_ancestors() call assumes that the existing ancestors are for the old parent, when in fact they have actually just been set to those for the new matched taxon (which horribly confuses taxonlabel_update_ancestors()).
11827	12/04/2013 10:06 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: _taxonlabel_set_parent_id(): just use a plain UPDATE statement, to avoid the significant parsing and stringification overhead of EXECUTE and quote_nullable(). it is not clear that EXECUTE is actually necessary to avoid caching the query plan, because the cache should be invalidated automatically when the table's ANALYZE statistics are regenerated.
11826	12/04/2013 10:00 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: removed unused function _taxonlabel_set_matched_label_id(), which refers to obsolete fields
11825	12/04/2013 09:58 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: synced to DB (the view renderer apparently changed the text of a view)
11824	12/04/2013 09:44 PM	Aaron Marcuse-Kubitza	backups/TNRS.backup: saved copy backups/TNRS.2013-11-18.backup
11823	12/04/2013 07:26 PM	Aaron Marcuse-Kubitza	bugfix: bin/import_all: run in errexit mode, so that if the user cancels reinstalling of the import schema, the script will then abort instead of continuing and using the wrong schema
11822	12/04/2013 06:56 PM	Aaron Marcuse-Kubitza	bugfix: schemas/Makefile: %/uninstall: always confirm before removing an existing schema, not just for public and r*, because an auxiliary schema might also be used as $version and reinstalled by bin/import_all
11821	12/04/2013 06:04 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: analytical_stem_view: scrubbed_author: removed empty COALESCE around value (left over from when multiple values needed to be combined for many TNRS fields)
11820	12/04/2013 04:57 PM	Aaron Marcuse-Kubitza	inputs/CVS/^taxon_observation.**.sample/create.sql: uncommented identifiedBy since this is now part of taxonObservation_
11819	12/04/2013 04:08 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/observation_community/create.sql: communityName: populate from commConcept.commName instead, because commInterpretation.commname is not always populated. this requires left-joining to commConcept.
11818	12/04/2013 03:58 PM	Aaron Marcuse-Kubitza	inputs/CVS/observation_community/map.csv: updated output column names to new input column names, to avoid later output column collisions
11817	12/04/2013 03:42 PM	Aaron Marcuse-Kubitza	inputs/CVS/observation_community/header.csv, map.csv: updated input column names for cvs.~.clean_up.sql renamings
11816	12/04/2013 03:12 PM	Aaron Marcuse-Kubitza	schemas/vegbien.sql: provider_count_view: source totals: use the much faster query developed for Brad (wiki.vegpath.org/VegBIEN_FAQ#from-Brad-on-2013-12-4), which avoids the need to do a GROUP BY on all of analytical_stem. eventually, we will want to apply the same optimization to the first publisher subtotals.
11815	12/04/2013 04:19 AM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: commClass, commConcept fields: prepend table name to avoid inter-table collisions upon join
11814	12/04/2013 03:43 AM	Aaron Marcuse-Kubitza	added inputs/CVS/observation_community/, as for VegBank
11813	12/04/2013 03:32 AM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: commClass.dba_src_ID: prepend table name to avoid inter-table collisions upon join
11812	12/03/2013 04:32 PM	Aaron Marcuse-Kubitza	added inputs/CVS/observationContributor_/, which adds the people collecting the plot
11811	12/03/2013 04:02 PM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: observationContributor.dba_src_ID: prepended table name to avoid collision when left-joining to party
11810	12/03/2013 03:44 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: %/header.csv: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`
11809	12/03/2013 02:57 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/taxonObservation_/create.sql: mapped identifiedBy, which involves joining to party
11808	12/03/2013 02:35 PM	Aaron Marcuse-Kubitza	inputs/CVS/cvs.~.clean_up.sql: don't rename taxonInterpretation.PARTY_ID, so that this can be USING-joined to party in inputs/CVS/taxonObservation_/create.sql
11807	12/03/2013 01:47 PM	Aaron Marcuse-Kubitza	schemas/vegbien.ERD.mwb: regenerated exports
11806	12/03/2013 08:58 AM	Aaron Marcuse-Kubitza	bin/map: support param start="", which indicates the default value. this fixes a bug in inputs/input.Makefile $(restart_row), which outputs "" if an explicit starting row is not found.
11805	12/03/2013 08:25 AM	Aaron Marcuse-Kubitza	inputs/CVS/^taxon_observation.*.sample/map.csv: synced output columns to input columns (which removes the extra s)
11804	12/03/2013 08:00 AM	Aaron Marcuse-Kubitza	fix: inputs/CVS/plot_/postprocess.sql: locality: include the site name (authorLocation), because this is part of the unique specification of the place that was sampled, and Bob wants this to be included in VegBIEN
11803	12/03/2013 07:58 AM	Aaron Marcuse-Kubitza	inputs/CVS/^taxon_observation.**.sample/create.sql: removed parentLocationID, since this is unused in CVS
11802	12/03/2013 07:45 AM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: `%/install: %/create.sql`: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`
11801	12/03/2013 06:51 AM	Aaron Marcuse-Kubitza	inputs/VegBank/plot/postprocess.sql: locality: include the site name (authorlocation), because this is part of the unique specification of the place that was sampled
11800	12/03/2013 06:27 AM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Full database import: To restart an aborted import for a specific table: run the two commands in errexit mode so that the datasource does not incorrectly have the temp suffix removed if the import command exited with an error
11799	12/03/2013 05:19 AM	Aaron Marcuse-Kubitza	fix: inputs/CVS/taxon_observation.**/map.csv: omit authorPlantName because it is not specific to the taxonInterpretation row (this is in a separate taxonInterpretation for the original determination instead)
11798	12/03/2013 04:59 AM	Aaron Marcuse-Kubitza	web/links/index.htm: updated to Firefox bookmarks. PostgreSQL: added links for troubleshooting out-of-memory errors, which show up (cryptically) as "The database system is in recovery mode" errors in processes running at the time the out-of-memory condition occurred.
11797	12/03/2013 02:31 AM	Aaron Marcuse-Kubitza	schemas/postgresql.conf: work_mem: documented that this seemingly small # is multiplied by max_connections, i.e. 256 MB * 100 = 25.6 GB (about 26 GB), which approaches total memory (32 GB)
11796	12/02/2013 02:46 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/plot_/map.csv: PARENT_ID: remapped to UNUSED, to clarify that subplots are not implemented through this field
11795	11/27/2013 11:16 PM	Aaron Marcuse-Kubitza	bugfix: /README.TXT: Full database import: To restart an aborted import for a specific table: added command to remove the temp suffix from the source table entry, which is not automatic for importing a specific table (only for importing the entire datasource, at the end of which the datasource is considered completely imported and ready to overwrite any previous import)
11794	11/27/2013 11:04 PM	Aaron Marcuse-Kubitza	inputs/input.Makefile: scrub: clarified that using & (background process) also ignores TNRS errors (the primary purpose of & , of course, is to run asynchronously)
11793	11/27/2013 10:42 PM	Aaron Marcuse-Kubitza	bugfix: schemas/Makefile: $(confirmRmPublicSchema): only prompt to delete the schema if it actually exists. this avoids prompting to remove a non-existent schema at the beginning of bin/import_all, which requires user attention. since bin/import_all is often run with a delayed start (e.g. to wait for a staging table reinstall to complete), the user may not be at the terminal when this message is displayed, and without this fix, the import would be prevented from running until they return.
11792	11/27/2013 09:24 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/run: import() runtime: added starscream runtime (20 min)
11791	11/27/2013 08:48 PM	Aaron Marcuse-Kubitza	planning/timeline/timeline.2013.xls: updated for progress
11790	11/27/2013 08:33 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/run: documented import() runtime (15 min)
11789	11/26/2013 11:18 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/Source/map.csv: source__modified_date: updated for current run
11788	11/26/2013 11:11 PM	Aaron Marcuse-Kubitza	**/new_terms.csv, unmapped_terms.csv updated (using `make missing_mappings`)
11787	11/26/2013 11:10 PM	Aaron Marcuse-Kubitza	/README.TXT: Full database import: documented that `make schemas/reinstall` requires sudo access
11786	11/26/2013 11:07 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: updated upload time (30 s)
11785	11/26/2013 11:00 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: export_(): updated runtime (25 s)
11784	11/26/2013 10:58 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: import_vars: don't overwrite vars that are already defined, to allow the caller to specify their own values for the vars to create. this requires callers that rely on the overwriting functionality to reverse the order in which they run use_* commands, so that the higher-precedence use_* is applied first and the other one as the default values for the first.
11783	11/26/2013 10:03 PM	Aaron Marcuse-Kubitza	derived/biengeo/README.txt: updated geoscrub.sh runtime
11782	11/26/2013 09:57 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: make(): derived/biengeo/geoscrub.sh: documented runtime (2.5 h)
11781	11/26/2013 09:45 PM	Aaron Marcuse-Kubitza	inputs/.geoscrub/geoscrub_output/geoscrub.csv.run: don't connect to DB as the root user, because this is not needed now that the geoscrub schema is owned by the bien user. this avoids a sudo password prompt at the end of the geoscrubbing run.
11780	11/26/2013 07:38 PM	Aaron Marcuse-Kubitza	planning/timeline/timeline.2013.xls: rescheduled tasks
11779	11/26/2013 06:51 PM	Aaron Marcuse-Kubitza	planning/timeline/timeline.2013.xls: rescheduled tasks
11778	11/26/2013 06:41 PM	Aaron Marcuse-Kubitza	planning/timeline/timeline.2013.xls: updated for progress
11777	11/26/2013 02:23 PM	Aaron Marcuse-Kubitza	bugfix: inputs/input.Makefile: $(import): except in a full-database import, errexit so that the import will stop on an error and not let it scroll by
11776	11/26/2013 01:55 PM	Aaron Marcuse-Kubitza	added inputs/CVS/^taxon_observation.**.sample/, used for the extract. note that the column list is slightly different than for VegBank.
11775	11/26/2013 01:42 PM	Aaron Marcuse-Kubitza	inputs/CVS/taxonObservation_/map.csv: removed taxonObservation_-- prefix from terms that do not need to be table-specific (like for VegBank)
11774	11/26/2013 01:32 PM	Aaron Marcuse-Kubitza	fix: inputs/CVS/taxonObservation_/map.csv: plantConcept_ columns: synced input and output column names to their names in plantConcept_
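
Revision 11870 above describes supporting errexit by ignoring the nonzero exit status that grep returns when nothing matches. A minimal sketch of that pattern follows. The helper name grep_allow_no_match is hypothetical and this is not the actual bin/pg_dump_limit code; it only assumes the standard grep convention that status 1 means "no match" and 2 or higher means a real error.

```shell
# Sketch only: grep_allow_no_match is a hypothetical helper, not taken from the repository.
set -e  # errexit: abort the script on any unhandled nonzero exit status

# Run grep, treating "no lines matched" (exit status 1) as success while
# still propagating genuine errors (exit status 2 or higher).
grep_allow_no_match() {
    grep "$@" || {
        status=$?
        [ "$status" -eq 1 ] || return "$status"
    }
}

# Without the wrapper, a non-matching grep at the end of this pipeline
# would terminate the script under errexit.
printf 'alpha\nbeta\n' | grep_allow_no_match gamma
echo "still running"
```

Checking for status 1 specifically, rather than appending `|| true`, keeps real failures such as an unreadable input file visible to errexit.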
