# Date Author Comment
11904 12/11/2013 10:42 PM Aaron Marcuse-Kubitza

bugfix: inputs/VegBank/plot/: added _no_import because this table is left-joined and should not be imported separately

11903 12/11/2013 10:40 PM Aaron Marcuse-Kubitza

bugfix: inputs/{.NCBI,CTFS}/*.src/: added _no_import because these tables are left-joined and should not be imported separately

11902 12/11/2013 09:56 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: removed table names from datasources where only one table is imported

11901 12/11/2013 09:52 PM Aaron Marcuse-Kubitza

fix: inputs/import.stats.xls: removed deleted tables from current import

11900 12/11/2013 09:51 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: updated import times

11899 12/11/2013 07:56 PM Aaron Marcuse-Kubitza

updated backups/TNRS.backup.md5

11898 12/11/2013 07:56 PM Aaron Marcuse-Kubitza

added backups/vegbien.r11786.backup.md5

11897 12/11/2013 07:53 PM Aaron Marcuse-Kubitza

/README.TXT: Full database import: backups: added step to download backup to local machine

11896 12/11/2013 07:45 PM Aaron Marcuse-Kubitza

bugfix: /Makefile: install: need to run inputs/download in live mode so that the flat files are actually downloaded

11895 12/11/2013 07:43 PM Aaron Marcuse-Kubitza

lib/common.Makefile: added %/live, for use with `make inputs/download`

11894 12/10/2013 07:44 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: rescheduled tasks

11893 12/10/2013 07:40 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: updated for progress

11892 12/10/2013 07:36 AM Aaron Marcuse-Kubitza

/README.TXT: Full database import: In PostgreSQL: documented that the tables to check are located in the r# schema, not public

11891 12/10/2013 07:32 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: updated for progress

11890 12/10/2013 07:32 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: datasource validations: reordered datasources according to Brian Enquist's new validation order (wiki.vegpath.org/Spot-checking_validation_order)

11889 12/10/2013 07:10 AM Aaron Marcuse-Kubitza

fix: schemas/vegbien.sql: analytical_specimen: added specimens-related columns that are in analytical_plot

11888 12/10/2013 06:35 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/map.csv: row_num: remapped to plain *row_num, like the other datasources that have this field

11887 12/10/2013 06:31 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: rerun time: noted that this is only fast after manual vacuuming of the table (to remove the deleted rows from the index). autovacuum apparently does not run, although it should.

11886 12/10/2013 05:18 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: hid previous weeks

11885 12/10/2013 05:18 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: added timespan dots ◦ for supertasks

11884 12/10/2013 05:15 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: legend: changed to movable text box to avoid needing to erase and repopulate the header columns with the legend cells

11883 12/10/2013 05:03 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: crossed out and hid completed tasks

11882 12/10/2013 04:58 AM Aaron Marcuse-Kubitza

planning/timeline/timeline.2013.xls: updated for progress

11881 12/09/2013 07:24 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: reran test, which added yearCollected/monthCollected/dayCollected

11880 12/09/2013 07:23 PM Aaron Marcuse-Kubitza

inputs/CVS/plantConcept_/create.sql: documented runtime (3 min)

11879 12/09/2013 06:59 PM Aaron Marcuse-Kubitza

inputs/CTFS/*.src/: added test.xml.ref

11878 12/09/2013 06:58 PM Aaron Marcuse-Kubitza

inputs/CTFS/*.src/: added VegBIEN.csv

11877 12/09/2013 06:56 PM Aaron Marcuse-Kubitza

bugfix: inputs/CTFS/TaxonOccurrence*/map.csv: things mapped to taxonObservationID: remapped to taxonOccurrenceID since taxonObservationID is not mapped to anything in VegBIEN (denormalized VegCore doesn't distinguish between taxon occurrences and taxon observations of them)

11876 12/09/2013 05:46 PM Aaron Marcuse-Kubitza

bugfix: inputs/ARIZ/~.clean_up.sql: prevent "column already exists" errors when there is an input column of the same name as an output column

11875 12/09/2013 05:44 PM Aaron Marcuse-Kubitza

bugfix: lib/runscripts/datasrc_dir.run: import(): don't run `sql/install` if the schema already exists, because this will try to rerun all the schema-creation queries. note that this idempotent functionality was not provided by the `make .../install` target that was previously used (idempotency is new with new-style import).

11874 12/09/2013 05:26 PM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: updated for renamed county_centroids column names

11873 12/09/2013 04:16 PM Aaron Marcuse-Kubitza

inputs/.geoscrub/import_order.txt: added county_centroids so that it would be installed by new-style import

11872 12/09/2013 03:54 PM Aaron Marcuse-Kubitza

bugfix: lib/runscripts/datasrc_dir.run: import(): can't run `datasrc_make reinstall` anymore because this now defers to the runscript for new-style import datasources (which was done so that `make .../install` properly reinstalls all the datasources). instead, call the applicable make targets manually (there are just 2 of them).

11871 12/09/2013 03:37 PM Aaron Marcuse-Kubitza

inputs/FIA/TREE/run: documented import() runtime (1.5 h), which includes table cleanup runtime (1 h)

11870 12/09/2013 03:09 PM Aaron Marcuse-Kubitza

bugfix: bin/pg_dump_limit: support errexit by ignoring the nonzero exit status that grep returns when it doesn't match anything
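
The errexit issue described in r11870 can be illustrated with a minimal sketch. This is not the code from bin/pg_dump_limit; the helper name grep_allow_no_match is hypothetical and only shows one common way to tolerate grep's "no match" status (1) while still failing on real errors (status 2 or higher).

```shell
#!/usr/bin/env bash
set -o errexit

# Hypothetical helper (not taken from bin/pg_dump_limit): run grep, but treat
# exit status 1 ("no lines matched") as success. Any other nonzero status,
# such as 2 for an unreadable file, still counts as a failure.
grep_allow_no_match() {
    grep "$@" || [ "$?" -eq 1 ]
}

# A plain `grep gamma` here would abort the script under errexit.
printf 'alpha\nbeta\n' | grep_allow_no_match gamma
printf 'alpha\nbeta\n' | grep_allow_no_match beta   # prints "beta"
echo "still running"
```

Checking for status 1 specifically, rather than appending `|| true`, keeps genuine grep errors visible to errexit.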

11869 12/09/2013 02:43 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/run: updated import() runtime (same), documented table cleanup runtime (1.5 h)

11868 12/09/2013 02:38 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: CREATE INDEX ... specimenHolderInstitutions: documented runtime (45 min)

11867 12/09/2013 02:28 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented runtime (3.5 min)

11866 12/09/2013 02:27 PM Aaron Marcuse-Kubitza

/README.TXT: Datasource setup: added steps to back up e-mails

11865 12/06/2013 07:46 AM Aaron Marcuse-Kubitza

bugfix: inputs/CTFS/import_order.txt: added *.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in */create.sql. note that VegCore taxonOccurrenceID has been renamed to taxonObservationID since this was last run.

11864 12/06/2013 06:56 AM Aaron Marcuse-Kubitza

inputs/.geoscrub/run: documented import() runtime (20 min)

11863 12/06/2013 06:12 AM Aaron Marcuse-Kubitza

bugfix: inputs/.NCBI/import_order.txt: added nodes.src, names.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in nodes/create.sql.

11862 12/06/2013 06:01 AM Aaron Marcuse-Kubitza

fix: /Makefile: inputs/reinstall: commented out to avoid a cascade of "overriding commands for target" warnings. this will revert to the default uninstall, install sequence for this target rather than the simultaneous-reinstall optimization (which can still be invoked manually).

11861 12/06/2013 05:52 AM Aaron Marcuse-Kubitza

lib/sh/local.sh: public_schema_exists(): use a higher log_level for pg_schema_exists, to avoid all the verbose output involved in running the query

11860 12/06/2013 05:44 AM Aaron Marcuse-Kubitza

bugfix: lib/sh/local.sh: public_schema_exists(): can no longer use psql_script_vegbien for this, because using `SET search_path` (called by psql_script_vegbien) with a schema that does not exist no longer produces an error. instead, use new pg_schema_exists(), which uses a different command that does produce an error if the schema does not exist.

11859 12/06/2013 05:38 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: added pg_require_schema()

11858 12/06/2013 05:37 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: stderr2stdout(): documented that this redirects fd 2->1 and log_fd (but not back to 2)

11857 12/06/2013 05:34 AM Aaron Marcuse-Kubitza

bugfix: lib/sh/util.sh: stderr2stdout(): use `command` before tee, which re-filters log_fd so that stderr itself is also filtered. this allows log-filtering out an otherwise-confusing benign error when using e.g. stderr_matches().

11856 12/06/2013 04:31 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: added not(), for use in prefixing wrapped commands
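
A function like not() is useful because the shell's `!` is a keyword and cannot be passed as an argument to a wrapper that runs its arguments as a command. The following is a rough sketch under that assumption; the actual definition in lib/sh/util.sh may differ, and run_quietly is a hypothetical wrapper used only for illustration.

```shell
#!/usr/bin/env bash

# Assumed definition: invert the exit status of the wrapped command.
not() { ! "$@"; }

# Hypothetical wrapper that runs its arguments as a command with output hidden.
# `run_quietly ! grep ...` would fail, because `!` would be looked up as a
# command name; `run_quietly not grep ...` works.
run_quietly() { "$@" >/dev/null 2>&1; }

if run_quietly not grep -q needle /dev/null; then
    echo "needle absent"   # grep finds nothing in /dev/null, so this prints
fi
```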

11855 12/06/2013 04:14 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: added pg_schema_exists()

11854 12/06/2013 04:10 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: added stderr_matches()

11853 12/06/2013 03:59 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: documented that fds 2x/3x should not be used because we use these, as opposed to 1x which is used by the shell internally

11852 12/06/2013 03:57 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: added stdout_contains()

11851 12/06/2013 03:34 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: added stderr2stdout()

11850 12/06/2013 02:52 AM Aaron Marcuse-Kubitza

fix: lib/sh/db.sh: pg_table_exists(): usage: documented that $table is actually required for this function

11849 12/06/2013 02:44 AM Aaron Marcuse-Kubitza

bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource)

11848 12/06/2013 01:57 AM Aaron Marcuse-Kubitza

bugfix: /Makefile: moved inputs/reinstall to end so it overrides the corresponding subdir forwarding target

11847 12/06/2013 12:51 AM Aaron Marcuse-Kubitza

bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource)

11846 12/06/2013 12:27 AM Aaron Marcuse-Kubitza

bugfix: /Makefile: inputs/install: don't run bin/reinstall_all here, because /install targets are supposed to be idempotent, forward-only actions that don't first remove existing data

11845 12/05/2013 11:55 PM Aaron Marcuse-Kubitza

bugfix: /Makefile: postgres-Darwin: don't prepend $(MAKE) to $(postgresReload-Darwin), because this is now a list of commands

11844 12/05/2013 11:52 PM Aaron Marcuse-Kubitza

bugfix: /Makefile: config: ignore errors if ~/bin/make exists

11843 12/05/2013 11:38 PM Aaron Marcuse-Kubitza

inputs/FIA/COND/postprocess.sql: filtering formula: documented that this was created by Brad, and provided the URL to it on nimoy

11842 12/05/2013 12:27 PM Aaron Marcuse-Kubitza

inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible

11841 12/05/2013 12:19 PM Aaron Marcuse-Kubitza

inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible

11840 12/05/2013 08:38 AM Aaron Marcuse-Kubitza

bin/make_analytical_db: don't regenerate family_higher_plant_group from the NCBI data because the lookup table is now prepopulated as part of the schema

11839 12/05/2013 08:37 AM Aaron Marcuse-Kubitza

bin/import_all: don't import NCBI because the lookup table is now prepopulated as part of the schema

11838 12/05/2013 08:35 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: include the family_higher_plant_group lookup table values so that these don't need to be regenerated from the NCBI nodes whenever the DB is reloaded

11837 12/05/2013 07:58 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_update_ancestors(): don't do an index scan if the value being scanned for is NULL, to support testing this function without the indexes in place, without extra full-table scans for NULL values affecting things. this can be used to determine if the function is actually using the indexes, by turning them off and seeing if the runtime changes.

11836 12/05/2013 07:03 AM Aaron Marcuse-Kubitza

schemas/util.sql: explain2table(): documented usage:
PERFORM util.explain2table($$
query
$$);

11835 12/05/2013 05:52 AM Aaron Marcuse-Kubitza

schemas/util.sql: explain2table(): by default, use the util.explain table

11834 12/05/2013 05:49 AM Aaron Marcuse-Kubitza

schemas/util.sql: added explain table

11833 12/05/2013 05:47 AM Aaron Marcuse-Kubitza

schemas/util.sql: added explain2notice()

11832 12/05/2013 05:44 AM Aaron Marcuse-Kubitza

schemas/util.sql: added explain2str()

11831 12/05/2013 05:33 AM Aaron Marcuse-Kubitza

schemas/util.sql: added explain2table()

11830 12/05/2013 05:23 AM Aaron Marcuse-Kubitza

schemas/util.sql: added explain()

11829 12/05/2013 01:31 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: taxonlabel_update_ancestors(): don't create a performance-intensive nested transaction (EXCEPTION block) for each INSERT, because there should no longer be duplicate ancestors, so it's OK to abort the whole transaction if this assertion fails

11828 12/05/2013 01:03 AM Aaron Marcuse-Kubitza

bugfix: schemas/vegbien.sql: taxonlabel_update_ancestors_on_{insert,update}(): only use either the matched taxon's ancestors or the parent's ancestors, to avoid issues related to duplication between these two ancestors lists. this also fixes a bug where the 2nd taxonlabel_update_ancestors() call assumes that the existing ancestors are for the old parent, when in fact they have actually just been set to those for the new matched taxon (which horribly confuses taxonlabel_update_ancestors()).

11827 12/04/2013 10:06 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: _taxonlabel_set_parent_id(): just use a plain UPDATE statement, to avoid the significant parsing and stringification overhead of EXECUTE and quote_nullable(). it is not clear that EXECUTE is actually necessary to avoid caching the query plan, because the cache should be invalidated automatically when the table's ANALYZE statistics are regenerated.

11826 12/04/2013 10:00 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: removed unused function _taxonlabel_set_matched_label_id(), which refers to obsolete fields

11825 12/04/2013 09:58 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: synced to DB (the view renderer apparently changed the text of a view)

11824 12/04/2013 09:44 PM Aaron Marcuse-Kubitza

backups/TNRS.backup: saved copy backups/TNRS.2013-11-18.backup

11823 12/04/2013 07:26 PM Aaron Marcuse-Kubitza

bugfix: bin/import_all: run in errexit mode, so that if the user cancels reinstalling of the import schema, the script will then abort instead of continuing and using the wrong schema

11822 12/04/2013 06:56 PM Aaron Marcuse-Kubitza

bugfix: schemas/Makefile: %/uninstall: always confirm before removing an existing schema, not just for public and r*, because an auxiliary schema might also be used as $version and reinstalled by bin/import_all

11821 12/04/2013 06:04 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: analytical_stem_view: scrubbed_author: removed empty COALESCE around value (left over from when multiple values needed to be combined for many TNRS fields)

11820 12/04/2013 04:57 PM Aaron Marcuse-Kubitza

inputs/CVS/^taxon_observation.**.sample/create.sql: uncommented identifiedBy since this is now part of taxonObservation_

11819 12/04/2013 04:08 PM Aaron Marcuse-Kubitza

fix: inputs/CVS/observation_community/create.sql: communityName: populate from commConcept.commName instead, because commInterpretation.commname is not always populated. this requires left-joining to commConcept.

11818 12/04/2013 03:58 PM Aaron Marcuse-Kubitza

inputs/CVS/observation_community/map.csv: updated output column names to new input column names, to avoid later output column collisions

11817 12/04/2013 03:42 PM Aaron Marcuse-Kubitza

inputs/CVS/observation_community/header.csv, map.csv: updated input column names for cvs.~.clean_up.sql renamings

11816 12/04/2013 03:12 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: provider_count_view: source totals: use the much faster query developed for Brad (wiki.vegpath.org/VegBIEN_FAQ#from-Brad-on-2013-12-4), which avoids the need to do a GROUP BY on all of analytical_stem. eventually, we will want to apply the same optimization to the first publisher subtotals.

11815 12/04/2013 04:19 AM Aaron Marcuse-Kubitza

inputs/CVS/cvs.~.clean_up.sql: commClass, commConcept fields: prepend table name to avoid inter-table collisions upon join

11814 12/04/2013 03:43 AM Aaron Marcuse-Kubitza

added inputs/CVS/observation_community/, as for VegBank

11813 12/04/2013 03:32 AM Aaron Marcuse-Kubitza

inputs/CVS/cvs.~.clean_up.sql: commClass.dba_src_ID: prepend table name to avoid inter-table collisions upon join

11812 12/03/2013 04:32 PM Aaron Marcuse-Kubitza

added inputs/CVS/observationContributor_/, which adds the people collecting the plot

11811 12/03/2013 04:02 PM Aaron Marcuse-Kubitza

inputs/CVS/cvs.~.clean_up.sql: observationContributor.dba_src_ID: prepended table name to avoid collision when left-joining to party

11810 12/03/2013 03:44 PM Aaron Marcuse-Kubitza

bugfix: inputs/input.Makefile: %/header.csv: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`
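
Why errexit needs `set -o pipefail` for a piped command can be shown with a small sketch. This is not the recipe from inputs/input.Makefile; run_pipeline is a hypothetical helper, and `false | cat` stands in for any pipeline whose first command fails.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a failing pipeline in a fresh bash with errexit on.
# $1 holds extra shell options, e.g. "-o pipefail" (may be empty).
run_pipeline() {
    bash -c "set -o errexit $1; false | cat; echo reached"
}

# Without pipefail, the pipeline's status is that of its last command (cat),
# so the failure of `false` is hidden and the script carries on.
run_pipeline ''                                # prints "reached"

# With pipefail, the pipeline fails if any command in it fails, so errexit
# stops the script before the echo.
run_pipeline '-o pipefail' || echo "aborted early"   # prints "aborted early"
```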

11809 12/03/2013 02:57 PM Aaron Marcuse-Kubitza

fix: inputs/CVS/taxonObservation_/create.sql: mapped identifiedBy, which involves joining to party

11808 12/03/2013 02:35 PM Aaron Marcuse-Kubitza

inputs/CVS/cvs.~.clean_up.sql: don't rename taxonInterpretation.PARTY_ID, so that this can be USING-joined to party in inputs/CVS/taxonObservation_/create.sql

11807 12/03/2013 01:47 PM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: regenerated exports

11806 12/03/2013 08:58 AM Aaron Marcuse-Kubitza

bin/map: support param start="", which indicates the default value. this fixes a bug in inputs/input.Makefile $(restart_row), which outputs "" if an explicit starting row is not found.

11805 12/03/2013 08:25 AM Aaron Marcuse-Kubitza

inputs/CVS/^taxon_observation.**.sample/map.csv: synced output columns to input columns (which removes the extra *s)