Activity
From 11/29/2013 to 12/28/2013
12/20/2013
- 10:45 PM Revision 11939: fix: planning/timeline/timeline.xls: realigned legend
- 10:44 PM Revision 11938: planning/timeline/timeline.xls: hid previous weeks
- 10:43 PM Revision 11937: planning/timeline/timeline.xls: crossed out and hid completed tasks
- 10:41 PM Revision 11936: planning/timeline/timeline.xls: updated for progress
- 05:41 PM Revision 11935: web/links/index.htm: updated to Firefox bookmarks. open-sourcing BIEN: added links for VegBank's license (GPLv2), and how these terms apply to us (a diff of our changes is not GPL-ed under GPLv2, although it is claimed to be GPL-ed under GPLv3)
- 04:41 PM Revision 11934: inputs/VegBank/^taxon_observation.**.sample/create.sql, map.csv: added new project columns
- 04:31 PM Revision 11933: inputs/VegBank/taxon_observation.**/postprocess.sql: added the project table
- 04:25 PM Revision 11932: mapped inputs/VegBank/project/, which includes the projectName for attribution
- 02:56 PM Revision 11931: inputs/CVS/^taxon_observation.**.sample/create.sql, map.csv: added new project columns
- 02:44 PM Revision 11930: inputs/CVS/taxon_observation.**/postprocess.sql: added the project table
- 02:42 PM Revision 11929: inputs/CVS/project/map.csv: mapped stopDate->projectEndDate
- 02:35 PM Revision 11928: mapped inputs/CVS/project/, which includes the projectName for attribution
- 01:25 AM Revision 11927: inputs/VegBIEN/Redmine/svn/.htaccess: updated to use *much* faster direct repository URL rather than Redmine web interface, now that the repository itself is publicly accessible in addition to the Redmine view of it
- 01:18 AM Revision 11926: planning/timeline/timeline.xls: updated for progress
- 01:13 AM Revision 11925: planning/timeline/timeline.2013.xls: renamed to timeline.xls so that this can continue to be used for 2014 (leaving a symlink from the old filename to preserve permalinks)
- 12:28 AM Revision 11924: fix: inputs/TEX/Specimen*/map.csv, postprocess.sql: habitat: also placed in occurrenceRemarks so that this field gets parsed for growth form information, as requested by Brad (wiki.vegpath.org/TEX_validation#2013-2-26)
12/19/2013
- 11:49 PM Revision 11923: fix: inputs/TEX/Specimen*/map.csv: mapped constant values for specimenHolderInstitutions, country. these have to be added with `rm=1 ./inputs/TEX/Specimen.../run postprocess`.
- 11:42 PM Revision 11922: bugfix: inputs/TEX/Specimen2/map.csv: mapped BARCODE to accessionNumber so that we have a unique ID for each row
- 11:11 PM Revision 11921: schemas/vegbien.sql: analytical_stem_view: scientificName_verbatim: don't use taxonverbatim.taxonname+author as the scientificName_verbatim if only the author is provided. (this led to weird scientificName_verbatims that contain just the author.)
12/17/2013
- 08:06 AM Revision 11920: inputs/datasource_release_status.xlsx: updated
- 07:28 AM Revision 11919: web/links/index.htm: updated to Firefox bookmarks. added links for fixing the "App Store is temporarily unavailable" error (turn on Spotlight) and modifying a running shell script (unlink it first).
- 05:47 AM Revision 11918: bugfix: bin/map: in_is_db: don't ignore errors when the table does not exist, because these prevent an errexit and allow an import to continue when a staging table is missing. suppressing this error had previously been necessary because metadata-only tables (Source/) used to not have installed staging tables, and the program had to react accordingly.
12/16/2013
- 07:05 PM Revision 11917: inputs/CVS/^taxon_observation.**.sample/create.sql: added Mike Lee's additional plots used to validate confidentiality-related fields (wiki.vegpath.org/CVS_validation#plots-to-include)
- 06:00 PM Revision 11916: bugfix: inputs/CVS/^taxon_observation.**.sample/create.sql: include taxonName in the subset of columns that's imported for the validation, because it is _alt-ed with scientificName for forming the TNRS input name. this is unique to CVS, which is why it was not part of the validation subset copied from the VegBank subset.
- 05:46 PM Revision 11915: /README.TXT: Full database import: documented that you should always start with a clean shell, which does not have changes to the env vars. (there have been inexplicable bugs that went away after closing and reopening the terminal window.) note that running `exec bash` is not sufficient to *reset* the env vars.
- 04:58 PM Revision 11914: fix: lib/sh/util.sh: verbosity_min(): usage: clarified that '' is a special value that causes $verbosity to be overwritten to ''
- 04:45 PM Revision 11913: lib/runscripts/table.run: added test_() target and use it in remake_VegBIEN_mappings() (it would not be clear that remake_VegBIEN_mappings() runs the tests)
- 01:43 PM Revision 11912: bugfix: inputs/.TNRS/schema.sql: granted bien_read SELECT access to derived views as well as the core tnrs table
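Note on r11915: the point that `exec bash` does not reset the env vars can be illustrated with a small sketch. The variable name `demo_var` is hypothetical, and `env -i` is only one possible way to get an empty environment; the README itself recommends opening a new terminal window.

```shell
#!/bin/sh
# Hypothetical demo variable; not from the BIEN repository.
export demo_var=stale

# A child shell (like the one started by `exec bash`) inherits exported vars.
inherited="$(/bin/sh -c 'echo "${demo_var-unset}"')"

# `env -i` starts the child with an empty environment instead.
clean="$(env -i /bin/sh -c 'echo "${demo_var-unset}"')"

echo "inherited=$inherited clean=$clean"
```

Under these assumptions, the first child still sees the stale value while the second does not.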
12/15/2013
- 05:30 PM Revision 11911: updated inputs/datasource_release_status.xlsx
- 05:27 PM Revision 11910: added inputs/datasource_release_status.xlsx, export of Google spreadsheet at https://docs.google.com/spreadsheet/ccc?key=0ArZXrTAXd-TYdDRRb2RxYi11TWZrQVh5bVdKOURCeFE
12/12/2013
- 08:57 AM Revision 11909: planning/timeline/timeline.2013.xls: updated for progress
- 08:35 AM Revision 11908: bugfix: schemas/vegbien.sql: location: use the place_id from the parent location when no place_id is specified. this fixes a bug in analytical_stem_view where the parent location's place_id was used because it was sometimes missing from the sublocation, but the parent place_id *itself* was sometimes missing instead if sublocations each had their own place information. this way, it is always available directly in the sublocation, populated from the parent location if needed.
- 08:27 AM Revision 11907: bugfix: schemas/vegbien.sql: location: added place_id which is autopopulated from the current locationplace. join on this in plot.**, to avoid a 1:many join when a location has multiple locationplaces.
12/11/2013
- 11:10 PM Revision 11906: bugfix: schemas/vegbien.sql: locationevent_unique_within_parent_by_location unique index: need COALESCE() around location_id since it's nullable
- 10:54 PM Revision 11905: fix: inputs/CVS/^taxon_observation.**.sample/: added _no_import because this table duplicates part of what's imported from taxon_observation.**
- 10:42 PM Revision 11904: bugfix: inputs/VegBank/plot/: added _no_import because this table is left-joined and should not be imported separately
- 10:40 PM Revision 11903: bugfix: inputs/{.NCBI,CTFS}/*.src/: added _no_import because these tables are left-joined and should not be imported separately
- 09:56 PM Revision 11902: inputs/import.stats.xls: removed table names from datasources where only one table is imported
- 09:52 PM Revision 11901: fix: inputs/import.stats.xls: removed deleted tables from current import
- 09:51 PM Revision 11900: inputs/import.stats.xls: updated import times
- 07:56 PM Revision 11899: updated backups/TNRS.backup.md5
- 07:56 PM Revision 11898: added backups/vegbien.r11786.backup.md5
- 07:53 PM Revision 11897: /README.TXT: Full database import: backups: added step to download backup to local machine
- 07:45 PM Revision 11896: bugfix: /Makefile: install: need to run inputs/download in live mode so that the flat files are actually downloaded
- 07:43 PM Revision 11895: lib/common.Makefile: added %/live, for use with `make inputs/download`
12/10/2013
- 07:44 AM Revision 11894: planning/timeline/timeline.2013.xls: rescheduled tasks
- 07:40 AM Revision 11893: planning/timeline/timeline.2013.xls: updated for progress
- 07:36 AM Revision 11892: /README.TXT: Full database import: In PostgreSQL: documented that the tables to check are located in the *r# schema*, not public
- 07:32 AM Revision 11891: planning/timeline/timeline.2013.xls: updated for progress
- 07:32 AM Revision 11890: planning/timeline/timeline.2013.xls: datasource validations: reordered datasources according to Brian Enquist's new validation order (wiki.vegpath.org/Spot-checking_validation_order)
- 07:10 AM Revision 11889: fix: schemas/vegbien.sql: analytical_specimen: added specimens-related columns that are in analytical_plot
- 06:35 AM Revision 11888: inputs/GBIF/raw_occurrence_record_plants/map.csv: row_num: remapped to plain *row_num, like the other datasources that have this field
- 06:31 AM Revision 11887: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: rerun time: noted that this is only fast *after* manual vacuuming of the table (to remove the deleted rows from the index). autovacuum apparently does not run, although it should.
- 05:18 AM Revision 11886: planning/timeline/timeline.2013.xls: hid previous weeks
- 05:18 AM Revision 11885: planning/timeline/timeline.2013.xls: added timespan dots ◦ for supertasks
- 05:15 AM Revision 11884: planning/timeline/timeline.2013.xls: legend: changed to movable text box to avoid needing to erase and repopulate the header columns with the legend cells
- 05:03 AM Revision 11883: planning/timeline/timeline.2013.xls: crossed out and hid completed tasks
- 04:58 AM Revision 11882: planning/timeline/timeline.2013.xls: updated for progress
12/09/2013
- 07:24 PM Revision 11881: inputs/GBIF/raw_occurrence_record_plants/test.xml.ref: reran test, which added yearCollected/monthCollected/dayCollected
- 07:23 PM Revision 11880: inputs/CVS/plantConcept_/create.sql: documented runtime (3 min)
- 06:59 PM Revision 11879: inputs/CTFS/*.src/: added test.xml.ref
- 06:58 PM Revision 11878: inputs/CTFS/*.src/: added VegBIEN.csv
- 06:56 PM Revision 11877: bugfix: inputs/CTFS/TaxonOccurrence*/map.csv: things mapped to taxonObservationID: remapped to taxonOccurrenceID since taxonObservationID is not mapped to anything in VegBIEN (denormalized VegCore doesn't distinguish between taxon occurrences and taxon observations of them)
- 05:46 PM Revision 11876: bugfix: inputs/ARIZ/~.clean_up.sql: prevent "column already exists" errors when there is an input column of the same name as an output column
- 05:44 PM Revision 11875: bugfix: lib/runscripts/datasrc_dir.run: import(): don't run `sql/install` if the schema already exists, because this will try to rerun all the schema-creation queries. note that this idempotent functionality was *not* provided by the `make .../install` target that was previously used (idempotency is new with new-style import).
- 05:26 PM Revision 11874: bugfix: schemas/vegbien.sql: updated for renamed county_centroids column names
- 04:16 PM Revision 11873: inputs/.geoscrub/import_order.txt: added county_centroids so that it would be installed by new-style import
- 03:54 PM Revision 11872: bugfix: lib/runscripts/datasrc_dir.run: import(): can't run `datasrc_make reinstall` anymore because this now defers to the runscript for new-style import datasources (which was done so that `make .../install` properly reinstalls all the datasources). instead, call the applicable make targets manually (there are just 2 of them).
- 03:37 PM Revision 11871: inputs/FIA/TREE/run: documented import() runtime (1.5 h), which includes table cleanup runtime (1 h)
- 03:09 PM Revision 11870: bugfix: bin/pg_dump_limit: support errexit by ignoring the nonzero exit status that grep returns when it doesn't match anything
- 02:43 PM Revision 11869: inputs/GBIF/raw_occurrence_record_plants/run: updated import() runtime (same), documented table cleanup runtime (1.5 h)
- 02:38 PM Revision 11868: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: CREATE INDEX ... specimenHolderInstitutions: documented runtime (45 min)
- 02:28 PM Revision 11867: inputs/GBIF/raw_occurrence_record_plants/postprocess.sql: Remove institutions that we have direct data for: documented runtime (3.5 min)
- 02:27 PM Revision 11866: /README.TXT: Datasource setup: added steps to backup e-mails
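Note on r11870: grep exits with status 1 when nothing matches, which aborts a script running in errexit mode. A minimal sketch of one way to tolerate that while still propagating real errors follows. The helper name `grep_allow_no_match` is hypothetical and is not taken from bin/pg_dump_limit.

```shell
#!/usr/bin/env bash
set -o errexit

# Hypothetical helper (an assumption, not the code in bin/pg_dump_limit):
# run grep, treating "no match" (exit 1) as success but still
# propagating real errors (exit 2 or higher).
grep_allow_no_match()
{
    local status=0
    grep "$@" || status=$?
    if [ "$status" -le 1 ]; then return 0; fi
    return "$status"
}

# No line matches, yet the script keeps running under errexit.
printf 'CREATE TABLE a;\nCOPY a FROM stdin;\n' | grep_allow_no_match '^INSERT'
echo "still running after a non-matching grep"
```

The simpler `grep ... || true` would also work, but it hides genuine failures such as an unreadable input file.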
12/06/2013
- 07:46 AM Revision 11865: bugfix: inputs/CTFS/import_order.txt: added *.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in */create.sql. note that VegCore taxonOccurrenceID has been renamed to taxonObservationID since this was last run.
- 06:56 AM Revision 11864: inputs/.geoscrub/run: documented import() runtime (20 min)
- 06:12 AM Revision 11863: bugfix: inputs/.NCBI/import_order.txt: added nodes.src, names.src so that these would be installed under new-style import as well. this means that their columns will now be automapped, requiring the names to be renamed to VegCore names in nodes/create.sql.
- 06:01 AM Revision 11862: fix: /Makefile: inputs/reinstall: commented out to avoid a cascade of "overriding commands for target" warnings. this will revert to the default uninstall, install sequence for this target rather than the simultaneous-reinstall optimization (which can still be invoked manually).
- 05:52 AM Revision 11861: lib/sh/local.sh: public_schema_exists(): use a higher log_level for pg_schema_exists, to avoid all the verbose output involved in running the query
- 05:44 AM Revision 11860: bugfix: lib/sh/local.sh: public_schema_exists(): can no longer use psql_script_vegbien for this, because using `SET search_path` (called by psql_script_vegbien) with a schema that does not exist no longer produces an error. instead, use new pg_schema_exists(), which uses a different command that does produce an error if the schema does not exist.
- 05:38 AM Revision 11859: lib/sh/db.sh: added pg_require_schema()
- 05:37 AM Revision 11858: lib/sh/util.sh: stderr2stdout(): documented that this redirects fd 2->1 and log_fd (but *not* back to 2)
- 05:34 AM Revision 11857: bugfix: lib/sh/util.sh: stderr2stdout(): use `command` before tee, which re-filters log_fd so that stderr itself is also filtered. this allows log-filtering out an otherwise-confusing benign error when using e.g. stderr_matches().
- 04:31 AM Revision 11856: lib/sh/util.sh: added not(), for use in prefixing wrapped commands
- 04:14 AM Revision 11855: lib/sh/db.sh: added pg_schema_exists()
- 04:10 AM Revision 11854: lib/sh/util.sh: added stderr_matches()
- 03:59 AM Revision 11853: lib/sh/util.sh: documented that fds 2x/3x should not be used because *we* use these, as opposed to 1x which is used by the shell internally
- 03:57 AM Revision 11852: lib/sh/util.sh: added stdout_contains()
- 03:34 AM Revision 11851: lib/sh/util.sh: added stderr2stdout()
- 02:52 AM Revision 11850: fix: lib/sh/db.sh: pg_table_exists(): usage: documented that $table is actually required for this function
- 02:44 AM Revision 11849: bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource)
- 01:57 AM Revision 11848: bugfix: /Makefile: moved inputs/reinstall to end so it overrides the corresponding subdir forwarding target
- 12:51 AM Revision 11847: bugfix: inputs/input.Makefile: install: for new-style datasources, use the associated runscript instead (the old-style install target will not do everything that's needed for a new-style datasource)
- 12:27 AM Revision 11846: bugfix: /Makefile: inputs/install: don't run bin/reinstall_all here, because /install targets are supposed to be idempotent, forward-only actions that don't first remove existing data
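Note on r11854 and r11856: the actual definitions in lib/sh/util.sh are not shown in this log. The following is only a sketch of what helpers of this kind could look like. Both function bodies are assumptions, and `stderr_contains` is a hypothetical name chosen to avoid implying it matches the real stderr_matches().

```shell
#!/bin/sh
# Hypothetical sketch; the real not() in lib/sh/util.sh may differ.
# Inverts the exit status of the wrapped command.
not() { ! "$@"; }

# Hypothetical sketch: succeeds if the command's stderr contains the
# fixed string $1. The command's stdout is discarded.
stderr_contains()
{
    pattern="$1"; shift
    # 2>&1 sends stderr into the pipe; >/dev/null then drops stdout.
    "$@" 2>&1 >/dev/null | grep -qF -- "$pattern"
}

if stderr_contains oops sh -c 'echo oops >&2'; then
    echo "error message seen"
fi
if not stderr_contains oops sh -c 'echo oops'; then
    echo "stdout is not searched"
fi
```

Prefixing with `not` keeps the wrapped command as a plain argument list, which is why it composes with other wrapper functions.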
12/05/2013
- 11:55 PM Revision 11845: bugfix: /Makefile: postgres-Darwin: don't prepend $(MAKE) to $(postgresReload-Darwin), because this is now a list of commands
- 11:52 PM Revision 11844: bugfix: /Makefile: config: ignore errors if ~/bin/make exists
- 11:38 PM Revision 11843: inputs/FIA/COND/postprocess.sql: filtering formula: documented that this was created by Brad, and provided the URL to it on nimoy
- 12:27 PM Revision 11842: inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible
- 12:19 PM Revision 11841: inputs/CVS/cvs.~.clean_up.sql: remove plot.realLatitude/realLongitude, since this is private data that should not be publicly visible
- 08:38 AM Revision 11840: bin/make_analytical_db: don't regenerate family_higher_plant_group from the NCBI data because the lookup table is now prepopulated as part of the schema
- 08:37 AM Revision 11839: bin/import_all: don't import NCBI because the lookup table is now prepopulated as part of the schema
- 08:35 AM Revision 11838: schemas/vegbien.sql: include the family_higher_plant_group lookup table values so that these don't need to be regenerated from the NCBI nodes whenever the DB is reloaded
- 07:58 AM Revision 11837: schemas/vegbien.sql: taxonlabel_update_ancestors(): don't do an index scan if the value being scanned for is NULL, to support testing this function without the indexes in place, without extra full-table scans for NULL values affecting things. this can be used to determine if the function is actually using the indexes, by turning them off and seeing if the runtime changes.
- 07:03 AM Revision 11836: schemas/util.sql: explain2table(): documented usage:
  PERFORM util.explain2table($$
  query
  $$);
- 05:52 AM Revision 11835: schemas/util.sql: explain2table(): by default, use the util.explain table
- 05:49 AM Revision 11834: schemas/util.sql: added explain table
- 05:47 AM Revision 11833: schemas/util.sql: added explain2notice()
- 05:44 AM Revision 11832: schemas/util.sql: added explain2str()
- 05:33 AM Revision 11831: schemas/util.sql: added explain2table()
- 05:23 AM Revision 11830: schemas/util.sql: added explain()
- 01:31 AM Revision 11829: schemas/vegbien.sql: taxonlabel_update_ancestors(): don't create a performance-intensive nested transaction (EXCEPTION block) for each INSERT, because there should no longer be duplicate ancestors, so it's OK to abort the whole transaction if this assertion fails
- 01:03 AM Revision 11828: bugfix: schemas/vegbien.sql: taxonlabel_update_ancestors_on_{insert,update}(): only use *either* the matched taxon's ancestors *or* the parent's ancestors, to avoid issues related to duplication between these two ancestors lists. this also fixes a bug where the 2nd taxonlabel_update_ancestors() call assumes that the existing ancestors are for the old *parent*, when in fact they have actually just been set to those for the new *matched taxon* (which horribly confuses taxonlabel_update_ancestors()).
12/04/2013
- 10:06 PM Revision 11827: schemas/vegbien.sql: _taxonlabel_set_parent_id(): just use a plain UPDATE statement, to avoid the significant parsing and stringification overhead of EXECUTE and quote_nullable(). it is not clear that EXECUTE is actually necessary to avoid caching the query plan, because the cache should be invalidated automatically when the table's ANALYZE statistics are regenerated.
- 10:00 PM Revision 11826: schemas/vegbien.sql: removed unused function _taxonlabel_set_matched_label_id(), which refers to obsolete fields
- 09:58 PM Revision 11825: schemas/vegbien.sql: synced to DB (the view renderer apparently changed the text of a view)
- 09:44 PM Revision 11824: backups/TNRS.backup: saved copy backups/TNRS.2013-11-18.backup
- 07:26 PM Revision 11823: bugfix: bin/import_all: run in errexit mode, so that if the user cancels reinstalling of the import schema, the script will then abort instead of continuing and using the wrong schema
- 06:56 PM Revision 11822: bugfix: schemas/Makefile: %/uninstall: always confirm before removing an existing schema, not just for public and r*, because an auxiliary schema might also be used as $version and reinstalled by bin/import_all
- 06:04 PM Revision 11821: schemas/vegbien.sql: analytical_stem_view: scrubbed_author: removed empty COALESCE() around value (left over from when multiple values needed to be combined for many TNRS fields)
- 04:57 PM Revision 11820: inputs/CVS/^taxon_observation.**.sample/create.sql: uncommented identifiedBy since this is now part of taxonObservation_
- 04:08 PM Revision 11819: fix: inputs/CVS/observation_community/create.sql: communityName: populate from commConcept.commName instead, because commInterpretation.commname is not always populated. this requires left-joining to commConcept.
- 03:58 PM Revision 11818: inputs/CVS/observation_community/map.csv: updated output column names to new input column names, to avoid later output column collisions
- 03:42 PM Revision 11817: inputs/CVS/observation_community/header.csv, map.csv: updated input column names for cvs.~.clean_up.sql renamings
- 03:12 PM Revision 11816: schemas/vegbien.sql: provider_count_view: source totals: use the much faster query developed for Brad (wiki.vegpath.org/VegBIEN_FAQ#from-Brad-on-2013-12-4), which avoids the need to do a GROUP BY on all of analytical_stem. eventually, we will want to apply the same optimization to the first publisher subtotals.
- 04:19 AM Revision 11815: inputs/CVS/cvs.~.clean_up.sql: commClass, commConcept fields: prepend table name to avoid inter-table collisions upon join
- 03:43 AM Revision 11814: added inputs/CVS/observation_community/, as for VegBank
- 03:32 AM Revision 11813: inputs/CVS/cvs.~.clean_up.sql: commClass.dba_src_ID: prepend table name to avoid inter-table collisions upon join
12/03/2013
- 04:32 PM Revision 11812: added inputs/CVS/observationContributor_/, which adds the people collecting the plot
- 04:02 PM Revision 11811: inputs/CVS/cvs.~.clean_up.sql: observationContributor.dba_src_ID: prepended table name to avoid collision when left-joining to party
- 03:44 PM Revision 11810: bugfix: inputs/input.Makefile: %/header.csv: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`
- 02:57 PM Revision 11809: fix: inputs/CVS/taxonObservation_/create.sql: mapped identifiedBy, which involves joining to party
- 02:35 PM Revision 11808: inputs/CVS/cvs.~.clean_up.sql: don't rename taxonInterpretation.PARTY_ID, so that this can be USING-joined to party in inputs/CVS/taxonObservation_/create.sql
- 01:47 PM Revision 11807: schemas/vegbien.ERD.mwb: regenerated exports
- 08:58 AM Revision 11806: bin/map: support param start="", which indicates the default value. this fixes a bug in inputs/input.Makefile $(restart_row), which outputs "" if an explicit starting row is not found.
- 08:25 AM Revision 11805: inputs/CVS/^taxon_observation.**.sample/map.csv: synced output columns to input columns (which removes the extra *s)
- 08:00 AM Revision 11804: fix: inputs/CVS/plot_/postprocess.sql: locality: include the site name (authorLocation), because this is part of the unique specification of the place that was sampled, and Bob wants this to be included in VegBIEN
- 07:58 AM Revision 11803: inputs/CVS/^taxon_observation.**.sample/create.sql: removed parentLocationID, since this is unused in CVS
- 07:45 AM Revision 11802: bugfix: inputs/input.Makefile: `%/install: %/create.sql`: errexit the command so that errors won't scroll by, which in this case requires `set -o pipefail`
- 06:51 AM Revision 11801: inputs/VegBank/plot/postprocess.sql: locality: include the site name (authorlocation), because this is part of the unique specification of the place that was sampled
- 06:27 AM Revision 11800: bugfix: /README.TXT: Full database import: To restart an aborted import for a specific table: run the two commands in errexit mode so that the datasource does not incorrectly have the temp suffix removed if the import command exited with an error
- 05:19 AM Revision 11799: fix: inputs/CVS/taxon_observation.**/map.csv: omit authorPlantName because it is not specific to the taxonInterpretation row (this is in a separate taxonInterpretation for the original determination instead)
- 04:59 AM Revision 11798: web/links/index.htm: updated to Firefox bookmarks. PostgreSQL: added links for troubleshooting out-of-memory errors, which show up (cryptically) as "The database system is in recovery mode" errors in processes running at the time the out-of-memory condition occurred.
- 02:31 AM Revision 11797: schemas/postgresql.conf: work_mem: documented that this seemingly small # is *multiplied* by max_connections, i.e. 256 MB * 100 = *26 GB*, which approaches total memory (32 GB)
- 01:21 AM Task #831 (New): make tests use their own public schema
  * enables the automated tests to be run on vegbiendev
  * allows adding a new datasource directly on vegbiendev, witho...
- 12:58 AM Task #501: find out which datasources won't allow their data to be publicly accessible
  see [[Datasource conditions of use]]
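Note on r11802 and r11810: errexit only sees the exit status of the last command in a pipeline, which is why those commands also need `set -o pipefail`. A minimal sketch of the difference follows. The function names are hypothetical, and `false | cat` stands in for the real Makefile commands.

```shell
#!/usr/bin/env bash
# Without pipefail, a pipeline's status is that of its last command,
# so a failure earlier in the pipeline goes unnoticed.
without_pipefail() { (set +o pipefail; false | cat); }

# With pipefail, the pipeline fails if any command in it fails.
with_pipefail() { (set -o pipefail; false | cat); }

without_pipefail && echo "failure hidden"
with_pipefail || echo "failure detected"
```

Each pipeline runs in a subshell here so that the option change does not leak into the calling script.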
12/02/2013