Activity

From 03/06/2014 to 04/04/2014

04/04/2014

06:13 PM Revision 13055: added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource
Aaron Marcuse-Kubitza

04/03/2014

07:31 PM Revision 13054: added validation/aggregating/pipeline/validations_on_sparse_datasources.odg
Aaron Marcuse-Kubitza
04:13 PM Revision 13053: planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
Aaron Marcuse-Kubitza
04:09 PM Revision 13052: planning/workflow/bien3_architecture.pptx: stage I: made all datasources the same height so that the denormalized VegCore schema boxes would all look exactly the same. widened the denormalized VegCore schema boxes to make it visually clear that they have more columns than the staging tables denormalized together
Aaron Marcuse-Kubitza
03:40 PM Revision 13051: planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
Aaron Marcuse-Kubitza
03:39 PM Revision 13050: planning/workflow/bien3_architecture.pptx: updated to reflect decisions made in the 2014-04-03 conference call (wiki.vegpath.org/2014-04-03_conference_call#import-process-2)
Aaron Marcuse-Kubitza
08:53 AM Revision 13049: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_14_count_of_all_invalid_verbatim_lat_long
Aaron Marcuse-Kubitza
08:35 AM Revision 13048: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_12_distinct_collector_name_collect_num_date_w_count
Aaron Marcuse-Kubitza
08:04 AM Revision 13047: validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: fixed whitespace
Aaron Marcuse-Kubitza
07:32 AM Revision 13046: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed trailing whitespace
Aaron Marcuse-Kubitza
07:31 AM Revision 13045: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_13_count_of_all_verbatim_and_decimal_lat_long
Aaron Marcuse-Kubitza

04/02/2014

05:55 PM Revision 13044: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_11_list_of_three_standard_political_divisions
Aaron Marcuse-Kubitza
05:36 PM Revision 13043: validation/aggregating/specimens/qualitative_validations_specimens.sql: *_of_species_binomials: switched back to the old queries that use the split-apart ranks instead of the concatenated taxon name. note that these will not work on all specimens datasources, but now that #6,7 were selected to use the concatenated taxon name, this isn't a problem.
Aaron Marcuse-Kubitza
05:21 PM Revision 13042: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name
Aaron Marcuse-Kubitza
05:16 PM Revision 13041: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_species_excluding_author: renamed to *_species_binomials for clarity
Aaron Marcuse-Kubitza
05:14 PM Revision 13040: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it
Aaron Marcuse-Kubitza
05:01 PM Revision 13039: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_10_count_number_of_records_by_institution
Aaron Marcuse-Kubitza
04:38 PM Revision 13038: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency
Aaron Marcuse-Kubitza
04:09 PM Revision 13037: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_subspecific_taxa_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
Aaron Marcuse-Kubitza
04:03 PM Revision 13036: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_06_count_of_unique_verb_subsp_taxa_without_author, _specimens_07_list_of_verbatim_subspecific_taxa_without_author
Aaron Marcuse-Kubitza
03:54 PM Revision 13035: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_verbatim_species_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
Aaron Marcuse-Kubitza
03:32 PM Task #884 (Rejected): fix Postgres bug that causes query planner to use seq scans and slow sorts instead of index scans in the import
h3. issue
* see the following @pg_stat_activity@ snapshots (note the @EXPLAIN@ output below each query):...
Aaron Marcuse-Kubitza
03:14 PM Revision 13034: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed extra ; at ends of queries
Aaron Marcuse-Kubitza
03:13 PM Revision 13033: validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
Aaron Marcuse-Kubitza
03:05 PM Revision 13032: validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
Aaron Marcuse-Kubitza
11:17 AM Revision 13031: /README.TXT: Full database import: disk space: added high-water mark of 1.8 TB @11:15:05
Aaron Marcuse-Kubitza
10:56 AM Revision 13030: /README.TXT: Full database import: added steps to figure out which datasource tables were not successfully imported due to disk space errors
Aaron Marcuse-Kubitza
10:45 AM Revision 13029: fix: /README.TXT: Full database import: moved verification of exit statuses before verification of DB contents because there is no point in verifying the DB if the datasources didn't finish importing
Aaron Marcuse-Kubitza
10:10 AM Task #882 (Rejected): add limit on the # of parallel import processes
it turns out this would not fix the problem, because it occurs even when only a few datasources are running
Aaron Marcuse-Kubitza
10:07 AM Task #883: have import scripts regularly check disk space and pause processes if getting close to limit
merging info in #882, so that this info is not maintained in two places
Aaron Marcuse-Kubitza
09:01 AM Revision 13028: /README.TXT: Full database import: disk space: documented that the entire disk again gets used long after the beginning of the import, when only a few datasources are running (i.e. it definitely seems to be a recent bug in Postgres, and not a latent problem)
Aaron Marcuse-Kubitza

04/01/2014

05:40 PM Revision 13027: /README.TXT: Maintenance: added task to regularly re-run full-database import so that bugs in it don't pile up. it needs to be kept in working order so that it works when it's needed.
Aaron Marcuse-Kubitza
05:02 PM Task #883 (Rejected): have import scripts regularly check disk space and pause processes if getting close to limit
h3. issue
* there is no soft limit on disk space inside Postgres, so the hard limit gets reached instead, causing ...
Aaron Marcuse-Kubitza
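
The soft-limit idea in Task #883 can be sketched as follows. This is only an illustration of the proposal (the task itself was rejected), not code from the import scripts: the function names and the 10% threshold are hypothetical, and only Python's standard library is used.

```python
import shutil


def free_fraction(path="/"):
    """Return the fraction of the filesystem containing `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def should_pause(path="/", min_free_fraction=0.1):
    """Return True if free space is below the (hypothetical) soft limit.

    An import wrapper could poll this and pause its worker processes
    before Postgres hits the hard limit of a completely full disk.
    """
    return free_fraction(path) < min_free_fraction
```

A caller would run `should_pause()` on the volume holding the Postgres data directory at regular intervals; how the import processes would actually be paused is outside this sketch.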
04:24 PM Revision 13026: /README.TXT: Full database import: added steps to manually reimport the applicable datasources if there are errors due to exceeding available disk space
Aaron Marcuse-Kubitza
04:13 PM Revision 13025: /README.TXT: Full database import: removed extra `ssh -t vegbiendev.nceas.ucsb.edu` before "upload logs", because the previous steps also occur on vegbiendev
Aaron Marcuse-Kubitza
04:11 PM Task #882 (Rejected): add limit on the # of parallel import processes
see description of problem in #883
Aaron Marcuse-Kubitza
04:04 PM Revision 13024: /README.TXT: Notes on system stability: added recommendation to maintain a snapshot copy of the VM as it was at the last successful import, for fallback use if a system upgrade breaks anything. system upgrades on the snapshot VM should be disabled completely, and because this will also disable security fixes, the snapshot VM should be disconnected from the internet and all networking interfaces. (this is an unfortunate consequence of modern OSes being written in non-memory-safe languages such as C and C++.)
Aaron Marcuse-Kubitza
03:43 PM Revision 13023: /README.TXT: Full database import: disk space: documented that a higher high-water mark actually occurs later in the import, so that the disk usage issue actually remains a problem after the very beginning
Aaron Marcuse-Kubitza
03:37 PM Revision 13022: fix: /README.TXT: Full database import: disk space: increased the minimum free space recommendation to 1 TB, because analysis of the disk usage during the beginning of the import shows that actually close to the entire amount is being used. however, this problem is normally undetectable unless the disk space is specifically checked, because it only manifests itself if the available disk space is exceeded completely.
Aaron Marcuse-Kubitza
02:04 PM Revision 13021: /README.TXT: Full database import: documented that the beginning of the import should be scheduled at a time when the DB will not be needed for other uses, because vegbiendev will be slow for the first few hours of the import due to the import using all the available cores
Aaron Marcuse-Kubitza
01:36 PM Revision 13020: /README.TXT: Full database import: documented that CPU load warning e-mails can safely be ignored. they happen because the parallel imports use all the available cores.
Aaron Marcuse-Kubitza
01:31 PM Revision 13019: fix: lib/common.Makefile: $(nice): use an increment of +10 instead of +5 because +5 still leaves the shell sluggish
Aaron Marcuse-Kubitza
01:29 PM Revision 13018: lib/common.Makefile: added $(nice) and use it everywhere its definition is used
Aaron Marcuse-Kubitza
01:14 PM Revision 13017: /README.TXT: Full database import: exiting `screen`: clarify that you must use `exit`, as Ctrl+D gets disabled to prevent accidental exits
Aaron Marcuse-Kubitza
12:47 PM Revision 13016: /README.TXT: Full database import: added step to restart Postgres to free up any disk space used by temp tables from the last import (this is apparently not automatically reclaimed)
Aaron Marcuse-Kubitza
12:45 PM Revision 13015: /Makefile: postgres_restart-Linux: documented that the manual running of the command is needed because for some reason, pg_ctl does not work when run inside make
Aaron Marcuse-Kubitza
12:43 PM Revision 13014: fix: /Makefile: postgres_restart-Linux: added pause after telling the user the command to run
Aaron Marcuse-Kubitza
12:42 PM Revision 13013: /Makefile: $(postgresReload-*): use postgres_restart for the postgres-restarting step
Aaron Marcuse-Kubitza
12:30 PM Revision 13012: bugfix: /Makefile: postgres_restart: added separate Linux version that deals with Linux-specific issues (as in $(postgresReload-Linux))
Aaron Marcuse-Kubitza
12:15 PM Revision 13011: /Makefile: added postgres_restart, since this is often invoked separately from the entire postgres_reload target
Aaron Marcuse-Kubitza
11:40 AM Revision 13010: /README.TXT: Full database import: disk space: increased minimum requirement to 500GB (~200GB extra), as the import may use significant additional space for temp tables
Aaron Marcuse-Kubitza
11:37 AM Revision 13009: /README.TXT: Full database import: documented that env vars set before invoking `screen` will be inherited by it, so these steps will work even if they come before `screen`
Aaron Marcuse-Kubitza
11:26 AM Revision 13008: backups/TNRS.backup.md5: updated
Aaron Marcuse-Kubitza
11:23 AM Revision 13007: /README.TXT: Full database import: added steps to set a custom version, if the auto-assigned one would cause a collision with the last import
Aaron Marcuse-Kubitza
11:08 AM Revision 13006: /README.TXT: Full database import: `unset version`: documented that this is needed because it may have been set in the outer shell
Aaron Marcuse-Kubitza

03/30/2014

07:54 PM Revision 13005: fix: lib/sql_io.py: put_table(): don't warn if can't create pkey, because this just indicates that a set-returning function was used. this should get rid of the last of the confusing benign warnings in the test output.
Aaron Marcuse-Kubitza
07:53 PM Revision 13004: fix: lib/sql.py: flatten(): don't warn if can't create pkey, because this just indicates that a set-returning function was used
Aaron Marcuse-Kubitza
07:52 PM Revision 13003: lib/sql.py: run_query_into(): added add_pkey_warn param to support turning off "could not create unique index" warnings, which are sometimes benign (e.g. when using set-returning functions with column-based import)
Aaron Marcuse-Kubitza
06:52 PM Revision 13002: /README.TXT: Full database import: disk space: updated schema size (315GB)
Aaron Marcuse-Kubitza
06:45 PM Revision 13001: /README.TXT: Full database import: removed `up` on jupiter because this is done as part of "do steps under Maintenance > "to synchronize vegbiendev, ..."
Aaron Marcuse-Kubitza
06:44 PM Revision 13000: /README.TXT: Full database import: moved "do steps under Maintenance > "to synchronize vegbiendev, ..." outside of "On local machine" because these steps don't only take place on the local machine
Aaron Marcuse-Kubitza
06:41 PM Revision 12999: /README.TXT: use `up` instead of `svn up --force` for consistency
Aaron Marcuse-Kubitza
06:40 PM Revision 12998: fix: /README.TXT: always use `up` instead of `svn up` since this includes --force
Aaron Marcuse-Kubitza
06:39 PM Revision 12997: /README.TXT: Full database import: removed unneeded `ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk` at beginning since this is performed again the first time it's needed
Aaron Marcuse-Kubitza
06:38 PM Revision 12996: fix: /README.TXT: Full database import: removed erroneous line that resulted from a search-and-replace of connection commands in r12396. (it used to read "Follow the steps under Connecting to vegbiendev above, using jupiter instead". this step is now performed on the line below it.)
Aaron Marcuse-Kubitza
06:31 PM Revision 12995: bin/make_analytical_db: removed remake_diff_tables() because this is now done for each datasource in inputs/input.Makefile
Aaron Marcuse-Kubitza
06:28 PM Revision 12994: bugfix: schemas/vegbien.sql: schemas/vegbien.sql(): need to util.use_schema(schema_anchor) *before* initializing vars that use own-schema functions
Aaron Marcuse-Kubitza
06:12 PM Revision 12993: inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations
Aaron Marcuse-Kubitza
06:08 PM Revision 12992: inputs/input.Makefile: import: validate at the end of the import
Aaron Marcuse-Kubitza
06:02 PM Revision 12991: inputs/input.Makefile: added new-style aggregating validations (`validate` target)
Aaron Marcuse-Kubitza
06:02 PM Revision 12990: bin/make_analytical_db: removed no longer needed "${public}_validations" schema qualifier, now that it is in the search_path
Aaron Marcuse-Kubitza
06:00 PM Revision 12989: fix: bin/vegbien_dest: added public_validations
Aaron Marcuse-Kubitza
05:41 PM Revision 12988: added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format
Aaron Marcuse-Kubitza
05:39 PM Revision 12987: lib/sh/util.sh: removed end_try_subshell, which now does the same thing as end_try
Aaron Marcuse-Kubitza
05:38 PM Revision 12986: fix: lib/sh/archives.sh: unzip(): support -p option, which pipes extracted data to stdout
Aaron Marcuse-Kubitza
05:11 PM Revision 12985: added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run
Aaron Marcuse-Kubitza
05:11 PM Revision 12984: added lib/runscripts/extract_header.run
Aaron Marcuse-Kubitza
05:09 PM Revision 12983: fix: lib/sh/make.sh: direct the user to use begin_target instead of set_make_vars (set_make_vars is now used by begin_target)
Aaron Marcuse-Kubitza
05:06 PM Revision 12982: fix: lib/runscripts/util.run: to_top_file(): handle $_remake properly, without requiring deferred_check_target_exists to set to_file()'s flags
Aaron Marcuse-Kubitza
05:03 PM Revision 12981: bugfix: lib/sh/util.sh: die(): usage: documented that if msg uses $(...), save_e is needed
Aaron Marcuse-Kubitza
04:59 PM Revision 12980: bugfix: lib/sh/util.sh: already_exists_msg(): need to save_e, because new $(mk_hint) call resets $?
Aaron Marcuse-Kubitza
04:55 PM Revision 12979: lib/sh/util.sh: die(): always errexit even if $e = 0, because die always indicates an error
Aaron Marcuse-Kubitza
04:53 PM Revision 12978: lib/sh/util.sh: added rethrow!(), which always errexits, even if $e = 0
Aaron Marcuse-Kubitza
04:53 PM Revision 12977: lib/sh/util.sh: rethrow(): also work in situations where $e is not set
Aaron Marcuse-Kubitza
04:50 PM Revision 12976: lib/sh/util.sh: rethrow: made it a function since there is now no need for it to be an alias
Aaron Marcuse-Kubitza
04:47 PM Revision 12975: lib/sh/util.sh: rethrow: removed `test "$e" != 0` since errexit only does anything if $e != 0
Aaron Marcuse-Kubitza
04:45 PM Revision 12974: lib/sh/util.sh: removed separate rethrow_exit*, rethrow_subshell*, since they now do the same thing as rethrow*
Aaron Marcuse-Kubitza
04:42 PM Revision 12973: lib/sh/util.sh: rethrow*!: use new errexit, which works in functions *and* subshells
Aaron Marcuse-Kubitza
04:38 PM Revision 12972: lib/sh/util.sh: added errexit(), used in place of (exit "$1") because a bug in bash prevents subshells from triggering errexit
Aaron Marcuse-Kubitza
04:18 PM Revision 12971: lib/sh/util.sh: added bool!()
Aaron Marcuse-Kubitza
03:08 PM Revision 12970: fix: lib/sh/util.sh: redir(): need to indent before invoking an external command (not just in command__exec(), but for all redir() calls)
Aaron Marcuse-Kubitza

03/29/2014

04:10 AM Revision 12969: lib/sh/make.sh: with_rm(): documented that it only works inside a runscript target that starts w/ begin_target
Aaron Marcuse-Kubitza
04:06 AM Revision 12968: *{.sh,run}: runscript targets: use begin_target instead of echo_func so the target name is properly echoed. note that this requires using with_rm so that $rm is properly propagated to applicable invoked targets. (previously, $rm was propagated to all invoked targets. note that with_rm only works inside a runscript target that starts with begin_target.)
Aaron Marcuse-Kubitza
03:58 AM Revision 12967: lib/sh/make.sh: self_make(): renamed to with_rm() for clarity, since this is used only to propagate $rm, and does not also invoke a command with the same name as the current function, as the name might suggest
Aaron Marcuse-Kubitza

03/28/2014

07:17 AM Revision 12966: schemas/vegbien.sql: updated _specimens_01_count_of_total_records_specimens_in_source_db
Aaron Marcuse-Kubitza
07:10 AM Revision 12965: validation/aggregating/specimens/qualitative_validations_specimens.sql: use taxonoccurrence instead of location as the table that all specimens should have, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
Aaron Marcuse-Kubitza
07:03 AM Revision 12964: lib/runscripts/util.run: support conventional main() method as well as `all` target
Aaron Marcuse-Kubitza
03:03 AM Task #562 (New): flatten the mappings
normalized VegCore's @traceable.id_by_source@ now provides an alternate pkey that can be used for duplicate-merging, ...
Aaron Marcuse-Kubitza
02:55 AM Task #539 (Rejected): get analytical_stem_view to use merge joins instead of hash joins
the query planner is likely right that hash joins are faster when joining entire tables rather than just the first fe...
Aaron Marcuse-Kubitza
02:53 AM Task #440: aggregating validations of imports
see [[Aggregating validations]]
Aaron Marcuse-Kubitza
02:52 AM Task #290 (Resolved): benchmark tests for database loading
this is now the [[Aggregating validations]]
Aaron Marcuse-Kubitza
02:39 AM Revision 12963: fix: inputs/*/*/map.csv: remapped occurrenceID-mapped fields to dataProviderRecordID when these were not globally unique DwC occurrenceIDs (http://rs.tdwg.org/dwc/terms/#occurrenceID)
Aaron Marcuse-Kubitza
02:34 AM Revision 12962: fix: inputs/CTFS/AggregateObservation/map.csv: field mapped to occurrenceID: remapped to aggregateOrganismObservationID because these are not specimen occurrences
Aaron Marcuse-Kubitza
02:32 AM Revision 12961: fix: mappings/VegCore-VegBIEN.csv: taxonoccurrence.sourceaccessioncode: need to populate from aggregateOrganismObservationID when only that is available
Aaron Marcuse-Kubitza
02:03 AM Revision 12960: bugfix: inputs/NY/Ecatalog_all/map.csv: can't use CatalogNumber as pkey because it's not unique and not always populated. this fixes the NY NULL accessionNumbers bug (wiki.vegpath.org/Aggregating_validations_status#bugs).
Aaron Marcuse-Kubitza
01:31 AM Revision 12959: /README.TXT: moved "to back up e-mails" and "to back up the version history" before settings backup so that the local backup of these is up to date when everything gets backed up
Aaron Marcuse-Kubitza
01:29 AM Revision 12958: inputs/XAL/Specimen/header.csv: updated
Aaron Marcuse-Kubitza
12:45 AM Revision 12957: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: do this before the general sync so that any reverse sync that's needed won't include it
Aaron Marcuse-Kubitza
12:44 AM Revision 12956: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: backups/TNRS.backup: use bin/sync_upload now that this works for rsync-ignored files
Aaron Marcuse-Kubitza
12:36 AM Revision 12955: bugfix: lib/sh/sync.sh: don't unintentionally rsync-ignore explicitly-specified files
Aaron Marcuse-Kubitza
12:32 AM Revision 12954: lib/sh/util.sh: filesystem: added is_*(), could_be_*()
Aaron Marcuse-Kubitza
12:31 AM Revision 12953: lib/sh/util.sh: added contains_match()
Aaron Marcuse-Kubitza
12:31 AM Revision 12952: lib/sh/util.sh: added ends_with()
Aaron Marcuse-Kubitza

03/27/2014

11:13 PM Revision 12951: fix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: run `up` on all machines, not just jupiter, because all must be up-to-date to avoid extraneous diffs
Aaron Marcuse-Kubitza
11:11 PM Revision 12950: bugfix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: `svn up` on jupiter: need to use up alias because that adds --force
Aaron Marcuse-Kubitza
11:10 PM Revision 12949: bugfix: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: added `svn up` on jupiter: needs to be in main dir (~/bien), not ~/Dropbox/svn/
Aaron Marcuse-Kubitza
11:08 PM Revision 12948: /README.TXT: to synchronize vegbiendev, jupiter, and your local machine: added `svn up` on jupiter to avoid extraneous diffs when rsyncing
Aaron Marcuse-Kubitza
10:41 AM Revision 12947: planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
Aaron Marcuse-Kubitza
10:32 AM Revision 12946: planning/workflow/bien3_architecture.pptx: stage I: clarified that the database input is intended to be a *normalized* input, and its corresponding output is intended to be *denormalized*
Aaron Marcuse-Kubitza
10:29 AM Revision 12945: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: stage I: clarified that the database input is intended to be a *normalized* input, and its corresponding output is intended to be *denormalized*
Aaron Marcuse-Kubitza
09:02 AM Revision 12944: bugfix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: _specimens_16_list_distinct_specimen_descriptions: should use DISTINCT
Aaron Marcuse-Kubitza
09:01 AM Revision 12943: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_16_list_distinct_specimen_descriptions
Aaron Marcuse-Kubitza
09:00 AM Revision 12942: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_16_list_distinct_specimen_descriptions
Aaron Marcuse-Kubitza
08:53 AM Revision 12941: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_15_list_distinct_locality_descriptions
Aaron Marcuse-Kubitza
08:48 AM Revision 12940: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_09_list_of_unique_verbatim_author_taxa_with_genus
Aaron Marcuse-Kubitza
08:47 AM Revision 12939: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_08_count_of_unique_verbatim_author_taxa_with_genus
Aaron Marcuse-Kubitza
08:36 AM Revision 12938: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_05_list_of_verbatim_species_excluding_author
Aaron Marcuse-Kubitza
08:35 AM Revision 12937: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_04_count_of_unique_verbatim_species_without_author
Aaron Marcuse-Kubitza
08:23 AM Revision 12936: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_03_list_of_verbatim_families
Aaron Marcuse-Kubitza
08:18 AM Revision 12935: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_02_count_of_unique_verbatim_families
Aaron Marcuse-Kubitza
08:06 AM Revision 12934: schemas/vegbien.ERD.mwb: regenerated exports
Aaron Marcuse-Kubitza
08:04 AM Revision 12933: schemas/vegbien.sql: public_validations: added _specimens_01_count_of_total_records_specimens_in_source_db
Aaron Marcuse-Kubitza
07:35 AM Revision 12932: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_01_count_of_total_records_specimens_in_source_db
Aaron Marcuse-Kubitza
07:34 AM Revision 12931: validation/aggregating/specimens/qualitative_validations_specimens.sql: added config statements for datasource and query planner
Aaron Marcuse-Kubitza
05:06 AM Revision 12930: web/links/index.htm: updated to Firefox bookmarks: Firefox: added instructions for enabling security.password_lifetime and making all tabs load when the browser is opened
Aaron Marcuse-Kubitza
04:43 AM Revision 12929: /README.TXT: Schema changes: manually apply schema changes to the live public schema: moved under "update mappings and staging table column names" because this is a necessary part of that step
Aaron Marcuse-Kubitza
04:43 AM Revision 12928: /README.TXT: Schema changes: manually apply schema changes to the live public schema: moved under "update mappings and staging table column names" because this is a necessary part of that step
Aaron Marcuse-Kubitza
04:40 AM Revision 12927: /README.TXT: Schema changes: changed "update staging table column names" to "update mappings and staging table column names"
Aaron Marcuse-Kubitza
04:13 AM Revision 12926: fix: validation/aggregating/specimens/qualitative_validations_specimens.sql: use pg_dump's formatting for COMMENT ON to facilitate diffing against a pg_dump export of the DDL statements
Aaron Marcuse-Kubitza
04:07 AM Revision 12925: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: removed DDL statements so that running the query file does not alter the database, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#remove-DDL-statements
Aaron Marcuse-Kubitza
04:01 AM Revision 12924: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to DB, which pg_dump-formats the views
Aaron Marcuse-Kubitza
03:57 AM Revision 12923: validation/**.sql: replaced CREATE OR REPLACE VIEW with CREATE VIEW to match pg_dump output for diffing
Aaron Marcuse-Kubitza
03:36 AM Revision 12922: added inputs/NY/validations*.sql*
Aaron Marcuse-Kubitza
03:34 AM Revision 12921: fix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: use pg_dump's formatting for COMMENT ON to facilitate diffing against a pg_dump export of the DDL statements
Aaron Marcuse-Kubitza
03:31 AM Revision 12920: bugfix: lib/common.Makefile: $(add*): need to wrap w/ $(wildcard) to prevent "targets don't exist" error, because svn 1.7 does not suppress this error even with --force
Aaron Marcuse-Kubitza
03:27 AM Revision 12919: bugfix: inputs/input.Makefile: add!: add* of $(svnFiles): need to ignore errors because svn 1.7 does not suppress the "targets don't exist" error even with --force
Aaron Marcuse-Kubitza

03/26/2014

09:34 PM Revision 12918: fix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: decimalLatitude/decimalLongitude: need to cast to double precision for numeric comparisons
Aaron Marcuse-Kubitza
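
Why the cast in r12918 matters can be shown with a small sketch. It assumes the coordinate columns arrive as text, which this log does not state, and the helper names are hypothetical. Compared as strings, values are ordered character by character, so an out-of-range latitude can pass a range check that it fails once cast to a number.

```python
def in_range_text(value, low, high):
    """Range check on raw strings: ordering is lexicographic, not numeric."""
    return low <= value <= high


def in_range_numeric(value, low, high):
    """Range check after casting, analogous to casting to double precision."""
    return float(low) <= float(value) <= float(high)


# "100" sorts between "-90" and "90" as text, but 100 is not a valid latitude.
text_result = in_range_text("100", "-90", "90")
numeric_result = in_range_numeric("100", "-90", "90")
```

Here `text_result` is `True` while `numeric_result` is `False`, which is the kind of silent miscount a validation query would produce without the cast.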
09:33 PM Revision 12917: fix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: CollectedDate: updated for refreshed NY data
Aaron Marcuse-Kubitza
09:30 PM Revision 12916: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: fixed typos in column aliases
Aaron Marcuse-Kubitza
09:23 PM Revision 12915: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: translated column names to VegCore, using `bin/in_place validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql env text=1 bin/repl inputs/NY/Ecatalog_all/map.csv` from the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
Aaron Marcuse-Kubitza
09:23 PM Revision 12914: fix: bin/repl: text mode (whether all patterns are plain text) should default to on, not off, if matching entire cells in a spreadsheet
Aaron Marcuse-Kubitza
07:16 PM Revision 12913: bugfix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: need to enclose additional mixed-case identifiers in "", using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
Aaron Marcuse-Kubitza
07:15 PM Revision 12912: bugfix: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: need to enclose additional mixed-case identifiers in "", using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
Aaron Marcuse-Kubitza
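
The quoting rule behind r12912 and r12913 can be sketched as below. Postgres folds unquoted identifiers to lower case, so a mixed-case column name only matches if it is enclosed in "". The helper name is hypothetical and the sketch deliberately ignores reserved keywords, which would also need quoting; the actual translation followed the wiki steps cited above.

```python
import re

# Identifiers made only of lower-case letters, digits and underscores
# (not starting with a digit) survive Postgres' case folding unchanged.
_SIMPLE_IDENT = re.compile(r"[a-z_][a-z0-9_]*\Z")


def quote_ident(name):
    """Return `name` enclosed in "" if it would otherwise be case-folded."""
    if _SIMPLE_IDENT.match(name):
        return name
    # Embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'
```

For example, `quote_ident("CollectedDate")` returns `"CollectedDate"` with the surrounding double quotes, while an all-lower-case name is returned as-is.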
06:09 PM Revision 12911: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql: abbreviated view names longer than 63 chars to prevent them from being truncated
Aaron Marcuse-Kubitza
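
The 63-character limit in r12911 comes from Postgres truncating identifiers to NAMEDATALEN - 1 bytes, which is 63 in a default build. A minimal check for over-long view names might look like this; the function name is hypothetical and the limit assumes an unmodified build.

```python
# NAMEDATALEN - 1 in a default PostgreSQL build (assumption: not recompiled).
MAX_IDENTIFIER_BYTES = 63


def too_long(names):
    """Return the names that Postgres would truncate, in input order."""
    return [n for n in names if len(n.encode("utf-8")) > MAX_IDENTIFIER_BYTES]
```

Running this over the view names in a query file before creating the views would list the ones that need abbreviating.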
06:07 PM Revision 12910: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: escape any ' inside '...' by doubling them
Aaron Marcuse-Kubitza
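
The escaping in r12910 follows the standard SQL rule that a single quote inside a '...' literal is written as two single quotes. A minimal sketch, with a hypothetical helper name and assuming backslashes are not treated as escape characters (the Postgres default with standard_conforming_strings on):

```python
def quote_sql_literal(value):
    """Wrap `value` in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"
```

This is meant for hand-translating fixed query text, as in the revision above; values coming from users are better passed as query parameters.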
06:04 PM Revision 12909: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: translated SQL to Postgres
Aaron Marcuse-Kubitza
05:32 PM Revision 12908: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql: changed /* */ comments to COMMENT ON comments, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#prepend-CREATE-VIEW
Aaron Marcuse-Kubitza
04:58 PM Revision 12907: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql: removed no longer needed -- comments containing the query name, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#prepend-CREATE-VIEW
Aaron Marcuse-Kubitza
03:47 PM Revision 12906: validation/aggregating/specimens/qualitative_validations_specimens.sql: moved notes into comments after the query
Aaron Marcuse-Kubitza
03:46 PM Revision 12905: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: moved notes into comments after the query
Aaron Marcuse-Kubitza
03:44 PM Revision 12904: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: moved "Check" comments to after the query, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
Aaron Marcuse-Kubitza
03:22 PM Revision 12903: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed "Check: should return [#] rows" comments because these only apply to the NY results, not to all specimens datasources
Aaron Marcuse-Kubitza
03:16 PM Revision 12902: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: prepended CREATE VIEW, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#prepend-CREATE-VIEW and the same abbreviations as the output queries (validation/aggregating/specimens/qualitative_validations_specimens.sql)
Aaron Marcuse-Kubitza
03:01 PM Revision 12901: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: synced "Check" comments to output queries validation/aggregating/specimens/qualitative_validations_specimens.sql
Aaron Marcuse-Kubitza
02:49 PM Revision 12900: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: enclosed mixed-case identifiers in "" using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
Aaron Marcuse-Kubitza
02:37 PM Revision 12899: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: translated column names to VegCore, using `bin/in_place validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql env text=1 bin/repl inputs/NY/Ecatalog_all/map.csv` from the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
Aaron Marcuse-Kubitza
02:29 PM Revision 12898: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to use column names for refreshed NY data
Aaron Marcuse-Kubitza
02:17 PM Revision 12897: fix: bin/repl: don't consider uppercase SQL keywords to indicate that a word is in a sentence
Aaron Marcuse-Kubitza
12:02 AM Revision 12896: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: use our staging tables instead of the BIEN2 MySQL staging tables
Aaron Marcuse-Kubitza

03/25/2014

11:52 PM Revision 12895: validation/aggregating/specimens/**.sql: removed trailing whitespace, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#translate-to-Postgres
Aaron Marcuse-Kubitza
11:39 PM Revision 12894: archived validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.sql
Aaron Marcuse-Kubitza
11:39 PM Revision 12893: added validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql, copied from qualitative_validations_source_db_NYBG.sql
Aaron Marcuse-Kubitza
11:33 PM Revision 12892: validation/aggregating/specimens/qualitative_validations_specimens.sql: added ; at end of `CREATE OR REPLACE VIEW` statements
Aaron Marcuse-Kubitza
04:18 AM Revision 12891: inputs/run: postprocess(): documented runtime on vegbiendev (1 h)
Aaron Marcuse-Kubitza

03/24/2014

06:22 PM Revision 12890: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed input-query-specific comments
Aaron Marcuse-Kubitza
06:21 PM Revision 12889: validation/aggregating/specimens/qualitative_validations_specimens.sql: reworded rowcount check comments to apply to the output queries
Aaron Marcuse-Kubitza
06:18 PM Revision 12888: validation/aggregating/specimens/qualitative_validations_specimens.sql: shortened view names to fit within the 63-char limit without truncation
Aaron Marcuse-Kubitza
05:45 PM Revision 12887: /README.TXT: `make inputs/{NVS,SALVIAS,TEAM}/test`: updated runtime (1 min)
Aaron Marcuse-Kubitza
05:35 PM Revision 12886: schemas/vegbien.sql: specimenreplicate.institution_id: renamed to duplicate_institutions_sourcelist_id, as decided in the conference calls (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2)
Aaron Marcuse-Kubitza
05:32 PM Revision 12885: inputs/run: postprocess(): updated runtime (25 min)
Aaron Marcuse-Kubitza
05:22 PM Revision 12884: fix: validation/aggregating/specimens/qualitative_validations_specimens.sql: changed "Full inner join" to "Full outer join" because a FULL JOIN is a type of outer join, not inner join
Aaron Marcuse-Kubitza
05:04 PM Revision 12883: /README.TXT: calls to `inputs/run postprocess`: direct user to refer to inputs/run for this, so the runtime doesn't have to be updated in multiple places
Aaron Marcuse-Kubitza
05:02 PM Revision 12882: inputs/run: postprocess(): updated runtime (20 min)
Aaron Marcuse-Kubitza
05:01 PM Revision 12881: /README.TXT: Schema changes: added steps to update staging table column names on the local machine and vegbiendev
Aaron Marcuse-Kubitza
04:50 PM Revision 12880: fix: schemas/VegCore/mk_derived: added `EOF` at end to avoid (benign) "here-document delimited by end-of-file" warnings on Linux
Aaron Marcuse-Kubitza
01:49 AM Revision 12879: mappings/VegCore.htm: regenerated from wiki: rename specimenHolderInstitutions to specimen_duplicate_institutions, as decided in the 2014-03-13 conference call (wiki.vegpath.org/2014-03-13_conference_call#schema-changes-2). note that most schema changes (such as this one) involve mappings changes, which are handled automatically by `inputs/run postprocess; yes|make inputs/{NVS,SALVIAS,TEAM}/test`.
Aaron Marcuse-Kubitza
01:43 AM Revision 12878: bugfix: lib/runscripts/table.run: schema/make calls: need to use `make schema` instead because old-style datasources don't have a top-level runscript (the absence of this identifies them as old-style so inputs/input.Makefile works correctly)
Aaron Marcuse-Kubitza
01:21 AM Revision 12877: /README.TXT: Maintenance: VegCore data dictionary: `make inputs/{NVS,SALVIAS,TEAM}/test`: recorded runtime (30 s)
Aaron Marcuse-Kubitza
01:17 AM Revision 12876: /README.TXT: Maintenance: VegCore data dictionary: `make inputs/{NVS,SALVIAS,TEAM}/test`: prepended `time` to enable obtaining the runtime
Aaron Marcuse-Kubitza
01:11 AM Revision 12875: /README.TXT: Maintenance: VegCore data dictionary: `inputs/run postprocess`: updated runtime (20 min)
Aaron Marcuse-Kubitza
12:45 AM Revision 12874: fix: schemas/util.sql: trim(): by default, cascadingly drop dependent columns so that they don't prevent trim() from succeeding. note that this requires the dependent columns to then be manually re-created.
Aaron Marcuse-Kubitza

03/23/2014

11:43 PM Revision 12873: bugfix: inputs/GBIF/table.run: switched to using lib/runscripts/table.run instead of mysql.table.run because some subdirs (Source/) need the regular table.run to work properly. mysql.table.run should instead be used directly by subdirs that use the MySQL install.
Aaron Marcuse-Kubitza

03/22/2014

06:20 AM Revision 12872: bugfix: lib/sh/util.sh: **DON'T** do `shopt -s lastpipe` because this causes a segfault on Linux in stderr_matches(). (it also isn't supported on Mac.) use @PIPESTATUS instead. note that we do not currently need lastpipe, since we use @PIPESTATUS (which actually provides more functionality for our purposes).
Aaron Marcuse-Kubitza
06:02 AM Revision 12871: fix: lib/sh/util.sh: echo_func(): file/line #: display with regular color because the lighter color actually draws attention *to* rather than *away from* the faded text
Aaron Marcuse-Kubitza
05:59 AM Revision 12870: lib/sh/util.sh: added plain()
Aaron Marcuse-Kubitza
05:56 AM Revision 12869: inputs/XAL/Specimen/test.xml.ref: updated for sample data.csv, which contains the columns as a CSV. this fixes a bug where a map.csv must be used on a table that contains the same set of columns (ie. not one with no columns if there are any mappings).
Aaron Marcuse-Kubitza
05:50 AM Revision 12868: bugfix: lib/sql_io.py: put_table(): is_literals: `return sql.value(cur)`: need to use sql.value_or_none() instead to support multi-row functions, such as _split() used in specimens data
Aaron Marcuse-Kubitza
05:06 AM Revision 12867: fix: inputs/input.Makefile: don't treat *.xml as data files since these are not currently supported
Aaron Marcuse-Kubitza
04:55 AM Revision 12866: lib/runscripts/util.run: on_exit(): documented that users can also override gateway()/fallback() to perform other commands (or no commands) after the script is read
Aaron Marcuse-Kubitza
04:53 AM Revision 12865: bugfix: lib/sh/db.sh: pg_table_exists(): need ! to negate boolean result
Aaron Marcuse-Kubitza
04:44 AM Revision 12864: fix: lib/runscripts/table.run: table_make_install(): need to inform the user when it skips installing a table, because this is often unexpected
Aaron Marcuse-Kubitza
04:43 AM Revision 12863: fix: lib/runscripts/util.run: run_args_cmd(): need to indent the output of the target that it's running
Aaron Marcuse-Kubitza
04:15 AM Revision 12862: lib/runscripts/table.run: removed no longer used datasrc_make_install()
Aaron Marcuse-Kubitza
04:07 AM Revision 12861: fix: lib/sh/util.sh: fade(): use medium gray instead of light gray because it fades on white *and* black backgrounds
Aaron Marcuse-Kubitza
03:54 AM Revision 12860: lib/sh/util.sh: echo_func(): fade the file/line # to avoid distracting from the function call in the default log output
Aaron Marcuse-Kubitza
03:51 AM Revision 12859: lib/sh/util.sh: added fade()
Aaron Marcuse-Kubitza
03:37 AM Revision 12858: lib/sh/util.sh: highlight_msg(): renamed to highlight_log_msg() to clarify that this contains log++-specific functionality
Aaron Marcuse-Kubitza
03:35 AM Revision 12857: lib/sh/util.sh: moved terminal formatting commands to own section
Aaron Marcuse-Kubitza
03:34 AM Revision 12856: lib/sh/util.sh: highlight_msg(): moved formatting code into separate format() function
Aaron Marcuse-Kubitza
03:21 AM Revision 12855: lib/sh/util.sh: dp(): renamed to ps() to correspond with pv/pf
Aaron Marcuse-Kubitza
03:19 AM Revision 12854: lib/sh/make.sh: echo_target: use `log-- echo_func`, which now puts the target name first but also provides much-needed indentation
Aaron Marcuse-Kubitza
03:16 AM Revision 12853: lib/sh/util.sh: echo_func(): put file/line # *after* function call instead of before so the function name is listed first
Aaron Marcuse-Kubitza
03:13 AM Revision 12852: lib/sh/util.sh: echo_func(): usage: removed no longer used/implemented minor=1 switch. use log++ instead.
Aaron Marcuse-Kubitza
03:07 AM Revision 12851: lib/runscripts/datasrc_dir.run: import(): use new schema/make, schema/rm
Aaron Marcuse-Kubitza
02:59 AM Revision 12850: lib/runscripts/table.run: load_data(): use the much simpler `schema/make` run target, rather than outsourcing to the legacy Makefile via the convoluted datasrc_make_install()/table_make_install()
Aaron Marcuse-Kubitza
02:26 AM Revision 12849: lib/runscripts/datasrc_dir.run: added schema/rm(), schema/make()
Aaron Marcuse-Kubitza
02:19 AM Revision 12848: lib/sh/util.sh: ignore_err_msg(): usage: added $ignore_e param from stderr_matches()
Aaron Marcuse-Kubitza
02:14 AM Revision 12847: lib/runscripts/table.run: psql: always include ; at end of statement
Aaron Marcuse-Kubitza
01:39 AM Revision 12846: fix: lib/sh/db.sh: pg_cmd(): hide PGPASSWORD at the normal verbosity so that the value of it doesn't appear in any log files
Aaron Marcuse-Kubitza
01:08 AM Revision 12845: lib/sh/util.sh: log_hint(): renamed to log_err_hint() for clarity, because this applies only to hints for errors
Aaron Marcuse-Kubitza
01:06 AM Revision 12844: bugfix: lib/sh/util.sh: log_hint!(): use log_err instead of log_info because hints as used here are attached to (possibly benign) errors. for other uses, use mk_hint().
Aaron Marcuse-Kubitza
01:00 AM Revision 12843: fix: lib/sh/util.sh: highlight_msg(): don't ' '-pad already-formatted text
Aaron Marcuse-Kubitza
12:57 AM Revision 12842: lib/sh/util.sh: manual terminal escape sequences: use highlight_msg() instead
Aaron Marcuse-Kubitza
12:53 AM Revision 12841: lib/sh/util.sh: highlight_msg(): auto-add padding around text if there is a background
Aaron Marcuse-Kubitza
12:51 AM Revision 12840: lib/sh/util.sh: highlight_msg(): use $format itself as the $highlight boolean
Aaron Marcuse-Kubitza
12:48 AM Revision 12839: lib/sh/util.sh: highlight_msg(): split apart the testing of $format and can_highlight_msg
Aaron Marcuse-Kubitza
12:39 AM Revision 12838: lib/sh/util.sh: added has_bg()
Aaron Marcuse-Kubitza
12:28 AM Revision 12837: bugfix: lib/sh/util.sh: highlight_msg(): need to reset any existing formatting before applying new formatting
Aaron Marcuse-Kubitza
12:25 AM Revision 12836: lib/sh/util.sh: added mk_hint() and use it in log_hint!()
Aaron Marcuse-Kubitza
12:16 AM Revision 12835: lib/sh/util.sh: bg_cmd(): also log the command being run
Aaron Marcuse-Kubitza
12:07 AM Revision 12834: fix: lib/sh/util.sh: need `function` before functions that have an alias with the same name
Aaron Marcuse-Kubitza
12:04 AM Revision 12833: lib/sh/util.sh: log!(): use new log:()
Aaron Marcuse-Kubitza
12:00 AM Revision 12832: lib/sh/util.sh: added log:(), which sets an explicit log_level. this also simplifies log+().
Aaron Marcuse-Kubitza

03/21/2014

11:55 PM Revision 12831: lib/sh/util.sh: log+(): set log_level before PS4 so that the PS4 expr doesn't also need to add to log_level
Aaron Marcuse-Kubitza
11:51 PM Revision 12830: lib/sh/util.sh: removed no longer needed log+ alias (which had been renamed from clog+)
Aaron Marcuse-Kubitza
11:48 PM Revision 12829: lib/sh/util.sh: clog*: renamed to log* for clarity (possible now that log* is no longer used for function-local log_level setting)
Aaron Marcuse-Kubitza
11:44 PM Revision 12828: *{.sh,run}: local setting of log_level: use log_local instead of relying on the log* aliases, so that these aliases can instead be used for wrapping commands (the more common use case)
Aaron Marcuse-Kubitza
11:40 PM Revision 12827: bugfix: lib/sh/util.sh: verbosity_compat alias: need to use `declare verbosity="$verbosity"` instead of `declare verbosity`, which would just clear $verbosity
Aaron Marcuse-Kubitza
11:38 PM Revision 12826: bugfix: lib/sh/util.sh: verbosity_min alias: need to use `declare verbosity="$verbosity"` instead of log_local now that verbosity is not one of the vars changed by log++
Aaron Marcuse-Kubitza
11:30 PM Revision 12825: lib/sh/util.sh: log+(): use easier-to-understand log_local instead of prefix-assignments to limit assignments to the invoked command
Aaron Marcuse-Kubitza
11:30 PM Revision 12824: lib/sh/util.sh: log+(): use easier-to-understand log_local instead of prefix-assignments to limit assignments to the invoked command
Aaron Marcuse-Kubitza
10:57 PM Revision 12823: *{.sh,run}: use clog* instead of "log*"
Aaron Marcuse-Kubitza
10:45 PM Revision 12822: bugfix: lib/sh/util.sh: log+(): removed spurious ; between setting of PS4 and log_level, which was causing erratic mismatches between PS4 and log_level. (the ; caused $PS4 to be set in the *caller* when invoked via one of the clog* aliases, rather than being passed as a command-specific env var.)
Aaron Marcuse-Kubitza
10:30 PM Revision 12821: lib/sh/util.sh: $verbosity: stay constant at what the user set it to instead of changing in tandem with $log_level, to facilitate debugging verbosity/log_level-related issues
Aaron Marcuse-Kubitza
10:11 PM Revision 12820: lib/sh/util.sh: log+(): usage: use aliases instead of ""-ed function names
Aaron Marcuse-Kubitza
06:58 PM Revision 12819: added schemas/VegCore.ERD.pdf symlink for easy access
Aaron Marcuse-Kubitza
06:50 PM Revision 12818: lib/sh/util.sh: log_err(): use red background for better visibility of errors, in the same way that lib/exc.py print_ex() does for column-based import
Aaron Marcuse-Kubitza
06:44 PM Revision 12817: bugfix: lib/sh/util.sh: removed echo_func in functions used by log++, to avoid spurious highlighted output
Aaron Marcuse-Kubitza
06:40 PM Revision 12816: lib/sh/util.sh: added missing clog+ alias
Aaron Marcuse-Kubitza
06:35 PM Revision 12815: bugfix: lib/sh/util.sh: log_hint(): use the standard log_fd and log_info() format, not err_fd and log_err() format, for hint messages
Aaron Marcuse-Kubitza
06:27 PM Revision 12814: fix: lib/sh/util.sh: log_msg!(): indent each line, not just the first
Aaron Marcuse-Kubitza
06:26 PM Revision 12813: lib/sh/util.sh: added split_lines()
Aaron Marcuse-Kubitza
06:05 PM Revision 12812: lib/sh/util.sh: log(): factored out helper function log_msg!()
Aaron Marcuse-Kubitza
06:00 PM Revision 12811: fix: lib/sh/util.sh: highlight_msg(): bold instead of underlining because the underlining interferes with the readability of the commands
Aaron Marcuse-Kubitza
05:57 PM Revision 12810: lib/sh/util.sh: highlight_msg(): allow turning off formatting w/ empty $format
Aaron Marcuse-Kubitza
05:53 PM Revision 12809: fix: lib/sh/util.sh: log_err() calls: removed manual highlighting
Aaron Marcuse-Kubitza
05:51 PM Revision 12808: lib/sh/util.sh: log_err(): highlight all error messages using highlight_msg()'s new $format
Aaron Marcuse-Kubitza
05:45 PM Revision 12807: lib/sh/util.sh: highlight_msg(): support custom format
Aaron Marcuse-Kubitza
05:35 PM Revision 12806: lib/sh/db.sh: pg_*_exists(): log the DB statements to check this at a higher log_level so that they don't clutter up the log output
Aaron Marcuse-Kubitza
05:25 PM Revision 12805: lib/sh/util.sh: log(): highlight log_level 1 messages to stand out against other output, for easier debugging
Aaron Marcuse-Kubitza
04:31 PM Revision 12804: *{.sh,run}: stderr_matches() wrapper calls: removed no longer needed prep_try/rethrow
Aaron Marcuse-Kubitza
04:12 PM Revision 12803: bugfix: catch(): also need to support $1='' because this is now a use case of ignore_e()
Aaron Marcuse-Kubitza
04:02 PM Revision 12802: bugfix: lib/sh/util.sh: ignore_err_msg(): also need to ignore false exit status on no match
Aaron Marcuse-Kubitza
03:49 PM Revision 12801: lib/sh/util.sh: stderr_matches(): moved prep_try/rethrow into the function itself so that callers don't have to wrap this function in a complex sequence of prep_try/rethrow statements
Aaron Marcuse-Kubitza
03:42 PM Revision 12800: *{.sh,run}: stderr_matches() wrapper calls: removed no longer needed prep_try/rethrow
Aaron Marcuse-Kubitza
03:42 PM Revision 12799: lib/sh/util.sh: stderr_matches(): moved prep_try/rethrow into the function itself so that callers don't have to wrap this function in a complex sequence of prep_try/rethrow statements
Aaron Marcuse-Kubitza
03:25 PM Revision 12798: lib/sh/util.sh: added rethrow_exit alias
Aaron Marcuse-Kubitza
03:10 PM Revision 12797: fix: lib/sh/db.sh: pg_table_exists(): use stderr_matches() rather than just the exit status. this also avoids highlighting the benign error.
Aaron Marcuse-Kubitza
03:00 PM Revision 12796: fix: lib/sh/db.sh: pg_table_exists(): use stderr_matches() rather than just the exit status. this also avoids highlighting the benign error.
Aaron Marcuse-Kubitza
02:16 AM Revision 12795: fix: inputs/input.Makefile: removed no longer used special handling of XML inputs, support for which was never added to the Makefile. (bin/map, however, does support importing an XML file into a database.) this fixes a bug in XAL, which used to abort with an error but now just imports an empty table.
Aaron Marcuse-Kubitza
12:34 AM Revision 12794: fix: inputs/input.Makefile: %/install: don't ignore errors if table does not exist, to ensure a proper errexit. this is now possible because every dir that this target is being run on should be a data dir. (Source/ used to be a metadata-only dir.)
Aaron Marcuse-Kubitza
12:31 AM Revision 12793: bugfix: inputs/input.Makefile: $(cleanup): need `set -o pipefail`
Aaron Marcuse-Kubitza
12:02 AM Revision 12792: inputs/VegBank/run: `rm=1 import()`: updated runtime (1 h)
Aaron Marcuse-Kubitza

03/20/2014

11:54 PM Revision 12791: inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
Aaron Marcuse-Kubitza
11:54 PM Revision 12790: inputs/VegBank/projectcontributor_/test.xml.ref: updated inserted row count
Aaron Marcuse-Kubitza
10:53 PM Revision 12789: bugfix: schemas/util.sql: is_constant(util.col_ref): updated to include standard newline at beginning of comment (applies to newly-imported staging tables)
Aaron Marcuse-Kubitza
10:44 PM Revision 12788: bugfix: inputs/VegBank/import_order.txt: added missing project, needed to trigger the staging table renaming for the project table
Aaron Marcuse-Kubitza
10:42 PM Revision 12787: inputs/VegBank/run: documented `rm=1 import()` runtime (>1.5 h)
Aaron Marcuse-Kubitza
10:40 PM Revision 12786: inputs/VegBank/run: documented `datasrc_make sql/install` runtime (25 min)
Aaron Marcuse-Kubitza
08:27 PM Revision 12785: inputs/MO/Specimen/test.xml.ref: updated, which adds dateCollected mappings
Aaron Marcuse-Kubitza
08:20 PM Revision 12784: inputs/WIN/Specimen/test.xml.ref: updated to map.csv, which has eventDate->dateCollected
Aaron Marcuse-Kubitza
08:13 PM Revision 12783: inputs/VegBank/plantconcept_/create.sql: updated runtime (25 min, ~same)
Aaron Marcuse-Kubitza
08:08 PM Revision 12782: lib/sh/make.sh: begin_target: echo all targets to facilitate debugging without needing the verbose stack trace mode
Aaron Marcuse-Kubitza
08:06 PM Revision 12781: bugfix: lib/sh/make.sh: echo_target: don't include filename/line #, since this is not for the stack trace mode
Aaron Marcuse-Kubitza
07:59 PM Revision 12780: lib/sh/make.sh: added echo_target
Aaron Marcuse-Kubitza
07:58 PM Revision 12779: *{.sh,run}: use new begin_target instead of `echo_func; set_make_vars`
Aaron Marcuse-Kubitza
07:51 PM Revision 12778: lib/runscripts/util.run: runscript template: added sample make target, using new make target template
Aaron Marcuse-Kubitza
07:48 PM Revision 12777: lib/sh/make.sh: added make target template
Aaron Marcuse-Kubitza
07:47 PM Revision 12776: inputs/VegBank/plot/postprocess.sql: remove institutions that we have direct data for: CVS: updated runtime (same)
Aaron Marcuse-Kubitza
07:41 PM Revision 12775: lib/sh/make.sh: added begin_target alias
Aaron Marcuse-Kubitza
07:17 PM Revision 12774: lib/runscripts/datasrc_dir.run: documented how to reinstall staging tables (`rm=1 .../run import`)
Aaron Marcuse-Kubitza
07:13 PM Revision 12773: bugfix: *{.sh,run}: stderr_matches() wrapper callers: use the required wrapper caller usage, which now includes rethrow and prep_try
Aaron Marcuse-Kubitza
07:09 PM Revision 12772: bugfix: lib/sh/util.sh: rethrow*: only `return` if $e is actually nonzero, because rethrow is now being used as a catch-all in situations where there might not be an error
Aaron Marcuse-Kubitza
07:06 PM Revision 12771: lib/sh/util.sh: prep_try: initialize $e to 0 to simplify error-handling coding
Aaron Marcuse-Kubitza
06:56 PM Revision 12770: stderr_matches(): wrapper caller usage: added alternative usage when using `||`
Aaron Marcuse-Kubitza
06:50 PM Revision 12769: lib/sh/util.sh: stderr_matches(): wrapper caller usage: documented usage for a negated condition (ie. prefixed w/ !)
Aaron Marcuse-Kubitza
06:48 PM Revision 12768: lib/sh/util.sh: stderr_matches(): usage: split into wrapper usage and wrapper caller usage for clarity
Aaron Marcuse-Kubitza
06:45 PM Revision 12767: fix: *{.sh,run}: stderr_matches() wrappers: usage: added `rethrow`
Aaron Marcuse-Kubitza
06:45 PM Revision 12766: fix: *{.sh,run}: stderr_matches() wrappers: usage: added `rethrow`
Aaron Marcuse-Kubitza
06:43 PM Revision 12765: fix: lib/sh/util.sh: stderr_matches(): usage: `rethrow` must be called right after stderr_matches(), to avoid running other commands if there is an error
Aaron Marcuse-Kubitza
06:40 PM Revision 12764: fix: lib/sh/util.sh: stderr_matches(): when using $ignore_e, also set benign_error=1 to suppress the highlighting of the error
Aaron Marcuse-Kubitza
06:36 PM Revision 12763: bugfix: lib/sh/db.sh: pg_schema_exists(): need to ignore benign error exit status from the "cannot create temporary relation in non-temporary schema" error
Aaron Marcuse-Kubitza
06:34 PM Revision 12762: lib/sh/util.sh: stderr_matches(): support ignoring any benign error exit status associated with the error message being tested for
Aaron Marcuse-Kubitza
06:18 PM Revision 12761: lib/sh/util.sh: stderr_matches(): usage: documented where any ignore_e statement would go
Aaron Marcuse-Kubitza
05:37 PM Revision 12760: bugfix: lib/sh/util.sh: stderr_matches(): can't use `try` because this clears the exit status, which is needed for @PIPESTATUS to work. to support this, also need to avoid errexiting since @PIPESTATUS will be used instead.
Aaron Marcuse-Kubitza
01:25 AM Task #878 (New): fix crow's foot notation in ERD
* when there is an open circle on the straight end, also put an open circle on the crow's foot end, so that the outgo...
Aaron Marcuse-Kubitza

03/18/2014

06:18 PM Revision 12759: lib/sh/util.sh: added dp(), which debug-prints a message
Aaron Marcuse-Kubitza
05:47 PM Revision 12758: bugfix: inputs/VegBank/plot/postprocess.sql: use CVS.plot_ instead because that has the renamed staging table columns, and is compatible with auto-renaming of the SQL script columns
Aaron Marcuse-Kubitza
05:41 PM Revision 12757: inputs/CVS/plot_/postprocess.sql: add unique constraint on locationName (analogous to the unique constraint in plot), for use by inputs/VegBank/plot/postprocess.sql in removing inter-datasource duplication
Aaron Marcuse-Kubitza
05:26 PM Revision 12756: fix: schemas/util.sql: explain2notice_msg(): don't include EXPLAIN output for simple, single-value queries, to avoid cluttering up the log output
Aaron Marcuse-Kubitza
05:22 PM Revision 12755: schemas/util.sql: added fold_explain_msg()
Aaron Marcuse-Kubitza
05:22 PM Revision 12754: bugfix: bin/repl: only use excluded_prefix_re/excluded_suffix_re in text mode (used in renaming columns in SQL scripts), to prevent the special coding for column renames from also affecting regular regexp/word replacements
Aaron Marcuse-Kubitza
05:10 PM Revision 12753: inputs/VegBank/taxon_observation.**/test.xml.ref: updated inserted row count
Aaron Marcuse-Kubitza
05:34 AM Revision 12752: inputs/run: postprocess(): documented runtime (30 min)
Aaron Marcuse-Kubitza
05:16 AM Revision 12751: bugfix: inputs/input.Makefile: %/postprocess.sql: don't perform replacements using map.csv, because map.csv is not idempotent. this functionality was only there to facilitate switching to new-style import, which is now largely done. (the remaining datasources NVS, SALVIAS, TEAM contain only 1 postprocess.sql: inputs/SALVIAS/projects/postprocess.sql (`st inputs/{NVS,SALVIAS,TEAM}/*/postprocess.sql`).)
Aaron Marcuse-Kubitza
04:59 AM Revision 12750: bugfix: bin/repl: text mode: also don't match if it's part of a '-'-separated identifier
Aaron Marcuse-Kubitza
04:57 AM Revision 12749: bugfix: bin/repl: text mode: also don't match if it's a word in a sentence
Aaron Marcuse-Kubitza
04:42 AM Revision 12748: bugfix: bin/repl: text mode: turned off the suffix matching, because there are cases where a mapping adds a suffix which would cause the same replacement to be performed repeatedly
Aaron Marcuse-Kubitza
04:33 AM Revision 12747: inputs/input.Makefile: %/postprocess.sql: *always* run this, not just if the associated map spreadsheets change, to avoid needing to `touch` them to cause %/postprocess.sql to run
Aaron Marcuse-Kubitza
04:25 AM Revision 12746: bin/repl: text mode: exclude prefixes that should not cause replacement, to avoid doubling leading *
Aaron Marcuse-Kubitza
04:24 AM Revision 12745: fix: inputs/*/*/postprocess.sql: un-doubled *
Aaron Marcuse-Kubitza
04:06 AM Revision 12744: bugfix: inputs/input.Makefile: %/postprocess.sql: also need to apply renames from mappings/VegCore.thesaurus.csv, as these have been applied to map.csv
Aaron Marcuse-Kubitza
04:04 AM Revision 12743: bugfix: lib/runscripts/table.run: custom_postprocess(): need to apply renames to SQL statements in postprocess.sql before it can be run
Aaron Marcuse-Kubitza
04:03 AM Revision 12742: bin/repl: text mode: also match w/ suffix (eg. _verbatim)
Aaron Marcuse-Kubitza
03:10 AM Task #577 (Rejected): use views instead of map spreadsheets to store the datasource mappings
the staging table columns are now renamed instead of creating a view that maps the columns
Aaron Marcuse-Kubitza
03:07 AM Task #584 (Resolved): enable running all the import steps from one runscript
Aaron Marcuse-Kubitza
02:59 AM Revision 12741: bugfix: /README.TXT: Maintenance: VegCore data dictionary: apply new data dict mappings: need to use postprocess rather than import runscript target, so that the command also works on an svn checkout without the flat files (the flat files are not needed for the staging table renaming)
Aaron Marcuse-Kubitza

03/15/2014

07:20 PM Revision 12740: lib/sh/db.sh: psql(): $verbose_ok: renamed to $bypass_ok for clarity, because this applies only to the `--output /dev/fd/41` bypass (which, when not possible, requires turning off verbose output)
Aaron Marcuse-Kubitza
07:15 PM Revision 12739: fix: lib/sh/db.sh: psql(): added $output_data switch analogous to what mysql() has. this causes query results of eg. void-returning functions to be correctly filtered by the logging mechanism, rather than output to stdout.
Aaron Marcuse-Kubitza
06:42 PM Revision 12738: fix: lib/sh/db.sh: psql(): verbosity=0 (errors only) mode: use `SET client_min_messages = WARNING;` instead of NOTICE to hide verbose messages within psql as well
Aaron Marcuse-Kubitza
06:31 PM Revision 12737: lib/sh/db.sh: psql(): replaced `test "$verbose_ok" && can_log` with bool var $verbose_
Aaron Marcuse-Kubitza
06:29 PM Revision 12736: fix: lib/sh/db.sh: psql(): $verbose_: renamed to $verbose_ok for clarity
Aaron Marcuse-Kubitza
06:13 PM Revision 12735: fix: lib/sh/util.sh: stdout_contains(): add another pipe_delay because the `grep` statement was sometimes getting printed before its filtered output
Aaron Marcuse-Kubitza
05:47 PM Revision 12734: bugfix: schemas/util.sql: set_col_types(): need to COALESCE() the executed SQL to '' because util.eval() does not support NULL (and shouldn't, because this indicates a missing COALESCE() in constructing the statement)
Aaron Marcuse-Kubitza
05:43 PM Revision 12733: schemas/util.sql: set_col_types(): use simpler util.eval() instead of manual EXECUTE/util.debug_print_sql()
Aaron Marcuse-Kubitza
05:37 PM Revision 12732: schemas/util.sql: set_col_types(): use string_agg() instead of array_to_string(ARRAY(...)) for clarity
Aaron Marcuse-Kubitza
05:28 PM Revision 12731: bugfix: lib/sh/util.sh: die_error_hidden(): min verbosity to display error should not be hardcoded
Aaron Marcuse-Kubitza
05:18 PM Revision 12730: lib/sh/db.sh: psql(): "to see error details" msg: use new die_error_hidden()
Aaron Marcuse-Kubitza
05:18 PM Revision 12729: lib/sh/util.sh: added die_error_hidden()
Aaron Marcuse-Kubitza
05:13 PM Revision 12728: lib/sh/db.sh: psql(): "to see error details" msg: use new log_hint()
Aaron Marcuse-Kubitza
05:13 PM Revision 12727: lib/sh/util.sh: added log_hint(), whose msg is only displayed if not a benign error
Aaron Marcuse-Kubitza
05:03 PM Revision 12726: bugfix: lib/sh/db.sh: psql(): "to see error details" msg: also don't print it for benign errors ($benign_error)
Aaron Marcuse-Kubitza
05:00 PM Revision 12725: schemas/util.sql: added mk_not_null()
Aaron Marcuse-Kubitza
04:42 PM Revision 12724: lib/sh/db.sh: psql(): on error, display message describing how to see error details (prepend `vb=2` to the command)
Aaron Marcuse-Kubitza
04:31 PM Revision 12723: bugfix: lib/sh/util.sh: log_err(): don't override verbosity manually, as this will not set log_level or PS4. instead, use new log! , which sets these correctly.
Aaron Marcuse-Kubitza
04:24 PM Revision 12722: lib/sh/util.sh: added log! , which force-displays next log message
Aaron Marcuse-Kubitza
03:59 PM Revision 12721: lib/sh/util.sh: save_e: made it idempotent so that it also works if save_e was already called
Aaron Marcuse-Kubitza
03:57 PM Revision 12720: lib/sh/util.sh: save_e: made it idempotent so that it also works if save_e was already called
Aaron Marcuse-Kubitza
03:37 PM Revision 12719: lib/sh/util.sh: rethrow: documented why can't use `(exit "$e")` (bash bug that prevents errexit)
Aaron Marcuse-Kubitza

03/14/2014

09:09 PM Revision 12718: bugfix: /README.TXT: Maintenance: VegCore data dictionary: apply new data dict mappings: need to use import rather than mappings runscript target, to rename the staging tables
Aaron Marcuse-Kubitza
09:06 PM Revision 12717: bugfix: /README.TXT: Maintenance: VegCore data dictionary: also need to apply new data dict mappings on vegbiendev
Aaron Marcuse-Kubitza
08:19 PM Revision 12716: fix: /README.TXT: Maintenance: VegCore data dictionary: added steps to apply the new data dictionary mappings to the datasource mappings and staging tables
Aaron Marcuse-Kubitza
07:53 PM Revision 12715: bugfix: lib/runscripts/util.run: $auto_ignore: need to unexport it so don't pass this to invoked scripts except through fwd()
Aaron Marcuse-Kubitza
07:35 PM Revision 12714: added inputs/run, which runs all the inputs' runscripts using the new auto-forwarding
Aaron Marcuse-Kubitza
07:34 PM Revision 12713: bugfix: lib/runscripts/util.run: auto_fwd's fallback() must be set *after* auto_ignore's fallback() to overwrite it (auto_ignore should only apply if an error would otherwise have been generated by the fallback)
Aaron Marcuse-Kubitza
07:30 PM Revision 12712: lib/runscripts/util.run: fwd(): support subdirs that don't contain a runscript, so that the default value of @subdirs will work in most cases
Aaron Marcuse-Kubitza
07:29 PM Revision 12711: lib/runscripts/util.run: fwd(): set default @subdirs (`{.,}*/`)
Aaron Marcuse-Kubitza
07:26 PM Revision 12710: lib/sh/util.sh: added enter_top_dir and use it in in_top_dir
Aaron Marcuse-Kubitza
07:12 PM Revision 12709: fix: lib/sh/util.sh: commands run inside $(...): need to run with log++ so that these aren't normally debug-printed
Aaron Marcuse-Kubitza
06:41 PM Revision 12708: lib/sh/util.sh: added pv(), which debug-prints var(s)
Aaron Marcuse-Kubitza
06:40 PM Revision 12707: lib/sh/util.sh: added wildcard.()
Aaron Marcuse-Kubitza
06:40 PM Revision 12706: lib/sh/util.sh: added wildcard/()
Aaron Marcuse-Kubitza
06:40 PM Revision 12705: lib/sh/util.sh: added esc_args()
Aaron Marcuse-Kubitza
06:33 PM Revision 12704: web/links/index.htm: updated to Firefox bookmarks: Google Drive: listed bugs that make it very difficult to use (the need to re-download all files when reconnecting a client to an account). added recommendation not to use it (unstable).
Aaron Marcuse-Kubitza
05:25 PM Revision 12703: removed unused inputs/table.run. inputs/*/table.run include lib/runscripts/table.run directly.
Aaron Marcuse-Kubitza
05:02 PM Revision 12702: lib/runscripts/datasrc_dir.run: removed postprocess(), which now does the same thing its auto-forwarded equivalent would
Aaron Marcuse-Kubitza
05:01 PM Revision 12701: lib/runscripts/datasrc_dir.run: removed separate @table_subdirs, because the table-only targets can now safely be invoked on all subdirs, being auto-ignored in subdirs that don't support them
Aaron Marcuse-Kubitza
04:53 PM Revision 12700: lib/runscripts/util.run: fwd(): enable $auto_ignore so that each subdir doesn't have to have a definition for the forwarded target
Aaron Marcuse-Kubitza
04:52 PM Revision 12699: lib/runscripts/util.run: added $auto_ignore switch, which causes fallback() not to generate an error that a non-existent target doesn't exist
Aaron Marcuse-Kubitza
03:55 PM Revision 12698: lib/runscripts/datasrc_dir.run: use new fwd_self alias
Aaron Marcuse-Kubitza
03:55 PM Revision 12697: lib/runscripts/util.run: added fwd_self alias
Aaron Marcuse-Kubitza
03:49 PM Revision 12696: lib/runscripts/datasrc_dir.run: enable $auto_fwd, to create the functionality of lib/forwarding.Makefile's `%` target
Aaron Marcuse-Kubitza
03:47 PM Revision 12695: lib/runscripts/util.run: added $auto_fwd switch
Aaron Marcuse-Kubitza
03:36 PM Revision 12694: bugfix: lib/runscripts/util.run: gateway(): need to use is_callable() rather than func_exists() to check whether the target exists, because external commands (eg. echo) are supported as targets, too
Aaron Marcuse-Kubitza
03:32 PM Revision 12693: lib/sh/util.sh: added is_callable()
Aaron Marcuse-Kubitza
03:23 PM Revision 12692: lib/runscripts/util.run: support custom handlers for *all* targets (gateway()) as well as targets w/o function (fallback())
Aaron Marcuse-Kubitza
03:03 PM Revision 12691: lib/runscripts/table.run: remake_VegBIEN_mappings(): renamed to just mappings() since action make targets should be short names
Aaron Marcuse-Kubitza
07:32 AM Revision 12690: lib/sh/util.sh: stderr_matches(): inline the stderr_matches alias to avoid needing to quote stderr_matches as "stderr_matches" in the most common use case (with pattern as a prefix env var)
Aaron Marcuse-Kubitza
07:29 AM Revision 12689: bugfix: lib/sh/util.sh: stderr_matches(): when passing `pattern=...` as a prefix env var, must be invoked as `"stderr_matches"` to avoid the env var applying to the prep_try portion of the stderr_matches alias
Aaron Marcuse-Kubitza
06:38 AM Revision 12688: added schemas/VegCore/Brad_Boyle/bien3_data_provenance_use_cases.docx* from e-mail from Brad
Aaron Marcuse-Kubitza

03/13/2014

06:53 PM Revision 12687: schemas/vegbien.sql: _plots_08_list_of_plots_which_use_percent_cover, _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: reran with fixes, which removes the incorrectly auto-added copies columns. (they were only able to be auto-added because the tables had no rows.)
Aaron Marcuse-Kubitza
06:42 PM Revision 12686: bugfix: drop_column(regclass[]): need to run `SELECT NULL::void;` at end of function to avoid folding away functions called in previous query
Aaron Marcuse-Kubitza
06:40 PM Revision 12685: fix: schemas/util.sql: diff(regclass, regclass): moved try_create() of copies column in parent table to auto_rm_freq() so that it would only happen if both tables actually contain a copies column (otherwise, the try_create() will create an empty copies column if both tables are empty)
Aaron Marcuse-Kubitza
06:33 PM Revision 12684: schemas/util.sql: try_create(): also handle "child table is missing column" errors
Aaron Marcuse-Kubitza
05:33 PM Revision 12683: schemas/util.sql: added coalesce(anyarray), which can be used to force evaluation of all values of a COALESCE()
Aaron Marcuse-Kubitza
05:14 PM Revision 12682: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: updated to DB
Aaron Marcuse-Kubitza
05:13 PM Revision 12681: fix: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: removed `public.` qualifier
Aaron Marcuse-Kubitza
05:04 PM Revision 12680: schemas/vegbien.sql: implemented _plots_19_count_of_censuses_per_plot_in_each_project
Aaron Marcuse-Kubitza
05:03 PM Revision 12679: inputs/SALVIAS/validations.sql: implemented _plots_19_count_of_censuses_per_plot_in_each_project
Aaron Marcuse-Kubitza
09:08 AM Revision 12678: validation/aggregating/plots/FIA/bien3_validations_fia_input.sql: _plots_19_count_of_inventories_per_plot_in_each_project: renamed to _plots_19_count_of_censuses_per_plot_in_each_project for clarity
Aaron Marcuse-Kubitza
09:00 AM Revision 12677: validation/aggregating/plots/FIA/bien3_validations_fia_input.sql*: updated from Brad's latest e-mail
Aaron Marcuse-Kubitza
02:06 AM Revision 12676: schemas/util.sql: EXCEPTION blocks with multiple exception types: use OR to merge exception types into one WHEN block
Aaron Marcuse-Kubitza
01:50 AM Revision 12675: schemas/vegbien.sql: public_validations: schema comment: changed "to sync the queries with schemas/vegbien.sql" to "to reset the queries to what's in schemas/vegbien.sql" for clarity
Aaron Marcuse-Kubitza
01:46 AM Revision 12674: fix: schemas/vegbien.sql: schema comment: to reset the key and value columns for all validations queries: updated running of custom keys() functions to use keys() types instead
Aaron Marcuse-Kubitza
01:14 AM Revision 12673: schemas/vegbien.sql: schema comment: to sync the queries with schemas/vegbien.sql: use new public_validations.rm_output_queries() instead of rm_all_queries() to leave the input queries in place
Aaron Marcuse-Kubitza
01:12 AM Revision 12672: schemas/vegbien.sql: schema comment: documented how to reset the key and value columns for all validations queries
Aaron Marcuse-Kubitza

03/12/2014

11:56 PM Revision 12671: schemas/util.sql: mk_keys_func(regtype, util.col_cast[]): indicate in the type comment that the keys() type is autogenerated, so it can be distinguished from custom keys() types when bulk-regenerating keys() types
Aaron Marcuse-Kubitza
11:53 PM Revision 12670: bugfix: schemas/util.sql: show_relations_like(): also need to include composite types, as these are also relations (and are expected to be included by callers of show_relations_like())
Aaron Marcuse-Kubitza
11:49 PM Revision 12669: bugfix: schemas/vegbien.sql: rm_output_queries(): also need to include keys_* and values__* types, as these are also associated with the query
Aaron Marcuse-Kubitza
11:40 PM Revision 12668: schemas/util.sql: added debug_print_func_call(text) and use it where applicable
Aaron Marcuse-Kubitza
11:33 PM Revision 12667: schemas/util.sql: drop_relations_like(): debug-print the regexps so that you can tell which tables it's trying to match
Aaron Marcuse-Kubitza
06:26 PM Revision 12666: schemas/vegbien.sql: public_validations: regenerated ~type tables, which adds `copies` columns for queries with a mismatch in the # of occurrences of each row
Aaron Marcuse-Kubitza
06:18 PM Revision 12665: bugfix: schemas/vegbien.sql: public_validations.validation_views(): need to include views with letters after the query # (eg. _plots_06a_list_of_stems)
Aaron Marcuse-Kubitza
05:41 PM Revision 12664: schemas/util.sql: removed no longer used to_freq(regclass, drop_if_always_1). use to_freq(regclass) and auto_rm_freq() instead.
Aaron Marcuse-Kubitza
05:40 PM Revision 12663: bugfix: schemas/util.sql: diff(regclass, regclass): only drop freq column if *all* tables have all 1s
Aaron Marcuse-Kubitza
05:38 PM Revision 12662: schemas/util.sql: auto_rm_freq(): accept multiple tables, so the freq column is only dropped if *all* tables have all 1s
Aaron Marcuse-Kubitza
05:36 PM Revision 12661: schemas/util.sql: added freq_always_1(regclass[])
Aaron Marcuse-Kubitza
05:35 PM Revision 12660: schemas/util.sql: added drop_column(regclass[])
Aaron Marcuse-Kubitza
05:04 PM Revision 12659: schemas/util.sql: added parent(regclass)
Aaron Marcuse-Kubitza
04:48 PM Revision 12658: schemas/util.sql: try_create(): also handle not_null_violation, which is thrown when trying to add a NOT NULL column to a parent table, which cascades to a child table whose values for the new column will be NULL
Aaron Marcuse-Kubitza
04:44 PM Revision 12657: bugfix: schemas/util.sql: diff(text, text): also need to cast left_/right_ to base type for the IS DISTINCT FROM filter, because the WHERE clause apparently does *not* use columns from the SELECT list, even though GROUP BY and ORDER BY do
Aaron Marcuse-Kubitza
04:13 PM Revision 12656: schemas/util.sql: added to_freq(regclass, drop_if_always_1)
Aaron Marcuse-Kubitza
04:04 PM Revision 12655: schemas/util.sql: added auto_rm_freq(regclass)
Aaron Marcuse-Kubitza
03:53 PM Revision 12654: schemas/util.sql: added freq_always_1(regclass)
Aaron Marcuse-Kubitza
03:00 PM Revision 12653: bugfix: schemas/util.sql: diff(regclass, regclass): need to create a diff when the # of copies of a row differs between the tables. this uses new util.to_freq().
Aaron Marcuse-Kubitza
02:44 PM Revision 12652: schemas/util.sql: added to_freq(regclass)
Aaron Marcuse-Kubitza
02:43 PM Revision 12651: schemas/util.sql: added populate_table(regclass, text)
Aaron Marcuse-Kubitza
01:11 PM Revision 12650: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: updated to DB
Aaron Marcuse-Kubitza
12:53 PM Revision 12649: schemas/util.sql: added copy_types_and_data(regclass, text)
Aaron Marcuse-Kubitza
04:44 AM Revision 12648: schemas/vegbien.sql: public_validations schema comment: added instructions to change the key and value columns for a validations query
Aaron Marcuse-Kubitza
04:41 AM Revision 12647: schemas/vegbien.sql: implemented _plots_16_intercepts_for_each_verb_taxon_in_each_plot_each_proj
Aaron Marcuse-Kubitza
03:44 AM Revision 12646: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: updated to DB
Aaron Marcuse-Kubitza
03:44 AM Revision 12645: fix: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: removed `public.` qualifier
Aaron Marcuse-Kubitza
03:35 AM Revision 12644: schemas/vegbien.sql: implemented _plots_09_list_of_plots_which_use_line_intercept
Aaron Marcuse-Kubitza
03:20 AM Revision 12643: schemas/vegbien.sql: public_validations: queries that use EXISTS(): join locationevent.plot_id to plot.plot_id directly instead of going via location.plot_location_id
Aaron Marcuse-Kubitza
03:04 AM Revision 12642: schemas/vegbien.sql: implemented _plots_08_list_of_plots_which_use_percent_cover
Aaron Marcuse-Kubitza
12:04 AM Revision 12641: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: updated to DB
Aaron Marcuse-Kubitza
12:01 AM Revision 12640: schemas/vegbien.sql: implemented _plots_07_list_of_plots_which_use_counts_of_indiv_per_species
Aaron Marcuse-Kubitza

03/11/2014

09:57 PM Revision 12639: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: updated to DB
Aaron Marcuse-Kubitza
09:56 PM Revision 12638: bugfix: inputs/SALVIAS/validations.sql: _plots_07_list_of_plots_with_counts_of_individuals_per_species: renamed to _plots_07_list_of_plots_*which_use*_... because this query is not intended to include the actual counts, just to say which plots have them (the correct "which use" wording is also used in queries #8, 9)
Aaron Marcuse-Kubitza
04:05 PM Revision 12637: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: query planner: documented how to prevent incorrect query plans (`SET enable_seqscan = off;`, etc.)
Aaron Marcuse-Kubitza
03:38 PM Revision 12636: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: query planner: documented that incorrect query plans are an ongoing bug in Postgres, because it does not support index hints and by default does not follow the join order. specifically, Postgres often does the following things in query plans which should normally never be done:
* performs a sequential scan when an index is available (because it incorrectly thinks there are too many dead rows i...
Aaron Marcuse-Kubitza

03/07/2014

10:49 PM Revision 12635: schemas/vegbien.sql, inputs/SALVIAS/validations.sql: added _plots_06a_list_of_stems, for use in figuring out the diff in _plots_06_list_of_plots_with_stem_measurements
Aaron Marcuse-Kubitza
09:53 PM Revision 12634: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: updated to DB
Aaron Marcuse-Kubitza
09:50 PM Revision 12633: schemas/vegbien.sql: plot: removed explicit column lists added in the autorename of plot.location_id->plot_id
Aaron Marcuse-Kubitza
09:41 PM Revision 12632: schemas/vegbien.sql: plot: renamed pkey to plot_id. note that the field is autorenamed in all validation views which use it.
Aaron Marcuse-Kubitza
09:18 PM Revision 12631: schemas/vegbien.sql: locationevent: added autopopulated plot_id column which points to the outermost plot of the locationevent's location
Aaron Marcuse-Kubitza
08:55 PM Revision 12630: bugfix: schemas/vegbien.sql: locationevent: added missing fkey on place_visit_id
Aaron Marcuse-Kubitza
04:42 PM Revision 12629: bugfix: schemas/vegbien.sql: _plots_06_list_of_plots_with_stem_measurements: only include stemobservation records which have actual stem IDs, not merely stem-related measurements (DBH, etc.)
Aaron Marcuse-Kubitza
05:51 AM Revision 12628: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: added `SET enable_seqscan = off;` to match what is done by rematerialize_out_view() to run the queries properly
Aaron Marcuse-Kubitza
05:42 AM Revision 12627: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: updated to DB
Aaron Marcuse-Kubitza
05:35 AM Revision 12626: bugfix: schemas/vegbien.sql: _plots_06_list_of_plots_with_stem_measurements: LEFT JOIN to project instead of inner joining, to get Postgres to use the right query plan. this is the last change needed to make query #6 runnable.
Aaron Marcuse-Kubitza
05:25 AM Revision 12625: bugfix: schemas/vegbien.sql: rematerialize_out_view(): run all queries with `SET enable_seqscan = off` to avoid slow query plans. this fixes _plots_06_list_of_plots_with_stem_measurements and significantly speeds up _plots_10_count_of_individuals_per_plot_in_each_project (and possibly others).
Aaron Marcuse-Kubitza
05:23 AM Revision 12624: schemas/vegbien.sql: locationevent: documented `CREATE INDEX locationevent_place_visit_id` runtime (3 min)
Aaron Marcuse-Kubitza
04:53 AM Revision 12623: fix: schemas/vegbien.sql: locationevent: added locationevent_place_visit_id index to facilitate joins to place_visit_id in the validations queries
Aaron Marcuse-Kubitza
02:26 AM Revision 12622: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: added description of join_collapse_limit config param (which should be turned off, although it is on by default). added links for using TIDs ("the fastest possible access to a single row").
Aaron Marcuse-Kubitza

03/06/2014

10:45 PM Revision 12621: bugfix: schemas/vegbien.sql: source_by_shortname(): documented that in some cases, it is actually a bad idea to use a nested SELECT, because this will prevent Postgres from using an index scan (causing a slowdown as bad as not inlining in cases where a nested SELECT is required).
Aaron Marcuse-Kubitza
10:26 PM Revision 12620: schemas/postgresql.conf: log_min_messages: dropped the verbosity back down to the default, to avoid clogging up the logs
Aaron Marcuse-Kubitza
10:21 PM Revision 12619: schemas/vegbien.sql: locationevent: documented `VACUUM ANALYZE` runtime (20 min)
Aaron Marcuse-Kubitza
09:51 PM Revision 12618: schemas/postgresql.conf: log_min_messages: show what autovacuum is doing
Aaron Marcuse-Kubitza
09:40 PM Revision 12617: fix: schemas/postgresql.conf: disable autovacuum_vacuum_cost_delay to avoid stalling autovacuuming due to a concurrent query, as this can prevent autovacuuming from happening altogether (http://vegpath.org/links/#PostgreSQL:%20Documentation:%209.3:%20Resource%20Consumption:%2018.4.4.%20Cost-based%20Vacuum%20Delay)
Aaron Marcuse-Kubitza
09:37 PM Revision 12616: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: added links for troubleshooting autovacuuming (which can slow queries down significantly when it isn't happening for any tables)
Aaron Marcuse-Kubitza
07:35 PM Revision 12615: schemas/vegbien.sql: location: documented `CREATE INDEX plot_source_id` runtime (5 min)
Aaron Marcuse-Kubitza
07:30 PM Revision 12614: fix: schemas/vegbien.sql: location: added plot_source_id index to provide the equivalent of the location.source_id index for outer plots. this will help Postgres choose the right query plans in queries involving outer plots.
Aaron Marcuse-Kubitza
11:30 AM Revision 12613: planning/meetings/BIEN conference call availability.xlsx: updated
Aaron Marcuse-Kubitza
11:06 AM Revision 12612: bugfix: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: need to escape the quotes in \set ... 'SALVIAS'
Aaron Marcuse-Kubitza
11:04 AM Revision 12611: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: removed `SET search_path TO public;` since this is the default
Aaron Marcuse-Kubitza
11:03 AM Revision 12610: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: search_path: removed public_validations since we are not creating views
Aaron Marcuse-Kubitza
11:02 AM Revision 12609: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: use psql var :datasource instead of current_schema() so that the queries are runnable without special configuration of the search_path
Aaron Marcuse-Kubitza
10:59 AM Revision 12608: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: removed `CREATE OR REPLACE VIEW` so the validations views are not unintentionally replaced when running this file
Aaron Marcuse-Kubitza
10:57 AM Revision 12607: validation/aggregating/plots/bien3_validations_salvias_vegbien.sql: updated from DB
Aaron Marcuse-Kubitza
08:57 AM Revision 12606: schemas/vegbien.sql: _plots_18_list_of_subplots_codes_for_each_plot_for_each_project: added ~type table
Aaron Marcuse-Kubitza
08:52 AM Revision 12605: fix: inputs/SALVIAS/validations.sql: _plots_18_list_of_subplots_codes_for_each_plot_for_each_project: changed columns to match output query
Aaron Marcuse-Kubitza
08:31 AM Revision 12604: schemas/vegbien.sql: _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: added ~type table
Aaron Marcuse-Kubitza
08:29 AM Revision 12603: fix: inputs/SALVIAS/validations.sql: _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: changed types to match output query
Aaron Marcuse-Kubitza
08:14 AM Revision 12602: bugfix: inputs/SALVIAS/validations.sql: _plots_15_pct_cover_of_each_verb_taxon_in_each_plot_in_each_pro: changed summarizing column from mean_cover->totalpercentcover to match output query
Aaron Marcuse-Kubitza
08:12 AM Revision 12601: bugfix: inputs/SALVIAS/validations.sql: _plots_10a_aggregate_observation_individual_counts: changed individual_id type to match output query
Aaron Marcuse-Kubitza
02:18 AM Revision 12600: bugfix: schemas/Makefile: `%/install: vegbien.sql`: also need to match `public_validations` when used as a schema-qualifier (public_validations._), and after a cast (::) to a schema-qualified type. these occur in schema-qualified casts to the custom return type in the keys() functions.
Aaron Marcuse-Kubitza
01:59 AM Revision 12599: bugfix: schemas/Makefile: `%/install: vegbien.sql`: sed expr: need to use '' instead of "" because $(*q) may contain "
Aaron Marcuse-Kubitza
12:33 AM Revision 12598: bugfix: schemas/vegbien.sql: _plots_10a_aggregate_observation_individual_counts: need to use taxonoccurrence.sourceaccessioncode, not aggregateoccurrence.sourceaccessioncode, because aggregateoccurrence.sourceaccessioncode is not populated
Aaron Marcuse-Kubitza
12:09 AM Revision 12597: schemas/vegbien.sql: public_validations schema comment: documented how to remove a validations query so its columns can be changed (use public_validations.rm_query_view())
Aaron Marcuse-Kubitza
12:07 AM Revision 12596: schemas/vegbien.sql, inputs/SALVIAS/validations.sql: added _plots_10a_aggregate_observation_individual_counts, for use in debugging diffs in _plots_10_count_of_individuals_per_plot_in_each_proj
Aaron Marcuse-Kubitza
12:00 AM Revision 12595: schemas/util.sql: create_if_not_exists(): also support `CREATE FUNCTION` (by handling duplicate_function exceptions)
Aaron Marcuse-Kubitza
 
