Activity

From 03/30/2014 to 04/28/2014

04/28/2014

11:11 PM Revision 13329: fix: exports/.rsync_filter.upload: *.csv: don't allow test runs' exports to overwrite the backed up ones
Aaron Marcuse-Kubitza

04/25/2014

11:38 PM Revision 13328: fix: lib/sh/db.sh: psql(): removed debugging changes
Aaron Marcuse-Kubitza
11:36 PM Revision 13327: bugfix: lib/sh/util.sh: highlight_log_msg(): when not can_highlight_log_msg, need to remove any surrounding formatting
Aaron Marcuse-Kubitza
11:01 PM Revision 13326: fix: lib/sh/util.sh: die_error_hidden(): always log local vars at same log_level as echo_func
Aaron Marcuse-Kubitza
10:56 PM Revision 13325: fix: *{.sh,run}: always log kw_params at same log_level as echo_func
Aaron Marcuse-Kubitza
06:27 PM Revision 13324: lib/sh/util.sh: split_lines(): usage: matched up and synced different syntaxes
Aaron Marcuse-Kubitza
06:22 PM Revision 13323: bugfix: lib/sh/util.sh: log_msg!(): split_lines does not support being invoked by wrapper; need to use `declare lines; wrapper "split_lines" str` instead
Aaron Marcuse-Kubitza
06:21 PM Revision 13322: fix: lib/sh/util.sh: split_lines(): usage: documented different syntax for when using wrapper
Aaron Marcuse-Kubitza
06:09 PM Revision 13321: bugfix: lib/sh/util.sh: die_error_hidden(): echo_func to assist debugging
Aaron Marcuse-Kubitza
06:07 PM Revision 13320: bugfix: lib/sh/util.sh: split(): need to limit the effects of IFS to just the splitting, so it doesn't cause strange errors in other functions
Aaron Marcuse-Kubitza
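
The IFS scoping described in r13320 can be illustrated with a minimal sketch. It is not the project's `split()`: the helper name `count_fields` is hypothetical, and the subshell is only one way to confine the change (the actual fix in lib/sh/util.sh may use a different mechanism).

```shell
# Hypothetical helper, not the project's split(): prints the number of
# fields in $2 when split on the delimiter characters in $1.
count_fields() {
    # The parentheses run the body in a subshell, so the IFS and `set -f`
    # changes below cannot leak into the caller.
    (
        IFS=$1
        set -f          # disable globbing so fields like "*" stay literal
        set -- $2       # unquoted on purpose: word-split $2 using IFS
        echo "$#"
    )
}

count_fields , "a,b,c"
```

Because IFS is changed only inside the subshell, functions called afterwards still see the caller's original IFS.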
03:20 PM Revision 13319: bugfix: stderr2stdout(): fd 2 *must* be redirected back to fd 2, not log-filtered, in case there are other errors in addition to the benign error. this fixes a bug in pg_schema_exists(), where errors about the DB being down were not displayed because they were log-filtered out.
Aaron Marcuse-Kubitza
06:15 AM Revision 13318: lib/sh/make.sh: set_make_vars: don't display make vars at verbosity 2 to avoid clutter
Aaron Marcuse-Kubitza
05:58 AM Task #905 (Resolved): narrow down the cause of the import bug (incorrect join columns and disk space leak)
_see #887, #902_
h3. -alternate OS approach-
p(. _tried, and problem also occurs on Mac, so using other approac...
Aaron Marcuse-Kubitza

04/24/2014

05:34 PM Revision 13317: bugfix: lib/sh/make.sh: begin_target: don't echo_func twice
Aaron Marcuse-Kubitza
05:29 PM Revision 13316: inputs/GBIF/_MySQL/.rsync_ignore: added GBIFPortalDB-*.data.sql.gz, because these are intermediate files
Aaron Marcuse-Kubitza
05:02 PM Revision 13315: bugfix: /Makefile: $(pg_ctl-Darwin): need to call the command rather than echoing it, as is needed for the Linux version
Aaron Marcuse-Kubitza
04:59 PM Revision 13314: bugfix: /Makefile: $(pg_ctl-Darwin): need to `cd /` because due to pg_ctl bug, current directory must be accessible by it
Aaron Marcuse-Kubitza
03:49 PM Revision 13313: bugfix: lib/runscripts/util.run: a non-runscript should have all args passed to main(). this fixes a bug in backups/*_snapshot where "main" would need to be prepended to any args for the script to run correctly.
Aaron Marcuse-Kubitza
03:43 PM Revision 13312: bugfix: lib/runscripts/util.run: $wrap_fn: invoked script must always run as runscript so that wrapped command is run
Aaron Marcuse-Kubitza
03:40 PM Revision 13311: lib/runscripts/util.run: added $is_runscript, for use by $wrap_fn
Aaron Marcuse-Kubitza
03:36 PM Revision 13310: bugfix: lib/runscripts/util.run: $wrap_fn: $top_script doesn't need to be world-executable for most uses of sudo (only if sudoing to non-root)
Aaron Marcuse-Kubitza

04/23/2014

10:01 PM Revision 13309: bin/in_place: diff: use --brief to avoid scanning the entire file for large files
Aaron Marcuse-Kubitza
09:57 PM Revision 13308: bin/in_place: added $preserve_mtime flag
Aaron Marcuse-Kubitza
07:22 PM Task #887: fix disk space leak that fills the disk and crashes the import
VM upgraded to Ubuntu 14.04 and using the official Ubuntu version of Postgres, but problem still occurs
Aaron Marcuse-Kubitza
07:16 PM Revision 13307: web/links/index.htm: updated to Firefox bookmarks: Ubuntu 14.04 upgrade: Apache: documented that MultiViews is actually only broken for redirects with the filename "index"
Aaron Marcuse-Kubitza
07:11 PM Revision 13306: web/.htaccess: for dirs, redirect to index.*: document it is actually just the filename "index" that MultiViews is broken for, other filenames work fine
Aaron Marcuse-Kubitza
06:06 PM Task #903 (Resolved): fix Ubuntu 14.04 upgrade bug that prevents Apache from displaying vegbiendev.nceas.ucsb.edu properly
Aaron Marcuse-Kubitza
06:00 PM Task #903: fix Ubuntu 14.04 upgrade bug that prevents Apache from displaying vegbiendev.nceas.ucsb.edu properly
added workaround for broken MultiViews
Aaron Marcuse-Kubitza
04:14 PM Task #903 (Resolved): fix Ubuntu 14.04 upgrade bug that prevents Apache from displaying vegbiendev.nceas.ucsb.edu properly
this consists of 2 problems:
# -http://vegbiendev.nceas.ucsb.edu/index.php now includes the VegCore wiki page inst...
Aaron Marcuse-Kubitza
06:02 PM Revision 13305: bugfix: web/index.php: full directory index: only display if invoked as "vegpath.org/", not "vegpath.org/index.php"
Aaron Marcuse-Kubitza
05:58 PM Revision 13304: bugfix: web/.htaccess: for dirs, redirect to index.*: added workaround for Ubuntu 14.04, which breaks MultiViews
Aaron Marcuse-Kubitza
05:56 PM Revision 13303: /Makefile: postgres-Linux: updated to use the official version that comes with Ubuntu 14.04
Aaron Marcuse-Kubitza
05:53 PM Revision 13302: web/links/index.htm: updated to Firefox bookmarks: Ubuntu 14.04 upgrade: Apache: documented that this breaks MultiViews, so you need to rewrite .htaccess files to avoid using MultiViews
Aaron Marcuse-Kubitza
05:24 PM Revision 13301: web/links/index.htm: updated to Firefox bookmarks: Ubuntu 14.04 upgrade: added Postgres upgrading instructions
Aaron Marcuse-Kubitza
05:05 PM Revision 13300: _license/non-open-source/applies_to.txt: Brad: added "anything he created while not working for iPlant, from 2013-7-1..10-31"
Aaron Marcuse-Kubitza
04:53 PM Revision 13299: web/links/index.htm: updated to Firefox bookmarks: Ubuntu 14.04 upgrade
Aaron Marcuse-Kubitza
04:47 PM Revision 13298: web/links/index.htm: updated to Firefox bookmarks: Ubuntu 14.04 upgrade: added phpMyAdmin fixing instructions
Aaron Marcuse-Kubitza
04:45 PM Revision 13297: web/links/index.htm: updated to Firefox bookmarks: Ubuntu 14.04 upgrade: added phpMyAdmin fixing instructions
Aaron Marcuse-Kubitza
04:42 PM Task #904 (Resolved): add MySQL public user to allow accessing the normalized VegCore data dictionary
__* see "normalized VegCore data dictionary":http://vegbiendev.nceas.ucsb.edu/vegbiendev/db/my/VegCore
Aaron Marcuse-Kubitza
04:31 PM Revision 13296: web/links/index.htm: updated to Firefox bookmarks: Ubuntu: Ubuntu 14.04 upgrade: added things broken by it. PostgreSQL: fixed links.
Aaron Marcuse-Kubitza
04:00 PM Task #884 (Rejected): fix Postgres bug that causes query planner to use seq scans and slow sorts instead of index scans in the import
duplicate of #902: slow sorts are caused by joining on the wrong columns, not query planner settings
Aaron Marcuse-Kubitza
03:52 PM Task #902 (Resolved): fix bug that causes joining on the wrong columns in the import
_bug fixed in r14074_
h3. issue
* in some queries, the columns being joined on are completely the wrong set (co...
Aaron Marcuse-Kubitza
03:16 PM Task #901: schedule regular pg_dump backups of the DB
see @backups/pg_snapshot@, @backups/mysql_snapshot@
Aaron Marcuse-Kubitza
01:08 PM Revision 13295: /Makefile: postgres-Linux: added warning that the install commands were designed to run on Ubuntu 12.04, which is no longer the version used by vegbiendev (it is now 14.04)
Aaron Marcuse-Kubitza
12:09 PM Revision 13294: backups/mysql_snapshot: documented initial vegbiendev->jupiter upload time for GBIF/raw_occurrence_record.MYD (7 h for 91 GB = 3.7 MB/s)
Aaron Marcuse-Kubitza
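
The upload rate quoted in r13294 can be rechecked with a short calculation. This sketch assumes that "GB" there means 1024 MB; the variable names are illustrative only.

```shell
# Assumption: 1 GB = 1024 MB; with 1 GB = 1000 MB the result rounds to 3.6.
size_gb=91
hours=7
rate=$(awk -v gb="$size_gb" -v h="$hours" \
    'BEGIN { printf "%.1f", gb * 1024 / (h * 3600) }')
echo "$rate MB/s"
```

Under that assumption the result matches the 3.7 MB/s figure in the commit message.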
12:12 AM Revision 13293: fix: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: backups: wal_keep_segments method: clarified how to determine the value of wal_keep_segments. filesystem-level backups: documented the advantages of filesystem-level backups over traditional database-level backups with pg_dump.
Aaron Marcuse-Kubitza

04/22/2014

11:32 PM Revision 13292: fix: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: backups: wal_keep_segments: restored annotations
Aaron Marcuse-Kubitza
11:26 PM Revision 13291: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: backups: documented how to set up online and offline backups (with two possible approaches for online backups)
Aaron Marcuse-Kubitza
11:24 PM Revision 13290: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: backups: documented how to set up online and offline backups (with two possible approaches for online backups)
Aaron Marcuse-Kubitza
11:21 PM Revision 13289: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: backups: documented how to set up online and offline backups (with two possible approaches for online backups)
Aaron Marcuse-Kubitza
10:33 PM Revision 13288: lib/sh/db.sh: pg_snapshot(): perform online backup if possible, to avoid needing to restart the server
Aaron Marcuse-Kubitza
10:31 PM Revision 13287: lib/sh/db.sh: added pg_start_backup(), pg_stop_backup()
Aaron Marcuse-Kubitza
10:29 PM Revision 13286: lib/sh/db.sh: psql(): only set redirs if can redirect
Aaron Marcuse-Kubitza
10:17 PM Revision 13285: bugfix: psql(): when as_root is on, need to avoid redirections as these are not passed through by sudo
Aaron Marcuse-Kubitza
08:14 PM Revision 13284: /README.TXT: use `sudo -u ... -i` instead of `sudo su - ...` to avoid using two commands to accomplish the login
Aaron Marcuse-Kubitza
06:01 PM Revision 13283: bugfix: lib/sh/db.sh: psql(): don't use `--file /dev/fd/40` when can't redirect
Aaron Marcuse-Kubitza
05:59 PM Revision 13282: fix: lib/sh/db.sh: psql(): when using sudo with custom $stdin, raise error that this is not supported
Aaron Marcuse-Kubitza
05:50 PM Revision 13281: lib/sh/db.sh: psql(): $data_bypasses_filter: renamed to $data2stdout for clarity
Aaron Marcuse-Kubitza
05:29 PM Revision 13280: lib/sh/db.sh: psql(): $bypass_ok: renamed to $can_redir for clarity
Aaron Marcuse-Kubitza
05:22 PM Revision 13279: lib/sh/db.sh: psql(): usage: $stdin: documented that can also use process substitution for this
Aaron Marcuse-Kubitza
04:45 PM Revision 13278: bugfix: lib/sh/util.sh: `type` calls: need -- before cmd in case it starts with -
Aaron Marcuse-Kubitza
04:44 PM Revision 13277: lib/sh/util.sh: cmd2rel_path: use is_extern()
Aaron Marcuse-Kubitza
04:00 PM Revision 13276: lib/sh/sync.sh: db_snapshot(): use `end_try` as specified in `try` usage
Aaron Marcuse-Kubitza
03:59 PM Revision 13275: lib/sh/util.sh: try: usage: added location of finally block
Aaron Marcuse-Kubitza
03:57 PM Revision 13274: schemas/postgresql.conf: wal_level: set to hot_standby to enable online backup with pg_start_backup()
Aaron Marcuse-Kubitza
03:04 PM Revision 13273: lib/sh/sync.sh: upload(): always print the function and kw_params
Aaron Marcuse-Kubitza
04:00 AM Revision 13272: added backups/mysql_snapshot, pg_snapshot
Aaron Marcuse-Kubitza

04/21/2014

08:21 PM Revision 13271: bugfix: lib/sh/util.sh: type(): need to handle options before command name
Aaron Marcuse-Kubitza
08:21 PM Revision 13270: lib/sh/util.sh: added 1st_non_opt()
Aaron Marcuse-Kubitza
08:07 PM Revision 13269: lib/sh/util.sh: unalias(): use self_builtin, which is now defined before it
Aaron Marcuse-Kubitza
08:01 PM Revision 13268: lib/runscripts/util.run: sudo(): avoid slow $wrap_fn when using `command` (ie. always executable)
Aaron Marcuse-Kubitza
07:59 PM Revision 13267: lib/sh/util.sh: unalias(): use self_builtin, which is now defined before it
Aaron Marcuse-Kubitza
07:57 PM Revision 13266: bugfix: lib/sh/util.sh: commands: `type` calls: need to account for the fact that any alias is already expanded
Aaron Marcuse-Kubitza
07:56 PM Revision 13265: lib/sh/util.sh: functions: moved before commands since commands are more complex
Aaron Marcuse-Kubitza
07:38 PM Revision 13264: lib/sh/sync.sh: db_copy() and callers: pass args as rsync options
Aaron Marcuse-Kubitza
07:28 PM Revision 13263: fix: lib/sh/sync.sh: db_copy(): need to exclude files which prevent tape backup
Aaron Marcuse-Kubitza
07:22 PM Revision 13262: lib/sh/db.sh: added pg_ctl(), pg_snapshot()
Aaron Marcuse-Kubitza
07:17 PM Revision 13261: lib/sh/sync.sh: db_snapshot(): copy changes before stopping DB to minimize the time that it's shut down
Aaron Marcuse-Kubitza
07:12 PM Revision 13260: lib/sh/sync.sh: db_snapshot(): factored copy operation out into separate db_copy() function
Aaron Marcuse-Kubitza
07:03 PM Revision 13259: lib/sh/db.sh: mysql_snapshot(): use new db_snapshot()
Aaron Marcuse-Kubitza
07:02 PM Revision 13258: lib/sh/sync.sh: added db_snapshot()
Aaron Marcuse-Kubitza
05:55 PM Revision 13257: lib/Firefox_bookmarks.reformat.csv: changed "page's own description" to "page's self-description" for clarity
Aaron Marcuse-Kubitza
05:50 PM Revision 13256: web/links/index.htm: updated to Firefox bookmarks: removed dead links
Aaron Marcuse-Kubitza
05:43 PM Revision 13255: web/links/index.htm: updated to Firefox bookmarks: updated favicons
Aaron Marcuse-Kubitza
05:28 PM Revision 13254: web/links/index.htm: updated to Firefox bookmarks: modifying a running shell script: updated to document that `svn up` actually *does* use two-stage save automatically
Aaron Marcuse-Kubitza
04:46 PM Revision 13253: lib/sh/db.sh: mysql_snapshot(): for large files, don't re-copy entire file
Aaron Marcuse-Kubitza
04:44 PM Revision 13252: lib/sh/db.sh: mysql_snapshot(): use live mode as the default
Aaron Marcuse-Kubitza
04:30 PM Revision 13251: fix: lib/sh/db.sh: mysql_snapshot(): need to create dest dir if doesn't exist
Aaron Marcuse-Kubitza
04:27 PM Revision 13250: bugfix: lib/sh/db.sh: mysql_snapshot(): try: need to use split syntax with prep_try instead, to work with prefix vars
Aaron Marcuse-Kubitza
04:23 PM Revision 13249: bugfix: lib/sh/db.sh: mysql_snapshot(): try: need to use split syntax with prep_try instead, to work with prefix vars
Aaron Marcuse-Kubitza
04:20 PM Revision 13248: fix: lib/sh/util.sh: try usage: documented that the split syntax with prep_try is meant to be used with vars before the cmd
Aaron Marcuse-Kubitza
03:37 PM Revision 13247: fix: lib/sh/util.sh: echo_vars(): also need to print unset vars (including unset kw_params)
Aaron Marcuse-Kubitza
03:31 PM Revision 13246: lib/sh/util.sh: echo_vars(): put loop var on same line as `for`
Aaron Marcuse-Kubitza
02:59 PM Revision 13245: bugfix: lib/sh/util.sh: sudo(): need to preserve PATH separately because -E does not preserve this
Aaron Marcuse-Kubitza
02:17 PM Revision 13244: lib/sh/util.sh: echo_redirs_cmd(): inline the function alias since it's only used in one place
Aaron Marcuse-Kubitza
02:15 PM Revision 13243: bugfix: lib/sh/util.sh: redir(): need to load new aliases before it
Aaron Marcuse-Kubitza
02:13 PM Revision 13242: lib/sh/util.sh: echo_redirs_cmd(): log $PATH to facilitate troubleshooting
Aaron Marcuse-Kubitza
01:54 PM Revision 13241: lib/sh/util.sh: echo_redirs_cmd(): documented what the $(...) section does
Aaron Marcuse-Kubitza
01:50 PM Revision 13240: lib/sh/util.sh: echo_redirs_cmd(): moved comment about <>file redirs to line that it applies to
Aaron Marcuse-Kubitza
01:47 PM Revision 13239: lib/sh/util.sh: moved echo_redirs_cmd() to right before redir() which uses it
Aaron Marcuse-Kubitza
02:55 AM Revision 13238: lib/sh/util.sh: catch(): log at higher log_level, since this is internal code
Aaron Marcuse-Kubitza
02:43 AM Revision 13237: fix: lib/sh/util.sh: die_e(): treat SIGPIPE as benign error
Aaron Marcuse-Kubitza
02:32 AM Revision 13236: lib/sh/util.sh: removed no longer used ignore_sig(). use ignore() instead, which now supports SIG*.
Aaron Marcuse-Kubitza
02:32 AM Revision 13235: lib/sh/util.sh: piped_cmd(): use ignore, which now supports SIG*
Aaron Marcuse-Kubitza
02:31 AM Revision 13234: lib/sh/util.sh: signals: catch(): added echo_func
Aaron Marcuse-Kubitza
02:28 AM Revision 13233: lib/sh/util.sh: set_global_fds(): debug to global stderr in case stderr filtered
Aaron Marcuse-Kubitza
02:26 AM Revision 13232: lib/sh/util.sh: debugging: use configurable debug_fd (set to $err_fd)
Aaron Marcuse-Kubitza
02:13 AM Revision 13231: lib/sh/util.sh: signals: override catch() to support SIG* as exception type
Aaron Marcuse-Kubitza
02:11 AM Revision 13230: lib/sh/util.sh: moved primitives sections before more complex sections that depend on them
Aaron Marcuse-Kubitza
02:07 AM Revision 13229: lib/sh/util.sh: 2nd functions section: moved to 1st functions section
Aaron Marcuse-Kubitza
01:16 AM Revision 13228: bugfix: lib/sh/util.sh: added workaround for bash bug where exit sometimes inexplicably ignores $?
Aaron Marcuse-Kubitza
01:15 AM Revision 13227: fix: lib/sh/util.sh: self_builtin: avoid $() so that $? isn't modified
Aaron Marcuse-Kubitza
01:07 AM Revision 13226: lib/sh/util.sh: use new self_builtin
Aaron Marcuse-Kubitza
01:06 AM Revision 13225: lib/sh/util.sh: added self_builtin
Aaron Marcuse-Kubitza
12:50 AM Revision 13224: lib/sh/util.sh: pv(), pf(): moved to debugging section
Aaron Marcuse-Kubitza
12:48 AM Revision 13223: bugfix: lib/sh/util.sh: stderr_matches(): also need to handle any filter error, such as caused by Ctrl+C
Aaron Marcuse-Kubitza
12:26 AM Revision 13222: lib/sh/util.sh: stderr_matches(): echo_vars @PIPESTATUS_ to assist debugging
Aaron Marcuse-Kubitza

04/20/2014

11:37 PM Revision 13221: bugfix: lib/sh/util.sh: stderr2stdout(): use piped_cmd to ignore SIGPIPE since the output of this will be piped to another command
Aaron Marcuse-Kubitza
06:22 PM Revision 13220: lib/sh/util.sh: setup_log_fd(): $log_fd: use 3 (stdlog) since other scripts are likely to use this for logging as well
Aaron Marcuse-Kubitza
06:20 PM Revision 13219: fix: lib/sh/util.sh: setup_log_fd(): fd_set_default(): use $log_fd instead of repeating the value of it
Aaron Marcuse-Kubitza
06:05 PM Revision 13218: lib/sh/util.sh: die(): log at higher log_level, since this is logging code
Aaron Marcuse-Kubitza
06:02 PM Revision 13217: lib/sh/util.sh: highlight_log_msg(): log at higher log_level, since this is logging code
Aaron Marcuse-Kubitza
05:54 PM Revision 13216: bugfix: lib/runscripts/util.run: $subdirs: adjusted log_level now that echo_vars is one log_level lower
Aaron Marcuse-Kubitza
05:46 PM Revision 13215: bugfix: lib/sh/util.sh: stderr_matches(): only set benign_error=1 if the matched error occurred
Aaron Marcuse-Kubitza
05:44 PM Revision 13214: lib/sh/util.sh: ignore_e(): also set benign_error=1
Aaron Marcuse-Kubitza
05:40 PM Revision 13213: fix: lib/sh/util.sh: prep_try alias: removed inaccurate comment
Aaron Marcuse-Kubitza
05:33 PM Revision 13212: bugfix: lib/sh/util.sh: stdout2fd(): moved after redir() which it depends on
Aaron Marcuse-Kubitza
05:24 PM Revision 13211: fix: lib/sh/util.sh: command(): moved `|| die_e` to command__exec so it would be properly indented under the echoed command
Aaron Marcuse-Kubitza
05:12 PM Revision 13210: lib/sh/util.sh: verbosity_compat(): log at higher log_level because it's logging code
Aaron Marcuse-Kubitza
05:10 PM Revision 13209: lib/sh/util.sh: $benign_error: log at higher log_level because it's logging code
Aaron Marcuse-Kubitza
05:06 PM Revision 13208: lib/runscripts/util.run: $wrap_fn: log at higher log_level because it's startup code
Aaron Marcuse-Kubitza
04:55 PM Revision 13207: lib/sh/util.sh: $top_* vars, $is_outermost: log at higher log_level because it's startup code
Aaron Marcuse-Kubitza
04:52 PM Revision 13206: lib/sh/util.sh: $top_script: echo_vars this like the other $top_* vars
Aaron Marcuse-Kubitza
04:50 PM Revision 13205: lib/sh/util.sh: .(): log at higher log_level because it's startup code
Aaron Marcuse-Kubitza
04:45 PM Revision 13204: lib/sh/util.sh: is_dot_script(): run with higher log_level since this is run at the beginning of the script
Aaron Marcuse-Kubitza
04:44 PM Revision 13203: lib/sh/util.sh, runscripts/util.run: set_paths(): run with higher log_level to hide all the paths that are set at the beginning of the script
Aaron Marcuse-Kubitza
04:28 PM Revision 13202: lib/sh/util.sh: added log++ stub
Aaron Marcuse-Kubitza
04:22 PM Revision 13201: lib/sh/util.sh: added log_local stub
Aaron Marcuse-Kubitza
03:15 PM Revision 13200: lib/sh/util.sh: added log() stub so internal commands can use it
Aaron Marcuse-Kubitza
03:10 PM Revision 13199: fix: lib/sh/util.sh: echo_vars(): log at same log_level as echo_func so kw_params are displayed along with positional params
Aaron Marcuse-Kubitza
03:08 PM Revision 13198: fix: lib/sh/util.sh: rel_path(): log this internal command at a higher log_level so it's normally hidden
Aaron Marcuse-Kubitza
02:43 PM Revision 13197: fix: lib/sh/util.sh: log_msg!(): log split_lines at a higher log_level so it's normally hidden
Aaron Marcuse-Kubitza

04/19/2014

10:22 PM Revision 13196: bugfix: lib/sh/util.sh: stderr_matches(): `log_local; log++` should apply to just stdout_contains() and part of stderr2stdout() rather than all of stderr_matches()
Aaron Marcuse-Kubitza
10:14 PM Revision 13195: inputs/Madidi/_src/: set svn:ignore
Aaron Marcuse-Kubitza
10:13 PM Revision 13194: added backups/vegbien.r13002.backup.md5, vegbien.r13160.backup.md5
Aaron Marcuse-Kubitza
10:12 PM Revision 13193: backups/TNRS.backup.md5: updated
Aaron Marcuse-Kubitza
10:09 PM Revision 13192: lib/sh/util.sh: stderr_matches(): run at higher log_level because error-handling internals should not be logged by default
Aaron Marcuse-Kubitza
10:07 PM Revision 13191: bugfix: lib/sh/db.sh: mysql_ctl(): need to ignore errors if not running
Aaron Marcuse-Kubitza
10:04 PM Revision 13190: bugfix: lib/sh/util.sh: stderr_matches(): handle any error: only ignore_e if the error exit status was associated with the matched error message
Aaron Marcuse-Kubitza
09:57 PM Revision 13189: bugfix: lib/sh/util.sh: stderr_matches(): handle any error: need force-exit with rethrow_exit() because caller's test of return status disables errexit
Aaron Marcuse-Kubitza
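
The shell behavior behind r13189 can be reproduced with a small sketch: when a function is called as the condition of an `if`, `set -e` (errexit) is ignored inside it, so a failing command does not stop the function. The functions `f` and `g` are hypothetical, and the explicit `|| exit` guard only approximates what `rethrow_exit()` is described as doing; its actual implementation is not shown in this log.

```shell
# Case 1: errexit is ignored inside f because f is tested by `if`.
out=$(
    set -e
    f() { false; echo "still running"; }
    if f; then echo "f succeeded"; fi
)
printf '%s\n' "$out"

# Case 2: an explicit guard forces the exit regardless of errexit.
status=0
out2=$(
    set -e
    g() { false || exit "$?"; echo "not reached"; }
    if g; then echo "g succeeded"; fi
) || status=$?
echo "guarded run exited with status $status"
```

In the first case the function runs past the failing command and is reported as successful; in the second, the guard ends the subshell with the failing command's status.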
09:54 PM Revision 13188: lib/sh/util.sh: added rethrow_exit(), which exits even where errexit is disabled
Aaron Marcuse-Kubitza
09:48 PM Revision 13187: bugfix: lib/sh/db.sh: mysql_snapshot(): need to run `mysql_ctl start` even if there is an error
Aaron Marcuse-Kubitza
09:38 PM Revision 13186: lib/sh/db.sh: mysql_snapshot(): $to: default to $from.bak
Aaron Marcuse-Kubitza
08:06 PM Revision 13185: lib/sh/db.sh: added mysql_snapshot()
Aaron Marcuse-Kubitza
07:46 PM Revision 13184: lib/sh/db.sh: added mysql_ctl()
Aaron Marcuse-Kubitza
07:35 PM Revision 13183: lib/sh/db.sh: pg_cmd(): updated to use new sudo()
Aaron Marcuse-Kubitza
07:14 PM Revision 13182: lib/runscripts/util.run: added sudo() override that uses $wrap_fn to support shell functions
Aaron Marcuse-Kubitza
07:13 PM Revision 13181: fix: lib/runscripts/util.run: $wrap_fn: make it usable even if $top_script isn't world-executable
Aaron Marcuse-Kubitza
07:11 PM Revision 13180: lib/sh/util.sh: sudo alias: use function instead so this can be overridden
Aaron Marcuse-Kubitza
07:09 PM Revision 13179: lib/sh/util.sh: added is_intern()
Aaron Marcuse-Kubitza
07:07 PM Revision 13178: lib/sh/util.sh: is_callable(): use just $1 because multiple args are not applicable
Aaron Marcuse-Kubitza
07:06 PM Revision 13177: lib/sh/util.sh: added is_world_executable()
Aaron Marcuse-Kubitza
07:06 PM Revision 13176: lib/sh/util.sh: added has_perms()
Aaron Marcuse-Kubitza
06:49 PM Revision 13175: lib/sh/util.sh: esc_args(): renamed to just esc() because this can also be used on a single value
Aaron Marcuse-Kubitza
05:51 PM Revision 13174: lib/sh/util.sh: added is_extern()
Aaron Marcuse-Kubitza
10:57 AM Revision 13173: lib/sh/util.sh: added sudo alias to alias-expand command
Aaron Marcuse-Kubitza
10:57 AM Revision 13172: lib/sh/db.sh: pg_cmd(): $as_root: use $sudo
Aaron Marcuse-Kubitza
10:54 AM Revision 13171: lib/sh/util.sh: added $sudo
Aaron Marcuse-Kubitza
10:24 AM Revision 13170: lib/sh/util.sh: added cp alias
Aaron Marcuse-Kubitza
09:55 AM Revision 13169: lib/sh/db.sh: removed no longer used pg_as_root(), which was buggy anyway. use `as_root=1 ...` instead.
Aaron Marcuse-Kubitza
09:38 AM Revision 13168: lib/sh/db.sh: mysql_ANSI: fixed comment
Aaron Marcuse-Kubitza

04/18/2014

06:57 PM Revision 13167: added backups/users.sql.run
Aaron Marcuse-Kubitza
05:34 PM Revision 13166: lib/sh/db.sh: pg_dump(): support dumping entire cluster, and cluster users
Aaron Marcuse-Kubitza
05:10 PM Revision 13165: lib/sh/db.sh: pg_cmd(): added $as_root switch
Aaron Marcuse-Kubitza

04/17/2014

08:21 PM Revision 13164: fix: inputs/SALVIAS/projects/postprocess.sql: remove private data that should not be publicly visible: preserve datasets with ipr_specific = '', because they *are* actually redistributable, according to Brad (http://wiki.vegpath.org/2014-04-17_conference_call#conditions-of-use)
Aaron Marcuse-Kubitza
08:14 PM Task #887: fix disk space leak that fills the disk and crashes the import
main DB backed up, (close to?) ready to roll back and/or upgrade the VM
Aaron Marcuse-Kubitza
10:31 AM Task #887: fix disk space leak that fills the disk and crashes the import
submitted support request to restore vegbiendev to last working configuration and install a past revision of Postgres...
Aaron Marcuse-Kubitza
08:12 PM Task #901: schedule regular pg_dump backups of the DB
seem to have settled on shutting down VM before tape backup as the approach for this
Aaron Marcuse-Kubitza
01:23 PM Task #901 (New): schedule regular pg_dump backups of the DB
* this is not backed up with the rest of the VM due to bandwidth limitations and available tape drive space
* use @-...
Aaron Marcuse-Kubitza
05:28 PM Revision 13163: web/links/index.htm: updated to Firefox bookmarks: PostgreSQL: added operator classes. added backups: filesystem-level backup, continuous archiving, WAL logging, etc. virtual collaboration: updated annotations.
Aaron Marcuse-Kubitza
02:52 PM Revision 13162: lib/sh/db.sh pg_dump(), bin/pg_dump_vegbien: --format=plain: removed comment that this is the plain format, because this is now self-documenting
Aaron Marcuse-Kubitza
02:51 PM Revision 13161: lib/sh/db.sh pg_dump(), bin/pg_dump_vegbien: --format: use the long form of the formats to make the code self-documenting
Aaron Marcuse-Kubitza
03:44 AM Revision 13160: validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB
Aaron Marcuse-Kubitza
03:41 AM Revision 13159: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql
Aaron Marcuse-Kubitza
03:32 AM Revision 13158: schemas/vegbien.ERD.mwb: regenerated exports
Aaron Marcuse-Kubitza
03:31 AM Revision 13157: fix: lib/PostgreSQL-MySQL.csv: preserve schema assignments by translating `SET search_path` to `USE`
Aaron Marcuse-Kubitza
02:54 AM Revision 13156: schemas/vegbien.ERD.mwb: regenerated exports
Aaron Marcuse-Kubitza
02:53 AM Revision 13155: schemas/vegbien.ERD.mwb: added geoscrub, TNRS tables, as requested in the 2014-04-10 conference call (wiki.vegpath.org/2014-04-10_conference_call#VegBIEN-schema)
Aaron Marcuse-Kubitza
02:39 AM Revision 13154: schemas/Makefile: vegbien.sql: also include geoscrub, TNRS schemas, as requested in the 2014-04-10 conference call (wiki.vegpath.org/2014-04-10_conference_call#VegBIEN-schema). this involves having a separate public_.sql file for restoring the public schema.
Aaron Marcuse-Kubitza

04/16/2014

11:02 PM Revision 13153: schemas/vegbien.ERD.mwb: regenerated exports
Aaron Marcuse-Kubitza
10:49 PM Revision 13152: bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: updated filter condition to match output query
Aaron Marcuse-Kubitza
10:48 PM Revision 13151: inputs/NY/run: `make inputs/NY/validate`: updated runtime (8 min, with added queries)
Aaron Marcuse-Kubitza
10:24 PM Revision 13150: fix: inputs/NY/Ecatalog_all/map.csv, postprocess.sql: remapped substrate, vegetation to locationRemarks
Aaron Marcuse-Kubitza
10:14 PM Task #899 (New): remove dependencies on Mac
* avoids needing to support Mac as well as Linux in all our scripts
** note that Mac software must be installed manu...
Aaron Marcuse-Kubitza
10:10 PM Task #898 (New): remove dependencies on the development machine
* the development process should not require both a VM and a specially-configured local machine to make changes to th...
Aaron Marcuse-Kubitza
06:41 PM Revision 13149: fix: inputs/NY/Ecatalog_all/map.csv, postprocess.sql: remapped substrate, vegetation to locationRemarks
Aaron Marcuse-Kubitza
06:35 PM Revision 13148: bugfix: lib/runscripts/import.run: all(): also need to propagate $rm to import()
Aaron Marcuse-Kubitza
04:24 PM Revision 13147: bugfix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13*: also need to include coordinate pairs which have one of their coordinates NULL, by using OR instead of AND
Aaron Marcuse-Kubitza
04:15 PM Revision 13146: bugfix: inputs/NY/validations.sql: _specimens_13b_list_of_all_decimal_lat_long: matched column types to output query
Aaron Marcuse-Kubitza
04:14 PM Revision 13145: bugfix: inputs/NY/validations.sql: _specimens_13a_list_of_all_verbatim_lat_long: matched column types to output query
Aaron Marcuse-Kubitza
03:13 PM Revision 13144: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: added breakdowns _specimens_13a_list_of_all_verbatim_lat_long, _specimens_13b_list_of_all_decimal_lat_long to help troubleshoot the diff
Aaron Marcuse-Kubitza
02:04 PM Revision 13143: fix: inputs/NY/validations.sql, schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: count lat/longs together instead of separately, because the DISTINCT is by coordinate *pair*, not individual coordinate value (which wouldn't make much sense)
Aaron Marcuse-Kubitza

04/15/2014

08:12 PM Revision 13142: bugfix: schemas/vegbien.sql: rm_output_queries(): need to account for the fact that util.truncated_prefixed_name_regexp() returns a whole-string regexp. this drops support for removing output queries with a particular group prefix, which we no longer use.
Aaron Marcuse-Kubitza
07:59 PM Revision 13141: bugfix: schemas/vegbien.sql: rm_output_queries(): need to include relations whose names were truncated, as well
Aaron Marcuse-Kubitza
07:14 PM Revision 13140: fix: schemas/vegbien.sql: public_validations schema comment: to remove a validations query so its columns can be changed: use rm_output_queries() rather than rm_query_view() because that also removes input queries
Aaron Marcuse-Kubitza
07:00 PM Revision 13139: bugfix: schemas/util.sql: is_castable(): need to pass NULL through, for proper NULL propagation
Aaron Marcuse-Kubitza
06:52 PM Revision 13138: fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: use new is_castable(), which is much more accurate than Brad's custom regexp for determining if something is numeric
Aaron Marcuse-Kubitza
06:29 PM Revision 13137: inputs/NY/validations.-.util.sql: added util.is_castable() wrapper
Aaron Marcuse-Kubitza
06:12 PM Revision 13136: schemas/util.sql: added is_castable()
Aaron Marcuse-Kubitza
06:10 PM Revision 13135: schemas/util.sql: added try_cast()
Aaron Marcuse-Kubitza
05:51 PM Revision 13134: schemas/util.sql: added util.cast(), which allows casting to an arbitrary type without eval()
Aaron Marcuse-Kubitza

04/14/2014

05:04 PM Revision 13133: bugfix: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: DISTINCT: added coordsaccuracy_m
Aaron Marcuse-Kubitza
05:02 PM Revision 13132: bugfix: schemas/vegbien.sql: coordinates_unique: added coordsaccuracy_m
Aaron Marcuse-Kubitza
04:56 PM Revision 13131: fix: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because the coordinates_unique unique constraint includes other columns as well, so there may be multiple instances of each lat/long
Aaron Marcuse-Kubitza
04:51 PM Revision 13130: bugfix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to include both lat and long in the value to DISTINCT on
Aaron Marcuse-Kubitza
04:48 PM Revision 13129: fix: inputs/NY/validations.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: need to DISTINCT the values that are being counted, because they are merged by the coordinates_unique unique constraint in the import
Aaron Marcuse-Kubitza
04:24 PM Revision 13128: validation/aggregating/pipeline/aggregating_validations_pipeline.odg: diff tables: integrated row labels into table
Aaron Marcuse-Kubitza
04:04 PM Revision 13127: validation/aggregating/pipeline/aggregating_validations_pipeline.odg: diff tables: added line for different rows (vs. missing/extra)
Aaron Marcuse-Kubitza
03:58 PM Revision 13126: inputs/NY/run: `make inputs/NY/validate`: documented slow queries: _specimens_12_distinct_collector_name_collect_num_date_w_count
Aaron Marcuse-Kubitza
03:23 PM Revision 13125: inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented slow queries (_plots_06a_list_of_stems). these may need to have their query plans rechecked.
Aaron Marcuse-Kubitza
03:22 PM Revision 13124: inputs/NY/run, inputs/SALVIAS/run_: `make inputs/.../validate`: updated runtime (+2 min)
Aaron Marcuse-Kubitza

04/11/2014

04:02 PM Task #887 (Rejected): fix disk space leak that fills the disk and crashes the import
_the bug that triggers this Postgres bug (#902) has now been fixed, so no need to fix this_
h3. issue
* in the ...
Aaron Marcuse-Kubitza

04/10/2014

04:06 PM Revision 13123: fix: inputs/NY/validations.sql: _specimens_*_of_unique_verbatim_author_taxa_with_genus: use scientificName rather than the concatenated ranks, because that is what is imported to taxonlabel.taxonomicname
Aaron Marcuse-Kubitza
03:52 PM Revision 13122: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql
Aaron Marcuse-Kubitza
03:50 PM Revision 13121: validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB
Aaron Marcuse-Kubitza
03:41 PM Revision 13120: fix: schemas/vegbien.sql: _specimens_*_of_unique_verb_subsp_taxa_with_author: include only names with subspecies (filtering by taxonverbatim.subspecies rather than taxonlabel.taxonomicname)
Aaron Marcuse-Kubitza
03:13 PM Revision 13119: bugfix: /README.TXT: Full database import: to import just a subset of the datasources: array env var needs to be set *after* opening the `screen` shell because array vars are apparently *not* inherited by the `screen` shell
Aaron Marcuse-Kubitza
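The array-variable behavior noted in r13119 is not specific to `screen`: bash does not pass array variables to child processes through the environment, whereas exported scalar variables are inherited. A minimal sketch of the difference follows; the names `import_label`, `datasources`, and `child_view` are hypothetical and are not taken from README.TXT.

```bash
#!/usr/bin/env bash
# Hypothetical variable and function names, for illustration only.

child_view() {
    # Print what a child bash process sees for each variable.
    bash -c 'echo "scalar=${import_label-unset} array=${datasources[*]-unset}"'
}

export import_label=test_run   # scalar: placed in the child's environment
datasources=(NY SALVIAS)
export datasources             # accepted by bash, but arrays are not exported

child_view
```

This is why an array variable has to be set inside the new shell rather than before starting it.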
02:42 PM Revision 13118: /README.TXT: Full database import: to import just a subset of the datasources: added step to set custom import name
Aaron Marcuse-Kubitza
02:41 PM Revision 13117: /README.TXT: Full database import: added instructions for importing just a subset of the datasources
Aaron Marcuse-Kubitza
02:38 PM Revision 13116: bugfix: lib/sh/util.sh: local_array/export_array: *do* need -a because that it's an array is apparently *not* autodetected by the () on Mac
Aaron Marcuse-Kubitza
02:24 PM Revision 13115: mappings/VegCore-VegBIEN.csv: mapped subspecies to new taxonverbatim.subspecies for easier access by validations queries
Aaron Marcuse-Kubitza
02:05 PM Revision 13114: bugfix: web/.phpPgAdmin/.htaccess: work around phpPgAdmin bug that causes page to be ignored when not logged in
Aaron Marcuse-Kubitza
01:25 PM Revision 13113: fix: inputs/test_taxonomic_names/Taxon/map.csv: scientificName: remapped to scientificName instead of taxonName as this does include the author for some names
Aaron Marcuse-Kubitza
01:25 PM Revision 13112: fix: inputs/NY/Ecatalog_all/map.csv: ScientificName: remapped to scientificName instead of taxonName as this does include the author
Aaron Marcuse-Kubitza
01:17 PM Revision 13111: fix: inputs/NY/validations.sql: _specimens_*_of_unique_verb_subsp_taxa_with_author: use taxonName instead of concatenating the ranks, as that corresponds to what we use as the concatenated taxonomic name
Aaron Marcuse-Kubitza
12:59 PM Revision 13110: bugfix: inputs/NY/validations.sql: _specimens_*_of_verbatim_subspecific_taxa_with_author: need `subspecies IS NOT NULL` filter
Aaron Marcuse-Kubitza
12:57 PM Revision 13109: bugfix: inputs/NY/validations.sql: _specimens_07_list_of_verbatim_subspecific_taxa_with_author: need to include subspecies (as _specimens_06_count_of_unique_verb_subsp_taxa_with_author does)
Aaron Marcuse-Kubitza
12:35 PM Revision 13108: web/.phpPgAdmin/.htaccess: extract path components 1st->last: documented that can't use subject param for this because that goes to the last selected tab, not the default (leftmost) tab
Aaron Marcuse-Kubitza
12:03 PM Revision 13107: bugfix: inputs/NY/validations.sql: _specimens_*_of_species_binomials: removed incorrect `subspecies IS NOT NULL` filter (this should be on *_of_unique_verb_subsp_taxa_with_author instead)
Aaron Marcuse-Kubitza
11:41 AM Revision 13106: schemas/vegbien.sql: taxonverbatim: added subspecies, as decided in the conference call (wiki.vegpath.org/2014-04-10_conference_call#VegBIEN-schema-2)
Aaron Marcuse-Kubitza
06:54 AM Revision 13105: fix: schemas/vegbien.sql: _plots_* with duplicated rows: removed duplicated rows
Aaron Marcuse-Kubitza
06:45 AM Revision 13104: schemas/vegbien.sql: _specimens_*: ran through pipeline
Aaron Marcuse-Kubitza
06:38 AM Revision 13103: removed old version validation/aggregating/plots/SALVIAS/bien3_validations_salvias_db_original.sql. use validation/aggregating/plots/SALVIAS/_archive/bien3_validations_salvias_db_original.sql instead.
Aaron Marcuse-Kubitza
06:19 AM Revision 13102: validation/aggregating/specimens/NY/qualitative_validations_source_db_NYBG.VegCore.sql: updated to inputs/NY/validations.sql
Aaron Marcuse-Kubitza
06:17 AM Revision 13101: validation/aggregating/specimens/qualitative_validations_specimens.sql: updated to DB
Aaron Marcuse-Kubitza
06:07 AM Revision 13100: schemas/vegbien.sql: _specimens_16_list_distinct_specimen_descriptions: re-ran through pipeline after removing duplicated rows
Aaron Marcuse-Kubitza
06:02 AM Revision 13099: schemas/vegbien.sql: rm_output_queries(): also support removing just a particular output query
Aaron Marcuse-Kubitza
05:26 AM Revision 13098: bugfix: schemas/util.sql: remake_diff_table(): need to rm_freq() type_table, because left/right_table don't have freq yet
Aaron Marcuse-Kubitza
05:18 AM Revision 13097: schemas/util.sql: auto_rm_freq(): use new rm_freq()
Aaron Marcuse-Kubitza
05:17 AM Revision 13096: schemas/util.sql: added rm_freq(regclass[])
Aaron Marcuse-Kubitza
03:45 AM Revision 13095: fix: inputs/NY/validations.sql: _specimens_16_list_distinct_specimen_descriptions: removed duplicated rows using DISTINCT
Aaron Marcuse-Kubitza
03:33 AM Revision 13094: schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: ran through pipeline
Aaron Marcuse-Kubitza
03:31 AM Revision 13093: fix: schemas/vegbien.sql: _specimens_11_list_of_three_standard_political_divisions: use same column names as input query
Aaron Marcuse-Kubitza
03:24 AM Task #345 (Resolved): integrate GNRS into VegBIEN
see "biengeo":http://vegbiendev.nceas.ucsb.edu/fs/derived/biengeo/
Aaron Marcuse-Kubitza
03:21 AM Task #326 (Rejected): generic MOU template to request data
making the database public instead
Aaron Marcuse-Kubitza
03:19 AM Task #485: track data provider's citation requirements in VegBIEN
the [[Datasource conditions of use|conditions of use]] have been gathered
Aaron Marcuse-Kubitza
03:10 AM Revision 13092: schemas/util.sql: remake_diff_table(): result table comment: documented how to display NULL values that are extra or missing
Aaron Marcuse-Kubitza
02:40 AM Revision 13091: schemas/vegbien.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: ran through pipeline
Aaron Marcuse-Kubitza
02:38 AM Revision 13090: fix: schemas/vegbien.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: also need to convert to text in GROUP BY/ORDER BY
Aaron Marcuse-Kubitza
02:34 AM Revision 13089: bugfix: inputs/NY/validations.sql: _specimens_03_list_of_verbatim_families: use family as specified in query description, not as implemented
Aaron Marcuse-Kubitza
02:32 AM Revision 13088: _license/UCSB/LICENSE.TXT: use (c) verbatim from the e-mail, not as displayed as © by Thunderbird
Aaron Marcuse-Kubitza
02:07 AM Revision 13087: bugfix: schemas/vegbien.sql, inputs/NY/validations.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: cast this to text rather than date because some values for this field are not valid dates and will throw an error if cast to date
Aaron Marcuse-Kubitza

04/09/2014

08:19 PM Revision 13086: fix: inputs/NY/validations.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: dateCollected: matched type to output query
Aaron Marcuse-Kubitza
06:23 PM Revision 13085: validation/aggregating/pipeline/aggregating_validations_pipeline.odg: show that the staging table(s) are denormalized before running the input queries on them. clarified that what is compared are the input and output query *results*, not the queries themselves.
Aaron Marcuse-Kubitza
02:55 PM Revision 13084: schemas/vegbien.sql: _specimens_10_count_number_of_records_by_institution: ran through pipeline
Aaron Marcuse-Kubitza
02:48 PM Revision 13083: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed `public.` prefix to avoid cluttering up the SQL
Aaron Marcuse-Kubitza
02:46 PM Revision 13082: bugfix: schemas/vegbien.sql, validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_10_count_number_of_records_by_institution: need to dereference specimenreplicate.duplicate_institutions_sourcelist_id to the corresponding sourcelist.name
Aaron Marcuse-Kubitza
02:40 PM Revision 13081: schemas/vegbien.sql: public_validations._specimens_*: added comments from validation/aggregating/specimens/qualitative_validations_specimens.sql
Aaron Marcuse-Kubitza
02:25 PM Revision 13080: validation/aggregating/specimens/qualitative_validations_specimens.sql: synced to schemas/vegbien.sql so that it can be diffed with it to sync qualitative_validations_specimens.sql to the DB
Aaron Marcuse-Kubitza
02:55 AM Revision 13079: lib/sql_gen.py: map_expr(): documented that unlike bin/repl SQL identifier handling, this does simplify the resulting expression
Aaron Marcuse-Kubitza
02:54 AM Revision 13078: lib/sql_gen.py: map_expr(): documented that this is a special case of bin/repl SQL identifier handling which does not handle entire source files
Aaron Marcuse-Kubitza
02:52 AM Revision 13077: bin/repl: match as whole-word text (like SQL identifier): documented that this is a generalization of lib/sql_gen.py map_expr() to work on entire source files
Aaron Marcuse-Kubitza
02:50 AM Revision 13076: bin/repl, lib/sql_gen.py Expression transforming: documented that this can also be done in Postgres with expression substitution (wiki.vegpath.org/Postgres_queries#expression-substitution)
Aaron Marcuse-Kubitza

04/08/2014

03:49 PM Revision 13075: fix: inputs/U/Specimen/map.csv: Genus: remapped to taxonName because this field is actually mislabeled in the original column names
Aaron Marcuse-Kubitza
02:55 PM Revision 13074: validation/aggregating/pipeline/validations_on_sparse_datasources.odg: not applicable "✓": increased font size so the size of the character matches the surrounding text
Aaron Marcuse-Kubitza
02:52 PM Revision 13073: validation/aggregating/pipeline/validations_on_sparse_datasources.odg: removed = lines for each input query, because they clutter up the diagram and the "same, so don't need to rewrite" message now shows this as well
Aaron Marcuse-Kubitza
02:50 PM Revision 13072: validation/aggregating/pipeline/validations_on_sparse_datasources.odg: added the denormalized VegCore schema approach for comparison, as requested by Mark
Aaron Marcuse-Kubitza
01:52 PM Revision 13071: schemas/vegbien.sql: remake_diff_tables(schema text): removed bien2_traits runtime because this applies only to one datasource. the bien2_traits runtime is now documented in inputs/bien2_traits/run.
Aaron Marcuse-Kubitza
01:40 PM Revision 13070: inputs/NY/run: `make inputs/NY/validate`: updated runtime (6.5 min). this increases as more queries are able to run successfully.
Aaron Marcuse-Kubitza
01:38 PM Revision 13069: schemas/vegbien.sql: public_validations: schema comment: documented how to run the validations. this information is also in the usage comment for public_validations.remake_diff_table(), but is copied here for easy reference.
Aaron Marcuse-Kubitza
01:19 PM Revision 13068: inputs/SALVIAS/run_: `make inputs/SALVIAS/validate`: documented runtime (5 min)
Aaron Marcuse-Kubitza
12:49 PM Revision 13067: inputs/bien2_traits/run: documented `make inputs/bien2_traits/validate` runtime (9 min)
Aaron Marcuse-Kubitza

04/07/2014

06:21 PM Revision 13066: schemas/vegbien.sql: public_validations: specimens queries: added autogenerated ~type tables
Aaron Marcuse-Kubitza
06:19 PM Revision 13065: inputs/NY/run: `make inputs/NY/validate`: updated runtime (5 min)
Aaron Marcuse-Kubitza
06:09 PM Revision 13064: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed DDL statements, using the steps at wiki.vegpath.org/Aggregating_validations_refactoring#remove-DDL-statements
Aaron Marcuse-Kubitza
06:07 PM Revision 13063: schemas/vegbien.sql: public_validations: added specimens queries to pipeline
Aaron Marcuse-Kubitza
05:51 PM Revision 13062: validation/aggregating/specimens/qualitative_validations_specimens.sql: parameterize queries by datasource
Aaron Marcuse-Kubitza
05:35 PM Revision 13061: validation/aggregating/**.sql output queries: use `SET join_collapse_limit = 1;` to match public_validations.rematerialize_out_view()
Aaron Marcuse-Kubitza
05:17 PM Revision 13060: fix: schemas/vegbien.sql: public_validations.rematerialize_out_view(text, regclass): run with join_collapse_limit = 1 to fix query planner issues. this option has been tested on the queries that do not yet use the standard join sequence (plots #11,12,13,14,16,17,18), and all of these queries also work fine with join_collapse_limit = 1. (the standard join sequence is used to ensure *both* correctness of the query and compatibility with join_collapse_limit = 1, but in some cases is not needed for join_collapse_limit.)
Aaron Marcuse-Kubitza
04:35 PM Revision 13059: validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_12_distinct_collector_name_collect_num_date_w_count: turn off join_collapse_limit instead of enable_mergejoin/enable_hashjoin, because join_collapse_limit is something that we will eventually want to turn off for all queries, which would avoid this query needing special handling. (on the other hand, enable_mergejoin/enable_hashjoin may be necessary for some queries and we probably won't turn them off for all queries.)
Aaron Marcuse-Kubitza
01:43 PM Revision 13058: bugfix: lib/runscripts/table.run: table_make_install(): need to ignore skip_table() errexit
Aaron Marcuse-Kubitza
12:13 PM Task #886 (New): move test DB to vegbiendev VM
* avoids needing to maintain a separate testing machine for the purposes of using the test DB
* helps remove depende...
Aaron Marcuse-Kubitza
10:39 AM Revision 13057: lib/sh/util.sh: import_vars: documented that vars already set will *not* be overwritten
Aaron Marcuse-Kubitza
09:47 AM Revision 13056: inputs/NY/run: documented `make inputs/NY/validate` runtime (2 min, currently for the input queries)
Aaron Marcuse-Kubitza

04/04/2014

06:13 PM Revision 13055: added inputs/Madidi/_src/ to match wiki steps in wiki.vegpath.org/Adding_a_flat-file_datasource
Aaron Marcuse-Kubitza

04/03/2014

07:31 PM Revision 13054: added validation/aggregating/pipeline/validations_on_sparse_datasources.odg
Aaron Marcuse-Kubitza
04:13 PM Revision 13053: planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
Aaron Marcuse-Kubitza
04:09 PM Revision 13052: planning/workflow/bien3_architecture.pptx: stage I: made all datasources the same height so that the denormalized VegCore schema boxes would all look exactly the same. widened the denormalized VegCore schema boxes to make it visually clear that they have more columns than the staging tables denormalized together
Aaron Marcuse-Kubitza
03:40 PM Revision 13051: planning/workflow/bien3_architecture/stage_I.png, stages.png: synced to bien3_architecture.pptx
Aaron Marcuse-Kubitza
03:39 PM Revision 13050: planning/workflow/bien3_architecture.pptx: updated to reflect decisions made in the 2014-04-03 conference call (wiki.vegpath.org/2014-04-03_conference_call#import-process-2)
Aaron Marcuse-Kubitza
08:53 AM Revision 13049: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_14_count_of_all_invalid_verbatim_lat_long
Aaron Marcuse-Kubitza
08:35 AM Revision 13048: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_12_distinct_collector_name_collect_num_date_w_count
Aaron Marcuse-Kubitza
08:04 AM Revision 13047: validation/aggregating/specimens/qualitative_validations_specimens.sql: _specimens_13_count_of_all_verbatim_and_decimal_lat_long: fixed whitespace
Aaron Marcuse-Kubitza
07:32 AM Revision 13046: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed trailing whitespace
Aaron Marcuse-Kubitza
07:31 AM Revision 13045: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_13_count_of_all_verbatim_and_decimal_lat_long
Aaron Marcuse-Kubitza

04/02/2014

05:55 PM Revision 13044: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_11_list_of_three_standard_political_divisions
Aaron Marcuse-Kubitza
05:36 PM Revision 13043: validation/aggregating/specimens/qualitative_validations_specimens.sql: *_of_species_binomials: switched back to the old queries that use the split-apart ranks instead of the concatenated taxon name. note that these will not work on all specimens datasources, but now that #6,7 were selected to use the concatenated taxon name, this isn't a problem.
Aaron Marcuse-Kubitza
05:21 PM Revision 13042: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_species_binomials: renamed columns to species_binomial to reflect reverted query name
Aaron Marcuse-Kubitza
05:16 PM Revision 13041: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_species_excluding_author: renamed to *_species_binomials for clarity
Aaron Marcuse-Kubitza
05:14 PM Revision 13040: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: _specimens_04_count_of_unique_verbatim_species_with_author, _specimens_05_list_of_unique_verbatim_species_with_author: switched back to original names because #6,7 now do the same thing as #4,5, so we should include the differing result set of #4,5 for datasources that provide it
Aaron Marcuse-Kubitza
05:01 PM Revision 13039: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_10_count_number_of_records_by_institution
Aaron Marcuse-Kubitza
04:38 PM Revision 13038: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: use taxon_name*_with_author everywhere instead of custom column names, for consistency
Aaron Marcuse-Kubitza
04:09 PM Revision 13037: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_of_verbatim_subspecific_taxa_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
Aaron Marcuse-Kubitza
04:03 PM Revision 13036: validation/aggregating/specimens/qualitative_validations_specimens.sql: implemented _specimens_06_count_of_unique_verb_subsp_taxa_without_author, _specimens_07_list_of_verbatim_subspecific_taxa_without_author
Aaron Marcuse-Kubitza
03:54 PM Revision 13035: validation/aggregating/specimens/qualitative_validations_specimens.sql, NY/qualitative_validations_source_db_NYBG.VegCore.sql, inputs/NY/validations.sql: *_verbatim_species_without_author, etc.: renamed to *_with_author because these now use the concatenated name, rather than the without-author name that only some specimens datasources provide
Aaron Marcuse-Kubitza
03:32 PM Task #884 (Rejected): fix Postgres bug that causes query planner to use seq scans and slow sorts instead of index scans in the import
h3. issue
* see the following @pg_stat_activity@ snapshots (note the @EXPLAIN@ output below each query):...
Aaron Marcuse-Kubitza
03:14 PM Revision 13034: validation/aggregating/specimens/qualitative_validations_specimens.sql: removed extra ; at ends of queries
Aaron Marcuse-Kubitza
03:13 PM Revision 13033: validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
Aaron Marcuse-Kubitza
03:05 PM Revision 13032: validation/aggregating/specimens/qualitative_validations_specimens.sql: use the concatenated taxon name instead of concatenating the ranks, as decided in the 2014-03-27 conference call (wiki.vegpath.org/2014-03-27_conference_call#aggregating-validations)
Aaron Marcuse-Kubitza
11:17 AM Revision 13031: /README.TXT: Full database import: disk space: added high-water mark of 1.8 TB @11:15:05
Aaron Marcuse-Kubitza
10:56 AM Revision 13030: /README.TXT: Full database import: added steps to figure out which datasource tables were not successfully imported due to disk space errors
Aaron Marcuse-Kubitza
10:45 AM Revision 13029: fix: /README.TXT: Full database import: moved verification of exit statuses before verification of DB contents because there is no point in verifying the DB if the datasources didn't finish importing
Aaron Marcuse-Kubitza
10:10 AM Task #882 (Rejected): add limit on the # of parallel import processes
it turns out this would not fix the problem, because it occurs even when only a few datasources are running
Aaron Marcuse-Kubitza
10:07 AM Task #883: have import scripts regularly check disk space and pause processes if getting close to limit
merging info in #882, so that this info is not maintained in two places
Aaron Marcuse-Kubitza
09:01 AM Revision 13028: /README.TXT: Full database import: disk space: documented that the entire disk again gets used long after the beginning of the import, when only a few datasources are running (ie. it definitely seems to be a recent bug in Postgres, and not a latent problem)
Aaron Marcuse-Kubitza

04/01/2014

05:40 PM Revision 13027: /README.TXT: Maintenance: added task to regularly re-run full-database import so that bugs in it don't pile up. it needs to be kept in working order so that it works when it's needed.
Aaron Marcuse-Kubitza
05:02 PM Task #883 (Rejected): have import scripts regularly check disk space and pause processes if getting close to limit
h3. issue
* there is no soft limit on disk space inside Postgres, so the hard limit gets reached instead, causing ...
Aaron Marcuse-Kubitza
04:24 PM Revision 13026: /README.TXT: Full database import: added steps to manually reimport the applicable datasources if there are errors due to exceeding available disk space
Aaron Marcuse-Kubitza
04:13 PM Revision 13025: /README.TXT: Full database import: removed extra `ssh -t vegbiendev.nceas.ucsb.edu` before "upload logs", because the previous steps also occur on vegbiendev
Aaron Marcuse-Kubitza
04:11 PM Task #882 (Rejected): add limit on the # of parallel import processes
see description of problem in #883
Aaron Marcuse-Kubitza
04:04 PM Revision 13024: /README.TXT: Notes on system stability: added recommendation to maintain a snapshot copy of the VM as it was at the last successful import, for fallback use if a system upgrade breaks anything. system upgrades on the snapshot VM should be disabled completely, and because this will also disable security fixes, the snapshot VM should be disconnected from the internet and all networking interfaces. (this is an unfortunate consequence of modern OSes being written in non-memory-safe languages such as C and C++.)
Aaron Marcuse-Kubitza
03:43 PM Revision 13023: /README.TXT: Full database import: disk space: documented that a higher high-water mark actually occurs later in the import, so that the disk usage issue actually remains a problem after the very beginning
Aaron Marcuse-Kubitza
03:37 PM Revision 13022: fix: /README.TXT: Full database import: disk space: increased the minimum free space recommendation to 1 TB, because analysis of the disk usage during the beginning of the import shows that actually close to the entire amount is being used. however, this problem is normally undetectable unless the disk space is specifically checked, because it only manifests itself if the available disk space is exceeded completely.
Aaron Marcuse-Kubitza
02:04 PM Revision 13021: /README.TXT: Full database import: documented that the beginning of the import should be scheduled at a time when the DB will not be needed for other uses, because vegbiendev will be slow for the first few hours of the import due to the import using all the available cores
Aaron Marcuse-Kubitza
01:36 PM Revision 13020: /README.TXT: Full database import: documented that CPU load warning e-mails can safely be ignored. they happen because the parallel imports use all the available cores.
Aaron Marcuse-Kubitza
01:31 PM Revision 13019: fix: lib/common.Makefile: $(nice): use an increment of +10 instead of +5 because +5 still leaves the shell sluggish
Aaron Marcuse-Kubitza
01:29 PM Revision 13018: lib/common.Makefile: added $(nice) and use it everywhere its definition is used
Aaron Marcuse-Kubitza
01:14 PM Revision 13017: /README.TXT: Full database import: exiting `screen`: clarify that you must use `exit`, as Ctrl+D gets disabled to prevent accidental exits
Aaron Marcuse-Kubitza
12:47 PM Revision 13016: /README.TXT: Full database import: added step to restart Postgres to free up any disk space used by temp tables from the last import (this is apparently not automatically reclaimed)
Aaron Marcuse-Kubitza
12:45 PM Revision 13015: /Makefile: postgres_restart-Linux: documented that the manual running of the command is needed because for some reason, pg_ctl does not work when run inside make
Aaron Marcuse-Kubitza
12:43 PM Revision 13014: fix: /Makefile: postgres_restart-Linux: added pause after telling the user the command to run
Aaron Marcuse-Kubitza
12:42 PM Revision 13013: /Makefile: $(postgresReload-*): use postgres_restart for the postgres-restarting step
Aaron Marcuse-Kubitza
12:30 PM Revision 13012: bugfix: /Makefile: postgres_restart: added separate Linux version that deals with Linux-specific issues (as in $(postgresReload-Linux))
Aaron Marcuse-Kubitza
12:15 PM Revision 13011: /Makefile: added postgres_restart, since this is often invoked separately from the entire postgres_reload target
Aaron Marcuse-Kubitza
11:40 AM Revision 13010: /README.TXT: Full database import: disk space: increased minimum requirement to 500GB (~200GB extra), as the import may use significant additional space for temp tables
Aaron Marcuse-Kubitza
11:37 AM Revision 13009: /README.TXT: Full database import: documented that env vars set before invoking `screen` will be inherited by it, so these steps will work even if they come before `screen`
Aaron Marcuse-Kubitza
11:26 AM Revision 13008: backups/TNRS.backup.md5: updated
Aaron Marcuse-Kubitza
11:23 AM Revision 13007: /README.TXT: Full database import: added steps to set a custom version, if the auto-assigned one would cause a collision with the last import
Aaron Marcuse-Kubitza
11:08 AM Revision 13006: /README.TXT: Full database import: `unset version`: documented that this is needed because it may have been set in the outer shell
Aaron Marcuse-Kubitza

03/30/2014

07:54 PM Revision 13005: fix: lib/sql_io.py: put_table(): don't warn if can't create pkey, because this just indicates that a set-returning function was used. this should get rid of the last of the confusing benign warnings in the test output.
Aaron Marcuse-Kubitza
07:53 PM Revision 13004: fix: lib/sql.py: flatten(): don't warn if can't create pkey, because this just indicates that a set-returning function was used
Aaron Marcuse-Kubitza
07:52 PM Revision 13003: lib/sql.py: run_query_into() added add_pkey_warn param to support turning off "could not create unique index" warnings, which are sometimes benign (eg. when using set-returning functions with column-based import)
Aaron Marcuse-Kubitza
06:52 PM Revision 13002: /README.TXT: Full database import: disk space: updated schema size (315GB)
Aaron Marcuse-Kubitza
06:45 PM Revision 13001: /README.TXT: Full database import: removed `up` on jupiter because this is done as part of "do steps under Maintenance > "to synchronize vegbiendev, ..."
Aaron Marcuse-Kubitza
06:44 PM Revision 13000: /README.TXT: Full database import: moved "do steps under Maintenance > "to synchronize vegbiendev, ..." outside of "On local machine" because these steps don't only take place on the local machine
Aaron Marcuse-Kubitza
06:41 PM Revision 12999: /README.TXT: use `up` instead of `svn up --force` for consistency
Aaron Marcuse-Kubitza
06:40 PM Revision 12998: fix: /README.TXT: always use `up` instead of `svn up` since this includes --force
Aaron Marcuse-Kubitza
06:39 PM Revision 12997: /README.TXT: Full database import: removed unneeded `ssh -t vegbiendev.nceas.ucsb.edu exec sudo su - aaronmk` at beginning since this is performed again the first time it's needed
Aaron Marcuse-Kubitza
06:38 PM Revision 12996: fix: /README.TXT: Full database import: removed erroneous line that resulted from a search-and-replace of connection commands in r12396. (it used to read "Follow the steps under Connecting to vegbiendev above, using jupiter instead". this step is now performed on the line below it.)
Aaron Marcuse-Kubitza
06:31 PM Revision 12995: bin/make_analytical_db: removed remake_diff_tables() because this is now done for each datasource in inputs/input.Makefile
Aaron Marcuse-Kubitza
06:28 PM Revision 12994: bugfix: schemas/vegbien.sql: schemas/vegbien.sql(): need to util.use_schema(schema_anchor) *before* initializing vars that use own-schema functions
Aaron Marcuse-Kubitza
06:12 PM Revision 12993: inputs/input.Makefile: validate: redirect the output to the log, as for other import-related operations
Aaron Marcuse-Kubitza
06:08 PM Revision 12992: inputs/input.Makefile: import: validate at the end of the import
Aaron Marcuse-Kubitza
06:02 PM Revision 12991: inputs/input.Makefile: added new-style aggregating validations (`validate` target)
Aaron Marcuse-Kubitza
06:02 PM Revision 12990: bin/make_analytical_db: removed no longer needed "${public}_validations" schema qualifier, now that it is in the search_path
Aaron Marcuse-Kubitza
06:00 PM Revision 12989: fix: bin/vegbien_dest: added public_validations
Aaron Marcuse-Kubitza
05:41 PM Revision 12988: added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt, which is useful to see what fields will be available when we switch to the new GBIF export format
Aaron Marcuse-Kubitza
05:39 PM Revision 12987: lib/sh/util.sh: removed end_try_subshell, which now does the same thing as end_try
Aaron Marcuse-Kubitza
05:38 PM Revision 12986: fix: lib/sh/archives.sh: unzip(): support -p option, which pipes extracted data to stdout
Aaron Marcuse-Kubitza
05:11 PM Revision 12985: added inputs/GBIF/_src/0001000-131106143450413.zip.header.txt.run
Aaron Marcuse-Kubitza
05:11 PM Revision 12984: added lib/runscripts/extract_header.run
Aaron Marcuse-Kubitza
05:09 PM Revision 12983: fix: lib/sh/make.sh: direct the user to use begin_target instead of set_make_vars (set_make_vars is now used by begin_target)
Aaron Marcuse-Kubitza
05:06 PM Revision 12982: fix: lib/runscripts/util.run: to_top_file(): handle $_remake properly, without requiring deferred_check_target_exists to set to_file()'s flags
Aaron Marcuse-Kubitza
05:03 PM Revision 12981: bugfix: lib/sh/util.sh: die(): usage: documented that if msg uses $(...), save_e is needed
Aaron Marcuse-Kubitza
04:59 PM Revision 12980: bugfix: lib/sh/util.sh: already_exists_msg(): need to save_e, because new $(mk_hint) call resets $?
Aaron Marcuse-Kubitza
04:55 PM Revision 12979: lib/sh/util.sh: die(): always errexit even if $e = 0, because die always indicates an error
Aaron Marcuse-Kubitza
04:53 PM Revision 12978: lib/sh/util.sh: added rethrow!(), which always errexits, even if $e = 0
Aaron Marcuse-Kubitza
04:53 PM Revision 12977: lib/sh/util.sh: rethrow(): also work in situations where $e is not set
Aaron Marcuse-Kubitza
04:50 PM Revision 12976: lib/sh/util.sh: rethrow: made it a function since there is now no need for it to be an alias
Aaron Marcuse-Kubitza
04:47 PM Revision 12975: lib/sh/util.sh: rethrow: removed `test "$e" != 0` since errexit only does anything if $e != 0
Aaron Marcuse-Kubitza
04:45 PM Revision 12974: lib/sh/util.sh: removed separate rethrow_exit*, rethrow_subshell*, since they now do the same thing as rethrow*
Aaron Marcuse-Kubitza
04:42 PM Revision 12973: lib/sh/util.sh: rethrow*!: use new errexit, which works in functions *and* subshells
Aaron Marcuse-Kubitza
04:38 PM Revision 12972: lib/sh/util.sh: added errexit(), used in place of (exit "$1") because a bug in bash prevents subshells from triggering errexit
Aaron Marcuse-Kubitza
04:18 PM Revision 12971: lib/sh/util.sh: added bool!()
Aaron Marcuse-Kubitza
03:08 PM Revision 12970: fix: lib/sh/util.sh: redir(): need to indent before invoking an external command (not just in command__exec(), but for all redir() calls)
Aaron Marcuse-Kubitza
 
