lib/sh/db.sh: added limit() and use it instead of `${limit:+LIMIT $limit}`
lib/sh/db.sh: added mysql_truncate() and use it instead of `mk_truncate|mysql_ANSI`
lib/sh/db.sh: truncate(): renamed to mk_truncate() because it actually just creates a TRUNCATE statement, rather than also executing it
lib/sh/db.sh: use_local/use_remote: unset $prefix after using it so it isn't unintentionally applied as a kw param for a later function
lib/sh/db.sh: mk_select: renamed to mk_select_var since it actually sets a var in the local context rather than returning a query
inputs/GBIF/raw_occurrence_record/run: herbaria_filter.table/make(): specify the different parts used to create the table in an array
inputs/GBIF/raw_occurrence_record/run: renamed herbaria_filter.csv_ to herbaria_filter.ih.csv_ to allow for other tables that get combined into herbaria_filter
bugfix: lib/sh/db.sh: mk_select: ensure newline before LIMIT clause, in case caller provided custom query which did not have trailing newline
bugfix: mappings/VegCore-VegBIEN.csv: place.geovalid: added missing /1 after _alt
bugfix: lib/sql.py: parse_exception(): typed_name_re: added back matching of names without "", since these are used by some error messages (ones that contain () after the function name)
bugfix: lib/sql.py: parse_exception(): typed_name_re: need to allow " within the matched name, since there are now "" around the entire identifer that was passed to Postgres, which may itself include " . always require "" around the matched name, to ensure that the whole name is matched by .+? e.g. when followed by () for a function call. the version of Postgres we currently use apparently no longer has error messages without the "", so we don't need a separate regexp for quoted and unquoted names.
lib/sh/db.sh: mysql_import(): automatically ensure the table is empty (i.e. using truncate()), unless append=1 is specified. extra calls to truncate() now that this happens automatically have also been removed.
bin/map: by_col: ensure verbosity is at least 2 in live mode (using new ints.set_min() instead of max() for clarity). documented that live column-based import MUST be run with verbosity 2+ (3 preferred) to provide debugging information for often-complex errors. without this, debugging is effectively impossible.
added lib/ints.py with renamings of max()->set_*min*(), min()->set_*max*() for easier understandability of the set-ceiling/set-floor use cases of min()/max()
bin/map: Set default verbosity: by_col: documented that showing all queries is primarily to assist debugging, not profiling
lib/sh/util.sh: logging: named it `log++`
lib/sh/util.sh: logging: verbosities: level 0: documented that log++ also suppresses external command output for full support of cron jobs
lib/sh/util.sh: logging: documented `make` equivalents of the various verbosities, where available. (many of the verbosities, such as level 1, are sorely needed in make to avoid excessive output.)
lib/sh/util.sh: die_e(): benign errors: increase log_level so that a benign non-zero exit status will only be displayed at debug verbosities (2+) (it is confusing otherwise)
lib/sh/util.sh: try(): always run the command with benign_error=1 so that any die_e() doesn't prematurely indicate that a particular exit status was an error
lib/sh/util.sh: die_e(): support benign errors using $benign_error flag that should be logged as info messages instead of errors
lib/sh/util.sh: die(): documented that msg can't use $() (because it would reset $?)
inputs/bien_web/observation/VegBIEN.csv, unmapped_terms.csv: regenerated
lib/sh/util.sh: command(): 2>&$err_fd: add to _redirs after echoing command so it isn't echoed at the end of every command (since this redirection is frequently applied)
lib/sh/util.sh: sed: use case statement instead of test to determine flag letter, to easily allow matching multiple `uname` OSes or adding additional flag letters
lib/sh/util.sh: die(): documented that its msg can use $?, because it has not yet been overridden by another command
lib/sh/util.sh: die_e(): use die(), which performs the necessary save_e/rethrow. this requires using $? instead of $e for the exit status, because $e has not yet been set.
lib/sh/util.sh: inlined log_e() into die_e() because that's the only place it's used
lib/sh/util.sh: command(): print "command exited with error" message using new die_e() if command returns false. this requires removing manual die_e()/log_e() calls elsewhere.
lib/sh/util.sh: command(): moved increase of indent inside () so that error-handling statements after () will use the outer log_level
lib/sh/util.sh: added die_e(), which logs that a command exited with an error
lib/sh/util.sh: command(): determine redirections before echoing the command so they can be logged along with the command, instead of as separate exec statements. (these had a higher log_level to avoid cluttering the output with `exec` lines, which usually suppressed the redirections completely.) inline the command__set_fds() nested func so the redirections are all in one place.
lib/sh/util.sh: use simpler `if can_log; then indent; fi` instead of `can_log && indent || true`. however, the `&& indent || true` syntax is still required in aliases such as echo_func which need to allow prefixing the command with a wrapper command or kw param assignments.
inputs/GBIF/raw_occurrence_record/run: dynamically generate herbaria_filter.csv_ from herbaria.ih in new target herbaria_filter.csv_/make()
inputs/GBIF/raw_occurrence_record/run: store the herbaria filter in a MySQL table loaded from a CSV instead of getting it from a hardcoded list of IN (...) values
lib/sh/db.sh: added truncate()
lib/sh/make.sh: set_make_vars: set $target_stem
lib/sh/db.sh: added mysql_import()
lib/sh/db.sh: removed no longer used mk_esc_name()
lib/runscripts/table.run: don't mk_esc_name schema, table because these will be mk_esc_name'd by functions that use them
lib/sh/local.sh: psql(): use $schema_esc, $table_esc instead of just putting $schema, $table in ""
lib/sh/db.sh: mk_esc_name_alias(): don't overwrite an already-defined $*_esc, to allow the user to provide an already-escaped value (such as a schema-qualified table) directly
lib/sh/util.sh: rtrim(): increase the log_level of sed to 4+ instead of 2+ because it is usually run as part of a var assignment, and should therefore have a lower log_level than echo_vars
lib/sh/db.sh: mk_esc_name_alias(): echo_vars the *_esc var when it's set
lib/sh/db.sh: added mk_esc_name_alias() and use it to create mk_schema_esc, mk_table_esc
lib/sh/db.sh: mysql(): run with --local-infile=1
bugfix: lib/sh/db.sh: log_sql(): use can_log() instead because the verbosity now gets decremented as the log_level increases, so the threshold to compare to is 0 instead of 2
lib/sh/util.sh: added set_default()
lib/sh/util.sh: rtrim(): run at higher log_level so that sed command is not normally echoed
inputs/GBIF/raw_occurrence_record/run: renamed herbaria.sql to herbaria.data.sql so it wouldn't be added to svn by `make inputs/GBIF/raw_occurrence_record/add` or `make inputs/add`
inputs/input.Makefile: $(svnFiles): also exclude *.data.sql, which should never be in svn
schemas/vegbien.sql: cultivated_family_locations: documented that table is from sftp://nimoy.nceas.ucsb.edu/home/bien/bien2_scripts/geoscrub/cultivated/cult_by_taxon/flag_by_taxa.inc (i.e. not generated by a function)
schemas/vegbien.sql: place.geovalid: added latLongDomainValid to the values to _and together
schemas/vegbien.sql: place.geovalid: require it to be NOT NULL so that it's always a 2-valued boolean (but default it to false since it's not a required field)
mappings/VegCore-VegBIEN.csv: place.geovalid: use false instead of NULL
inputs/GBIF/raw_occurrence_record/run: table.tsv/make(): exclude deleted rows (i.e. where the deleted timestamp is non-NULL)
inputs/GBIF/raw_occurrence_record/header.csv: regenerated using ./run. since the table is reimported as a CSV, it uses bin/csv2db, which prepends an additional row_num column.
inputs/GBIF/raw_occurrence_record/run: table.tsv/make(): remove explicit cols list to include all cols. the file size of the generated table.tsv will increase by ~3x, but should remain reasonably-sized compared to our available disk space.
bugfix: inputs/GBIF/raw_occurrence_record/run: table.tsv/make(): need \ line continuation after vars so they only apply to the command rather than being set as global vars
bugfix: lib/runscripts/table.run: load_data(): use new $verbosity_min instead of running `verbosity_min` so that the command name logging is not output with the new verbosity
lib/sh/util.sh: added $verbosity_min to set a `verbosity_min` value after the command name, etc. has been logged, so that the logging itself is not output with the new verbosity
schemas/vegbien.sql: range_modeling_input: include only plants (i.e. rows with higher_plant_group IS NOT NULL)
schemas/vegbien.sql: range_modeling_input: added higher_plant_group, for use in restricting rows to plants
inputs/.geoscrub/geoscrub_output/map.csv: *validity: added definitions of the numeric codes from _src/README.TXT
added planning/workflow/validation/GeoDistKM.sql.txt
planning/goals/BIEN3_derived_data_products.docx: updated to most recent version from Brad's e-mail on 2013-4-16
/README.TXT: Full database import: before running screen: added `unset TMOUT` because TMOUT (autologout) causes screen to exit even with background processes active
/README.TXT: Maintenance: added things to put in your .profile on a live machine (e.g. vegbiendev). in particular, you MUST NOT have a TMOUT (autologout) set, because this causes screen to exit even if background processes (e.g. from column-based import) are running
bugfix: lib/runscripts/table.run: load_data(): need to ensure the verbosity is at least 3 because the install logs require verbose output. (3 is the default for the installer, but is overridden by the runscripts, which instead set the default to 1.)
lib/sh/util.sh: logging: added verbosity_min()
lib/sh/util.sh: logging: added verbosity_int() and use it instead of `round_down "$verbosity"`
bugfix: lib/sh/util.sh: log+ alias: don't expand next word because it's not a cmd
lib/runscripts/table.run: $schema, $table: log the `cd` used to calculate the value at log_level 3 instead of 1 (note that the cd() function call for this will be logged at log_level 5)
lib/sh/util.sh: cd(): trace-log the cd() function call itself (at log_level 3) in addition to the cd builtin call
lib/runscripts/table.run: import(): added step to load the data into the staging table before postprocessing it
inputs/GBIF/raw_occurrence_record/run: moved table.tsv.md5/make() and invocation of it to inputs/GBIF/table.run because it's general to all tables (which would all use table.tsv for this datasource). use $target_filename in calling table.tsv.md5/make from table.tsv/make.
bugfix: lib/sql.py: parse_exception(): typed_name_re: need to ensure that full name is matched rather than just first character
web/links/index.htm: updated to Firefox bookmarks. Mountain Lion upgrade: added Python psycopg2, Python OrderedDict, X11.
lib/runscripts/table.run: table_make(): run canon_rel_path on the datasrc dir so it's displayed as the direct path to it, without ..
lib/runscripts/table.run: table_make(): moved comment about "${@/#/$table/}" to right after the line it describes
lib/runscripts/table.run: input_make(): renamed to table_make() to make it clear that the target names are relative to the table subdir itself, not the datasrc dir. it was previously called input_make because it used inputs/input.Makefile directly, but now will use any Makefile in the datasrc dir.
added inputs/GBIF/Makefile, which links to ../input.Makefile, to allow running make directly in the datasrc dir (i.e. without --makefile=.../input.Makefile). this is required by the runscripts.
inputs/GBIF/raw_occurrence_record/run: table.tsv.md5/make(): don't add extra .md5 extension to $target_filename because it already has the extension as part of the target name (now that this command is run in its own make target rather than in table.tsv/make())
lib/sql_gen.py: import OrderedDict from collections instead of ordereddict for Mac 10.8 Mountain Lion upgrade
lib/sh/util.sh: logging: renamed $log_level_indent to $log_indent_step to avoid confusion with the log_level, which is a different kind of indent (using + signs instead of |s)
lib/sh/util.sh: logging: also export PS4, because it follows verbosity and therefore also needs to be propagated to invoked commands
lib/sh/util.sh: can_log(): support decimal verbosities using round_down()
lib/sh/util.sh: always use echo_export instead of export, even when the verbosity at load time would suppress output, because the verbosity may actually increase during the script due to log-- calls, etc., and vars should then still be echoed as expected
lib/sh/util.sh: log+: inlined PS4_prefix_n alias because there is now room for it
lib/sh/util.sh: log+: $verbosity: support decimal verbosities (but not decimal log_levels) by using new float+int
lib/sh/util.sh: removed no longer used log-. use log+ with a negative argument instead.
bugfix: lib/sh/util.sh: log+: PS4 when $1 < 0: need to negate $1 because now it's a negative number
lib/sh/util.sh: log--: use log+ with 1 instead of log so we don't need a separate log- function
lib/sh/util.sh: logging: log+: support negative log_level adjustments. log-: use log+ with the negative of its argument
web/links/index.htm: updated to Firefox bookmarks. sorted MySQL and PostgreSQL sections.
web/links/index.htm: updated to Firefox bookmarks
web/links/index.htm: Mac 10.8 Mountain Lion > PostgreSQL: appended tab to name to disambiguate it from the general PostgreSQL section
bugfix: web/links/index.htm.run: prepend __ to the HTML anchors of bookmarks toolbar links so they don't shadow/override folders of the same name: also renamed the corresponding self-anchor hyperlinks
bugfix: web/links/index.htm.run: prepend __ to the HTML anchors of bookmarks toolbar links so they don't shadow/override folders of the same name: need to replace all occurrences (using /g option to sed) to include both HTML anchors on the line