# Date Author Comment
9504 05/23/2013 11:54 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Postprocessing: populated entries for analytical DB for last 4 imports, and for backup, backup test for last import. note that the combined import time for the last import is 3.5 days, compared to 3 days for the column-based import portion.

9503 05/22/2013 11:47 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Postprocessing: added (empty) entries for analytical DB, backup, backup test

9502 05/21/2013 11:18 PM Aaron Marcuse-Kubitza

inputs/GBIF/Specimen/postprocess.sql, inputs/REMIB/Specimen/postprocess.sql: updated for providers in r9459, which adds TEX

9501 05/21/2013 11:10 PM Aaron Marcuse-Kubitza

inputs/*/*/postprocess.sql: Remove institutions that we have direct data for: query to obtain list: updated for current schema

9500 05/21/2013 10:49 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated import times. GBIF has been refreshed (with the range modeling column subset), and column-based import now takes 3 days for 88.4 million rows.

9499 05/21/2013 10:27 PM Aaron Marcuse-Kubitza

README.TXT: Full database import: added warning to perform every single step listed, to avoid breaking column-based import

9498 05/21/2013 10:26 PM Aaron Marcuse-Kubitza

README.TXT: Full database import: Publish the new import: added warning to be sure you have done every single verification step before proceeding. otherwise, a previous valid import could incorrectly be overwritten with a broken one.

9497 05/21/2013 09:07 PM Aaron Marcuse-Kubitza

bugfix: README.TXT: Full database import: To run TNRS/remake analytical DB: need to run `export version=<version>` before the command which uses it rather than after
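
The ordering problem fixed here can be reproduced in isolation. The sketch below is only an illustration, not the README's actual commands: the `sh -c` child process stands in for the command that reads `$version`, and the value `2013-5-21` is a placeholder.

```sh
# Hypothetical stand-in for the README command that reads $version.
show_version() { sh -c 'echo "version=${version:-unset}"'; }

unset version
before=$(show_version)    # run before the export: the child sees nothing

export version=2013-5-21  # placeholder value
after=$(show_version)     # run after the export: the child inherits it

echo "$before"
echo "$after"
```

Running the export first is what makes the value visible to the later command.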

9496 05/21/2013 08:26 PM Aaron Marcuse-Kubitza

added backups/*.md5

9495 05/21/2013 08:22 PM Aaron Marcuse-Kubitza

added backups/TNRS.2013-5-21.backup.md5

9494 05/21/2013 07:42 PM Aaron Marcuse-Kubitza

README.TXT: Datasource setup: For MySQL inputs: For .sql exports: added steps to grant privileges to the bien user. the privileges list excludes UPDATE, DELETE, ALTER, DROP to prevent bugs in the import scripts from accidentally deleting data.

9493 05/21/2013 07:37 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql, data.sql: updated for new TNRS CSV columns (see bug at https://pods.iplantcollaborative.org/jira/browse/TNRS-183). note that these columns may eventually change back (comment by Naim at https://pods.iplantcollaborative.org/jira/browse/TNRS-183#comment-34444).

9492 05/21/2013 07:33 PM Aaron Marcuse-Kubitza

README.TXT: Full database import: added steps to check that TNRS ran successfully, and fix errors (due to column changes in the TNRS CSV) if it didn't

9491 05/21/2013 07:24 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub: use sh's -e (errexit) mode so errors in an invoked script cause the script to abort instead of burying the error in more output
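
A minimal sketch of what errexit changes, using throwaway one-line scripts rather than test_scrub itself (the `false; echo` script is an assumed stand-in for a failing invoked script followed by more output):

```sh
# Without -e, the failing `false` is ignored and the script keeps going.
without_e=$(sh -c 'false; echo "continued"')

# With -e, the script aborts at the first failing command, so nothing is
# printed. `|| true` only keeps this demo itself from failing.
with_e=$(sh -e -c 'false; echo "continued"' || true)

echo "without -e: [$without_e]"
echo "with -e:    [$with_e]"
```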

9490 05/21/2013 07:19 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub: documented that `make schemas/"$public"/uninstall` removes the previous results (since it may be confusing why it's prompting the user to uninstall the schema that is an output of the program)

9489 05/21/2013 07:16 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub: don't need to run the import twice anymore because the accepted names are now included in the tnrs_input_name view that TNRS runs on

9488 05/21/2013 07:09 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub: updated for current TNRS schema

9487 05/21/2013 06:47 PM Aaron Marcuse-Kubitza

bugfix: inputs/test_taxonomic_names/test_scrub: unset $n so it doesn't limit the # rows. it is set to 2 in the default test environment, so must be unset for n-sensitive programs that should be unlimited.

9486 05/21/2013 06:40 PM Aaron Marcuse-Kubitza

inputs/test_taxonomic_names/test_scrub: updated for current TNRS schema

9485 05/21/2013 01:44 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: herbaria_filter.table/make(): also include the exported plant_fraction herbaria

9484 05/21/2013 01:43 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: added herbaria_filter.plant_fraction.csv_/make(), which exports the plant_fraction herbaria whose plant_fraction >= 0.8

9483 05/21/2013 01:42 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: added plant_fraction.table/make(), which contains the plant fraction for each herbarium

9482 05/21/2013 01:37 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added mk_drop()

9481 05/21/2013 01:00 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: to_file(): log $stdout so users can tell which file is being created by the command. for some reason, can't use `local redirs=(">$stdout")` because the redirections don't seem to be applied. can't yet use `log+ -2 echo_vars stdout` because log+ does not yet support negative adjustments (they cause PS4 to be emptied out before being re-prepended to).

9480 05/21/2013 12:54 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/util.sh: log+(): adjustment < 0: need to enclose -$1 in $(()) so it gets evaluated before being used as an array index

9479 05/21/2013 12:16 PM Aaron Marcuse-Kubitza

lib/sh/local.sh: psql(): documented that --output is actually for query results, not echoed statements (and thus must be redirected back to fd 1 while fd 1 with the statements gets sent to the logging port)

9478 05/21/2013 12:14 PM Aaron Marcuse-Kubitza

lib/sh/local.sh: psql(): documented why can't use fd 11

9477 05/21/2013 12:09 PM Aaron Marcuse-Kubitza

lib/sh/local.sh: use @redirs instead of manual redirection to set up --output fd, so that the redirection will be echoed along with the command. for some reason, this requires switching to fd 13 instead of 11, because fd 11 gives a "/dev/fd/11: Bad file descriptor" error when 11 is set with exec right before the command instead of on the subshell the command is executed in. (13 was chosen rather than 12 because 2 is for errors, while *3 (or 3) is for logging.)

9476 05/21/2013 04:18 AM Aaron Marcuse-Kubitza

bugfix: lib/sh/db.sh: pg_export_table_to_dir_no_header(): inlined $(pg_header) so setting $cols wouldn't affect pg_export_table_no_header(), which uses it as a kw param

9475 05/20/2013 10:44 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/util.sh: to_file(): require_not_exists check: missing `test` in `if "$if_not_exists"`
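
The effect of the missing `test` can be sketched as follows. This assumes the flag follows a "non-empty means on" convention with the value `1`; the surrounding to_file() logic is not reproduced.

```sh
if_not_exists=1   # assumed flag convention: non-empty means "on"

# Buggy form: tries to run the flag's value ("1") as a command, which fails,
# so the branch is never taken.
if "$if_not_exists" 2>/dev/null; then buggy=taken; else buggy=skipped; fi

# Fixed form: `test` checks that the string is non-empty.
if test "$if_not_exists"; then fixed=taken; else fixed=skipped; fi

echo "buggy=$buggy fixed=$fixed"
```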

9474 05/20/2013 10:39 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: command(): log the function call using echo_func to assist debugging. (use a higher log_level because it's internal.)

9473 05/20/2013 09:29 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: command(): support custom redirections, which will be echoed along with the command

9472 05/20/2013 08:48 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: to_file(): reworded confusing || conditional for require_not_exists into an if statement

9471 05/20/2013 08:21 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record/run: herbaria_filter.table/make(): need to use append=1 with mysql_import so the output table doesn't get re-truncated when additional parts are added

9470 05/20/2013 07:28 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/db.sh: load new aliases before mk_select(), which uses mk_table_esc

9469 05/20/2013 07:27 PM Aaron Marcuse-Kubitza

lib/runscripts/table.run: include make.sh so runscripts based on it can use make-related utils

9468 05/20/2013 06:52 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added mk_select() and use it in mk_select_var

9467 05/20/2013 06:46 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added limit() and use it instead of `${limit:+LIMIT $limit}`
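
For readers unfamiliar with the `${limit:+LIMIT $limit}` idiom being replaced, here is a small sketch. The helper name `limit_clause` and the query text are hypothetical; the real limit() in db.sh may differ.

```sh
# Hypothetical helper wrapping the `${limit:+LIMIT $limit}` idiom.
limit_clause() { echo "${limit:+LIMIT $limit}"; }

unset limit
no_limit=$(limit_clause)     # empty: no LIMIT clause is emitted

limit=2
with_limit=$(limit_clause)   # expands to the clause only when $limit is set

echo "SELECT * FROM t $with_limit"
```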

9466 05/20/2013 06:44 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added mysql_truncate() and use it instead of `mk_truncate|mysql_ANSI`

9465 05/20/2013 06:42 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: truncate(): renamed to mk_truncate() because it actually just creates a TRUNCATE statement, rather than also executing it

9464 05/20/2013 06:38 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: use_local/use_remote: unset $prefix after using it so it isn't unintentionally applied as a kw param for a later function

9463 05/20/2013 04:18 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: mk_select: renamed to mk_select_var since it actually sets a var in the local context rather than returning a query

9462 05/20/2013 03:40 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: herbaria_filter.table/make(): specify the different parts used to create the table in an array

9461 05/20/2013 03:19 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: renamed herbaria_filter.csv_ to herbaria_filter.ih.csv_ to allow for other tables that get combined into herbaria_filter

9460 05/20/2013 03:13 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/db.sh: mk_select: ensure newline before LIMIT clause, in case caller provided custom query which did not have trailing newline

9459 05/17/2013 06:00 PM Aaron Marcuse-Kubitza

bugfix: mappings/VegCore-VegBIEN.csv: place.geovalid: added missing /1 after _alt

9458 05/17/2013 05:55 PM Aaron Marcuse-Kubitza

bugfix: lib/sql.py: parse_exception(): typed_name_re: added back matching of names without "", since these are used by some error messages (ones that contain () after the function name)

9457 05/17/2013 05:41 PM Aaron Marcuse-Kubitza

bugfix: lib/sql.py: parse_exception(): typed_name_re: need to allow " within the matched name, since there are now "" around the entire identifier that was passed to Postgres, which may itself include " . always require "" around the matched name, to ensure that the whole name is matched by .+? e.g. when followed by () for a function call. the version of Postgres we currently use apparently no longer has error messages without the "", so we don't need a separate regexp for quoted and unquoted names.

9456 05/17/2013 03:43 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql_import(): automatically ensure the table is empty (i.e. using truncate()), unless append=1 is specified. extra calls to truncate() now that this happens automatically have also been removed.

9455 05/17/2013 01:13 PM Aaron Marcuse-Kubitza

bin/map: by_col: ensure verbosity is at least 2 in live mode (using new ints.set_min() instead of max() for clarity). documented that live column-based import MUST be run with verbosity 2+ (3 preferred) to provide debugging information for often-complex errors. without this, debugging is effectively impossible.

9454 05/17/2013 01:08 PM Aaron Marcuse-Kubitza

added lib/ints.py with renamings of max()->set_*min*(), min()->set_*max*() for easier understandability of the set-ceiling/set-floor use cases of min()/max()

9453 05/17/2013 12:57 PM Aaron Marcuse-Kubitza

bin/map: Set default verbosity: by_col: documented that showing all queries is primarily to assist debugging, not profiling

9452 05/17/2013 11:59 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: logging: named it `log++`

9451 05/17/2013 11:59 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: logging: verbosities: level 0: documented that log++ also suppresses external command output for full support of cron jobs

9450 05/17/2013 11:57 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: logging: documented `make` equivalents of the various verbosities, where available. (many of the verbosities, such as level 1, are sorely needed in make to avoid excessive output.)

  1. verbosities (and `make` equivalents):
  2. 0: just print errors. useful for cron jobs....

9449 05/17/2013 04:03 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: die_e(): benign errors: increase log_level so that a benign non-zero exit status will only be displayed at debug verbosities (2+) (it is confusing otherwise)

9448 05/17/2013 03:36 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: try(): always run the command with benign_error=1 so that any die_e() doesn't prematurely indicate that a particular exit status was an error

9447 05/17/2013 03:34 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: die_e(): support benign errors using $benign_error flag that should be logged as info messages instead of errors

9446 05/17/2013 03:30 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: die(): documented that msg can't use $() (because it would reset $?)

9445 05/17/2013 03:19 AM Aaron Marcuse-Kubitza

inputs/bien_web/observation/VegBIEN.csv, unmapped_terms.csv: regenerated

9444 05/17/2013 03:01 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: command(): 2>&$err_fd: add to _redirs after echoing command so it isn't echoed at the end of every command (since this redirection is frequently applied)

9443 05/17/2013 02:55 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: sed: use case statement instead of test to determine flag letter, to easily allow matching multiple `uname` OSes or adding additional flag letters
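
A sketch of the case-statement approach, assuming the flag in question is sed's extended-regexp flag. The function name `sed_ext_flag` and the OS-to-letter mapping are illustrative assumptions, not taken from util.sh.

```sh
# Pick sed's extended-regexp flag letter for a given `uname` value.
# The mapping below is an assumption for illustration.
sed_ext_flag() {
	case "$1" in
		Darwin|*BSD) echo E;;  # BSD-derived sed
		*)           echo r;;  # GNU sed
	esac
}

flag=$(sed_ext_flag "$(uname)")
result=$(echo "abc" | sed -"$flag" 's/(b)/[\1]/')
echo "$result"
```

Adding another OS or another flag letter is then a one-line change to the relevant `case` branch.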

9442 05/17/2013 02:46 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: die(): documented that its msg can use $?, because it has not yet been overridden by another command

9441 05/17/2013 02:45 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: die_e(): use die(), which performs the necessary save_e/rethrow. this requires using $? instead of $e for the exit status, because $e has not yet been set.

9440 05/17/2013 02:42 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: inlined log_e() into die_e() because that's the only place it's used

9439 05/17/2013 02:37 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: command(): print "command exited with error" message using new die_e() if command returns false. this requires removing manual die_e()/log_e() calls elsewhere.

9438 05/17/2013 02:34 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: command(): moved increase of indent inside () so that error-handling statements after () will use the outer log_level

9437 05/17/2013 02:31 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: added die_e(), which logs that a command exited with an error

9436 05/17/2013 02:18 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: command(): determine redirections before echoing the command so they can be logged along with the command, instead of as separate exec statements. (these had a higher log_level to avoid cluttering the output with `exec` lines, which usually suppressed the redirections completely.) inline the command__set_fds() nested func so the redirections are all in one place.

9435 05/17/2013 01:54 AM Aaron Marcuse-Kubitza

lib/sh/util.sh: use simpler `if can_log; then indent; fi` instead of `can_log && indent || true`. however, the `&& indent || true` syntax is still required in aliases such as echo_func which need to allow prefixing the command with a wrapper command or kw param assignments.
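
The difference between the forms shows up in the exit status when logging is off. In the sketch below, the bodies of `can_log` and `indent` are simplified stand-ins, not the util.sh implementations.

```sh
verbosity=0
can_log() { test "$verbosity" -gt 0; }   # stand-in: logging is off here
indent()  { indented=1; }                # stand-in

bare()    { can_log && indent; }          # returns can_log's failure
guarded() { can_log && indent || true; }  # failure masked by `|| true`
with_if() { if can_log; then indent; fi; }  # an untaken `if` returns 0

bare_status=0;    bare    || bare_status=$?
guarded_status=0; guarded || guarded_status=$?
if_status=0;      with_if || if_status=$?

echo "bare=$bare_status guarded=$guarded_status if=$if_status"
```

Both the `|| true` and the `if` form avoid propagating a non-zero status. The `if` form simply reads more clearly where a plain statement is allowed.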

9434 05/16/2013 09:28 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: dynamically generate herbaria_filter.csv_ from herbaria.ih in new target herbaria_filter.csv_/make()

9433 05/16/2013 09:27 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: store the herbaria filter in a MySQL table loaded from a CSV instead of getting it from a hardcoded list of IN (...) values

9432 05/16/2013 09:24 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added truncate()

9431 05/16/2013 09:23 PM Aaron Marcuse-Kubitza

lib/sh/make.sh: set_make_vars: set $target_stem

9430 05/16/2013 08:49 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added mysql_import()

9429 05/16/2013 07:02 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: removed no longer used mk_esc_name()

9428 05/16/2013 07:01 PM Aaron Marcuse-Kubitza

lib/runscripts/table.run: don't mk_esc_name schema, table because these will be mk_esc_name'd by functions that use them

9427 05/16/2013 06:55 PM Aaron Marcuse-Kubitza

lib/sh/local.sh: psql(): use $schema_esc, $table_esc instead of just putting $schema, $table in ""

9426 05/16/2013 06:48 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: mk_esc_name_alias(): don't overwrite an already-defined $*_esc, to allow the user to provide an already-escaped value (such as a schema-qualified table) directly

9425 05/16/2013 06:38 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: rtrim(): increase the log_level of sed to 4+ instead of 2+ because it is usually run as part of a var assignment, and should therefore have a lower log_level than echo_vars

9424 05/16/2013 06:32 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: mk_esc_name_alias(): echo_vars the *_esc var when it's set

9423 05/16/2013 06:31 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added mk_esc_name_alias() and use it to create mk_schema_esc, mk_table_esc

9422 05/16/2013 05:55 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql(): run with --local-infile=1

9421 05/16/2013 05:48 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/db.sh: log_sql(): use can_log() instead because the verbosity now gets decremented as the log_level increases, so the threshold to compare to is 0 instead of 2

9420 05/16/2013 05:46 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: added set_default()

9419 05/16/2013 05:45 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: rtrim(): run at higher log_level so that sed command is not normally echoed

9418 05/16/2013 04:40 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: renamed herbaria.sql to herbaria.data.sql so it wouldn't be added to svn by `make inputs/GBIF/raw_occurrence_record/add` or `make inputs/add`

9417 05/16/2013 04:38 PM Aaron Marcuse-Kubitza

inputs/input.Makefile: $(svnFiles): also exclude *.data.sql, which should never be in svn

9416 05/16/2013 04:27 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: cultivated_family_locations: documented that table is from sftp://nimoy.nceas.ucsb.edu/home/bien/bien2_scripts/geoscrub/cultivated/cult_by_taxon/flag_by_taxa.inc (i.e. not generated by a function)

9415 05/16/2013 04:15 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place.geovalid: added latLongDomainValid to the values to _and together

9414 05/16/2013 04:09 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: place.geovalid: require it to be NOT NULL so that it's always a 2-valued boolean (but default it to false since it's not a required field)

9413 05/16/2013 04:06 PM Aaron Marcuse-Kubitza

mappings/VegCore-VegBIEN.csv: place.geovalid: use false instead of NULL

9412 05/16/2013 03:46 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: table.tsv/make(): exclude deleted rows (i.e. where the deleted timestamp is non-NULL)

9411 05/16/2013 03:42 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/header.csv: regenerated using ./run. since the table is reimported as a CSV, it uses bin/csv2db, which prepends an additional row_num column.

9410 05/16/2013 03:09 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: table.tsv/make(): remove explicit cols list to include all cols. the file size of the generated table.tsv will increase by ~3x, but should remain reasonably-sized compared to our available disk space.

9409 05/16/2013 03:04 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record/run: table.tsv/make(): need \ line continuation after vars so they only apply to the command rather than being set as global vars
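
A minimal sketch of why the `\` matters. The variable name `cols`, its value, and the `sh -c` command are placeholders, and the real runscript invokes different commands.

```sh
unset cols

# With the trailing \, the assignment is a prefix of the command on the next
# line, so it applies to that command only.
cols=a,b \
sh -c 'echo "child sees cols=$cols"'
scoped=${cols:-unset}

# Without the \, the assignment is its own statement and stays set globally.
cols=a,b
leaked=${cols:-unset}

echo "scoped=$scoped leaked=$leaked"
```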

9408 05/16/2013 03:02 PM Aaron Marcuse-Kubitza

bugfix: lib/runscripts/table.run: load_data(): use new $verbosity_min instead of running `verbosity_min` so that the command name logging is not output with the new verbosity

9407 05/16/2013 02:59 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: added $verbosity_min to set a `verbosity_min` value after the command name, etc. has been logged, so that the logging itself is not output with the new verbosity

9406 05/16/2013 02:38 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: range_modeling_input: include only plants (i.e. rows with higher_plant_group IS NOT NULL)

9405 05/16/2013 02:36 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: range_modeling_input: added higher_plant_group, for use in restricting rows to plants