
# Date Author Comment
9661 05/30/2013 07:45 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: herbaria_filter/make(): use the plant_fraction_for_herbaria_filter view directly instead of first exporting it to a CSV

9660 05/30/2013 07:19 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql_import(): in append mode, use LOAD DATA IGNORE so that input rows duplicating an existing unique key are skipped instead of causing an error

9659 05/30/2013 07:09 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: herbaria_filter/make(): if remaking, turn off remake mode after doing this target's rm operations, so that prerequisite targets are not also remade

9658 05/30/2013 06:56 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: to_file(): removed no longer needed separate logging of >$stdout, which is now done by command()

9657 05/30/2013 06:50 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: echo_redirs_cmd(): use $@ in a subshell instead of manipulating the @redirs array directly, because operations on $@ (e.g. $#, $1, shift) are much simpler than the corresponding array operations ( ${#redirs[@]}, ${redirs[0]}, redirs=("${redirs[@]:1}") )
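
The idea of working on positional parameters inside a subshell can be sketched roughly as follows. This is only an illustration, not the code in util.sh: log_cmd, redir_in and redir_out are hypothetical names, and the bash array equivalents appear only in comments so that the sketch stays POSIX-compatible.

```sh
# Hypothetical stand-ins for the entries of a redirs list.
redir_in='<in.csv'
redir_out='>out.log'

# Prints each redirect on its own indented line, then the command itself.
log_cmd() {
    (
        # Inside the subshell, replace the positional parameters with the
        # redirects (in bash this could be: set -- "${redirs[@]}").
        set -- "$redir_in" "$redir_out"
        while [ "$#" -gt 0 ]; do      # array form: ${#redirs[@]}
            printf '  %s\n' "$1"      # array form: ${redirs[0]}
            shift                     # array form: redirs=("${redirs[@]:1}")
        done
    )
    # The subshell's set/shift did not touch this function's own arguments.
    printf '%s\n' "$*"
}

log_cmd mysql --verbose
```

Because the loop runs in a subshell, the caller's arguments survive unchanged, which is what lets the command itself still be printed afterwards.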

9656 05/30/2013 06:42 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: echo_redirs_cmd(): log each file redir with a separate log() statement, so each line is indented

9655 05/30/2013 06:38 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: added echo_redirs_cmd and use it in command() to print cmd

9654 05/30/2013 06:32 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: command(): print <>file redirects before command, because they introduce it

9653 05/30/2013 06:31 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: added starts_with()

9652 05/30/2013 05:49 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: to_file(): use @redirs to echo and set >$stdout instead of setting it manually, which is possible now that the command() @redirs bug has been fixed

9651 05/30/2013 05:43 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/util.sh: convention of fds to use for command-specific alternate stdin/stdout/stderr: changed to 40/41/42 because 10/11/12 are used by eval (which is used by set_fds()). use of fd 10/11/12 will cause hard-to-find silent bugs because exec will not print an error when these are used. documented why not to use other series of fds for this purpose:...

9650 05/30/2013 02:43 PM Aaron Marcuse-Kubitza

lib/sh/local.sh: psql(): use new psql() from db.sh instead of psql_script_vegbien/psql_verbose_vegbien. this requires setting local_pg_database=vegbien to replace vegbien_dest used by psql_*_vegbien.

9649 05/30/2013 02:38 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: psql(): set $PG* connection env vars from our connection vars ($server, $user, etc.). use use_pg to import $database so it can be different from $database for MySQL

9648 05/30/2013 02:31 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added use_pg alias

9647 05/30/2013 02:31 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/db.sh: psql(): added missing `--set ON_ERROR_STOP=1 --quiet` opts from psql_script_vegbien

9646 05/30/2013 02:12 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added psql(), which replaces psql_script_vegbien and psql_verbose_vegbien for general connections. it also supports separate command and stdin files, to allow using `\copy from pstdin`, with pstdin pointing to a separate, EOF-terminated CSV file instead of inlined with the command and terminated with the \. escape (which may be contained within the CSV file itself).

9645 05/30/2013 01:05 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/local.sh: psql(): $file can't both be passed as a --file param and be prefixed with the necessary \set schema, etc. commands, so instead include $file when cat-ing stdin

9644 05/30/2013 08:28 AM Aaron Marcuse-Kubitza

added inputs/GBIF/raw_occurrence_record/postprocess.sql, which removes institutions that we have direct data for

9643 05/30/2013 08:18 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: herbaria_filter/make(): skip table if already exists (unless remaking), like plant_fraction/make()

9642 05/30/2013 08:16 AM Aaron Marcuse-Kubitza

bugfix: lib/sh/db.sh: mysql_import(): need to use direct connection to DB instead of via ssh, because ssh does not tunnel nonstandard fds

9641 05/30/2013 08:15 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: added ssh2local alias

9640 05/30/2013 07:36 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: herbaria_filter.plant_fraction.csv_/make(): use new plant_fraction_for_herbaria_filter view

9639 05/30/2013 07:13 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: added plant_fraction_for_herbaria_filter/make(). note that for simplicity, plant_fraction_for_herbaria_filter is a view instead of a table.

9638 05/30/2013 06:50 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: *.table/*(): renamed to */*() because a target named after a table refers to the table unless it has an explicit file extension

9637 05/30/2013 06:49 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: plant_fraction.table/*(): renamed to plant_fraction/*() because a target named after a table refers to the table unless it has an explicit file extension

9636 05/30/2013 06:41 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql_seal_table(): also revoke GRANT OPTION, which apparently needs to be done in addition (and in a separate command, unlike when granting GRANT OPTION)

9635 05/30/2013 06:40 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql_seal_table(): REVOKE: ignore errors if REVOKE was already run

9634 05/30/2013 06:39 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql_seal_table(): REVOKE: removed unneeded explicit database since this is automatically set to the current database

9633 05/30/2013 06:19 AM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: added plant_fraction.table/seal(), which uses new mysql_seal_table()

9632 05/30/2013 06:19 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: added mysql_seal_table(), which prevents further modifications to a table by a user. this uses new mysql_root().

9631 05/30/2013 06:18 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: added mysql_root(). this version uses just use_root (compare to the mysql_root() override in local.sh).

9630 05/30/2013 06:16 AM Aaron Marcuse-Kubitza

lib/sh/local.sh: database connection vars: connect to vegbiendev via ssh and run commands locally, to allow running commands as root (which can only connect to the database locally). this effectively requires an ssh account on vegbiendev, but any ssh account (including an anonymous one, if we set one up) will do. this causes schemas/VegCore/VegCore.my.sql, VegCore.pg.sql to change, because they are now created by mysqldump running on vegbiendev (Linux) instead of on a Mac.

9629 05/29/2013 10:35 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: plant_fraction: added index on plant_fraction for fast extraction of herbaria by fraction threshold

9628 05/29/2013 10:10 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: tables: set ENGINE to MyISAM and DEFAULT CHARSET to utf8 to match the other GBIF tables. (note that MyISAM is not the default, but is needed to avoid row sort order problems and other issues with InnoDB.)

9627 05/29/2013 08:09 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: plant_fraction.table/make(): in remaking mode, drop the table first

9626 05/29/2013 08:04 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: plant_fraction.table/make(): only create and populate the table if it doesn't already exist, to avoid clobbering existing data. the noclobber functionality uses new skip_table(), which is the table analog of require_not_exists().

9625 05/29/2013 08:02 PM Aaron Marcuse-Kubitza

lib/runscripts/table.run, table.run: use new db_make.sh

9624 05/29/2013 08:02 PM Aaron Marcuse-Kubitza

added lib/sh/db_make.sh that includes both db.sh and make.sh, and will eventually contain DB-related make commands

9623 05/29/2013 08:00 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added skip_table(), which prints an already_exists_msg for tables

9622 05/29/2013 07:56 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: already_exists_msg: undid r9621 because the `|| return 0` should actually always be explicitly specified by the caller, to make it clear that the function will be aborted
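
The convention of having the caller write `|| return 0` explicitly might look like the sketch below. The function bodies are hypothetical stand-ins written for illustration (the real helpers in util.sh may differ), and make_file is an invented caller.

```sh
# Hypothetical stand-in: prints a notice to stderr.
already_exists_msg() { echo "$1 already exists, skipping" >&2; }

# Hypothetical stand-in: succeeds only if the file does not exist yet.
require_not_exists() {
    [ ! -e "$1" ] || { already_exists_msg "$1"; return 1; }
}

# Invented caller, for illustration only.
make_file() {
    # The `|| return 0` is spelled out here, so a reader of make_file can
    # see that the function ends early (and successfully) at this point.
    require_not_exists "$1" || return 0
    echo "generated" > "$1"
}
```

Calling make_file twice on the same path creates the file the first time and leaves it untouched the second time, while still returning success.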

9621 05/29/2013 07:47 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: already_exists_msg(): added alias for use as an error handler. note that ..._not_exists() functions should continue to use the "already_exists_msg" function instead to preserve the exit status.

9620 05/29/2013 07:40 PM Aaron Marcuse-Kubitza

lib/sh/util.sh: added already_exists_msg() and use it instead of manually generating the die() call

9619 05/29/2013 07:15 PM Aaron Marcuse-Kubitza

schemas/my.cnf: added innodb_file_per_table so each InnoDB table will get its own file. this should also allow databases with InnoDB tables to be manually renamed.

9618 05/29/2013 07:09 PM Aaron Marcuse-Kubitza

added schemas/my.cnf from /etc/mysql/my.cnf

9617 05/29/2013 06:51 PM Aaron Marcuse-Kubitza

schemas/VegCore/VegCore.my.sql, VegCore.pg.sql: synced to VegCore MySQL DB. for some reason, the fkeys are now output in the opposite order from what they were in before.

9616 05/29/2013 05:22 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: MatchedTaxon: filter out rows where Max_score was not high enough to use the TNRS result as a match. removed now-duplicated filter for this in AcceptedTaxon.

9615 05/29/2013 05:19 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: ScrubbedTaxon: removed extra ; at end of WHERE clause

9614 05/29/2013 03:48 PM Aaron Marcuse-Kubitza

web/links/index.htm: updated to Firefox bookmarks. some broken favicons have also been fixed, by reopening bookmark in Firefox. (this will only update a favicon if there is a newer version. to delete a favicon completely, use Firefox's SQLite Manager plugin.)

9613 05/29/2013 10:17 AM Aaron Marcuse-Kubitza

web/index.php: use XHTML DOCTYPE to match what's used by mod_autoindex. this requires some adjustments in spacing for XHTML's slightly different formatting

9612 05/29/2013 10:15 AM Aaron Marcuse-Kubitza

bugfix: web/.htaccess: need to do DirectoryIndex redirects before checking for existing file/dir, because a DirectoryIndexed dir is existing but still needs to be redirected to the index.* file

9611 05/29/2013 10:01 AM Aaron Marcuse-Kubitza

web/.htaccess: mod_autoindex: use the main.css stylesheet to match the look-and-feel of index.php

9610 05/29/2013 10:00 AM Aaron Marcuse-Kubitza

web/.htaccess: mod_autoindex: Note that some listed files are not web-accessible: use ' instead of " to avoid \-escaping embedded "

9609 05/29/2013 09:39 AM Aaron Marcuse-Kubitza

web/.htaccess: mod_autoindex: sort by description when provided, to allow setting a custom (non-alphabetical) sort order using AddDescription

9608 05/29/2013 09:37 AM Aaron Marcuse-Kubitza

web/.htaccess: mod_autoindex: added note that some listed files are not web-accessible. they will produce a "Forbidden" error when clicked.

9607 05/29/2013 09:36 AM Aaron Marcuse-Kubitza

bugfix: web/index.php: added space between the full directory index and the preceding content

9606 05/29/2013 09:35 AM Aaron Marcuse-Kubitza

web/index.php: moved the full directory index within the rest of the document body

9605 05/29/2013 08:58 AM Aaron Marcuse-Kubitza

web/index.php: include full directory index, since the URL patterns list is just a subset of the content available through vegpath.org

9604 05/29/2013 08:25 AM Aaron Marcuse-Kubitza

web/.htaccess: added mod_autoindex IndexOptions, in particular FoldersFirst

9603 05/29/2013 05:26 AM Aaron Marcuse-Kubitza

bugfix: web/.htaccess: changed "mod_dir listing"->"mod_autoindex listing" because mod_dir does not actually handle the autogenerated listings

9602 05/29/2013 05:24 AM Aaron Marcuse-Kubitza

bugfix: web/.htaccess: DirectoryIndex: use disabled instead of on because on is actually treated as a filename, and does not invoke mod_autoindex. the DirectoryIndex directive and the mod_dir module actually apply only to manual index files, not to autogenerated dir listings (which are handled by mod_autoindex).

9601 05/29/2013 04:58 AM Aaron Marcuse-Kubitza

web/index.php: removed no longer needed custom alias j.mp/vegpath# for when page reached through vegbiendev.nceas.ucsb.edu, because vegpath.org is a much more reliable domain than the previous path.vg, and a separate way to reach VegPath when path.vg is down is no longer needed

9600 05/29/2013 04:43 AM Aaron Marcuse-Kubitza

web/.htaccess: <dir>/all forces mod_dir listing: use simpler $mod_dir_listing env var instead of query string modification to indicate that an explicit mod_dir listing should be displayed. this causes /all to replace ?index=1 as the way to force a mod_dir listing. note that the %{ENV:...} test needs to use $REDIRECT_mod_dir_listing instead of $mod_dir_listing, because a redirect will occur between the /all rule and the index.* rule, causing all env vars to be prepended with REDIRECT_ .

9599 05/29/2013 03:48 AM Aaron Marcuse-Kubitza

web/.htaccess: <dir>/all forces mod_dir listing, as a simpler syntax than ?index=1

9598 05/29/2013 03:28 AM Aaron Marcuse-Kubitza

web/.htaccess: for dirs, redirect to index.*: allow requesting a mod_dir listing instead with ?index=1

9597 05/29/2013 03:26 AM Aaron Marcuse-Kubitza

web/.htaccess: handle DirectoryIndex redirects in a RewriteRule instead of with `DirectoryIndex index`, so that RewriteConds can be used to configure when index.* is used as the DirectoryIndex instead of a mod_dir listing

9596 05/29/2013 02:30 AM Aaron Marcuse-Kubitza

web/.htaccess: handle DirectoryIndex subrequests when there is no DirectoryIndex: moved comment about -F subrequest after line it applies to

9595 05/29/2013 02:27 AM Aaron Marcuse-Kubitza

inputs/GBIF/_MySQL/run: documented steps to reload GBIF MySQL

9594 05/29/2013 02:19 AM Aaron Marcuse-Kubitza

web/.htaccess: RewriteRules: added standard [discardpath,noescape,qsappend] options where missing (these should be the default, but aren't)

9593 05/24/2013 03:13 PM Aaron Marcuse-Kubitza

inputs/GBIF/raw_occurrence_record/run: herbaria_filter.table/make(): inline the PRIMARY KEY statement with its column

9592 05/24/2013 03:10 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/raw_occurrence_record/run: plant_fraction.table/make(): create the table once with "IF NOT EXISTS" and then populate it with INSERT SELECT, to avoid locking it while it's being repopulated. dropping and recreating the table with CREATE TABLE AS prevented phpMyAdmin from even reading the database's tables list, because it was unable to fetch a rowcount for plant_fraction.

9591 05/24/2013 03:04 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql(): when echoing queries, also echo runtimes (turned on with `--verbose --verbose --verbose`)

9590 05/24/2013 02:32 PM Aaron Marcuse-Kubitza

added lib/runscripts/datasrc_dir.run

9589 05/24/2013 02:30 PM Aaron Marcuse-Kubitza

inputs/GBIF/_MySQL/run: added load_data(), which loads the dumpfile into MySQL

9588 05/24/2013 02:06 PM Aaron Marcuse-Kubitza

lib/sh/db.sh: added mysql_rm_privileged_statements()

9587 05/24/2013 02:00 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/resume_import.sh: sed calls: moved end-of-line comments to their own line because end-of-line comments are not supported on Mac
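
A portable way to comment a sed script is to put each comment on its own line, as in this minimal sketch. The function name and the sed commands are hypothetical and are not the ones used in resume_import.sh.

```sh
# Hypothetical example: drops empty lines and trims trailing spaces.
strip_blank_and_trim() {
    sed -e '
# drop empty lines
/^$/d
# remove trailing spaces
s/ *$//
'
}

printf 'a  \n\nb\n' | strip_blank_and_trim
```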

9586 05/24/2013 01:55 PM Aaron Marcuse-Kubitza

lib/runscripts/table_dir.run: renamed table to subdir because this can apply to any datasrc subdir. moved table-specific code to table.run.

9585 05/24/2013 01:43 PM Aaron Marcuse-Kubitza

lib/runscripts/table_dir.run: renamed table to subdir because this can apply to any datasrc subdir. moved table-specific code to table.run.

9584 05/24/2013 01:21 PM Aaron Marcuse-Kubitza

lib/runscripts/table_dir.run: table_make(): moved $silent flag to lib/sh/make.sh make() so all make callers can use it

9583 05/24/2013 12:35 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.sql.run: override ^.preamble.sql/make() and use ../_src/GBIFPortalDB-2013-02-20.dump as the dumpfile instead of this file, which does not contain the preamble

9582 05/24/2013 12:23 PM Aaron Marcuse-Kubitza

bugfix: lib/sh/resume_import.sh: $preamble_file: use the extension .0.preamble.sql instead of .preamble.sql so the preamble file sorts before the other *.sql files
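
The effect of the extra `.0` on sort order can be checked with a small sketch. The `dump.*` file names and the list_sorted helper are hypothetical, and byte-wise (C locale) ordering is assumed.

```sh
# Hypothetical helper: prints its arguments in C-locale sort order.
list_sorted() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# With the .0 infix, the preamble sorts first, because "0" orders before
# any lowercase letter in the C locale.
list_sorted dump.data.sql dump.preamble.sql dump.0.preamble.sql
```

Without the infix, `dump.preamble.sql` would sort after `dump.data.sql`, so the preamble would not come first when the files are processed in sorted order.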

9581 05/24/2013 12:22 PM Aaron Marcuse-Kubitza

removed inputs/GBIF/_MySQL/MySQL.data.sql*, since we are using the much faster exported TSVs instead (see raw_occurrence_record/table.tsv). this also avoids confusion between GBIFPortalDB-2013-02-20.data.sql* and MySQL.data.sql* when loading data into MySQL.

9580 05/24/2013 12:18 PM Aaron Marcuse-Kubitza

bugfix: inputs/GBIF/_MySQL/MySQL.data.sql.run: moved to GBIFPortalDB-2013-02-20.data.sql.run since it's actually the raw input file, not the ANSI export of it, that needs to be imported

9579 05/24/2013 12:16 PM Aaron Marcuse-Kubitza

lib/sh/resume_import.sh: get_pkey_at_pos(): changed $quote to ` to work with inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.sql

9578 05/24/2013 11:50 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: mysql(): added $log_queries flag, which can be turned off to avoid using --verbose. this is useful when running bulk INSERT statements.

9577 05/24/2013 11:35 AM Aaron Marcuse-Kubitza

lib/sh/local.sh: added mysql_local()

9576 05/24/2013 11:24 AM Aaron Marcuse-Kubitza

lib/sh/local.sh: added mysql_root()

9575 05/24/2013 11:24 AM Aaron Marcuse-Kubitza

lib/sh/local.sh: added $root_user, $root_password

9574 05/24/2013 11:22 AM Aaron Marcuse-Kubitza

lib/sh/db.sh: added use_root alias (similar to use_local/use_remote)

9573 05/24/2013 11:21 AM Aaron Marcuse-Kubitza

added inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.schema.z.clean_up.sql, which removes duplicated and unnecessary indexes in raw_occurrence_record

9572 05/24/2013 11:20 AM Aaron Marcuse-Kubitza

added inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.schema.0.preamble.sql

9571 05/24/2013 11:02 AM Aaron Marcuse-Kubitza

bugfix: lib/sh/resume_import.sh: sql_preamble(): also stop at first "-- Table structure for table" line (when using a full dumpfile rather than a data-only subset)

9570 05/24/2013 10:58 AM Aaron Marcuse-Kubitza

lib/sh/resume_import.sh: resume_import(): run connection preamble (first few lines of dumpfile) before continuing with main file at offset, so that connection settings are reapplied

9569 05/24/2013 06:45 AM Aaron Marcuse-Kubitza

lib/sh/resume_import.sh: is_pkey_imported__int(): use echo_stdout so the user can see the result of the > function in each iteration

9568 05/24/2013 06:42 AM Aaron Marcuse-Kubitza

added lib/sh/resume_import.sh and use it in inputs/GBIF/_MySQL/MySQL.data.sql.run

9567 05/24/2013 06:32 AM Aaron Marcuse-Kubitza

inputs/GBIF/_MySQL/MySQL.data.sql.run: is_pkey_imported__int(): made pkey name configurable in $pkey_name

9566 05/24/2013 05:32 AM Aaron Marcuse-Kubitza

inputs/GBIF/_MySQL/MySQL.data.sql.run: import_resume_pos() run time: removed seconds because the precision is likely only to the nearest half-minute

9565 05/24/2013 05:31 AM Aaron Marcuse-Kubitza

inputs/GBIF/_MySQL/MySQL.data.sql.run: documented that import_resume_pos() takes 6 min to run, with 37 iterations

9564 05/24/2013 05:20 AM Aaron Marcuse-Kubitza

added inputs/GBIF/_MySQL/MySQL.data.sql.run, with helper functions for resuming the import to MySQL from where it left off. this is very useful if the import is interrupted for any reason, because otherwise, the entire import would have to be run again from the start, taking 40-50 hours. import_resume_pos() uses new binsearch() to find where in the file the import left off, based on which pkeys have already been imported. (GBIF pkeys are unfortunately not in any order in the input file, nor are they in insertion order in the imported table, because MySQL instead clusters the table by the pkey. this necessitates a much more complex solution to resuming a partial import.)

9563 05/24/2013 05:14 AM Aaron Marcuse-Kubitza

lib/sh/binsearch.sh: binsearch(): also echo_vars the iter_num, to track how close binsearch is to finding the value (it will always take the same # iters, log2(max - min) )
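
A binary search of this kind can be sketched as below. This is a generic lower-bound search written for illustration, not the contents of binsearch.sh: binsearch_first, not_yet_imported and resume_at are hypothetical names, and the predicate is assumed to be false for every position below the answer and true from the answer onward. The number of iterations grows roughly as log2(max - min).

```sh
# Hypothetical stand-in for "the first position that has not been imported".
resume_at=700000
not_yet_imported() { [ "$1" -ge "$resume_at" ]; }

# Prints the smallest value in [min, max) for which the predicate succeeds,
# or max if it never succeeds. Progress is logged to stderr.
binsearch_first() {
    lo=$1
    hi=$2
    pred=$3
    iter_num=0
    while [ "$lo" -lt "$hi" ]; do
        iter_num=$(( iter_num + 1 ))
        mid=$(( (lo + hi) / 2 ))
        # Log the current bounds so they can seed a later, shorter run.
        echo "iter_num=$iter_num min=$lo max=$hi" >&2
        if "$pred" "$mid"; then
            hi=$mid
        else
            lo=$(( mid + 1 ))
        fi
    done
    echo "$lo"
}

binsearch_first 0 1000000 not_yet_imported
```

Logging min and max on every iteration mirrors the idea in r9562: if the search is interrupted, the last logged bounds can be passed back in as a narrower starting range.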

9562 05/24/2013 05:11 AM Aaron Marcuse-Kubitza

lib/sh/binsearch.sh: binsearch(): also echo_vars the min/max so these can be used as shortcut inputs if binsearch is run again