added inputs/GBIF/raw_occurrence_record/postprocess.sql, which removes institutions that we have direct data for
inputs/GBIF/raw_occurrence_record/run: herbaria_filter/make(): skip table if already exists (unless remaking), like plant_fraction/make()
bugfix: lib/sh/db.sh: mysql_import(): need to use direct connection to DB instead of via ssh, because ssh does not tunnel nonstandard fds
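    an illustration of the underlying limitation, using a hypothetical extra input fd (fd 3 and the filenames are placeholders, not the actual mysql_import() code):
        cat 3< local_file <&3                # works locally: fd 3 is opened, then duplicated to stdin
        ssh server 'cat <&3' 3< local_file   # fails: ssh forwards only fds 0-2, so the remote
                                             # command sees "Bad file descriptor" for fd 3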
lib/sh/db.sh: added ssh2local alias
inputs/GBIF/raw_occurrence_record/run: herbaria_filter.plant_fraction.csv_/make(): use new plant_fraction_for_herbaria_filter view
inputs/GBIF/raw_occurrence_record/run: added plant_fraction_for_herbaria_filter/make(). note that for simplicity, plant_fraction_for_herbaria_filter is a view instead of a table.
inputs/GBIF/raw_occurrence_record/run: *.table/*(): renamed to */*() because a target named after a table refers to the table unless it has an explicit file extension
inputs/GBIF/raw_occurrence_record/run: plant_fraction.table/*(): renamed to plant_fraction/*() because a target named after a table refers to the table unless it has an explicit file extension
lib/sh/db.sh: mysql_seal_table(): also revoke GRANT OPTION, which apparently needs to be revoked in addition to the other privileges (and in a separate statement, unlike when granting GRANT OPTION)
lib/sh/db.sh: mysql_seal_table(): REVOKE: ignore errors if REVOKE was already run
lib/sh/db.sh: mysql_seal_table(): REVOKE: removed unneeded explicit database since this is automatically set to the current database
inputs/GBIF/raw_occurrence_record/run: added plant_fraction.table/seal(), which uses new mysql_seal_table()
lib/sh/db.sh: added mysql_seal_table(), which prevents further modifications to a table by a user. this uses new mysql_root().
lib/sh/db.sh: added mysql_root(). this version just uses the use_root alias (compare to the mysql_root() override in local.sh).
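    a minimal sketch of the sealing logic described in the mysql_seal_table() entries above, assuming mysql_root() accepts SQL on stdin (the actual db.sh code may differ):
        mysql_seal_table() # usage: table=... user=... mysql_seal_table
        {
            # no explicit database needed: an unqualified table name refers to the current database
            echo "REVOKE ALL PRIVILEGES ON \`$table\` FROM $user;" | mysql_root || true
            # GRANT OPTION must be revoked in its own statement (unlike when granting it)
            echo "REVOKE GRANT OPTION   ON \`$table\` FROM $user;" | mysql_root || true
                # `|| true`: ignore errors if the REVOKE was already run
        }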
lib/sh/local.sh: database connection vars: connect to vegbiendev via ssh and run commands locally, to allow running commands as root (which can only connect to the database locally). this effectively requires an ssh account on vegbiendev, but any ssh account (including an anonymous one, if we set one up) will do. this causes schemas/VegCore/VegCore.my.sql, VegCore.pg.sql to change, because they are now created by mysqldump running on vegbiendev (Linux) instead of on a Mac.
inputs/GBIF/raw_occurrence_record/run: plant_fraction: added index on plant_fraction for fast extraction of herbaria by fraction threshold
inputs/GBIF/raw_occurrence_record/run: tables: set ENGINE to MyISAM and DEFAULT CHARSET to utf8 to match the other GBIF tables. (note that MyISAM is not the default, but is needed to avoid row sort order problems and other issues with InnoDB.)
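    a hedged sketch of the resulting plant_fraction definition, run via the mysql() wrapper from lib/sh/db.sh (or the plain client); the column list is an assumption, only the plant_fraction index, ENGINE, and DEFAULT CHARSET come from the entries above:
        mysql <<<'
            CREATE TABLE IF NOT EXISTS plant_fraction
            (
                institution_code varchar(255) NOT NULL PRIMARY KEY, -- assumed columns
                plant_fraction   double,
                KEY (plant_fraction) -- for fast extraction of herbaria by fraction threshold
            )
            ENGINE=MyISAM DEFAULT CHARSET=utf8;
        '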
inputs/GBIF/raw_occurrence_record/run: plant_fraction.table/make(): in remaking mode, drop the table first
inputs/GBIF/raw_occurrence_record/run: plant_fraction.table/make(): only create and populate the table if it doesn't already exist, to avoid clobbering existing data. the noclobber functionality uses new skip_table(), which is the table analog of require_not_exists().
lib/runscripts/table.run, table.run: use new db_make.sh
added lib/sh/db_make.sh that includes both db.sh and make.sh, and will eventually contain DB-related make commands
lib/sh/db.sh: added skip_table(), which prints an already_exists_msg for tables
lib/sh/util.sh: already_exists_msg: undid r9621 because the `|| return 0` should actually always be explicitly specified by the caller, to make it clear that the function will be aborted
lib/sh/util.sh: already_exists_msg(): added alias for use as an error handler. note that ..._not_exists() functions should continue to use the "already_exists_msg" function (rather than the alias) to preserve the exit status.
lib/sh/util.sh: added already_exists_msg() and use it instead of manually generating the die() call
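    a rough sketch of how the skip_table()/already_exists_msg() pieces above fit together; table_exists() is a hypothetical helper, and the real function bodies may differ:
        already_exists_msg() { echo "$1 already exists, skipping" >&2; return 1; }
            # returns nonzero, so it is the caller's explicit `|| return 0` that aborts the make target
        table_exists() { [ -n "$(mysql <<<"SHOW TABLES LIKE '$1'")" ]; } # hypothetical helper
        skip_table() { ! table_exists "$1" || already_exists_msg "table $1"; }
        # caller, in a table's make(): skip_table plant_fraction || return 0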
schemas/my.cnf: added innodb_file_per_table so each InnoDB table will get its own file. this should also allow databases with InnoDB tables to be manually renamed.
added schemas/my.cnf from /etc/mysql/my.cnf
schemas/VegCore/VegCore.my.sql, VegCore.pg.sql: synced to VegCore MySQL DB. for some reason, the fkeys are now output in the opposite order from what they were in before.
inputs/.TNRS/schema.sql: MatchedTaxon: filter out rows where Max_score was not high enough to use the TNRS result as a match. removed now-duplicated filter for this in AcceptedTaxon.
inputs/.TNRS/schema.sql: ScrubbedTaxon: removed extra ; at end of WHERE clause
web/links/index.htm: updated to Firefox bookmarks. some broken favicons have also been fixed, by reopening the bookmark in Firefox. (this will only update a favicon if there is a newer version. to delete a favicon completely, use Firefox's SQLite Manager plugin.)
web/index.php: use XHTML DOCTYPE to match what's used by mod_autoindex. this requires some adjustments in spacing for XHTML's slightly different formatting
bugfix: web/.htaccess: need to do DirectoryIndex redirects before checking for existing file/dir, because a DirectoryIndexed dir is existing but still needs to be redirected to the index.* file
web/.htaccess: mod_autoindex: use the main.css stylesheet to match the look-and-feel of index.php
web/.htaccess: mod_autoindex: "some listed files are not web-accessible" note: use ' instead of " to avoid \-escaping the embedded "
web/.htaccess: mod_autoindex: sort by description when provided, to allow setting a custom (non-alphabetical) sort order using AddDescription
web/.htaccess: mod_autoindex: added note that some listed files are not web-accessible. they will produce a "Forbidden" error when clicked.
bugfix: web/index.php: added space between the full directory index and the preceding content
web/index.php: moved the full directory index within the rest of the document body
web/index.php: include full directory index, since the URL patterns list is just a subset of the content available through vegpath.org
web/.htaccess: added mod_autoindex IndexOptions, in particular FoldersFirst
bugfix: web/.htaccess: changed "mod_dir listing"->"mod_autoindex listing" because mod_dir does not actually handle the autogenerated listings
bugfix: web/.htaccess: DirectoryIndex: use disabled instead of on because on is actually treated as a filename, and does not invoke mod_autoindex. the DirectoryIndex directive and the mod_dir module actually apply only to manual index files, not to autogenerated dir listings (which are handled by mod_autoindex).
web/index.php: removed the no longer needed custom alias j.mp/vegpath# for when the page is reached through vegbiendev.nceas.ucsb.edu, because vegpath.org is a much more reliable domain than the previous path.vg, and a separate way to reach VegPath when path.vg is down is no longer needed
web/.htaccess: <dir>/all forces mod_dir listing: use simpler $mod_dir_listing env var instead of query string modification to indicate that an explicit mod_dir listing should be displayed. this causes /all to replace ?index=1 as the way to force a mod_dir listing. note that the %{ENV:...} test needs to use $REDIRECT_mod_dir_listing instead of $mod_dir_listing, because a redirect will occur between the /all rule and the index.* rule, causing all env vars to be prefixed with REDIRECT_.
web/.htaccess: <dir>/all forces mod_dir listing, as a simpler syntax than ?index=1
web/.htaccess: for dirs, redirect to index.*: allow requesting a mod_dir listing instead with ?index=1
web/.htaccess: handle DirectoryIndex redirects in a RewriteRule instead of with `DirectoryIndex index`, so that RewriteConds can be used to configure when index.* is used as the DirectoryIndex instead of a mod_dir listing
web/.htaccess: handle DirectoryIndex subrequests when there is no DirectoryIndex: moved comment about -F subrequest after line it applies to
inputs/GBIF/_MySQL/run: documented steps to reload GBIF MySQL
web/.htaccess: RewriteRules: added standard [discardpath,noescape,qsappend] options where missing (these should be the default, but aren't)
inputs/GBIF/raw_occurrence_record/run: herbaria_filter.table/make(): inline the PRIMARY KEY statement with its column
bugfix: inputs/GBIF/raw_occurrence_record/run: plant_fraction.table/make(): create the table once with "IF NOT EXISTS" and then populate it with INSERT SELECT, to avoid locking it while it's being repopulated. dropping and recreating the table with CREATE TABLE AS prevented phpMyAdmin from even reading the database's tables list, because it was unable to fetch a rowcount for plant_fraction.
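    the before/after pattern, sketched with a placeholder column and SELECT:
        # before: the table briefly disappears, and phpMyAdmin could not even fetch a rowcount for it
        mysql <<<'DROP TABLE IF EXISTS plant_fraction;
                  CREATE TABLE plant_fraction AS SELECT 0.5 AS plant_fraction;'
        # after: the table always exists; only its contents are refreshed
        mysql <<<'CREATE TABLE IF NOT EXISTS plant_fraction (plant_fraction double);
                  INSERT INTO plant_fraction SELECT 0.5;'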
lib/sh/db.sh: mysql(): when echoing queries, also echo runtimes (turned on with `--verbose --verbose --verbose`)
added lib/runscripts/datasrc_dir.run
inputs/GBIF/_MySQL/run: added load_data(), which loads the dumpfile into MySQL
lib/sh/db.sh: added mysql_rm_privileged_statements()
bugfix: lib/sh/resume_import.sh: sed calls: moved end-of-line comments to their own line because Mac (BSD) sed does not support end-of-line comments
lib/runscripts/table_dir.run: renamed table to subdir because this can apply to any datasrc subdir. moved table-specific code to table.run.
lib/runscripts/table_dir.run: table_make(): moved $silent flag to lib/sh/make.sh make() so all make callers can use it
bugfix: inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.sql.run: override ^.preamble.sql/make() and use ../_src/GBIFPortalDB-2013-02-20.dump as the dumpfile instead of this file, which does not contain the preamble
bugfix: lib/sh/resume_import.sh: $preamble_file: use the extension .0.preamble.sql instead of .preamble.sql so the preamble file sorts before the other *.sql files
removed inputs/GBIF/_MySQL/MySQL.data.sql*, since we are using the much faster exported TSVs instead (see raw_occurrence_record/table.tsv). this also avoids confusion between GBIFPortalDB-2013-02-20.data.sql* and MySQL.data.sql* when loading data into MySQL.
bugfix: inputs/GBIF/_MySQL/MySQL.data.sql.run: moved to GBIFPortalDB-2013-02-20.data.sql.run since it's actually the raw input file, not the ANSI export of it, that needs to be imported
lib/sh/resume_import.sh: get_pkey_at_pos(): changed $quote to ` to work with inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.sql
lib/sh/db.sh: mysql(): added $log_queries flag, which can be turned off to avoid using --verbose. this is useful when running bulk INSERT statements.
lib/sh/local.sh: added mysql_local()
lib/sh/local.sh: added mysql_root()
lib/sh/local.sh: added $root_user, $root_password
lib/sh/db.sh: added use_root alias (similar to use_local/use_remote)
added inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.schema.z.clean_up.sql, which removes duplicated and unnecessary indexes in raw_occurrence_record
added inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.schema.0.preamble.sql
bugfix: lib/sh/resume_import.sh: sql_preamble(): also stop at first "-- Table structure for table" line (when using a full dumpfile rather than a data-only subset)
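    one possible form of the stopping rule (the real sql_preamble()'s signature and exact markers may differ):
        # print dumpfile lines up to, but not including, the first table definition or INSERT
        sql_preamble() { sed -n '/^-- Table structure for table/q; /^INSERT /q; p' "$1"; }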
lib/sh/resume_import.sh: resume_import(): run connection preamble (first few lines of dumpfile) before continuing with main file at offset, so that connection settings are reapplied
lib/sh/resume_import.sh: is_pkey_imported__int(): use echo_stdout so the user can see the result of the `>` comparison in each iteration
added lib/sh/resume_import.sh and use it in inputs/GBIF/_MySQL/MySQL.data.sql.run
inputs/GBIF/_MySQL/MySQL.data.sql.run: is_pkey_imported__int(): made pkey name configurable in $pkey_name
inputs/GBIF/_MySQL/MySQL.data.sql.run: import_resume_pos() run time: removed seconds because the precision is likely only to the nearest half-minute
inputs/GBIF/_MySQL/MySQL.data.sql.run: documented that import_resume_pos() takes 6 min to run, with 37 iterations
added inputs/GBIF/_MySQL/MySQL.data.sql.run, with helper functions for resuming the import to MySQL from where it left off. this is very useful if the import is interrupted for any reason, because otherwise, the entire import would have to be run again from the start, taking 40-50 hours. import_resume_pos() uses new binsearch() to find where in the file the import left off, based on which pkeys have already been imported. (GBIF pkeys are unfortunately not in any order in the input file, nor are they in insertion order in the imported table, because MySQL instead clusters the table by the pkey. this necessitates a much more complex solution to resuming a partial import.)
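    the core idea, simplified to inline the binary search; get_pkey_at_pos() and is_pkey_imported() stand in for the real helpers (their exact signatures are assumed):
        # everything before the interruption point has been imported and everything after has not,
        # so "is the row at byte offset $pos already imported?" is monotonic in $pos and can be
        # binary-searched, even though the pkey values themselves are in no particular order
        import_resume_pos() # usage: import_resume_pos dumpfile min_pos max_pos
        {
            local file="$1" min="$2" max="$3" mid
            while (( min < max )); do
                mid=$(( (min + max + 1)/2 ))
                if is_pkey_imported "$(get_pkey_at_pos "$file" "$mid")"
                then min="$mid"          # still in the already-imported region: search later
                else max=$(( mid - 1 ))  # not yet imported: search earlier
                fi
            done
            echo "$min" # last offset whose row is already imported; resume just after it
        }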
lib/sh/binsearch.sh: binsearch(): also echo_vars the iter_num, to track how close binsearch is to finding the value (it will always take the same number of iterations, log2(max - min))
lib/sh/binsearch.sh: binsearch(): also echo_vars the min/max so these can be used as shortcut inputs if binsearch is run again
bugfix: lib/sh/util.sh: caching: cache_key for function inputs: need to use `declare -p kw_param` instead of "$kw_param" because declare accepts a param name, not a value
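    why declare -p is the right tool there: it takes a variable name and prints a fully-quoted, reproducible declaration, whereas "$kw_param" expands to just the bare value:
        kw_param='some value with "tricky" chars'
        echo "$kw_param"    # -> some value with "tricky" chars      (quoting lost)
        declare -p kw_param # -> declare -- kw_param="some value with \"tricky\" chars"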
lib/sh/binsearch.sh: binsearch(): doc comment: fixed typo in "truncates"
bugfix: lib/sh/util.sh: func_override(): need to match shortest _* suffix instead of longest in case the function being overridden itself contained _
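    the difference between shortest- and longest-suffix stripping, using a hypothetical name:
        name=my_func_override          # the overridden function name itself contains "_"
        echo "${name%_*}"   # -> my_func   (shortest "_*" suffix stripped: just "_override")
        echo "${name%%_*}"  # -> my        (longest "_*" suffix stripped: everything after the first "_")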
bugfix: lib/sh/util.sh: file_size: Linux: need % in %s
lib/sh/db.sh: mysql(): added $data_only flag which enables --skip-column-names and $output_data
bugfix: lib/sh/util.sh: file_size: need to use --format instead of -f on Linux
added lib/runscripts/table_dir.run and use it in table.run
inputs/GBIF/raw_occurrence_record/run: herbaria_filter.ih.csv_/make(): don't use any outer limit value, so that all the IH herbaria are always used. this also ensures that the first GBIF rows will be from an IH herbarium.
inputs/GBIF/raw_occurrence_record/run: herbaria_filter.table/make(): herbaria_filter: don't explicitly set ENGINE or DEFAULT CHARSET, because these should inherit the database defaults instead so that collations, etc. match
lib/sh/util.sh: filesystem: added file_size alias
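    the resulting platform-specific invocations, per the two file_size bugfixes above (shown as a function; the actual alias may differ):
        file_size() # usage: file_size file
        {
            case "$(uname)" in
                Darwin) stat -f %z "$1" ;;        # BSD stat
                *)      stat --format=%s "$1" ;;  # GNU stat on Linux: --format, and %s needs its %
            esac
        }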
lib/sh/util.sh: exceptions: added signals-related functions ignore_sig(), piped_cmd() and helper sig_e()
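    a rough sketch of what such helpers could look like; the argument conventions are assumptions, not the actual util.sh code:
        sig_e()      { echo $(( 128 + $(kill -l "$1") )); } # exit status the shell reports for a signal
        ignore_sig() # usage: ignore_sig SIG cmd...  (treat death by $SIG as success)
        {
            local sig="$1"; shift
            "$@" || { local e=$?; (( e == $(sig_e "$sig") )) || return "$e"; }
        }
        piped_cmd()  { ignore_sig PIPE "$@"; } # e.g. for when a downstream `head` closes the pipe early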
lib/sh/util.sh: $sed_cmd: don't use `command`, which causes sed calls (which are usually internal) to always be logged. instead, use echo_run wherever sed needs to be logged.
lib/sh/util.sh: echo_run(): added trailing-space alias to alias-expand next word, which is a command
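    how the trailing space works: per the bash manual, if an alias value ends in a blank, the next command word is also checked for alias expansion:
        shopt -s expand_aliases
        echo_run() { echo "+ $*" >&2; "$@"; }  # stand-in for the real util.sh echo_run()
        alias echo_run='echo_run '  # trailing space: the word after echo_run is alias-expanded too
        alias ls_long='ls -l'       # hypothetical alias used as the next word
        echo_run ls_long /tmp       # expands to: echo_run ls -l /tmp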
lib/sh/binsearch.sh: binsearch(): echo $i at log_level 1 so it's displayed by default, as a progress indicator
lib/sh/binsearch.sh: binsearch(): echo the command being run using new echo_run()
lib/sh/util.sh: log+: set PS4 from $log_level instead of relative to its previous value. this allows PS4 to work properly at negative log_levels, in spite of the inability to store a "negative" value in a prefix string.
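    the gist of the change (the exact PS4 format used by util.sh is assumed):
        # recompute PS4 from the absolute $log_level each time, instead of editing its previous value
        # (which cannot represent a "negative" amount of indentation)
        log_level=3
        PS4="$(printf '%*s' "$(( log_level > 0 ? log_level : 0 ))" '' | tr ' ' '+') "
        echo "PS4=[$PS4]"  # -> PS4=[+++ ]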