/ - Changes - BIEN 3 - NCEAS Projects

root @ 9610

#	Date	Author	Comment
9610	05/29/2013 10:00 AM	Aaron Marcuse-Kubitza	web/.htaccess: mod_autoindex: Note that some listed files are not web-accessible: use ' instead of " to avoid \-escaping embedded "
9609	05/29/2013 09:39 AM	Aaron Marcuse-Kubitza	web/.htaccess: mod_autoindex: sort by description when provided, to allow setting a custom (non-alphabetical) sort order using AddDescription
9608	05/29/2013 09:37 AM	Aaron Marcuse-Kubitza	web/.htaccess: mod_autoindex: added note that some listed files are not web-accessible. they will produce a "Forbidden" error when clicked.
9607	05/29/2013 09:36 AM	Aaron Marcuse-Kubitza	bugfix: web/index.php: added space between the full directory index and the preceding content
9606	05/29/2013 09:35 AM	Aaron Marcuse-Kubitza	web/index.php: moved the full directory index within the rest of the document body
9605	05/29/2013 08:58 AM	Aaron Marcuse-Kubitza	web/index.php: include full directory index, since the URL patterns list is just a subset of the content available through vegpath.org
9604	05/29/2013 08:25 AM	Aaron Marcuse-Kubitza	web/.htaccess: added mod_autoindex IndexOptions, in particular FoldersFirst
9603	05/29/2013 05:26 AM	Aaron Marcuse-Kubitza	bugfix: web/.htaccess: changed "mod_dir listing"->"mod_autoindex listing" because mod_dir does not actually handle the autogenerated listings
9602	05/29/2013 05:24 AM	Aaron Marcuse-Kubitza	bugfix: web/.htaccess: DirectoryIndex: use disabled instead of on because on is actually treated as a filename, and does not invoke mod_autoindex. the DirectoryIndex directive and the mod_dir module actually apply only to manual index files, not to autogenerated dir listings (which are handled by mod_autoindex).
9601	05/29/2013 04:58 AM	Aaron Marcuse-Kubitza	web/index.php: removed no longer needed custom alias j.mp/vegpath# for when page reached through vegbiendev.nceas.ucsb.edu, because vegpath.org is a much more reliable domain than the previous path.vg, and a separate way to reach VegPath when path.vg is down is no longer needed
9600	05/29/2013 04:43 AM	Aaron Marcuse-Kubitza	web/.htaccess: <dir>/all forces mod_dir listing: use simpler $mod_dir_listing env var instead of query string modification to indicate that an explicit mod_dir listing should be displayed. this causes /all to replace ?index=1 as the way to force a mod_dir listing. note that the %{ENV:...} test needs to use $REDIRECT_mod_dir_listing instead of $mod_dir_listing, because a redirect will occur between the /all rule and the index.* rule, causing all env vars to be prepended with REDIRECT_ .
9599	05/29/2013 03:48 AM	Aaron Marcuse-Kubitza	web/.htaccess: <dir>/all forces mod_dir listing, as a simpler syntax than ?index=1
9598	05/29/2013 03:28 AM	Aaron Marcuse-Kubitza	web/.htaccess: for dirs, redirect to index.*: allow requesting a mod_dir listing instead with ?index=1
9597	05/29/2013 03:26 AM	Aaron Marcuse-Kubitza	web/.htaccess: handle DirectoryIndex redirects in a RewriteRule instead of with `DirectoryIndex index`, so that RewriteConds can be used to configure when index.* is used as the DirectoryIndex instead of a mod_dir listing
9596	05/29/2013 02:30 AM	Aaron Marcuse-Kubitza	web/.htaccess: handle DirectoryIndex subrequests when there is no DirectoryIndex: moved comment about -F subrequest after line it applies to
9595	05/29/2013 02:27 AM	Aaron Marcuse-Kubitza	inputs/GBIF/_MySQL/run: documented steps to reload GBIF MySQL
9594	05/29/2013 02:19 AM	Aaron Marcuse-Kubitza	web/.htaccess: RewriteRules: added standard [discardpath,noescape,qsappend] options where missing (these should be the default, but aren't)
9593	05/24/2013 03:13 PM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record/run: herbaria_filter.table/make(): inline the PRIMARY KEY statement with its column
9592	05/24/2013 03:10 PM	Aaron Marcuse-Kubitza	bugfix: inputs/GBIF/raw_occurrence_record/run: plant_fraction.table/make(): create the table once with "IF NOT EXISTS" and then populate it with INSERT SELECT, to avoid locking it while it's being repopulated. dropping and recreating the table with CREATE TABLE AS prevented phpMyAdmin from even reading the database's tables list, because it was unable to fetch a rowcount for plant_fraction.
9591	05/24/2013 03:04 PM	Aaron Marcuse-Kubitza	lib/sh/db.sh: mysql(): when echoing queries, also echo runtimes (turned on with `--verbose --verbose --verbose`)
9590	05/24/2013 02:32 PM	Aaron Marcuse-Kubitza	added lib/runscripts/datasrc_dir.run
9589	05/24/2013 02:30 PM	Aaron Marcuse-Kubitza	inputs/GBIF/_MySQL/run: added load_data(), which loads the dumpfile into MySQL
9588	05/24/2013 02:06 PM	Aaron Marcuse-Kubitza	lib/sh/db.sh: added mysql_rm_privileged_statements()
9587	05/24/2013 02:00 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/resume_import.sh: sed calls: moved end-of-line comments to their own line because end-of-line comments are not supported on Mac
9586	05/24/2013 01:55 PM	Aaron Marcuse-Kubitza	lib/runscripts/table_dir.run: renamed table to subdir because this can apply to any datasrc subdir. moved table-specific code to table.run.
9585	05/24/2013 01:43 PM	Aaron Marcuse-Kubitza	lib/runscripts/table_dir.run: renamed table to subdir because this can apply to any datasrc subdir. moved table-specific code to table.run.
9584	05/24/2013 01:21 PM	Aaron Marcuse-Kubitza	lib/runscripts/table_dir.run: table_make(): moved $silent flag to lib/sh/make.sh make() so all make callers can use it
9583	05/24/2013 12:35 PM	Aaron Marcuse-Kubitza	bugfix: inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.sql.run: override ^.preamble.sql/make() and use ../_src/GBIFPortalDB-2013-02-20.dump as the dumpfile instead of this file, which does not contain the preamble
9582	05/24/2013 12:23 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/resume_import.sh: $preamble_file: use the extension .0.preamble.sql instead of .preamble.sql so the preamble file sorts before the other *.sql files
9581	05/24/2013 12:22 PM	Aaron Marcuse-Kubitza	removed inputs/GBIF/_MySQL/MySQL.data.sql, since we are using the much faster exported TSVs instead (see raw_occurrence_record/table.tsv). this also avoids confusion between GBIFPortalDB-2013-02-20.data.sql and MySQL.data.sql* when loading data into MySQL.
9580	05/24/2013 12:18 PM	Aaron Marcuse-Kubitza	bugfix: inputs/GBIF/_MySQL/MySQL.data.sql.run: moved to GBIFPortalDB-2013-02-20.data.sql.run since it's actually the raw input file, not the ANSI export of it, that needs to be imported
9579	05/24/2013 12:16 PM	Aaron Marcuse-Kubitza	lib/sh/resume_import.sh: get_pkey_at_pos(): changed $quote to ` to work with inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.data.sql
9578	05/24/2013 11:50 AM	Aaron Marcuse-Kubitza	lib/sh/db.sh: mysql(): added $log_queries flag, which can be turned off to avoid using --verbose. this is useful when running bulk INSERT statements.
9577	05/24/2013 11:35 AM	Aaron Marcuse-Kubitza	lib/sh/local.sh: added mysql_local()
9576	05/24/2013 11:24 AM	Aaron Marcuse-Kubitza	lib/sh/local.sh: added mysql_root()
9575	05/24/2013 11:24 AM	Aaron Marcuse-Kubitza	lib/sh/local.sh: added $root_user, $root_password
9574	05/24/2013 11:22 AM	Aaron Marcuse-Kubitza	lib/sh/db.sh: added use_root alias (similar to use_local/use_remote)
9573	05/24/2013 11:21 AM	Aaron Marcuse-Kubitza	added inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.schema.z.clean_up.sql, which removes duplicated and unnecessary indexes in raw_occurrence_record
9572	05/24/2013 11:20 AM	Aaron Marcuse-Kubitza	added inputs/GBIF/_MySQL/GBIFPortalDB-2013-02-20.schema.0.preamble.sql
9571	05/24/2013 11:02 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/resume_import.sh: sql_preamble(): also stop at first "-- Table structure for table" line (when using a full dumpfile rather than a data-only subset)
9570	05/24/2013 10:58 AM	Aaron Marcuse-Kubitza	lib/sh/resume_import.sh: resume_import(): run connection preamble (first few lines of dumpfile) before continuing with main file at offset, so that connection settings are reapplied
9569	05/24/2013 06:45 AM	Aaron Marcuse-Kubitza	lib/sh/resume_import.sh: is_pkey_imported__int(): use echo_stdout so the user can see the result of the > function in each iteration
9568	05/24/2013 06:42 AM	Aaron Marcuse-Kubitza	added lib/sh/resume_import.sh and use it in inputs/GBIF/_MySQL/MySQL.data.sql.run
9567	05/24/2013 06:32 AM	Aaron Marcuse-Kubitza	inputs/GBIF/_MySQL/MySQL.data.sql.run: is_pkey_imported__int(): made pkey name configurable in $pkey_name
9566	05/24/2013 05:32 AM	Aaron Marcuse-Kubitza	inputs/GBIF/_MySQL/MySQL.data.sql.run: import_resume_pos() run time: removed seconds because the precision is likely only to the nearest half-minute
9565	05/24/2013 05:31 AM	Aaron Marcuse-Kubitza	inputs/GBIF/_MySQL/MySQL.data.sql.run: documented that import_resume_pos() takes 6 min to run, with 37 iterations
9564	05/24/2013 05:20 AM	Aaron Marcuse-Kubitza	added inputs/GBIF/_MySQL/MySQL.data.sql.run, with helper functions for resuming the import to MySQL from where it left off. this is very useful if the import is interrupted for any reason, because otherwise, the entire import would have to be run again from the start, taking 40-50 hours. import_resume_pos() uses new binsearch() to find where in the file the import left off, based on which pkeys have already been imported. (GBIF pkeys are unfortunately not in any order in the input file, nor are they in insertion order in the imported table, because MySQL instead clusters the table by the pkey. this necessitates a much more complex solution to resuming a partial import.)
9563	05/24/2013 05:14 AM	Aaron Marcuse-Kubitza	lib/sh/binsearch.sh: binsearch(): also echo_vars the iter_num, to track how close binsearch is to finding the value (it will always take the same # iters, log2(max - min) )
9562	05/24/2013 05:11 AM	Aaron Marcuse-Kubitza	lib/sh/binsearch.sh: binsearch(): also echo_vars the min/max so these can be used as shortcut inputs if binsearch is run again
9561	05/24/2013 04:58 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: caching: cache_key for function inputs: need to use `declare -p kw_param` instead of "$kw_param" because declare accepts a param name, not value
9560	05/24/2013 03:40 AM	Aaron Marcuse-Kubitza	lib/sh/binsearch.sh: binsearch(): doc comment: fixed typo in "truncates"
9559	05/24/2013 03:17 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: func_override(): need to match shortest _* suffix instead of longest in case the function being overridden itself contained _
9558	05/24/2013 01:51 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: file_size: Linux: need % in %s
9557	05/24/2013 01:43 AM	Aaron Marcuse-Kubitza	lib/sh/db.sh: mysql(): added $data_only flag which enables --skip-column-names and $output_data
9556	05/24/2013 01:41 AM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: file_size: need to use --format instead of -f on Linux
9555	05/24/2013 01:22 AM	Aaron Marcuse-Kubitza	added lib/runscripts/table_dir.run and use it in table.run
9554	05/24/2013 01:20 AM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record/run: herbaria_filter.ih.csv_/make(): don't use any outer limit value, so that all the IH herbaria are always used. this also ensures that the first GBIF rows will be from an IH herbarium.
9553	05/24/2013 01:17 AM	Aaron Marcuse-Kubitza	inputs/GBIF/raw_occurrence_record/run: herbaria_filter.table/make(): herbaria_filter: don't explicitly set ENGINE or DEFAULT CHARSET, because these should be set to the database values instead so that collations, etc. match
9552	05/24/2013 12:50 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: filesystem: added file_size alias
9551	05/24/2013 12:34 AM	Aaron Marcuse-Kubitza	lib/sh/util.sh: exceptions: added signals-related functions ignore_sig(), piped_cmd() and helper sig_e()
9550	05/23/2013 11:40 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: $sed_cmd: don't use `command`, which causes sed calls (which are usually internal) to always be logged. instead, use echo_run wherever sed needs to be logged.
9549	05/23/2013 11:38 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: echo_run(): added trailing-space alias to alias-expand next word, which is a command
9548	05/23/2013 11:31 PM	Aaron Marcuse-Kubitza	lib/sh/binsearch.sh: binsearch(): echo $i at log_level 1 so it's displayed by default, as a progress indicator
9547	05/23/2013 11:30 PM	Aaron Marcuse-Kubitza	lib/sh/binsearch.sh: binsearch(): echo $i at log_level 1 so it's displayed by default, as a progress indicator
9546	05/23/2013 11:29 PM	Aaron Marcuse-Kubitza	lib/sh/binsearch.sh: binsearch(): echo the command being run using new echo_run()
9545	05/23/2013 11:25 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: log+: set PS4 from $log_level instead of relative to its previous value. this allows PS4 to work properly at negative log_levels, in spite of the inability to store a "negative" value in a prefix string.
9544	05/23/2013 11:23 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added float_set_min()
9543	05/23/2013 11:22 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: log+(): log_level: set it using simpler $(()), since log_level will never be fractional (although verbosity can be). log_level may of course be fractional in invoked scripts, but that does not affect util.sh.
9542	05/23/2013 10:44 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: log++: also track a numeric log_level var, which follows the PS4 prefix
9541	05/23/2013 10:35 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: MatchedTaxon: matchedFamily: use Accepted_family when the Name_matched_accepted_family is not provided, as it's omitted by the current TNRS CSV schema
9540	05/23/2013 09:54 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: log+(): PS4: split if statement onto multiple lines for clarity
9539	05/23/2013 09:44 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added back echo_run(), usable for internal commands where command() would be used for external commands
9538	05/23/2013 09:33 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added int2bool()
9537	05/23/2013 09:25 PM	Aaron Marcuse-Kubitza	*{.sh,run}: use new `|| ignore` instead of ignore_e/end_try
9536	05/23/2013 09:25 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: added ignore(), which uses ||-syntax
9535	05/23/2013 09:13 PM	Aaron Marcuse-Kubitza	lib/sh/util.sh: ignore(): renamed to ignore_e() so ignore() can be used for a simpler, ||-based command
9534	05/23/2013 09:09 PM	Aaron Marcuse-Kubitza	bugfix: lib/sh/util.sh: catch(): need && between test and e=0 so e=0 is only run if $e was equal to the desired value
9533	05/23/2013 08:22 PM	Aaron Marcuse-Kubitza	added lib/sh/binsearch.sh
9532	05/23/2013 06:27 PM	Aaron Marcuse-Kubitza	bugfix: README.TXT: Full database import: screen: need to unset TMOUT, version after running `screen` rather than before so they take effect within the `screen` shell
9531	05/23/2013 06:25 PM	Aaron Marcuse-Kubitza	README.TXT: Full database import: after running `screen`: run `set -o ignoreeof` to prevent Ctrl+D from exiting `screen` to keep attached jobs
9530	05/23/2013 04:40 PM	Aaron Marcuse-Kubitza	bin/tnrs_db: documented how to estimate total runtime. note that our tnrs_db wrapper in inputs/.TNRS/tnrs/tnrs.make uses inputs/.TNRS/tnrs/logs/tnrs.make.log.sql as the log file.
9529	05/23/2013 03:33 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql, data.sql: updated TNRS CSV columns to preserve Name_matched_accepted_family even though it isn't present in the current TNRS CSVs. this way, Name_matched_accepted_family can still be used for previously-scrubbed names, and family_matched can be added back to analytical_stem_view. (now that bin/tnrs_db uses an explicit columns list in COPY TO, the absence of a column in the CSV is no longer a problem.)
9528	05/23/2013 03:28 PM	Aaron Marcuse-Kubitza	README.TXT: updating TNRS CSV columns: use the entire "COPY tnrs ..." statement instead of just the body of it so that the explicit columns list is included. this way, the COPY statement will cause an error if the TNRS schema was changed but inputs/.TNRS/data.sql was not yet updated.
9527	05/23/2013 03:00 PM	Aaron Marcuse-Kubitza	bin/tnrs_db: removed unused imports
9526	05/23/2013 02:55 PM	Aaron Marcuse-Kubitza	bin/tnrs_db: cumulative_tnrs_profiler: use tnrs.tnrs_request()'s new cumulative_profiler param instead of doing the profiling manually. this also ensures that there isn't extra time between when the cumulative profiler starts/stops and when the per-request profiler starts/stops (because Profiler's new add_subprofiler() method is used).
9525	05/23/2013 02:53 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: single_tnrs_request(): added support for a cumulative profiler using the cumulative_profiler kw param
9524	05/23/2013 02:53 PM	Aaron Marcuse-Kubitza	lib/profiling.py: Profiler: added add_subprofiler(), for use with cumulative profilers
9523	05/23/2013 02:48 PM	Aaron Marcuse-Kubitza	lib/profiling.py: Profiler: added add_time() and use it instead of `self.total +=`
9522	05/23/2013 02:38 PM	Aaron Marcuse-Kubitza	bin/tnrs_db: tnrs_profiler: renamed to cumulative_tnrs_profiler to distinguish it from the tnrs_profiler used by tnrs.tnrs_request(), which just profiles the current request
9521	05/23/2013 02:36 PM	Aaron Marcuse-Kubitza	bugfix: bin/tnrs_db: cumulative profiler: use len(names) instead of this_ct (cur.rowcount) in case the actual # rows fetched differed from the rowcount
9520	05/23/2013 02:32 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: repeated_tnrs_request(): renamed to tnrs_request() since this is the function that should usually be used, to ensure that debugging information is output in the case of an error. (the TNRS request must be made again to output this information.)
9519	05/23/2013 02:30 PM	Aaron Marcuse-Kubitza	lib/tnrs.py: tnrs_request(): renamed to single_tnrs_request() to distinguish it from repeated_tnrs_request()
9518	05/23/2013 02:25 PM	Aaron Marcuse-Kubitza	bin/tnrs_db: removed no longer used $wait flag (which caused tnrs_db to wait max_pause for new rows to be added), because tnrs_db is now invoked automatically after each import by the import_scrub target (in inputs/input.Makefile) and does not need to run as a daemon. note that when scrub is invoked, it is possible that a previous datasource's import has already scrubbed the names for this import, because tnrs_db runs until all rows in tnrs_input_name are scrubbed....
9517	05/23/2013 02:14 PM	Aaron Marcuse-Kubitza	bin/tnrs_db: removed no longer needed explicit population of the Time_submitted, which is now done automatically by the tnrs table. however, this requires starting the transaction before submitting data, so Time_submitted is correctly set to the submission time rather than the insertion time. the setting of the correct time can be tested by inserting `time.sleep(n_sec)` after the TNRS request and checking that the Time_submitted is close to the time tnrs_db was run instead of n_sec seconds later.
9516	05/23/2013 02:09 PM	Aaron Marcuse-Kubitza	bin/tnrs_db: start transaction before submitting data, so Time_submitted is correctly set to the submission time rather than the insertion time. these may differ by several minutes if TNRS is slow. the setting of the correct time can be tested by inserting `time.sleep(n_sec)` after the TNRS request, removing the explicit setting of Time_submitted, and checking that the Time_submitted is close to the time tnrs_db was run instead of n_sec seconds later.
9515	05/23/2013 02:05 PM	Aaron Marcuse-Kubitza	bugfix: bin/tnrs_db: wrap just the TNRS request and the storing of the response data in a function (undoing part of r9514), because the transaction start time for Time_submitted should not be until the TNRS request is actually made (it often takes several minutes to materialize the next set of input names on a full DB)
9514	05/23/2013 01:56 PM	Aaron Marcuse-Kubitza	bin/tnrs_db: Iterate over unscrubbed verbatim taxonlabels: put loop body in a function (which returns whether or not the loop should continue), so that the loop body can easily be wrapped in a transaction using sql.with_savepoint()
9513	05/23/2013 01:19 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs.Time_submitted: set default to now() (the timestamp of the start of the current transaction, http://www.postgresql.org/docs/9.1/static/functions-datetime.html) so that it would automatically be populated when rows are added. note that because the start of the current transaction instead of the exact time at insertion is used, all rows inserted in the same transaction (e.g. as part of the same batch) will have the same value for this, linking them together.
9512	05/23/2013 01:10 PM	Aaron Marcuse-Kubitza	inputs/.TNRS/schema.sql: tnrs_populate_derived_fields(): renamed to tnrs_populate_fields() so it can be used to populate other fields as well
9511	05/23/2013 01:07 PM	Aaron Marcuse-Kubitza	bin/tnrs_db: removed no longer needed explicit appending of derived cols, and instead use append_csv()'s new support for importing CSVs whose columns are a subset of the full table
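
Note on r9564/r9563: the resume-position search described there can be illustrated with a minimal Python sketch. This is not the project's lib/sh/binsearch.sh. The function signature, the `is_imported` callback, and the sample pkeys are all hypothetical, and the sketch assumes that every record before the resume point has been imported and none after it has.

```python
def binsearch(min_pos, max_pos, is_imported):
    """Return (pos, iters), where pos is the first position in
    [min_pos, max_pos] for which is_imported(pos) is false.

    Hypothetical sketch: assumes is_imported() is true for every position
    before the resume point and false from there on, so the search can
    halve the range on each step even though the pkeys themselves are
    not sorted in the file.
    """
    lo, hi = min_pos, max_pos
    iters = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if is_imported(mid):
            lo = mid + 1  # resume point is after mid
        else:
            hi = mid      # resume point is at or before mid
        iters += 1
    return lo, iters


# Made-up pkeys in file order (deliberately unsorted).
records = [42, 7, 19, 3, 88, 61, 5, 70]
# Pretend the import was interrupted after the first 5 records.
imported = set(records[:5])

pos, iters = binsearch(0, len(records), lambda i: records[i] in imported)
print(pos, iters)
```

The range is halved on every step, so the number of iterations grows with log2(max - min), in line with the note in r9563.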
