moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
bugfix: bin/boldify: also match [[]]-style links at the beginning and end of a line
bin/boldify: made it idempotent
bugfix: bin/boldify: fixed extended regular expression syntax, which doesn't support a \] inside [] (you instead have to put the ] right after the opening [^ )
added bin/boldify, which makes Redmine links bold
bugfix: bin/map: in_is_db: don't ignore errors when the table does not exist, because ignoring them prevents an errexit and allows an import to continue when a staging table is missing. suppressing this error had previously been necessary because metadata-only tables (Source/) used to not have installed staging tables, and the program had to react accordingly.
bugfix: bin/pg_dump_limit: support errexit by ignoring the nonzero exit status that grep returns when it doesn't match anything
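
a minimal sketch of the errexit-safe grep pattern (the filter function and pattern are illustrative, not the actual pg_dump_limit code):

    set -o errexit
    # grep exits 1 when nothing matches, which would abort the script under
    # errexit; treat "no match" as success but still propagate real errors (status 2)
    filter_lines() { grep -- "$1" || test "$?" = 1; }
    echo foo | filter_lines bar   # no match, but the script keeps running
    echo 'still running'
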
bin/make_analytical_db: don't regenerate family_higher_plant_group from the NCBI data because the lookup table is now prepopulated as part of the schema
bin/import_all: don't import NCBI because the lookup table is now prepopulated as part of the schema
bugfix: bin/import_all: run in errexit mode, so that if the user cancels reinstalling of the import schema, the script will then abort instead of continuing and using the wrong schema
bin/map: support param start="", which indicates the default value. this fixes a bug in inputs/input.Makefile $(restart_row), which outputs "" if an explicit starting row is not found.
bugfix: bin/with_all: @inputs default value: use `local`, so that the default value is only set for the current function and doesn't leak back out into the caller. this fixes a bug in subset imports where import_all's Source/import call to with_all would add the .* datasources, but these would then stay in for the import_scrub call, causing extra .* datasources to incorrectly be imported.
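
a minimal sketch of the `local` fix (the function body and datasource names are illustrative, not the actual with_all code):

    inputs=()   # caller provides no explicit datasources

    with_all_sketch() {
        if test "${#inputs[@]}" = 0; then
            # `local` confines the default to this function; without it, the
            # default list would leak back out into the caller's @inputs
            local inputs=(inputs/SALVIAS/ inputs/ACAD/)
        fi
        echo "acting on: ${inputs[*]}"
    }

    with_all_sketch
    echo "caller's @inputs still has ${#inputs[@]} entries"   # 0: the default did not leak
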
bin/make_analytical_db: removed no longer needed setting of $schema to $public, because this is now done by psql()
bugfix: bin/import_all: restore the working dir when main() is done, in case it started as something other than the root dir
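
one way to express the save/restore (the $top_dir value is a placeholder):

    top_dir=/tmp   # placeholder for the repo root
    main() {
        local orig_dir="$PWD"
        cd "$top_dir"      # the import steps run from here
        # ... import steps ...
        cd "$orig_dir"     # restore the caller's working dir when main() is done
    }
    main; pwd   # same dir as before the call
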
bin/after_import: support turning off the end-of-import backup for imports that are not the full database
bugfix: bin/make_analytical_db: when run with a public schema other than "public", also pass this to `/run export_` (which currently uses $schema instead of $public)
bugfix: bin/import_all: fix $@ when .-included without args (which causes bash to put the wrong values in $@ instead of leaving it empty)
bin/import_all: `make schemas/$version/install`: reinstall instead to allow re-running the import to the same custom schema (e.g. 2013-10-18.Brian_Enquist.Canadensys)
bin/import_all: `make schemas/$version/install`: ignore errors if schema exists, to support running with -e
bugfix: bin/import_all: removing inputs/.TNRS/tnrs/tnrs.make.lock: use `"rm" -f` instead of plain "rm" to avoid having an error exit status, which will abort the script if run with the -e flag (as runscripts are)
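
how -f interacts with errexit (the lockfile path is the one from the entry above):

    set -o errexit
    # plain rm exits nonzero when the file is absent, which would abort an errexit
    # script; -f makes a missing file a no-op with exit status 0
    "rm" -f inputs/.TNRS/tnrs/tnrs.make.lock
    echo 'still running even if no lockfile was left over'
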
bin/*_all: *_main(): renamed to just main(); it does not matter that other shell-includes' main() functions will clobber this, because it is only executed once
bugfix: bin/import_all: Source tables: use .../import instead of import_temp, because import_temp (which keeps the temp suffix from being removed prematurely) is only needed when importing all tables
fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error
bugfix: bin/import_all: need to publish datasources that won't be published by `make .../import`, so that the per-datasource import XPaths that refer to TNRS/geoscrub will link up with the TNRS/geoscrub source entry instead of creating a new entry without the metadata (because the entry with the metadata was named TNRS.new/geoscrub.new)
bin/import_all: removed no longer needed import of geoscrub data, because analytical_stem_view is now joined to the geoscrub_output table directly, instead of using the imported canon_place entries
bin/with_all: $all: renamed to $hidden_srcs for clarity, since this now just adds the hidden (.*) datasources, rather than always using all datasources
bugfix: bin/with_all: in $all mode, just prepend the .* datasources to the user-selected (or default) @inputs, so that using $all to add these datasources doesn't inadvertently cause the action to be performed for all datasources
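
roughly how the prepend works (array contents are illustrative):

    inputs=(inputs/SALVIAS/)                          # user-selected datasources
    hidden_srcs=(inputs/.TNRS/ inputs/.geoscrub/)     # the .* datasources

    # prepend rather than replace, so adding the hidden datasources does not
    # cause the action to run for *all* datasources
    inputs=("${hidden_srcs[@]}" "${inputs[@]}")
    echo "${inputs[@]}"   # inputs/.TNRS/ inputs/.geoscrub/ inputs/SALVIAS/
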
bin/import_all: usage: documented that this can now be run with a custom datasources list (each of the form inputs/src/)
bin/with_all: added support for providing a custom list of inputs to run the command on
bin/import_all: use just import_scrub, not reimport_scrub, because import_scrub now automatically publishes the datasource's import (i.e. removes the temp suffix)
bin/map: usage: documented that verbosity > 3 in commit mode turns on debug_temp mode, which creates real tables instead of temp tables
bugfix: bin/import_all: use reimport_scrub instead of import_scrub so that the temp suffix of the datasource name is removed
bugfix: bin/after_import: run backups/fix_perms right after the backup files are created to make them private
bugfix: bin/make_analytical_db: `/run export_`: don't take input from the terminal, because this causes rm to prompt the user (from a background task) about overwriting the previous export
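
a sketch of the stdin redirection (the export function here is only a stand-in for `/run export_`):

    run_export() { rm -i old_export.csv; }   # hypothetical export that overwrites a previous file

    # with stdin redirected from /dev/null, rm's overwrite prompt reads EOF
    # instead of waiting on the terminal from a background task
    touch old_export.csv
    run_export </dev/null &
    wait
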
bin/map: allow user to override the source env var, which is used as the source.shortname value in the DB
bugfix: bin/import_all: `rm inputs/.TNRS/tnrs/tnrs.make.lock`: need to use `"rm"` instead of `rm` so that we don't use any rm alias the user might have in their shell (import_all is run in the calling shell so that the jobs are owned by the calling shell)
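
a sketch of why the quoting matters when the script runs in the calling (interactive) shell:

    shopt -s expand_aliases
    alias rm='rm -i'            # a typical user alias in the calling shell

    touch /tmp/demo_lock.$$
    # quoting any character of the command name suppresses alias expansion,
    # so "rm" always invokes the real rm, not the user's alias
    "rm" -f /tmp/demo_lock.$$
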
bin/import_all: added step to remove any leftover TNRS lockfile (previously done manually)
bin/tnrs_db: add entry to new batch table
bugfix: bin/import_times: filtering out the Source subdirs: need to match the rowcount of 1 only at the beginning of the line
bin/import_times: filter out the Source subdirs, which now have single-row data and therefore are included in the rowcounts list
bin/after_import: usage: documented that it requires $version
bin/import_all: use new bin/after_import
added bin/after_import, which performs post-normalized-import actions separately from bin/import_all
bin/import_all: with_all import_scrub: documented that this step uses $by_col, so that users know to include by_col=1 when running this step separately
bin/import_all: use column-based import (by_col=1) by default, rather than requiring the user to explicitly specify it; instead, turn it off explicitly (by_col=) for row-based import
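
a sketch of the new default (how import_all actually tests $by_col may differ; the expansion below treats an unset by_col as on, and an explicitly empty by_col= as off):

    if test "${by_col-1}"; then
        echo 'column-based import (default)'
    else
        echo 'row-based import'
    fi
    # usage: `by_col= bin/import_all ...` now selects row-based import
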
bin/import_all: don't set $dump_opts until running the backup command that uses it, so that the user can run this backup command separately just by copying the line out of the script (without worrying about env vars that need to be set, other than $version which is visible outside the script)
bin/my2pg: use s!...!...! when either the regexp or the replacement contains /, to avoid unnecessary backslash escapes
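
for example (the rule shown is illustrative, not one of my2pg's actual substitutions):

    echo 'DEFAULT CHARSET=utf8' |
    sed 's!DEFAULT CHARSET=utf8!/* DEFAULT CHARSET=utf8 */!'
    # with the default delimiter, each / in the replacement would need escaping:
    #   sed 's/DEFAULT CHARSET=utf8/\/* DEFAULT CHARSET=utf8 *\//'
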
bin/my2pg: commenting out table options: added explanatory comment, because it is not obvious from the regexp what this does
bin/my2pg: comment out table options (http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html#sqlmode_no_table_options) instead of removing them, because they include table COMMENTs, which contain important metadata such as table definitions. (note that table COMMENTs use a slightly different syntax than column COMMENTs, so the table COMMENTs will not be commented out twice.)
bin/my2pg: comment out COMMENTs instead of removing them so that they will be included in the PostgreSQL translation. COMMENTs contain important metadata about columns, such as definitions and the meanings of integer flag values.
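
roughly what the commenting-out looks like for both cases above (these regexps are simplified stand-ins for the ones in bin/my2pg):

    # column COMMENT: keep the metadata, but render it inert for PostgreSQL
    echo "  \`genus\` varchar(255) COMMENT 'genus epithet'," |
    sed "s!COMMENT '.*'!/* & */!"
    #   -> `genus` varchar(255) /* COMMENT 'genus epithet' */,

    # table options (including the table COMMENT), commented out rather than removed
    echo ') ENGINE=InnoDB DEFAULT CHARSET=utf8;' |
    sed 's!ENGINE=[^;]*!/* & */!'
    #   -> ) /* ENGINE=InnoDB DEFAULT CHARSET=utf8 */;
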
bin/my2pg: added instructions for regenerating *.schema.sql whenever this script is changed
bin/my2pg: COMMENT: also match COMMENTs with embedded ': since there will only be one COMMENT per line, the contents of the COMMENT can simply extend to the last ' on the line
bin/my2pg: replace MySQL ` quotes with " quotes to support exports that were generated without ANSI_QUOTES mode. (this replacement only applies to schema exports, not data.) ANSI_QUOTES is only available with mysqldump --compatible modes that also include NO_TABLE_OPTIONS, which omits important table options such as comments. in particular, these comments are part of schemas/VegCore/VegCore.ERD.mwb but were not being included in VegCore.my.sql.
bugfix: bin/repl: text mode: repurpose this to match SQL identifiers, for use by inputs/input.Makefile %/postprocess.sql. %/postprocess.sql is the only place currently using this mode, so this will not affect other scripts.
bugfix: bin/*: spell out [:alnum:] as [a-zA-Z0-9] because Python unfortunately doesn't support POSIX character classes in regexps
bin/*: replaced confusing regexp constructs involving \W inside [] with the much clearer explicit character class [:alnum:] . this avoids adding or subtracting from an inverted class in order to reach a subset of the corresponding positive class, because the subset can just be named explicitly instead.
bugfix: bin/repl: it doesn't make sense to use other chars in a [^\W_] regexp, because they will have no effect since \w doesn't include those chars to begin with. this was a result of confusion over the ^ and \W double negative.
bin/filter_out_ci, lib/maps.py: simplify(): also remove distinguishing #... suffix from terms (e.g. UNUSED#institutionID), to support mapping multiple columns to the special terms OMIT, PRIVATE, UNUSED (VegCore.vegpath.org#Special-terms), without creating a collision in the staging table renaming. note that this change must not be made to bin/canon, because this would cause suffixed terms to be autorenamed to their *un*suffixed VegCore versions.
bin/my2pg*: keep MySQL indefinite dates as text strings instead of translating them (to the first of the month or year) to fit into a PostgreSQL timestamp. this allows the application to decide how to handle these values, which otherwise have no corresponding value in PostgreSQL. this requires changing the date/time related types to text instead of leaving them as-is, so that they can store the custom MySQL strings.
bin/my2pg: use util.sh $top_dir instead of setting $selfDir
bin/my2pg*: use the util.sh sed wrapper, which fixes the LANG=*.UTF-8 "illegal byte sequence" errors on invalid UTF-8
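
a sketch of the kind of wrapper involved (util.sh's actual implementation may differ):

    # BSD sed under a *.UTF-8 locale aborts with "illegal byte sequence" on bytes
    # that are not valid UTF-8; forcing the C locale makes it treat input as raw bytes
    sed() { LC_ALL=C command sed "$@"; }

    printf 'caf\xe9\n' | sed 's/caf/CAF/'   # the Latin-1 byte no longer kills sed
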
bin/map: removed no longer used support for map.csv input column prefixes (expand out the prefixes instead). SpeciesLink used to use this to map a single term with multiple DwC namespaces using just one mapping, but that was replaced with an explicit, ordered (rather than implicit, unordered) /_alt-ing together of the terms.
bin/map: removed no longer accurate comment that this is case- and punctuation-insensitive, since this is now handled by map.csv preprocessing scripts before the mappings are even provided to bin/map
bugfix: bin/map: in_is_db: inline metadata value columns (used by new-style import) so that they can be compared by value in XML simplifying functions (lib/xml_func.py)
bin/map: map_table(): Resolve prefixes: combined db_xml.ColRef() constructor call with creation of args (as tuple) for clarity
bin/map: update_in_label(): use in_schema instead of the map spreadsheet column name when available, to allow using one spreadsheet for all datasources (which would not have a datasource-specific spreadsheet column name)
bin/src_map: support custom (or no) new_term_prefix. omitting new_term_prefix is useful for views whose columns have already been renamed in the underlying tables and should not have * re-prepended.
bin/make: moved $make_filter_active test to lib/sh/make.sh make() so that it's also used when make() is run directly (e.g. in a runscript) rather than via the bin/make wrapper in the PATH
bugfix: bin/make: use separate $make_filter_active flag instead of $is_outermost for avoiding duplicate output filtering, so that an outer runscript, which sets $is_outermost but does not activate the make filter, will not prevent the make filter from being activated when make is invoked
bugfix: bin/make: need to use sys_cmd instead of command so that the system make command is invoked instead of the wrapper (which would cause infinite mutual recursion for the ~/bien working copy, although not for the ~/Dropbox/svn working copy because nonrecursive=1 was able to remove the single recursion)
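
a sketch of the difference (sys_cmd's actual mechanism is in lib/sh/; the resolution shown here is just one way to reach the system make):

    # `command make` only bypasses shell functions and aliases; it still finds
    # whatever `make` is first in the PATH, which for the bin/make wrapper would
    # be the wrapper itself. resolving against the system PATH avoids that:
    sys_make="$(PATH="$(getconf PATH)" command -v make)"
    "$sys_make" --version
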
bin/make: use .rel to do relative includes
bugfix: bin/make: do not alter the PATH passed to the invoked make command, since this is a general-purpose wrapper and is not linked to a specific working copy (it could be used to wrap any make invocation, not just for commands in the svn dir). this uses lib/sh/local.sh's new PATH_add= flag.
bugfix: bin/make: need to leave bin/, ~/bin/ in the PATH when running make nonrecursively, so that commands invoked by it which are located in these dirs (e.g. put, which will be used by `make inputs/upload`) can still be found. this requires using command()'s new nonrecursive=1 flag instead of running no_PATH_recursion, so that no_PATH_recursion() only affects the resolution of the command path, but does not propagate the filtered PATH to the invoked command itself.
added bin/.rsync_ignore with filters from /README.TXT > Maintenance > to synchronize vegbiendev, jupiter, and your local machine. these filters will now be used with bin/sync_upload in addition to the periodic backup commands.
bin/tnrs_db: documented total runtime (10 days)
bin/tnrs_db: documented current runtime (162 ms/name)
bin/make_analytical_db: use new mk_table() instead of TRUNCATE/INSERT
bin/make_analytical_db: added mk_table() and use it in mk_analytical_table()
bin/make_analytical_db: added `/run export_` to make the geoscrub_input CSV export
added bin/sync_upload, a wrapper around sync_upload()
bugfix: bin/make: use verbosity_compat because some make-invoked commands (e.g. bin/map) don't support verbosity=""
bugfix: bin/make: include local.sh so that its default verbosity-setting make() override will be used
bin/repl: added unescape_html() filter function, which can be specified as the replacement string
bin/repl: support Unicode characters in the matched portion of the string
bugfix: bin/make: use the standard make logging file descriptor (1) instead of $log_fd (30) so that the output of bin/make can in turn be filtered by util.sh using the standard cmd_log_fd=1
bin/make: use `readlink -f` on ${BASH_SOURCE[0]} so that this script can also be run via a symlink
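
the resulting idiom, roughly ($self/$self_dir are placeholder names; only meaningful inside a script):

    # resolve the script's real location even when it is invoked via a symlink
    self="$(readlink -f -- "${BASH_SOURCE[0]}")"
    self_dir="$(dirname -- "$self")"
    echo "$self_dir"
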
bin/make: don't print make cmd by default, so that only `make` output is printed at verbosity 1
bin/make: don't reinvoke make() if the make filter has already been set up, as indicated by $is_outermost (instead, invoke make directly using exec)
added bin/make, which runs make, hiding verbose messages about making included Makefiles. this should be used in preference to plain make, to avoid excessive log messages that prevent the user from seeing the core commands that are being run.
bin/tnrs_db: documented how to estimate total runtime. note that our tnrs_db wrapper in inputs/.TNRS/tnrs/tnrs.make uses inputs/.TNRS/tnrs/logs/tnrs.make.log.sql as the log file.
bin/tnrs_db: removed unused imports
bin/tnrs_db: cumulative_tnrs_profiler: use tnrs.tnrs_request()'s new cumulative_profiler param instead of doing the profiling manually. this also ensures that there isn't extra time between when the cumulative profiler starts/stops and when the per-request profiler starts/stops (because Profiler's new add_subprofiler() method is used).
bin/tnrs_db: tnrs_profiler: renamed to cumulative_tnrs_profiler to distinguish it from the tnrs_profiler used by tnrs.tnrs_request(), which just profiles the current request
bugfix: bin/tnrs_db: cumulative profiler: use len(names) instead of this_ct (cur.rowcount) in case the actual # rows fetched differed from the rowcount
lib/tnrs.py: repeated_tnrs_request(): renamed to tnrs_request() since this is the function that should usually be used, to ensure that debugging information is output in the case of an error. (the TNRS request must be made again to output this information.)
bin/tnrs_db: removed no longer used $wait flag (which caused tnrs_db to wait max_pause for new rows to be added), because tnrs_db is now invoked automatically after each import by the import_scrub target (in inputs/input.Makefile) and does not need to run as a daemon. note that when scrub is invoked, it is possible that a previous datasource's import has already scrubbed the names for this import, because tnrs_db runs until all rows in tnrs_input_name are scrubbed....
bin/tnrs_db: removed no longer needed explicit population of the Time_submitted, which is now done automatically by the tnrs table. however, this requires starting the transaction before submitting data, so Time_submitted is correctly set to the submission time rather than the insertion time. the setting of the correct time can be tested by inserting `time.sleep(n_sec)` after the TNRS request and checking that the Time_submitted is close to the time tnrs_db was run instead of n_sec seconds later.
bin/tnrs_db: start transaction before submitting data, so Time_submitted is correctly set to the submission time rather than the insertion time. these may differ by several minutes if TNRS is slow. the setting of the correct time can be tested by inserting `time.sleep(n_sec)` after the TNRS request, removing the explicit setting of Time_submitted, and checking that the Time_submitted is close to the time tnrs_db was run instead of n_sec seconds later.
bugfix: bin/tnrs_db: wrap just the TNRS request and the storing of the response data in a function (undoing part of r9514), because the transaction start time for Time_submitted should not be until the TNRS request is actually made (it often takes several minutes to materialize the next set of input names on a full DB)