# Date Author Comment
13983 07/11/2014 08:41 AM Aaron Marcuse-Kubitza

bugfix: bin/import_all: was run without initial "." test: don't exit nonzero because this will close the subshell

13982 07/11/2014 08:38 AM Aaron Marcuse-Kubitza

bugfix: bin/import_all: ensure that this is run in a subshell, which is needed so errexits don't close the terminal window
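
The subshell behaviour this entry relies on can be sketched as follows. This is a minimal illustration, not the code in bin/import_all: `set -e` inside `( ... )` ends only the subshell, so the calling shell (and its terminal window) keeps running.

```shell
#!/bin/bash
# Illustration only: errexit inside a subshell terminates just that subshell.
(
    set -e
    false                # fails, so the subshell exits here with status 1
    echo "not reached"
)
status=$?                # the parent shell is still alive and sees the status
echo "parent still running, subshell status=$status"
```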

13981 07/11/2014 08:32 AM Aaron Marcuse-Kubitza

bin/import_all: documented that this must be run in a subshell (obtained by running `$0`)

13980 07/11/2014 08:25 AM Aaron Marcuse-Kubitza

bugfix: bin/import_all: need to always use log files for background processes

13979 07/11/2014 08:12 AM Aaron Marcuse-Kubitza

fix: bin/import_all: Source/import: don't use by_col=1 for this because it's slower for small #s of rows. by_col mode is no longer needed for metadata-only tables because these tables now have a single empty row so that they also work in row-based mode.

13978 07/11/2014 08:06 AM Aaron Marcuse-Kubitza

fix: bin/import_all: hidden srcs: use with_all for this to avoid needing to list every source, and to display the backgrounded command with the variables substituted

13977 07/11/2014 07:40 AM Aaron Marcuse-Kubitza

bin/import_all: TNRS, geoscrub: integrated into the list of metadata sources

13976 07/11/2014 07:39 AM Aaron Marcuse-Kubitza

bin/import_all: TNRS, geoscrub: use import rather than publish because the non-imported tables have now been excluded

13974 07/10/2014 07:25 PM Aaron Marcuse-Kubitza

fix: bin/import_all: updated for new metadata datasource names (see issue #940)

13866 06/26/2014 04:11 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: taxon_match: insert names via taxon_match_input auto-updatable view instead of directly into taxon_match, to allow the taxon_match columns to be renamed while still supporting inserts using the TNRS column names

13862 06/26/2014 02:38 AM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs_match: renamed to taxon_match to use the normalized VegCore name for this, and to avoid repeating the schema name

13850 06/25/2014 03:33 PM Aaron Marcuse-Kubitza

inputs/.TNRS/schema.sql: tnrs: renamed to tnrs_match to distinguish it from other TNRS-related tables

13793 06/17/2014 04:26 PM Aaron Marcuse-Kubitza

fix: bin/in_place: usage: removed duplicate copy of [preserve_mtime=1]

13309 04/23/2014 10:01 PM Aaron Marcuse-Kubitza

bin/in_place: diff: use --brief to avoid scanning the entire file for large files

13308 04/23/2014 09:57 PM Aaron Marcuse-Kubitza

bin/in_place: added $preserve_mtime flag

13161 04/17/2014 02:51 PM Aaron Marcuse-Kubitza

lib/sh/db.sh pg_dump(), bin/pg_dump_vegbien: --format: use the long form of the formats to make the code self-documenting

13077 04/09/2014 02:52 AM Aaron Marcuse-Kubitza

bin/repl: match as whole-word text (like SQL identifier): documented that this is a generalization of lib/sql_gen.py map_expr() to work on entire source files

13076 04/09/2014 02:50 AM Aaron Marcuse-Kubitza

bin/repl, lib/sql_gen.py Expression transforming: documented that this can also be done in Postgres with expression substitution (wiki.vegpath.org/Postgres_queries#expression-substitution)

12995 03/30/2014 06:31 PM Aaron Marcuse-Kubitza

bin/make_analytical_db: removed remake_diff_tables() because this is now done for each datasource in inputs/input.Makefile

12990 03/30/2014 06:02 PM Aaron Marcuse-Kubitza

bin/make_analytical_db: removed no longer needed "${public}_validations" schema qualifier, now that it is in the search_path

12989 03/30/2014 06:00 PM Aaron Marcuse-Kubitza

fix: bin/vegbien_dest: added public_validations

12914 03/26/2014 09:23 PM Aaron Marcuse-Kubitza

fix: bin/repl: text mode (whether all patterns are plain text) should default to on, not off, if matching entire cells in a spreadsheet

12897 03/26/2014 02:17 PM Aaron Marcuse-Kubitza

fix: bin/repl: don't consider uppercase SQL keywords to indicate that a word is in a sentence

12754 03/18/2014 05:22 PM Aaron Marcuse-Kubitza

bugfix: bin/repl: only use excluded_prefix_re/excluded_suffix_re in text mode (used in renaming columns in SQL scripts), to prevent the special coding for column renames from also affecting regular regexp/word replacements

12750 03/18/2014 04:59 AM Aaron Marcuse-Kubitza

bugfix: bin/repl: text mode: also don't match if it's part of a '-'-separated identifier

12749 03/18/2014 04:57 AM Aaron Marcuse-Kubitza

bugfix: bin/repl: text mode: also don't match if it's a word in a sentence

12748 03/18/2014 04:42 AM Aaron Marcuse-Kubitza

bugfix: bin/repl: text mode: turned off the suffix matching, because there are cases where a mapping adds a suffix which would cause the same replacement to be performed repeatedly

12746 03/18/2014 04:25 AM Aaron Marcuse-Kubitza

bin/repl: text mode: exclude prefixes that should not cause replacement, to avoid doubling leading *

12742 03/18/2014 04:03 AM Aaron Marcuse-Kubitza

bin/repl: text mode: also match w/ suffix (eg. _verbatim)

12394 02/23/2014 08:29 PM Aaron Marcuse-Kubitza

bin/psql_verbose_vegbien: use \\ instead of \ inside '' because this is sh, not bash

12393 02/23/2014 08:26 PM Aaron Marcuse-Kubitza

bin/psql_verbose_vegbien: changed prep-statement order to match lib/sh/db.sh psql()

12392 02/23/2014 08:26 PM Aaron Marcuse-Kubitza

bin/psql_verbose_vegbien: use `\set VERBOSITY terse` to hide stack traces/DETAIL sections of error messages, like in lib/sh/db.sh psql()

12391 02/23/2014 08:11 PM Aaron Marcuse-Kubitza

bin/make_analytical_db: added `public_validations.remake_diff_tables()`

12041 02/05/2014 12:55 AM Aaron Marcuse-Kubitza

bugfix: bin/pg_dump_vegbien: fixed arg-count check to allow passing command-line options to pg_dump via args

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11952 01/15/2014 08:16 AM Aaron Marcuse-Kubitza

bugfix: bin/boldify: also match [[]]-style links at the beginning and end of a line

11951 01/15/2014 08:11 AM Aaron Marcuse-Kubitza

bin/boldify: made it idempotent

11950 01/15/2014 08:08 AM Aaron Marcuse-Kubitza

bugfix: bin/boldify: fixed extended regular expression syntax, which doesn't support a \] inside [] (you instead have to put the ] right after the opening [^ )
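
A short sketch of the bracket-expression rule described in this entry. The `bold_links` function is a hypothetical simplification and not the actual bin/boldify; the `*...*` bold markup is likewise an assumption. It only shows that a literal `]` goes directly after `[^`.

```shell
#!/bin/bash
# Hypothetical simplification of a "make [[...]] links bold" filter.
# In an extended regular expression, \] is not valid inside [ ];
# a literal ] must come first, right after the opening [^ .
bold_links() {
    sed -E 's/\[\[[^]]+]]/*&*/g'
}

result=$(printf '%s\n' 'see [[Wiki page]] here' | bold_links)
echo "$result"
```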

11949 01/15/2014 07:59 AM Aaron Marcuse-Kubitza

added bin/boldify, which makes Redmine links bold

11918 12/17/2013 05:47 AM Aaron Marcuse-Kubitza

bugfix: bin/map: in_is_db: don't ignore errors when the table does not exist, because these prevent an errexit and allow an import to continue when a staging table is missing. suppressing this error had previously been necessary because metadata-only tables (Source/) used to not have installed staging tables, and the program had to react accordingly.

11870 12/09/2013 03:09 PM Aaron Marcuse-Kubitza

bugfix: bin/pg_dump_limit: support errexit by ignoring the nonzero exit status that grep returns when it doesn't match anything
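
One way to tolerate grep's "no match" status under errexit is sketched below. The `filter_lines` helper is hypothetical and is not taken from bin/pg_dump_limit. grep exits with 1 when nothing matches and with 2 or more on a real error, so only status 1 is treated as success.

```shell
#!/bin/bash
set -e  # errexit: an unguarded nonzero status would abort the script

# Hypothetical helper: pass through matching lines, treat "no match"
# (status 1) as success, and still propagate real grep errors.
filter_lines() {
    grep -- "$1" || [ $? -eq 1 ]
}

matched=$(printf 'a\nb\n' | filter_lines b)
unmatched=$(printf 'a\nb\n' | filter_lines z)   # no match, but no abort
echo "matched=[$matched] unmatched=[$unmatched]"
```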

11840 12/05/2013 08:38 AM Aaron Marcuse-Kubitza

bin/make_analytical_db: don't regenerate family_higher_plant_group from the NCBI data because the lookup table is now prepopulated as part of the schema

11839 12/05/2013 08:37 AM Aaron Marcuse-Kubitza

bin/import_all: don't import NCBI because the lookup table is now prepopulated as part of the schema

11823 12/04/2013 07:26 PM Aaron Marcuse-Kubitza

bugfix: bin/import_all: run in errexit mode, so that if the user cancels reinstalling of the import schema, the script will then abort instead of continuing and using the wrong schema

11806 12/03/2013 08:58 AM Aaron Marcuse-Kubitza

bin/map: support param start="", which indicates the default value. this fixes a bug in inputs/input.Makefile $(restart_row), which outputs "" if an explicit starting row is not found.

11456 10/29/2013 03:33 AM Aaron Marcuse-Kubitza

bugfix: bin/with_all: @inputs default value: use `local`, so that the default value is only set for the current function and doesn't leak back out into the caller. this fixes a bug in subset imports where import_all's Source/import call to with_all would add the .* datasources, but these would then stay in for the import_scrub call, causing extra .* datasources to incorrectly be imported.
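
The scoping issue can be reproduced with a scalar variable, as in the sketch below. The function names and the default value "A B" are made up for illustration, and bin/with_all itself works on a list of inputs rather than a plain string.

```shell
#!/bin/bash
unset inputs

# Hypothetical stand-ins for a helper that applies a default input list.
run_scoped() { local inputs="${inputs:-A B}"; echo "$inputs"; }  # default stays inside
run_leaky()  { inputs="${inputs:-A B}"; echo "$inputs"; }        # default leaks to caller

run_scoped >/dev/null
after_scoped="${inputs-unset}"   # still unset in the caller

run_leaky >/dev/null
after_leaky="${inputs-unset}"    # now "A B" in the caller, affecting later calls

echo "after scoped: $after_scoped; after leaky: $after_leaky"
```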

11434 10/24/2013 05:07 PM Aaron Marcuse-Kubitza

bin/make_analytical_db: removed no longer needed setting of $schema to $public, because this is now done by psql()

11430 10/24/2013 04:03 PM Aaron Marcuse-Kubitza

bugfix: bin/import_all: restore the working dir when main() is done, in case it started as something other than the root dir

11429 10/24/2013 03:49 PM Aaron Marcuse-Kubitza

bin/after_import: support turning off the end-of-import backup for imports that are not the full database

11423 10/24/2013 01:11 PM Aaron Marcuse-Kubitza

bugfix: bin/make_analytical_db: when running into a public schema other than "public", also pass this to `/run export_` (which currently uses $schema instead of $public)

11422 10/24/2013 01:10 PM Aaron Marcuse-Kubitza

bugfix: bin/import_all: fix $ when .-included without args (which causes bash to put the wrong values in $ instead of leaving it empty)

11421 10/24/2013 01:09 PM Aaron Marcuse-Kubitza

bin/import_all: `make schemas/$version/install`: reinstall instead to allow re-running the import to the same custom schema (e.g. 2013-10-18.Brian_Enquist.Canadensys)

11420 10/24/2013 01:07 PM Aaron Marcuse-Kubitza

bin/import_all: `make schemas/$version/install`: ignore errors if schema exists, to support running with -e

11419 10/23/2013 11:10 PM Aaron Marcuse-Kubitza

bugfix: bin/import_all: removing inputs/.TNRS/tnrs/tnrs.make.lock: use `"rm" -f` instead of plain "rm" to avoid having an error exit status, which will abort the script if run with the -e flag (as runscripts are)
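
A minimal sketch of why `"rm" -f` is safe under `-e`. The lockfile path here is a hypothetical temp location, not the real inputs/.TNRS path. Quoting the command name prevents alias expansion, and `-f` makes a missing file a non-error.

```shell
#!/bin/bash
set -e  # the mode the entry says runscripts use

# Hypothetical lockfile inside a fresh temp dir, so the file does not exist.
lock_dir=$(mktemp -d)
lockfile="$lock_dir/tnrs.make.lock"

# A plain `rm` on a missing file exits nonzero and would abort the script here.
# "rm" skips any user alias (such as `rm -i`), and -f ignores the missing file.
"rm" -f "$lockfile"
status=$?
echo "rm status: $status"
rmdir "$lock_dir"
```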

11416 10/23/2013 10:34 PM Aaron Marcuse-Kubitza

bin/*_all: *_main(): renamed to just main() because it does not matter that other shell-includes' main() methods will clobber this, because it is only executed once

11415 10/23/2013 10:29 PM Aaron Marcuse-Kubitza

bugfix: bin/import_all: Source tables: use .../import instead of import_temp because import_temp is only needed when importing all tables, to prevent the temp suffix from being removed yet

11396 10/21/2013 07:14 PM Aaron Marcuse-Kubitza

fix: bin/map: put template: comment out the "Put template:" label so that the output is valid XML, and displays properly in a browser rather than showing a syntax error

11393 10/20/2013 05:21 PM Aaron Marcuse-Kubitza

bugfix: bin/import_all: need to publish datasources that won't be published by `make .../import`, so that the per-datasource import XPaths that refer to TNRS/geoscrub will link up with the TNRS/geoscrub source entry instead of creating a new entry without the metadata (because the entry with the metadata was named TNRS.new/geoscrub.new)

11390 10/20/2013 04:55 PM Aaron Marcuse-Kubitza

bin/import_all: removed no longer needed import of geoscrub data, because analytical_stem_view is now joined to the geoscrub_output table directly, instead of using the imported canon_place entries

11374 10/19/2013 06:56 PM Aaron Marcuse-Kubitza

bin/with_all: $all: renamed to $hidden_srcs for clarity, since this now just adds the hidden (.*) datasources, rather than always using all datasources

11373 10/19/2013 06:50 PM Aaron Marcuse-Kubitza

bugfix: bin/with_all: in $all mode, just prepend the .* datasources to the user-selected (or default) @inputs, so that using $all to add these datasources doesn't inadvertently cause the action to be performed for all datasources

11371 10/19/2013 02:15 PM Aaron Marcuse-Kubitza

bin/import_all: usage: documented that this can now be run with a custom datasources list (each of the form inputs/src/)

11370 10/19/2013 02:02 PM Aaron Marcuse-Kubitza

bin/with_all: added support for providing a custom list of inputs to run the command on

11286 10/17/2013 04:44 PM Aaron Marcuse-Kubitza

bin/import_all: use just import_scrub, not reimport_scrub, because import_scrub now automatically publishes the datasource's import (i.e. removes the temp suffix)

11227 10/09/2013 10:12 PM Aaron Marcuse-Kubitza

bin/map: usage: documented that verbosity > 3 in commit mode turns on debug_temp mode, which creates real tables instead of temp tables

10871 09/05/2013 12:11 AM Aaron Marcuse-Kubitza

bugfix: bin/import_all: use reimport_scrub instead of import_scrub so that the temp suffix of the datasource name is removed

10868 09/04/2013 11:48 PM Aaron Marcuse-Kubitza

bugfix: bin/after_import: run backups/fix_perms right after the backup files are created to make them private

10865 09/04/2013 05:27 PM Aaron Marcuse-Kubitza

bugfix: bin/make_analytical_db: `/run export_`: don't take input from the terminal, because this causes rm to prompt the user (from a background task) about overwriting the previous export

10854 09/04/2013 01:28 PM Aaron Marcuse-Kubitza

bin/map: allow user to override the source env var, which is used as the source.shortname value in the DB

10849 08/31/2013 07:44 PM Aaron Marcuse-Kubitza

bugfix: bin/import_all: `rm inputs/.TNRS/tnrs/tnrs.make.lock`: need to use `"rm"` instead of `rm` so that we don't use any rm alias the user might have in their shell (import_all is run in the calling shell so that the jobs are owned by the calling shell)

10847 08/31/2013 07:27 PM Aaron Marcuse-Kubitza

bin/import_all: added step to remove any leftover TNRS lockfile (previously done manually)

10742 08/26/2013 08:45 PM Aaron Marcuse-Kubitza

bin/tnrs_db: add entry to new batch table

10599 08/06/2013 12:32 AM Aaron Marcuse-Kubitza

bugfix: bin/import_times: filtering out the Source subdirs: need to match 1 at the beginning of the line only

10598 08/06/2013 12:29 AM Aaron Marcuse-Kubitza

bin/import_times: filter out the Source subdirs, which now have single-row data and therefore are included in the rowcounts list

10589 08/04/2013 12:59 AM Aaron Marcuse-Kubitza

bin/after_import: usage: documented that it requires $version

10586 08/03/2013 09:14 PM Aaron Marcuse-Kubitza

bin/import_all: use new bin/after_import

10585 08/03/2013 09:13 PM Aaron Marcuse-Kubitza

added bin/after_import, which performs post-normalized-import actions separately from bin/import_all

10580 08/03/2013 12:25 AM Aaron Marcuse-Kubitza

bin/import_all: with_all import_scrub: documented that this step uses $by_col, so that users know to include by_col=1 when running this step separately

10579 08/03/2013 12:24 AM Aaron Marcuse-Kubitza

bin/import_all: use column-based import (by_col=1) by default, instead of requiring the user to explicitly specify it. instead turn it off explicitly (by_col=) for row-based import.

10576 08/02/2013 11:55 PM Aaron Marcuse-Kubitza

bin/import_all: don't set $dump_opts until running the backup command that uses it, so that the user can run this backup command separately just by copying the line out of the script (without worrying about env vars that need to be set, other than $version which is visible outside the script)

10448 07/26/2013 08:16 PM Aaron Marcuse-Kubitza

bin/my2pg: use s!...!...! when either the regexp or the replacement contains / , to avoid unnecessary \-s
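
The alternate-delimiter form looks like this in sed. The paths are invented for illustration and do not come from bin/my2pg. With `!` as the delimiter, the slashes in the pattern and replacement need no backslashes.

```shell
#!/bin/bash
# Hypothetical example: rewrite a path prefix.
# Equivalent to s/\/usr\/local/\/opt/ but without the escaping.
rewrite_prefix() {
    sed 's!/usr/local!/opt!'
}

result=$(echo '/usr/local/bin' | rewrite_prefix)
echo "$result"
```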

10447 07/26/2013 08:09 PM Aaron Marcuse-Kubitza

bin/my2pg: commenting out table options: added explanatory comment, because it is not obvious from the regexp what this does

10445 07/26/2013 06:35 PM Aaron Marcuse-Kubitza

bin/my2pg: comment out table options (http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html#sqlmode_no_table_options) instead of removing them, because they include table COMMENTs, which contain important metadata such as table definitions. (note that table COMMENTs use a slightly different syntax than column COMMENTs, so the table COMMENTs will not be commented out twice.)

10444 07/26/2013 06:19 PM Aaron Marcuse-Kubitza

bin/my2pg: comment out COMMENTs instead of removing them so that they will be included in the PostgreSQL translation. COMMENTs contain important metadata about columns, such as definitions and the meanings of integer flag values.

10442 07/26/2013 05:56 PM Aaron Marcuse-Kubitza

bin/my2pg: added instructions for regenerating *.schema.sql whenever this script is changed

10441 07/26/2013 05:22 PM Aaron Marcuse-Kubitza

bin/my2pg: COMMENT: also match COMMENTs with embedded ', because there will only be one COMMENT per line, so the contents of the COMMENT can just extend to the last ' on the line

10439 07/26/2013 04:29 PM Aaron Marcuse-Kubitza

bin/my2pg: replace MySQL ` quotes with " quotes to support exports that were generated without ANSI_QUOTES mode. (this replacement only applies to schema exports, not data.) ANSI_QUOTES is only available with mysqldump --compatible modes that also include NO_TABLE_OPTIONS, which omits important table options such as comments. in particular, these comments are part of schemas/VegCore/VegCore.ERD.mwb but were not being included in VegCore.my.sql.

10348 07/19/2013 11:40 AM Aaron Marcuse-Kubitza

bugfix: bin/repl: text mode: repurpose this to match SQL identifiers, for use by inputs/input.Makefile %/postprocess.sql. %/postprocess.sql is the only place currently using this mode, so this will not affect other scripts.

10283 07/14/2013 05:52 AM Aaron Marcuse-Kubitza

bugfix: bin/*: spell out [:alnum:] as [a-zA-Z0-9] because Python unfortunately doesn't support character classes

10278 07/14/2013 02:44 AM Aaron Marcuse-Kubitza

bin/*: replaced confusing regexp constructs involving \W inside [] with the much clearer explicit character class [:alnum:] . this avoids adding or subtracting from an inverted class in order to reach a subset of the corresponding positive class, because the subset can just be named explicitly instead.

10277 07/14/2013 02:38 AM Aaron Marcuse-Kubitza

bugfix: bin/repl: doesn't make sense to use other chars in a [^\W_] regexp, because they will have no effect since \w doesn't include the other chars to begin with. this is a result of confusion with the ^ and \W double negative.

10255 07/11/2013 11:33 AM Aaron Marcuse-Kubitza

bin/filter_out_ci, lib/maps.py: simplify(): also remove distinguishing #... suffix from terms (e.g. UNUSED#institutionID), to support mapping multiple columns to the special terms OMIT, PRIVATE, UNUSED (VegCore.vegpath.org#Special-terms), without creating a collision in the staging table renaming. note that this change must not be made to bin/canon, because this would cause suffixed terms to be autorenamed to their *un*suffixed VegCore versions.

10237 07/10/2013 08:20 PM Aaron Marcuse-Kubitza

bin/my2pg*: keep MySQL indefinite dates as text strings instead of translating them (to the first of the month or year) to fit into a PostgreSQL timestamp. this allows the application to decide how to handle these values, which otherwise have no corresponding value in PostgreSQL. this requires changing the date/time related types to text instead of leaving them as-is, so that they can store the custom MySQL strings.

10225 07/10/2013 04:51 PM Aaron Marcuse-Kubitza

bin/my2pg: use util.sh $top_dir instead of setting $selfDir

10224 07/10/2013 04:50 PM Aaron Marcuse-Kubitza

bin/my2pg*: use the util.sh sed wrapper, which fixes the LANG=*.UTF-8 "illegal byte sequence" errors on invalid UTF-8

10191 07/09/2013 12:56 AM Aaron Marcuse-Kubitza

bin/map: removed no longer used support for map.csv input column prefixes (expand out the prefixes instead). this used to be used by SpeciesLink to use just one mapping for a single term with multiple DwC namespaces, but was replaced with an explicit, ordered rather than implicit, unordered /_alt-ing together of the terms.

10190 07/08/2013 11:47 PM Aaron Marcuse-Kubitza

bin/map: removed no longer accurate comment that this is case- and punctuation-insensitive, since the case- and punctuation-insensitivity is now instead handled by map.csv preprocessing scripts before the mappings are even provided to bin/map

10140 07/02/2013 02:31 PM Aaron Marcuse-Kubitza

bugfix: bin/map: in_is_db: inline metadata value columns (used by new-style import) so that they can be compared by value in XML simplifying functions (lib/xml_func.py)

10115 07/02/2013 03:50 AM Aaron Marcuse-Kubitza

bin/map: map_table(): Resolve prefixes: combined db_xml.ColRef() constructor call with creation of args (as tuple) for clarity

10114 07/02/2013 03:35 AM Aaron Marcuse-Kubitza

bin/map: update_in_label(): use in_schema instead of the map spreadsheet column name when available, to allow using one spreadsheet for all datasources (which would not have a datasource-specific spreadsheet column name)