Project

General

Profile

Statistics
| Revision:

# Date Author Comment
3428 07/17/2012 08:44 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Divide fields into input columns and literal values: Translate values: Allow literal values other than strings or None (from the XML parsing), because sql_io.put_table() is getting an optimization for iterations containing only literal values, which just returns the pkey of the single row for these values (which is usually an integer) instead of a temp table with the same value in each row

3427 07/17/2012 08:28 PM Aaron Marcuse-Kubitza

bin/map: by_col: Stripping XML functions not in the DB: Remove DB functions based on whether a plain SQL function of that name exists, rather than whether a relational function (i.e. a table) of that name exists. This will allow column-based import to use plain SQL functions that don't have a corresponding relational function.

3426 07/17/2012 08:23 PM Aaron Marcuse-Kubitza

db_xml.py: Don't remove any explicit pkey because the output table may be a SQL function, which does not have a pkey. This feature only existed to support importing VegBank XML exports, which we don't use (and which would be incompatible with the schema anyway).

3425 07/17/2012 08:19 PM Aaron Marcuse-Kubitza

sql.py: function_exists(): Fixed bug where select() needed to be run with auto-rollback in case it raised an exception

3424 07/17/2012 08:08 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Changed rel_funcs param to a callback is_rel_func, so that caller can specify any dynamic function to determine if a name is a relational function rather than having to list out all known relational functions

3423 07/17/2012 07:54 PM Aaron Marcuse-Kubitza

sql.py: function_exists(): Use simpler cast to regproc instead of query of information_schema.routines to determine if function exists. When the schema is not specified, this also limits the schemas checked to the search_path instead of the whole DB.

3422 07/17/2012 07:51 PM Aaron Marcuse-Kubitza

schemas/functions.sql, py_functions.sql: Renamed trigger functions to avoid collisions with plain SQL functions of the same name but different signatures, so that the plain SQL functions can be uniquely identified by their name without also requiring their signature

3421 07/17/2012 07:39 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): In queries without a FROM clause, don't order by pkey

3420 07/17/2012 07:15 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Support queries without a FROM clause

3419 07/17/2012 07:03 PM Aaron Marcuse-Kubitza

sql.py: Added DoesNotExistException and parse it in run_query()

3418 07/17/2012 06:46 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Removed no longer used conds var (invalid rows are removed from the in_table using sql.delete() instead of being filtered out in the main select)

3417 07/17/2012 06:43 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Removed no longer used distinct_on var (sql.distinct_table() handles filtering the join_cols)

3416 07/17/2012 06:24 PM Aaron Marcuse-Kubitza

schemas/py_functions.sql: _date(): Just run str() on the returned datetime because it will usually be converted to a PostgreSQL timestamp anyway, so excluding the time from the string isn't necessary

3415 07/17/2012 06:15 PM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Added _date()

3414 07/17/2012 06:14 PM Aaron Marcuse-Kubitza

sql.py: run_query(): Exception parsing: Remove PL/Python prefix from exception message so that the regexps can match at the beginning of the message

3413 07/17/2012 05:50 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Handle sql.InvalidValueExceptions by filtering the value out of all input columns. This will be useful for SQL functions that raise exceptions.

3412 07/17/2012 04:49 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: namedplace, plantname: *_unique UNIQUE INDEX: Reordered columns to put rank after parent_id and plantname so that these columns, which are usually input table columns, can be used in a merge join index scan, while rank, which is usually a literal value, can applied as an index filter condition after the merge join

3411 07/17/2012 04:42 PM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Removed literal values from UNIQUE INDEXes because the query planner did not seem to use them to do a merge join

3410 07/17/2012 04:01 PM Aaron Marcuse-Kubitza

README.TXT: Maintenance: Full DB: Documented how to test full DB backup

3409 07/17/2012 03:47 PM Aaron Marcuse-Kubitza

backups/Makefile: Added %.backup/test

3408 07/16/2012 08:32 PM Aaron Marcuse-Kubitza

README.TXT: Documented maintenance of full DB (back up/restore)

3407 07/16/2012 08:23 PM Aaron Marcuse-Kubitza

backups/Makefile: Full DB backups: Added vegbien.backup

3406 07/16/2012 08:22 PM Aaron Marcuse-Kubitza

pg_dump_vegbien: If first arg is "all", dump entire DB. Require a first arg so that Usage message will be displayed if run with no args.

3405 07/16/2012 08:03 PM Aaron Marcuse-Kubitza

Always output Usage messages to stderr and word-wrap them using `fold -s`

3404 07/16/2012 07:37 PM Aaron Marcuse-Kubitza

backups/Makefile: Factored backup command into $(backup) for later use by full DB backups. Made Backups, Archived imports sections subsections of Archived imports so Full DB backups can have its own section.

3403 07/16/2012 07:16 PM Aaron Marcuse-Kubitza

backups/Makefile: Fixed bug where $(SHELL) needed to be set to bash so that $'\n' would be interpreted correctly

3402 07/16/2012 07:06 PM Aaron Marcuse-Kubitza

backups/Makefile: Fixed bug where *.sql files needed to be restored using psql because pg_restore only supports "non-plain-text formats"

3401 07/16/2012 06:35 PM Aaron Marcuse-Kubitza

pg_dump_vegbien: For consistency with setting the --schema option, use `set -- "$" args...` to append options to $ which are then passed to pg_dump, instead of specifying several variables which are then included in the pg_dump command

3400 07/16/2012 06:26 PM Aaron Marcuse-Kubitza

pg_dump_vegbien: Pass command line options directly to pg_dump after parsing out any schema name

3399 07/16/2012 06:19 PM Aaron Marcuse-Kubitza

backups/Makefile: Don't log stderr or run the command verbosely and instead just output the command and run time to the terminal. This matches what we do for pg_dump, which works better because it just prints the useful information when it's done running.

3398 07/16/2012 05:28 PM Aaron Marcuse-Kubitza

backups/Makefile: Remove log files after successful restore/extraction because they are only useful for tail -f when the restore operation is running in the background

3397 07/16/2012 05:14 PM Aaron Marcuse-Kubitza

pg_dump_vegbien: Save owners when saving data (for full export)

3396 07/16/2012 05:03 PM Aaron Marcuse-Kubitza

pg_dump_vegbien: Use SET SESSION AUTHORIZATION to ensure that owners are always recorded in the same format. This will help make plain text backups comparable using diff.

3395 07/16/2012 04:39 PM Aaron Marcuse-Kubitza

backups/Makefile: Backups: Fixed bug where `%.sql: %` needed to come before %.sql with no prerequisites to be matched first

3394 07/16/2012 04:14 PM Aaron Marcuse-Kubitza

Moved archived imports and make targets to maintain them to new backups dir

3393 07/16/2012 04:08 PM Aaron Marcuse-Kubitza

Moved archived imports and make targets to maintain them to new backups dir

3392 07/16/2012 03:29 PM Aaron Marcuse-Kubitza

Added psql_script_vegbien

3391 07/16/2012 12:54 PM Aaron Marcuse-Kubitza

root Makefile: VegBIEN DB: Schemas: Added schemas/%.sql to extract a compressed custom-format backup to plain SQL

3390 07/16/2012 12:33 PM Aaron Marcuse-Kubitza

root Makefile: VegBIEN DB: Schemas: Added schemas/%.backup/uninstall so that a schema can be removed by its backup file name (with extension) as well as its name

3389 07/13/2012 04:43 PM Aaron Marcuse-Kubitza

README.TXT: Documented that files and directories can easily be remade with `make <file>-remake` and `make <dir>/remake`, which do the make clean or rm automatically

3388 07/13/2012 04:37 PM Aaron Marcuse-Kubitza

pg_dump_vegbien: Added plain env var which creates plain text output instead, for use in debugging and validation

3387 07/13/2012 04:14 PM Aaron Marcuse-Kubitza

input.Makefile: Mapping: $(map): Fixed bug where use_staged was causing in_schema, in_database, etc. to be set even when not outputting to vegbien, causing bin/map to try to output to a DB when it should output to stdout, by moving the use_staged test to $(map2db)

3386 07/13/2012 03:07 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Creating full_in_table: If out_table is a SQL function, just use in_table because in_table is not modified

3385 07/13/2012 02:47 PM Aaron Marcuse-Kubitza

input.Makefile: Documentation: import/steps.by_col.sql: $(steps): Fixed bug where stderr needed to be redirected to stdout so that it would be passed to debug2redmine and included in the file

3384 07/13/2012 02:26 PM Aaron Marcuse-Kubitza

root Makefile: Datasources: import: Don't pause after "To import all inputs at once" message because this prevents running sequential import with & (as `make import &`)

3383 07/13/2012 02:09 PM Aaron Marcuse-Kubitza

sql.py: create_table(): Only version temp tables, so that e.g. staging tables are not created with a version # if they already exist. This will help make `make install` idempotent.

3382 07/13/2012 02:00 PM Aaron Marcuse-Kubitza

input.Makefile: Import to VegBIEN: $(import): Only redirect stderr to stdout if outputting to a log file. This allows unlogged imports to redirect the "Inserted # new rows into database" message, which is output to stdout rather than stderr for use by validation and automated testing, to a separate location.

3381 07/13/2012 01:54 PM Aaron Marcuse-Kubitza

README.TXT: Documented how to remove an archived import. Moved archived imports commands from Data import to new section Maintenance of archived imports. Changed "Empty the DB" to "Remove the last import" for clarity, and made the command apply just to the public schema and not also to the functions schemas (reinstalling those is documented under Schema changes).

3380 07/13/2012 01:44 PM Aaron Marcuse-Kubitza

root Makefile: $(rmSchemaCmd) for uninstalling schemas: Escape schema name with "" so that schemas/%/uninstall can be used to remove archived imports (as well as aborted restores of them)

3379 07/13/2012 01:14 PM Aaron Marcuse-Kubitza

root Makefile: schemas/%.backup: Removed explicit check if file exists because now that the target name matches the name of the file created, the file will be rebuilt only if it doesn't exist

3378 07/13/2012 01:13 PM Aaron Marcuse-Kubitza

schemas: Ignore log files (from restore operations) for svn

3377 07/13/2012 12:49 PM Aaron Marcuse-Kubitza

main Makefile: Installation: install: Download input data using new inputs/download target

3376 07/13/2012 12:48 PM Aaron Marcuse-Kubitza

main Makefile: VegBIEN DB: Schemas: schemas/%/install, schemas/py_functions/install: Ignore errors if schema exists, so that install can be idempotent

3375 07/13/2012 12:38 PM Aaron Marcuse-Kubitza

inputs/Makefile: Added download to download input data from vegbiendev

3374 07/13/2012 10:05 AM Aaron Marcuse-Kubitza

main Makefile: VegBIEN DB: Removing database warning message: Changed "public schema versions" to "archived imports" for clarity

3373 07/13/2012 10:01 AM Aaron Marcuse-Kubitza

main Makefile: VegBIEN DB: Schemas: Added schemas/%.backup/restore and documented it in README.TXT

3372 07/13/2012 09:42 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import. Marked partial imports affecting the slowest datasource with a comment when they might otherwise appear to be comparable with full imports due to having a similar # of rows.

3371 07/13/2012 08:58 AM Aaron Marcuse-Kubitza

root Makefile: schemas/%.backup: Removed _always prerequisite because an existing backup should not be overwritten by running `make schemas/%.backup`

3370 07/13/2012 08:53 AM Aaron Marcuse-Kubitza

README.TXT: Updated warning messages to match what's actually output by the commands in question

3369 07/13/2012 08:47 AM Aaron Marcuse-Kubitza

README.TXT: Documented how to back up an archived import

3368 07/12/2012 03:26 PM Aaron Marcuse-Kubitza

root Makefile: VegBIEN DB: Schemas: schemas/%/backup: Time the backup

3367 07/12/2012 03:25 PM Aaron Marcuse-Kubitza

root Makefile: VegBIEN DB: Schemas: schemas/%/backup: Renamed to the name of the file created (schemas/%.backup) so that the file will automatically be removed by make if you cancel the backup. If this didn't happen, the existing partial file would prevent make from backing it up again.

3366 07/12/2012 03:19 PM Aaron Marcuse-Kubitza

schemas: Ignore *.backup files for svn

3365 07/12/2012 03:00 PM Aaron Marcuse-Kubitza

root Makefile: VegBIEN DB: Schemas: schemas/%/backup: Documented that it can't be used for the current (unrotated) public schema because pg_dump doesn't back up the CREATE SCHEMA statement for it

3364 07/12/2012 02:45 PM Aaron Marcuse-Kubitza

root Makefile: VegBIEN DB: Schemas: schemas/%/backup: Changed file extension to .backup to match what pgAdmin3 expects

3363 07/12/2012 02:40 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/maps/VegX.stems.csv: Fixed PlotObsID mapping to point to sourceAccessionCode so that it would map to the same place as organisms' PlotObsID and the entries would be merged. This will hopefully also fix a SALVIAS stems import bug where the specimenreplicate join on existing/inserted rows matched multiple rows, in spite of using the specimenreplicate_unique_catalognumber UNIQUE INDEX.

3362 07/12/2012 02:35 PM Aaron Marcuse-Kubitza

root Makefile: VegBIEN DB: Schemas: Added schemas/%/backup for backing up an import's schema

3361 07/12/2012 02:34 PM Aaron Marcuse-Kubitza

pg_dump_vegbien: Added data option for backing up an import's schema

3360 07/12/2012 01:32 PM Aaron Marcuse-Kubitza

sql_io.py: cast(): Don't output warnings for errors that are stored in the errors table, because this clutters the log output unnecessarily. Warnings are still output when there is no errors table.

3359 07/12/2012 01:20 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: party: party_unique UNIQUE INDEX: Moved surname before givenname, middlename to enable the index to be used when just the person's last name is known. (Note that if the organization is not known, and explicit NULL would need to be put in for that field to use the index.)

3358 07/12/2012 01:14 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): DuplicateKeyException: Uniquifying input table to avoid internal duplicate keys: Include all join columns (including literal values) in the unique index, not just table columns, to encourage PostgreSQL to use the index in a merge join.

3357 07/12/2012 01:11 PM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Support literal values as column expressions. Filter out untyped NULLs because the type unknown can't be indexed.

3356 07/12/2012 01:09 PM Aaron Marcuse-Kubitza

sql.py: add_index(): Support literal values as expressions

3355 07/12/2012 01:06 PM Aaron Marcuse-Kubitza

sql_gen.py: ensure_not_null(): Support Literal value inputs, which are passed through if not nullable

3354 07/12/2012 01:05 PM Aaron Marcuse-Kubitza

sql_gen.py: Added cast_literal()

3353 07/12/2012 11:52 AM Aaron Marcuse-Kubitza

sql.py: Added simplify_expr() and use it in parse_expr_col(). This will remove extraneous information from the condition that gets included in the errors table, so it's easy for the data provider to see what constraint is being violated as it applies to their data.

3352 07/12/2012 11:40 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Handle CheckExceptions. This fixes a REMIB bug where some rows violated the specimenreplicate.specimenreplicate_required_key CHECK constraint, but there was previously no way to filter out just these rows so all rows were removed instead.

3351 07/12/2012 11:33 AM Aaron Marcuse-Kubitza

sql.py: Added map_expr()

3350 07/12/2012 11:33 AM Aaron Marcuse-Kubitza

sql_gen.py: Added NotCond

3349 07/12/2012 10:06 AM Aaron Marcuse-Kubitza

sql.py: run_query(): ConstraintException: Get and include the constraint condition in the exception

3348 07/12/2012 10:05 AM Aaron Marcuse-Kubitza

sql.py: Added constraint_cond()

3347 07/12/2012 09:26 AM Aaron Marcuse-Kubitza

sql.py: run_query(): ConstraintException: Include table in the constraint name because constraints are scoped by table

3346 07/12/2012 09:18 AM Aaron Marcuse-Kubitza

sql.py: Added ConstraintException and parse it in run_query()

3345 07/12/2012 09:12 AM Aaron Marcuse-Kubitza

sql.py: ConstraintException: Also store the constraint's condition, if any

3344 07/12/2012 08:24 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate: specimenreplicate_unique_catalognumber UNIQUE INDEX: Limited to rows where sourceaccessioncode IS NULL, so that it will only be used as an alternate if the datasource does not specify a sourceaccessioncode. This should fix the SpeciesLink bug where input rows with catalognumber_dwc IS NULL match multiple output rows when compared on the specimenreplicate_unique_catalognumber UNIQUE INDEX because they were excluded from the uniquifying by the index filter.

3343 07/12/2012 08:10 AM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Synced with schema

3342 07/12/2012 08:06 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate: Added UNIQUE INDEX on plantobservation_id for direct vouchers in plots data

3341 07/12/2012 07:16 AM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated with stats from latest import

3340 07/11/2012 10:36 PM Aaron Marcuse-Kubitza

sql.py: Removed no longer used constraint_cols(). index_cols() can always be used for a UNIQUE constraint because a UNIQUE index is automatically created for it.

3339 07/11/2012 10:34 PM Aaron Marcuse-Kubitza

bin/map: Fixed bug where errors table indexes could not be looked up using index_cols() because their schema was not in the search_path, by explicitly adding the in_schema at the end of the search_path. This is the only reason the in_schema needs to be in the search_path, but it's unavoidable because the "duplicate key value violates unique constraint" error does not included the constraint's schema.

3338 07/11/2012 10:24 PM Aaron Marcuse-Kubitza

sql.py: run_query(): DuplicateKeyException: Don't parse table name out of constraint name because it isn't used and can't reliably be determined for table names containing "_"

3337 07/11/2012 09:10 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Reattached location.datasource_id :[] mappings directly to the location itself, rather than to the location via the locationevent

3336 07/11/2012 08:37 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Mapped to taxonoccurrence.sourceaccessioncode to avoid underconstraining taxonoccurrence

3335 07/11/2012 08:02 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Merge mergeable siblings recursively so that newly-mergeable children are also merged

3334 07/11/2012 07:44 PM Aaron Marcuse-Kubitza

xml_dom.py: Removed no longer used merge_same_name()

3333 07/11/2012 07:43 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Recombining pieces of nodes that were split apart in the mappings: Only combine nodes that are explicitly marked as mergeable, to avoid unwanted merges. Refactored to use new xml_dom.merge_by_name().

3332 07/11/2012 07:39 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.stems.csv: Marked aggregateoccurrence as mergeable, in preparation for switching to explicit merging only to avoid unwanted merges

3331 07/11/2012 07:33 PM Aaron Marcuse-Kubitza

xml_dom.py: Moved merge_adjacent() recursion and name checking into merge() so that other callers of merge() can take advantage of it, too. Added merge_by_name() and use it in merge_same_name(). Removed now-unused merge_adjacent().

3330 07/11/2012 07:30 PM Aaron Marcuse-Kubitza

xpath.py: get(): Support checking all children when the elem name is '*'

3329 07/11/2012 07:01 PM Aaron Marcuse-Kubitza

xml_dom.py: replace(): Assert that old node not removed from parent tree