Project

General

Profile

# Date Author Comment
12152 02/13/2014 03:37 AM Aaron Marcuse-Kubitza

lib/sql.py: distinct_table(): don't cache the table creation, because this prevents different distinct_tables from being created for the same input table

12151 02/13/2014 03:36 AM Aaron Marcuse-Kubitza

lib/sql.py: create_table(), copy_table_struct(): support custom query options, such as cacheable

12149 02/13/2014 12:01 AM Aaron Marcuse-Kubitza

lib/sql.py: parse_exception(): parse "operator does not exist" errors as MissingCastExceptions (these appear when a staging table column is of the wrong type)

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

10841 08/31/2013 01:02 AM Aaron Marcuse-Kubitza

lib/sql.py: distinct_table(): support custom filters on the distincting query

10548 07/31/2013 11:58 PM Aaron Marcuse-Kubitza

bugfix: lib/sql.py: parse_exception(): MissingCastException from DoesNotExistException for function: handle overloaded functions where none of the overloads supports the given arg types (so assume text). this may have become a bug from system upgrades?

10139 07/02/2013 02:29 PM Aaron Marcuse-Kubitza

lib/sql.py: added col_default_value(), col_is_constant(), which interface with corresponding util-schema functions

9757 06/06/2013 12:59 PM Aaron Marcuse-Kubitza

bugfix: lib/sql.py: parse_exception(): make_DuplicateKeyException(): handle nested exceptions (which should never be generated, but may be in case of sql.py bugs such as wiki.vegpath.org/To_Do#Fixes > #1) by printing the nested exception and then rethrowing the original exception, so that the original exception does not get lost and still ends up at the end of the program's output, to enable debugging

9458 05/17/2013 05:55 PM Aaron Marcuse-Kubitza

bugfix: lib/sql.py: parse_exception(): typed_name_re: added back matching of names without "", since these are used by some error messages (ones that contain () after the function name)

9457 05/17/2013 05:41 PM Aaron Marcuse-Kubitza

bugfix: lib/sql.py: parse_exception(): typed_name_re: need to allow " within the matched name, since there are now "" around the entire identifer that was passed to Postgres, which may itself include " . always require "" around the matched name, to ensure that the whole name is matched by .+? e.g. when followed by () for a function call. the version of Postgres we currently use apparently no longer has error messages without the "", so we don't need a separate regexp for quoted and unquoted names.

9390 05/15/2013 11:21 PM Aaron Marcuse-Kubitza

bugfix: lib/sql.py: parse_exception(): typed_name_re: need to ensure that full name is matched rather than just first character

8819 05/05/2013 10:00 AM Aaron Marcuse-Kubitza

bugfix: lib/sql.py: parse_exception(): typed_name_re (used by MissingCastException, etc.): don't require "" around the value, because function names in "function does not exist" exceptions are (now?) output without them. this exception did not occur in the last import, but does occur in the automated testing and when running the associated query in pgAdmin (in both Linux and Mac OS X).

8379 04/09/2013 04:37 PM Aaron Marcuse-Kubitza

lib/sql.py: parse_exception(): typed_name_re: Fixed bug where need to require the "" around the table/column name, because otherwise, the regexp will try to match as few characters as possible, causing it to match only the first letter of the name

8335 04/04/2013 09:52 AM Aaron Marcuse-Kubitza

lib/sql.py: mk_select(): using subset function: Turn off enable_sort (within the transaction) to avoid unwanted slow sorts. This change (along with the subset functions themselves) should significantly reduce the long FIA.occurrence_all table subset time (~8 hours altogether) and with it the total import time, which had more than doubled as a result of the FIA refresh. Note that this issue would have been even more pronounced for larger datasets, such as the GBIF refresh, which would have taken ~2.5 days longer (400 million rows * ~30% are plants * (FIA: ~8 hours/16.7 million rows) * 1 day/24 hours).

8334 04/04/2013 09:30 AM Aaron Marcuse-Kubitza

lib/sql.py: mk_select(): Use subset function when it's available for fast querying at large OFFSET values

8333 04/04/2013 09:29 AM Aaron Marcuse-Kubitza

lib/sql.py: Added has_subset_func()

8223 03/28/2013 12:24 AM Aaron Marcuse-Kubitza

lib/sql.py: table_order_by(): Documented that it returns None if table is a view, because table_cluster_on() would return None. This is necessary for inputs/FIA/occurrence_all/ sorting to work correctly, because specifying a manual sort order would prevent the query planner from just using fast nested loop joins and instead cause it to perform a slow sort. (This appears to be a bug in the query planner, because when the column list specified matches the joined-on indexes, there should be no need for post-nested loop re-sorting.)

7242 01/16/2013 06:41 AM Aaron Marcuse-Kubitza

sql.py: Added view_exists()

7179 01/11/2013 06:01 AM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Added support for custom joins used in creating the new table. This can then be used by sql_io.put_table() to filter out duplicate rows in the out_table, so that they don't create duplicate key errors and the resulting index holes.

7113 01/07/2013 08:16 PM Aaron Marcuse-Kubitza

sql.py: create_table(): Support creating a table like a view

7112 01/07/2013 08:04 PM Aaron Marcuse-Kubitza

sql.py: Added InvalidTypeException and parse it in parse_exception()

6976 12/20/2012 10:05 AM Aaron Marcuse-Kubitza

sql.py: DbConn._db(): search_path: Don't append the existing search_path, because it usually includes the public schema, which is now different from the schema being imported into. This fixes a bug where sql.function_exists() would find public-schema functions in both the public schema and the import's schema because both were in the search_path, causing a DuplicateException "more than one function named ...". Note that the elements of the existing search_path are no longer needed now that vegbien_dest's $schemas includes $public. Also note that if an instance of DbConn does not specify the schemas param, the existing search_path will be left as-is rather than overwritten with an empty list.

6905 12/19/2012 12:29 AM Aaron Marcuse-Kubitza

sql.py: DbConn.col_info(): raising sql_gen.NoUnderlyingTableException: Fixed bug where also need to catch DoesNotExistException, which is thrown by ::regclass

6904 12/19/2012 12:26 AM Aaron Marcuse-Kubitza

sql.py: DbConn.col_info(): Fixed bug where need to run run_query() recoverably, because this query throws an exception if the column's table does not exist (the information_schema query just returned no rows)

6903 12/19/2012 12:22 AM Aaron Marcuse-Kubitza

sql.py: DbConn.col_info(): Fixed bug where need to use pg_get_expr() on pg_attrdef.adbin instead of shortcut field adsrc, because adsrc does not include schema qualifiers on table names (including strings passed to `nextval('..._seq'::regclass)`)

6902 12/18/2012 11:42 PM Aaron Marcuse-Kubitza

sql.py: DbConn.col_info(): Fixed bug where need to pass through cacheable param to run_query()

6901 12/18/2012 11:41 PM Aaron Marcuse-Kubitza

sql.py: DbConn.col_info(): Fixed bug where need to use .to_str(self) instead of self.esc_value() because self.esc_value() expects a value, not a sql_gen.Literal instance

6900 12/18/2012 11:34 PM Aaron Marcuse-Kubitza

sql.py: DbConn.col_info(): Fixed bug where self needs to be used everywhere that db normally is, because this is a DbConn method rather than a global function

6899 12/18/2012 11:31 PM Aaron Marcuse-Kubitza

sql.py: DbConn.col_info(): For PostgreSQL, use pg_catalog tables directly instead of their views in information_schema. This allows using ::regclass to look up the table in the search_path, and fixes a bug in imports with an explicit public schema where column types were looked up in public instead of public.<version>. Also don't wrap default using sql_gen.as_Code() when it's None (indicating no default value, aka default=NULL), because this value is interpreted specially by sql_gen.TypedCol.

6301 11/19/2012 05:34 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: If 1st param is hstore, only perform the cast on the value param. This fixes a bug in _map() calls whose value is a non-text type, such as SALVIAS.plotMetadata.AccessCode.

5990 11/05/2012 03:56 PM Aaron Marcuse-Kubitza

sql.py: Added table_has_pkey()

5989 11/05/2012 03:13 PM Aaron Marcuse-Kubitza

sql.py: table_pkey_col(): For PostgreSQL DBs, use pg_catalog via index_cols() and table_pkey_index(), in order to use the search_path to look up the table. This fixes a bug where the pkey would be selected from information_schema.table_constraints in random order, and this order sometimes returned the corresponding table in the public schema but sometimes in other schemas, such as VegBank. This became a problem now that VegBIEN has a place table, which conflicts with VegBank's place table. (Most other VegBank tables that are mapped to have been renamed in VegBIEN.)

5988 11/05/2012 02:41 PM Aaron Marcuse-Kubitza

sql.py: add_pkey_or_index(): Use new add_pkey_index()

5987 11/05/2012 02:41 PM Aaron Marcuse-Kubitza

sql.py: Added add_pkey_index()

5967 11/02/2012 03:35 PM Aaron Marcuse-Kubitza

sql.py: run_query_into(): add_pkey_: Use add_pkey_or_index() instead of add_pkey(). This should prevent errors for input rows which match multiple output rows in VegBIEN.

5903 11/01/2012 12:55 AM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.

5902 11/01/2012 12:36 AM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Changed comment about distinct_on column index to include just the input table, so that the function does not guarantee a unique index on the output table's distinct_on columns

5890 10/31/2012 09:40 PM Aaron Marcuse-Kubitza

sql.py: run_query(): Give failed EXPLAIN approximately the log_level of its query, so that queries which produce an error in the EXPLAIN before the query itself is even run will still be logged

5823 10/26/2012 08:18 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): "could not create unique index" DuplicateKeyException: Fixed bug where can't use make_DuplicateKeyException() because it tries to retrieve information about the index in question, but the index it was trying to create doesn't exist

5792 10/25/2012 05:16 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: If first param's type is anyelement (for polymorphic function, which had mismatched arg types), use type text, as all types can cast to it

5765 10/25/2012 07:44 AM Aaron Marcuse-Kubitza

sql.py: Added add_pkey_or_index()

5764 10/25/2012 07:32 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): Parse "could not create unique index ... Key is duplicated" errors as DuplicateKeyException

5763 10/25/2012 07:27 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): DuplicateKeyException: Factored out creation of DuplicateKeyException into helper function

5724 10/23/2012 06:38 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): types cannot be matched MissingCastException: Use the first type as the type to cast to instead of text

5723 10/23/2012 05:59 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): InvalidValueException: Fixed bug in regexp where can't use .*? before (?:...)? surrounding matched value, because it prevents the value from being matched now that it is optional

5717 10/23/2012 05:17 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): InvalidValueException: Also match exceptions which don't provide a specific value but just indicate that a value was invalid, such as PL/Python's "day is out of range for month"

5715 10/23/2012 04:34 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Return the actual type of the function's 1st param, using new function_param0_type(), rather than just text

5714 10/23/2012 04:31 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Fixed bug where can't return the function name as the name of what was missing the cast, because this must be a column

5713 10/23/2012 04:28 AM Aaron Marcuse-Kubitza

sql.py: Added function_param0_type()

5712 10/23/2012 04:26 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Only treat DoesNotExistException as a MissingCastException if the query that was run did not already include a cast, to avoid infinite exception-handling recursion

5711 10/23/2012 04:24 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Fixed bug where determining whether the exception is a MissingCastException rather than a DoesNotExistException needs to check whether the function exists rather than whether it's the same in the exception message as in the query that was run. The exception message will of course copy the function name verbatim from the query, so there is no information in the exception message itself to indicate whether the DoesNotExistException was caused by a missing cast or by a nonexistent function.

5710 10/23/2012 04:19 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Documented that the regexp match to extract the function name also checks that a function signature with param types was matched, indicating a function call rather than cast to regproc. This check will also help avoid infinite recursion when function MissingCastException parsing calls database structure introspection functions.

5709 10/23/2012 04:15 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Don't match quotes around the function name because this particular exception (incorrect param type) does not include them. Casts to regproc, which also produce a DoesNotExistException, include the quotes but do not indicate a MissingCastException.

5708 10/23/2012 04:12 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Fixed bug where the 1st param's type in the exception's function signature is not actually the type the argument needs to have, because this is just the argument's current type

5707 10/23/2012 04:04 AM Aaron Marcuse-Kubitza

sql.py: parse_exception(): typed_name_re: Also match identifiers without quotes, such as functions in "No function matches the given name and argument types" errors. This fixes a bug where DoesNotExistExceptions could not be parsed as MissingCastExceptions when applicable because the DoesNotExistException pattern would not even match.

5690 10/19/2012 06:20 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Support functions with named parameters

5689 10/19/2012 06:18 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): function MissingCastException: Support function names enclosed in quotes on the context line

5683 10/19/2012 05:50 PM Aaron Marcuse-Kubitza

sql.py: parse_exception(): DoesNotExistException: If item not found was a function and not found only because of a missing cast, raise MissingCastException instead. This should allow automatic casts to be added on function parameters as well as table columns.

5642 10/18/2012 01:01 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Don't add table0 to order_by with no table, because this could cause it not to match a corresponding DISTINCT ON column with no explicit table. PostgreSQL apparently does not treat a column with no explicit table and a column with the applicable table as identical for purposes of ORDER BY/DISTINCT ON checking, even when they refer to the same physical column.

5641 10/18/2012 12:53 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): order_by defaults to first distinct_on column when distinct_on provided

5579 10/17/2012 09:45 AM Aaron Marcuse-Kubitza

sql.py: DbConn: Added set_encoding()

5577 10/17/2012 09:28 AM Aaron Marcuse-Kubitza

sql.py: run_query(): Factored exception parsing out into new parse_exception()

5576 10/17/2012 09:22 AM Aaron Marcuse-Kubitza

sql.py: Added EncodingException and parse it in run_query()

5575 10/17/2012 09:14 AM Aaron Marcuse-Kubitza

sql.py: Removed no longer used NameException

5529 10/15/2012 03:23 PM Aaron Marcuse-Kubitza

sql.py: Added copy_table()

5528 10/15/2012 03:14 PM Aaron Marcuse-Kubitza

sql.mk_select() calls: Removed no longer needed order_by=None when limit=0

5527 10/15/2012 03:11 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Set order_by to None if limit == 0

5525 10/15/2012 02:52 PM Aaron Marcuse-Kubitza

sql.py: table_order_by(): Cache the order_by in table.order_by and propagate it when a LIKE table is created

5523 10/15/2012 02:36 PM Aaron Marcuse-Kubitza

sql.select() calls: Removed order_by=None everywhere that a stable row order is required (i.e. consistent between selects, or consistent between table transformations). This causes several tests to return different inserted row counts, because the input table is now being accessed in pkey order instead of in table order. This fixes a bug where tables with more rows than ~100 would return different results for repeated calls of the same non-ordered select.

5522 10/15/2012 02:27 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Use table_order_by() instead of table_pkey_col() to determine what column(s) to order by if order_by is set to order_by_pkey

5521 10/15/2012 02:26 PM Aaron Marcuse-Kubitza

sql.py: Added table_pkey_index(), index_order_by(), table_cluster_on(), table_order_by()

5520 10/15/2012 01:10 PM Aaron Marcuse-Kubitza

sql.py: Added index_exprs() and use it in index_cols()

5516 10/15/2012 12:10 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): order_by is order_by_pkey: Only order by the table's actual pkey, if it has one, rather than using the first column if it doesn't

5447 10/12/2012 01:11 AM Aaron Marcuse-Kubitza

sql.py: ConstraintException: message: Wrap condition in strings.as_tt()

5446 10/12/2012 12:30 AM Aaron Marcuse-Kubitza

sql.py: run_query(): DuplicateKeyException: Also retrieve the index's condition using new index_cond()

5445 10/12/2012 12:28 AM Aaron Marcuse-Kubitza

sql.py: Added index_cond()

5443 10/11/2012 11:40 PM Aaron Marcuse-Kubitza

sql.py: constraint_cond(): Fixed NotImplementedError message to apply to this function

5398 10/10/2012 07:10 AM Aaron Marcuse-Kubitza

sql.py: mk_update(): in_place: Convert columns of type character varying to text so that they can be merge-joined with text columns. Note that these two types are equivalent but not aliases of one another, so the explicit type change is needed.

5396 10/10/2012 06:52 AM Aaron Marcuse-Kubitza

sql.py: mk_update(): in_place: Factored retrieval of column type out into separate statement for clarity

5393 10/10/2012 05:51 AM Aaron Marcuse-Kubitza

sql.py: table_pkey_col(): Raise a DoesNotExistException if the table has no pkey

5392 10/10/2012 05:23 AM Aaron Marcuse-Kubitza

sql.py: pkey_col(): Call table_pkey_col() directly rather than via pkey_name(). pkey_name(): Call pkey_col() instead of table_pkey_col() now that pkey_col() calls table_pkey_col().

5391 10/10/2012 05:14 AM Aaron Marcuse-Kubitza

sql.py: pkey_col(): Documented that if there is no pkey, returns the first column in the table

5390 10/10/2012 05:13 AM Aaron Marcuse-Kubitza

sql.py: pkey_col(): Specify recover directly as a kw_arg because it's the only kw_arg passed to pkey_name()

5389 10/10/2012 05:10 AM Aaron Marcuse-Kubitza

sql.py: Added table_pkey_col() and use it in pkey_name()

5388 10/10/2012 05:01 AM Aaron Marcuse-Kubitza

sql.py: Renamed pkey() to pkey_name()

5387 10/10/2012 04:45 AM Aaron Marcuse-Kubitza

sql.py: Renamed pkey_col_() to pkey_col()

5386 10/10/2012 04:43 AM Aaron Marcuse-Kubitza

sql.py: Removed no longer used pkey_col

5383 10/10/2012 04:36 AM Aaron Marcuse-Kubitza

sql.py: Added table_cols()

5367 10/10/2012 12:55 AM Aaron Marcuse-Kubitza

Moved expression transforming functions from sql.py to sql_gen.py because they do not manipulate an actual database and merely generate SQL

5366 10/10/2012 12:38 AM Aaron Marcuse-Kubitza

sql.py: Added true_expr, false_expr and use them where their values are used

5365 10/10/2012 12:34 AM Aaron Marcuse-Kubitza

sql.py: simplify_expr(): Also simplify "AND true" expressions

5364 10/10/2012 12:30 AM Aaron Marcuse-Kubitza

sql.py: simplify_expr(): Also simplify "AND false" expressions

5363 10/10/2012 12:19 AM Aaron Marcuse-Kubitza

sql.py: Added atom_re and use it in simplify_parens()

5362 10/10/2012 12:19 AM Aaron Marcuse-Kubitza

sql.py: Added or_re and use it in simplify_expr()

5361 10/10/2012 12:18 AM Aaron Marcuse-Kubitza

sql.py: logic_op_re(): Added expr_re param for an expr on the other side of the operator

5360 10/09/2012 11:54 PM Aaron Marcuse-Kubitza

sql.py: simplify_parens(): Use bool_re

5359 10/09/2012 11:54 PM Aaron Marcuse-Kubitza

sql.py: Removed no longer needed paren_re()

5358 10/09/2012 11:53 PM Aaron Marcuse-Kubitza

sql.py: true_re, false_re: Removed no longer needed paren_re() because simplify_parens() now handles this

5357 10/09/2012 11:50 PM Aaron Marcuse-Kubitza

sql.py: simplify_expr(): Removed final simplify_parens() because this is now done by simplify_recursive()

5356 10/09/2012 11:49 PM Aaron Marcuse-Kubitza

sql.py: simplify_expr(): Use new simplify_recursive(). This also fixes a bug where some logic expressions are not simplified because of extra parens.