/trunk/lib/sql.py - Changes - BIEN 3 - NCEAS Projects

root/trunk/lib/sql.py @ 12642

#	Date	Author	Comment
12152	02/13/2014 03:37 AM	Aaron Marcuse-Kubitza	lib/sql.py: distinct_table(): don't cache the table creation, because this prevents different distinct_tables from being created for the same input table
12151	02/13/2014 03:36 AM	Aaron Marcuse-Kubitza	lib/sql.py: create_table(), copy_table_struct(): support custom query options, such as cacheable
12149	02/13/2014 12:01 AM	Aaron Marcuse-Kubitza	lib/sql.py: parse_exception(): parse "operator does not exist" errors as MissingCastExceptions (these appear when a staging table column is of the wrong type)
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
10841	08/31/2013 01:02 AM	Aaron Marcuse-Kubitza	lib/sql.py: distinct_table(): support custom filters on the distincting query
10548	07/31/2013 11:58 PM	Aaron Marcuse-Kubitza	bugfix: lib/sql.py: parse_exception(): MissingCastException from DoesNotExistException for function: handle overloaded functions where none of the overloads supports the given arg types (so assume text). this may have become a bug from system upgrades?
10139	07/02/2013 02:29 PM	Aaron Marcuse-Kubitza	lib/sql.py: added col_default_value(), col_is_constant(), which interface with corresponding util-schema functions
9757	06/06/2013 12:59 PM	Aaron Marcuse-Kubitza	bugfix: lib/sql.py: parse_exception(): make_DuplicateKeyException(): handle nested exceptions (which should never be generated, but may be in case of sql.py bugs such as wiki.vegpath.org/To_Do#Fixes > #1) by printing the nested exception and then rethrowing the original exception, so that the original exception does not get lost and still ends up at the end of the program's output, to enable debugging
9458	05/17/2013 05:55 PM	Aaron Marcuse-Kubitza	bugfix: lib/sql.py: parse_exception(): typed_name_re: added back matching of names without "", since these are used by some error messages (ones that contain () after the function name)
9457	05/17/2013 05:41 PM	Aaron Marcuse-Kubitza	bugfix: lib/sql.py: parse_exception(): typed_name_re: need to allow " within the matched name, since there are now "" around the entire identifer that was passed to Postgres, which may itself include " . always require "" around the matched name, to ensure that the whole name is matched by .+? e.g. when followed by () for a function call. the version of Postgres we currently use apparently no longer has error messages without the "", so we don't need a separate regexp for quoted and unquoted names.
9390	05/15/2013 11:21 PM	Aaron Marcuse-Kubitza	bugfix: lib/sql.py: parse_exception(): typed_name_re: need to ensure that full name is matched rather than just first character
8819	05/05/2013 10:00 AM	Aaron Marcuse-Kubitza	bugfix: lib/sql.py: parse_exception(): typed_name_re (used by MissingCastException, etc.): don't require "" around the value, because function names in "function does not exist" exceptions are (now?) output without them. this exception did not occur in the last import, but does occur in the automated testing and when running the associated query in pgAdmin (in both Linux and Mac OS X).
8379	04/09/2013 04:37 PM	Aaron Marcuse-Kubitza	lib/sql.py: parse_exception(): typed_name_re: Fixed bug where need to require the "" around the table/column name, because otherwise, the regexp will try to match as few characters as possible, causing it to match only the first letter of the name
8335	04/04/2013 09:52 AM	Aaron Marcuse-Kubitza	lib/sql.py: mk_select(): using subset function: Turn off enable_sort (within the transaction) to avoid unwanted slow sorts. This change (along with the subset functions themselves) should significantly reduce the long FIA.occurrence_all table subset time (~8 hours altogether) and with it the total import time, which had more than doubled as a result of the FIA refresh. Note that this issue would have been even more pronounced for larger datasets, such as the GBIF refresh, which would have taken ~2.5 days longer (400 million rows * ~30% are plants * (FIA: ~8 hours/16.7 million rows) * 1 day/24 hours).
8334	04/04/2013 09:30 AM	Aaron Marcuse-Kubitza	lib/sql.py: mk_select(): Use subset function when it's available for fast querying at large OFFSET values
8333	04/04/2013 09:29 AM	Aaron Marcuse-Kubitza	lib/sql.py: Added has_subset_func()
8223	03/28/2013 12:24 AM	Aaron Marcuse-Kubitza	lib/sql.py: table_order_by(): Documented that it returns None if table is a view, because table_cluster_on() would return None. This is necessary for inputs/FIA/occurrence_all/ sorting to work correctly, because specifying a manual sort order would prevent the query planner from just using fast nested loop joins and instead cause it to perform a slow sort. (This appears to be a bug in the query planner, because when the column list specified matches the joined-on indexes, there should be no need for post-nested loop re-sorting.)
7242	01/16/2013 06:41 AM	Aaron Marcuse-Kubitza	sql.py: Added view_exists()
7179	01/11/2013 06:01 AM	Aaron Marcuse-Kubitza	sql.py: distinct_table(): Added support for custom joins used in creating the new table. This can then be used by sql_io.put_table() to filter out duplicate rows in the out_table, so that they don't create duplicate key errors and the resulting index holes.
7113	01/07/2013 08:16 PM	Aaron Marcuse-Kubitza	sql.py: create_table(): Support creating a table like a view
7112	01/07/2013 08:04 PM	Aaron Marcuse-Kubitza	sql.py: Added InvalidTypeException and parse it in parse_exception()
6976	12/20/2012 10:05 AM	Aaron Marcuse-Kubitza	sql.py: DbConn._db(): search_path: Don't append the existing search_path, because it usually includes the public schema, which is now different from the schema being imported into. This fixes a bug where sql.function_exists() would find public-schema functions in both the public schema and the import's schema because both were in the search_path, causing a DuplicateException "more than one function named ...". Note that the elements of the existing search_path are no longer needed now that vegbien_dest's $schemas includes $public. Also note that if an instance of DbConn does not specify the schemas param, the existing search_path will be left as-is rather than overwritten with an empty list.
6905	12/19/2012 12:29 AM	Aaron Marcuse-Kubitza	sql.py: DbConn.col_info(): raising sql_gen.NoUnderlyingTableException: Fixed bug where also need to catch DoesNotExistException, which is thrown by ::regclass
6904	12/19/2012 12:26 AM	Aaron Marcuse-Kubitza	sql.py: DbConn.col_info(): Fixed bug where need to run run_query() recoverably, because this query throws an exception if the column's table does not exist (the information_schema query just returned no rows)
6903	12/19/2012 12:22 AM	Aaron Marcuse-Kubitza	sql.py: DbConn.col_info(): Fixed bug where need to use pg_get_expr() on pg_attrdef.adbin instead of shortcut field adsrc, because adsrc does not include schema qualifiers on table names (including strings passed to `nextval('..._seq'::regclass)`)
6902	12/18/2012 11:42 PM	Aaron Marcuse-Kubitza	sql.py: DbConn.col_info(): Fixed bug where need to pass through cacheable param to run_query()
6901	12/18/2012 11:41 PM	Aaron Marcuse-Kubitza	sql.py: DbConn.col_info(): Fixed bug where need to use .to_str(self) instead of self.esc_value() because self.esc_value() expects a value, not a sql_gen.Literal instance
6900	12/18/2012 11:34 PM	Aaron Marcuse-Kubitza	sql.py: DbConn.col_info(): Fixed bug where self needs to be used everywhere that db normally is, because this is a DbConn method rather than a global function
6899	12/18/2012 11:31 PM	Aaron Marcuse-Kubitza	sql.py: DbConn.col_info(): For PostgreSQL, use pg_catalog tables directly instead of their views in information_schema. This allows using ::regclass to look up the table in the search_path, and fixes a bug in imports with an explicit public schema where column types were looked up in public instead of public.<version>. Also don't wrap default using sql_gen.as_Code() when it's None (indicating no default value, aka default=NULL), because this value is interpreted specially by sql_gen.TypedCol.
6301	11/19/2012 05:34 PM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: If 1st param is hstore, only perform the cast on the value param. This fixes a bug in _map() calls whose value is a non-text type, such as SALVIAS.plotMetadata.AccessCode.
5990	11/05/2012 03:56 PM	Aaron Marcuse-Kubitza	sql.py: Added table_has_pkey()
5989	11/05/2012 03:13 PM	Aaron Marcuse-Kubitza	sql.py: table_pkey_col(): For PostgreSQL DBs, use pg_catalog via index_cols() and table_pkey_index(), in order to use the search_path to look up the table. This fixes a bug where the pkey would be selected from information_schema.table_constraints in random order, and this order sometimes returned the corresponding table in the public schema but sometimes in other schemas, such as VegBank. This became a problem now that VegBIEN has a place table, which conflicts with VegBank's place table. (Most other VegBank tables that are mapped to have been renamed in VegBIEN.)
5988	11/05/2012 02:41 PM	Aaron Marcuse-Kubitza	sql.py: add_pkey_or_index(): Use new add_pkey_index()
5987	11/05/2012 02:41 PM	Aaron Marcuse-Kubitza	sql.py: Added add_pkey_index()
5967	11/02/2012 03:35 PM	Aaron Marcuse-Kubitza	sql.py: run_query_into(): add_pkey_: Use add_pkey_or_index() instead of add_pkey(). This should prevent errors for input rows which match multiple output rows in VegBIEN.
5903	11/01/2012 12:55 AM	Aaron Marcuse-Kubitza	sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.
5902	11/01/2012 12:36 AM	Aaron Marcuse-Kubitza	sql.py: distinct_table(): Changed comment about distinct_on column index to include just the input table, so that the function does not guarantee a unique index on the output table's distinct_on columns
5890	10/31/2012 09:40 PM	Aaron Marcuse-Kubitza	sql.py: run_query(): Give failed EXPLAIN approximately the log_level of its query, so that queries which produce an error in the EXPLAIN before the query itself is even run will still be logged
5823	10/26/2012 08:18 PM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): "could not create unique index" DuplicateKeyException: Fixed bug where can't use make_DuplicateKeyException() because it tries to retrieve information about the index in question, but the index it was trying to create doesn't exist
5792	10/25/2012 05:16 PM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: If first param's type is anyelement (for polymorphic function, which had mismatched arg types), use type text, as all types can cast to it
5765	10/25/2012 07:44 AM	Aaron Marcuse-Kubitza	sql.py: Added add_pkey_or_index()
5764	10/25/2012 07:32 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): Parse "could not create unique index ... Key is duplicated" errors as DuplicateKeyException
5763	10/25/2012 07:27 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): DuplicateKeyException: Factored out creation of DuplicateKeyException into helper function
5724	10/23/2012 06:38 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): types cannot be matched MissingCastException: Use the first type as the type to cast to instead of text
5723	10/23/2012 05:59 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): InvalidValueException: Fixed bug in regexp where can't use .*? before (?:...)? surrounding matched value, because it prevents the value from being matched now that it is optional
5717	10/23/2012 05:17 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): InvalidValueException: Also match exceptions which don't provide a specific value but just indicate that a value was invalid, such as PL/Python's "day is out of range for month"
5715	10/23/2012 04:34 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: Return the actual type of the function's 1st param, using new function_param0_type(), rather than just text
5714	10/23/2012 04:31 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: Fixed bug where can't return the function name as the name of what was missing the cast, because this must be a column
5713	10/23/2012 04:28 AM	Aaron Marcuse-Kubitza	sql.py: Added function_param0_type()
5712	10/23/2012 04:26 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: Only treat DoesNotExistException as a MissingCastException if the query that was run did not already include a cast, to avoid infinite exception-handling recursion
5711	10/23/2012 04:24 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: Fixed bug where determining whether the exception is a MissingCastException rather than a DoesNotExistException needs to check whether the function exists rather than whether it's the same in the exception message as in the query that was run. The exception message will of course copy the function name verbatim from the query, so there is no information in the exception message itself to indicate whether the DoesNotExistException was caused by a missing cast or by a nonexistent function.
5710	10/23/2012 04:19 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: Documented that the regexp match to extract the function name also checks that a function signature with param types was matched, indicating a function call rather than cast to regproc. This check will also help avoid infinite recursion when function MissingCastException parsing calls database structure introspection functions.
5709	10/23/2012 04:15 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: Don't match quotes around the function name because this particular exception (incorrect param type) does not include them. Casts to regproc, which also produce a DoesNotExistException, include the quotes but do not indicate a MissingCastException.
5708	10/23/2012 04:12 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: Fixed bug where the 1st param's type in the exception's function signature is not actually the type the argument needs to have, because this is just the argument's current type
5707	10/23/2012 04:04 AM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): typed_name_re: Also match identifiers without quotes, such as functions in "No function matches the given name and argument types" errors. This fixes a bug where DoesNotExistExceptions could not be parsed as MissingCastExceptions when applicable because the DoesNotExistException pattern would not even match.
5690	10/19/2012 06:20 PM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: Support functions with named parameters
5689	10/19/2012 06:18 PM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): function MissingCastException: Support function names enclosed in quotes on the context line
5683	10/19/2012 05:50 PM	Aaron Marcuse-Kubitza	sql.py: parse_exception(): DoesNotExistException: If item not found was a function and not found only because of a missing cast, raise MissingCastException instead. This should allow automatic casts to be added on function parameters as well as table columns.
5642	10/18/2012 01:01 PM	Aaron Marcuse-Kubitza	sql.py: mk_select(): Don't add table0 to order_by with no table, because this could cause it not to match a corresponding DISTINCT ON column with no explicit table. PostgreSQL apparently does not treat a column with no explicit table and a column with the applicable table as identical for purposes of ORDER BY/DISTINCT ON checking, even when they refer to the same physical column.
5641	10/18/2012 12:53 PM	Aaron Marcuse-Kubitza	sql.py: mk_select(): order_by defaults to first distinct_on column when distinct_on provided
5579	10/17/2012 09:45 AM	Aaron Marcuse-Kubitza	sql.py: DbConn: Added set_encoding()
5577	10/17/2012 09:28 AM	Aaron Marcuse-Kubitza	sql.py: run_query(): Factored exception parsing out into new parse_exception()
5576	10/17/2012 09:22 AM	Aaron Marcuse-Kubitza	sql.py: Added EncodingException and parse it in run_query()
5575	10/17/2012 09:14 AM	Aaron Marcuse-Kubitza	sql.py: Removed no longer used NameException
5529	10/15/2012 03:23 PM	Aaron Marcuse-Kubitza	sql.py: Added copy_table()
5528	10/15/2012 03:14 PM	Aaron Marcuse-Kubitza	sql.mk_select() calls: Removed no longer needed order_by=None when limit=0
5527	10/15/2012 03:11 PM	Aaron Marcuse-Kubitza	sql.py: mk_select(): Set order_by to None if limit == 0
5525	10/15/2012 02:52 PM	Aaron Marcuse-Kubitza	sql.py: table_order_by(): Cache the order_by in table.order_by and propagate it when a LIKE table is created
5523	10/15/2012 02:36 PM	Aaron Marcuse-Kubitza	sql.select() calls: Removed order_by=None everywhere that a stable row order is required (i.e. consistent between selects, or consistent between table transformations). This causes several tests to return different inserted row counts, because the input table is now being accessed in pkey order instead of in table order. This fixes a bug where tables with more rows than ~100 would return different results for repeated calls of the same non-ordered select.
5522	10/15/2012 02:27 PM	Aaron Marcuse-Kubitza	sql.py: mk_select(): Use table_order_by() instead of table_pkey_col() to determine what column(s) to order by if order_by is set to order_by_pkey
5521	10/15/2012 02:26 PM	Aaron Marcuse-Kubitza	sql.py: Added table_pkey_index(), index_order_by(), table_cluster_on(), table_order_by()
5520	10/15/2012 01:10 PM	Aaron Marcuse-Kubitza	sql.py: Added index_exprs() and use it in index_cols()
5516	10/15/2012 12:10 PM	Aaron Marcuse-Kubitza	sql.py: mk_select(): order_by is order_by_pkey: Only order by the table's actual pkey, if it has one, rather than using the first column if it doesn't
5447	10/12/2012 01:11 AM	Aaron Marcuse-Kubitza	sql.py: ConstraintException: message: Wrap condition in strings.as_tt()
5446	10/12/2012 12:30 AM	Aaron Marcuse-Kubitza	sql.py: run_query(): DuplicateKeyException: Also retrieve the index's condition using new index_cond()
5445	10/12/2012 12:28 AM	Aaron Marcuse-Kubitza	sql.py: Added index_cond()
5443	10/11/2012 11:40 PM	Aaron Marcuse-Kubitza	sql.py: constraint_cond(): Fixed NotImplementedError message to apply to this function
5398	10/10/2012 07:10 AM	Aaron Marcuse-Kubitza	sql.py: mk_update(): in_place: Convert columns of type character varying to text so that they can be merge-joined with text columns. Note that these two types are equivalent but not aliases of one another, so the explicit type change is needed.
5396	10/10/2012 06:52 AM	Aaron Marcuse-Kubitza	sql.py: mk_update(): in_place: Factored retrieval of column type out into separate statement for clarity
5393	10/10/2012 05:51 AM	Aaron Marcuse-Kubitza	sql.py: table_pkey_col(): Raise a DoesNotExistException if the table has no pkey
5392	10/10/2012 05:23 AM	Aaron Marcuse-Kubitza	sql.py: pkey_col(): Call table_pkey_col() directly rather than via pkey_name(). pkey_name(): Call pkey_col() instead of table_pkey_col() now that pkey_col() calls table_pkey_col().
5391	10/10/2012 05:14 AM	Aaron Marcuse-Kubitza	sql.py: pkey_col(): Documented that if there is no pkey, returns the first column in the table
5390	10/10/2012 05:13 AM	Aaron Marcuse-Kubitza	sql.py: pkey_col(): Specify recover directly as a kw_arg because it's the only kw_arg passed to pkey_name()
5389	10/10/2012 05:10 AM	Aaron Marcuse-Kubitza	sql.py: Added table_pkey_col() and use it in pkey_name()
5388	10/10/2012 05:01 AM	Aaron Marcuse-Kubitza	sql.py: Renamed pkey() to pkey_name()
5387	10/10/2012 04:45 AM	Aaron Marcuse-Kubitza	sql.py: Renamed pkey_col_() to pkey_col()
5386	10/10/2012 04:43 AM	Aaron Marcuse-Kubitza	sql.py: Removed no longer used pkey_col
5383	10/10/2012 04:36 AM	Aaron Marcuse-Kubitza	sql.py: Added table_cols()
5367	10/10/2012 12:55 AM	Aaron Marcuse-Kubitza	Moved expression transforming functions from sql.py to sql_gen.py because they do not manipulate an actual database and merely generate SQL
5366	10/10/2012 12:38 AM	Aaron Marcuse-Kubitza	sql.py: Added true_expr, false_expr and use them where their values are used
5365	10/10/2012 12:34 AM	Aaron Marcuse-Kubitza	sql.py: simplify_expr(): Also simplify "AND true" expressions
5364	10/10/2012 12:30 AM	Aaron Marcuse-Kubitza	sql.py: simplify_expr(): Also simplify "AND false" expressions
5363	10/10/2012 12:19 AM	Aaron Marcuse-Kubitza	sql.py: Added atom_re and use it in simplify_parens()
5362	10/10/2012 12:19 AM	Aaron Marcuse-Kubitza	sql.py: Added or_re and use it in simplify_expr()
5361	10/10/2012 12:18 AM	Aaron Marcuse-Kubitza	sql.py: logic_op_re(): Added expr_re param for an expr on the other side of the operator
5360	10/09/2012 11:54 PM	Aaron Marcuse-Kubitza	sql.py: simplify_parens(): Use bool_re
5359	10/09/2012 11:54 PM	Aaron Marcuse-Kubitza	sql.py: Removed no longer needed paren_re()
5358	10/09/2012 11:53 PM	Aaron Marcuse-Kubitza	sql.py: true_re, false_re: Removed no longer needed paren_re() because simplify_parens() now handles this
5357	10/09/2012 11:50 PM	Aaron Marcuse-Kubitza	sql.py: simplify_expr(): Removed final simplify_parens() because this is now done by simplify_recursive()
5356	10/09/2012 11:49 PM	Aaron Marcuse-Kubitza	sql.py: simplify_expr(): Use new simplify_recursive(). This also fixes a bug where some logic expressions are not simplified because of extra parens.

Project

General

Profile