lib/sql.py: distinct_table(): don't cache the table creation, because this prevents different distinct_tables from being created for the same input table
lib/sql.py: create_table(), copy_table_struct(): support custom query options, such as cacheable
lib/sql.py: parse_exception(): parse "operator does not exist" errors as MissingCastExceptions (these appear when a staging table column is of the wrong type)
moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
lib/sql.py: distinct_table(): support custom filters on the distincting query
bugfix: lib/sql.py: parse_exception(): MissingCastException from DoesNotExistException for function: handle overloaded functions where none of the overloads supports the given arg types (so assume text). this may have become a bug from system upgrades?
lib/sql.py: added col_default_value(), col_is_constant(), which interface with corresponding util-schema functions
bugfix: lib/sql.py: parse_exception(): make_DuplicateKeyException(): handle nested exceptions (which should never be generated, but may be in case of sql.py bugs such as wiki.vegpath.org/To_Do#Fixes > #1) by printing the nested exception and then rethrowing the original exception, so that the original exception does not get lost and still ends up at the end of the program's output, to enable debugging
bugfix: lib/sql.py: parse_exception(): typed_name_re: added back matching of names without "", since these are used by some error messages (ones that contain () after the function name)
bugfix: lib/sql.py: parse_exception(): typed_name_re: need to allow " within the matched name, since there are now "" around the entire identifer that was passed to Postgres, which may itself include " . always require "" around the matched name, to ensure that the whole name is matched by .+? e.g. when followed by () for a function call. the version of Postgres we currently use apparently no longer has error messages without the "", so we don't need a separate regexp for quoted and unquoted names.
bugfix: lib/sql.py: parse_exception(): typed_name_re: need to ensure that full name is matched rather than just first character
bugfix: lib/sql.py: parse_exception(): typed_name_re (used by MissingCastException, etc.): don't require "" around the value, because function names in "function does not exist" exceptions are (now?) output without them. this exception did not occur in the last import, but does occur in the automated testing and when running the associated query in pgAdmin (in both Linux and Mac OS X).
lib/sql.py: parse_exception(): typed_name_re: Fixed bug where need to require the "" around the table/column name, because otherwise, the regexp will try to match as few characters as possible, causing it to match only the first letter of the name
lib/sql.py: mk_select(): using subset function: Turn off enable_sort (within the transaction) to avoid unwanted slow sorts. This change (along with the subset functions themselves) should significantly reduce the long FIA.occurrence_all table subset time (~8 hours altogether) and with it the total import time, which had more than doubled as a result of the FIA refresh. Note that this issue would have been even more pronounced for larger datasets, such as the GBIF refresh, which would have taken ~2.5 days longer (400 million rows * ~30% are plants * (FIA: ~8 hours/16.7 million rows) * 1 day/24 hours).
lib/sql.py: mk_select(): Use subset function when it's available for fast querying at large OFFSET values
lib/sql.py: Added has_subset_func()
lib/sql.py: table_order_by(): Documented that it returns None if table is a view, because table_cluster_on() would return None. This is necessary for inputs/FIA/occurrence_all/ sorting to work correctly, because specifying a manual sort order would prevent the query planner from just using fast nested loop joins and instead cause it to perform a slow sort. (This appears to be a bug in the query planner, because when the column list specified matches the joined-on indexes, there should be no need for post-nested loop re-sorting.)
sql.py: Added view_exists()
sql.py: distinct_table(): Added support for custom joins used in creating the new table. This can then be used by sql_io.put_table() to filter out duplicate rows in the out_table, so that they don't create duplicate key errors and the resulting index holes.
sql.py: create_table(): Support creating a table like a view
sql.py: Added InvalidTypeException and parse it in parse_exception()
sql.py: DbConn._db(): search_path: Don't append the existing search_path, because it usually includes the public schema, which is now different from the schema being imported into. This fixes a bug where sql.function_exists() would find public-schema functions in both the public schema and the import's schema because both were in the search_path, causing a DuplicateException "more than one function named ...". Note that the elements of the existing search_path are no longer needed now that vegbien_dest's $schemas includes $public. Also note that if an instance of DbConn does not specify the schemas param, the existing search_path will be left as-is rather than overwritten with an empty list.
sql.py: DbConn.col_info(): raising sql_gen.NoUnderlyingTableException: Fixed bug where also need to catch DoesNotExistException, which is thrown by ::regclass
sql.py: DbConn.col_info(): Fixed bug where need to run run_query() recoverably, because this query throws an exception if the column's table does not exist (the information_schema query just returned no rows)
sql.py: DbConn.col_info(): Fixed bug where need to use pg_get_expr() on pg_attrdef.adbin instead of shortcut field adsrc, because adsrc does not include schema qualifiers on table names (including strings passed to `nextval('..._seq'::regclass)`)
sql.py: DbConn.col_info(): Fixed bug where need to pass through cacheable param to run_query()
sql.py: DbConn.col_info(): Fixed bug where need to use .to_str(self) instead of self.esc_value() because self.esc_value() expects a value, not a sql_gen.Literal instance
sql.py: DbConn.col_info(): Fixed bug where self needs to be used everywhere that db normally is, because this is a DbConn method rather than a global function
sql.py: DbConn.col_info(): For PostgreSQL, use pg_catalog tables directly instead of their views in information_schema. This allows using ::regclass to look up the table in the search_path, and fixes a bug in imports with an explicit public schema where column types were looked up in public instead of public.<version>. Also don't wrap default using sql_gen.as_Code() when it's None (indicating no default value, aka default=NULL), because this value is interpreted specially by sql_gen.TypedCol.
sql.py: parse_exception(): function MissingCastException: If 1st param is hstore, only perform the cast on the value param. This fixes a bug in _map() calls whose value is a non-text type, such as SALVIAS.plotMetadata.AccessCode.
sql.py: Added table_has_pkey()
sql.py: table_pkey_col(): For PostgreSQL DBs, use pg_catalog via index_cols() and table_pkey_index(), in order to use the search_path to look up the table. This fixes a bug where the pkey would be selected from information_schema.table_constraints in random order, and this order sometimes returned the corresponding table in the public schema but sometimes in other schemas, such as VegBank. This became a problem now that VegBIEN has a place table, which conflicts with VegBank's place table. (Most other VegBank tables that are mapped to have been renamed in VegBIEN.)
sql.py: add_pkey_or_index(): Use new add_pkey_index()
sql.py: Added add_pkey_index()
sql.py: run_query_into(): add_pkey_: Use add_pkey_or_index() instead of add_pkey(). This should prevent errors for input rows which match multiple output rows in VegBIEN.
sql.py: distinct_table(): Use DISTINCT ON instead of a unique index and insert_select()'s ignore mode to remove duplicate rows. This uses whichever sorting method PostgreSQL deems to be fastest instead of requiring the use of a B-tree index. Since most of the slower operations in TNRS's import are distinct_table() calls, this should speed up the TNRS import, which is a bottleneck for the DB import as a whole because the TNRS import must complete before other datasources can be imported.
sql.py: distinct_table(): Changed comment about distinct_on column index to include just the input table, so that the function does not guarantee a unique index on the output table's distinct_on columns
sql.py: run_query(): Give failed EXPLAIN approximately the log_level of its query, so that queries which produce an error in the EXPLAIN before the query itself is even run will still be logged
sql.py: parse_exception(): "could not create unique index" DuplicateKeyException: Fixed bug where can't use make_DuplicateKeyException() because it tries to retrieve information about the index in question, but the index it was trying to create doesn't exist
sql.py: parse_exception(): function MissingCastException: If first param's type is anyelement (for polymorphic function, which had mismatched arg types), use type text, as all types can cast to it
sql.py: Added add_pkey_or_index()
sql.py: parse_exception(): Parse "could not create unique index ... Key is duplicated" errors as DuplicateKeyException
sql.py: parse_exception(): DuplicateKeyException: Factored out creation of DuplicateKeyException into helper function
sql.py: parse_exception(): types cannot be matched MissingCastException: Use the first type as the type to cast to instead of text
sql.py: parse_exception(): InvalidValueException: Fixed bug in regexp where can't use .*? before (?:...)? surrounding matched value, because it prevents the value from being matched now that it is optional
sql.py: parse_exception(): InvalidValueException: Also match exceptions which don't provide a specific value but just indicate that a value was invalid, such as PL/Python's "day is out of range for month"
sql.py: parse_exception(): function MissingCastException: Return the actual type of the function's 1st param, using new function_param0_type(), rather than just text
sql.py: parse_exception(): function MissingCastException: Fixed bug where can't return the function name as the name of what was missing the cast, because this must be a column
sql.py: Added function_param0_type()
sql.py: parse_exception(): function MissingCastException: Only treat DoesNotExistException as a MissingCastException if the query that was run did not already include a cast, to avoid infinite exception-handling recursion
sql.py: parse_exception(): function MissingCastException: Fixed bug where determining whether the exception is a MissingCastException rather than a DoesNotExistException needs to check whether the function exists rather than whether it's the same in the exception message as in the query that was run. The exception message will of course copy the function name verbatim from the query, so there is no information in the exception message itself to indicate whether the DoesNotExistException was caused by a missing cast or by a nonexistent function.
sql.py: parse_exception(): function MissingCastException: Documented that the regexp match to extract the function name also checks that a function signature with param types was matched, indicating a function call rather than cast to regproc. This check will also help avoid infinite recursion when function MissingCastException parsing calls database structure introspection functions.
sql.py: parse_exception(): function MissingCastException: Don't match quotes around the function name because this particular exception (incorrect param type) does not include them. Casts to regproc, which also produce a DoesNotExistException, include the quotes but do not indicate a MissingCastException.
sql.py: parse_exception(): function MissingCastException: Fixed bug where the 1st param's type in the exception's function signature is not actually the type the argument needs to have, because this is just the argument's current type
sql.py: parse_exception(): typed_name_re: Also match identifiers without quotes, such as functions in "No function matches the given name and argument types" errors. This fixes a bug where DoesNotExistExceptions could not be parsed as MissingCastExceptions when applicable because the DoesNotExistException pattern would not even match.
sql.py: parse_exception(): function MissingCastException: Support functions with named parameters
sql.py: parse_exception(): function MissingCastException: Support function names enclosed in quotes on the context line
sql.py: parse_exception(): DoesNotExistException: If item not found was a function and not found only because of a missing cast, raise MissingCastException instead. This should allow automatic casts to be added on function parameters as well as table columns.
sql.py: mk_select(): Don't add table0 to order_by with no table, because this could cause it not to match a corresponding DISTINCT ON column with no explicit table. PostgreSQL apparently does not treat a column with no explicit table and a column with the applicable table as identical for purposes of ORDER BY/DISTINCT ON checking, even when they refer to the same physical column.
sql.py: mk_select(): order_by defaults to first distinct_on column when distinct_on provided
sql.py: DbConn: Added set_encoding()
sql.py: run_query(): Factored exception parsing out into new parse_exception()
sql.py: Added EncodingException and parse it in run_query()
sql.py: Removed no longer used NameException
sql.py: Added copy_table()
sql.mk_select() calls: Removed no longer needed order_by=None when limit=0
sql.py: mk_select(): Set order_by to None if limit == 0
sql.py: table_order_by(): Cache the order_by in table.order_by and propagate it when a LIKE table is created
sql.select() calls: Removed order_by=None everywhere that a stable row order is required (i.e. consistent between selects, or consistent between table transformations). This causes several tests to return different inserted row counts, because the input table is now being accessed in pkey order instead of in table order. This fixes a bug where tables with more rows than ~100 would return different results for repeated calls of the same non-ordered select.
sql.py: mk_select(): Use table_order_by() instead of table_pkey_col() to determine what column(s) to order by if order_by is set to order_by_pkey
sql.py: Added table_pkey_index(), index_order_by(), table_cluster_on(), table_order_by()
sql.py: Added index_exprs() and use it in index_cols()
sql.py: mk_select(): order_by is order_by_pkey: Only order by the table's actual pkey, if it has one, rather than using the first column if it doesn't
sql.py: ConstraintException: message: Wrap condition in strings.as_tt()
sql.py: run_query(): DuplicateKeyException: Also retrieve the index's condition using new index_cond()
sql.py: Added index_cond()
sql.py: constraint_cond(): Fixed NotImplementedError message to apply to this function
sql.py: mk_update(): in_place: Convert columns of type character varying to text so that they can be merge-joined with text columns. Note that these two types are equivalent but not aliases of one another, so the explicit type change is needed.
sql.py: mk_update(): in_place: Factored retrieval of column type out into separate statement for clarity
sql.py: table_pkey_col(): Raise a DoesNotExistException if the table has no pkey
sql.py: pkey_col(): Call table_pkey_col() directly rather than via pkey_name(). pkey_name(): Call pkey_col() instead of table_pkey_col() now that pkey_col() calls table_pkey_col().
sql.py: pkey_col(): Documented that if there is no pkey, returns the first column in the table
sql.py: pkey_col(): Specify recover directly as a kw_arg because it's the only kw_arg passed to pkey_name()
sql.py: Added table_pkey_col() and use it in pkey_name()
sql.py: Renamed pkey() to pkey_name()
sql.py: Renamed pkey_col_() to pkey_col()
sql.py: Removed no longer used pkey_col
sql.py: Added table_cols()
Moved expression transforming functions from sql.py to sql_gen.py because they do not manipulate an actual database and merely generate SQL
sql.py: Added true_expr, false_expr and use them where their values are used
sql.py: simplify_expr(): Also simplify "AND true" expressions
sql.py: simplify_expr(): Also simplify "AND false" expressions
sql.py: Added atom_re and use it in simplify_parens()
sql.py: Added or_re and use it in simplify_expr()
sql.py: logic_op_re(): Added expr_re param for an expr on the other side of the operator
sql.py: simplify_parens(): Use bool_re
sql.py: Removed no longer needed paren_re()
sql.py: true_re, false_re: Removed no longer needed paren_re() because simplify_parens() now handles this
sql.py: simplify_expr(): Removed final simplify_parens() because this is now done by simplify_recursive()
sql.py: simplify_expr(): Use new simplify_recursive(). This also fixes a bug where some logic expressions are not simplified because of extra parens.