xml_func.py: process(): Call list() on the items' iterator so that its length can be determined wherever it's used without needing to call list() on it separately
sql.py: mk_select(): Put '*' on same line as 'SELECT' if there is no DISTINCT ON, to avoid extra newlines
sql_gen.py: NamedTable.to_str(): Put AS clause on same line as code being named if it didn't contain a newline, to avoid extra newlines
sql.py: run_query(): Always parse exceptions, whether recover is set or not, to avoid making the caller set recover just to parse exceptions. If recover is not set, just don't run any queries when generating the parsed exception and return an empty value for the needed information. (A partial parsed exception is better than an unparsed one.)
sql_gen.py: is_temp_col(): Use is_table_col() so that the type of the parameter is also checked
inputs/import.stats.xls: Bolded the ms/row totals. Added note that non-bolded totals are not directly comparable because different methods were run with different numbers of rows.
sql.py: put_table(): Inserting new rows: Removed no longer needed filter-out LEFT JOIN, since the INSERT IGNORE replacement handles duplicate keys
sql.py: put_table(): Inserting new rows: Use insert_select()'s (mk_insert_select()'s) new ignore option to ignore duplicate keys
sql_gen.py: to_name_only_col(): Consider any non-table column, including NamedCol, to be already name-only. This fixes a bug in sql.mk_insert_select() where the value of a NamedCol was removed by to_name_only_col() even though it was needed.
sql.py: mk_insert_select(): INSERT IGNORE: Use an EXCEPTION block for each individual row because "When an error is caught by an EXCEPTION clause, [...] all changes to persistent database state within the block are rolled back." (http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING). Documented that cursor stays at current position in spite of automatic ROLLBACK on EXCEPTION.
sql.py: mk_insert_select(): INSERT IGNORE: Use RETURN QUERY on the insert query instead of RETURN NEXT on a local var to forward the RETURNING value
sql.py: mk_insert_select(): INSERT IGNORE: Support RETURNING clause. Always return something to set the correct rowcount.
sql_gen.py: Use an as_*() function instead of manually checking the type wherever possible
sql.py: mk_insert_select(): INSERT IGNORE: Added duplicate key handling by using EXCEPTION block to catch unique_violation
sql.py: mk_insert_select(): INSERT IGNORE: Fixed bug where user-defined types were not supported correctly, by referencing the column's type directly in PL/pgSQL instead of retrieving it from the information_schema. Use a row variable of the output table's row type to store the select query data, so that you don't need to create a separate local var for each output column.
import.stats.xls: Changed \/row (ms)" to "ms/row"
sql.py: mk_insert_select(): INSERT IGNORE: Used typed vars to store each column of the select query because the select query's column names are not known, so it would be unknown which record var column to use for each output column. (Add types so there are no unknown-type values that PostgreSQL will say can't be cast.)
sql.py: insert_select(): In ignore mode, always recover from errors
sql.py: mk_insert_select(): INSERT IGNORE: Implemented simple cursor loop that just inserts each row, without ignoring duplicate keys
sql.py: mk_insert_select(): Added ignore param to do an INSERT IGNORE equivalent operation. Added basic structure for INSERT IGNORE. Support embeddable queries without RETURNING columns (which is the case for most INSERT IGNORE queries).
sql.py: run_query_into(): Always set into.schema to None, so that table is created in the temp schema in debug_temp mode
sql.py: DbConn.DbCursor.execute(): Don't cache ADD COLUMN queries because the rest of the query must be unique in order to use caching
sql.py: add_col(): Don't ignore already-existing columns because sometimes name truncation causes collisions, requiring the caller to version the name. add_index_col(): Version the column name to avoid collisions.
sql.py: add_index(): Version index names to avoid collisions
sql.py: DbConn.DbCursor.execute(): Always cache structural changes, not just if they throw an exception
sql.py: add_index(): Fixed bug where expr could not be deep-copied until ensure_not_null() had been run on it, because ensure_not_null() modifies the index_cols of the table and this needs to be modified on the original table
sql_gen.py: EnsureNotNull: Added to_str() which uses the column's index col if available. This works around a PostgreSQL bug where indexes on expressions are sometimes not used, but indexes on plain columns are used.
sql_gen.py: Store index cols in the table instead of in the column, because columns are often recreated from the table and name but tables are generally not copied
sql_gen.py: Added is_indexed_col()
sql.py: add_index(): Use new sql.ensure_not_null(), which creates a separate column to store the index expr where possible
sql.py: Added add_not_null(), add_index_col(), ensure_not_null()
sql_gen.py: Added is_temp_col()
sql_gen.py: concat(): Also preserve cast suffix
sql_gen.py: Col: Added index_col attr
sql_gen.py: Added suffixed_col()
sql.py: run_query_into(): Set the into table to be a temp table
sql_gen.py: Table: Store whether table is temp table
sql_gen.py: concat(): Also preserve appended column names
sql_gen.py: Col.to_str(): Take for_str param which does both concat() and str()/clean_name() instead of use_concat param which doesn't remove quotes from the strings before concatenating, causing strings to be incorrectly truncated
sql_gen.py: EnsureNotNull: Cast the null sentinel to the type it's the sentinel for, so that COALESCE returns the same type as the provided type, instead of potentially incorrectly assuming the constant's type. This fixes a bug where COALESCE returned an incorrect type.
sql.py: cast(): Use new sql_gen.Cast
sql_gen.py: Added Cast
sql_gen.py: concat(): Consider ")"s part of the existing suffix, so that parentheses match up
Removed extra trailing whitespace
main Makefile: Added explicit schemas/temp/reinstall target because vegbiendev (Ubuntu 12.04) seems to ignore %/reinstall
vegbien_dest: Added temp to schemas list so the intermediate tables will be created there, where they are easy to remove with `make schemas/temp/reinstall`
main Makefile: Added %/reinstall, which calls the corresponding uninstall and install targets
Added temp schema to store intermediate tables permanently when desired for debugging. It is not part of schemas/all because it should not be remade to include current intermediate tables when remaking other schema exports.
schemas: Moved *.sql.make into Makefile to take advantage of % pattern matching
Removed no longer used schemas/vegbien_empty.sql
main Makefile: Removed empty_db, because `make schemas/reinstall` has the same effect and is simpler
README.TXT: Changed documentation to use make schemas/reinstall to empty the DB, since that command is simpler. Added how to archive the last import.
db_xml.py: put_table(): Removed `if not db.debug_temp` check because that is done by sql.empty_temp()
sql.py: put_table(): Use new empty_temp()
import.stats.xls: Added comments for estimated numbers. Added "," separators to large numbers.
sql.py: empty_temp(): In debug_temp mode, leave temp tables there for debugging
sql.py: empty_temp(): Don't output at log_level 2 because it's an internal query, not part of the core algorithm
sql.py: truncate(): Added kw_args to pass to run_query()
Added inputs/import.stats.xls, which compares row-based and column-based import. This shows that column-based import is slowed down by table locking when run simultaneously, so we will need a new INSERT IGNORE replacement that doesn't lock tables.
inputs: Ignore OpenOffice.org lock files
sql.py: empty_temp(): Don't print log message if not emptying any tables
db_xml.py: put_table(): Empty unneeded temp tables to free up memory
sql.py: Added empty_temp()
sql.py: Use new lists.mk_seq()
lists.py: Added mk_seq()
lists.py: is_seq(): Also return true for sets
schemas/vegbien.sql: specimenreplicate: Added indexes using COALESCE to match what sql_gen does
sql.py: put_table(): Getting output table pkeys of existing/inserted rows: Do a DISTINCT ON the input pkey (row_num) in case the plain JOIN matched multiple output table rows for one input table row
sql.py: put_table(): Empty unneeded temp tables to free up memory and avoid running out of memory (the temp tables seem to be in-memory only)
sql_gen.py: null_sentinels: Added value for type timestamp with time zone. Put each type on its own line for clarity.
schemas/vegbien.sql: locationdetermination: Changed indexes to use COALESCE to match what sql_gen now does
schemas/vegbien.sql: location: Added indexes using COALESCE to match what sql_gen does
sql.py: cast_temp_col(): Add an index on the created column
sql_gen.py: null_sentinels: Added value for type double precision
sql_gen.py: ensure_not_null(): Warn of no null sentinel for type, even if caller catches error
schemas/py_functions.sql: Added plain function _namePart() and use it in trigger function _namePart()
schemas/py_functions.sql: Added plain functions _dateRangeStart() and _dateRangeEnd() and use them in trigger functions _dateRangeStart() and _dateRangeEnd()
schemas/functions.sql: _label(): Ensure that label is NOT NULL so it doesn't NULL out the entire string
schemas/functions.sql: Added plain function _nullIf() and use it in trigger function _nullIf()
sql.py: DbConn.DbCursor._cache_result(): Corrected comment to reflect why different types of queries are cached differently
sql.py: add_col(): Catch DuplicateExceptions so that columns that already existed are ignored
sql.py: run_query(): DuplicateException: Also match "column already exists" errors
sql.py: Merged DuplicateTableException and DuplicateFunctionException into one exception DuplicateException, with a type variable for the type of duplicate item. Added ExceptionWithNameType.
schemas/functions.sql: Fixed bug where external function calls needed to be schema-qualified in case functions schema is not in the search_path
schemas/functions.sql: Added plain function _label() and use it in trigger function _label()
sql.py: put_table(): Support plain SQL functions in addition to relational functions
sql_gen.py: Added NamedArg. FunctionCall: Support named arguments (http://www.postgresql.org/docs/9.0/static/sql-syntax-calling-funcs.html).
schemas/functions.sql: Added plain function _merge() and use it in trigger function _merge()
schemas/functions.sql: Added plain function _alt() and use it in trigger function _alt()
schemas/functions.sql: Removed no longer used ensure_not_null()
sql.py: put_table(): MissingCastException: Use cast_temp_col() so that cast will occur before any main insert, which locks the output table and should take as little time as possible
sql.py: Added cast_temp_col()
sql.py: add_col(): Support additional run_query() kw_args. add_row_num(): Use new add_col().
sql.py: Added add_col()
sql_gen.py: Col.__str__(): Truncate any table name using concat() to ensure that the full column name is included in the string
strings.py, sql_gen.py: Renamed add_suffix() to concat() to reflect that this is a fixed-length replacement for +
sql.py: put_table(): Moved MissingCastException to the top of the exceptions list because it's more of a core exception than the others, and will be raised before any rows are even inserted
sql.py: DbConn.with_savepoint(): Always release savepoint, because after ROLLBACK TO SAVEPOINT, "The savepoint remains valid and can be rolled back to again" (http://www.postgresql.org/docs/8.3/static/sql-rollback-to.html). Moved `self._savepoint -= 1` to the main try block's new finally block.
sql.py: put_table(): Lock output table right before, and in the same nested transaction as, the insert statement that needs lock, so that it is not released in a prior autocommit and is held for as little time as possible