Added inputs/import.stats.xls, which compares row-based and column-based import. This shows that column-based import is slowed down by table locking when run simultaneously, so we will need a new INSERT IGNORE replacement that doesn't lock tables.
inputs: Ignore OpenOffice.org lock files
sql.py: empty_temp(): Don't print log message if not emptying any tables
db_xml.py: put_table(): Empty unneeded temp tables to free up memory
sql.py: Added empty_temp()
sql.py: Use new lists.mk_seq()
lists.py: Added mk_seq()
lists.py: is_seq(): Also return true for sets
schemas/vegbien.sql: specimenreplicate: Added indexes using COALESCE to match what sql_gen does
sql.py: put_table(): Getting output table pkeys of existing/inserted rows: Do a DISTINCT ON the input pkey (row_num) in case the plain JOIN matched multiple output table rows for one input table row
sql.py: put_table(): Empty unneeded temp tables to free up memory and avoid running out of memory (the temp tables seem to be in-memory only)
sql_gen.py: null_sentinels: Added value for type timestamp with time zone. Put each type on its own line for clarity.
schemas/vegbien.sql: locationdetermination: Changed indexes to use COALESCE to match what sql_gen now does
schemas/vegbien.sql: location: Added indexes using COALESCE to match what sql_gen does
sql.py: cast_temp_col(): Add an index on the created column
sql_gen.py: null_sentinels: Added value for type double precision
sql_gen.py: ensure_not_null(): Warn of no null sentinel for type, even if caller catches error
schemas/py_functions.sql: Added plain function _namePart() and use it in trigger function _namePart()
schemas/py_functions.sql: Added plain functions _dateRangeStart() and _dateRangeEnd() and use them in trigger functions _dateRangeStart() and _dateRangeEnd()
schemas/functions.sql: _label(): Ensure that label is NOT NULL so it doesn't NULL out the entire string
schemas/functions.sql: Added plain function _nullIf() and use it in trigger function _nullIf()
sql.py: DbConn.DbCursor._cache_result(): Corrected comment to reflect why different types of queries are cached differently
sql.py: add_col(): Catch DuplicateExceptions so that columns that already existed are ignored
sql.py: run_query(): DuplicateException: Also match "column already exists" errors
sql.py: Merged DuplicateTableException and DuplicateFunctionException into one exception DuplicateException, with a type variable for the type of duplicate item. Added ExceptionWithNameType.
schemas/functions.sql: Fixed bug where external function calls needed to be schema-qualified in case functions schema is not in the search_path
schemas/functions.sql: Added plain function _label() and use it in trigger function _label()
sql.py: put_table(): Support plain SQL functions in addition to relational functions
sql_gen.py: Added NamedArg. FunctionCall: Support named arguments (http://www.postgresql.org/docs/9.0/static/sql-syntax-calling-funcs.html).
schemas/functions.sql: Added plain function _merge() and use it in trigger function _merge()
schemas/functions.sql: Added plain function _alt() and use it in trigger function _alt()
schemas/functions.sql: Removed no longer used ensure_not_null()
sql.py: put_table(): MissingCastException: Use cast_temp_col() so that cast will occur before any main insert, which locks the output table and should take as little time as possible
sql.py: Added cast_temp_col()
sql.py: add_col(): Support additional run_query() kw_args. add_row_num(): Use new add_col().
sql.py: Added add_col()
sql_gen.py: Col.__str__(): Truncate any table name using concat() to ensure that the full column name is included in the string
strings.py, sql_gen.py: Renamed add_suffix() to concat() to reflect that this is a fixed-length replacement for +
sql.py: put_table(): Moved MissingCastException to the top of the exceptions list because it's more of a core exception than the others, and will be raised before any rows are even inserted
sql.py: DbConn.with_savepoint(): Always release savepoint, because after ROLLBACK TO SAVEPOINT, "The savepoint remains valid and can be rolled back to again" (http://www.postgresql.org/docs/8.3/static/sql-rollback-to.html). Moved `self._savepoint -= 1` to the main try block's new finally block.
sql.py: put_table(): Lock output table right before, and in the same nested transaction as, the insert statement that needs lock, so that it is not released in a prior autocommit and is held for as little time as possible
db_xml.py: put_table(): Removed no longer needed commit param
bin/map: Removed rollback() call before closing the connection because PostgreSQL does this automatically
csv2db: Log inserts with log_level=5 so they are not shown for verbosity 4, which is used to see the savepoints and autocommits
Removed unnecessary db.db.commit() calls because commits are now done automatically by DbConn's autocommit mode
sql.py: DbConn.do_autocommit(): Output the "Autocommitting" debug message with level=4 so that it doesn't clutter up the logging output for normal verbosities
DbConn: autocommit mode defaults to True so that all scripts get the benefit of automatic commits
input.Makefile: Staging tables: import/install-%: Include the table name in the log file name so that successive tables for the same datasource don't overwrite the same log file
sql.py: DbConn: Don't always autocommit in debug_temp mode, because this could cause autocommit mode to be turned on when the user does not expect it
bin/map: connect_db(): Autocommit in commit mode to avoid the need for manual commits. This should also reduce the time that table locks are held, to avoid unnecessary contention when multiple processes are trying to insert into the same output table. (The program always uses nested transactions to support rollbacks, so there is no problem autocommitting whenever a top-level nested transaction or top-level query completes.)
sql_gen.py: Removed TempFunction because that functionality is now provided by DbConn.TempFunction()
sql.py: Use new DbConn.TempFunction()
sql.py: DbConn: Added TempFunction()
sql.py: Use new DbConn.debug_temp config option to control whether temporary objects should instead be permanent
sql.py: DbConn: Added config option debug_temp
sql.py: function_exists(): Fixed bug where trigger functions needed to be excluded, since they cannot be called directly
sql.py: Added function_exists()
sql_gen.py: Made Function an alias of Table so that isinstance(..., Function) will always work correctly
sql_gen.py: Added as_Function()
sql.py: put_table(): Lock the output table in EXCLUSIVE mode before getting its pkey so that an ACCESS SHARE lock is not acquired before EXCLUSIVE (causing a lock upgrade and deadlock). This race condition may not have been previously noticeable because pkey() is cached, so calling it doesn't necessarily execute a query or acquire an ACCESS SHARE lock.
sql.py: put_table(): Document that must be run at the beginning of a transaction
sql.py: put_table(), mk_select(): Switched back to having put_table() acquire the EXCLUSIVE locks, but right at the beginning of the transaction, in order to avoid lock upgrades which cause deadlocks
sql.py: with_autocommit(): Only allow turning autocommit on, because the opposite is not meaningful and may conflict with the session-global isolation level
sql.py: DbConn: Set the transaction isolation level to READ COMMITTED using set_isolation_level() so that the isolation level affects all transactions in the session, not just the current one
sql.py: DbConn: Always set the transaction isolation level to READ COMMITTED so that when a table is locked for update, its contents are frozen at that point rather than earlier. This ensures that no concurrent duplicate keys were inserted between the time the table was snapshotted (at the beginning of the transaction for SERIALIZABLE) and the time it was locked for update.
sql.py: put_table(): Removed locking output tables to prevent concurrent duplicate keys because that is now done automatically by mk_select()
sql.py: mk_select(): Filtering on no match: Lock the joined table in EXCLUSIVE mode to prevent concurrent duplicate keys when used with INSERT SELECT
sql_gen.py: Added underlying_table() and use it in underlying_col()
main Makefile: schemas/rotate: Fixed bug where needed to run schemas/public/install, not full schemas/install, after renaming public schema
sql.py: put_table(): Lock output tables to prevent concurrent duplicate keys
sql.py: Added lock_table()
bin/map: connect_db(): Only use autocommit mode if verbosity > 3, to avoid accidentally activating it if you want debug output in normal import mode
bin/map: connect_db(): Only use autocommit mode if verbosity > 2, because it causes the intermediate tables to be created as permanent tables, which you don't want unless you're actually debugging (verbosity = 2 is normal for column-based import)
sql.py: put_table(): remove_all_rows(): Changed log message to "Ignoring all rows" because NULL is not necessarily the pkey value that will be returned for the rows
sql.py: put_table(): Don't add index on columns that will have values filtered out, because indexes have already been added on all columns in the iteration's input table by flatten()
sql.py: DbConn._db(): Setting serializable isolation level: Always set this (if self.serializable is set), even in autocommit mode, because autocommit mode is implemented by manual commits in the DbConn wrapper object rather than using the underlying connection's autocommit mode (which does not allow setting the isolation level)
sql.py: DbConn._db(): Setting search_path: Use `SET search_path` and `SHOW search_path` instead of combining the old and new search_paths in SQL itself using `SELECT set_config('search_path', ...)`
csv2db: ProgressInputStream: Use default progress message 'Read %d line(s)' because there is not necessarily one CSV row per line, due to embedded newlines
input.Makefile: Staging tables: import/install-%: Only output to the log file if log option is non-empty (which it is by default)
csv2db: Support reinstalling just the errors table using new errors_table_only option
sql.py: Added drop_table()
schemas/vegbien.sql: method: Changed indexes to use `COALESCE` to match what sql_gen now does
schemas/vegbien.sql: locationevent: Added indexes using COALESCE to match what sql_gen does
schemas/vegbien.ERD.mwb: Synced with schema
schemas/vegbien.sql: party: Changed indexes to use `COALESCE` to match what sql_gen now does
Wrap sys.stderr.write() calls in strings.to_raw_str() to avoid UnicodeEncodeErrors when stderr is to a file and the default encoding is ASCII
strings.py: Added to_raw_str()
bin/map: When logging the row # being processed, add 1 because row # is interally 0-based, but 1-based to the user
bin/map: Log the row # being processed with level=1.1 so that the user can see a status report if desired
exc.py: str_(): Fixed bug where UnicodeEncodeError would be raised when msg contains non-ASCII chars, by wrapping e.args0 in strings.ustr()
exc.py: print_ex(): Wrap msg in strings.to_unicode() to try to avoid UnicodeEncodeError when msg contains non-ASCII chars
sql.py: create_table(): Don't set pkey.nullable to False because the caller should make sure the pkey has the appropriate type
csv2db: Use sql_gen.TypedCol.nullable instead of manually adding 'NOT NULL' to the type. Ensure that pkeys are properly NOT NULL.
csv2db: Adding indexes: Create plain indexes using ensure_not_null=False because the indexes will primarily be used by the user to search for specific values, rather than by the mapping script which uses the ensure_not_null
sql.py: DbConn.col_info(): Run query with log_level=4 because it gathers information about database structure, and should have the same log_level as other queries that do that
csv2db: Adding indexes: Fixed bug where col.to_Col() could not be used because sql.add_index() does not support name-only columns (plain strings are OK, though)
sql.py: create_table(): has_pkey: Use new TypedCol.constraints to store 'PRIMARY KEY'
sql_gen.py: TypedCol: Added constraints instance var
sql_gen.py: EnsureNotNull: Made coalesce() all uppercase to match how pg_dump spells it