
# Date Author Comment
2977 06/20/2012 07:46 PM Aaron Marcuse-Kubitza

main Makefile: Removed empty_db, because `make schemas/reinstall` has the same effect and is simpler

2976 06/20/2012 07:40 PM Aaron Marcuse-Kubitza

README.TXT: Changed documentation to use make schemas/reinstall to empty the DB, since that command is simpler. Added how to archive the last import.

2975 06/20/2012 07:11 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Removed `if not db.debug_temp` check because that is done by sql.empty_temp()

2974 06/20/2012 07:10 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Use new empty_temp()

2973 06/20/2012 07:06 PM Aaron Marcuse-Kubitza

import.stats.xls: Added comments for estimated numbers. Added "," separators to large numbers.

2972 06/20/2012 06:21 PM Aaron Marcuse-Kubitza

sql.py: empty_temp(): In debug_temp mode, leave temp tables there for debugging

2971 06/20/2012 06:06 PM Aaron Marcuse-Kubitza

sql.py: empty_temp(): Don't output at log_level 2 because it's an internal query, not part of the core algorithm

2970 06/20/2012 06:06 PM Aaron Marcuse-Kubitza

sql.py: truncate(): Added kw_args to pass to run_query()

2969 06/20/2012 05:52 PM Aaron Marcuse-Kubitza

Added inputs/import.stats.xls, which compares row-based and column-based import. This shows that column-based import is slowed down by table locking when run simultaneously, so we will need a new INSERT IGNORE replacement that doesn't lock tables.

2968 06/20/2012 03:14 PM Aaron Marcuse-Kubitza

inputs: Ignore OpenOffice.org lock files

2967 06/20/2012 02:19 PM Aaron Marcuse-Kubitza

sql.py: empty_temp(): Don't print log message if not emptying any tables

2966 06/20/2012 02:16 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Empty unneeded temp tables to free up memory

2965 06/20/2012 02:14 PM Aaron Marcuse-Kubitza

sql.py: Added empty_temp()
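The behavior that the empty_temp() revisions describe (skip everything in debug_temp mode, stay silent when there is nothing to empty) could look roughly like the sketch below. The log does not show the real signature, so the `run_query` and `log` callables and the parameter names are assumptions made for illustration.

```python
def empty_temp(run_query, tables, debug_temp=False, log=print):
    """Truncate temp tables that are no longer needed (illustrative sketch).

    run_query and log are hypothetical callables standing in for whatever
    sql.py really uses to execute and report queries.
    """
    # Accept a single table name as well as a collection of names.
    if not isinstance(tables, (list, tuple, set, frozenset)):
        tables = [tables]
    # In debug_temp mode the tables are left in place for inspection,
    # and no message is logged when there is nothing to empty.
    if debug_temp or not tables:
        return
    names = sorted(tables)
    log('Emptying temp tables: ' + ', '.join(names))
    for name in names:
        run_query('TRUNCATE ' + name)
```

Passing the query runner in as a callable keeps the sketch runnable without a database connection.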

2964 06/20/2012 02:14 PM Aaron Marcuse-Kubitza

sql.py: Use new lists.mk_seq()

2963 06/20/2012 02:13 PM Aaron Marcuse-Kubitza

lists.py: Added mk_seq()

2962 06/20/2012 02:11 PM Aaron Marcuse-Kubitza

lists.py: is_seq(): Also return true for sets
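As a rough illustration of this change and of the mk_seq() helper added in revision 2963, the pair might behave as follows. The function bodies are assumptions; the log gives only the names and the fact that sets now count as sequences.

```python
def is_seq(value):
    """Return True for list-like containers, now including sets."""
    return isinstance(value, (list, tuple, set, frozenset))


def mk_seq(value):
    """Return value unchanged if it is a sequence, else wrap it in a list.

    Assumed behavior: this lets callers pass either one item or many.
    """
    if is_seq(value):
        return value
    return [value]
```

Strings are deliberately not treated as sequences here, so a single table name is wrapped in a list rather than iterated character by character.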

2961 06/19/2012 03:02 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate: Added indexes using COALESCE to match what sql_gen does

2960 06/19/2012 02:08 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Getting output table pkeys of existing/inserted rows: Do a DISTINCT ON the input pkey (row_num) in case the plain JOIN matched multiple output table rows for one input table row

2959 06/19/2012 01:44 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Empty unneeded temp tables to free up memory and avoid running out of memory (the temp tables seem to be in-memory only)

2958 06/19/2012 01:30 PM Aaron Marcuse-Kubitza

sql_gen.py: null_sentinels: Added value for type timestamp with time zone. Put each type on its own line for clarity.

2957 06/19/2012 01:03 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: locationdetermination: Changed indexes to use COALESCE to match what sql_gen now does

2956 06/19/2012 12:23 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: location: Added indexes using COALESCE to match what sql_gen does

2955 06/19/2012 12:06 PM Aaron Marcuse-Kubitza

sql.py: cast_temp_col(): Add an index on the created column

2954 06/19/2012 11:55 AM Aaron Marcuse-Kubitza

sql_gen.py: null_sentinels: Added value for type double precision

2953 06/19/2012 11:52 AM Aaron Marcuse-Kubitza

sql_gen.py: ensure_not_null(): Warn of no null sentinel for type, even if caller catches error

2952 06/19/2012 10:11 AM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Added plain function _namePart() and use it in trigger function _namePart()

2951 06/19/2012 09:42 AM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Added plain functions _dateRangeStart() and _dateRangeEnd() and use them in trigger functions _dateRangeStart() and _dateRangeEnd()

2950 06/19/2012 09:28 AM Aaron Marcuse-Kubitza

schemas/functions.sql: _label(): Ensure that label is NOT NULL so it doesn't NULL out the entire string

2949 06/19/2012 09:23 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added plain function _nullIf() and use it in trigger function _nullIf()

2948 06/19/2012 08:56 AM Aaron Marcuse-Kubitza

sql.py: DbConn.DbCursor._cache_result(): Corrected comment to reflect why different types of queries are cached differently

2947 06/19/2012 08:46 AM Aaron Marcuse-Kubitza

sql.py: add_col(): Catch DuplicateExceptions so that columns that already existed are ignored

2946 06/19/2012 08:43 AM Aaron Marcuse-Kubitza

sql.py: run_query(): DuplicateException: Also match "column already exists" errors

2945 06/19/2012 08:20 AM Aaron Marcuse-Kubitza

sql.py: Merged DuplicateTableException and DuplicateFunctionException into one exception DuplicateException, with a type variable for the type of duplicate item. Added ExceptionWithNameType.
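One way to picture the merged exception is a single class that records which kind of object was duplicated. The sketch below is an assumption about the shape of the code, not the actual sql.py implementation; the helper `as_duplicate()` and the regular expression are hypothetical, and the sample messages follow the usual PostgreSQL wording for "already exists" errors.

```python
import re


class ExceptionWithNameType(Exception):
    """Exception that carries the type and name of the offending object."""

    def __init__(self, type_, name, cause=None):
        Exception.__init__(self, '%s %r' % (type_, name))
        self.type = type_
        self.name = name
        self.cause = cause


class DuplicateException(ExceptionWithNameType):
    """Raised for any duplicate relation, function, or column."""


# Hypothetical pattern: the object type, its quoted name, then the error text.
_DUPLICATE_RE = re.compile(
    r'(relation|function|column) "([^"]+)".*? already exists')


def as_duplicate(msg):
    """Translate a database error message into a DuplicateException, or None."""
    match = _DUPLICATE_RE.search(msg)
    if match is None:
        return None
    type_, name = match.groups()
    return DuplicateException(type_, name)
```

A caller such as add_col() can then catch one exception class and inspect `type` only when it needs to distinguish the cases.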

2944 06/19/2012 08:05 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Fixed bug where external function calls needed to be schema-qualified in case functions schema is not in the search_path

2943 06/19/2012 07:59 AM Aaron Marcuse-Kubitza

schemas/functions.sql: Added plain function _label() and use it in trigger function _label()

2942 06/19/2012 07:50 AM Aaron Marcuse-Kubitza

sql.py: put_table(): Support plain SQL functions in addition to relational functions

2941 06/18/2012 11:08 PM Aaron Marcuse-Kubitza

sql_gen.py: Added NamedArg. FunctionCall: Support named arguments (http://www.postgresql.org/docs/9.0/static/sql-syntax-calling-funcs.html).
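PostgreSQL 9.0 writes named arguments as `name := value`, so the generated SQL might be assembled along these lines. The class layout and the `to_str()` method are assumptions for illustration, `my_func` is a made-up function name, and argument values are assumed to be already-escaped SQL fragments.

```python
class NamedArg:
    """A single name := value argument in PostgreSQL 9.0 named notation."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def to_str(self):
        return '%s := %s' % (self.name, self.value)


class FunctionCall:
    """A call with positional args followed by named args (sketch only)."""

    def __init__(self, function, *args, **kw_args):
        self.function = function
        self.args = args
        self.kw_args = kw_args

    def to_str(self):
        parts = list(self.args)
        # Sort the named arguments so the generated SQL is deterministic.
        for name in sorted(self.kw_args):
            parts.append(NamedArg(name, self.kw_args[name]).to_str())
        return '%s(%s)' % (self.function, ', '.join(parts))


print(FunctionCall('my_func', "'a'", sep="'-'").to_str())
```

Positional arguments come first because PostgreSQL does not allow a positional argument after a named one.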

2940 06/18/2012 10:54 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added plain function _merge() and use it in trigger function _merge()

2939 06/18/2012 10:49 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Added plain function _alt() and use it in trigger function _alt()

2938 06/18/2012 10:37 PM Aaron Marcuse-Kubitza

schemas/functions.sql: Removed no longer used ensure_not_null()

2937 06/18/2012 10:22 PM Aaron Marcuse-Kubitza

sql.py: put_table(): MissingCastException: Use cast_temp_col() so that cast will occur before any main insert, which locks the output table and should take as little time as possible

2936 06/18/2012 10:18 PM Aaron Marcuse-Kubitza

sql.py: Added cast_temp_col()

2935 06/18/2012 10:17 PM Aaron Marcuse-Kubitza

sql.py: add_col(): Support additional run_query() kw_args. add_row_num(): Use new add_col().

2934 06/18/2012 10:09 PM Aaron Marcuse-Kubitza

sql.py: Added add_col()

2933 06/18/2012 08:17 PM Aaron Marcuse-Kubitza

sql_gen.py: Col.__str__(): Truncate any table name using concat() to ensure that the full column name is included in the string

2932 06/18/2012 07:59 PM Aaron Marcuse-Kubitza

strings.py, sql_gen.py: Renamed add_suffix() to concat() to reflect that this is a fixed-length replacement for +
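A "fixed-length replacement for +" can be sketched as below: the first string is truncated so that the suffix always survives intact. The `max_len` parameter and its default of 63 (PostgreSQL's default identifier length limit) are assumptions; the log does not show the real signature.

```python
def concat(str_, suffix, max_len=63):
    """Join str_ and suffix, truncating str_ so the result fits in max_len.

    Unlike plain +, the result never exceeds max_len characters, and the
    suffix is kept whole whenever it fits at all.
    """
    if len(suffix) >= max_len:
        # Degenerate case: the suffix alone fills the whole budget.
        return suffix[:max_len]
    return str_[:max_len - len(suffix)] + suffix
```

This is the property that revision 2933 relies on when it truncates the table name rather than the column name.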

2931 06/18/2012 07:49 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Moved MissingCastException to the top of the exceptions list because it's more of a core exception than the others, and will be raised before any rows are even inserted

2930 06/18/2012 06:20 PM Aaron Marcuse-Kubitza

sql.py: DbConn.with_savepoint(): Always release savepoint, because after ROLLBACK TO SAVEPOINT, "The savepoint remains valid and can be rolled back to again" (http://www.postgresql.org/docs/8.3/static/sql-rollback-to.html). Moved `self._savepoint -= 1` to the main try block's new finally block.
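The control flow described here can be sketched as follows. To keep the example runnable without a PostgreSQL server, it uses Python's built-in sqlite3 module, whose savepoints likewise stay on the stack after ROLLBACK TO; the class name `SavepointConn` and the savepoint naming scheme are assumptions, not the real DbConn code.

```python
import sqlite3


class SavepointConn:
    """Minimal stand-in for DbConn that tracks the savepoint nesting depth."""

    def __init__(self, db):
        self.db = db
        self._savepoint = 0

    def with_savepoint(self, func):
        name = 'level_%d' % self._savepoint
        self.db.execute('SAVEPOINT ' + name)
        self._savepoint += 1
        try:
            try:
                return func()
            except Exception:
                # Undo the work, but note the savepoint itself still exists.
                self.db.execute('ROLLBACK TO SAVEPOINT ' + name)
                raise
        finally:
            # Always restore the depth and release the savepoint, on both
            # the success path and the rollback path.
            self._savepoint -= 1
            self.db.execute('RELEASE SAVEPOINT ' + name)


db = sqlite3.connect(':memory:', isolation_level=None)
db.execute('CREATE TABLE t (x INTEGER)')
conn = SavepointConn(db)
conn.with_savepoint(lambda: db.execute('INSERT INTO t VALUES (1)'))
```

Putting both the counter decrement and the RELEASE in the `finally` block means a failed call cannot leave a stale savepoint or a wrong nesting depth behind.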

2929 06/18/2012 05:59 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Lock output table right before, and in the same nested transaction as, the insert statement that needs lock, so that it is not released in a prior autocommit and is held for as little time as possible

2928 06/18/2012 05:38 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Removed no longer needed commit param

2927 06/18/2012 05:16 PM Aaron Marcuse-Kubitza

bin/map: Removed rollback() call before closing the connection because PostgreSQL does this automatically

2926 06/18/2012 05:14 PM Aaron Marcuse-Kubitza

csv2db: Log inserts with log_level=5 so they are not shown for verbosity 4, which is used to see the savepoints and autocommits

2925 06/18/2012 05:13 PM Aaron Marcuse-Kubitza

Removed unnecessary db.db.commit() calls because commits are now done automatically by DbConn's autocommit mode

2924 06/18/2012 04:54 PM Aaron Marcuse-Kubitza

sql.py: DbConn.do_autocommit(): Output the "Autocommitting" debug message with level=4 so that it doesn't clutter up the logging output for normal verbosities

2923 06/18/2012 04:50 PM Aaron Marcuse-Kubitza

DbConn: autocommit mode defaults to True so that all scripts get the benefit of automatic commits

2922 06/18/2012 04:49 PM Aaron Marcuse-Kubitza

input.Makefile: Staging tables: import/install-%: Include the table name in the log file name so that successive tables for the same datasource don't overwrite the same log file

2921 06/18/2012 04:39 PM Aaron Marcuse-Kubitza

sql.py: DbConn: Don't always autocommit in debug_temp mode, because this could cause autocommit mode to be turned on when the user does not expect it

2920 06/18/2012 04:36 PM Aaron Marcuse-Kubitza

bin/map: connect_db(): Autocommit in commit mode to avoid the need for manual commits. This should also reduce the time that table locks are held, to avoid unnecessary contention when multiple processes are trying to insert into the same output table. (The program always uses nested transactions to support rollbacks, so there is no problem autocommitting whenever a top-level nested transaction or top-level query completes.)

2919 06/18/2012 04:29 PM Aaron Marcuse-Kubitza

sql_gen.py: Removed TempFunction because that functionality is now provided by DbConn.TempFunction()

2918 06/18/2012 04:28 PM Aaron Marcuse-Kubitza

sql.py: Use new DbConn.TempFunction()

2917 06/18/2012 04:28 PM Aaron Marcuse-Kubitza

sql.py: DbConn: Added TempFunction()

2916 06/18/2012 04:25 PM Aaron Marcuse-Kubitza

sql.py: Use new DbConn.debug_temp config option to control whether temporary objects should instead be permanent

2915 06/18/2012 04:20 PM Aaron Marcuse-Kubitza

sql.py: DbConn: Added config option debug_temp

2914 06/18/2012 04:12 PM Aaron Marcuse-Kubitza

sql.py: function_exists(): Fixed bug where trigger functions needed to be excluded, since they cannot be called directly

2913 06/18/2012 03:49 PM Aaron Marcuse-Kubitza

sql.py: Added function_exists()

2912 06/18/2012 03:49 PM Aaron Marcuse-Kubitza

sql_gen.py: Made Function an alias of Table so that isinstance(..., Function) will always work correctly

2911 06/18/2012 03:45 PM Aaron Marcuse-Kubitza

sql_gen.py: Added as_Function()

2910 06/15/2012 06:16 AM Aaron Marcuse-Kubitza

sql.py: put_table(): Lock the output table in EXCLUSIVE mode before getting its pkey so that an ACCESS SHARE lock is not acquired before EXCLUSIVE (causing a lock upgrade and deadlock). This race condition may not have been previously noticeable because pkey() is cached, so calling it doesn't necessarily execute a query or acquire an ACCESS SHARE lock.

2909 06/15/2012 05:52 AM Aaron Marcuse-Kubitza

sql.py: put_table(): Document that it must be run at the beginning of a transaction

2908 06/15/2012 05:49 AM Aaron Marcuse-Kubitza

sql.py: put_table(), mk_select(): Switched back to having put_table() acquire the EXCLUSIVE locks, but right at the beginning of the transaction, in order to avoid lock upgrades which cause deadlocks

2907 06/15/2012 05:35 AM Aaron Marcuse-Kubitza

sql.py: with_autocommit(): Only allow turning autocommit on, because the opposite is not meaningful and may conflict with the session-global isolation level

2906 06/15/2012 05:33 AM Aaron Marcuse-Kubitza

sql.py: DbConn: Set the transaction isolation level to READ COMMITTED using set_isolation_level() so that the isolation level affects all transactions in the session, not just the current one

2905 06/15/2012 05:21 AM Aaron Marcuse-Kubitza

sql.py: DbConn: Always set the transaction isolation level to READ COMMITTED so that when a table is locked for update, its contents are frozen at that point rather than earlier. This ensures that no concurrent duplicate keys were inserted between the time the table was snapshotted (at the beginning of the transaction for SERIALIZABLE) and the time it was locked for update.

2904 06/15/2012 05:02 AM Aaron Marcuse-Kubitza

sql.py: put_table(): Removed locking output tables to prevent concurrent duplicate keys because that is now done automatically by mk_select()

2903 06/15/2012 05:01 AM Aaron Marcuse-Kubitza

sql.py: mk_select(): Filtering on no match: Lock the joined table in EXCLUSIVE mode to prevent concurrent duplicate keys when used with INSERT SELECT

2902 06/15/2012 04:59 AM Aaron Marcuse-Kubitza

sql_gen.py: Added underlying_table() and use it in underlying_col()

2901 06/15/2012 04:39 AM Aaron Marcuse-Kubitza

main Makefile: schemas/rotate: Fixed bug where schemas/public/install, not the full schemas/install, needed to be run after renaming the public schema

2900 06/15/2012 04:32 AM Aaron Marcuse-Kubitza

sql.py: put_table(): Lock output tables to prevent concurrent duplicate keys

2899 06/15/2012 04:31 AM Aaron Marcuse-Kubitza

sql.py: Added lock_table()

2898 06/15/2012 03:53 AM Aaron Marcuse-Kubitza

bin/map: connect_db(): Only use autocommit mode if verbosity > 3, to avoid accidentally activating it if you want debug output in normal import mode

2897 06/15/2012 03:45 AM Aaron Marcuse-Kubitza

bin/map: connect_db(): Only use autocommit mode if verbosity > 2, because it causes the intermediate tables to be created as permanent tables, which you don't want unless you're actually debugging (verbosity = 2 is normal for column-based import)

2896 06/15/2012 03:25 AM Aaron Marcuse-Kubitza

sql.py: put_table(): remove_all_rows(): Changed log message to "Ignoring all rows" because NULL is not necessarily the pkey value that will be returned for the rows

2895 06/15/2012 03:17 AM Aaron Marcuse-Kubitza

sql.py: put_table(): Don't add index on columns that will have values filtered out, because indexes have already been added on all columns in the iteration's input table by flatten()

2894 06/15/2012 03:12 AM Aaron Marcuse-Kubitza

sql.py: DbConn._db(): Setting serializable isolation level: Always set this (if self.serializable is set), even in autocommit mode, because autocommit mode is implemented by manual commits in the DbConn wrapper object rather than using the underlying connection's autocommit mode (which does not allow setting the isolation level)

2893 06/15/2012 03:08 AM Aaron Marcuse-Kubitza

sql.py: DbConn._db(): Setting search_path: Use `SET search_path` and `SHOW search_path` instead of combining the old and new search_paths in SQL itself using `SELECT set_config('search_path', ...)`

2892 06/15/2012 02:47 AM Aaron Marcuse-Kubitza

csv2db: ProgressInputStream: Use default progress message 'Read %d line(s)' because there is not necessarily one CSV row per line, due to embedded newlines

2891 06/15/2012 01:47 AM Aaron Marcuse-Kubitza

input.Makefile: Staging tables: import/install-%: Only output to the log file if log option is non-empty (which it is by default)

2890 06/15/2012 01:46 AM Aaron Marcuse-Kubitza

csv2db: Support reinstalling just the errors table using new errors_table_only option

2889 06/15/2012 01:45 AM Aaron Marcuse-Kubitza

sql.py: Added drop_table()

2888 06/15/2012 01:20 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: method: Changed indexes to use `COALESCE` to match what sql_gen now does

2887 06/15/2012 01:16 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate: Added indexes using COALESCE to match what sql_gen does

2886 06/15/2012 01:12 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: locationevent: Added indexes using COALESCE to match what sql_gen does

2885 06/15/2012 12:57 AM Aaron Marcuse-Kubitza

schemas/vegbien.ERD.mwb: Synced with schema

2884 06/15/2012 12:54 AM Aaron Marcuse-Kubitza

schemas/vegbien.sql: party: Changed indexes to use `COALESCE` to match what sql_gen now does

2883 06/15/2012 12:38 AM Aaron Marcuse-Kubitza

Wrap sys.stderr.write() calls in strings.to_raw_str() to avoid UnicodeEncodeErrors when stderr is to a file and the default encoding is ASCII

2882 06/15/2012 12:37 AM Aaron Marcuse-Kubitza

strings.py: Added to_raw_str()
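The idea behind to_raw_str() is to encode text explicitly before it reaches a stream whose default encoding may be ASCII. The original presumably targeted Python 2; the sketch below is a Python 3 rendering, and the `encoding` parameter and the `write_err()` helper are assumptions added for illustration.

```python
import sys


def to_raw_str(value, encoding='utf-8'):
    """Encode text to bytes so that writing it cannot raise UnicodeEncodeError."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value  # already bytes; pass through unchanged


def write_err(msg):
    """Hypothetical wrapper: write an encoded message to stderr's byte stream."""
    sys.stderr.buffer.write(to_raw_str(msg + '\n'))
```

Encoding at the call site avoids depending on how stderr happens to be configured when it is redirected to a file.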

2881 06/15/2012 12:12 AM Aaron Marcuse-Kubitza

bin/map: When logging the row # being processed, add 1 because row # is internally 0-based, but 1-based to the user

2880 06/15/2012 12:05 AM Aaron Marcuse-Kubitza

bin/map: Log the row # being processed with level=1.1 so that the user can see a status report if desired

2879 06/14/2012 11:35 PM Aaron Marcuse-Kubitza

exc.py: str_(): Fixed bug where UnicodeEncodeError would be raised when msg contains non-ASCII chars, by wrapping e.args[0] in strings.ustr()

2878 06/14/2012 11:23 PM Aaron Marcuse-Kubitza

exc.py: print_ex(): Wrap msg in strings.to_unicode() to try to avoid UnicodeEncodeError when msg contains non-ASCII chars