Activity - BIEN 3 - NCEAS Projects

Activity

From 05/31/2012 to 06/29/2012

06/29/2012

08:09 AM Revision 3175: db_xml.py: put_table(): Subsetting in_table: Prepend schema to subset table name so that in pg_stat_activity, it's clear which datasource a particular query is from: Aaron Marcuse-Kubitza
07:46 AM Revision 3174: sql_io.py: cast_temp_col(): add_col()'s distinguishing comment param: Add the type in case the same input column is being cast to different types, and both types have the same first word (causing their new column names to be the same): Aaron Marcuse-Kubitza
07:42 AM Revision 3173: sql_io.py: cast_temp_col(): Name the new column with only the first word of the type, to save space in the limited identifier length: Aaron Marcuse-Kubitza
07:41 AM Revision 3172: strings.py: Added first_word(): Aaron Marcuse-Kubitza
07:35 AM Revision 3171: sql_io.py: cast_temp_col(): Use sql_gen.suffixed_col() to create the new column name: Aaron Marcuse-Kubitza
06:16 AM Revision 3170: inputs/import.stats.xls: Added run time for SALVIAS organisms, which just finished: Aaron Marcuse-Kubitza
06:14 AM Revision 3169: inputs/import.stats.xls: Use [1]-style footnotes because copying and pasting to Gmail doesn't preserve the superscripts: Aaron Marcuse-Kubitza
06:11 AM Revision 3168: inputs/import.stats.xls: Updated for latest simultaneous column-based import: Aaron Marcuse-Kubitza
04:42 AM Revision 3167: sql_io.py: cast_temp_col(): Don't automatically create an index on the new column, because it doesn't necessarily need an index and the main index used for the join is now added automatically by distinct_table(): Aaron Marcuse-Kubitza
04:39 AM Revision 3166: sql.py: flatten(): Don't automatically create indexes on all columns, because most columns don't need indexes and the main index used for the join is now added automatically by distinct_table(): Aaron Marcuse-Kubitza
04:35 AM Revision 3165: sql.py: Removed no longer needed add_index_col() and ensure_not_null() because we are not using index columns: Aaron Marcuse-Kubitza
04:33 AM Revision 3164: sql.py: add_index(): Don't create index columns for nullable columns, because they require indexes to be created on all columns in order to use a distinct_table() temp table. Also, now that we are no longer using LEFT JOINs, the COALESCE() call would only be evaluated once (in the plain JOIN) in the event that PostgreSQL doesn't use an index on a COALESCE() expression.: Aaron Marcuse-Kubitza
03:30 AM Revision 3163: schemas/vegbien.sql: location: Dropped unique constraint on lat/long because it covered only some rows, which interfered with column-based import's selection of different insert methods based on the presence or absence of duplicate keys. (With the constraint, locations with coordinates would have duplicates eliminated, but locations without coordinates would not be able to find which row was added for a particular location because there was no lookup key to join on, and would all just use the first inserted row.) The previous behavior didn't make much sense anyway, because it would assert that two locationevents occurred in the same place just because they had the same coordinates, which may not have been precise enough to make this determination. Asserting that two locationevents occurred in the same place is really part of the secondary validation, not the import process.: Aaron Marcuse-Kubitza
01:58 AM Revision 3162: sql.py: DbConn: Fixed bug where Exceptions did not have the query appended if the query was not run in cacheable mode, by moving _add_cursor_info() from DbCursor.execute() to run_query() so it would also get called for non-cacheable queries that use a native cursor rather than a wrapper. Fixed bug where non-cacheable queries were not autocommitted, by moving self.do_autocommit() from DbCursor.execute() to run_query() so it would also get called for non-cacheable queries that use a native cursor rather than a wrapper.: Aaron Marcuse-Kubitza
01:54 AM Revision 3161: sql.py: DbConn._db(): Record that a transaction is already open before setting the search_path so that a query is never run with an _savepoint value less than 1 (manual transactions are not supported yet): Aaron Marcuse-Kubitza
01:52 AM Revision 3160: sql.py: DbConn.with_savepoint(): Increment _savepoint before running queries so they don't get autocommitted: Aaron Marcuse-Kubitza
01:10 AM Revision 3159: sql.py: empty_temp(): Empty temp tables even in debug_temp mode, so that it can be seen which tables have been garbage collected and disk space leaks can be detected. This will not affect the external re-runnability of slow queries in debug_temp mode, as long as the user aborts the debug_temp import while the slow query is still running.: Aaron Marcuse-Kubitza
01:07 AM Revision 3158: sql_gen.py: ColDict: Use OrderedDict so that order of keys in input dict (if ordered) will be preserved. This should ensure that tempt table unique indexes have their columns in the same order as the output table, so that a merge join can be used.: Aaron Marcuse-Kubitza
01:01 AM Revision 3157: util.py: dict_subset(): Use OrderedDict so that order of keys in input dict (if ordered) will be preserved: Aaron Marcuse-Kubitza
12:55 AM Revision 3156: main Makefile: python-Darwin: Added pip installation instructions. python-Linux: Added ordereddict.: Aaron Marcuse-Kubitza
12:04 AM Revision 3155: sql.py: DbConn.col_info(): cacheable param defaults to True now that callers explicitly turn off cacheable when needed: Aaron Marcuse-Kubitza
12:00 AM Revision 3154: sql.py: add_index_col(): Explicitly set update()'s col_info caching depending on whether col_info will be changed later by add_not_null(): Aaron Marcuse-Kubitza

06/28/2012

11:55 PM Revision 3153: sql.py: mk_update(): Renamed cacheable param to cacheable_ so it wouldn't conflict with update()'s cacheable param: Aaron Marcuse-Kubitza
11:54 PM Revision 3152: sql.py: mk_update(): Added cacheable param to set whether column structure information used to generate the query can be cached: Aaron Marcuse-Kubitza
11:40 PM Revision 3151: sql.py: add_index_col(): Explicitly set col_info()'s caching depending on whether col_info will be changed later by add_not_null(): Aaron Marcuse-Kubitza
11:35 PM Revision 3150: sql.py: DbConn.col_info(): Allow caller to specify whether query is cacheable: Aaron Marcuse-Kubitza
11:22 PM Revision 3149: csv2db: Fixed bug where CREATE TABLE statement was cached, causing it not to be re-executed after a rollback due to a failed COPY FROM. Avoid re-creating the table after a failed COPY FROM, and instead just remove any existing rows.: Aaron Marcuse-Kubitza
11:09 PM Revision 3148: sql.py: add_index(): Don't generate a unique name for the index because the database does that automatically: Aaron Marcuse-Kubitza
11:00 PM Revision 3147: csv2db: Vacuum table instead of just reanalyzing it because for some reason reanalyzing it isn't enough to fix the cached row count (causing pgAdmin3 to report that the table needs to be vacuumed): Aaron Marcuse-Kubitza
10:54 PM Revision 3146: csv2db: Don't add indexes on the created table because they use up more disk space than the table itself and currently aren't used. (The import process adds indexes on each iteration's column subset instead.): Aaron Marcuse-Kubitza
10:21 PM Revision 3145: db_xml.py: partition_size: Turning partitioning back on (with a larger limit), since the largest datasources' temp tables are still too big: Aaron Marcuse-Kubitza
10:20 PM Revision 3144: sql_io.py: put_table(): Fixed bug where if there were multiple unique constraints that were violated, only the distinct temp table for the last one would get garbage-collected: Aaron Marcuse-Kubitza
09:01 PM Revision 3143: db_xml.py: partition_size: Set to sys.maxint to disable partitioning. The last bugfix, which avoided returning a large result set to the client which was never read, seems to have fixed the disk space leak, so it's worth reattempting a full simultaneous import.: Aaron Marcuse-Kubitza
08:30 PM Revision 3142: db_xml.py: put_table(): Subsetting in_table: Truncate in_table when finished with it, to avoid temp table disk space leaks: Aaron Marcuse-Kubitza
07:56 PM Revision 3141: sql.py: insert_select(): If caller is only interested in the rowcount (if returning == None), keep the NULL rows for each insert on the server using CREATE TABLE AS. (CREATE TABLE AS sets rowcount to # rows in query, so rowcount will still be set correctly.): Aaron Marcuse-Kubitza
04:59 PM Revision 3140: top-level map: Added support for custom public schema, to be able to run imports and tests simultaneously (e.g. on a dev machine): Aaron Marcuse-Kubitza

06/27/2012

10:56 PM Revision 3139: csv2db: Fixed bug where table needed to be a sql_gen.Table object with the proper schema, so that errors_table would be created in the correct schema. Removed no longer needed changing of the search_path.: Aaron Marcuse-Kubitza
10:55 PM Revision 3138: csv2db: Fixed bug where table needed to be a sql_gen.Table object with the proper schema, so that errors_table would be created in the correct schema. Removed no longer needed changing of the search_path.: Aaron Marcuse-Kubitza
10:12 PM Revision 3137: sql.py: DbConn.with_savepoint(): Open a new transaction if one is not already open: Aaron Marcuse-Kubitza
10:11 PM Revision 3136: sql.py: DbConn: _savepoint starts at 1 because the driver is not in autocommit mode, so a transaction is already open: Aaron Marcuse-Kubitza
10:05 PM Revision 3135: sql.py: DbConn: _savepoint starts at 1 because the driver is not in autocommit mode, so a transaction is already open: Aaron Marcuse-Kubitza
09:31 PM Revision 3134: csv2db: Create errors table first, so that imports can start using it right away: Aaron Marcuse-Kubitza
09:25 PM Revision 3133: input.Makefile: Added import/steps.by_col.sql to generate a Redmine-formatted list of steps for column-based import: Aaron Marcuse-Kubitza
08:56 PM Revision 3132: bin/map: Optimized default verbosities for the mode: automated tests should not be verbose, column-based import should show all queries to assist profiling, and row-based import should just show row progress: Aaron Marcuse-Kubitza
08:43 PM Revision 3131: sql_io.py: put(): Run data import queries with log_level=3.5 so they don't clutter the output at the normal import verbosity of 3: Aaron Marcuse-Kubitza
08:27 PM Revision 3130: db_xml.py: put_table(): Work around PostgreSQL's temp table disk space leak by reconnecting to the DB after every partition: Aaron Marcuse-Kubitza
08:26 PM Revision 3129: sql.py: mk_select(): Also support limit and start values of type long: Aaron Marcuse-Kubitza
08:13 PM Revision 3128: sql_gen.py: suffixed_table(): Fixed bug where needed to copy *all* table attrs, such as is_temp status: Aaron Marcuse-Kubitza
08:05 PM Revision 3127: sql.py: create_table(): Fixed bug where needed to run query in recover mode in case the table exists and was created before the current connection, such that the CREATE TABLE statement would not have been cached: Aaron Marcuse-Kubitza
07:50 PM Revision 3126: sql.py: create_table(): Removed final newline after query because that's added by the logging mechanism: Aaron Marcuse-Kubitza
07:43 PM Revision 3125: sql.py: Added reconnect(): Aaron Marcuse-Kubitza
07:37 PM Revision 3124: sql.py: DbConn._reset(): Assert that _savepoint is 0 instead of setting it to 0: Aaron Marcuse-Kubitza
07:31 PM Revision 3123: db_xml.py: put_table(): put_table_(): Removed no longer used limit, start params: Aaron Marcuse-Kubitza
07:23 PM Revision 3122: db_xml.py: put_table(): Merged partitioning and subsetting into same section for simplicity, to avoid creating extra temp tables, and to later allow the connection to be closed and reopened between partitions. partition_size: Expressed value without exponent notation to ensure that it's an integer.: Aaron Marcuse-Kubitza
07:11 PM Revision 3121: db_xml.py: put_table(): Partitioning in_table: Adjust bounds of last partition to actual row #s included: Aaron Marcuse-Kubitza
06:43 PM Revision 3120: sql.py: DbConn: Added _ to reset() to indicate that it's a protected method and users should not call it directly: Aaron Marcuse-Kubitza
06:41 PM Revision 3119: sql.py: DbConn.close(): Reset the connection completely using reset(): Aaron Marcuse-Kubitza
06:40 PM Revision 3118: sql.py: DbConn: Added clear_cache() and reset() and use reset() in __init__(): Aaron Marcuse-Kubitza
06:31 PM Revision 3117: bin/map: Use new DbConn.close(): Aaron Marcuse-Kubitza
06:31 PM Revision 3116: sql.py: DbConn: Added close(): Aaron Marcuse-Kubitza
06:07 PM Revision 3115: db_xml.py: partition_size: Set to just more than the size of the largest data source that was successfully imported in simultaneous import: Aaron Marcuse-Kubitza
05:32 PM Revision 3114: db_xml.py: put_table(): Partition in_table if larger than a threshold. The threshold is initially set to disable partitioning. Partitioning will hopefully eliminate the excessive disk usage for large input tables, which has caused the system to run out of disk space due to what may be a bug in PostgreSQL.: Aaron Marcuse-Kubitza
05:27 PM Revision 3113: db_xml.py: put_table(): Set in_table's default srcs to in_table itself instead of sql_gen.src_self, so that any copies of in_table will inherit the same srcs instead of being treated as a top-level table. This ensures that the top-level table's errors table will always be used.: Aaron Marcuse-Kubitza
05:17 PM Revision 3112: sql_io.py: cast(): Always convert exceptions to warnings if the input is a column or expression, even if there is no place to save the errors, so that invalid data does not need to be handled by the caller in a (much slower) extra exception-handling loop: Aaron Marcuse-Kubitza
04:47 PM Revision 3111: sql_io.py: put_table(): MissingCastException: When casting, handle InvalidValueException by filtering out invalid values with invalid2null() in a loop: Aaron Marcuse-Kubitza
04:45 PM Revision 3110: sql_io.py: cast_temp_col(): Run sql.update() in recover mode in case expr produces errors. Don't cache sql.update() in case this function will be called again after error recovery.: Aaron Marcuse-Kubitza
04:40 PM Revision 3109: sql.py: Generalized FunctionValueException to InvalidValueException so that it will match all invalid-value errors, not just those occurring in user-defined functions: Aaron Marcuse-Kubitza
04:22 PM Revision 3108: sql_io.py: put_table(): Removed no longer used sql.FunctionValueException handling, because type casting functions now do their own invalid value handling: Aaron Marcuse-Kubitza
03:44 PM Revision 3107: db_xml.py: put_table(): Subsetting in_table: Call put_table() recursively using put_table_() to ensure that limit and start are reset to their default values, in case the table gets partitioned (which needs up-to-date limit and start values): Aaron Marcuse-Kubitza
03:14 PM Revision 3106: sql_io.py: put_table(): mk_main_select(): Fixed bug where the table of each cond needed to be changed to insert_in_table because mk_main_select() uses the distinct table rather than the full input table: Aaron Marcuse-Kubitza
03:12 PM Revision 3105: sql_gen.py: with_table(): Support columns that are wrapped in a FunctionCall object: Aaron Marcuse-Kubitza
02:47 PM Revision 3104: sql_gen.py: index_cols: Store just the name of the index column, and add the table in index_col(), in case the table is ever copied and renamed: Aaron Marcuse-Kubitza

06/26/2012

11:06 PM Revision 3103: Moved error tracking from sql.py to sql_io.py: Aaron Marcuse-Kubitza
11:04 PM Revision 3102: sql_io.py: put_table(): Use sql.distinct_table() to uniquify input table, instead of DISTINCT ON. This avoids letting PostgreSQL create a sort temp table to store the output of the DISTINCT ON, which is not automatically removed until the end of the connection, causing database bloat that can use up the available disk space.: Aaron Marcuse-Kubitza
10:36 PM Revision 3101: sql_gen.py: suffixed_table(): Use concat(): Aaron Marcuse-Kubitza
10:34 PM Revision 3100: sql_gen.py: with_default_table(): Remove no longer used overwrite param: Aaron Marcuse-Kubitza
10:33 PM Revision 3099: sql.py: distinct_table(): Return new table instead of renaming input table so that columns that use input table will continue to work correctly: Aaron Marcuse-Kubitza
10:31 PM Revision 3098: sql_gen.py: Moved NamedCol check from with_default_table() to with_table(): Aaron Marcuse-Kubitza
09:39 PM Revision 3097: sql.py: distinct_table(): Fixed bug where empty distinct_on cols needed to create a table with one sample row, instead of returning the original table, because this indicates that the full set of distinct_on columns are all literal values and should only occur once: Aaron Marcuse-Kubitza
09:12 PM Revision 3096: sql.py: run_query(): DuplicateKeyException: Fixed bug where only constraint names matching a certain format were interpreted as DuplicateKeyExceptions. Support constraint names with the name and table separated by ".", not just "_".: Aaron Marcuse-Kubitza
09:10 PM Revision 3095: sql.py: run_query(): Exception parsing: Match patterns only at the beginning of the exception message to avoid matching embedded messages in causes and literal values: Aaron Marcuse-Kubitza
08:46 PM Revision 3094: sql.py: Added distinct_table(): Aaron Marcuse-Kubitza
08:46 PM Revision 3093: sql_gen.py: Added with_table() and use it in with_default_table(): Aaron Marcuse-Kubitza
07:52 PM Revision 3092: sql.py: mk_insert_select(): ignore mode: Support inserting all columns when cols == None: Aaron Marcuse-Kubitza
07:47 PM Revision 3091: sql_gen.py: Col, Table: Support non-string names: Aaron Marcuse-Kubitza
07:25 PM Revision 3090: sql_gen.py: row_count: Use new all_cols: Aaron Marcuse-Kubitza
07:24 PM Revision 3089: sql_gen.py: Added all_cols: Aaron Marcuse-Kubitza
07:17 PM Revision 3088: sql_gen.py: Use new as_Name() instead of db.esc_name(): Aaron Marcuse-Kubitza
07:12 PM Revision 3087: sql_gen.py: Name: Truncate the input name: Aaron Marcuse-Kubitza
07:11 PM Revision 3086: sql_gen.py: Added Name class and associated functions: Aaron Marcuse-Kubitza
06:46 PM Revision 3085: sql.py: create_table(): Support creating temp tables. This fixes a bug in copy_table_struct() where the created table was not a temp table if the source table was. copy_table_struct(): Removed no longer needed versioning because that is now handled by create_table().: Aaron Marcuse-Kubitza
06:33 PM Revision 3084: sql.py: Added copy_table_struct(): Aaron Marcuse-Kubitza
06:32 PM Revision 3083: sql.py: Moved add_indexes() to Indexes subsection: Aaron Marcuse-Kubitza
06:30 PM Revision 3082: sql.py: create_table(): Support LIKE table: Aaron Marcuse-Kubitza
05:18 PM Revision 3081: Moved Data cleanup from sql.py to sql_io.py: Aaron Marcuse-Kubitza
05:18 PM Revision 3080: Moved error tracking from sql.py to sql_io.py: Aaron Marcuse-Kubitza
05:12 PM Revision 3079: sql.py: Organized Database structure introspection and Structural changes functions into subsections: Aaron Marcuse-Kubitza
04:56 PM Revision 3078: Moved error tracking from sql.py to sql_io.py: Aaron Marcuse-Kubitza
04:46 PM Revision 3077: Moved Heuristic queries from sql.py to new sql_io.py: Aaron Marcuse-Kubitza
04:32 PM Revision 3076: Added top-level analysis dir for range modeling: Aaron Marcuse-Kubitza
04:02 PM Revision 3075: sql.py: run_query_into(): Documented why analyze() must be run manually on newly populated temp tables: Aaron Marcuse-Kubitza
03:57 PM Revision 3074: sql.py: DbConn: Added autoanalyze mode. Added autoanalyze() which runs analyze() only if in autoanalyze mode. Use new autoanalyze() in functions that change a table's contents.: Aaron Marcuse-Kubitza
03:52 PM Revision 3073: sql.py: run_query_into(): analyze() the created table to ensure the query planner's initial stats are accurate: Aaron Marcuse-Kubitza

06/25/2012

09:46 PM Revision 3072: inputs/SpeciesLink/src: Added custom header that overwrites existing header so that column names will not be too long for the staging table: Aaron Marcuse-Kubitza
09:35 PM Revision 3071: cat_csv: Support overwriting the existing header using a separate header file: Aaron Marcuse-Kubitza
08:49 PM Revision 3070: schemas/vegbien.sql: Added location.location_coords index to speed up large imports by providing an index for merge joins: Aaron Marcuse-Kubitza
08:43 PM Revision 3069: csv2db: Reanalyze table, so that query planner stats are up to date even though the table doesn't need to be vacuumed anymore: Aaron Marcuse-Kubitza
08:42 PM Revision 3068: sql.py: Added analyze(): Aaron Marcuse-Kubitza
08:11 PM Revision 3067: csv2db: Removed no longer needed table vacuum (cleanup_table() now avoids creating dead rows): Aaron Marcuse-Kubitza
08:10 PM Revision 3066: sql.py: cleanup_table(): Use update()'s new in_place mode to avoid needing to vacuum the table: Aaron Marcuse-Kubitza
08:02 PM Revision 3065: sql.py: mk_update(): in_place: Support updating multiple columns at once: Aaron Marcuse-Kubitza
07:44 PM Revision 3064: sql.py: update() calls: Use in_place where possible to avoid creating dead rows, which bloats table size: Aaron Marcuse-Kubitza
07:37 PM Revision 3063: sql.py: DbConn.col_info(): Support user-defined types: Aaron Marcuse-Kubitza
07:33 PM Revision 3062: sql_gen.py: Added Nullif: Aaron Marcuse-Kubitza
07:27 PM Revision 3061: sql_gen.py: Added Coalesce class and use it in EnsureNotNull: Aaron Marcuse-Kubitza
07:15 PM Revision 3060: sql_gen.py: Added coalesce and use it in EnsureNotNull: Aaron Marcuse-Kubitza
07:00 PM Revision 3059: sql.py: DbConn.col_info(): Don't cache the structure query because the column type, etc. may change between calls. This fixes a bug in mk_update() where the column type would be retrieved before a NOT NULL constraint was added, causing the NOT NULL constraint not to be in the cache info about the column.: Aaron Marcuse-Kubitza
06:56 PM Revision 3058: sql.py: mk_update(): Implemented in_place mode: Aaron Marcuse-Kubitza
06:39 PM Revision 3057: sql.py: mk_update(): Factored out filtering of input values so only `.to_str(db)` is used inline in the creation of the query: Aaron Marcuse-Kubitza
06:32 PM Revision 3056: sql.py: mk_update(): Added in_place param: Aaron Marcuse-Kubitza
06:13 PM Revision 3055: csvs.py: TsvReader: Prevent "new-line character seen in unquoted field" errors by replacing '\r' with '\n': Aaron Marcuse-Kubitza
06:12 PM Revision 3054: csv2db: Adding indexes: Fixed bug where sql.add_index()'s ensure_not_null param needed to be renamed to ensure_not_null_: Aaron Marcuse-Kubitza
05:16 PM Revision 3053: sql.py: cast(): columns values clause: Use start=0 to avoid "SELECT statement missing a WHERE, LIMIT, or OFFSET clause" warning: Aaron Marcuse-Kubitza
02:45 PM Revision 3052: inputs/import.stats.xls: Updated for most recent run: Aaron Marcuse-Kubitza

06/22/2012

07:46 PM Revision 3051: sql.py: Removed no longer needed mk_track_data_error(): Aaron Marcuse-Kubitza
07:45 PM Revision 3050: sql.py: track_data_error(): Use for loop and insert() (ignoring DuplicateKeyException) to insert entries into the errors table, to get the same optimization benefits this change provides in other filter-out contexts, and to improve clarity: Aaron Marcuse-Kubitza
07:35 PM Revision 3049: sql.py: cast(): Use FOR loop with EXCEPTION block instead of CROSS JOIN with LEFT JOIN to insert entries into the errors table, to get the same optimization benefits this change provides in other filter-out contexts, and to improve clarity: Aaron Marcuse-Kubitza
07:31 PM Revision 3048: sql_gen.py: NamedValues: Support None cols param for no named columns: Aaron Marcuse-Kubitza
06:55 PM Revision 3047: sql.py: ensure_not_null(): Made the use of index columns configurable based on the # of rows in the table, because for small datasources, they seem to add 6-25% to the total import time: Aaron Marcuse-Kubitza
06:12 PM Revision 3046: xml_func.py: _noCV: Fixed bug where assumed items was an iterator when it's now a list: Aaron Marcuse-Kubitza
06:02 PM Revision 3045: sql.py: add_index_col(), cast_temp_col(): Cache add_col() by providing new comment param to distinguish new columns of the same (colliding) suffixed name but from different source columns: Aaron Marcuse-Kubitza
05:54 PM Revision 3044: sql.py: add_index_col(), cast_temp_col(): Cache the update that fills in the new column, since it's idempotent: Aaron Marcuse-Kubitza
05:52 PM Revision 3043: sql.py: update(): Pass cacheable to run_query(): Aaron Marcuse-Kubitza
05:48 PM Revision 3042: sql.py: add_col(): Added comment param which can be used to distinguish columns of the same name from each other when they contain different data, to allow the ADD COLUMN query (and related queries, such as adding indexes) to be cached: Aaron Marcuse-Kubitza
05:46 PM Revision 3041: sql_gen.py: Added esc_comment(): Aaron Marcuse-Kubitza
05:37 PM Revision 3040: sql.py: DbConn.DbCursor.execute(): Allow ADD COLUMN to be cached if it has a distinguishing comment, because then the rest of query will be unique in the face of name collisions: Aaron Marcuse-Kubitza
05:29 PM Revision 3039: sql.py: add_col(): Document that name may be versioned, so caller needs to propagate any renaming back to any source column for the TypedCol: Aaron Marcuse-Kubitza
05:27 PM Revision 3038: sql.py: add_col() callers: Removed column name versioning because that is now handled by add_col(): Aaron Marcuse-Kubitza
05:22 PM Revision 3037: sql.py: add_col() callers: Fixed bug where needed to propagate any renaming of typed column back to regular column: Aaron Marcuse-Kubitza
05:19 PM Revision 3036: sql.py: add_col(): Version column names to avoid collisions. (Previously, callers were required to do this themselves.): Aaron Marcuse-Kubitza
05:13 PM Revision 3035: sql.py: cast_temp_col(): Handle column name collisions like add_index_col(): Aaron Marcuse-Kubitza
04:49 PM Revision 3034: sql.py: mk_insert_select(): INSERT IGNORE: Switched to using FOR loop rather than cursors because cursors are only needed if you want to process multiple rows in the same EXCEPTION block (which you can't do because then all the previous inserts will be rolled back if one row is a duplicate key): Aaron Marcuse-Kubitza
04:40 PM Revision 3033: sql.py: mk_insert_select(): INSERT IGNORE: Moved FETCH FROM cursor outside EXCEPTION block since only the insert needs to have unique_violations ignored: Aaron Marcuse-Kubitza
04:33 PM Revision 3032: sql.py: put_table(): Removed no longer needed EXCLUSIVE lock on the output table: Aaron Marcuse-Kubitza
04:25 PM Revision 3031: add_index_col(): Run all operations with log_level=3 because it is the equivalent of creating an index, and that has log_level=3 to be excluded from the primary algorithm steps. add_not_null(): Run query with log_level=3 because it's an internal operation and for add_index_col() above.: Aaron Marcuse-Kubitza
04:23 PM Revision 3030: sql.py: update(): Pass any log_level to run_query(): Aaron Marcuse-Kubitza
04:11 PM Revision 3029: xml_func.py: process(): Added pass-through optimization for aggregating functions with one arg: Aaron Marcuse-Kubitza
04:02 PM Revision 3028: xml_func.py: process(): Call list() on the items' iterator so that its length can be determined wherever it's used without needing to call list() on it separately: Aaron Marcuse-Kubitza
03:43 PM Revision 3027: sql.py: mk_select(): Put '*' on same line as 'SELECT' if there is no DISTINCT ON, to avoid extra newlines: Aaron Marcuse-Kubitza
03:38 PM Revision 3026: sql_gen.py: NamedTable.to_str(): Put AS clause on same line as code being named if it didn't contain a newline, to avoid extra newlines: Aaron Marcuse-Kubitza
03:29 PM Revision 3025: sql.py: run_query(): Always parse exceptions, whether recover is set or not, to avoid making the caller set recover just to parse exceptions. If recover is not set, just don't run any queries when generating the parsed exception and return an empty value for the needed information. (A partial parsed exception is better than an unparsed one.): Aaron Marcuse-Kubitza
03:18 PM Revision 3024: sql_gen.py: is_temp_col(): Use is_table_col() so that the type of the parameter is also checked: Aaron Marcuse-Kubitza
03:15 PM Revision 3023: inputs/import.stats.xls: Bolded the ms/row totals. Added note that non-bolded totals are not directly comparable because different methods were run with different numbers of rows.: Aaron Marcuse-Kubitza
03:10 PM Revision 3022: sql.py: put_table(): Inserting new rows: Removed no longer needed filter-out LEFT JOIN, since the INSERT IGNORE replacement handles duplicate keys: Aaron Marcuse-Kubitza
03:02 PM Revision 3021: sql.py: put_table(): Inserting new rows: Use insert_select()'s (mk_insert_select()'s) new ignore option to ignore duplicate keys: Aaron Marcuse-Kubitza
02:59 PM Revision 3020: sql_gen.py: to_name_only_col(): Consider any non-table column, including NamedCol, to be already name-only. This fixes a bug in sql.mk_insert_select() where the value of a NamedCol was removed by to_name_only_col() even though it was needed.: Aaron Marcuse-Kubitza
02:30 PM Revision 3019: sql.py: mk_insert_select(): INSERT IGNORE: Use an EXCEPTION block for each individual row because "When an error is caught by an EXCEPTION clause, [...] all changes to persistent database state within the block are rolled back." (http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING). Documented that cursor stays at current position in spite of automatic ROLLBACK on EXCEPTION.: Aaron Marcuse-Kubitza
02:11 PM Revision 3018: sql.py: mk_insert_select(): INSERT IGNORE: Use RETURN QUERY on the insert query instead of RETURN NEXT on a local var to forward the RETURNING value: Aaron Marcuse-Kubitza
02:08 PM Revision 3017: sql.py: mk_insert_select(): INSERT IGNORE: Support RETURNING clause. Always return something to set the correct rowcount.: Aaron Marcuse-Kubitza
02:06 PM Revision 3016: sql_gen.py: Use an as_*() function instead of manually checking the type wherever possible: Aaron Marcuse-Kubitza
01:18 PM Revision 3015: sql.py: mk_insert_select(): INSERT IGNORE: Added duplicate key handling by using EXCEPTION block to catch unique_violation: Aaron Marcuse-Kubitza
12:58 PM Revision 3014: sql.py: mk_insert_select(): INSERT IGNORE: Fixed bug where user-defined types were not supported correctly, by referencing the column's type directly in PL/pgSQL instead of retrieving it from the information_schema. Use a row variable of the output table's row type to store the select query data, so that you don't need to create a separate local var for each output column.: Aaron Marcuse-Kubitza

06/21/2012

10:24 PM Revision 3013: import.stats.xls: Changed \/row (ms)" to "ms/row": Aaron Marcuse-Kubitza
09:52 PM Revision 3012: sql.py: mk_insert_select(): INSERT IGNORE: Used typed vars to store each column of the select query because the select query's column names are not known, so it would be unknown which record var column to use for each output column. (Add types so there are no unknown-type values that PostgreSQL will say can't be cast.): Aaron Marcuse-Kubitza
09:21 PM Revision 3011: sql.py: insert_select(): In ignore mode, always recover from errors: Aaron Marcuse-Kubitza
09:14 PM Revision 3010: sql.py: mk_insert_select(): INSERT IGNORE: Implemented simple cursor loop that just inserts each row, without ignoring duplicate keys: Aaron Marcuse-Kubitza
08:26 PM Revision 3009: sql.py: mk_insert_select(): Added ignore param to do an INSERT IGNORE equivalent operation. Added basic structure for INSERT IGNORE. Support embeddable queries without RETURNING columns (which is the case for most INSERT IGNORE queries).: Aaron Marcuse-Kubitza
06:50 PM Revision 3008: sql.py: run_query_into(): Always set into.schema to None, so that table is created in the temp schema in debug_temp mode: Aaron Marcuse-Kubitza
06:25 PM Revision 3007: sql.py: DbConn.DbCursor.execute(): Don't cache ADD COLUMN queries because the rest of the query must be unique in order to use caching: Aaron Marcuse-Kubitza
06:24 PM Revision 3006: sql.py: add_col(): Don't ignore already-existing columns because sometimes name truncation causes collisions, requiring the caller to version the name. add_index_col(): Version the column name to avoid collisions.: Aaron Marcuse-Kubitza
06:08 PM Revision 3005: sql.py: add_index(): Version index names to avoid collisions: Aaron Marcuse-Kubitza
05:58 PM Revision 3004: sql.py: DbConn.DbCursor.execute(): Always cache structural changes, not just if they throw an exception: Aaron Marcuse-Kubitza
05:40 PM Revision 3003: sql.py: add_index(): Fixed bug where expr could not be deep-copied until ensure_not_null() had been run on it, because ensure_not_null() modifies the index_cols of the table and this needs to be modified on the original table: Aaron Marcuse-Kubitza
05:39 PM Revision 3002: sql.py: add_index(): Fixed bug where expr could not be deep-copied until ensure_not_null() had been run on it, because ensure_not_null() modifies the index_cols of the table and this needs to be modified on the original table: Aaron Marcuse-Kubitza
05:12 PM Revision 3001: sql_gen.py: EnsureNotNull: Added to_str() which uses the column's index col if available. This works around a PostgreSQL bug where indexes on expressions are sometimes not used, but indexes on plain columns are used.: Aaron Marcuse-Kubitza
05:09 PM Revision 3000: sql_gen.py: Store index cols in the table instead of in the column, because columns are often recreated from the table and name but tables are generally not copied: Aaron Marcuse-Kubitza
04:21 PM Revision 2999: sql_gen.py: Added is_indexed_col(): Aaron Marcuse-Kubitza
04:15 PM Revision 2998: sql.py: add_index(): Use new sql.ensure_not_null(), which creates a separate column to store the index expr where possible: Aaron Marcuse-Kubitza
04:12 PM Revision 2997: sql.py: Added add_not_null(), add_index_col(), ensure_not_null(): Aaron Marcuse-Kubitza
04:10 PM Revision 2996: sql_gen.py: Added is_temp_col(): Aaron Marcuse-Kubitza
03:57 PM Revision 2995: sql_gen.py: concat(): Also preserve cast suffix: Aaron Marcuse-Kubitza
03:44 PM Revision 2994: sql_gen.py: Col: Added index_col attr: Aaron Marcuse-Kubitza
03:19 PM Revision 2993: sql_gen.py: Added suffixed_col(): Aaron Marcuse-Kubitza
02:59 PM Revision 2992: sql.py: run_query_into(): Set the into table to be a temp table: Aaron Marcuse-Kubitza
02:59 PM Revision 2991: sql_gen.py: Table: Store whether table is temp table: Aaron Marcuse-Kubitza
01:50 PM Revision 2990: sql_gen.py: concat(): Also preserve appended column names: Aaron Marcuse-Kubitza
01:48 PM Revision 2989: sql_gen.py: Col.to_str(): Take for_str param which does both concat() and str()/clean_name() instead of use_concat param which doesn't remove quotes from the strings before concatenating, causing strings to be incorrectly truncated: Aaron Marcuse-Kubitza

06/20/2012

09:46 PM Revision 2988: sql_gen.py: EnsureNotNull: Cast the null sentinel to the type it's the sentinel for, so that COALESCE() returns the same type as the provided type, instead of potentially incorrectly assuming the constant's type. This fixes a bug where COALESCE() returned an incorrect type.: Aaron Marcuse-Kubitza
09:38 PM Revision 2987: sql.py: cast(): Use new sql_gen.Cast: Aaron Marcuse-Kubitza
09:38 PM Revision 2986: sql_gen.py: Added Cast: Aaron Marcuse-Kubitza
09:37 PM Revision 2985: sql_gen.py: concat(): Consider ")"s part of the existing suffix, so that parentheses match up: Aaron Marcuse-Kubitza
09:05 PM Revision 2984: Removed extra trailing whitespace: Aaron Marcuse-Kubitza
08:36 PM Revision 2983: main Makefile: Added explicit schemas/temp/reinstall target because vegbiendev (Ubuntu 12.04) seems to ignore %/reinstall: Aaron Marcuse-Kubitza
08:24 PM Revision 2982: vegbien_dest: Added temp to schemas list so the intermediate tables will be created there, where they are easy to remove with `make schemas/temp/reinstall`: Aaron Marcuse-Kubitza
08:23 PM Revision 2981: main Makefile: Added %/reinstall, which calls the corresponding uninstall and install targets: Aaron Marcuse-Kubitza
08:11 PM Revision 2980: Added temp schema to store intermediate tables permanently when desired for debugging. It is not part of schemas/all because it should not be remade to include current intermediate tables when remaking other schema exports.: Aaron Marcuse-Kubitza
08:04 PM Revision 2979: schemas: Moved *.sql.make into Makefile to take advantage of % pattern matching: Aaron Marcuse-Kubitza
07:47 PM Revision 2978: Removed no longer used schemas/vegbien_empty.sql: Aaron Marcuse-Kubitza
07:46 PM Revision 2977: main Makefile: Removed empty_db, because `make schemas/reinstall` has the same effect and is simpler: Aaron Marcuse-Kubitza
07:40 PM Revision 2976: README.TXT: Changed documentation to use make schemas/reinstall to empty the DB, since that command is simpler. Added how to archive the last import.: Aaron Marcuse-Kubitza
07:11 PM Revision 2975: db_xml.py: put_table(): Removed `if not db.debug_temp` check because that is done by sql.empty_temp(): Aaron Marcuse-Kubitza
07:10 PM Revision 2974: sql.py: put_table(): Use new empty_temp(): Aaron Marcuse-Kubitza
07:06 PM Revision 2973: import.stats.xls: Added comments for estimated numbers. Added "," separators to large numbers.: Aaron Marcuse-Kubitza
06:21 PM Revision 2972: sql.py: empty_temp(): In debug_temp mode, leave temp tables there for debugging: Aaron Marcuse-Kubitza
06:06 PM Revision 2971: sql.py: empty_temp(): Don't output at log_level 2 because it's an internal query, not part of the core algorithm: Aaron Marcuse-Kubitza
06:06 PM Revision 2970: sql.py: truncate(): Added kw_args to pass to run_query(): Aaron Marcuse-Kubitza
05:52 PM Revision 2969: Added inputs/import.stats.xls, which compares row-based and column-based import. This shows that column-based import is slowed down by table locking when run simultaneously, so we will need a new INSERT IGNORE replacement that doesn't lock tables.: Aaron Marcuse-Kubitza
03:14 PM Revision 2968: inputs: Ignore OpenOffice.org lock files: Aaron Marcuse-Kubitza
02:19 PM Revision 2967: sql.py: empty_temp(): Don't print log message if not emptying any tables: Aaron Marcuse-Kubitza
02:16 PM Revision 2966: db_xml.py: put_table(): Empty unneeded temp tables to free up memory: Aaron Marcuse-Kubitza
02:14 PM Revision 2965: sql.py: Added empty_temp(): Aaron Marcuse-Kubitza
02:14 PM Revision 2964: sql.py: Use new lists.mk_seq(): Aaron Marcuse-Kubitza
02:13 PM Revision 2963: lists.py: Added mk_seq(): Aaron Marcuse-Kubitza
02:11 PM Revision 2962: lists.py: is_seq(): Also return true for sets: Aaron Marcuse-Kubitza

06/19/2012

03:02 PM Revision 2961: schemas/vegbien.sql: specimenreplicate: Added indexes using COALESCE() to match what sql_gen does: Aaron Marcuse-Kubitza
02:08 PM Revision 2960: sql.py: put_table(): Getting output table pkeys of existing/inserted rows: Do a DISTINCT ON the input pkey (row_num) in case the plain JOIN matched multiple output table rows for one input table row: Aaron Marcuse-Kubitza
01:44 PM Revision 2959: sql.py: put_table(): Empty unneeded temp tables to free up memory and avoid running out of memory (the temp tables seem to be in-memory only): Aaron Marcuse-Kubitza
01:33 PM Task #378: create automated feedback mechanism: Errors tables are part of this. Aaron Marcuse-Kubitza
01:32 PM Task #426: Reload DB using column-based import: Fixing bugs in the import related to importing large amounts of data at once. Aaron Marcuse-Kubitza
01:31 PM Task #425 (Resolved): Reimplement row-based logging mechanism for column-based import: Aaron Marcuse-Kubitza
01:30 PM Revision 2958: sql_gen.py: null_sentinels: Added value for type timestamp with time zone. Put each type on its own line for clarity.: Aaron Marcuse-Kubitza
01:03 PM Revision 2957: schemas/vegbien.sql: locationdetermination: Changed indexes to use COALESCE() to match what sql_gen now does: Aaron Marcuse-Kubitza
12:23 PM Revision 2956: schemas/vegbien.sql: location: Added indexes using COALESCE() to match what sql_gen does: Aaron Marcuse-Kubitza
12:06 PM Revision 2955: sql.py: cast_temp_col(): Add an index on the created column: Aaron Marcuse-Kubitza
11:55 AM Revision 2954: sql_gen.py: null_sentinels: Added value for type double precision: Aaron Marcuse-Kubitza
11:52 AM Revision 2953: sql_gen.py: ensure_not_null(): Warn of no null sentinel for type, even if caller catches error: Aaron Marcuse-Kubitza
10:11 AM Revision 2952: schemas/py_functions.sql: Added plain function _namePart() and use it in trigger function _namePart(): Aaron Marcuse-Kubitza
09:42 AM Revision 2951: schemas/py_functions.sql: Added plain functions _dateRangeStart() and _dateRangeEnd() and use them in trigger functions _dateRangeStart() and _dateRangeEnd(): Aaron Marcuse-Kubitza
09:28 AM Revision 2950: schemas/functions.sql: _label(): Ensure that label is NOT NULL so it doesn't NULL out the entire string: Aaron Marcuse-Kubitza
09:23 AM Revision 2949: schemas/functions.sql: Added plain function _nullIf() and use it in trigger function _nullIf(): Aaron Marcuse-Kubitza
08:56 AM Revision 2948: sql.py: DbConn.DbCursor._cache_result(): Corrected comment to reflect why different types of queries are cached differently: Aaron Marcuse-Kubitza
08:46 AM Revision 2947: sql.py: add_col(): Catch DuplicateExceptions so that columns that already existed are ignored: Aaron Marcuse-Kubitza
08:43 AM Revision 2946: sql.py: run_query(): DuplicateException: Also match "column already exists" errors: Aaron Marcuse-Kubitza
08:20 AM Revision 2945: sql.py: Merged DuplicateTableException and DuplicateFunctionException into one exception DuplicateException, with a type variable for the type of duplicate item. Added ExceptionWithNameType.: Aaron Marcuse-Kubitza
08:05 AM Revision 2944: schemas/functions.sql: Fixed bug where external function calls needed to be schema-qualified in case functions schema is not in the search_path: Aaron Marcuse-Kubitza
07:59 AM Revision 2943: schemas/functions.sql: Added plain function _label() and use it in trigger function _label(): Aaron Marcuse-Kubitza
07:50 AM Revision 2942: sql.py: put_table(): Support plain SQL functions in addition to relational functions: Aaron Marcuse-Kubitza

06/18/2012

11:08 PM Revision 2941: sql_gen.py: Added NamedArg. FunctionCall: Support named arguments (http://www.postgresql.org/docs/9.0/static/sql-syntax-calling-funcs.html).: Aaron Marcuse-Kubitza
10:54 PM Revision 2940: schemas/functions.sql: Added plain function _merge() and use it in trigger function _merge(): Aaron Marcuse-Kubitza
10:49 PM Revision 2939: schemas/functions.sql: Added plain function _alt() and use it in trigger function _alt(): Aaron Marcuse-Kubitza
10:37 PM Revision 2938: schemas/functions.sql: Removed no longer used ensure_not_null(): Aaron Marcuse-Kubitza
10:22 PM Revision 2937: sql.py: put_table(): MissingCastException: Use cast_temp_col() so that cast will occur before any main insert, which locks the output table and should take as little time as possible: Aaron Marcuse-Kubitza
10:18 PM Revision 2936: sql.py: Added cast_temp_col(): Aaron Marcuse-Kubitza
10:17 PM Revision 2935: sql.py: add_col(): Support additional run_query() kw_args. add_row_num(): Use new add_col().: Aaron Marcuse-Kubitza
10:09 PM Revision 2934: sql.py: Added add_col(): Aaron Marcuse-Kubitza
08:17 PM Revision 2933: sql_gen.py: Col.__str__(): Truncate any table name using concat() to ensure that the full column name is included in the string: Aaron Marcuse-Kubitza
07:59 PM Revision 2932: strings.py, sql_gen.py: Renamed add_suffix() to concat() to reflect that this is a fixed-length replacement for +: Aaron Marcuse-Kubitza
07:49 PM Revision 2931: sql.py: put_table(): Moved MissingCastException to the top of the exceptions list because it's more of a core exception than the others, and will be raised before any rows are even inserted: Aaron Marcuse-Kubitza
06:20 PM Revision 2930: sql.py: DbConn.with_savepoint(): Always release savepoint, because after ROLLBACK TO SAVEPOINT, "The savepoint remains valid and can be rolled back to again" (http://www.postgresql.org/docs/8.3/static/sql-rollback-to.html). Moved `self._savepoint -= 1` to the main try block's new finally block.: Aaron Marcuse-Kubitza
05:59 PM Revision 2929: sql.py: put_table(): Lock output table right before, and in the same nested transaction as, the insert statement that needs lock, so that it is not released in a prior autocommit and is held for as little time as possible: Aaron Marcuse-Kubitza
05:38 PM Revision 2928: db_xml.py: put_table(): Removed no longer needed commit param: Aaron Marcuse-Kubitza
05:16 PM Revision 2927: bin/map: Removed rollback() call before closing the connection because PostgreSQL does this automatically: Aaron Marcuse-Kubitza
05:14 PM Revision 2926: csv2db: Log inserts with log_level=5 so they are not shown for verbosity 4, which is used to see the savepoints and autocommits: Aaron Marcuse-Kubitza
05:13 PM Revision 2925: Removed unnecessary db.db.commit() calls because commits are now done automatically by DbConn's autocommit mode: Aaron Marcuse-Kubitza
04:54 PM Revision 2924: sql.py: DbConn.do_autocommit(): Output the "Autocommitting" debug message with level=4 so that it doesn't clutter up the logging output for normal verbosities: Aaron Marcuse-Kubitza
04:50 PM Revision 2923: DbConn: autocommit mode defaults to True so that all scripts get the benefit of automatic commits: Aaron Marcuse-Kubitza
04:49 PM Revision 2922: input.Makefile: Staging tables: import/install-%: Include the table name in the log file name so that successive tables for the same datasource don't overwrite the same log file: Aaron Marcuse-Kubitza
04:39 PM Revision 2921: sql.py: DbConn: Don't always autocommit in debug_temp mode, because this could cause autocommit mode to be turned on when the user does not expect it: Aaron Marcuse-Kubitza
04:36 PM Revision 2920: bin/map: connect_db(): Autocommit in commit mode to avoid the need for manual commits. This should also reduce the time that table locks are held, to avoid unnecessary contention when multiple processes are trying to insert into the same output table. (The program always uses nested transactions to support rollbacks, so there is no problem autocommitting whenever a top-level nested transaction or top-level query completes.): Aaron Marcuse-Kubitza
04:29 PM Revision 2919: sql_gen.py: Removed TempFunction because that functionality is now provided by DbConn.TempFunction(): Aaron Marcuse-Kubitza
04:28 PM Revision 2918: sql.py: Use new DbConn.TempFunction(): Aaron Marcuse-Kubitza
04:28 PM Revision 2917: sql.py: DbConn: Added TempFunction(): Aaron Marcuse-Kubitza
04:25 PM Revision 2916: sql.py: Use new DbConn.debug_temp config option to control whether temporary objects should instead be permanent: Aaron Marcuse-Kubitza
04:20 PM Revision 2915: sql.py: DbConn: Added config option debug_temp: Aaron Marcuse-Kubitza
04:12 PM Revision 2914: sql.py: function_exists(): Fixed bug where trigger functions needed to be excluded, since they cannot be called directly: Aaron Marcuse-Kubitza
03:49 PM Revision 2913: sql.py: Added function_exists(): Aaron Marcuse-Kubitza
03:49 PM Revision 2912: sql_gen.py: Made Function an alias of Table so that isinstance(..., Function) will always work correctly: Aaron Marcuse-Kubitza
03:45 PM Revision 2911: sql_gen.py: Added as_Function(): Aaron Marcuse-Kubitza

06/15/2012

06:16 AM Revision 2910: sql.py: put_table(): Lock the output table in EXCLUSIVE mode *before* getting its pkey so that an ACCESS SHARE lock is not acquired before EXCLUSIVE (causing a lock upgrade and deadlock). This race condition may not have been previously noticeable because pkey() is cached, so calling it doesn't necessarily execute a query or acquire an ACCESS SHARE lock.: Aaron Marcuse-Kubitza
05:52 AM Revision 2909: sql.py: put_table(): Document that must be run at the *beginning* of a transaction: Aaron Marcuse-Kubitza
05:49 AM Revision 2908: sql.py: put_table(), mk_select(): Switched back to having put_table() acquire the EXCLUSIVE locks, but right at the beginning of the transaction, in order to avoid lock upgrades which cause deadlocks: Aaron Marcuse-Kubitza
05:35 AM Revision 2907: sql.py: with_autocommit(): Only allow turning autocommit on, because the opposite is not meaningful and may conflict with the session-global isolation level: Aaron Marcuse-Kubitza
05:33 AM Revision 2906: sql.py: DbConn: Set the transaction isolation level to READ COMMITTED using set_isolation_level() so that the isolation level affects all transactions in the session, not just the current one: Aaron Marcuse-Kubitza
05:21 AM Revision 2905: sql.py: DbConn: Always set the transaction isolation level to READ COMMITTED so that when a table is locked for update, its contents are frozen at that point rather than earlier. This ensures that no concurrent duplicate keys were inserted between the time the table was snapshotted (at the beginning of the transaction for SERIALIZABLE) and the time it was locked for update.: Aaron Marcuse-Kubitza
05:02 AM Revision 2904: sql.py: put_table(): Removed locking output tables to prevent concurrent duplicate keys because that is now done automatically by mk_select(): Aaron Marcuse-Kubitza
05:01 AM Revision 2903: sql.py: mk_select(): Filtering on no match: Lock the joined table in EXCLUSIVE mode to prevent concurrent duplicate keys when used with INSERT SELECT: Aaron Marcuse-Kubitza
04:59 AM Revision 2902: sql_gen.py: Added underlying_table() and use it in underlying_col(): Aaron Marcuse-Kubitza
04:39 AM Revision 2901: main Makefile: schemas/rotate: Fixed bug where needed to run schemas/public/install, not full schemas/install, after renaming public schema: Aaron Marcuse-Kubitza
04:32 AM Revision 2900: sql.py: put_table(): Lock output tables to prevent concurrent duplicate keys: Aaron Marcuse-Kubitza
04:31 AM Revision 2899: sql.py: Added lock_table(): Aaron Marcuse-Kubitza
03:53 AM Revision 2898: bin/map: connect_db(): Only use autocommit mode if verbosity > 3, to avoid accidentally activating it if you want debug output in normal import mode: Aaron Marcuse-Kubitza
03:45 AM Revision 2897: bin/map: connect_db(): Only use autocommit mode if verbosity > 2, because it causes the intermediate tables to be created as permanent tables, which you don't want unless you're actually debugging (verbosity = 2 is normal for column-based import): Aaron Marcuse-Kubitza
03:25 AM Revision 2896: sql.py: put_table(): remove_all_rows(): Changed log message to "Ignoring all rows" because NULL is not necessarily the pkey value that will be returned for the rows: Aaron Marcuse-Kubitza
03:17 AM Revision 2895: sql.py: put_table(): Don't add index on columns that will have values filtered out, because indexes have already been added on all columns in the iteration's input table by flatten(): Aaron Marcuse-Kubitza
03:12 AM Revision 2894: sql.py: DbConn._db(): Setting serializable isolation level: Always set this (if self.serializable is set), even in autocommit mode, because autocommit mode is implemented by manual commits in the DbConn wrapper object rather than using the underlying connection's autocommit mode (which does not allow setting the isolation level): Aaron Marcuse-Kubitza
03:08 AM Revision 2893: sql.py: DbConn._db(): Setting search_path: Use `SET search_path` and `SHOW search_path` instead of combining the old and new search_paths in SQL itself using `SELECT set_config('search_path', ...)`: Aaron Marcuse-Kubitza
02:47 AM Revision 2892: csv2db: ProgressInputStream: Use default progress message 'Read %d line(s)' because there is not necessarily one CSV row per line, due to embedded newlines: Aaron Marcuse-Kubitza
01:47 AM Revision 2891: input.Makefile: Staging tables: import/install-%: Only output to the log file if log option is non-empty (which it is by default): Aaron Marcuse-Kubitza
01:46 AM Revision 2890: csv2db: Support reinstalling just the errors table using new errors_table_only option: Aaron Marcuse-Kubitza
01:45 AM Revision 2889: sql.py: Added drop_table(): Aaron Marcuse-Kubitza
01:20 AM Revision 2888: schemas/vegbien.sql: method: Changed indexes to use `COALESCE(..., E'\\N')` to match what sql_gen now does: Aaron Marcuse-Kubitza
01:16 AM Revision 2887: schemas/vegbien.sql: specimenreplicate: Added indexes using COALESCE() to match what sql_gen does: Aaron Marcuse-Kubitza
01:12 AM Revision 2886: schemas/vegbien.sql: locationevent: Added indexes using COALESCE() to match what sql_gen does: Aaron Marcuse-Kubitza
12:57 AM Revision 2885: schemas/vegbien.ERD.mwb: Synced with schema: Aaron Marcuse-Kubitza
12:54 AM Revision 2884: schemas/vegbien.sql: party: Changed indexes to use `COALESCE(..., E'\\N')` to match what sql_gen now does: Aaron Marcuse-Kubitza
12:38 AM Revision 2883: Wrap sys.stderr.write() calls in strings.to_raw_str() to avoid UnicodeEncodeErrors when stderr is to a file and the default encoding is ASCII: Aaron Marcuse-Kubitza
12:37 AM Revision 2882: strings.py: Added to_raw_str(): Aaron Marcuse-Kubitza
12:12 AM Revision 2881: bin/map: When logging the row # being processed, add 1 because row # is interally 0-based, but 1-based to the user: Aaron Marcuse-Kubitza
12:05 AM Revision 2880: bin/map: Log the row # being processed with level=1.1 so that the user can see a status report if desired: Aaron Marcuse-Kubitza

06/14/2012

11:35 PM Revision 2879: exc.py: str_(): Fixed bug where UnicodeEncodeError would be raised when msg contains non-ASCII chars, by wrapping e.args[0] in strings.ustr(): Aaron Marcuse-Kubitza
11:23 PM Revision 2878: exc.py: print_ex(): Wrap msg in strings.to_unicode() to try to avoid UnicodeEncodeError when msg contains non-ASCII chars: Aaron Marcuse-Kubitza
11:20 PM Revision 2877: sql.py: create_table(): Don't set pkey.nullable to False because the caller should make sure the pkey has the appropriate type: Aaron Marcuse-Kubitza
11:20 PM Revision 2876: csv2db: Use sql_gen.TypedCol.nullable instead of manually adding 'NOT NULL' to the type. Ensure that pkeys are properly NOT NULL.: Aaron Marcuse-Kubitza
11:15 PM Revision 2875: csv2db: Adding indexes: Create plain indexes using ensure_not_null=False because the indexes will primarily be used by the user to search for specific values, rather than by the mapping script which uses the ensure_not_null: Aaron Marcuse-Kubitza
11:13 PM Revision 2874: sql.py: DbConn.col_info(): Run query with log_level=4 because it gathers information about database structure, and should have the same log_level as other queries that do that: Aaron Marcuse-Kubitza
11:09 PM Revision 2873: csv2db: Adding indexes: Fixed bug where col.to_Col() could not be used because sql.add_index() does not support name-only columns (plain strings are OK, though): Aaron Marcuse-Kubitza
11:00 PM Revision 2872: sql.py: create_table(): has_pkey: Use new TypedCol.constraints to store 'PRIMARY KEY': Aaron Marcuse-Kubitza
10:59 PM Revision 2871: sql_gen.py: TypedCol: Added constraints instance var: Aaron Marcuse-Kubitza
10:38 PM Revision 2870: sql_gen.py: EnsureNotNull: Made coalesce() all uppercase to match how pg_dump spells it: Aaron Marcuse-Kubitza
10:36 PM Revision 2869: schemas/vegbien.sql: namedplace: Fixed bug where parent_id needed to be included in UNIQUE CONSTRAINT (now UNIQUE INDEX), since there can be more than one e.g. city of the same name if they are in different states: Aaron Marcuse-Kubitza
10:32 PM Revision 2868: schemas/vegbien.sql: plantname: Changed indexes to use `COALESCE(..., 2147483647)` to match what sql_gen now does: Aaron Marcuse-Kubitza
10:22 PM Revision 2867: schemas/py_functions.sql: _dateRangeStart, _dateRangeEnd: Changed indexes to use `COALESCE(..., E'\N')` to match what sql_gen now does: Aaron Marcuse-Kubitza
10:19 PM Revision 2866: schemas/py_functions.sql: _namePart: Changed indexes to use `COALESCE(..., E'\N')` to match what sql_gen now does: Aaron Marcuse-Kubitza
10:16 PM Revision 2865: schemas/functions.sql: _nullIf: Changed indexes to use `COALESCE(..., E'\N')` to match what sql_gen now does: Aaron Marcuse-Kubitza
10:15 PM Revision 2864: schemas/functions.sql: _nullIf: Require a non-NULL null-equivalent value: Aaron Marcuse-Kubitza
10:12 PM Revision 2863: schemas/functions.sql: _label: Changed indexes to use `COALESCE(..., E'\N')` to match what sql_gen now does: Aaron Marcuse-Kubitza
10:09 PM Revision 2862: schemas/functions.sql: _label: Require a non-NULL label: Aaron Marcuse-Kubitza
10:04 PM Revision 2861: sql_gen.py: null_sentinels: Removed types where a sentinel doesn't make sense (unknown types, boolean) because types with no sentinel are now handled gracefully by users of ensure_not_null(): Aaron Marcuse-Kubitza
10:00 PM Revision 2860: sql.py: add_index(): ensure_not_null: Handle unknown types gracefully: Aaron Marcuse-Kubitza
09:52 PM Revision 2859: sql_gen.py: MockDb: Added col_info(): Aaron Marcuse-Kubitza
09:48 PM Revision 2858: sql_gen.py: Use as_*() functions where the auto-wrapping was previously done manually: Aaron Marcuse-Kubitza
09:44 PM Revision 2857: sql_gen.py: CompareCond.to_str(): Use ensure_not_null()'s new type_ param to apply same function to both sides but not if the right side is already NOT NULL: Aaron Marcuse-Kubitza
09:26 PM Revision 2856: sql_gen.py: null_sentinels: Added value for character varying type: Aaron Marcuse-Kubitza
08:45 PM Revision 2855: sql_gen.py: ensure_not_null(): Support non-column inputs if type_ is set: Aaron Marcuse-Kubitza
08:32 PM Revision 2854: sql_gen.py: null_sentinels: Added value for USER-DEFINED type: Aaron Marcuse-Kubitza
08:31 PM Revision 2853: sql.py: mk_select(): Joins: Filtering on no match: Use '~=' sql_gen.CompareCond operator so that IS NULL is always used, regardless of the not-null column's nullability: Aaron Marcuse-Kubitza
08:24 PM Revision 2852: sql_gen.py: null_sentinels: Added value for boolean type: Aaron Marcuse-Kubitza
08:21 PM Revision 2851: sql_gen.py: ensure_not_null(): Added type_ param to override the underlying column's type: Aaron Marcuse-Kubitza
08:09 PM Revision 2850: sql_gen.py: EnsureNotNull: Take a type param instead of a null param so that the EnsureNotNull object stores the underlying column's type: Aaron Marcuse-Kubitza
08:00 PM Revision 2849: sql_gen.py: underlying_col(): Support non-Col inputs: Aaron Marcuse-Kubitza
07:42 PM Revision 2848: sql_gen.py: EnsureNotNull: Removed default value for null param to remind user that default value depends on value's type and will not always be a string: Aaron Marcuse-Kubitza
07:39 PM Revision 2847: sql.py: add_index(): Added ensure_not_null param to disable the ensure_not_null functionality to force a plain index: Aaron Marcuse-Kubitza
07:14 PM Revision 2846: sql.py: flatten(): Add indexes on the created table so its columns can be used in an O(n) merge join: Aaron Marcuse-Kubitza
07:12 PM Revision 2845: sql_gen.py: null_sentinels: Added value for integer type: Aaron Marcuse-Kubitza
06:52 PM Revision 2844: sql_gen.py: CompareCond.to_str(): Always wrap the left-side column if it's nullable. Wrap the right-side value if the left side was wrapped, rather than if both the left and right side are nullable. This causes coalesce() indexes to be used to look up NULL values using the value NULL gets coalesced to, rather than doing a sequential scan.: Aaron Marcuse-Kubitza
06:43 PM Revision 2843: sql_gen.py: Run truncate() on all identifiers so that literal-string-based lookups for an identifier (such as in db.col_info()) don't use the untruncated value: Aaron Marcuse-Kubitza
06:40 PM Revision 2842: sql_gen.py: Added truncate(): Aaron Marcuse-Kubitza
06:40 PM Revision 2841: sql.py: put_table(): Resolving default value column: Fixed bug where default value of None was used as a key for mapping, even though this is an invalid Col name: Aaron Marcuse-Kubitza
05:30 PM Revision 2840: sql_gen.py: ensure_not_null(): If input column cannot be ensured to be NULL, pass any raised exception through rather than suppressing it and leaving the column in a nullable state: Aaron Marcuse-Kubitza

06/13/2012

11:11 PM Revision 2839: schemas/functions.sql: _merge: Changed indexes to use `COALESCE(..., E'\\N')` to match what sql_gen now does: Aaron Marcuse-Kubitza
11:08 PM Revision 2838: schemas/functions.sql: _alt: Changed indexes to use `COALESCE(..., E'\\N')` to match what sql_gen now does: Aaron Marcuse-Kubitza
10:59 PM Revision 2837: sql_gen.py: CompareCond.to_str(): Handle nullable columns using ensure_not_null(): Aaron Marcuse-Kubitza
10:46 PM Revision 2836: sql_gen.py: ensure_not_null(): Raise NoUnderlyingTableException if can't ensure not null for that reason: Aaron Marcuse-Kubitza
10:20 PM Revision 2835: sql_gen.py: is_underlying_table(): Support non-Table inputs: Aaron Marcuse-Kubitza
10:15 PM Revision 2834: sql_gen.py: NamedValues: Call set_cols_table() with the created table, not just the name, so that is_underlying_table() works properly: Aaron Marcuse-Kubitza
09:49 PM Revision 2833: sql_gen.py: underlying_col(): If no underlying table, raise NoUnderlyingTableException: Aaron Marcuse-Kubitza
09:45 PM Revision 2832: sql_gen.py: Added is_underlying_table(): Aaron Marcuse-Kubitza
09:34 PM Revision 2831: sql_gen.py: ensure_not_null(): Call underlying_col() on the column to remove all renamings: Aaron Marcuse-Kubitza
09:33 PM Revision 2830: sql_gen.py: Added underlying_col(): Aaron Marcuse-Kubitza
09:24 PM Revision 2829: sql_gen.py: Join.to_str(): join(): Removed no longer needed `*_table = *_table.to_Table()`: Aaron Marcuse-Kubitza
09:22 PM Revision 2828: sql_gen.py: Col: Support Table objects that are not just names, by calling `.to_Table()` on the table before stringifying it: Aaron Marcuse-Kubitza
09:22 PM Revision 2827: sql_gen.py: Col: Support Table objects that are not just names, by calling `.to_Table()` on the table before stringifying it: Aaron Marcuse-Kubitza
08:59 PM Revision 2826: sql_gen.py: ensure_not_null(): Added ignore_unknown_type param: Aaron Marcuse-Kubitza
08:55 PM Revision 2825: sql_gen.py: CompareCond.to_str(): Put handling nullable columns as a separate step so it can be expanded: Aaron Marcuse-Kubitza
08:36 PM Revision 2824: csv2db: Errors table: Removed no longer needed sql_gen.EnsureNotNull() because this is now added automatically: Aaron Marcuse-Kubitza
08:35 PM Revision 2823: sql.py: add_index(): Handle nullable columns by using sql_gen.ensure_not_null() to automatically add a coalesce() wrapper where needed: Aaron Marcuse-Kubitza
08:33 PM Revision 2822: sql_gen.py: Added ensure_not_null(): Aaron Marcuse-Kubitza
08:29 PM Revision 2821: sql.py: DbConn.col_info(): Fixed bug where is_nullable needed to be cast to a boolean: Aaron Marcuse-Kubitza
08:06 PM Revision 2820: sql.py: cast(): Support string column name inputs: Aaron Marcuse-Kubitza
07:12 PM Revision 2819: sql.py: DbConn: Renamed col_default() to col_info() and have it return a sql_gen.TypedCol object containing all the TypedCol info about the column, not just the default value: Aaron Marcuse-Kubitza
06:55 PM Revision 2818: sql_gen.py: TypedCol: Added default and nullable params: Aaron Marcuse-Kubitza
06:53 PM Revision 2817: dicts.py: Import util after items that util depends on have been defined, to avoid unsatisfied circular dependency: Aaron Marcuse-Kubitza
06:30 PM Revision 2816: sql.py: DbConn.col_default(): Pass the connection to sql_gen.as_Code() so it fixes the syntax on values returned by PostgreSQL: Aaron Marcuse-Kubitza
06:29 PM Revision 2815: sql_gen.py: as_Code(): Added optional db param, which causes the function to run db.std_code() on the value to fix the syntax: Aaron Marcuse-Kubitza
06:27 PM Revision 2814: sql.py: DbConn: Added std_code(): Aaron Marcuse-Kubitza
03:02 PM Revision 2813: db_xml.py: Removed into_table_name() because this functionality is now handled by sql.into_table_name(): Aaron Marcuse-Kubitza
02:59 PM Revision 2812: sql.py: into_table_name(): Also parse hierarchical tables (mappings with a rank column) using a special syntax: Aaron Marcuse-Kubitza
02:44 PM Revision 2811: sql.py: put_table(): Fixed bug where distinct_on included columns that were not in the input table, and were thus incorrectly taken from the LEFT JOINed output table: Aaron Marcuse-Kubitza

06/12/2012

10:32 PM Revision 2810: sql.py: track_data_error(): Do nothing if cols are empty, because mk_track_data_error() requires at least one col. mk_track_data_error(): Assert that cols are not empty because VALUES clause requires at least one row.: Aaron Marcuse-Kubitza
10:26 PM Revision 2809: bin/map: by_col: Pass on_error to db_xml.put_table() that calls ex_tracker.track(): Aaron Marcuse-Kubitza
10:14 PM Revision 2808: sql.py: put_table(): No handler for exception: Pass exception to on_error() instead of raising a warning, so that error message can be formatted: Aaron Marcuse-Kubitza
10:08 PM Revision 2807: db_xml.py: put_table(): Pass on_error to sql.put_table(): Aaron Marcuse-Kubitza
10:07 PM Revision 2806: db_xml.py: put_table(): Take on_error param like row-based put(): Aaron Marcuse-Kubitza
10:06 PM Revision 2805: sql.py: put_table(): Take on_error param: Aaron Marcuse-Kubitza
09:48 PM Revision 2804: sql.py: get_cur_query(): Removed no longer used input_params parameter: Aaron Marcuse-Kubitza
09:46 PM Revision 2803: sql.py: Removed unused mogrify(): Aaron Marcuse-Kubitza
09:42 PM Revision 2802: sql.py: DbConn.DbCursor.execute(): Removed no longer used params parameter: Aaron Marcuse-Kubitza
09:32 PM Revision 2801: sql.py: with_autocommit(): Use isolation_level attr and set_isolation_level() method of connection instead of autocommit attr to support older versions of psycopg2: Aaron Marcuse-Kubitza
09:18 PM Revision 2800: sql.py: DbConn.DbCursor.execute(): Only fetch all rows for empty SELECT query, to support older versions of Python that would give a "no results to fetch" error for other types of queries: Aaron Marcuse-Kubitza
09:13 PM Revision 2799: csv2db: When reraising exception, use `raise` instead of `raise e` to preserve whole stack trace: Aaron Marcuse-Kubitza
09:12 PM Revision 2798: sql.py: Removed no longer used _query_lookup(): Aaron Marcuse-Kubitza
09:05 PM Revision 2797: sql.py: DbConn: Cache queries without params, as params are no longer used: Aaron Marcuse-Kubitza
09:03 PM Revision 2796: sql.py: DbConn.is_cached(): Removed no longer used params parameter: Aaron Marcuse-Kubitza
09:01 PM Revision 2795: sql.py: Removed no longer used run_raw_query(): Aaron Marcuse-Kubitza
09:00 PM Revision 2794: sql.py: run_query(): Call db.run_query() directly instead of via run_raw_query(): Aaron Marcuse-Kubitza
08:56 PM Revision 2793: sql.py: DbConn.run_query(): Removed no longer used params parameter: Aaron Marcuse-Kubitza
08:50 PM Revision 2792: sql.py: DbConn._db(): Setting search_path: Use esc_value() instead of params: Aaron Marcuse-Kubitza
08:43 PM Revision 2791: sql.py: run_query(): Removed no longer used params parameter: Aaron Marcuse-Kubitza
08:37 PM Revision 2790: sql.py: run_query_into(): Moved main case (into != None) outside of if statement because the special-case if statement contains `return`: Aaron Marcuse-Kubitza
08:35 PM Revision 2789: sql.py: run_query_into(): Removed no longer used params parameter: Aaron Marcuse-Kubitza
08:32 PM Revision 2788: sql.py: mk_insert_select(): Removed no longer used params parameter: Aaron Marcuse-Kubitza
08:25 PM Revision 2787: sql.py: mk_insert_select(): Return just the query instead of the query plus empty params: Aaron Marcuse-Kubitza
08:22 PM Revision 2786: sql.py: mk_select(): Return just the query instead of the query plus empty params: Aaron Marcuse-Kubitza
08:12 PM Revision 2785: sql.py: tables(): Use select() instead of a custom run_query() to avoid using params, which will be deprecated to make it easier to support old versions of Python: Aaron Marcuse-Kubitza
08:11 PM Revision 2784: sql.py: DbConn.DbCursor.execute(): Require that params are empty, to ensure that code uses db.esc_value() instead. This keeps literal values in the same place as the rest of the query, so that they do not need to be maintained and passed around separately in a params list.: Aaron Marcuse-Kubitza
08:06 PM Revision 2783: sql.py: constraint_cols(): Use db.esc_value() instead of params: Aaron Marcuse-Kubitza
08:05 PM Revision 2782: sql.py: index_cols(): Use db.esc_value() instead of params: Aaron Marcuse-Kubitza
08:00 PM Revision 2781: sql.py: add_pkey(): Use simpler `ADD PRIMARY KEY` syntax to avoid having to create a name for the primary key: Aaron Marcuse-Kubitza
07:57 PM Revision 2780: db_xml.py: put_table(): Subsetting in_table: Add pkey to created temp table to facilitate joining it with intermediate tables: Aaron Marcuse-Kubitza
07:47 PM Revision 2779: sql.py: tables(): Use select() instead of a custom run_query() to avoid using params, which will be deprecated to make it easier to support old versions of Python: Aaron Marcuse-Kubitza
07:14 PM Revision 2778: schemas/postgresql.nimoy.conf: shared_buffers: Fixed syntax error where decimals were not supported: Aaron Marcuse-Kubitza
07:03 PM Revision 2777: sql.py: truncate(): Re-added support for string tables using sql_gen.as_Table(). This fixes empty_db(), which relied on this functionality.: Aaron Marcuse-Kubitza
07:01 PM Revision 2776: sql_gen.py: as_Table(): Added schema param to use as default schema: Aaron Marcuse-Kubitza
06:54 PM Revision 2775: inputs/SALVIAS: Switched to using CSV exports of the DB, so that staging tables could be created for column-based import: Aaron Marcuse-Kubitza
06:48 PM Revision 2774: sql.py: run_query_into(): Added add_indexes_ param which causes the function to add indexes on the created table: Aaron Marcuse-Kubitza
06:18 PM Revision 2773: sql.py: create_table(): Use new add_indexes(): Aaron Marcuse-Kubitza
06:15 PM Revision 2772: sql.py: Added add_indexes(): Aaron Marcuse-Kubitza
05:38 PM Revision 2771: sql.py: get_cur_query(): Fixed bug where strings.ustr() needed to be used instead of str() when ensuring that get_cur_query() returns a string: Aaron Marcuse-Kubitza
05:35 PM Revision 2770: sql.py: cast(): Removed conditional checks for save_errors, since it's now always true if the function got passed the `not save_errors` special case: Aaron Marcuse-Kubitza
05:32 PM Revision 2769: sql.py: cast(): Only convert errors to warnings if errors will be saved in errors_table, so that import will always be aborted if user supplied invalid values in the mappings, even if these values are passed through a relational function: Aaron Marcuse-Kubitza
04:58 PM Revision 2768: sql.py: put_table(): Support inserting tables with all default values, by providing the pkey's default value for all rows so that the SELECT query has at least one column: Aaron Marcuse-Kubitza
04:57 PM Revision 2767: sql_gen.py: is_table_col(): Check that input is a Col object: Aaron Marcuse-Kubitza
04:31 PM Revision 2766: sql.py: put_table(): Assert that mapping is non-empty: Aaron Marcuse-Kubitza
04:23 PM Revision 2765: sql.py: mk_select(): Assert that fields list is non-empty: Aaron Marcuse-Kubitza
04:18 PM Revision 2764: sql.py: DbConn.DbCursor.execute(): Set _is_insert only if query *starts with* INSERT, so that function definitions containing INSERT are not cached as INSERT statements (exceptions only) themselves: Aaron Marcuse-Kubitza
04:10 PM Revision 2763: sql.py: DbConn.DbCursor.execute(): Fixed bug where params == None would apparently turn off the mogrifier completely, causing "%"s to be excessively escaped, by just setting params to None if it was [] or () and not using strings.esc_for_mogrify() at all: Aaron Marcuse-Kubitza
03:51 PM Revision 2762: sql.py: DbConn.DbCursor.execute(): If not using params, escape the query using strings.esc_for_mogrify() in case any literals contained "%"s: Aaron Marcuse-Kubitza
03:49 PM Revision 2761: strings.py: Added esc_for_mogrify(): Aaron Marcuse-Kubitza
03:05 PM Revision 2760: sql.py: create_table(): Add indexes on all non-pkey columns, unless turned off or deferred using new param col_indexes: Aaron Marcuse-Kubitza
02:46 PM Revision 2759: csv2db: Add column indexes on errors table. Use typed_cols and `.to_Col()` to iterate over columns to add indexes on, for the main and errors tables.: Aaron Marcuse-Kubitza

06/11/2012

11:13 PM Revision 2758: sql.py: Added track_data_error(). put_table(): ignore(): Take extra e param for the exception. Use track_data_error() to store the invalid value in the errors table.: Aaron Marcuse-Kubitza
11:11 PM Revision 2757: sql_gen.py: Join.to_str(): Add newline before and after right table if it's been renamed (and therefore takes up multiple lines): Aaron Marcuse-Kubitza
11:10 PM Revision 2756: exc.py: ExceptionWithCause: Store the cause in an instance variable for later use: Aaron Marcuse-Kubitza
10:47 PM Revision 2755: sql.py: mk_track_data_error(): Rename the errors_table to make the generated SQL less verbose: Aaron Marcuse-Kubitza
10:42 PM Revision 2754: sql.py: mk_insert_select(): Run sql_gen.remove_table_rename() on table to get just the actual name in the DB: Aaron Marcuse-Kubitza
10:42 PM Revision 2753: sql_gen.py: Added remove_table_rename(): Aaron Marcuse-Kubitza
10:40 PM Revision 2752: sql_gen.py: Col: Run `.to_Table()` on table to get just the reference to the table, not any SQL code that defines it: Aaron Marcuse-Kubitza
10:10 PM Revision 2751: sql.py: Added mk_track_data_error() and use it in cast(). This also ensures that if only one source column's row in the CROSS JOIN violates a unique constraint, other source columns' rows are still inserted.: Aaron Marcuse-Kubitza
09:59 PM Revision 2750: sql_gen.py: with_default_table(): Added overwrite param to overwrite the table (if it isn't a NamedCol): Aaron Marcuse-Kubitza
09:57 PM Revision 2749: sql_gen.py: Join.to_str(): join(): Get just the table name of left_table and right_table using `.to_Table()`. Moved order switching of tables inside join() because the order reversing only applies to an individual condition.: Aaron Marcuse-Kubitza
09:34 PM Revision 2748: sql_gen.py: Renamed set_default_table() to with_default_table() and copy col before modifying it so don't modify input: Aaron Marcuse-Kubitza
09:14 PM Revision 2747: sql_gen.py: Added set_default_table(). as_ValueCond(): Use set_default_table() instead of as_Col() so that any name-only column also gets its table set. Join.to_str(): Parse left side using set_default_table() instead of as_Col() so that any name-only column also gets its table set.: Aaron Marcuse-Kubitza
07:31 PM Revision 2746: sql_gen.py: Join: mapping param defaults to {} for e.g. CROSS JOINs. to_str(): Omit join_cond if mapping is empty, rather than if join is a specific type.: Aaron Marcuse-Kubitza
07:24 PM Revision 2745: sql_gen.py: NamedValues: Change cols to Col objects with the table set to `name`: Aaron Marcuse-Kubitza
07:23 PM Revision 2744: sql_gen.py: Added set_cols_table(): Aaron Marcuse-Kubitza
07:00 PM Revision 2743: sql.py: mk_insert_select(): returning: Use sql_gen.to_name_only_col(): Aaron Marcuse-Kubitza
06:52 PM Revision 2742: sql_gen.py: NamedTable: cols: Use sql_gen.Col objects or name strings instead of pre-rendered SQL code: Aaron Marcuse-Kubitza
06:35 PM Revision 2741: sql_gen.py: NamedTable: Wrap nested code in Expr if needed: Aaron Marcuse-Kubitza
06:27 PM Revision 2740: sql_gen.py: Added NamedValues: Aaron Marcuse-Kubitza
06:04 PM Revision 2739: sql_gen.py: Values: Support multiple rows: Aaron Marcuse-Kubitza
05:54 PM Revision 2738: sql.py: insert(): Use new sql_gen.Values: Aaron Marcuse-Kubitza
05:53 PM Revision 2737: sql_gen.py: Added Values and default: Aaron Marcuse-Kubitza
05:26 PM Revision 2736: sql_gen.py: Join.to_str(): Don't add join condition for CROSS JOINs: Aaron Marcuse-Kubitza
05:03 PM Revision 2735: sql.py: put_table(): Factored out errors_table name setting so it can be used by ignore(): Aaron Marcuse-Kubitza
04:38 PM Revision 2734: bin/map: If doing full import, clear errors table: Aaron Marcuse-Kubitza
04:37 PM Revision 2733: sql.py: truncate(): Support sql_gen.Table objects: Aaron Marcuse-Kubitza
04:21 PM Revision 2732: sql.py: Moved truncate() to Database structure queries section: Aaron Marcuse-Kubitza
04:11 PM Revision 2731: sql.py: tables(): Run query with log_level=4 because it's a low-level structure-determining query: Aaron Marcuse-Kubitza
04:07 PM Revision 2730: sql.py: table_exists(): Use new tables() exact param so that LIKE special chars in table name are not interpreted specially: Aaron Marcuse-Kubitza
04:07 PM Revision 2729: sql.py: tables(): Added exact param to check for exact matches only: Aaron Marcuse-Kubitza
04:03 PM Revision 2728: sql.py: put_table(): MissingCastException: Use new errors_table(): Aaron Marcuse-Kubitza
04:02 PM Revision 2727: csv2db: Use new sql.errors_table(): Aaron Marcuse-Kubitza
04:02 PM Revision 2726: sql.py: Added table_exists() and errors_table(): Aaron Marcuse-Kubitza

06/08/2012

09:59 PM Revision 2725: sql.py: DbConn.print_notices(): Fixed bug where it should not do anything for a MySQL connection, because that doesn't store notices the way Postgres does: Aaron Marcuse-Kubitza
09:54 PM Revision 2724: sql.py: put_table(): MissingCastException: Debug message: Added Redmine formatting: Aaron Marcuse-Kubitza
09:48 PM Revision 2723: schemas/functions.sql, vegbien.sql: Removed no longer needed cast functions, which are now created on the fly by column-based import: Aaron Marcuse-Kubitza
09:45 PM Revision 2722: schemas/functions.sql: _nullIf(): Ignore uncastable value, because a value that's invalid for the given type is still well-defined as not matching the nullif() criterion: Aaron Marcuse-Kubitza
09:25 PM Revision 2721: sql.py: put_table(): MissingCastException: Debug message: Removed "'s" so it wouldn't mess up syntax highlighting when pasting debug output into a SQL file: Aaron Marcuse-Kubitza
09:08 PM Revision 2720: sql.py: cast(): Made errors table also store SQLSTATE in error_code column: Aaron Marcuse-Kubitza
08:40 PM Revision 2719: sql.py: cast(): Documented that the value and error are inserted for *each* source column (hence the CROSS JOIN): Aaron Marcuse-Kubitza
08:39 PM Revision 2718: sql.py: cast(): Version the function name if using an errors table, to avoid collisions with other cast functions when the function name is truncated (or, more rarely, collisions with casts to the same type and on the same input columns but of a different table): Aaron Marcuse-Kubitza
08:25 PM Revision 2717: sql.py: cast(): function_name: Fixed bug where sql_gen.FunctionCall()'s parameters needed to be passed with *args syntax: Aaron Marcuse-Kubitza
08:23 PM Revision 2716: sql.py: mk_flatten_mapping(): Propagate the original columns' sources to the flattened columns so they won't be lost in the flattening: Aaron Marcuse-Kubitza
08:20 PM Revision 2715: sql.py: put_table(): MissingCastException: Use in_tables0's source table to locate the errors table in case in_tables0 has been subset into a temp table (which removes the schema name): Aaron Marcuse-Kubitza
08:19 PM Revision 2714: db_xml.py: put_table(): Track in_table's source so its original schema can be obtained and auxiliary tables located: Aaron Marcuse-Kubitza
08:17 PM Revision 2713: sql_gen.py: Derived.set_srcs(): Added optional overwrite param so that a default srcs value can be set only if one isn't already set: Aaron Marcuse-Kubitza
08:04 PM Revision 2712: sql_gen.py: Made Table a Derived element so that a row-subset temp table could retain the schema of the table it came from, and any auxiliary tables in that schema could be located properly: Aaron Marcuse-Kubitza
08:01 PM Revision 2711: sql_gen.py: Moved srcs-related functionality from Col to new superclass Derived: Aaron Marcuse-Kubitza
07:53 PM Revision 2710: sql.py: cast(): save_errors: Fixed bug where srcs needed to have their names extracted before being wrapped in sql_gen.Literals. Fixed bug where errors table INSERT needed to prefix the CROSS JOIN-ed VALUES statements with SELECT * FROM because the CROSS JOIN makes it a whole SELECT query, not just a VALUES statement.: Aaron Marcuse-Kubitza
07:40 PM Revision 2709: schemas/vegbien.ERD.mwb: Adjusted lines to make less intersections happen on the first page: Aaron Marcuse-Kubitza
07:27 PM Revision 2708: sql.py: put_table(): MissingCastException: Fixed bug where errors_table needed to have the same schema as in_tables0 as well as part of the same name: Aaron Marcuse-Kubitza
07:23 PM Revision 2707: sql_gen.py: Added suffixed_table(): Aaron Marcuse-Kubitza
07:14 PM Revision 2706: sql.py: MissingCastException: Print log message that it's casting the column, to introduce the SQL function definition that follows: Aaron Marcuse-Kubitza
07:06 PM Revision 2705: sql.py: put_table(): MissingCastException: Use new cast() instead of relying on existing cast functions in the database: Aaron Marcuse-Kubitza
07:04 PM Revision 2704: sql.py: Added cast(): Aaron Marcuse-Kubitza
06:00 PM Revision 2703: sql_gen.py: Added wrap() and use it in wrap_in_func(): Aaron Marcuse-Kubitza
05:36 PM Revision 2702: sql.py: put_table(): Track the input column(s) a column is derived from, so that error messages can be attributes to the proper input column(s): Aaron Marcuse-Kubitza
05:35 PM Revision 2701: sql_gen.py: Col: Support tracking the column(s) a column is derived from, so that error messages can be attributes to the proper input column(s): Aaron Marcuse-Kubitza
05:33 PM Revision 2700: objects.py: BasicObject: Allow subclasses to customize which attrs are compared on, by adding _compare_on() method: Aaron Marcuse-Kubitza
04:45 PM Revision 2699: lists.py: uniqify(): Document that it will work on any iterable, not just lists: Aaron Marcuse-Kubitza
04:06 PM Revision 2698: sql.py: mk_insert_select(): embeddable: Use sql_gen.TempFunction and sql_gen.FunctionCall: Aaron Marcuse-Kubitza
04:04 PM Revision 2697: sql_gen.py: Added TempFunction: Aaron Marcuse-Kubitza
03:26 PM Revision 2696: schemas/functions.sql, vegbien.sql: Cast functions: Removed `RETURN new;` at end (artifact of when they were relational functions). Made the EXCEPTION block the main block of the function, to avoid unnecessary nesting.: Aaron Marcuse-Kubitza
02:26 PM Revision 2695: csv2db: Errors table: index_cols: Remove no longer needed sql_gen.Col() (now done by EnsureNotNull): Aaron Marcuse-Kubitza
02:25 PM Revision 2694: sql_gen.py: EnsureNotNull: Run value through as_Col() so FunctionCall won't default it to a Literal: Aaron Marcuse-Kubitza
02:19 PM Revision 2693: csv2db: Use sql_gen.EnsureNotNull instead of the ensure_not_null() function in the functions schema to avoid a dependency on the functions schema, which would cause the UNIQUE index to be dropped whenever the functions schema is reinstalled: Aaron Marcuse-Kubitza
02:18 PM Revision 2692: sql_gen.py: Added EnsureNotNull: Aaron Marcuse-Kubitza
02:17 PM Revision 2691: sql_gen.py: Added InternalFunction: Aaron Marcuse-Kubitza
02:08 PM Revision 2690: sql_gen.py: FunctionCall: Ensure all args are Code objects using as_Value(): Aaron Marcuse-Kubitza
01:51 PM Revision 2689: csv2db: Errors table: Add UNIQUE index on all columns: Aaron Marcuse-Kubitza
01:50 PM Revision 2688: sql.py: add_index(): Support multiple column(s) or expression(s). Support separate table (not extracted from cols). Support UNIQUE indexes.: Aaron Marcuse-Kubitza

06/07/2012

09:41 PM Revision 2687: sql.py: add_index(): Fixed bug where expr needed to be deep copied so that any column nested in it (e.g. inside a FunctionCall) wouldn't be modified when col.table is set to None: Aaron Marcuse-Kubitza
09:35 PM Revision 2686: sql.py: add_pkey(): Support multiple, custom columns: Aaron Marcuse-Kubitza
09:24 PM Revision 2685: csv2db: Vacuum the created table: Aaron Marcuse-Kubitza
09:23 PM Revision 2684: sql.py: Added vacuum(): Aaron Marcuse-Kubitza
09:23 PM Revision 2683: sql.py: DbConn: Added with_autocommit(): Aaron Marcuse-Kubitza
08:58 PM Revision 2682: csv2db: Create errors table for use by column-based import: Aaron Marcuse-Kubitza
08:57 PM Revision 2681: sql.py: create_table(): Added has_pkey param to disable making the first column the primary key: Aaron Marcuse-Kubitza
08:21 PM Revision 2680: csv2db: Use verbosity-based logging like bin/map. Use sql.create_table(). Add indexes on the columns to speed up column-based import and to speed up searching the table for particular values.: Aaron Marcuse-Kubitza
08:00 PM Revision 2679: sql.py: create_table(): Don't add indexes on columns, because that shouldn't happen until after the table's rows have been inserted: Aaron Marcuse-Kubitza
07:55 PM Revision 2678: sql.py: DbConn._db(): Output 'SET TRANSACTION ISOLATION LEVEL SERIALIZABLE' with log_level=4 because that should not be shown when the search_path is shown, which has log_level=3: Aaron Marcuse-Kubitza
07:43 PM Revision 2677: sql.py: cleanup_table(): Use update(), which also fixes some formatting bugs: Aaron Marcuse-Kubitza
07:24 PM Revision 2676: sql.py: DbConn._db(): Output connection configuration statements with log_level=3: Aaron Marcuse-Kubitza
07:15 PM Revision 2675: sql.py: Added create_table(): Aaron Marcuse-Kubitza
07:14 PM Revision 2674: sql_gen.py: Added TypedCol: Aaron Marcuse-Kubitza
06:59 PM Revision 2673: sql.py: insert_select(): Pass log_level to run_query_into(): Aaron Marcuse-Kubitza
05:15 PM Revision 2672: streams.py: LineCountInputStream: Fixed bug where EOF was incorrectly considered a line, causing the final line count (used by ProgressInputStream) to be off by one: Aaron Marcuse-Kubitza
04:25 PM Revision 2671: sql.py: DbConn: Added print_notices() and call it after running a query: Aaron Marcuse-Kubitza
04:12 PM Revision 2670: dicts.py: Added IdCompared: Aaron Marcuse-Kubitza
03:50 PM Revision 2669: lists.py: Added clear(): Aaron Marcuse-Kubitza
03:24 PM Revision 2668: sql.py: DbConn.do_autocommit(): Fixed typo in 'Autocommiting' debug message: Aaron Marcuse-Kubitza
03:20 PM Revision 2667: sql_gen.py: ColDict: Extend dicts.DictProxy instead of UserDict.DictMixin because that already provides some of the functionality needed by ColDict: Aaron Marcuse-Kubitza

06/06/2012

09:46 PM Revision 2666: sql.py: run_query(): Only pass non-None debug_msg_ref to DbConn.run_query() if filtering with log_ignore_excs or can't mogrify() (and therefore can't print queries before they're run): Aaron Marcuse-Kubitza
09:36 PM Revision 2665: sql.py: DbConn: Added can_mogrify() and use it in mogrify(): Aaron Marcuse-Kubitza
09:35 PM Revision 2664: sql.py: DbConn.run_query(): Log query before running if no debug_msg_ref specified. Documented debug_msg_ref param.: Aaron Marcuse-Kubitza
08:43 PM Revision 2663: sql.py: DbConn: Added mogrify() and use it in esc_value(): Aaron Marcuse-Kubitza
07:58 PM Revision 2662: schemas/functions.sql: _nullIf(): Fixed bug where wrong var name was used to retrieve type param. Reenabled _nullIf now that it's working.: Aaron Marcuse-Kubitza
07:25 PM Revision 2661: sql_gen.py: ColDict.__setitem__(): Translate NULL values to the appropriate default value of the key column (which is often NULL, but not always) so that select query filtering/joins work correctly: Aaron Marcuse-Kubitza
07:21 PM Revision 2660: sql.py: DbConn.col_default(): Fixed bug where returned string needed to be wrapped in sql_gen.as_Code() because it's sometimes a SQL expression and sometimes NULL: Aaron Marcuse-Kubitza
07:20 PM Revision 2659: sql_gen.py: Added as_Code(). Split SQL code objects into separate sections so unparameterized classes would be separate from general classes.: Aaron Marcuse-Kubitza
07:08 PM Revision 2658: sql_gen.py: Code.to_str(): Fixed bug where needed to raise NotImplementedError, not NotImplemented: Aaron Marcuse-Kubitza
06:58 PM Revision 2657: dicts.py: is_dict(): Fixed bug where lists also have __getitem__() methods, so keys() was checked for instead: Aaron Marcuse-Kubitza
06:44 PM Revision 2656: sql.py: put_table(): Fixed bug where mapping was still a plain dict because that's what dicts.join() returns, by moving the wrapping of it in a sql_gen.ColDict after dicts.join(): Aaron Marcuse-Kubitza
06:28 PM Revision 2655: sql_gen.py: ColDict: Fixed bug where copy() needed to be implemented: Aaron Marcuse-Kubitza
06:14 PM Revision 2654: sql.py: into_table_name(): Fixed bug where key needed to be passed through str() now that ColDict uses Col objects for everything: Aaron Marcuse-Kubitza
06:11 PM Revision 2653: sql_gen.py: ColDict: Extend UserDict.DictMixin instead of dict because its non-core function implementations route all inner dict accesses to the core functions __getitem__() and __setitem__(). sql.py: put(): DuplicateKeyException: Wrap util.dict_subset_right_join() in a sql_gen.ColDict because the dict returned by util.dict_subset_right_join() is just a plain dict. (This change must happen at the same time because the previous functionality relied on a bug in ColDict.): Aaron Marcuse-Kubitza
06:04 PM Revision 2652: util.py: DefaultDict: Use dicts.DictProxy instead of collections.defaultdict so that it provides a view of the given dict instead of copying it: Aaron Marcuse-Kubitza
06:03 PM Revision 2651: dicts.py: Added DictProxy: Aaron Marcuse-Kubitza
05:37 PM Revision 2650: sql.py: mk_select(): conds: Support dict-like objects other than dict itself (such as sql_gen.ColDict): Aaron Marcuse-Kubitza
05:16 PM Revision 2649: dicts.py: Added is_dict(): Aaron Marcuse-Kubitza
05:06 PM Revision 2648: sql.py: DbConn.DbCursor.execute(): When setting self.query, provide default values for query, params because some debug printing statements use self.query directly instead of _add_cursor_info(): Aaron Marcuse-Kubitza
05:01 PM Revision 2647: sql.py: insert(): Fixed bug where column remaing needed to be removed and sql_gen.Literal value unwrapped: Aaron Marcuse-Kubitza
03:40 PM Revision 2646: sql.py: mk_insert_select(): cols: Fixed bug where table name needed to be removed from columns with a table: Aaron Marcuse-Kubitza
03:40 PM Revision 2645: sql_gen.py: ColDict: Only allow dict inputs to constructor so update() can later be used. Ensure keys_table is a Table object.: Aaron Marcuse-Kubitza
02:55 PM Revision 2644: sql_gen.py: ColDict: Translate NULL values to the key column's actual default value. This avoids incorrectly comparing against NULL in join/filter conditions when the column's actual default value must be used.: Aaron Marcuse-Kubitza
02:52 PM Revision 2643: sql.py: DbConn: Added col_default(): Aaron Marcuse-Kubitza
02:52 PM Revision 2642: sql_gen.py: ColDict: Take a db param for later use in getting default column values: Aaron Marcuse-Kubitza
02:16 PM Revision 2641: sql_gen.py: ColDict: Take a keys_table param for operations that require the output table. Use this to convert all string keys to sql_gen.Col objects.: Aaron Marcuse-Kubitza
01:55 PM Revision 2640: sql.py: put(): Use a sql_gen.ColDict for row, just like in put_table(), so that put() will benefit from planned enhancements to sql_gen.ColDict, such as translating NULL to the default value of a NOT NULL column: Aaron Marcuse-Kubitza
01:38 PM Revision 2639: schemas/functions.sql: _nullIf: Fixed bug where NOT NULL parameters were not supported, because an input NULL value would not match an existing DEFAULT value in a select query, by temporarily disabling _nullIf until this can be supported. Removed previous workarounds.: Aaron Marcuse-Kubitza

06/05/2012

03:21 PM Revision 2638: bin/map: out_is_db, row-based mode: Disabled using DB relational functions instead of XML functions because they were causing problems: Aaron Marcuse-Kubitza
03:18 PM Revision 2637: sql.py: put(): DuplicateKeyException: Fixed bug where select() call needs to be surrounded by value_or_none(), not value(), because sometimes there is no entry: Aaron Marcuse-Kubitza
03:17 PM Revision 2636: schemas/functions.sql: Added indexes on all relational function parameters since the unique index will not be used for lookups: Aaron Marcuse-Kubitza
02:29 PM Revision 2635: schemas/py_functions.sql: Added _namePart relational function: Aaron Marcuse-Kubitza
01:06 PM Revision 2634: schemas/py_functions.sql: Added _dateRangeEnd relational function: Aaron Marcuse-Kubitza
12:59 PM Revision 2633: main Makefile: schemas/%/uninstall: Removed delete confirmation message because it's repetitive to keep being asked to confirm deletion when running schemas/reinstall. schemas/%/uninstall is not intended to be run by itself, so the public schema's delete confirmation message will protect the entire schemas/uninstall sequence.: Aaron Marcuse-Kubitza
12:55 PM Revision 2632: schemas/py_functions.sql: _dateRangeStart(): Fixed bug where needed to get the first element of the array returned by parse_date_range(): Aaron Marcuse-Kubitza
12:32 PM Revision 2631: main Makefile: VegBIEN DB: Schemas: Changed to only allow reinstalling all schemas at once (using the usual `make schemas/reinstall` command), because reinstalling just one schema would silently remove anything in other schemas that depends on that schema: Aaron Marcuse-Kubitza
11:52 AM Revision 2630: schemas/py_functions.sql: Added _dateRangeStart relational function: Aaron Marcuse-Kubitza
10:53 AM Revision 2629: vegbien_dest: $schemas: Added py_functions: Aaron Marcuse-Kubitza
10:49 AM Revision 2628: pg_dump_vegbien: Fixed bug where needed space before $noOwners options: Aaron Marcuse-Kubitza
10:37 AM Revision 2627: main Makefile: VegBIEN DB: DB and bien user: mk_db: Also `createlang plpythonu`: Aaron Marcuse-Kubitza
10:26 AM Revision 2626: pg_dump_vegbien: Log the pg_dump command being run: Aaron Marcuse-Kubitza
10:23 AM Revision 2625: main Makefile: VegBIEN DB: DB and bien user: Added schemas/py_functions/reset. db: Create py_functions schema.: Aaron Marcuse-Kubitza
10:22 AM Revision 2624: main Makefile: VegBIEN DB: DB and bien user: Added schemas/py_functions/reset. db: Create py_functions schema.: Aaron Marcuse-Kubitza
10:16 AM Revision 2623: schemas/py_functions.sql.make: Fixed bug where owners needed to be included because schema is imported as superuser so that untrusted PL/Python functions can be created: Aaron Marcuse-Kubitza
10:15 AM Revision 2622: pg_dump_vegbien: Support optionally including owners: Aaron Marcuse-Kubitza
09:59 AM Revision 2621: main Makefile: VegBIEN DB: DB and bien user: Factored $(confirmRm<schema>) functions message text out into $(confirmRmSchema) function: Aaron Marcuse-Kubitza
09:52 AM Revision 2620: schemas/Makefile, py_functions.sql.make: Generate py_functions.sql from vegbien's py_functions schema: Aaron Marcuse-Kubitza
09:32 AM Revision 2619: main Makefile: postgres-Linux: Install postgresql-plpython: Aaron Marcuse-Kubitza
09:27 AM Revision 2618: main Makefile: python-Linux, postgres-Linux: Fixed bug where apt-get installs needed to each be run in a separate command, so that if any package was not found, the other packages would still install. (apt-get aborts on the first invalid package name.): Aaron Marcuse-Kubitza
09:18 AM Revision 2617: db_dump_localize: Use new pg_version: Aaron Marcuse-Kubitza
09:18 AM Revision 2616: Added pg_version: Aaron Marcuse-Kubitza
08:05 AM Revision 2615: sql.py: into_table_name(): If relational function has a value argument, don't include other arguments, to save space: Aaron Marcuse-Kubitza
08:03 AM Revision 2614: sql.py: add_pkey(): Version the index name just in case add_suffix() doesn't correctly preserve a needed version #: Aaron Marcuse-Kubitza
08:01 AM Revision 2613: sql_gen.py: add_suffix(): Fixed bug where only strings already at the max length had the version preserved, even though appending the suffix could bring it past the max length and still cause the version to be overwritten. Fixed bug where last # in str, not first, should be considered to precede the version.: Aaron Marcuse-Kubitza
07:46 AM Revision 2612: sql.py: put_table(): mapping param: Fixed documentation of supported key/value types: Aaron Marcuse-Kubitza
07:09 AM Revision 2611: db_xml.py: put_table(): Removed no longer accurate comment about handling _simplifyPath: Aaron Marcuse-Kubitza
07:01 AM Revision 2610: schemas/functions.sql: Added _nullIf relational function: Aaron Marcuse-Kubitza
06:39 AM Revision 2609: sql_gen.py: add_suffix(): Preserve version so that it won't be truncated off the string, leading to collisions: Aaron Marcuse-Kubitza

06/04/2012

03:35 PM Revision 2608: sql_gen.py: identifier_max_len: Fixed bug where PostgreSQL's max length was actually 63, not 64: Aaron Marcuse-Kubitza
03:18 PM Revision 2607: schemas/functions.sql: _label(): Fixed bug where some Python syntax had not been translated to PostgreSQL: Aaron Marcuse-Kubitza
03:07 PM Revision 2606: schemas/functions.sql: Added _label relational function: Aaron Marcuse-Kubitza
03:06 PM Revision 2605: db_xml.py: put_table(): Subsetting in_table: Fixed bug where in_table was not being ordered by the row_num, because order_by was set to None when it should have been omitted so it would default to the pkey: Aaron Marcuse-Kubitza
02:51 PM Revision 2604: csv2db: Increased frequency of "Processed .. row(s)" messages to match slower, more common INSERT case instead of faster, less used COPY FROM case: Aaron Marcuse-Kubitza
02:40 PM Revision 2603: schemas/functions.sql: _merge(): Fixed bug where values were ordered by value instead of by sort order (column name): Aaron Marcuse-Kubitza
02:17 PM Revision 2602: xml_func.py: process(): Refactored to emphasize special handling for row-based and column-based modes. In row-based mode, always use a DB relational function over a local XML function when possible, to faciliate testing of DB relational functions in row-based mode. (The shadowed local XML version will still be tested in non-DB modes, such as outputting to intermediate XML files.): Aaron Marcuse-Kubitza
01:01 PM Revision 2601: bin/map: Move retrieval of out_db's relational functions outside of process_input() so they can also be used by the non-by_col case: Aaron Marcuse-Kubitza
12:52 PM Revision 2600: bin/map: out_is_db: Don't evaluate relational functions in xml_func.process() because these will be evaluated by db_xml.put(): Aaron Marcuse-Kubitza
12:41 PM Revision 2599: xml_func.py: Removed no longer used strip(): Aaron Marcuse-Kubitza
12:40 PM Revision 2598: bin/map: Use xml_func.process(..., strip=True) instead of xml_func.strip(): Aaron Marcuse-Kubitza
12:39 PM Revision 2597: xml_func.py: process(): Added strip()'s functionality via strip option: Aaron Marcuse-Kubitza
12:10 PM Revision 2596: schemas/functions.sql: Added _merge relational function: Aaron Marcuse-Kubitza
11:48 AM Revision 2595: schemas/functions.sql: Added join_strs() aggregate: Aaron Marcuse-Kubitza
10:21 AM Revision 2594: sql.py: Renamed index_pkey() to add_pkey() to be consistent with add_index(): Aaron Marcuse-Kubitza
10:07 AM Revision 2593: sql.py: into_table_name(): In function args, omit column name for function result columns: Aaron Marcuse-Kubitza
09:57 AM Revision 2592: sql.py: into_table_name(): In function args, keep the input table name for input columns to identify where they came from, except for the *main* input table name because it makes the string too long: Aaron Marcuse-Kubitza
09:22 AM Revision 2591: sql_gen.py: esc_name(): Don't return plain name if is_safe_name(), because this makes the SQL inconsistent when some names have "_"s and some don't: Aaron Marcuse-Kubitza
09:17 AM Revision 2590: sql.py: index_pkey(): Use sql_gen.add_suffix() to ensure index name isn't too long: Aaron Marcuse-Kubitza
09:15 AM Revision 2589: sql.py: put_table(): insert_out_pkeys, insert_in_pkeys: Use sql_gen.add_suffix() to ensure name isn't too long: Aaron Marcuse-Kubitza
09:07 AM Revision 2588: sql.py: next_version(): Use new sql_gen.add_suffix(). Removed identifier_max_len because it is now in sql_gen.: Aaron Marcuse-Kubitza
09:07 AM Revision 2587: sql_gen.py: Added identifier_max_len and add_suffix(): Aaron Marcuse-Kubitza
09:04 AM Revision 2586: next_version(): Append the version # so it looks more natural. Take into account the max identifier length.: Aaron Marcuse-Kubitza
09:03 AM Revision 2585: strings.py: Added add_suffix(): Aaron Marcuse-Kubitza
08:51 AM Revision 2584: sql.py: put_table(): Name the in_table just "in" plus the version #, and the insert_in_pkeys/insert_out_pkeys based on in_table, so that they don't take up so much space in the SQL: Aaron Marcuse-Kubitza
08:50 AM Revision 2583: sql_gen.py: is_safe_name(): Fixed bug where keywords were incorrectly considered safe: Aaron Marcuse-Kubitza
08:40 AM Revision 2582: strings.py: repr_no_u(): Fixed bug where "u" prefix was removed even in reprs of non-strings: Aaron Marcuse-Kubitza
08:32 AM Revision 2581: db_xml.py: into_table_name(): Removed no longer necessary handling of simple functions, which is now done by sql.into_table_name(). Ensure that rank params in functions (not tables) are not treated specially as hierarchical.: Aaron Marcuse-Kubitza
08:21 AM Revision 2580: sql.py: put_table(): If into == None: For function calls, include the arguments in the into table name: Aaron Marcuse-Kubitza
08:17 AM Revision 2579: sql_gen.py: to_name_only_col(): Support non-Col Code inputs: Aaron Marcuse-Kubitza
07:42 AM Revision 2578: sql_gen.py CompareCond.to_str(), callers of combine_conds(): Removed unnecessary grouping () to make SQL clearer: Aaron Marcuse-Kubitza
07:31 AM Revision 2577: sql_gen.py: Added combine_conds() and use it in Join.to_str() and sql.py mk_select(): Aaron Marcuse-Kubitza
07:18 AM Revision 2576: sql_gen.py Join.to_str(), sql.py mk_select(): Combining conditions: Don't add newlines where not needed, so that output is less vertically spread out: Aaron Marcuse-Kubitza
07:10 AM Revision 2575: sql_gen.py: is_safe_name(): Fixed bug where names starting with a digit were incorrectly considered safe: Aaron Marcuse-Kubitza
07:06 AM Revision 2574: sql.py: put_table(): Separate temp table names from into table name with "_" instead of "-" so that quoting the table name will usually be unnecessary: Aaron Marcuse-Kubitza
07:03 AM Revision 2573: sql.py: esc_name_by_module(): Remove unused param ignore_case: Aaron Marcuse-Kubitza
06:59 AM Revision 2572: sql_gen.py: esc_name(): If is_safe_name(), just return name, to avoid escessive escaping in debug output for Redmine: Aaron Marcuse-Kubitza
06:55 AM Revision 2571: sql_gen.py: is_safe_name(): Don't consider uppercase letters safe because they would cause inconsistent behavior in PostgreSQL if quoted vs. not quoted (only unquoted identifiers are case-insensitive): Aaron Marcuse-Kubitza
06:51 AM Revision 2570: sql.py: Removed no longer needed check_name(): Aaron Marcuse-Kubitza
06:50 AM Revision 2569: sql.py: esc_name_by_module(): psycopg2: If ignore_case is set but name is unsafe, just escape it instead of raising an exception: Aaron Marcuse-Kubitza
06:49 AM Revision 2568: sql_gen.py: Added is_safe_name(): Aaron Marcuse-Kubitza
06:39 AM Revision 2567: sql.py: put_table(): col_ustr(): Removed no longer needed sql_gen.as_Col() because mapping and join_cols now ensure that their contents are sql_gen.Col objects: Aaron Marcuse-Kubitza

06/01/2012

08:29 PM Revision 2566: schemas/functions.sql: Added _alt relational function: Aaron Marcuse-Kubitza
08:28 PM Revision 2565: sql.py: put_table(): Make mapping and join_cols a sql_gen.ColDict so that literal values will always be turned into sql_gen.Col objects. DuplicateKeyException: Use dict_subset_right_join() instead of dict_subset() so that all columns in a constraint are included in joins on out_table (such as for a relational function with omitted arguments).: Aaron Marcuse-Kubitza
08:25 PM Revision 2564: sql_gen.py: Added ColDict: Aaron Marcuse-Kubitza
08:19 PM Revision 2563: sql_gen.py: as_Col(): Added optional name param to specify that non-Col input will be renamed using NamedCol with the given name: Aaron Marcuse-Kubitza
07:06 PM Revision 2562: sql.py: put_table(): FunctionValueException: Fixed bug where only function calls, not plain columns, were handled, by using sql_gen.unwrap_func_call() to remove any function call *only if* there was one: Aaron Marcuse-Kubitza
07:04 PM Revision 2561: sql_gen.py: Added unwrap_func_call(): Aaron Marcuse-Kubitza
06:47 PM Revision 2560: bin/map: by_col: Stripping XML functions not in the DB: Fixed bug where preserve_funcs.add() was used when `preserve_funcs |=` should have been used to add the entire iterable that sql.tables() returns: Aaron Marcuse-Kubitza
06:45 PM Revision 2559: sql.py: not_null_col: Changed value to 'not_null_col' so that column doesn't seem like a status indicator of whether some value is not null (in fact it's just a column that is always not null): Aaron Marcuse-Kubitza
06:05 PM Revision 2558: xml_func.py: Replaced xpath.get_1() with xpath.get_value() where possible, for simplicity: Aaron Marcuse-Kubitza
05:59 PM Revision 2557: xml_func.py: strip(): Evaluate structural functions like _ignore and _ref by process() instead of removing them. Store structural functions' names in structural_funcs module var. This ensures that _ref targets are still expanded in column-based import.: Aaron Marcuse-Kubitza
05:56 PM Revision 2556: xpath.py: get(): Create attrs: Put keys last so that any lookahead assertion's path will be created last as it would have without the assertion. This ensures that any value argument of an XML function will always go last even if a lookahead assertion would otherwise have caused it to be created with the element's keys, which previously were created before the attributes.: Aaron Marcuse-Kubitza
04:55 PM Revision 2555: sql.py: put_table(): If is_func, default into table name ends in () instead of '-pkeys': Aaron Marcuse-Kubitza
04:54 PM Revision 2554: schemas/vegbien.sql, functions.sql: Made cast functions STRICT to enable the RETURNS NULL ON NULL INPUT optimization: Aaron Marcuse-Kubitza
04:33 PM Revision 2553: db_xml.py: put_table(): Pass is_func to sql.put_table(): Aaron Marcuse-Kubitza
04:32 PM Revision 2552: sql.py: put_table(): Added is_func param for whether out_table is the name of a SQL function, not a table: Aaron Marcuse-Kubitza
04:09 PM Revision 2551: db_xml.py: put_table(): Treat every node name that starts with "_" as a function, not just members of put_table_special_funcs. This ensures that DB function args are always treated as values, not children with fkeys to parent.: Aaron Marcuse-Kubitza
03:40 PM Revision 2550: bin/map: by_col: Strip only XML functions that are not in the DB: Aaron Marcuse-Kubitza
03:39 PM Revision 2549: db_xml.py: put_table(): Make special_funcs externally available as module constant put_table_special_funcs: Aaron Marcuse-Kubitza
03:38 PM Revision 2548: sql.py: tables(): Changed schema param to schema_like and filter the schema using LIKE so that all schemas can be selected: Aaron Marcuse-Kubitza
03:15 PM Task #343: integrate TNRS into VegBIEN: This involves translating existing validation utilities to Python/Postgres Aaron Marcuse-Kubitza
03:14 PM Task #427 (Resolved): Load all plots data: Aaron Marcuse-Kubitza
03:14 PM Task #426 (Resolved): Reload DB using column-based import: Aaron Marcuse-Kubitza
03:14 PM Task #425 (Resolved): Reimplement row-based logging mechanism for column-based import: Aaron Marcuse-Kubitza
03:13 PM Task #424 (New): Finish translating XML functions to SQL functions for column-based import: * -_map-
* _range*
* -_avg-
* _compass
* a few others Aaron Marcuse-Kubitza
03:12 PM Task #386: load Canadensys data: Half of the Canadensys datasources have been loaded Aaron Marcuse-Kubitza
03:12 PM Task #325 (Resolved): map CTFS data: See "*CTFS VegX map spreadsheet*":https://projects.nceas.ucsb.edu/nceas/projects/bien/repository/raw/inputs/CTFS/maps... Aaron Marcuse-Kubitza
03:09 PM Task #366: refactor VegX: See Format section of [[VegX]] schema Aaron Marcuse-Kubitza
03:07 PM Task #348 (Resolved): 1st draft of schema: Aaron Marcuse-Kubitza
03:05 PM Task #371 (Resolved): formalize proposed changes to VegX: Aaron Marcuse-Kubitza
01:56 PM Revision 2547: to_do/timeline.doc: Updated to reflect the month we spent on optimization and column-based import: Aaron Marcuse-Kubitza
12:54 PM Revision 2546: sql.py: put_table(): in_table name: Remove '-pkeys' suffix from the into table name before adding '-input' so that the name is shorter and clearer: Aaron Marcuse-Kubitza
12:43 PM Revision 2545: sql.py: put_table(): Wrap repr() calls for debug messages in strings.as_tt() to add Redmine formatting: Aaron Marcuse-Kubitza
12:39 PM Revision 2544: sql.py: put_table(): Output "Adding index" debug message with level=2.5 so it's not part of the Redmine steps: Aaron Marcuse-Kubitza

05/31/2012

03:39 PM Revision 2543: schemas/vegbien.sql, functions.sql: Cast functions: Fixed bug where invalid value exceptions were not being caught, because implicit conversions to the return type apparently only happen outside the block containing the RETURN statement (i.e. at the end of the function). Fixed by adding explicit type conversion to return type, so that type conversion would happen inside try block.: Aaron Marcuse-Kubitza
03:31 PM Revision 2542: sql.py: put_table(): Re-enabled FunctionValueException handling, by just filtering out the value on all input columns that use the named function (since the error message does not specify which column it was that had the invalid value). This is in some ways better, anyway, because that way the invalid value is filtered out right away in all columns that could contain it, instead of potentially once for each column (if the value appears in more than one input column).: Aaron Marcuse-Kubitza
03:18 PM Revision 2541: sql.py: add_index(): Fixed bug where expressions could not be converted to a string until their table name had been removed: Aaron Marcuse-Kubitza
03:17 PM Revision 2540: sql_gen.py: Added Expr: Aaron Marcuse-Kubitza
03:13 PM Revision 2539: sql.py: add_index(): Fixed bug where expressions needed to be enclosed in () to distinguish them from plain columns: Aaron Marcuse-Kubitza
03:06 PM Revision 2538: sql.py: add_index(): Support simple expressions as well as columns: Aaron Marcuse-Kubitza
02:37 PM Revision 2537: sql.py: Renamed index_col() to add_index() so its name isn't similar to index_cols(): Aaron Marcuse-Kubitza
02:33 PM Revision 2536: sql_gen.py: FunctionCall: Removed __repr__() because it's a Code object and its to_str() does not take extra arguments: Aaron Marcuse-Kubitza
02:12 PM Revision 2535: sql.py: run_query(): FunctionValueException: Expanded parsing to include regular function calls, not just relational functions' trigger functions. put_table(): Disabled FunctionValueException handling because this expands FunctionValueException beyond what put_table() could handle.: Aaron Marcuse-Kubitza
01:38 PM Revision 2534: sql.py: put_table(): MissingCastException: Fixed bug where renaming of cast literal value was not properly propagated to the returned value of the function call, causing the query to assume that a DISTINCT ON column referred to column in one of the joined tables instead of a named column in the SELECT columns list. This logic error would have been very difficult to catch without inspecting the code!: Aaron Marcuse-Kubitza
01:33 PM Revision 2533: sql_gen.py: Added wrap_in_func(): Aaron Marcuse-Kubitza
01:25 PM Revision 2532: sql_gen.py: FunctionCall: Filter args through remove_col_rename() to remove any renamings from the function args: Aaron Marcuse-Kubitza
01:20 PM Revision 2531: sql.py: put_table(): No handler for exception: Print full exception instead of just first line to assist in debugging: Aaron Marcuse-Kubitza
01:06 PM Revision 2530: schemas/vegbien.sql, functions.sql: Removed _to* relational functions because type casting for those types is now automatic: Aaron Marcuse-Kubitza
01:02 PM Revision 2529: mappings/DwC2-VegBIEN.specimens.csv: Removed _to* relational functions because type casting for those types is now automatic: Aaron Marcuse-Kubitza
12:59 PM Revision 2528: schemas/functions.sql: Added cast functions for _to* relational functions: Aaron Marcuse-Kubitza
12:58 PM Revision 2527: schemas/vegbien.sql: Changed cast functions' input types to text because type must match exactly, not just be implicitly castable: Aaron Marcuse-Kubitza
12:47 PM Revision 2526: sql.py: run_query(): MissingCastException parsing: Support multiple-word types: Aaron Marcuse-Kubitza
12:38 PM Revision 2525: sql.py: put_table(): Handle MissingCastExceptions by attempting to call a function with the name of the type on the column: Aaron Marcuse-Kubitza
12:33 PM Revision 2524: sql_gen.py: Added Functions section with Function and FunctionCall: Aaron Marcuse-Kubitza
11:56 AM Revision 2523: sql.py: Added MissingCastException and parse it in run_query(): Aaron Marcuse-Kubitza
11:36 AM Revision 2522: schemas/vegbien.sql: Added cast functions for enum types which map invalid values to NULL: Aaron Marcuse-Kubitza
10:57 AM Revision 2521: sql.py: put_table(): Fixed bug where some exceptions with no handler would not even allow insertion of no rows into the out_table (due to type mismatch issues), by creating an empty pkeys table as a special case: Aaron Marcuse-Kubitza
10:49 AM Revision 2520: sql.py: put_table(): Preparing to insert new rows: Fixed bug where main_select needed to be generated *after* distinct_on was set in the if statement: Aaron Marcuse-Kubitza
10:48 AM Revision 2519: sql.py: put_table(): log_exc(): Fixed bug where the exception strings rather than the exceptions themselves needed to be put in the set, because exceptions are not comparable with ==: Aaron Marcuse-Kubitza
10:25 AM Revision 2518: sql.py: put_table(): Moved mk_main_select() call out of try block since it is not related to the exceptions that may be thrown: Aaron Marcuse-Kubitza
10:17 AM Revision 2517: sql.py: put_table(): log_exc(): Check if exception already caught before to avoid infinite loops: Aaron Marcuse-Kubitza
09:35 AM Revision 2516: Added debug2redmine and helper file debug2redmine.csv: Aaron Marcuse-Kubitza
09:20 AM Revision 2515: sql.py, db_xml.py: Removed unnecessary calls to sql_gen.clean_name() now that str() handles this automatically: Aaron Marcuse-Kubitza
09:14 AM Revision 2514: sql_gen.py: sql_gen classes inherit from new base class BasicObject, whose __str__() calls clean_name() on the object's repr(). Changed the main debug-repr producing method to be repr() instead of str().: Aaron Marcuse-Kubitza
08:45 AM Revision 2513: Moved clean_name() from sql.py to sql_gen.py because it's DB-general and so that it can be used by sql_gen.py without circular dependencies: Aaron Marcuse-Kubitza
08:41 AM Revision 2512: db_xml.py: into_table_name(): Handle hierarchical tables specially by including their rank in the into table. Interpret any table with a value column as a function, regardless of out_table name.: Aaron Marcuse-Kubitza

Also available in: Atom