Project

General

Profile

Statistics
| Revision:

# Date Author Comment
3103 06/26/2012 11:06 PM Aaron Marcuse-Kubitza

Moved error tracking from sql.py to sql_io.py

3102 06/26/2012 11:04 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Use sql.distinct_table() to uniquify input table, instead of DISTINCT ON. This avoids letting PostgreSQL create a sort temp table to store the output of the DISTINCT ON, which is not automatically removed until the end of the connection, causing database bloat that can use up the available disk space.

3101 06/26/2012 10:36 PM Aaron Marcuse-Kubitza

sql_gen.py: suffixed_table(): Use concat()

3100 06/26/2012 10:34 PM Aaron Marcuse-Kubitza

sql_gen.py: with_default_table(): Remove no longer used overwrite param

3099 06/26/2012 10:33 PM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Return new table instead of renaming input table so that columns that use input table will continue to work correctly

3098 06/26/2012 10:31 PM Aaron Marcuse-Kubitza

sql_gen.py: Moved NamedCol check from with_default_table() to with_table()

3097 06/26/2012 09:39 PM Aaron Marcuse-Kubitza

sql.py: distinct_table(): Fixed bug where empty distinct_on cols needed to create a table with one sample row, instead of returning the original table, because this indicates that the full set of distinct_on columns are all literal values and should only occur once

3096 06/26/2012 09:12 PM Aaron Marcuse-Kubitza

sql.py: run_query(): DuplicateKeyException: Fixed bug where only constraint names matching a certain format were interpreted as DuplicateKeyExceptions. Support constraint names with the name and table separated by ".", not just "_".

3095 06/26/2012 09:10 PM Aaron Marcuse-Kubitza

sql.py: run_query(): Exception parsing: Match patterns only at the beginning of the exception message to avoid matching embedded messages in causes and literal values

3094 06/26/2012 08:46 PM Aaron Marcuse-Kubitza

sql.py: Added distinct_table()

3093 06/26/2012 08:46 PM Aaron Marcuse-Kubitza

sql_gen.py: Added with_table() and use it in with_default_table()

3092 06/26/2012 07:52 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): ignore mode: Support inserting all columns when cols == None

3091 06/26/2012 07:47 PM Aaron Marcuse-Kubitza

sql_gen.py: Col, Table: Support non-string names

3090 06/26/2012 07:25 PM Aaron Marcuse-Kubitza

sql_gen.py: row_count: Use new all_cols

3089 06/26/2012 07:24 PM Aaron Marcuse-Kubitza

sql_gen.py: Added all_cols

3088 06/26/2012 07:17 PM Aaron Marcuse-Kubitza

sql_gen.py: Use new as_Name() instead of db.esc_name()

3087 06/26/2012 07:12 PM Aaron Marcuse-Kubitza

sql_gen.py: Name: Truncate the input name

3086 06/26/2012 07:11 PM Aaron Marcuse-Kubitza

sql_gen.py: Added Name class and associated functions

3085 06/26/2012 06:46 PM Aaron Marcuse-Kubitza

sql.py: create_table(): Support creating temp tables. This fixes a bug in copy_table_struct() where the created table was not a temp table if the source table was. copy_table_struct(): Removed no longer needed versioning because that is now handled by create_table().

3084 06/26/2012 06:33 PM Aaron Marcuse-Kubitza

sql.py: Added copy_table_struct()

3083 06/26/2012 06:32 PM Aaron Marcuse-Kubitza

sql.py: Moved add_indexes() to Indexes subsection

3082 06/26/2012 06:30 PM Aaron Marcuse-Kubitza

sql.py: create_table(): Support LIKE table

3081 06/26/2012 05:18 PM Aaron Marcuse-Kubitza

Moved Data cleanup from sql.py to sql_io.py

3080 06/26/2012 05:18 PM Aaron Marcuse-Kubitza

Moved error tracking from sql.py to sql_io.py

3079 06/26/2012 05:12 PM Aaron Marcuse-Kubitza

sql.py: Organized Database structure introspection and Structural changes functions into subsections

3078 06/26/2012 04:56 PM Aaron Marcuse-Kubitza

Moved error tracking from sql.py to sql_io.py

3077 06/26/2012 04:46 PM Aaron Marcuse-Kubitza

Moved Heuristic queries from sql.py to new sql_io.py

3076 06/26/2012 04:32 PM Aaron Marcuse-Kubitza

Added top-level analysis dir for range modeling

3075 06/26/2012 04:02 PM Aaron Marcuse-Kubitza

sql.py: run_query_into(): Documented why analyze() must be run manually on newly populated temp tables

3074 06/26/2012 03:57 PM Aaron Marcuse-Kubitza

sql.py: DbConn: Added autoanalyze mode. Added autoanalyze() which runs analyze() only if in autoanalyze mode. Use new autoanalyze() in functions that change a table's contents.

3073 06/26/2012 03:52 PM Aaron Marcuse-Kubitza

sql.py: run_query_into(): analyze() the created table to ensure the query planner's initial stats are accurate

3072 06/25/2012 09:46 PM Aaron Marcuse-Kubitza

inputs/SpeciesLink/src: Added custom header that overwrites existing header so that column names will not be too long for the staging table

3071 06/25/2012 09:35 PM Aaron Marcuse-Kubitza

cat_csv: Support overwriting the existing header using a separate header file

3070 06/25/2012 08:49 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: Added location.location_coords index to speed up large imports by providing an index for merge joins

3069 06/25/2012 08:43 PM Aaron Marcuse-Kubitza

csv2db: Reanalyze table, so that query planner stats are up to date even though the table doesn't need to be vacuumed anymore

3068 06/25/2012 08:42 PM Aaron Marcuse-Kubitza

sql.py: Added analyze()

3067 06/25/2012 08:11 PM Aaron Marcuse-Kubitza

csv2db: Removed no longer needed table vacuum (cleanup_table() now avoids creating dead rows)

3066 06/25/2012 08:10 PM Aaron Marcuse-Kubitza

sql.py: cleanup_table(): Use update()'s new in_place mode to avoid needing to vacuum the table

3065 06/25/2012 08:02 PM Aaron Marcuse-Kubitza

sql.py: mk_update(): in_place: Support updating multiple columns at once

3064 06/25/2012 07:44 PM Aaron Marcuse-Kubitza

sql.py: update() calls: Use in_place where possible to avoid creating dead rows, which bloats table size

3063 06/25/2012 07:37 PM Aaron Marcuse-Kubitza

sql.py: DbConn.col_info(): Support user-defined types

3062 06/25/2012 07:33 PM Aaron Marcuse-Kubitza

sql_gen.py: Added Nullif

3061 06/25/2012 07:27 PM Aaron Marcuse-Kubitza

sql_gen.py: Added Coalesce class and use it in EnsureNotNull

3060 06/25/2012 07:15 PM Aaron Marcuse-Kubitza

sql_gen.py: Added coalesce and use it in EnsureNotNull

3059 06/25/2012 07:00 PM Aaron Marcuse-Kubitza

sql.py: DbConn.col_info(): Don't cache the structure query because the column type, etc. may change between calls. This fixes a bug in mk_update() where the column type would be retrieved before a NOT NULL constraint was added, causing the NOT NULL constraint not to be in the cache info about the column.

3058 06/25/2012 06:56 PM Aaron Marcuse-Kubitza

sql.py: mk_update(): Implemented in_place mode

3057 06/25/2012 06:39 PM Aaron Marcuse-Kubitza

sql.py: mk_update(): Factored out filtering of input values so only `.to_str(db)` is used inline in the creation of the query

3056 06/25/2012 06:32 PM Aaron Marcuse-Kubitza

sql.py: mk_update(): Added in_place param

3055 06/25/2012 06:13 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Prevent "new-line character seen in unquoted field" errors by replacing '\r' with '\n'

3054 06/25/2012 06:12 PM Aaron Marcuse-Kubitza

csv2db: Adding indexes: Fixed bug where sql.add_index()'s ensure_not_null param needed to be renamed to ensure_not_null_

3053 06/25/2012 05:16 PM Aaron Marcuse-Kubitza

sql.py: cast(): columns values clause: Use start=0 to avoid "SELECT statement missing a WHERE, LIMIT, or OFFSET clause" warning

3052 06/25/2012 02:45 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Updated for most recent run

3051 06/22/2012 07:46 PM Aaron Marcuse-Kubitza

sql.py: Removed no longer needed mk_track_data_error()

3050 06/22/2012 07:45 PM Aaron Marcuse-Kubitza

sql.py: track_data_error(): Use for loop and insert() (ignoring DuplicateKeyException) to insert entries into the errors table, to get the same optimization benefits this change provides in other filter-out contexts, and to improve clarity

3049 06/22/2012 07:35 PM Aaron Marcuse-Kubitza

sql.py: cast(): Use FOR loop with EXCEPTION block instead of CROSS JOIN with LEFT JOIN to insert entries into the errors table, to get the same optimization benefits this change provides in other filter-out contexts, and to improve clarity

3048 06/22/2012 07:31 PM Aaron Marcuse-Kubitza

sql_gen.py: NamedValues: Support None cols param for no named columns

3047 06/22/2012 06:55 PM Aaron Marcuse-Kubitza

sql.py: ensure_not_null(): Made the use of index columns configurable based on the # of rows in the table, because for small datasources, they seem to add 6-25% to the total import time

3046 06/22/2012 06:12 PM Aaron Marcuse-Kubitza

xml_func.py: _noCV: Fixed bug where assumed items was an iterator when it's now a list

3045 06/22/2012 06:02 PM Aaron Marcuse-Kubitza

sql.py: add_index_col(), cast_temp_col(): Cache add_col() by providing new comment param to distinguish new columns of the same (colliding) suffixed name but from different source columns

3044 06/22/2012 05:54 PM Aaron Marcuse-Kubitza

sql.py: add_index_col(), cast_temp_col(): Cache the update that fills in the new column, since it's idempotent

3043 06/22/2012 05:52 PM Aaron Marcuse-Kubitza

sql.py: update(): Pass cacheable to run_query()

3042 06/22/2012 05:48 PM Aaron Marcuse-Kubitza

sql.py: add_col(): Added comment param which can be used to distinguish columns of the same name from each other when they contain different data, to allow the ADD COLUMN query (and related queries, such as adding indexes) to be cached

3041 06/22/2012 05:46 PM Aaron Marcuse-Kubitza

sql_gen.py: Added esc_comment()

3040 06/22/2012 05:37 PM Aaron Marcuse-Kubitza

sql.py: DbConn.DbCursor.execute(): Allow ADD COLUMN to be cached if it has a distinguishing comment, because then the rest of query will be unique in the face of name collisions

3039 06/22/2012 05:29 PM Aaron Marcuse-Kubitza

sql.py: add_col(): Document that name may be versioned, so caller needs to propagate any renaming back to any source column for the TypedCol

3038 06/22/2012 05:27 PM Aaron Marcuse-Kubitza

sql.py: add_col() callers: Removed column name versioning because that is now handled by add_col()

3037 06/22/2012 05:22 PM Aaron Marcuse-Kubitza

sql.py: add_col() callers: Fixed bug where needed to propagate any renaming of typed column back to regular column

3036 06/22/2012 05:19 PM Aaron Marcuse-Kubitza

sql.py: add_col(): Version column names to avoid collisions. (Previously, callers were required to do this themselves.)

3035 06/22/2012 05:13 PM Aaron Marcuse-Kubitza

sql.py: cast_temp_col(): Handle column name collisions like add_index_col()

3034 06/22/2012 04:49 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Switched to using FOR loop rather than cursors because cursors are only needed if you want to process multiple rows in the same EXCEPTION block (which you can't do because then all the previous inserts will be rolled back if one row is a duplicate key)

3033 06/22/2012 04:40 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Moved FETCH FROM cursor outside EXCEPTION block since only the insert needs to have unique_violations ignored

3032 06/22/2012 04:33 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Removed no longer needed EXCLUSIVE lock on the output table

3031 06/22/2012 04:25 PM Aaron Marcuse-Kubitza

add_index_col(): Run all operations with log_level=3 because it is the equivalent of creating an index, and that has log_level=3 to be excluded from the primary algorithm steps. add_not_null(): Run query with log_level=3 because it's an internal operation and for add_index_col() above.

3030 06/22/2012 04:23 PM Aaron Marcuse-Kubitza

sql.py: update(): Pass any log_level to run_query()

3029 06/22/2012 04:11 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Added pass-through optimization for aggregating functions with one arg

3028 06/22/2012 04:02 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Call list() on the items' iterator so that its length can be determined wherever it's used without needing to call list() on it separately

3027 06/22/2012 03:43 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Put '*' on same line as 'SELECT' if there is no DISTINCT ON, to avoid extra newlines

3026 06/22/2012 03:38 PM Aaron Marcuse-Kubitza

sql_gen.py: NamedTable.to_str(): Put AS clause on same line as code being named if it didn't contain a newline, to avoid extra newlines

3025 06/22/2012 03:29 PM Aaron Marcuse-Kubitza

sql.py: run_query(): Always parse exceptions, whether recover is set or not, to avoid making the caller set recover just to parse exceptions. If recover is not set, just don't run any queries when generating the parsed exception and return an empty value for the needed information. (A partial parsed exception is better than an unparsed one.)

3024 06/22/2012 03:18 PM Aaron Marcuse-Kubitza

sql_gen.py: is_temp_col(): Use is_table_col() so that the type of the parameter is also checked

3023 06/22/2012 03:15 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Bolded the ms/row totals. Added note that non-bolded totals are not directly comparable because different methods were run with different numbers of rows.

3022 06/22/2012 03:10 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Inserting new rows: Removed no longer needed filter-out LEFT JOIN, since the INSERT IGNORE replacement handles duplicate keys

3021 06/22/2012 03:02 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Inserting new rows: Use insert_select()'s (mk_insert_select()'s) new ignore option to ignore duplicate keys

3020 06/22/2012 02:59 PM Aaron Marcuse-Kubitza

sql_gen.py: to_name_only_col(): Consider any non-table column, including NamedCol, to be already name-only. This fixes a bug in sql.mk_insert_select() where the value of a NamedCol was removed by to_name_only_col() even though it was needed.

3019 06/22/2012 02:30 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Use an EXCEPTION block for each individual row because "When an error is caught by an EXCEPTION clause, [...] all changes to persistent database state within the block are rolled back." (http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING). Documented that cursor stays at current position in spite of automatic ROLLBACK on EXCEPTION.

3018 06/22/2012 02:11 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Use RETURN QUERY on the insert query instead of RETURN NEXT on a local var to forward the RETURNING value

3017 06/22/2012 02:08 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Support RETURNING clause. Always return something to set the correct rowcount.

3016 06/22/2012 02:06 PM Aaron Marcuse-Kubitza

sql_gen.py: Use an as_*() function instead of manually checking the type wherever possible

3015 06/22/2012 01:18 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Added duplicate key handling by using EXCEPTION block to catch unique_violation

3014 06/22/2012 12:58 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Fixed bug where user-defined types were not supported correctly, by referencing the column's type directly in PL/pgSQL instead of retrieving it from the information_schema. Use a row variable of the output table's row type to store the select query data, so that you don't need to create a separate local var for each output column.

3013 06/21/2012 10:24 PM Aaron Marcuse-Kubitza

import.stats.xls: Changed \/row (ms)" to "ms/row"

3012 06/21/2012 09:52 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Used typed vars to store each column of the select query because the select query's column names are not known, so it would be unknown which record var column to use for each output column. (Add types so there are no unknown-type values that PostgreSQL will say can't be cast.)

3011 06/21/2012 09:21 PM Aaron Marcuse-Kubitza

sql.py: insert_select(): In ignore mode, always recover from errors

3010 06/21/2012 09:14 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Implemented simple cursor loop that just inserts each row, without ignoring duplicate keys

3009 06/21/2012 08:26 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): Added ignore param to do an INSERT IGNORE equivalent operation. Added basic structure for INSERT IGNORE. Support embeddable queries without RETURNING columns (which is the case for most INSERT IGNORE queries).

3008 06/21/2012 06:50 PM Aaron Marcuse-Kubitza

sql.py: run_query_into(): Always set into.schema to None, so that table is created in the temp schema in debug_temp mode

3007 06/21/2012 06:25 PM Aaron Marcuse-Kubitza

sql.py: DbConn.DbCursor.execute(): Don't cache ADD COLUMN queries because the rest of the query must be unique in order to use caching

3006 06/21/2012 06:24 PM Aaron Marcuse-Kubitza

sql.py: add_col(): Don't ignore already-existing columns because sometimes name truncation causes collisions, requiring the caller to version the name. add_index_col(): Version the column name to avoid collisions.

3005 06/21/2012 06:08 PM Aaron Marcuse-Kubitza

sql.py: add_index(): Version index names to avoid collisions

3004 06/21/2012 05:58 PM Aaron Marcuse-Kubitza

sql.py: DbConn.DbCursor.execute(): Always cache structural changes, not just if they throw an exception