Project

General

Profile

Statistics
| Revision:

# Date Author Comment
3049 06/22/2012 07:35 PM Aaron Marcuse-Kubitza

sql.py: cast(): Use FOR loop with EXCEPTION block instead of CROSS JOIN with LEFT JOIN to insert entries into the errors table, to get the same optimization benefits this change provides in other filter-out contexts, and to improve clarity

3048 06/22/2012 07:31 PM Aaron Marcuse-Kubitza

sql_gen.py: NamedValues: Support None cols param for no named columns

3047 06/22/2012 06:55 PM Aaron Marcuse-Kubitza

sql.py: ensure_not_null(): Made the use of index columns configurable based on the # of rows in the table, because for small datasources, they seem to add 6-25% to the total import time

3046 06/22/2012 06:12 PM Aaron Marcuse-Kubitza

xml_func.py: _noCV: Fixed bug where assumed items was an iterator when it's now a list

3045 06/22/2012 06:02 PM Aaron Marcuse-Kubitza

sql.py: add_index_col(), cast_temp_col(): Cache add_col() by providing new comment param to distinguish new columns of the same (colliding) suffixed name but from different source columns

3044 06/22/2012 05:54 PM Aaron Marcuse-Kubitza

sql.py: add_index_col(), cast_temp_col(): Cache the update that fills in the new column, since it's idempotent

3043 06/22/2012 05:52 PM Aaron Marcuse-Kubitza

sql.py: update(): Pass cacheable to run_query()

3042 06/22/2012 05:48 PM Aaron Marcuse-Kubitza

sql.py: add_col(): Added comment param which can be used to distinguish columns of the same name from each other when they contain different data, to allow the ADD COLUMN query (and related queries, such as adding indexes) to be cached

3041 06/22/2012 05:46 PM Aaron Marcuse-Kubitza

sql_gen.py: Added esc_comment()

3040 06/22/2012 05:37 PM Aaron Marcuse-Kubitza

sql.py: DbConn.DbCursor.execute(): Allow ADD COLUMN to be cached if it has a distinguishing comment, because then the rest of query will be unique in the face of name collisions

3039 06/22/2012 05:29 PM Aaron Marcuse-Kubitza

sql.py: add_col(): Document that name may be versioned, so caller needs to propagate any renaming back to any source column for the TypedCol

3038 06/22/2012 05:27 PM Aaron Marcuse-Kubitza

sql.py: add_col() callers: Removed column name versioning because that is now handled by add_col()

3037 06/22/2012 05:22 PM Aaron Marcuse-Kubitza

sql.py: add_col() callers: Fixed bug where needed to propagate any renaming of typed column back to regular column

3036 06/22/2012 05:19 PM Aaron Marcuse-Kubitza

sql.py: add_col(): Version column names to avoid collisions. (Previously, callers were required to do this themselves.)

3035 06/22/2012 05:13 PM Aaron Marcuse-Kubitza

sql.py: cast_temp_col(): Handle column name collisions like add_index_col()

3034 06/22/2012 04:49 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Switched to using FOR loop rather than cursors because cursors are only needed if you want to process multiple rows in the same EXCEPTION block (which you can't do because then all the previous inserts will be rolled back if one row is a duplicate key)

3033 06/22/2012 04:40 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Moved FETCH FROM cursor outside EXCEPTION block since only the insert needs to have unique_violations ignored

3032 06/22/2012 04:33 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Removed no longer needed EXCLUSIVE lock on the output table

3031 06/22/2012 04:25 PM Aaron Marcuse-Kubitza

add_index_col(): Run all operations with log_level=3 because it is the equivalent of creating an index, and that has log_level=3 to be excluded from the primary algorithm steps. add_not_null(): Run query with log_level=3 because it's an internal operation and for add_index_col() above.

3030 06/22/2012 04:23 PM Aaron Marcuse-Kubitza

sql.py: update(): Pass any log_level to run_query()

3029 06/22/2012 04:11 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Added pass-through optimization for aggregating functions with one arg

3028 06/22/2012 04:02 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Call list() on the items' iterator so that its length can be determined wherever it's used without needing to call list() on it separately

3027 06/22/2012 03:43 PM Aaron Marcuse-Kubitza

sql.py: mk_select(): Put '*' on same line as 'SELECT' if there is no DISTINCT ON, to avoid extra newlines

3026 06/22/2012 03:38 PM Aaron Marcuse-Kubitza

sql_gen.py: NamedTable.to_str(): Put AS clause on same line as code being named if it didn't contain a newline, to avoid extra newlines

3025 06/22/2012 03:29 PM Aaron Marcuse-Kubitza

sql.py: run_query(): Always parse exceptions, whether recover is set or not, to avoid making the caller set recover just to parse exceptions. If recover is not set, just don't run any queries when generating the parsed exception and return an empty value for the needed information. (A partial parsed exception is better than an unparsed one.)

3024 06/22/2012 03:18 PM Aaron Marcuse-Kubitza

sql_gen.py: is_temp_col(): Use is_table_col() so that the type of the parameter is also checked

3023 06/22/2012 03:15 PM Aaron Marcuse-Kubitza

inputs/import.stats.xls: Bolded the ms/row totals. Added note that non-bolded totals are not directly comparable because different methods were run with different numbers of rows.

3022 06/22/2012 03:10 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Inserting new rows: Removed no longer needed filter-out LEFT JOIN, since the INSERT IGNORE replacement handles duplicate keys

3021 06/22/2012 03:02 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Inserting new rows: Use insert_select()'s (mk_insert_select()'s) new ignore option to ignore duplicate keys

3020 06/22/2012 02:59 PM Aaron Marcuse-Kubitza

sql_gen.py: to_name_only_col(): Consider any non-table column, including NamedCol, to be already name-only. This fixes a bug in sql.mk_insert_select() where the value of a NamedCol was removed by to_name_only_col() even though it was needed.

3019 06/22/2012 02:30 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Use an EXCEPTION block for each individual row because "When an error is caught by an EXCEPTION clause, [...] all changes to persistent database state within the block are rolled back." (http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING). Documented that cursor stays at current position in spite of automatic ROLLBACK on EXCEPTION.

3018 06/22/2012 02:11 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Use RETURN QUERY on the insert query instead of RETURN NEXT on a local var to forward the RETURNING value

3017 06/22/2012 02:08 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Support RETURNING clause. Always return something to set the correct rowcount.

3016 06/22/2012 02:06 PM Aaron Marcuse-Kubitza

sql_gen.py: Use an as_*() function instead of manually checking the type wherever possible

3015 06/22/2012 01:18 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Added duplicate key handling by using EXCEPTION block to catch unique_violation

3014 06/22/2012 12:58 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Fixed bug where user-defined types were not supported correctly, by referencing the column's type directly in PL/pgSQL instead of retrieving it from the information_schema. Use a row variable of the output table's row type to store the select query data, so that you don't need to create a separate local var for each output column.

3013 06/21/2012 10:24 PM Aaron Marcuse-Kubitza

import.stats.xls: Changed \/row (ms)" to "ms/row"

3012 06/21/2012 09:52 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Used typed vars to store each column of the select query because the select query's column names are not known, so it would be unknown which record var column to use for each output column. (Add types so there are no unknown-type values that PostgreSQL will say can't be cast.)

3011 06/21/2012 09:21 PM Aaron Marcuse-Kubitza

sql.py: insert_select(): In ignore mode, always recover from errors

3010 06/21/2012 09:14 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): INSERT IGNORE: Implemented simple cursor loop that just inserts each row, without ignoring duplicate keys

3009 06/21/2012 08:26 PM Aaron Marcuse-Kubitza

sql.py: mk_insert_select(): Added ignore param to do an INSERT IGNORE equivalent operation. Added basic structure for INSERT IGNORE. Support embeddable queries without RETURNING columns (which is the case for most INSERT IGNORE queries).

3008 06/21/2012 06:50 PM Aaron Marcuse-Kubitza

sql.py: run_query_into(): Always set into.schema to None, so that table is created in the temp schema in debug_temp mode

3007 06/21/2012 06:25 PM Aaron Marcuse-Kubitza

sql.py: DbConn.DbCursor.execute(): Don't cache ADD COLUMN queries because the rest of the query must be unique in order to use caching

3006 06/21/2012 06:24 PM Aaron Marcuse-Kubitza

sql.py: add_col(): Don't ignore already-existing columns because sometimes name truncation causes collisions, requiring the caller to version the name. add_index_col(): Version the column name to avoid collisions.

3005 06/21/2012 06:08 PM Aaron Marcuse-Kubitza

sql.py: add_index(): Version index names to avoid collisions

3004 06/21/2012 05:58 PM Aaron Marcuse-Kubitza

sql.py: DbConn.DbCursor.execute(): Always cache structural changes, not just if they throw an exception

3003 06/21/2012 05:40 PM Aaron Marcuse-Kubitza

sql.py: add_index(): Fixed bug where expr could not be deep-copied until ensure_not_null() had been run on it, because ensure_not_null() modifies the index_cols of the table and this needs to be modified on the original table

3002 06/21/2012 05:39 PM Aaron Marcuse-Kubitza

sql.py: add_index(): Fixed bug where expr could not be deep-copied until ensure_not_null() had been run on it, because ensure_not_null() modifies the index_cols of the table and this needs to be modified on the original table

3001 06/21/2012 05:12 PM Aaron Marcuse-Kubitza

sql_gen.py: EnsureNotNull: Added to_str() which uses the column's index col if available. This works around a PostgreSQL bug where indexes on expressions are sometimes not used, but indexes on plain columns are used.

3000 06/21/2012 05:09 PM Aaron Marcuse-Kubitza

sql_gen.py: Store index cols in the table instead of in the column, because columns are often recreated from the table and name but tables are generally not copied

2999 06/21/2012 04:21 PM Aaron Marcuse-Kubitza

sql_gen.py: Added is_indexed_col()

2998 06/21/2012 04:15 PM Aaron Marcuse-Kubitza

sql.py: add_index(): Use new sql.ensure_not_null(), which creates a separate column to store the index expr where possible

2997 06/21/2012 04:12 PM Aaron Marcuse-Kubitza

sql.py: Added add_not_null(), add_index_col(), ensure_not_null()

2996 06/21/2012 04:10 PM Aaron Marcuse-Kubitza

sql_gen.py: Added is_temp_col()

2995 06/21/2012 03:57 PM Aaron Marcuse-Kubitza

sql_gen.py: concat(): Also preserve cast suffix

2994 06/21/2012 03:44 PM Aaron Marcuse-Kubitza

sql_gen.py: Col: Added index_col attr

2993 06/21/2012 03:19 PM Aaron Marcuse-Kubitza

sql_gen.py: Added suffixed_col()

2992 06/21/2012 02:59 PM Aaron Marcuse-Kubitza

sql.py: run_query_into(): Set the into table to be a temp table

2991 06/21/2012 02:59 PM Aaron Marcuse-Kubitza

sql_gen.py: Table: Store whether table is temp table

2990 06/21/2012 01:50 PM Aaron Marcuse-Kubitza

sql_gen.py: concat(): Also preserve appended column names

2989 06/21/2012 01:48 PM Aaron Marcuse-Kubitza

sql_gen.py: Col.to_str(): Take for_str param which does both concat() and str()/clean_name() instead of use_concat param which doesn't remove quotes from the strings before concatenating, causing strings to be incorrectly truncated

2988 06/20/2012 09:46 PM Aaron Marcuse-Kubitza

sql_gen.py: EnsureNotNull: Cast the null sentinel to the type it's the sentinel for, so that COALESCE returns the same type as the provided type, instead of potentially incorrectly assuming the constant's type. This fixes a bug where COALESCE returned an incorrect type.

2987 06/20/2012 09:38 PM Aaron Marcuse-Kubitza

sql.py: cast(): Use new sql_gen.Cast

2986 06/20/2012 09:38 PM Aaron Marcuse-Kubitza

sql_gen.py: Added Cast

2985 06/20/2012 09:37 PM Aaron Marcuse-Kubitza

sql_gen.py: concat(): Consider ")"s part of the existing suffix, so that parentheses match up

2984 06/20/2012 09:05 PM Aaron Marcuse-Kubitza

Removed extra trailing whitespace

2983 06/20/2012 08:36 PM Aaron Marcuse-Kubitza

main Makefile: Added explicit schemas/temp/reinstall target because vegbiendev (Ubuntu 12.04) seems to ignore %/reinstall

2982 06/20/2012 08:24 PM Aaron Marcuse-Kubitza

vegbien_dest: Added temp to schemas list so the intermediate tables will be created there, where they are easy to remove with `make schemas/temp/reinstall`

2981 06/20/2012 08:23 PM Aaron Marcuse-Kubitza

main Makefile: Added %/reinstall, which calls the corresponding uninstall and install targets

2980 06/20/2012 08:11 PM Aaron Marcuse-Kubitza

Added temp schema to store intermediate tables permanently when desired for debugging. It is not part of schemas/all because it should not be remade to include current intermediate tables when remaking other schema exports.

2979 06/20/2012 08:04 PM Aaron Marcuse-Kubitza

schemas: Moved *.sql.make into Makefile to take advantage of % pattern matching

2978 06/20/2012 07:47 PM Aaron Marcuse-Kubitza

Removed no longer used schemas/vegbien_empty.sql

2977 06/20/2012 07:46 PM Aaron Marcuse-Kubitza

main Makefile: Removed empty_db, because `make schemas/reinstall` has the same effect and is simpler

2976 06/20/2012 07:40 PM Aaron Marcuse-Kubitza

README.TXT: Changed documentation to use make schemas/reinstall to empty the DB, since that command is simpler. Added how to archive the last import.

2975 06/20/2012 07:11 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Removed `if not db.debug_temp` check because that is done by sql.empty_temp()

2974 06/20/2012 07:10 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Use new empty_temp()

2973 06/20/2012 07:06 PM Aaron Marcuse-Kubitza

import.stats.xls: Added comments for estimated numbers. Added "," separators to large numbers.

2972 06/20/2012 06:21 PM Aaron Marcuse-Kubitza

sql.py: empty_temp(): In debug_temp mode, leave temp tables there for debugging

2971 06/20/2012 06:06 PM Aaron Marcuse-Kubitza

sql.py: empty_temp(): Don't output at log_level 2 because it's an internal query, not part of the core algorithm

2970 06/20/2012 06:06 PM Aaron Marcuse-Kubitza

sql.py: truncate(): Added kw_args to pass to run_query()

2969 06/20/2012 05:52 PM Aaron Marcuse-Kubitza

Added inputs/import.stats.xls, which compares row-based and column-based import. This shows that column-based import is slowed down by table locking when run simultaneously, so we will need a new INSERT IGNORE replacement that doesn't lock tables.

2968 06/20/2012 03:14 PM Aaron Marcuse-Kubitza

inputs: Ignore OpenOffice.org lock files

2967 06/20/2012 02:19 PM Aaron Marcuse-Kubitza

sql.py: empty_temp(): Don't print log message if not emptying any tables

2966 06/20/2012 02:16 PM Aaron Marcuse-Kubitza

db_xml.py: put_table(): Empty unneeded temp tables to free up memory

2965 06/20/2012 02:14 PM Aaron Marcuse-Kubitza

sql.py: Added empty_temp()

2964 06/20/2012 02:14 PM Aaron Marcuse-Kubitza

sql.py: Use new lists.mk_seq()

2963 06/20/2012 02:13 PM Aaron Marcuse-Kubitza

lists.py: Added mk_seq()

2962 06/20/2012 02:11 PM Aaron Marcuse-Kubitza

lists.py: is_seq(): Also return true for sets

2961 06/19/2012 03:02 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: specimenreplicate: Added indexes using COALESCE to match what sql_gen does

2960 06/19/2012 02:08 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Getting output table pkeys of existing/inserted rows: Do a DISTINCT ON the input pkey (row_num) in case the plain JOIN matched multiple output table rows for one input table row

2959 06/19/2012 01:44 PM Aaron Marcuse-Kubitza

sql.py: put_table(): Empty unneeded temp tables to free up memory and avoid running out of memory (the temp tables seem to be in-memory only)

2958 06/19/2012 01:30 PM Aaron Marcuse-Kubitza

sql_gen.py: null_sentinels: Added value for type timestamp with time zone. Put each type on its own line for clarity.

2957 06/19/2012 01:03 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: locationdetermination: Changed indexes to use COALESCE to match what sql_gen now does

2956 06/19/2012 12:23 PM Aaron Marcuse-Kubitza

schemas/vegbien.sql: location: Added indexes using COALESCE to match what sql_gen does

2955 06/19/2012 12:06 PM Aaron Marcuse-Kubitza

sql.py: cast_temp_col(): Add an index on the created column

2954 06/19/2012 11:55 AM Aaron Marcuse-Kubitza

sql_gen.py: null_sentinels: Added value for type double precision

2953 06/19/2012 11:52 AM Aaron Marcuse-Kubitza

sql_gen.py: ensure_not_null(): Warn of no null sentinel for type, even if caller catches error

2952 06/19/2012 10:11 AM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Added plain function _namePart() and use it in trigger function _namePart()

2951 06/19/2012 09:42 AM Aaron Marcuse-Kubitza

schemas/py_functions.sql: Added plain functions _dateRangeStart() and _dateRangeEnd() and use them in trigger functions _dateRangeStart() and _dateRangeEnd()

2950 06/19/2012 09:28 AM Aaron Marcuse-Kubitza

schemas/functions.sql: _label(): Ensure that label is NOT NULL so it doesn't NULL out the entire string