/trunk/lib/sql_io.py - Changes - BIEN 3 - NCEAS Projects

root/trunk/lib/sql_io.py @ 14825

#	Date	Author	Comment
14822	10/14/2014 10:09 AM	Aaron Marcuse-Kubitza	bugfix: lib/sql_io.py: null_strs_str_default: removed "NA" because this is the abbr for a Spanish province (Navarra). this fixes the 2nd bug of #955, geovalidation duplicated rows.
14821	10/14/2014 10:00 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: cleanup_table(): debug-print null_strs
14820	10/14/2014 09:56 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: null_strs: made it customizable from an env var, since the same list of null_strs doesn't work for all datasources (see #957)
14816	10/14/2014 08:35 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: null_strs: made it customizable from an env var, since the same list of null_strs doesn't work for all datasources (see #957)
14785	09/30/2014 07:36 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: null_strs: added N/A and NA (this will remove a common abbr for North America, but we don't use the continent, so this is OK)
14589	08/26/2014 05:08 PM	Aaron Marcuse-Kubitza	fix: lib/sql_io.py: append_csv(): use new csvs.ProgressInputFilter instead of streams.ProgressInputStream(csvs.StreamFilter(__)), so that the input to csvs.InputRewriter is a reader, not a stream. this avoids the need for csvs.InputRewriter to accept a stream whose lines are tuples, instead of the expected reader.
14585	08/26/2014 04:46 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: added commented-out debug statement used to troubleshoot copy_expert() errors
14074	07/15/2014 05:44 PM	Aaron Marcuse-Kubitza	bugfix: lib/sql_io.py: put_table(): handle_MissingCastException(): when updating join_cols, don't add new entry for join_cols[out_col], only update existing one. this fixes #902 (import bug), and with #902 fixed, #887 (disk space leak) should no longer occur.
13005	03/30/2014 07:54 PM	Aaron Marcuse-Kubitza	fix: lib/sql_io.py: put_table(): don't warn if can't create pkey, because this just indicates that a set-returning function was used. this should get rid of the last of the confusing benign warnings in the test output.
12868	03/22/2014 05:50 AM	Aaron Marcuse-Kubitza	bugfix: lib/sql_io.py: put_table(): is_literals: `return sql.value(cur)`: need to use sql.value_or_none() instead to support multi-row functions, such as _split() used in specimens data
12153	02/13/2014 03:48 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: automatic handling of input/output column type mismatches: also do this for identifying columns, which first cause an error in a join in sql.distinct_table() rather than in the main insert (and thus were not handled by the existing error handling). previously, the user would have had to manually cast the input column in postprocess.sql. this involves getting handle_MissingCastException() to update join_cols as well as mapping.
12150	02/13/2014 12:29 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: put_table(): main loop MissingCastException handler: factored out into nested function so that it can also be used elsewhere
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
11151	10/02/2013 02:45 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: put_table(): default param: documented that this will be used for all missing rows, regardless of which error caused them not to be inserted. this means that auto-forwarding (wiki.vegpath.org/Auto-forwarding) can be used with any type of constraint violation, not just NOT NULL constraints (which it is typically used with).
11033	09/21/2013 09:01 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: put_table(): added link to new INSERT ON DUPLICATE SELECT wiki page, which now contains the explanation in the doc comment
10845	08/31/2013 06:32 PM	Aaron Marcuse-Kubitza	bugfix: lib/sql_io.py: put_table(): Getting output table pkeys of existing/inserted rows: need to include the index cond in the join condition here, too (using var join_custom_cond), so that an index scan can be used instead of a much slower full-table sort
10843	08/31/2013 05:52 PM	Aaron Marcuse-Kubitza	bugfix: lib/sql_io.py: put_table(): DuplicateKeyException: need to include any index cond in the join condition, so that an index scan can be used instead of a much slower full-table sort (otherwise the query planner will not know that it can restrict results to rows satisfying the index cond)
10838	08/30/2013 10:38 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: ensure_cond(): documented meaning of passed, failed params (at least one row passed/failed the constraint)
10302	07/18/2013 12:04 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: put_table(): documented that PostgreSQL 9.1+ now provides a way to implement insert/on duplicate select just once for each table (instead of dynamically for each insert) using the new INSTEAD OF triggers (http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html). INSTEAD OF triggers were not used when put_table() was developed, because it was necessary to support PostgreSQL 9.0, which was installed on the Mac and not easily upgradeable. it was eventually upgraded to add PostGIS, which required a complete reinstall of the DB from the staging tables, with the associated staging table reload bugs, as well as complete removal of the old Postgres version.
10195	07/09/2013 02:50 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: cleanup_table(): added assertion that the table exists, so that if it doesn't, the error will occur as part of an assertion rather than as part of the util.table_nulls_mapped__get() call, which might confusingly lead users to believe that this is a bug in util.table_nulls_mapped__get() when in fact the problem is that the table is not installed
10188	07/06/2013 07:21 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: cleanup_table(): don't run the slow ALTER TABLE statement again if the table has already been cleaned up. documented that it is idempotent (and actually was before this change as well).
10187	07/06/2013 07:19 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: added table_nulls_mapped__set(), table_nulls_mapped__get() wrappers around the corresponding util schema functions
10165	07/03/2013 10:48 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: put_table() (column-based import): complexity note: clarified that INSERT RETURNING throws an error on duplicate instead of returning the existing row. added blank line after ¶ for readability.
10164	07/03/2013 10:44 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: put_table() (column-based import): warning about triggers populating unique constraint-covered columns: corrected limitation to include only the unique constraint used to do the DISTINCT ON, since other unique constraints are not affected by column-based import. note that the primary key will normally not be the DISTINCT ON constraint, so trigger-populated natural keys are supported unless the input table contains duplicate rows for some generated keys.
9508	05/23/2013 12:43 PM	Aaron Marcuse-Kubitza	lib/sql_io.py: append_csv(): support importing CSVs whose columns are a subset of the full table and/or in a different order. when the header exactly matches the columns, the explicit column list will still be omitted as an optimization. this uses code from r4927.
8820	05/05/2013 10:12 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: put_table(): Calling wrapper function: adding pkey or index on the resulting table: don't display warning if a pkey can't be added, because this is actually a legitimate situation when the called function is set-returning and can return multiple rows for one input. having this as a warning results in spurious warnings in the automated tests (which look confusingly like ignored errors because Python warnings include debugging context information). e.g. `make inputs/Madidi/IndividualObservation/test.by_col.xml` causes this error in the sourcelist->sourcename splitting step (which of course can produce multiple specimenholder institutions)....
8077	03/19/2013 02:05 AM	Aaron Marcuse-Kubitza	lib/sql_io.py: mk_errors_table(): Create a unique index on the MD5 of the value and error instead of on the values directly, because some strings are too long to index (e.g. row 2537268 of MO.Specimen causes an error "index row size 3032 exceeds maximum 2712 for index [...] Values larger than 1/3 of a buffer page cannot be indexed")
7395	01/31/2013 02:49 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Documented that much of the complexity of the normalizing algorithm is due to PostgreSQL not having a native command for insert/on duplicate select
7394	01/31/2013 02:24 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Corrected "insert/if not exists get" to "insert/on duplicate select"
7393	01/31/2013 01:52 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Removed no longer applicable requirement that it be run at the beginning of a transaction, which was only required when the output table was locked during the function call
7392	01/31/2013 01:48 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Documented that the function's insert/if not exists get algorithm does not support database triggers that populate fields covered by a unique constraint
7180	01/11/2013 06:06 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): DuplicateKeyException: Uniquifying input table to avoid internal duplicate keys: Also filter out duplicate rows in the out_table, so that they don't create duplicate key errors and the resulting index holes
7117	01/08/2013 08:46 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): Fixed bug where need to wrap strings used in the tracked error message in strings.ustr()
6801	12/12/2012 04:41 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_function: Fixed bug where need to add the pkeys table's test pkey constraint after the data is added rather than when the empty table is created, to avoid adding a pkey constraint that will later be violated by data which returns multiple output rows for an input row (such as calls to _split())
6800	12/12/2012 04:36 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): insert_into_pkeys(): Allow callers to override run_query_into()'s add_pkey_ param in case the initial version of the pkeys table should not yet have the test pkey constraint (e.g. because data is added after the table is created)
6300	11/19/2012 05:32 PM	Aaron Marcuse-Kubitza	sql_io.py: cast(): Use sql_gen.Cast() to generate the cast, in order to take advantage of its support for casts to unknown
6226	11/15/2012 10:43 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Special handling for functions with hstore params: Fixed bug where need to unwrap literal values of mapping, which might be sql_gen.Literal objects
6220	11/15/2012 10:04 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Added special handling for functions with hstore params. Note that although _map() doesn't exist yet as a DB function, this code must be in place before _map() is created to avoid param type mismatch errors.
6005	11/05/2012 09:54 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Removed assertion that into's row count be at least full_in_table's row count, because now that DISTINCT ON is used to satisfy the into table pkey, this is no longer necessarily true
5993	11/05/2012 04:49 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Setting pkeys of missing rows: Fixed bug in column-based import where when input rows match multiple output rows in one of this iteration's input tables, the into table's pkey constraint is violated because full_in_table contains multiple entries for an input pkey
5968	11/02/2012 03:37 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Switched back to using run_query_into()'s add_pkey_ option now that it uses sql.add_pkey_or_index() instead of sql.add_pkey()
5895	10/31/2012 10:05 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): When deleting rows that do not satisfy the condition, handle sql.DoesNotExistExceptions caused by columns in the condition that were not replaced with NULL. These occur when out_table is a function, and the columns of the table the condition relates to therefore can't be found using out_table.
5894	10/31/2012 09:59 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Calling function: Do not cache the function call, because it may be retried after error handling
5889	10/31/2012 08:45 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): sql.DatabaseErrors: Factored exception-handling code out into handle_unknown_exc(), for use by other exception handlers
5888	10/31/2012 08:39 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): is_function: Fixed bug where can't replace out_table_cols with NULL because out_table is a function, not a table
5827	10/30/2012 12:07 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): Fixed bug where test if any rows failed cond did not check if cur != None (which is the case when cond == sql_gen.true_expr) before checking cur.rowcount
5824	10/29/2012 09:37 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Assertion that into and full_in_table have the same row count: Allow into to have more rows than full_in_table, in case an input row matched multiple output rows. This should not happen for a properly-configured database, but seems to happen periodically nevertheless (currently, to the MO datasource) and should not abort the import when it does.
5818	10/26/2012 06:00 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): When adding the failed condition to the errors table, also include the original, untranslated condition from the DB schema in addition to the translation of the condition into the input schema
5817	10/26/2012 05:45 PM	Aaron Marcuse-Kubitza	sql_io.py: track_data_error(): Fixed bug where errors whose column had no srcs (indicated by () ) were incorrectly being ignored. This affected NOT NULL exceptions where the column was not provided by the dataset.
5815	10/26/2012 05:38 PM	Aaron Marcuse-Kubitza	sql_io.py: track_data_error(): Support errors with no columns by inserting a single entry with column set to NULL
5813	10/26/2012 05:00 PM	Aaron Marcuse-Kubitza	sql_io.py: mk_errors_table(): Made "column" column nullable, because some errors (such as check constraint violations) don't have any corresponding columns if its columns weren't provided in the input data
5791	10/25/2012 05:12 PM	Aaron Marcuse-Kubitza	sql_io.py: cast(): Set the created function's value param type to anyelement to support any input type, not just text
5766	10/25/2012 07:51 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): insert_into_pkeys(): Use new sql.add_pkey_or_index() instead of sql.add_pkey() in order to just print a warning if for some reason there were duplicate entries for an input row in the iteration's pkeys table. This should provide a workaround for bugs (often in the schema itself, related to its unique indexes) that cause an input row to match multiple output rows when joining on the output table using the unique constraint's columns.
5726	10/23/2012 07:56 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): is_function: Moved definition of wrapper function inside try block of main loop because the creation of the empty pkeys table (whose row type is needed for the wrapper function) can itself produce MissingCastExceptions, which must be thrown inside the loop in order to be handled properly
5719	10/23/2012 05:33 AM	Aaron Marcuse-Kubitza	sql_io.py: put(): Pass on_error through to put_table()
5718	10/23/2012 05:19 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): log_exc(): Return False if removing all rows and have callers break the main loop so that no further exception-handling code is processed before the main loop is exited
5594	10/17/2012 11:50 AM	Aaron Marcuse-Kubitza	sql_io.py: import_csv(): Add a row_num column at the beginning of the table, which is autopopulated by csvs.RowNumFilter (it cannot be autopopulated by the serial datatype, because this does not support COPY FROM with a NULL-equivalent value in the serial field). This fixes a bug in csv2db where rows would not stay in inserted order upon querying the table, and would be returned in a different order each query, which prevented LIMIT/OFFSET based subsetting from returning consistent, nonoverlapping results. This occurs because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
5591	10/17/2012 11:04 AM	Aaron Marcuse-Kubitza	sql_io.py: import_csv(): Take a reader and header rather than a stream to allow callers to pass in a wrapped CSV reader for filtering, etc.
5590	10/17/2012 11:00 AM	Aaron Marcuse-Kubitza	sql_io.py: append_csv(): Take a reader and header rather than a stream_info and stream to allow callers to use the simpler csvs.reader_and_header() function. This also allows callers to pass in a wrapped CSV reader for filtering, etc.
5588	10/17/2012 10:42 AM	Aaron Marcuse-Kubitza	sql_io.py: append_csv(): Wrap input stream in a ProgressInputStream that reports rows (rather than lines) read
5584	10/17/2012 09:55 AM	Aaron Marcuse-Kubitza	sql_io.py: append_csv(): Removed no longer used INSERT mode, since all callers now use the default COPY FROM
5583	10/17/2012 09:53 AM	Aaron Marcuse-Kubitza	sql_io.py: import_csv(): Removed no longer needed manual setting of use_copy_from, which defaults to True in append_csv()
5578	10/17/2012 09:32 AM	Aaron Marcuse-Kubitza	sql_io.py: append_csv(): Parse any exceptions generated by the COPY FROM using new sql.parse_exception()
5573	10/17/2012 09:01 AM	Aaron Marcuse-Kubitza	sql_io.py: append_csv(): Don't disable COPY FROM for TSVs, which are now supported using csvs.InputRewriter
5572	10/17/2012 08:59 AM	Aaron Marcuse-Kubitza	sql_io.py: append_csv(): COPY FROM: Wrap provided stream in standardizing stream to fix ragged rows (with unequal # columns) and nonstandard CSV dialects (such as TSV with \-escaped newlines)
5569	10/17/2012 07:25 AM	Aaron Marcuse-Kubitza	sql_io.py: row_num_col_def: Changed type to integer so the row_num can be populated directly by the insert process
5568	10/17/2012 07:19 AM	Aaron Marcuse-Kubitza	sql_io.py: Added row_num_col_def for use by import_csv(). The row_num column will be necessary again because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
5553	10/16/2012 09:19 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Ensuring into's out_pkey is different from in_pkey: Prepend "out." instead of out_table to avoid long column names for the output pkey
5530	10/15/2012 03:23 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): full_in_table: Create it using new sql.copy_table() instead of sql.run_query_into()
5528	10/15/2012 03:14 PM	Aaron Marcuse-Kubitza	sql.mk_select() calls: Removed no longer needed order_by=None when limit=0
5523	10/15/2012 02:36 PM	Aaron Marcuse-Kubitza	sql.select() calls: Removed order_by=None everywhere that a stable row order is required (i.e. consistent between selects, or consistent between table transformations). This causes several tests to return different inserted row counts, because the input table is now being accessed in pkey order instead of in table order. This fixes a bug where tables with more rows than ~100 would return different results for repeated calls of the same non-ordered select.
5505	10/15/2012 08:45 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ensure_cond(): track_data_error(): Concatenate the columns in the constraint together using , rather than adding a separate entry for each column, because the constraint is applicable to all columns together rather than to each column separately
5504	10/15/2012 08:26 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Renamed ignore_cond() to ensure_cond() for clarity
5450	10/12/2012 03:18 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): DuplicateKeyException: Fixed bug where indexes with conditions needed to have the input rows filtered by the condition, to prevent trying to retrieve an existing/inserted row using a join on the index columns when the index in fact does not apply. This fixes a bug in the import of taxonconcept where the taxonconcept_0_unique_identifying_name unique index has a condition which was not satisfied for input rows with no identifyingtaxonomicname, causing any input row with NULL in this column to match all taxonconcepts with a NULL identifyingtaxonomicname. This uses ignore_cond()'s new support for constraints that did not fail at least once.
5449	10/12/2012 03:12 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore_cond(): Added support for constraints that did not fail at least once, and therefore should not be required to simplify to a non-false value. As part of this, only track the failed constraint in the errors table if it actually failed at least once based on the deleted row count or the `failed` param.
5444	10/12/2012 12:11 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): insert_into_pkeys(): Take a query as the param instead of sql.mk_select()'s params, to allow the caller to pass in any query without needing insert_into_pkeys() to manually pass through those args
5442	10/11/2012 09:36 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore_cond(): Log message: Replaced don't with do not so it wouldn't mess up syntax highlighting when viewing the log file in a text editor
5394	10/10/2012 06:10 AM	Aaron Marcuse-Kubitza	sql_io.py: cleanup_table(): Use sql.table_pkey_col() instead of sql.pkey_col() so that only an actual pkey column is removed from the list of columns to clean. This fixes a bug where the first column in the table was not cleaned up if there was no pkey. Note that this bug only affected newly re-created staging tables, because staging tables previously had a special row_num pkey column added if they did not already have a pkey. The row_num column is now added by column-based import instead.
5388	10/10/2012 05:01 AM	Aaron Marcuse-Kubitza	sql.py: Renamed pkey() to pkey_name()
5387	10/10/2012 04:45 AM	Aaron Marcuse-Kubitza	sql.py: Renamed pkey_col_() to pkey_col()
5384	10/10/2012 04:37 AM	Aaron Marcuse-Kubitza	cleanup_table(): Use new sql.table_cols() instead of sql.table_col_names()
5381	10/10/2012 03:33 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Resolving default value column: If ignoring all rows, use input cols directly instead of cols from joined-together input table. In addition to being simpler, this prevents the returned column's name from growing longer and longer as each iteration prepends its input table's name to the default value column name.
5380	10/10/2012 03:07 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Moved changing the table of the default value column from Resolving the default value column to Setting pkeys of missing rows, because the table change is only needed in this section
5379	10/10/2012 03:04 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Resolving default value column: Always call sql_gen.remove_col_rename() because it will just pass the value through if it's not a column
5377	10/10/2012 02:30 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Replaced limit_ref integer with ignore_all_ref boolean, because it is no longer used as a select statement limit
5376	10/10/2012 02:29 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): remove_all_rows(): Corrected "just create an empty pkeys table" comment to "just return the default value column"
5375	10/10/2012 02:27 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): mk_main_select(): Removed setting limit to limit_ref[0], because an empty pkeys table is no longer created when ignoring all rows
5374	10/10/2012 02:19 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Setting pkeys of missing rows: Removed "limit_ref[0] == 0" check because this code is never reached in that case
5373	10/10/2012 02:16 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Ignoring all rows for unrecoverable errors: Even in multi-row mode, just return whatever the default value or column was, instead of creating an output table containing the default value filled in for every row. This also assists the optimization to skip empty levels of taxonconcepts, because it folds the empty level to that level's parent level rather than creating a whole new temp table with ultimately the same contents.
5369	10/10/2012 01:24 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore_cond(): Changed "Ignoring rows where" message with the negated (filter-out) condition to "Ignoring rows that don't satisfy" with the filter condition for clarity
5368	10/10/2012 01:22 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): ignore_cond(): If cond simplifies to false, remove all rows instead of filtering out individual rows which will all be filtered out. This optimization should improve import times of tables, such as taxonconcept, which use a check constraint instead of NOT NULL constraints to prevent empty rows. The taxonomic schema refactoring caused the creation of many more levels of taxonconcepts, many of which (such as variety, forma, cultivar) are empty for most datasources, so this optimization should also reduce overall import times for datasources that have any empty levels of taxonconcept. Note that this optimization is only possible now that sql_gen.simplify_expr() is able to simplify all the way to a single boolean value for the taxonconcept_required_key constraint.
5367	10/10/2012 12:55 AM	Aaron Marcuse-Kubitza	Moved expression transforming functions from sql.py to sql_gen.py because they do not manipulate an actual database and merely generate SQL
5337	10/09/2012 10:16 PM	Aaron Marcuse-Kubitza	sql.py: Renamed table_cols() to table_col_names() for clarity, because it does not return sql_gen.Col objects
5289	10/05/2012 10:52 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Resolving default value column: Fixed bug where the default value col needed to have its table changed from in_table to full_in_table if it's a table column, and needed to have any column rename removed if it's a literal value
5239	10/04/2012 07:43 PM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Resolve default value column after the main loop (inserts and selects), so that the default value column can refer to an output column that is not in the original mapping but is added to the mapping from a col_defaults entry. This requires deferring the "Missing mapping for NOT NULL column" warning until the default value column is resolved, and including all columns in the full_in_table since the default value input column is not yet known.
5192	10/03/2012 04:50 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Fixed bug where row_ct_ref was incorrectly being incremented when the iteration is a function call. This bug only occurred in row-based mode, because the DB cursor for a function call is not stored in column-based mode.
5164	10/02/2012 08:56 PM	Aaron Marcuse-Kubitza	sql_io.py: append_csv(): In INSERT mode, print # rows read (different from # lines read if some fields contained embedded newlines) and # rows inserted (different from # rows read if some violated a constraint)
5129	09/28/2012 03:00 PM	Aaron Marcuse-Kubitza	sql_io.py: cleanup_table(): Don't clean up the pkey, because the canonicalization involved may produce collisions (as it does for TNRS.tnrs)
5070	09/27/2012 10:12 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Removed comment that can support in_tables of any fixed-size iterable type, because the iterable must be ordered so that the first table can be treated specially
5069	09/27/2012 10:09 AM	Aaron Marcuse-Kubitza	sql_io.py: put_table(): Support in_tables of any fixed-size iterable type
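
Revisions 14816 through 14822 describe null_strs as a list of strings treated as NULL that can be overridden from an env var, because one list does not suit every datasource. The following is a minimal sketch of that idea, not the code in lib/sql_io.py: the variable name `NULL_STRS`, the comma-separated format, the contents of the default list, and both helper names are assumptions made for illustration.

```python
import os

# Hypothetical default; the real list in lib/sql_io.py is not shown in this log.
# "NA" is deliberately absent, since r14822 removed it from the default
# (it is also the abbreviation for the Spanish province Navarra).
DEFAULT_NULL_STRS = ('', 'N/A')

def get_null_strs(environ=None, var='NULL_STRS'):
    """Return the null strings, taken from an env var when it is set.

    The variable name and the comma-separated format are assumptions.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get(var)
    if raw is None:
        return list(DEFAULT_NULL_STRS)
    return raw.split(',')

def null_to_none(value, null_strs):
    """Map a string that denotes a missing value to None; pass others through."""
    return None if value in null_strs else value

# A datasource that really uses "NA" for missing values can opt in explicitly.
custom = get_null_strs({'NULL_STRS': 'NA,N/A'})
cleaned = [null_to_none(v, custom) for v in ['NA', 'Madrid', 'N/A']]

# With no override, "NA" survives as data and only the defaults become None.
default_cleaned = [null_to_none(v, get_null_strs({})) for v in ['NA', 'N/A']]
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment; a datasource-specific run would instead set the variable before invoking the import.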
