# Date Author Comment
14826 10/14/2014 12:36 PM Aaron Marcuse-Kubitza

lib/sql_io.py: cleanup_table(): trim(): documented that this also converts character varying fields to text

14822 10/14/2014 10:09 AM Aaron Marcuse-Kubitza

bugfix: lib/sql_io.py: null_strs_str_default: removed "NA" because this is the abbr for a Spanish province (Navarra). this fixes the 2nd bug of #955, geovalidation duplicated rows.

14821 10/14/2014 10:00 AM Aaron Marcuse-Kubitza

lib/sql_io.py: cleanup_table(): debug-print null_strs

14820 10/14/2014 09:56 AM Aaron Marcuse-Kubitza

lib/sql_io.py: null_strs: made it customizable from an env var, since the same list of null_strs doesn't work for all datasources (see #957)
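
A minimal sketch of how a null-string list could be overridden per datasource from the environment. The variable name `null_strs`, the comma separator, and the default list here are assumptions for illustration; the actual names and defaults are defined in lib/sql_io.py and are not shown in this log.

```python
import os

# Hypothetical defaults; the real list lives in lib/sql_io.py.
DEFAULT_NULL_STRS = ["", "NULL", "\\N"]

def get_null_strs(environ=os.environ, var="null_strs", sep=","):
    """Return the null strings from the environment, or the defaults if unset."""
    raw = environ.get(var)
    if raw is None:
        return list(DEFAULT_NULL_STRS)
    return raw.split(sep)

def null_if_matches(value, null_strs):
    """Map a value to None when it is one of the configured null strings."""
    return None if value in null_strs else value
```

With this shape, a datasource where "NA" means a missing value can opt in through the environment, while a datasource where "NA" is a real value (such as the Navarra abbreviation) keeps the defaults.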

14816 10/14/2014 08:35 AM Aaron Marcuse-Kubitza

lib/sql_io.py: null_strs: made it customizable from an env var, since the same list of null_strs doesn't work for all datasources (see #957)

14785 09/30/2014 07:36 AM Aaron Marcuse-Kubitza

lib/sql_io.py: null_strs: added N/A and NA (this will remove a common abbr for North America, but we don't use the continent, so this is OK)

14589 08/26/2014 05:08 PM Aaron Marcuse-Kubitza

fix: lib/sql_io.py: append_csv(): use new csvs.ProgressInputFilter instead of streams.ProgressInputStream(csvs.StreamFilter(__)), so that the input to csvs.InputRewriter is a reader, not a stream. this avoids the need for csvs.InputRewriter to accept a stream whose lines are tuples, instead of the expected reader.

14585 08/26/2014 04:46 PM Aaron Marcuse-Kubitza

lib/sql_io.py: added commented-out debug statement used to troubleshoot copy_expert() errors

14074 07/15/2014 05:44 PM Aaron Marcuse-Kubitza

bugfix: lib/sql_io.py: put_table(): handle_MissingCastException(): when updating join_cols, don't add new entry for join_cols[out_col], only update existing one. this fixes #902 (import bug), and with #902 fixed, #887 (disk space leak) should no longer occur.

13005 03/30/2014 07:54 PM Aaron Marcuse-Kubitza

fix: lib/sql_io.py: put_table(): don't warn if can't create pkey, because this just indicates that a set-returning function was used. this should get rid of the last of the confusing benign warnings in the test output.

12868 03/22/2014 05:50 AM Aaron Marcuse-Kubitza

bugfix: lib/sql_io.py: put_table(): is_literals: `return sql.value(cur)`: need to use sql.value_or_none() instead to support multi-row functions, such as _split() used in specimens data

12153 02/13/2014 03:48 AM Aaron Marcuse-Kubitza

lib/sql_io.py: automatic handling of input/output column type mismatches: also do this for identifying columns, which first cause an error in a join in sql.distinct_table() rather than in the main insert (and thus were not handled by the existing error handling). previously, the user would have had to manually cast the input column in postprocess.sql. this involves getting handle_MissingCastException() to update join_cols as well as mapping.

12150 02/13/2014 12:29 AM Aaron Marcuse-Kubitza

lib/sql_io.py: put_table(): main loop MissingCastException handler: factored out into nested function so that it can also be used elsewhere

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (eg. git-svn). IMPORTANT: do NOT do an `svn up`. instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

11151 10/02/2013 02:45 AM Aaron Marcuse-Kubitza

lib/sql_io.py: put_table(): default param: documented that this will be used for all missing rows, regardless of which error caused them not to be inserted. this means that auto-forwarding (wiki.vegpath.org/Auto-forwarding) can be used with any type of constraint violation, not just NOT NULL constraints (which it is typically used with).

11033 09/21/2013 09:01 PM Aaron Marcuse-Kubitza

lib/sql_io.py: put_table(): added link to new INSERT ON DUPLICATE SELECT wiki page, which now contains the explanation in the doc comment

10845 08/31/2013 06:32 PM Aaron Marcuse-Kubitza

bugfix: lib/sql_io.py: put_table(): Getting output table pkeys of existing/inserted rows: need to include the index cond in the join condition here, too (using var join_custom_cond), so that an index scan can be used instead of a much slower full-table sort

10843 08/31/2013 05:52 PM Aaron Marcuse-Kubitza

bugfix: lib/sql_io.py: put_table(): DuplicateKeyException: need to include any index cond in the join condition, so that an index scan can be used instead of a much slower full-table sort (otherwise the query planner will not know that it can restrict results to rows satisfying the index cond)

10838 08/30/2013 10:38 PM Aaron Marcuse-Kubitza

lib/sql_io.py: ensure_cond(): documented meaning of passed, failed params (at least one row passed/failed the constraint)

10302 07/18/2013 12:04 AM Aaron Marcuse-Kubitza

lib/sql_io.py: put_table(): documented that PostgreSQL 9.1+ now provides a way to implement insert/on duplicate select just once for each table (instead of dynamically for each insert) using the new INSTEAD OF triggers (http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html). INSTEAD OF triggers were not used when put_table() was developed, because it was necessary to support PostgreSQL 9.0, which was installed on the Mac and not easily upgradeable. it was eventually upgraded to add PostGIS, which required a complete reinstall of the DB from the staging tables, with the associated staging table reload bugs, as well as complete removal of the old Postgres version.

10195 07/09/2013 02:50 PM Aaron Marcuse-Kubitza

lib/sql_io.py: cleanup_table(): added assertion that the table exists, so that if it doesn't, the error will occur as part of an assertion rather than as part of the util.table_nulls_mapped__get() call, which might confusingly lead users to believe that this is a bug in util.table_nulls_mapped__get() when in fact the problem is that the table is not installed

10188 07/06/2013 07:21 PM Aaron Marcuse-Kubitza

lib/sql_io.py: cleanup_table(): don't run the slow ALTER TABLE statement again if the table has already been cleaned up. documented that it is idempotent (and actually was before this change as well).

10187 07/06/2013 07:19 PM Aaron Marcuse-Kubitza

lib/sql_io.py: added table_nulls_mapped__set(), table_nulls_mapped__get() wrappers around the corresponding util schema functions

10165 07/03/2013 10:48 PM Aaron Marcuse-Kubitza

lib/sql_io.py: put_table() (column-based import): complexity note: clarified that INSERT RETURNING throws an error on duplicate instead of returning the existing row. added blank line after ¶ for readability.

10164 07/03/2013 10:44 PM Aaron Marcuse-Kubitza

lib/sql_io.py: put_table() (column-based import): warning about triggers populating unique constraint-covered columns: corrected limitation to include only the unique constraint used to do the DISTINCT ON, since other unique constraints are not affected by column-based import. note that the primary key will normally not be the DISTINCT ON constraint, so trigger-populated natural keys are supported unless the input table contains duplicate rows for some generated keys.

9508 05/23/2013 12:43 PM Aaron Marcuse-Kubitza

lib/sql_io.py: append_csv(): support importing CSVs whose columns are a subset of the full table and/or in a different order. when the header exactly matches the columns, the explicit column list will still be omitted as an optimization. this uses code from r4927.
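
The column-list optimization described above might look roughly like the following. `mk_copy_stmt` is a hypothetical helper, not the project's function; it only builds the statement text and assumes the caller already knows the table's column names.

```python
def mk_copy_stmt(table, table_cols, header):
    """Build a COPY ... FROM STDIN statement for a CSV with the given header."""
    unknown = [c for c in header if c not in table_cols]
    if unknown:
        raise ValueError("columns not in table: %r" % unknown)

    def quote(name):
        # Double any embedded quotes so the identifier stays well-formed
        return '"%s"' % name.replace('"', '""')

    if list(header) == list(table_cols):
        col_list = ""  # exact match: omit the explicit column list
    else:
        # Subset and/or different order: name the columns explicitly
        col_list = " (" + ", ".join(quote(c) for c in header) + ")"
    return "COPY %s%s FROM STDIN WITH (FORMAT csv)" % (quote(table), col_list)
```

For example, `mk_copy_stmt("t", ["a", "b", "c"], ["c", "a"])` names only the two supplied columns, so the remaining column is left to its default.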

8820 05/05/2013 10:12 AM Aaron Marcuse-Kubitza

lib/sql_io.py: put_table(): Calling wrapper function: adding pkey or index on the resulting table: don't display warning if a pkey can't be added, because this is actually a legitimate situation when the called function is set-returning and can return multiple rows for one input. having this as a warning results in spurious warnings in the automated tests (which look confusingly like ignored errors because Python warnings include debugging context information). e.g. `make inputs/Madidi/IndividualObservation/test.by_col.xml` causes this error in the sourcelist->sourcename splitting step (which of course can produce multiple specimenholder institutions)....

8077 03/19/2013 02:05 AM Aaron Marcuse-Kubitza

lib/sql_io.py: mk_errors_table(): Create a unique index on the MD5 of the value and error instead of on the values directly, because some strings are too long to index (e.g. row 2537268 of MO.Specimen causes an error "index row size 3032 exceeds maximum 2712 for index [...] Values larger than 1/3 of a buffer page cannot be indexed")
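
To illustrate why hashing sidesteps the index size limit: an MD5 digest has a fixed length no matter how long the input is, so a unique index on the digest never exceeds the per-row limit. The sketch below computes such a key in Python; `error_key` is a hypothetical name, and the actual index is presumably built with an MD5 expression inside the database rather than in application code.

```python
import hashlib

def error_key(value, error):
    """Fixed-size key for a (value, error) pair, suitable for uniqueness checks."""
    h = hashlib.md5()
    for part in (value, error):
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently
        h.update(str(len(data)).encode("ascii") + b":" + data)
    return h.hexdigest()
```

A 5000-character value and a 5-character value both produce a 32-character hex key, which is far below the 2712-byte limit quoted in the error message.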

7395 01/31/2013 02:49 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Documented that much of the complexity of the normalizing algorithm is due to PostgreSQL not having a native command for insert/on duplicate select
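
For readers unfamiliar with the term, insert/on duplicate select means: insert the row, and if it violates a unique constraint, return the key of the row that already exists. The sketch below shows the idea one row at a time using Python's built-in sqlite3 module and a hypothetical `taxon` table. It is not how put_table() works internally, since put_table() operates on whole tables in PostgreSQL, but it shows the behavior being emulated.

```python
import sqlite3

def insert_or_select(conn, name):
    """Insert name and return its id; on a duplicate, return the existing id."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute("INSERT INTO taxon (name) VALUES (?)", (name,))
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # The unique constraint rejected the insert, so the row already exists
        row = conn.execute(
            "SELECT id FROM taxon WHERE name = ?", (name,)).fetchone()
        return row[0]

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
    first = insert_or_select(conn, "Quercus")
    again = insert_or_select(conn, "Quercus")  # duplicate: same id comes back
    other = insert_or_select(conn, "Acer")
    return first, again, other
```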

7394 01/31/2013 02:24 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Corrected "insert/if not exists get" to "insert/on duplicate select"

7393 01/31/2013 01:52 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Removed no longer applicable requirement that it be run at the beginning of a transaction, which was only required when the output table was locked during the function call

7392 01/31/2013 01:48 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Documented that the function's insert/if not exists get algorithm does not support database triggers that populate fields covered by a unique constraint

7180 01/11/2013 06:06 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): DuplicateKeyException: Uniquifying input table to avoid internal duplicate keys: Also filter out duplicate rows in the out_table, so that they don't create duplicate key errors and the resulting index holes

7117 01/08/2013 08:46 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): Fixed bug where strings used in the tracked error message needed to be wrapped in strings.ustr()

6801 12/12/2012 04:41 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_function: Fixed bug where need to add the pkeys table's test pkey constraint after the data is added rather than when the empty table is created, to avoid adding a pkey constraint that will later be violated by data which returns multiple output rows for an input row (such as calls to _split())

6800 12/12/2012 04:36 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): insert_into_pkeys(): Allow callers to override run_query_into()'s add_pkey_ param in case the initial version of the pkeys table should not yet have the test pkey constraint (e.g. because data is added after the table is created)

6300 11/19/2012 05:32 PM Aaron Marcuse-Kubitza

sql_io.py: cast(): Use sql_gen.Cast() to generate the cast, in order to take advantage of its support for casts to unknown

6226 11/15/2012 10:43 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Special handling for functions with hstore params: Fixed bug where need to unwrap literal values of mapping, which might be sql_gen.Literal objects

6220 11/15/2012 10:04 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Added special handling for functions with hstore params. Note that although _map() doesn't exist yet as a DB function, this code must be in place before _map() is created to avoid param type mismatch errors.

6005 11/05/2012 09:54 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Removed assertion that into's row count be at least full_in_table's row count, because now that DISTINCT ON is used to satisfy the into table pkey, this is no longer necessarily true

5993 11/05/2012 04:49 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Setting pkeys of missing rows: Fixed bug in column-based import where when input rows match multiple output rows in one of this iteration's input tables, the into table's pkey constraint is violated because full_in_table contains multiple entries for an input pkey

5968 11/02/2012 03:37 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Switched back to using run_query_into()'s add_pkey_ option now that it uses sql.add_pkey_or_index() instead of sql.add_pkey()

5895 10/31/2012 10:05 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): When deleting rows that do not satisfy the condition, handle sql.DoesNotExistExceptions caused by columns in the condition that were not replaced with NULL. These occur when out_table is a function, and the columns of the table the condition relates to therefore can't be found using out_table.

5894 10/31/2012 09:59 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Calling function: Do not cache the function call, because it may be retried after error handling

5889 10/31/2012 08:45 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): sql.DatabaseErrors: Factored exception-handling code out into handle_unknown_exc(), for use by other exception handlers

5888 10/31/2012 08:39 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): is_function: Fixed bug where out_table_cols can't be replaced with NULL because out_table is a function, not a table

5827 10/30/2012 12:07 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): Fixed bug where test if any rows failed cond did not check if cur != None (which is the case when cond == sql_gen.true_expr) before checking cur.rowcount

5824 10/29/2012 09:37 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Assertion that into and full_in_table have the same row count: Allow into to have more rows than full_in_table, in case an input row matched multiple output rows. This should not happen for a properly-configured database, but seems to happen periodically nevertheless (currently, to the MO datasource) and should not abort the import when it does.

5818 10/26/2012 06:00 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): When adding the failed condition to the errors table, also include the original, untranslated condition from the DB schema in addition to the translation of the condition into the input schema

5817 10/26/2012 05:45 PM Aaron Marcuse-Kubitza

sql_io.py: track_data_error(): Fixed bug where errors whose column had no srcs (indicated by () ) were incorrectly being ignored. This affected NOT NULL exceptions where the column was not provided by the dataset.

5815 10/26/2012 05:38 PM Aaron Marcuse-Kubitza

sql_io.py: track_data_error(): Support errors with no columns by inserting a single entry with column set to NULL

5813 10/26/2012 05:00 PM Aaron Marcuse-Kubitza

sql_io.py: mk_errors_table(): Made "column" column nullable, because some errors (such as check constraint violations) don't have any corresponding columns if its columns weren't provided in the input data

5791 10/25/2012 05:12 PM Aaron Marcuse-Kubitza

sql_io.py: cast(): Set the created function's value param type to anyelement to support any input type, not just text

5766 10/25/2012 07:51 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): insert_into_pkeys(): Use new sql.add_pkey_or_index() instead of sql.add_pkey() in order to just print a warning if for some reason there were duplicate entries for an input row in the iteration's pkeys table. This should provide a workaround for bugs (often in the schema itself, related to its unique indexes) that cause an input row to match multiple output rows when joining on the output table using the unique constraint's columns.
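
The fallback described above (try the strict constraint, then warn and settle for a plain index) can be sketched as follows. This uses sqlite3 and a unique index as a stand-in for a PostgreSQL primary key; `add_unique_or_plain_index` and the `pkeys` table layout are hypothetical, and sql.add_pkey_or_index() itself is not shown in this log.

```python
import sqlite3
import warnings

def add_unique_or_plain_index(conn, table, col):
    """Try a unique index; if duplicates exist, warn and add a plain index.

    Returns True if the unique index was created, False if it fell back.
    """
    name = "%s_%s_idx" % (table, col)
    try:
        conn.execute(
            'CREATE UNIQUE INDEX "%s" ON "%s" ("%s")' % (name, table, col))
        return True
    except sqlite3.IntegrityError as e:
        warnings.warn("duplicates in %s.%s, using a non-unique index: %s"
                      % (table, col, e))
        conn.execute('CREATE INDEX "%s" ON "%s" ("%s")' % (name, table, col))
        return False
```

Either way the column ends up indexed, so later joins on it stay fast. Only the uniqueness guarantee is relaxed when an input row has matched more than one output row.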

5726 10/23/2012 07:56 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): is_function: Moved definition of wrapper function inside try block of main loop because the creation of the empty pkeys table (whose row type is needed for the wrapper function) can itself produce MissingCastExceptions, which must be thrown inside the loop in order to be handled properly

5719 10/23/2012 05:33 AM Aaron Marcuse-Kubitza

sql_io.py: put(): Pass on_error through to put_table()

5718 10/23/2012 05:19 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): log_exc(): Return False if removing all rows and have callers break the main loop so that no further exception-handling code is processed before the main loop is exited

5594 10/17/2012 11:50 AM Aaron Marcuse-Kubitza

sql_io.py: import_csv(): Add a row_num column at the beginning of the table, which is autopopulated by csvs.RowNumFilter (it cannot be autopopulated by the serial datatype, because this does not support COPY FROM with a NULL-equivalent value in the serial field). This fixes a bug in csv2db where rows would not stay in inserted order upon querying the table, and would be returned in a different order each query, which prevented LIMIT/OFFSET based subsetting from returning consistent, nonoverlapping results. This occurs because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
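
A row-numbering filter of the kind described above can be sketched as a small iterator that wraps a CSV reader. The class below is a hypothetical stand-in written for Python 3; the real csvs.RowNumFilter is not shown in this log and may differ in interface and in where it starts counting.

```python
import csv
import io

class RowNumFilter:
    """Wrap a CSV reader and prepend a 1-based row number to each row."""
    def __init__(self, reader, start=1):
        self.reader = reader
        self.row_num = start

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self.reader)  # StopIteration propagates at end of input
        numbered = [self.row_num] + list(row)
        self.row_num += 1
        return numbered

def numbered_rows(text):
    """Parse CSV text and return (header, rows) with a leading row_num column."""
    reader = csv.reader(io.StringIO(text))
    header = ["row_num"] + next(reader)
    return header, list(RowNumFilter(reader))
```

Because the number is assigned while reading the file, a later `ORDER BY row_num` recovers the file order regardless of how the database stores the rows.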

5591 10/17/2012 11:04 AM Aaron Marcuse-Kubitza

sql_io.py: import_csv(): Take a reader and header rather than a stream to allow callers to pass in a wrapped CSV reader for filtering, etc.

5590 10/17/2012 11:00 AM Aaron Marcuse-Kubitza

sql_io.py: append_csv(): Take a reader and header rather than a stream_info and stream to allow callers to use the simpler csvs.reader_and_header() function. This also allows callers to pass in a wrapped CSV reader for filtering, etc.

5588 10/17/2012 10:42 AM Aaron Marcuse-Kubitza

sql_io.py: append_csv(): Wrap input stream in a ProgressInputStream that reports rows (rather than lines) read

5584 10/17/2012 09:55 AM Aaron Marcuse-Kubitza

sql_io.py: append_csv(): Removed no longer used INSERT mode, since all callers now use the default COPY FROM

5583 10/17/2012 09:53 AM Aaron Marcuse-Kubitza

sql_io.py: import_csv(): Removed no longer needed manual setting of use_copy_from, which defaults to True in append_csv()

5578 10/17/2012 09:32 AM Aaron Marcuse-Kubitza

sql_io.py: append_csv(): Parse any exceptions generated by the COPY FROM using new sql.parse_exception()

5573 10/17/2012 09:01 AM Aaron Marcuse-Kubitza

sql_io.py: append_csv(): Don't disable COPY FROM for TSVs, which are now supported using csvs.InputRewriter

5572 10/17/2012 08:59 AM Aaron Marcuse-Kubitza

sql_io.py: append_csv(): COPY FROM: Wrap provided stream in standardizing stream to fix ragged rows (with unequal # columns) and nonstandard CSV dialects (such as TSV with \-escaped newlines)
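
As a rough illustration of standardizing input before COPY FROM, the sketch below re-encodes tab-separated text as plain CSV and evens out ragged rows. The function names are hypothetical, and the policy of padding short rows with empty strings and truncating long rows is an assumption that may differ from what the project's standardizing stream does; \-escaped newlines are not handled here.

```python
import csv
import io

def standardize_rows(reader, n_cols, fill=""):
    """Yield rows with exactly n_cols fields, padding or truncating as needed."""
    for row in reader:
        row = list(row)
        if len(row) < n_cols:
            row += [fill] * (n_cols - len(row))
        yield row[:n_cols]

def rewrite_as_csv(text, n_cols, delimiter="\t"):
    """Re-encode delimited text as standard comma-separated CSV."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in standardize_rows(reader, n_cols):
        writer.writerow(row)
    return out.getvalue()
```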

5569 10/17/2012 07:25 AM Aaron Marcuse-Kubitza

sql_io.py: row_num_col_def: Changed type to integer so the row_num can be populated directly by the insert process

5568 10/17/2012 07:19 AM Aaron Marcuse-Kubitza

sql_io.py: Added row_num_col_def for use by import_csv(). The row_num column will be necessary again because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.

5553 10/16/2012 09:19 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Ensuring into's out_pkey is different from in_pkey: Prepend "out." instead of out_table to avoid long column names for the output pkey

5530 10/15/2012 03:23 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): full_in_table: Create it using new sql.copy_table() instead of sql.run_query_into()

5528 10/15/2012 03:14 PM Aaron Marcuse-Kubitza

sql.mk_select() calls: Removed no longer needed order_by=None when limit=0

5523 10/15/2012 02:36 PM Aaron Marcuse-Kubitza

sql.select() calls: Removed order_by=None everywhere that a stable row order is required (i.e. consistent between selects, or consistent between table transformations). This causes several tests to return different inserted row counts, because the input table is now being accessed in pkey order instead of in table order. This fixes a bug where tables with more rows than ~100 would return different results for repeated calls of the same non-ordered select.

5505 10/15/2012 08:45 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ensure_cond(): track_data_error(): Concatenate the columns in the constraint together using , rather than adding a separate entry for each column, because the constraint is applicable to all columns together rather than to each column separately

5504 10/15/2012 08:26 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Renamed ignore_cond() to ensure_cond() for clarity

5450 10/12/2012 03:18 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): DuplicateKeyException: Fixed bug where indexes with conditions needed to have the input rows filtered by the condition, to prevent trying to retrieve an existing/inserted row using a join on the index columns when the index in fact does not apply. This fixes a bug in the import of taxonconcept where the taxonconcept_0_unique_identifying_name unique index has a condition which was not satisfied for input rows with no identifyingtaxonomicname, causing any input row with NULL in this column to match all taxonconcepts with a NULL identifyingtaxonomicname. This uses ignore_cond()'s new support for constraints that did not fail at least once.

5449 10/12/2012 03:12 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ignore_cond(): Added support for constraints that did not fail at least once, and therefore should not be required to simplify to a non-false value. As part of this, only track the failed constraint in the errors table if it actually failed at least once based on the deleted row count or the `failed` param.

5444 10/12/2012 12:11 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): insert_into_pkeys(): Take a query as the param instead of sql.mk_select()'s params, to allow the caller to pass in any query without needing insert_into_pkeys() to manually pass through those args

5442 10/11/2012 09:36 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ignore_cond(): Log message: Replaced don't with do not so it wouldn't mess up syntax highlighting when viewing the log file in a text editor

5394 10/10/2012 06:10 AM Aaron Marcuse-Kubitza

sql_io.py: cleanup_table(): Use sql.table_pkey_col() instead of sql.pkey_col() so that only an actual pkey column is removed from the list of columns to clean. This fixes a bug where the first column in the table was not cleaned up if there was no pkey. Note that this bug only affected newly re-created staging tables, because staging tables previously had a special row_num pkey column added if they did not already have a pkey. The row_num column is now added by column-based import instead.

5388 10/10/2012 05:01 AM Aaron Marcuse-Kubitza

sql.py: Renamed pkey() to pkey_name()

5387 10/10/2012 04:45 AM Aaron Marcuse-Kubitza

sql.py: Renamed pkey_col_() to pkey_col()

5384 10/10/2012 04:37 AM Aaron Marcuse-Kubitza

cleanup_table(): Use new sql.table_cols() instead of sql.table_col_names()

5381 10/10/2012 03:33 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Resolving default value column: If ignoring all rows, use input cols directly instead of cols from joined-together input table. In addition to being simpler, this prevents the returned column's name from growing longer and longer as each iteration prepends its input table table name to the default value column name.

5380 10/10/2012 03:07 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Moved changing the table of the default value column from Resolving the default value column to Setting pkeys of missing rows, because the table change is only needed in this section

5379 10/10/2012 03:04 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Resolving default value column: Always call sql_gen.remove_col_rename() because it will just pass the value through if it's not a column

5377 10/10/2012 02:30 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Replaced limit_ref integer with ignore_all_ref boolean, because it is no longer used as a select statement limit

5376 10/10/2012 02:29 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): remove_all_rows(): Corrected "just create an empty pkeys table" comment to "just return the default value column"

5375 10/10/2012 02:27 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): mk_main_select(): Removed setting limit to limit_ref0, because an empty pkeys table is no longer created when ignoring all rows

5374 10/10/2012 02:19 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Setting pkeys of missing rows: Removed "limit_ref0 == 0" check because this code is never reached in that case

5373 10/10/2012 02:16 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Ignoring all rows for unrecoverable errors: Even in multi-row mode, just return whatever the default value or column was, instead of creating an output table containing the default value filled in for every row. This also assists the optimization to skip empty levels of taxonconcepts, because it folds the empty level to that level's parent level rather than creating a whole new temp table with ultimately the same contents.

5369 10/10/2012 01:24 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ignore_cond(): Changed "Ignoring rows where" message with the negated (filter-out) condition to "Ignoring rows that don't satisfy" with the filter condition for clarity

5368 10/10/2012 01:22 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): ignore_cond(): If cond simplifies to false, remove all rows instead of filtering out individual rows which will all be filtered out. This optimization should improve import times of tables, such as taxonconcept, which use a check constraint instead of NOT NULL constraints to prevent empty rows. The taxonomic schema refactoring caused the creation of many more levels of taxonconcepts, many of which (such as variety, forma, cultivar) are empty for most datasources, so this optimization should also reduce overall import times for datasources that have any empty levels of taxonconcept. Note that this optimization is only possible now that sql_gen.simplify_expr() is able to simplify all the way to a single boolean value for the taxonconcept_required_key constraint.

5367 10/10/2012 12:55 AM Aaron Marcuse-Kubitza

Moved expression transforming functions from sql.py to sql_gen.py because they do not manipulate an actual database and merely generate SQL

5337 10/09/2012 10:16 PM Aaron Marcuse-Kubitza

sql.py: Renamed table_cols() to table_col_names() for clarity, because it does not return sql_gen.Col objects

5289 10/05/2012 10:52 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Resolving default value column: Fixed bug where the default value col needed to have its table changed from in_table to full_in_table if it's a table column, and needed to have any column rename removed if it's a literal value

5239 10/04/2012 07:43 PM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Resolve default value column after the main loop (inserts and selects), so that the default value column can refer to an output column that is not in the original mapping but is added to the mapping from a col_defaults entry. This requires deferring the "Missing mapping for NOT NULL column" warning until the default value column is resolved, and including all columns in the full_in_table since the default value input column is not yet known.

5192 10/03/2012 04:50 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Fixed bug where row_ct_ref was incorrectly being incremented when the iteration is a function call. This bug only occurred in row-based mode, because the DB cursor for a function call is not stored in column-based mode.

5164 10/02/2012 08:56 PM Aaron Marcuse-Kubitza

sql_io.py: append_csv(): In INSERT mode, print # rows read (different from # lines read if some fields contained embedded newlines) and # rows inserted (different from # rows read if some violated a constraint)

5129 09/28/2012 03:00 PM Aaron Marcuse-Kubitza

sql_io.py: cleanup_table(): Don't clean up the pkey, because the canonicalization involved may produce collisions (as it does for TNRS.tnrs)

5070 09/27/2012 10:12 AM Aaron Marcuse-Kubitza

sql_io.py: put_table(): Removed comment that can support in_tables of any fixed-size iterable type, because the iterable must be ordered so that the first table can be treated specially