Moved everything into /trunk/ to create the standard SVN layout, for use with tools that require it (e.g. git-svn). IMPORTANT: do NOT run `svn up`; instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
csv2db: Open input stream in universal newlines mode to support files with \r as the line ending
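For reference, the general mechanism (a minimal sketch, not the script's actual code, which reads from a command's output stream):

    import csv, io

    # newline='' enables universal-newlines handling for the csv module, so
    # '\r', '\n', and '\r\n' are all accepted as row separators (Python 3);
    # the Python 2 spelling is open('input.csv', 'rU').
    with io.open('input.csv', newline='') as stream:
        for row in csv.reader(stream):
            pass  # process row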
sql_io.py: import_csv(): Take a reader and header rather than a stream to allow callers to pass in a wrapped CSV reader for filtering, etc.
csv2db, tnrs_db: Removed ProgressInputStream wrapper around input stream, which is no longer needed (and causes overlapping output) now that sql_io.append_csv() prints # rows read
csv2db: Removed no longer needed manual setting of use_copy_from, which defaults to True in sql_io.import_csv()
csv2db: Removed no longer needed separate handling of sql.DatabaseErrors, because all recoverable errors caused by COPY FROM (EncodingException and ragged rows) are now handled or avoided
csv2db: Handle EncodingException separately by changing the connection encoding to LATIN1 and retrying
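The retry pattern is roughly the following (EncodingException is the project's own exception; the psycopg2 calls are standard, and a seekable input file is assumed for the sake of the sketch):

    import psycopg2

    def copy_with_encoding_fallback(conn, copy_sql, stream):
        cur = conn.cursor()
        try:
            cur.copy_expert(copy_sql, stream)
        except psycopg2.DataError:  # e.g. invalid byte sequence for encoding "UTF8"
            conn.rollback()
            conn.set_client_encoding('LATIN1')  # LATIN1 accepts every byte value
            stream.seek(0)
            cur.copy_expert(copy_sql, stream)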
csv2db: Removed no longer used has_row_num param
sql_io.py: import_csv(): Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import also no longer requires a pkey for DB inputs)
csv2db: Use new sql_io.import_csv()
csv2db: Don't truncate the table before loading rows because it has just been created, and is therefore empty. This statement may have been left over from a time when the table was created only once and its creation was not rolled back if the import failed.
sql_io.py: cleanup_table(): Print 'Cleaning up table' log message
sql_io.py: cleanup_table(): Also vacuum and reanalyze table
csv2db: COPY FROM mode: Removed no longer needed explicit column list, now that the initial table has the exact width of the CSV (the row_num is added later)
csv2db: Add any row_num column after creating the table, so it does not interfere with row widths when using COPY FROM without explicit column names
csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
csv2db: When no command is specified, just clean up the specified table
csv2db: Removed no longer used errors_table_only option
csv2db: Removed no longer needed creation of errors table, because it is now created automatically by column-based import
csv2db: Made input_cmd optional when errors_table_only is on, because the CSV header is not needed to create the errors table
csv2db: Added has_row_num param to disable creating a row_num column
csv2db: log(): sys.stderr.write(): Run strings.to_raw_str() on message to handle Unicode chars
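The gist of strings.to_raw_str() (a sketch of the idea, not its actual implementation; this assumes the Python 2 str/unicode split, where sys.stderr expects byte strings):

    import sys

    def to_raw_str(s, encoding='utf_8'):
        # encode unicode messages to bytes; leave byte strings untouched
        if isinstance(s, unicode):
            return s.encode(encoding)
        return s

    def log(msg):
        sys.stderr.write(to_raw_str(msg) + '\n')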
csv2db: Run strings.to_unicode() on column names to handle Unicode chars
csv2db: esc_name(): Use db.esc_name()
csv2db: COPY FROM: Fixed %-injection bug where column names' %s were not escaped prior to cursor.mogrify(), by changing the code to use inline db.esc_value() instead
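For context (psycopg2 shown; db.esc_value() is the project's own helper): when parameters are passed, cursor.mogrify()/execute() treat every '%' in the query string as a placeholder marker, so a literal '%' inside an interpolated column name breaks the statement unless it is doubled. Escaping the identifiers inline and passing the finished string with no parameters sidesteps this:

    from psycopg2.extensions import quote_ident  # stand-in for the inline escaping

    def copy_from_csv(conn, table, cols, stream):
        # the finished SQL is passed without parameters, so literal '%'
        # characters in the column names are left alone
        copy_sql = 'COPY %s (%s) FROM STDIN WITH CSV' % (
            quote_ident(table, conn),
            ', '.join(quote_ident(c, conn) for c in cols))
        conn.cursor().copy_expert(copy_sql, stream)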
csv2db: Creating errors table: Only drop existing errors table in errors_table_only mode, so that errors tables are not unintentionally deleted when `make inputs/install` is run. This helps to make `make install` idempotent.
csv2db: verbosity defaults to 3 so that detailed queries with profiling stats are included in the log file, to assist in optimization
csv2db: Don't cache per-row INSERT queries because this bloats the cache (there aren't repeated identical INSERTs that shouldn't be re-run like in row-based import)
csv2db: Fixed bug where CREATE TABLE statement was cached, causing it not to be re-executed after a rollback due to a failed COPY FROM. Avoid re-creating the table after a failed COPY FROM, and instead just remove any existing rows.
csv2db: Vacuum table instead of just reanalyzing it because for some reason reanalyzing it isn't enough to fix the cached row count (causing pgAdmin3 to report that the table needs to be vacuumed)
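For reference (standard PostgreSQL behavior, not specific to this script's DbConn wrapper), VACUUM refuses to run inside a transaction block, so it has to be issued in autocommit mode, e.g.:

    def vacuum(conn, table_sql):
        # table_sql is an already-escaped table name; VACUUM cannot run
        # inside a transaction block, hence the temporary autocommit mode
        old = conn.autocommit
        conn.autocommit = True
        try:
            conn.cursor().execute('VACUUM ANALYZE ' + table_sql)
        finally:
            conn.autocommit = old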
csv2db: Don't add indexes on the created table because they use up more disk space than the table itself and currently aren't used. (The import process adds indexes on each iteration's column subset instead.)
csv2db: Fixed bug where table needed to be a sql_gen.Table object with the proper schema, so that errors_table would be created in the correct schema. Removed no longer needed changing of the search_path.
csv2db: Create errors table first, so that imports can start using it right away
Moved data cleanup from sql.py to sql_io.py
Moved error tracking from sql.py to sql_io.py
csv2db: Reanalyze table, so that query planner stats are up to date even though the table doesn't need to be vacuumed anymore
csv2db: Removed no longer needed table vacuum (cleanup_table() now avoids creating dead rows)
csv2db: Adding indexes: Fixed bug where sql.add_index()'s ensure_not_null param needed to be renamed to ensure_not_null_
csv2db: Log inserts with log_level=5 so they are not shown for verbosity 4, which is used to see the savepoints and autocommits
Removed unnecessary db.db.commit() calls because commits are now done automatically by DbConn's autocommit mode
csv2db: ProgressInputStream: Use default progress message 'Read %d line(s)' because there is not necessarily one CSV row per line, due to embedded newlines
csv2db: Support reinstalling just the errors table using new errors_table_only option
csv2db: Use sql_gen.TypedCol.nullable instead of manually adding 'NOT NULL' to the type. Ensure that pkeys are properly NOT NULL.
csv2db: Adding indexes: Create plain indexes using ensure_not_null=False because the indexes will primarily be used by the user to search for specific values, rather than by the mapping script, which uses ensure_not_null
csv2db: Adding indexes: Fixed bug where col.to_Col() could not be used because sql.add_index() does not support name-only columns (plain strings are OK, though)
csv2db: Errors table: Removed no longer needed sql_gen.EnsureNotNull() because this is now added automatically
csv2db: When reraising an exception, use `raise` instead of `raise e` to preserve the whole stack trace
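The distinction (standard Python; the helper below is hypothetical, and the difference matters mainly under Python 2, which this script appears to target):

    import sys, traceback

    def insert_checked(db, table, row):
        try:
            db.insert(table, row)  # hypothetical helper
        except Exception:
            sys.stderr.write(traceback.format_exc())
            raise  # bare `raise` re-raises with the original traceback;
                   # `raise e` (in Python 2) would restart the trace here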
sql.py: create_table(): Add indexes on all non-pkey columns, unless turned off or deferred using new param col_indexes
csv2db: Add column indexes on errors table. Use typed_cols and `.to_Col()` to iterate over columns to add indexes on, for the main and errors tables.
csv2db: Use new sql.errors_table()
sql.py: cast(): Made errors table also store SQLSTATE in error_code column
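How the code is captured is internal to sql.cast(); for reference, the same SQLSTATE value is what psycopg2 exposes on a failed cast (the column names below are illustrative):

    import psycopg2

    def try_cast(conn, value):
        cur = conn.cursor()
        try:
            cur.execute('SELECT %s::integer', (value,))
        except psycopg2.DataError as e:
            conn.rollback()
            # e.pgcode is the 5-character SQLSTATE, e.g. '22P02'
            # (invalid_text_representation); that is the kind of value
            # stored in the errors table's error_code column
            return {'value': value, 'error_code': e.pgcode, 'error': e.pgerror}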
csv2db: Errors table: index_cols: Remove no longer needed sql_gen.Col() (now done by EnsureNotNull)
csv2db: Use sql_gen.EnsureNotNull instead of the ensure_not_null() function in the functions schema to avoid a dependency on the functions schema, which would cause the UNIQUE index to be dropped whenever the functions schema is reinstalled
csv2db: Errors table: Add UNIQUE index on all columns
csv2db: Vacuum the created table
csv2db: Create errors table for use by column-based import
csv2db: Use verbosity-based logging like bin/map. Use sql.create_table(). Add indexes on the columns to speed up column-based import and to speed up searching the table for particular values.
csv2db: Increased frequency of "Processed .. row(s)" messages to match slower, more common INSERT case instead of faster, less used COPY FROM case
csv2db: Switched to using plain table names rather than table_is_esc
csv2db: Use new sql.cleanup_table() to map NULL-equivalents to NULL. Consider the empty string to be NULL.
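Roughly what the cleanup amounts to (illustrative SQL built from Python; sql.cleanup_table()'s actual statement and its list of NULL-equivalent strings may differ):

    from psycopg2.extensions import quote_ident

    def cleanup_table(conn, table, cols, null_strs=('', r'\N')):
        # map NULL-equivalent strings, including the empty string, to real
        # NULLs in a single pass over the table
        set_sql = ', '.join(
            '{c} = CASE WHEN {c} = ANY(%(nulls)s) THEN NULL ELSE {c} END'
            .format(c=quote_ident(col, conn)) for col in cols)
        conn.cursor().execute(
            'UPDATE %s SET %s' % (quote_ident(table, conn), set_sql),
            {'nulls': list(null_strs)})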
Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default
csv2db: Fixed bug where extra columns were not truncated in INSERT mode. Replace empty column names with the column # to avoid errors with CSVs that have trailing ","s, etc.
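The header/row normalization amounts to something like this (a hypothetical helper, not the script's code):

    def normalize(header, rows):
        # unnamed trailing columns (e.g. from trailing ','s) get a positional name
        header = [name or 'column_%d' % (i + 1) for i, name in enumerate(header)]
        # ragged rows are truncated to the header width in INSERT mode
        return header, (row[:len(header)] for row in rows)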
csv2db: Fall back to manually inserting each row (autodetecting the encoding for each field) if COPY FROM doesn't work
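A rough sketch of the fallback path (the helper names and encoding list are illustrative, and fields are assumed to arrive as byte strings, as under Python 2):

    def decode_field(value, encodings=('utf_8', 'latin_1')):
        # latin_1 maps every byte, so it always succeeds as the last resort
        for enc in encodings:
            try:
                return value.decode(enc)
            except UnicodeDecodeError:
                pass

    def insert_rows(cur, insert_sql, rows):
        # row-by-row INSERTs, used only when the bulk COPY FROM fails
        for row in rows:
            cur.execute(insert_sql, [decode_field(v) for v in row])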
Added csv2db to load a command's CSV output stream into a PostgreSQL table