/trunk/bin/csv2db - Changes - BIEN 3 - NCEAS Projects

root/trunk/bin/csv2db @ 11972

svn:executable: *

#	Date	Author	Comment
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. Instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
7652	02/26/2013 06:51 AM	Aaron Marcuse-Kubitza	csv2db: Open input stream in universal newlines mode to support files with \r as the line ending
5591	10/17/2012 11:04 AM	Aaron Marcuse-Kubitza	sql_io.py: import_csv(): Take a reader and header rather than a stream to allow callers to pass in a wrapped CSV reader for filtering, etc.
5589	10/17/2012 10:44 AM	Aaron Marcuse-Kubitza	csv2db, tnrs_db: Removed ProgressInputStream wrapper around input stream, which is no longer needed (and causes overlapping output) now that sql_io.append_csv() prints # rows read
5582	10/17/2012 09:50 AM	Aaron Marcuse-Kubitza	csv2db: Removed no longer needed manual setting of use_copy_from, which defaults to True in sql_io.import_csv()
5581	10/17/2012 09:49 AM	Aaron Marcuse-Kubitza	csv2db: Removed no longer needed separate handling of sql.DatabaseErrors, because all recoverable errors caused by COPY FROM (EncodingException and ragged rows) are now handled or avoided
5580	10/17/2012 09:46 AM	Aaron Marcuse-Kubitza	csv2db: Handle EncodingException separately by changing the connection encoding to LATIN1 and retrying
5028	09/27/2012 12:17 AM	Aaron Marcuse-Kubitza	csv2db: Removed no longer used has_row_num param
5027	09/27/2012 12:14 AM	Aaron Marcuse-Kubitza	sql_io.py: import_csv(): Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs)
4996	09/25/2012 09:12 PM	Aaron Marcuse-Kubitza	csv2db: Use new sql_io.import_csv()
4994	09/25/2012 09:05 PM	Aaron Marcuse-Kubitza	csv2db: Don't truncate the table before loading rows because it has just been created, and is therefore empty. This statement may be left over from a time when the table was created only once, and its creation was not rolled back if the import fails.
4993	09/25/2012 08:44 PM	Aaron Marcuse-Kubitza	sql_io.py: cleanup_table(): Print 'Cleaning up table' log message
4992	09/25/2012 08:41 PM	Aaron Marcuse-Kubitza	sql_io.py: cleanup_table(): Also vacuum and reanalyze table
4927	09/21/2012 03:57 PM	Aaron Marcuse-Kubitza	csv2db: COPY FROM mode: Removed no longer needed explicit column list, now that the initial table has the exact width of the CSV (the row_num is added later)
4926	09/21/2012 03:55 PM	Aaron Marcuse-Kubitza	csv2db: Add any row_num column after creating the table, so it does not interfere with row widths when using COPY FROM without explicit column names
4925	09/21/2012 03:48 PM	Aaron Marcuse-Kubitza	csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
4924	09/21/2012 03:44 PM	Aaron Marcuse-Kubitza	csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
4446	09/05/2012 03:56 AM	Aaron Marcuse-Kubitza	csv2db: When no command is specified, just clean up the specified table
4441	09/05/2012 03:07 AM	Aaron Marcuse-Kubitza	csv2db: Removed no longer used errors_table_only option
4439	09/05/2012 02:59 AM	Aaron Marcuse-Kubitza	csv2db: Removed no longer needed creation of errors table, because it is now created automatically by column-based import
4258	08/28/2012 01:49 PM	Aaron Marcuse-Kubitza	csv2db: Made input_cmd optional when errors_table_only is on, because the CSV header is not needed to create the errors table
4257	08/28/2012 01:47 PM	Aaron Marcuse-Kubitza	csv2db: Added has_row_num param to disable creating a row_num column
3610	07/25/2012 10:53 PM	Aaron Marcuse-Kubitza	csv2db: log(): sys.stderr.write(): Run strings.to_raw_str() on message to handle Unicode chars
3609	07/25/2012 10:52 PM	Aaron Marcuse-Kubitza	csv2db: Run strings.to_unicode() on column names to handle Unicode chars
3608	07/25/2012 10:36 PM	Aaron Marcuse-Kubitza	csv2db: esc_name(): Use db.esc_name()
3578	07/24/2012 07:11 AM	Aaron Marcuse-Kubitza	csv2db: COPY FROM: Fixed %-injection bug where column names' %s were not escaped prior to cursor.mogrify(), by changing the code to use inline db.esc_value() instead
3440	07/18/2012 03:46 PM	Aaron Marcuse-Kubitza	csv2db: Creating errors table: Only drop existing errors table in errors_table_only mode, so that errors tables are not unintentionally deleted when `make inputs/install` is run. This helps to make `make install` idempotent.
3271	07/09/2012 02:04 PM	Aaron Marcuse-Kubitza	csv2db: verbosity defaults to 3 so that detailed queries with profiling stats are included in the log file, to assist in optimization
3270	07/09/2012 02:01 PM	Aaron Marcuse-Kubitza	csv2db: Don't cache per-row INSERT queries because this bloats the cache (there aren't repeated identical INSERTs that shouldn't be re-run like in row-based import)
3149	06/28/2012 11:22 PM	Aaron Marcuse-Kubitza	csv2db: Fixed bug where CREATE TABLE statement was cached, causing it not to be re-executed after a rollback due to a failed COPY FROM. Avoid re-creating the table after a failed COPY FROM, and instead just remove any existing rows.
3147	06/28/2012 11:00 PM	Aaron Marcuse-Kubitza	csv2db: Vacuum table instead of just reanalyzing it because for some reason reanalyzing it isn't enough to fix the cached row count (causing pgAdmin3 to report that the table needs to be vacuumed)
3146	06/28/2012 10:54 PM	Aaron Marcuse-Kubitza	csv2db: Don't add indexes on the created table because they use up more disk space than the table itself and currently aren't used. (The import process adds indexes on each iteration's column subset instead.)
3139	06/27/2012 10:56 PM	Aaron Marcuse-Kubitza	csv2db: Fixed bug where table needed to be a sql_gen.Table object with the proper schema, so that errors_table would be created in the correct schema. Removed no longer needed changing of the search_path.
3138	06/27/2012 10:55 PM	Aaron Marcuse-Kubitza	csv2db: Fixed bug where table needed to be a sql_gen.Table object with the proper schema, so that errors_table would be created in the correct schema. Removed no longer needed changing of the search_path.
3134	06/27/2012 09:31 PM	Aaron Marcuse-Kubitza	csv2db: Create errors table first, so that imports can start using it right away
3081	06/26/2012 05:18 PM	Aaron Marcuse-Kubitza	Moved data cleanup from sql.py to sql_io.py
3080	06/26/2012 05:18 PM	Aaron Marcuse-Kubitza	Moved error tracking from sql.py to sql_io.py
3069	06/25/2012 08:43 PM	Aaron Marcuse-Kubitza	csv2db: Reanalyze table, so that query planner stats are up to date even though the table doesn't need to be vacuumed anymore
3067	06/25/2012 08:11 PM	Aaron Marcuse-Kubitza	csv2db: Removed no longer needed table vacuum (cleanup_table() now avoids creating dead rows)
3054	06/25/2012 06:12 PM	Aaron Marcuse-Kubitza	csv2db: Adding indexes: Fixed bug where sql.add_index()'s ensure_not_null param needed to be renamed to ensure_not_null_
2926	06/18/2012 05:14 PM	Aaron Marcuse-Kubitza	csv2db: Log inserts with log_level=5 so they are not shown for verbosity 4, which is used to see the savepoints and autocommits
2925	06/18/2012 05:13 PM	Aaron Marcuse-Kubitza	Removed unnecessary db.db.commit() calls because commits are now done automatically by DbConn's autocommit mode
2892	06/15/2012 02:47 AM	Aaron Marcuse-Kubitza	csv2db: ProgressInputStream: Use default progress message 'Read %d line(s)' because there is not necessarily one CSV row per line, due to embedded newlines
2890	06/15/2012 01:46 AM	Aaron Marcuse-Kubitza	csv2db: Support reinstalling just the errors table using new errors_table_only option
2876	06/14/2012 11:20 PM	Aaron Marcuse-Kubitza	csv2db: Use sql_gen.TypedCol.nullable instead of manually adding 'NOT NULL' to the type. Ensure that pkeys are properly NOT NULL.
2875	06/14/2012 11:15 PM	Aaron Marcuse-Kubitza	csv2db: Adding indexes: Create plain indexes using ensure_not_null=False because the indexes will primarily be used by the user to search for specific values, rather than by the mapping script which uses the ensure_not_null
2873	06/14/2012 11:09 PM	Aaron Marcuse-Kubitza	csv2db: Adding indexes: Fixed bug where col.to_Col() could not be used because sql.add_index() does not support name-only columns (plain strings are OK, though)
2824	06/13/2012 08:36 PM	Aaron Marcuse-Kubitza	csv2db: Errors table: Removed no longer needed sql_gen.EnsureNotNull() because this is now added automatically
2799	06/12/2012 09:13 PM	Aaron Marcuse-Kubitza	csv2db: When reraising exception, use `raise` instead of `raise e` to preserve whole stack trace
2760	06/12/2012 03:05 PM	Aaron Marcuse-Kubitza	sql.py: create_table(): Add indexes on all non-pkey columns, unless turned off or deferred using new param col_indexes
2759	06/12/2012 02:46 PM	Aaron Marcuse-Kubitza	csv2db: Add column indexes on errors table. Use typed_cols and `.to_Col()` to iterate over columns to add indexes on, for the main and errors tables.
2727	06/11/2012 04:02 PM	Aaron Marcuse-Kubitza	csv2db: Use new sql.errors_table()
2720	06/08/2012 09:08 PM	Aaron Marcuse-Kubitza	sql.py: cast(): Made errors table also store SQLSTATE in error_code column
2695	06/08/2012 02:26 PM	Aaron Marcuse-Kubitza	csv2db: Errors table: index_cols: Remove no longer needed sql_gen.Col() (now done by EnsureNotNull)
2693	06/08/2012 02:19 PM	Aaron Marcuse-Kubitza	csv2db: Use sql_gen.EnsureNotNull instead of the ensure_not_null() function in the functions schema to avoid a dependency on the functions schema, which would cause the UNIQUE index to be dropped whenever the functions schema is reinstalled
2689	06/08/2012 01:51 PM	Aaron Marcuse-Kubitza	csv2db: Errors table: Add UNIQUE index on all columns
2685	06/07/2012 09:24 PM	Aaron Marcuse-Kubitza	csv2db: Vacuum the created table
2682	06/07/2012 08:58 PM	Aaron Marcuse-Kubitza	csv2db: Create errors table for use by column-based import
2680	06/07/2012 08:21 PM	Aaron Marcuse-Kubitza	csv2db: Use verbosity-based logging like bin/map. Use sql.create_table(). Add indexes on the columns to speed up column-based import and to speed up searching the table for particular values.
2604	06/04/2012 02:51 PM	Aaron Marcuse-Kubitza	csv2db: Increased frequency of "Processed .. row(s)" messages to match slower, more common INSERT case instead of faster, less used COPY FROM case
2289	05/22/2012 02:19 PM	Aaron Marcuse-Kubitza	csv2db: Switched to using plain table names rather than table_is_esc
2116	05/09/2012 12:36 AM	Aaron Marcuse-Kubitza	csv2db: Use new sql.cleanup_table() to map NULL-equivalents to NULL. Consider the empty string to be NULL.
2062	05/04/2012 04:55 PM	Aaron Marcuse-Kubitza	Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default
1965	04/24/2012 11:43 AM	Aaron Marcuse-Kubitza	csv2db: Fixed bug where extra columns were not truncated in INSERT mode. Replace empty column names with the column # to avoid errors with CSVs that have trailing ","s, etc.
1963	04/23/2012 09:57 PM	Aaron Marcuse-Kubitza	csv2db: Fall back to manually inserting each row (autodetecting the encoding for each field) if COPY FROM doesn't work
1942	04/23/2012 04:14 PM	Aaron Marcuse-Kubitza	Added csv2db to load a command's CSV output stream into a PostgreSQL table
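
Several entries above describe how csv2db normalizes its CSV input before loading it: r7652 (accept `\r` line endings), r1965 (replace empty column names with the column number and drop extra columns), r2116 (treat the empty string as NULL), and r4925 (ragged rows). The following sketch illustrates those rules in Python 3. It is not csv2db's actual code: the function names `normalize_header`, `fit_row` and `read_csv`, the 1-based column numbering, and the padding of short rows are assumptions made for illustration.

```python
import csv
import io
from typing import Iterator, List, Optional, TextIO, Tuple

Row = List[Optional[str]]


def normalize_header(header: List[str]) -> List[str]:
    """Hypothetical helper: replace blank column names with the column number (1-based is an assumption)."""
    return [name if name.strip() else str(i) for i, name in enumerate(header, start=1)]


def fit_row(row: List[str], width: int) -> Row:
    """Hypothetical helper: make a ragged row exactly `width` fields wide, then map '' to None (NULL)."""
    fitted = row[:width] + [""] * (width - len(row))  # truncate extras; padding short rows is an assumption
    return [None if value == "" else value for value in fitted]


def read_csv(stream: TextIO) -> Tuple[List[str], Iterator[Row]]:
    """Hypothetical helper: return the cleaned header and a lazy iterator over fitted rows."""
    reader = csv.reader(stream)
    header = normalize_header(next(reader))
    return header, (fit_row(row, len(header)) for row in reader)


if __name__ == "__main__":
    # \r-only line endings, a blank column name, one long row and one short row
    data = io.StringIO("a,,c\r1,2,3,4\r5,\r", newline="")
    header, rows = read_csv(data)
    print(header)       # ['a', '2', 'c']
    for row in rows:
        print(row)      # ['1', '2', '3'] then ['5', None, None]
```

In Python 3, opening the input with `newline=""` (as the `csv` module documentation recommends) lets the reader accept `\r`, `\n` and `\r\n` line endings, which plays the role of the universal-newlines mode mentioned in r7652.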
