  • svn:executable: *

# Date Author Comment
11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

Moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. Instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

7652 02/26/2013 06:51 AM Aaron Marcuse-Kubitza

csv2db: Open input stream in universal newlines mode to support files with \r as the line ending
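
A minimal sketch of what universal newlines mode does, not the actual csv2db code. It assumes Python 3, where `newline=None` (the default for text mode) translates `\r`, `\n` and `\r\n` to `\n` on read. The helper names and sample data are hypothetical.

```python
import os
import tempfile

def read_lines_universal(path):
    """Read lines so that CR, LF and CRLF each end a line."""
    # newline=None is the Python 3 default; it is spelled out here for clarity
    with open(path, newline=None) as f:
        return [line.rstrip('\n') for line in f]

def demo():
    # Write a small file that uses bare CR line endings (hypothetical data)
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'wb') as f:
        f.write(b'a,b\r1,2\r3,4\r')
    try:
        return read_lines_universal(path)
    finally:
        os.remove(path)
```

When the file object is handed to `csv.reader`, the Python 3 documentation recommends opening it with `newline=''` instead, so that the reader does its own line-ending handling.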

5591 10/17/2012 11:04 AM Aaron Marcuse-Kubitza

sql_io.py: import_csv(): Take a reader and header rather than a stream to allow callers to pass in a wrapped CSV reader for filtering, etc.

5589 10/17/2012 10:44 AM Aaron Marcuse-Kubitza

csv2db, tnrs_db: Removed ProgressInputStream wrapper around input stream, which is no longer needed (and causes overlapping output) now that sql_io.append_csv() prints # rows read

5582 10/17/2012 09:50 AM Aaron Marcuse-Kubitza

csv2db: Removed no longer needed manual setting of use_copy_from, which defaults to True in sql_io.import_csv()

5581 10/17/2012 09:49 AM Aaron Marcuse-Kubitza

csv2db: Removed no longer needed separate handling of sql.DatabaseErrors, because all recoverable errors caused by COPY FROM (EncodingException and ragged rows) are now handled or avoided

5580 10/17/2012 09:46 AM Aaron Marcuse-Kubitza

csv2db: Handle EncodingException separately by changing the connection encoding to LATIN1 and retrying
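
The retry pattern described above might look roughly like this. It is a sketch under assumptions: `EncodingException` here is a local stand-in for the exception named in the commit message, and `do_import` and `set_encoding` are hypothetical callables, not csv2db or sql_io functions.

```python
class EncodingException(Exception):
    """Hypothetical stand-in for the exception named in the commit message."""

def import_with_encoding_retry(do_import, set_encoding, fallback='LATIN1'):
    """Run do_import(); on an encoding error, switch encoding and retry once.

    do_import and set_encoding are hypothetical callables supplied by the caller.
    """
    try:
        return do_import()
    except EncodingException:
        # Change the connection encoding, then try the same import again
        set_encoding(fallback)
        return do_import()
```

Other exception types are deliberately not caught, so they propagate to the caller unchanged.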

5028 09/27/2012 12:17 AM Aaron Marcuse-Kubitza

csv2db: Removed no longer used has_row_num param

5027 09/27/2012 12:14 AM Aaron Marcuse-Kubitza

sql_io.py: import_csv(): Don't add a row number column to the created table because it is now added automatically to the temp table by column-based import (row-based import now also does not require a pkey for DB inputs)

4996 09/25/2012 09:12 PM Aaron Marcuse-Kubitza

csv2db: Use new sql_io.import_csv()

4994 09/25/2012 09:05 PM Aaron Marcuse-Kubitza

csv2db: Don't truncate the table before loading rows because it has just been created, and is therefore empty. This statement may be left over from a time when the table was created only once, and its creation was not rolled back if the import fails.

4993 09/25/2012 08:44 PM Aaron Marcuse-Kubitza

sql_io.py: cleanup_table(): Print 'Cleaning up table' log message

4992 09/25/2012 08:41 PM Aaron Marcuse-Kubitza

sql_io.py: cleanup_table(): Also vacuum and reanalyze table

4927 09/21/2012 03:57 PM Aaron Marcuse-Kubitza

csv2db: COPY FROM mode: Removed no longer needed explicit column list, now that the initial table has the exact width of the CSV (the row_num is added later)

4926 09/21/2012 03:55 PM Aaron Marcuse-Kubitza

csv2db: Add any row_num column after creating the table, so it does not interfere with row widths when using COPY FROM without explicit column names

4925 09/21/2012 03:48 PM Aaron Marcuse-Kubitza

csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.

4924 09/21/2012 03:44 PM Aaron Marcuse-Kubitza

csv2db: Fixed bug where tables without a row_num (such as *.src tables) were not properly supported when the CSV contained ragged rows, because the columns were truncated to # column names + 1 but there was no row_num to be the +1. This was solved by moving row_num to the end, so that it does not impact the column count whether it's there or not.
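
To illustrate why placing row_num last keeps the column count independent of it, here is a hedged sketch. `normalize_row` is a hypothetical helper, and padding short rows with `None` is an assumption; the commit message only mentions truncation.

```python
def normalize_row(row, n_cols, row_num=None):
    """Fit a ragged CSV row to n_cols data columns.

    Extra fields are truncated; short rows are padded with None (assumption).
    If row_num is given it is appended last, so the data columns always
    occupy positions 0..n_cols-1 whether or not a row_num is present.
    """
    fixed = list(row[:n_cols])
    fixed += [None] * (n_cols - len(fixed))
    if row_num is not None:
        fixed.append(row_num)
    return fixed
```

With this layout the truncation width is simply the number of column names, with no `+ 1` special case for tables that lack a row_num.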

4446 09/05/2012 03:56 AM Aaron Marcuse-Kubitza

csv2db: When no command is specified, just clean up the specified table

4441 09/05/2012 03:07 AM Aaron Marcuse-Kubitza

csv2db: Removed no longer used errors_table_only option

4439 09/05/2012 02:59 AM Aaron Marcuse-Kubitza

csv2db: Removed no longer needed creation of errors table, because it is now created automatically by column-based import

4258 08/28/2012 01:49 PM Aaron Marcuse-Kubitza

csv2db: Made input_cmd optional when errors_table_only is on, because the CSV header is not needed to create the errors table

4257 08/28/2012 01:47 PM Aaron Marcuse-Kubitza

csv2db: Added has_row_num param to disable creating a row_num column

3610 07/25/2012 10:53 PM Aaron Marcuse-Kubitza

csv2db: log(): sys.stderr.write(): Run strings.to_raw_str() on message to handle Unicode chars

3609 07/25/2012 10:52 PM Aaron Marcuse-Kubitza

csv2db: Run strings.to_unicode() on column names to handle Unicode chars

3608 07/25/2012 10:36 PM Aaron Marcuse-Kubitza

csv2db: esc_name(): Use db.esc_name()

3578 07/24/2012 07:11 AM Aaron Marcuse-Kubitza

csv2db: COPY FROM: Fixed %-injection bug where column names' %s were not escaped prior to cursor.mogrify(), by changing the code to use inline db.esc_value() instead
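
The class of bug described above can be reproduced without a database. The sketch below uses Python's own `%` operator as a stand-in for `cursor.mogrify()`, and it escapes by doubling `%`, which is a different technique from the inline `db.esc_value()` fix the commit describes. `build_template` and `render` are hypothetical helpers.

```python
def build_template(col_name, escape=True):
    """Build a %-style query template that embeds a column name.

    With escape=True, literal % characters in the name are doubled so they
    survive the later parameter substitution.
    """
    name = col_name.replace('%', '%%') if escape else col_name
    return 'SELECT "' + name + '" FROM t WHERE id = %s'

def render(template, params):
    """Stand-in for parameter substitution (plain Python %-formatting)."""
    return template % params
```

With `escape=False`, a column name such as `cover%` makes the substitution step misread the `%` as the start of a placeholder and fail.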

3440 07/18/2012 03:46 PM Aaron Marcuse-Kubitza

csv2db: Creating errors table: Only drop existing errors table in errors_table_only mode, so that errors tables are not unintentionally deleted when `make inputs/install` is run. This helps to make `make install` idempotent.

3271 07/09/2012 02:04 PM Aaron Marcuse-Kubitza

csv2db: verbosity defaults to 3 so that detailed queries with profiling stats are included in the log file, to assist in optimization

3270 07/09/2012 02:01 PM Aaron Marcuse-Kubitza

csv2db: Don't cache per-row INSERT queries because this bloats the cache (there aren't repeated identical INSERTs that shouldn't be re-run like in row-based import)

3149 06/28/2012 11:22 PM Aaron Marcuse-Kubitza

csv2db: Fixed bug where CREATE TABLE statement was cached, causing it not to be re-executed after a rollback due to a failed COPY FROM. Avoid re-creating the table after a failed COPY FROM, and instead just remove any existing rows.

3147 06/28/2012 11:00 PM Aaron Marcuse-Kubitza

csv2db: Vacuum table instead of just reanalyzing it because for some reason reanalyzing it isn't enough to fix the cached row count (causing pgAdmin3 to report that the table needs to be vacuumed)

3146 06/28/2012 10:54 PM Aaron Marcuse-Kubitza

csv2db: Don't add indexes on the created table because they use up more disk space than the table itself and currently aren't used. (The import process adds indexes on each iteration's column subset instead.)

3139 06/27/2012 10:56 PM Aaron Marcuse-Kubitza

csv2db: Fixed bug where table needed to be a sql_gen.Table object with the proper schema, so that errors_table would be created in the correct schema. Removed no longer needed changing of the search_path.

3138 06/27/2012 10:55 PM Aaron Marcuse-Kubitza

csv2db: Fixed bug where table needed to be a sql_gen.Table object with the proper schema, so that errors_table would be created in the correct schema. Removed no longer needed changing of the search_path.

3134 06/27/2012 09:31 PM Aaron Marcuse-Kubitza

csv2db: Create errors table first, so that imports can start using it right away

3081 06/26/2012 05:18 PM Aaron Marcuse-Kubitza

Moved Data cleanup from sql.py to sql_io.py

3080 06/26/2012 05:18 PM Aaron Marcuse-Kubitza

Moved error tracking from sql.py to sql_io.py

3069 06/25/2012 08:43 PM Aaron Marcuse-Kubitza

csv2db: Reanalyze table, so that query planner stats are up to date even though the table doesn't need to be vacuumed anymore

3067 06/25/2012 08:11 PM Aaron Marcuse-Kubitza

csv2db: Removed no longer needed table vacuum (cleanup_table() now avoids creating dead rows)

3054 06/25/2012 06:12 PM Aaron Marcuse-Kubitza

csv2db: Adding indexes: Fixed bug where sql.add_index()'s ensure_not_null param needed to be renamed to ensure_not_null_

2926 06/18/2012 05:14 PM Aaron Marcuse-Kubitza

csv2db: Log inserts with log_level=5 so they are not shown for verbosity 4, which is used to see the savepoints and autocommits

2925 06/18/2012 05:13 PM Aaron Marcuse-Kubitza

Removed unnecessary db.db.commit() calls because commits are now done automatically by DbConn's autocommit mode

2892 06/15/2012 02:47 AM Aaron Marcuse-Kubitza

csv2db: ProgressInputStream: Use default progress message 'Read %d line(s)' because there is not necessarily one CSV row per line, due to embedded newlines
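
A short sketch of why physical lines and CSV rows can differ, using only the standard `csv` module. The function name and sample text are hypothetical.

```python
import csv
import io

def count_lines_and_rows(text):
    """Return (physical line count, CSV row count) for the given CSV text."""
    n_lines = len(text.splitlines())
    # newline='' leaves line endings untouched so the csv reader handles them
    reader = csv.reader(io.StringIO(text, newline=''))
    n_rows = sum(1 for _ in reader)
    return n_lines, n_rows

# A quoted field containing a newline spans two physical lines but is one row
SAMPLE = 'id,note\n1,"first line\nsecond line"\n2,plain\n'
```

For `SAMPLE`, the header and the two records are read as three rows spread over four lines.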

2890 06/15/2012 01:46 AM Aaron Marcuse-Kubitza

csv2db: Support reinstalling just the errors table using new errors_table_only option

2876 06/14/2012 11:20 PM Aaron Marcuse-Kubitza

csv2db: Use sql_gen.TypedCol.nullable instead of manually adding 'NOT NULL' to the type. Ensure that pkeys are properly NOT NULL.

2875 06/14/2012 11:15 PM Aaron Marcuse-Kubitza

csv2db: Adding indexes: Create plain indexes using ensure_not_null=False because the indexes will primarily be used by the user to search for specific values, rather than by the mapping script which uses the ensure_not_null

2873 06/14/2012 11:09 PM Aaron Marcuse-Kubitza

csv2db: Adding indexes: Fixed bug where col.to_Col() could not be used because sql.add_index() does not support name-only columns (plain strings are OK, though)

2824 06/13/2012 08:36 PM Aaron Marcuse-Kubitza

csv2db: Errors table: Removed no longer needed sql_gen.EnsureNotNull() because this is now added automatically

2799 06/12/2012 09:13 PM Aaron Marcuse-Kubitza

csv2db: When reraising exception, use `raise` instead of `raise e` to preserve whole stack trace
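
A sketch of the bare `raise` form, written for Python 3. The commit presumably targeted Python 2 (an assumption), where `raise e` restarted the traceback at the re-raise point; in Python 3 the exception object carries its traceback, so the difference is smaller there. All function names are hypothetical.

```python
import traceback

def parse(value):
    """Innermost frame: this is where the error originates."""
    return int(value)

def load(value, log):
    try:
        return parse(value)
    except ValueError:
        log.append('failed to parse %r' % (value,))
        raise  # bare raise: re-raise the active exception with its traceback

def frames_of_failure(value):
    """Return the function names in the traceback of a failed load()."""
    log = []
    try:
        load(value, log)
    except ValueError as e:
        return [frame.name for frame in traceback.extract_tb(e.__traceback__)]
    return []
```

The returned list still includes `parse`, the frame where the error was first raised.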

2760 06/12/2012 03:05 PM Aaron Marcuse-Kubitza

sql.py: create_table(): Add indexes on all non-pkey columns, unless turned off or deferred using new param col_indexes

2759 06/12/2012 02:46 PM Aaron Marcuse-Kubitza

csv2db: Add column indexes on errors table. Use typed_cols and `.to_Col()` to iterate over columns to add indexes on, for the main and errors tables.

2727 06/11/2012 04:02 PM Aaron Marcuse-Kubitza

csv2db: Use new sql.errors_table()

2720 06/08/2012 09:08 PM Aaron Marcuse-Kubitza

sql.py: cast(): Made errors table also store SQLSTATE in error_code column

2695 06/08/2012 02:26 PM Aaron Marcuse-Kubitza

csv2db: Errors table: index_cols: Remove no longer needed sql_gen.Col() (now done by EnsureNotNull)

2693 06/08/2012 02:19 PM Aaron Marcuse-Kubitza

csv2db: Use sql_gen.EnsureNotNull instead of the ensure_not_null() function in the functions schema to avoid a dependency on the functions schema, which would cause the UNIQUE index to be dropped whenever the functions schema is reinstalled

2689 06/08/2012 01:51 PM Aaron Marcuse-Kubitza

csv2db: Errors table: Add UNIQUE index on all columns

2685 06/07/2012 09:24 PM Aaron Marcuse-Kubitza

csv2db: Vacuum the created table

2682 06/07/2012 08:58 PM Aaron Marcuse-Kubitza

csv2db: Create errors table for use by column-based import

2680 06/07/2012 08:21 PM Aaron Marcuse-Kubitza

csv2db: Use verbosity-based logging like bin/map. Use sql.create_table(). Add indexes on the columns to speed up column-based import and to speed up searching the table for particular values.

2604 06/04/2012 02:51 PM Aaron Marcuse-Kubitza

csv2db: Increased frequency of "Processed .. row(s)" messages to match slower, more common INSERT case instead of faster, less used COPY FROM case

2289 05/22/2012 02:19 PM Aaron Marcuse-Kubitza

csv2db: Switched to using plain table names rather than table_is_esc

2116 05/09/2012 12:36 AM Aaron Marcuse-Kubitza

csv2db: Use new sql.cleanup_table() to map NULL-equivalents to NULL. Consider the empty string to be NULL.
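
The mapping of NULL-equivalents can be illustrated in plain Python. This is only a sketch of the idea, not `sql.cleanup_table()` itself, which operates on a database table. The helper names and the `null_equivs` parameter are hypothetical; only the empty string is taken from the commit message.

```python
def to_null(value, null_equivs=('',)):
    """Map a NULL-equivalent value to None; pass other values through."""
    return None if value in null_equivs else value

def cleanup_row(row, null_equivs=('',)):
    """Apply to_null() to every field of a row."""
    return [to_null(field, null_equivs) for field in row]
```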

2062 05/04/2012 04:55 PM Aaron Marcuse-Kubitza

Calls to sql.esc_name*(): Removed preserve_case=True because it is now the default

1965 04/24/2012 11:43 AM Aaron Marcuse-Kubitza

csv2db: Fixed bug where extra columns were not truncated in INSERT mode. Replace empty column names with the column # to avoid errors with CSVs that have trailing ","s, etc.
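
Replacing empty column names with the column number might be sketched as follows. `fill_empty_names` is a hypothetical helper, and the numbering base is an assumption exposed through the `start` parameter, since the commit message does not say whether columns are counted from 0 or 1.

```python
def fill_empty_names(names, start=0):
    """Replace empty column names with their column number as a string.

    start is the number given to the first column (assumption: 0).
    """
    return [name if name != '' else str(i)
            for i, name in enumerate(names, start)]
```

A header line with a trailing comma, such as `a,b,`, yields an empty last name, which this turns into a usable identifier.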

1963 04/23/2012 09:57 PM Aaron Marcuse-Kubitza

csv2db: Fall back to manually inserting each row (autodetecting the encoding for each field) if COPY FROM doesn't work
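
A hedged sketch of the fallback strategy described above. `bulk_load` and `insert_row` are hypothetical callables standing in for COPY FROM and a per-row INSERT. The candidate encodings and the broad `except Exception` are simplifying assumptions, not the behavior of the real script.

```python
def decode_field(raw, encodings=('utf-8', 'latin-1')):
    """Decode one field, trying each candidate encoding in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError('could not decode field: %r' % (raw,))

def load_rows(rows, bulk_load, insert_row):
    """Try the fast bulk path first; on failure, insert row by row.

    Returns which path was used. Catching Exception is a simplification;
    real code would catch only the errors that are known to be recoverable.
    """
    try:
        bulk_load(rows)
        return 'bulk'
    except Exception:
        for row in rows:
            insert_row([decode_field(field) for field in row])
        return 'row-by-row'
```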

1942 04/23/2012 04:14 PM Aaron Marcuse-Kubitza

Added csv2db to load a command's CSV output stream into a PostgreSQL table