/lib/sql_io.py - Diff - BIEN 3 - NCEAS Projects


Revision 5594

Added by Aaron Marcuse-Kubitza about 12 years ago

sql_io.py: import_csv(): Add a row_num column at the beginning of the table, which is autopopulated by csvs.RowNumFilter. (It cannot be autopopulated by the serial datatype, because this does not support COPY FROM with a NULL-equivalent value in the serial field.) This fixes a bug in csv2db where rows did not stay in inserted order when the table was queried and could be returned in a different order on each query, which prevented LIMIT/OFFSET-based subsetting from returning consistent, nonoverlapping results. This occurs because PostgreSQL does not guarantee that rows are returned in inserted order, or in any stable order ("If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
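
As a rough illustration of why the explicit row_num column matters, the sketch below pages through a small table with LIMIT/OFFSET while ordering by row_num. It is not taken from sql_io.py: the table name `staging`, the sample data, and the helpers `fetch_page()` and `fetch_all_pages()` are hypothetical, and SQLite (from the Python 3 standard library) stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL so this runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (row_num INTEGER PRIMARY KEY, name TEXT)")

# Hypothetical sample data; row_num records the insertion order explicitly.
names = ["Quercus", "Acer", "Pinus", "Betula", "Salix"]
rows = [(i, name) for i, name in enumerate(names, start=1)]
conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)


def fetch_page(conn, limit, offset):
    """Return one LIMIT/OFFSET page, ordered by the explicit row_num column."""
    cur = conn.execute(
        "SELECT row_num, name FROM staging ORDER BY row_num LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return cur.fetchall()


def fetch_all_pages(conn, page_size):
    """Collect consecutive pages until an empty page is returned."""
    pages = []
    offset = 0
    while True:
        page = fetch_page(conn, page_size, offset)
        if not page:
            break
        pages.append(page)
        offset += page_size
    return pages


pages = fetch_all_pages(conn, 2)
```

Because every page is ordered by row_num, concatenating the pages reproduces the rows in inserted order with no overlap. Without the ORDER BY, neither property is guaranteed.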

         def esc_name(name): return db.esc_name(name)
         typed_cols = [sql_gen.TypedCol(v, 'text') for v in col_names]
         typed_cols.insert(0, row_num_col_def)
         header.insert(0, row_num_col_def.name)
         reader = csvs.RowNumFilter(reader)
         log('Creating table')
         # Note that this is not rolled back if the import fails. Instead, it is
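
The diff wraps the CSV reader in csvs.RowNumFilter so that each row carries its own number before it is loaded. The actual implementation of that class is not shown in this revision. The following is only a minimal Python 3 sketch of the general idea, using a hypothetical class `RowNumberingReader` and assuming that numbering starts at 1 and that the header row has already been consumed.

```python
import csv
import io


class RowNumberingReader:
    """Hypothetical stand-in for csvs.RowNumFilter (not the real class).

    Wraps any iterable of rows and prepends a sequential row number to each.
    """

    def __init__(self, reader, start=1):
        self._reader = iter(reader)
        self._next_num = start  # assumption: numbering starts at 1

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self._reader)  # StopIteration propagates at end of input
        numbered = [self._next_num] + list(row)
        self._next_num += 1
        return numbered


# Hypothetical sample input; the header is handled separately, as in the diff.
data = io.StringIO("Quercus,alba\nAcer,rubrum\n")
header = ["genus", "species"]
header.insert(0, "row_num")

numbered_rows = list(RowNumberingReader(csv.reader(data)))
```

Numbering the rows in the reader, rather than in the database, keeps the row_num values tied to the order of the input file regardless of how the rows are later stored.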
