Revision 5594

sql_io.py: import_csv(): Add a row_num column at the beginning of the table, which is autopopulated by csvs.RowNumFilter (it cannot be autopopulated by the serial datatype, because this does not support COPY FROM with a NULL-equivalent value in the serial field). This fixes a bug in csv2db where rows would not stay in inserted order upon querying the table, and would be returned in a different order on each query, which prevented LIMIT/OFFSET-based subsetting from returning consistent, nonoverlapping results. This occurs because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
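The effect of the row_num column on LIMIT/OFFSET subsetting can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server; the table name `staging`, the column `name`, and the helper `fetch_page()` are hypothetical and are not taken from sql_io.py.

```python
import sqlite3

def fetch_page(conn, limit, offset):
    """Return one page of staging rows, ordered by the explicit row_num column."""
    cur = conn.execute(
        'SELECT row_num, name FROM staging ORDER BY row_num LIMIT ? OFFSET ?',
        (limit, offset))
    return cur.fetchall()

# SQLite stands in for PostgreSQL here (an assumption made for runnability).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE staging (row_num INTEGER NOT NULL, name TEXT)')

# Number the rows on the way in, starting at 1, instead of relying on a
# database-generated serial value.
names = ['d', 'a', 'c', 'b', 'e']
conn.executemany('INSERT INTO staging VALUES (?, ?)',
                 list(enumerate(names, start=1)))

# Page through the table two rows at a time.
pages = [fetch_page(conn, 2, offset) for offset in range(0, len(names), 2)]
```

Because every page query sorts on row_num, consecutive pages do not overlap and, taken together, reproduce the inserted order. Without the ORDER BY clause, neither property is guaranteed.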

View differences:

sql_io.py
@@ -268,6 +268,9 @@
     def esc_name(name): return db.esc_name(name)
     
     typed_cols = [sql_gen.TypedCol(v, 'text') for v in col_names]
+    typed_cols.insert(0, row_num_col_def)
+    header.insert(0, row_num_col_def.name)
+    reader = csvs.RowNumFilter(reader)
     
     log('Creating table')
     # Note that this is not rolled back if the import fails. Instead, it is
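
The implementation of csvs.RowNumFilter is not part of this diff. As a rough sketch of the idea only, a filter of this kind could wrap a CSV reader and prepend a running row number to each row, with the header extended to match. The generator `number_rows()` and the sample data below are hypothetical and are not the project's code.

```python
import csv
import io

def number_rows(rows, start=1):
    """Yield each row with its position (counting from `start`) prepended."""
    for row_num, row in enumerate(rows, start):
        yield [row_num] + list(row)

# Hypothetical sample input; a real import would read from a file.
data = io.StringIO('name,rank\nQuercus,genus\nAcer,genus\n')
reader = csv.reader(data)

# Consume the header first so that only data rows are numbered, then add
# the new column name at the front, mirroring header.insert(0, ...).
header = next(reader)
header.insert(0, 'row_num')

numbered = list(number_rows(reader))
```

Numbering the rows on the client side means every row sent to the database already carries a concrete value for the new column, so the load does not depend on the database filling it in.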
