Revision 5594
Added by Aaron Marcuse-Kubitza over 12 years ago
Old | New | lib/sql_io.py |
---|---|---|
268 | 268 | def esc_name(name): return db.esc_name(name) |
269 | 269 | |
270 | 270 | typed_cols = [sql_gen.TypedCol(v, 'text') for v in col_names] |
 | 271 | typed_cols.insert(0, row_num_col_def) |
 | 272 | header.insert(0, row_num_col_def.name) |
 | 273 | reader = csvs.RowNumFilter(reader) |
271 | 274 | |
272 | 275 | log('Creating table') |
273 | 276 | # Note that this is not rolled back if the import fails. Instead, it is |
sql_io.py: import_csv(): Add a row_num column at the beginning of the table, which is autopopulated by csvs.RowNumFilter. (It cannot be autopopulated by the serial datatype, because this does not support COPY FROM with a NULL-equivalent value in the serial field.)

This fixes a bug in csv2db where rows would not stay in inserted order upon querying the table and would be returned in a different order on each query, which prevented LIMIT/OFFSET-based subsetting from returning consistent, nonoverlapping results.

This occurs because PostgreSQL does not guarantee that rows are returned in inserted order (or in any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
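
For illustration only, here is a minimal sketch of the idea behind this change: number each CSV row as it is read, store that number in a leading row_num column, and always page through the table with ORDER BY row_num. The `number_rows()` and `page()` helpers and the `staging` table are hypothetical names, not the actual csvs.RowNumFilter or import_csv() code, and the sketch uses Python's built-in sqlite3 module in place of PostgreSQL and COPY FROM so that it is self-contained.

```python
import csv
import io
import sqlite3


def number_rows(reader, start=1):
    """Hypothetical stand-in for a row-number filter: yield each CSV row
    with its row number prepended as the first field."""
    for row_num, row in enumerate(reader, start):
        yield [row_num] + row


# Sample input; in the real import this would be the CSV file being loaded.
data = io.StringIO("name,qty\na,1\nb,2\nc,3\nd,4\n")
reader = csv.reader(data)
header = next(reader)
header.insert(0, 'row_num')  # mirrors header.insert(0, row_num_col_def.name)

# Assumption: sqlite3 is used here only so the example runs anywhere.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE staging (row_num integer PRIMARY KEY, name text, qty text)')
conn.executemany('INSERT INTO staging VALUES (?, ?, ?)', number_rows(reader))


def page(conn, limit, offset):
    """Fetch one LIMIT/OFFSET page; the explicit ORDER BY row_num is what
    makes successive pages consistent and nonoverlapping."""
    cur = conn.execute(
        'SELECT row_num, name FROM staging ORDER BY row_num LIMIT ? OFFSET ?',
        (limit, offset))
    return cur.fetchall()


first = page(conn, 2, 0)
second = page(conn, 2, 2)
```

Without the ORDER BY clause, the two pages could in principle overlap or skip rows, because the database is free to return rows in any order.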