Name                Size       Revision  Age              Author                 Comment
_archive                       1598      almost 13 years  Aaron Marcuse-Kubitza  Moved _archive/tapir2flatClient/trunk/client/ t...
analysis                       3076      over 12 years    Aaron Marcuse-Kubitza  Added top-level analysis dir for range modeling
backups                        4751      over 12 years    Aaron Marcuse-Kubitza  backups/Makefile: Backups: Full DB: Specify the...
bin                            5591      over 12 years    Aaron Marcuse-Kubitza  sql_io.py: import_csv(): Take a reader and head...
config                         272       about 13 years   Aaron Marcuse-Kubitza  Moved bien_password to new config dir
inputs                         5562      over 12 years    Aaron Marcuse-Kubitza  inputs/FIA/Organism/map.csv: Height: Remapped t...
lib                            5594      over 12 years    Aaron Marcuse-Kubitza  sql_io.py: import_csv(): Add a row_num column a...
mappings                       5567      over 12 years    Aaron Marcuse-Kubitza  mappings/VegCore.csv: Removed unit-ambiguous he...
schemas                        5559      over 12 years    Aaron Marcuse-Kubitza  schemas/functions.sql: Added _ft_to_m()
to_do                          4524      over 12 years    Aaron Marcuse-Kubitza  to_do/timeline.doc: Updated to reflect addition...
validation                     4523      over 12 years    Aaron Marcuse-Kubitza  Added validation/
Makefile            9.86 KB    5459      over 12 years    Aaron Marcuse-Kubitza  Makefile: Moved setting of $(root) before inclu...
README.TXT          12.9 KB    5563      over 12 years    Aaron Marcuse-Kubitza  README.TXT: Data import: import_all: Added NCBI...
map                 989 Bytes  5158      over 12 years    Aaron Marcuse-Kubitza  root map: Removed no longer needed public schem...
new_terms.csv       30.4 KB    4887      over 12 years    Aaron Marcuse-Kubitza  Regenerated root unmapped_terms.csv, new_terms.csv
unmapped_terms.csv  5.8 KB     4887      over 12 years    Aaron Marcuse-Kubitza  Regenerated root unmapped_terms.csv, new_terms.csv

Latest revisions

# Date Author Comment
5594 10/17/2012 11:50 AM Aaron Marcuse-Kubitza

sql_io.py: import_csv(): Add a row_num column at the beginning of the table, which is autopopulated by csvs.RowNumFilter (it cannot be autopopulated by the serial datatype, because this does not support COPY FROM with a NULL-equivalent value in the serial field). This fixes a bug in csv2db where rows would not stay in inserted order upon querying the table, and would be returned in a different order each query, which prevented LIMIT/OFFSET based subsetting from returning consistent, nonoverlapping results. This occurs because PostgreSQL unfortunately does not return rows in inserted order (or any stable order: "If sorting is not chosen, the rows will be returned in an unspecified order [which] must not be relied on" <http://www.postgresql.org/docs/8.3/static/queries-order.html>), so an explicit ORDER BY is always needed to ensure staging table rows are retrievable in the order they were inserted.
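
The paging pattern this revision enables can be sketched roughly as follows. This is only an illustration, not the project's code: it uses Python's built-in sqlite3 module in place of PostgreSQL so that it runs standalone, and the table name `staging`, the `value` column, and the helpers `load_staging()` and `fetch_page()` are all hypothetical. The point it shows is that each page is selected with an explicit ORDER BY on the loader-populated row_num column.

```python
import sqlite3

def load_staging(values):
    """Create an in-memory staging table whose row_num is filled in by the loader."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (row_num INTEGER PRIMARY KEY, value TEXT)")
    # Number the rows in input order, starting at 1 (hypothetical stand-in
    # for what a row-numbering filter would add to each CSV row).
    rows = [(i, v) for i, v in enumerate(values, start=1)]
    conn.executemany("INSERT INTO staging (row_num, value) VALUES (?, ?)", rows)
    return conn

def fetch_page(conn, limit, offset):
    """Return one page of rows; the explicit ORDER BY makes pages stable."""
    query = "SELECT row_num, value FROM staging ORDER BY row_num LIMIT ? OFFSET ?"
    return conn.execute(query, (limit, offset)).fetchall()

conn = load_staging(["a", "b", "c", "d", "e"])
pages = [fetch_page(conn, 2, offset) for offset in (0, 2, 4)]
```

With the ORDER BY in place, consecutive pages do not overlap and together cover the table in inserted order.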

5593 10/17/2012 11:43 AM Aaron Marcuse-Kubitza

csvs.py: Added RowNumFilter, which adds a row # column at the beginning of each row
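
A filter of this kind might look something like the sketch below. The actual csvs.RowNumFilter is not shown in this log, so the constructor signature, the `start` parameter, and the 1-based numbering are assumptions made for illustration.

```python
import csv
import io

class RowNumFilter:
    """Hypothetical sketch: wraps a row iterator and prepends a row number."""

    def __init__(self, reader, start=1):
        self.reader = iter(reader)
        self.row_num = start - 1  # incremented before each row is returned

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self.reader)  # StopIteration propagates at end of input
        self.row_num += 1
        return [self.row_num] + list(row)

data = io.StringIO("Abies,10\nPinus,20\n")
numbered = list(RowNumFilter(csv.reader(data)))
```

Because the wrapper is itself an iterator of rows, it can be passed anywhere a plain CSV reader is accepted.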

5592 10/17/2012 11:42 AM Aaron Marcuse-Kubitza

streams.py: LineCountStream, LineCountInputStream: Fixed bug where line_num was 1 too high because it started at 1 and was incremented before each line was returned. Line numbering now properly starts at 1: the initial line_num value is 0, and it is incremented to 1 upon encountering the first line. The old off-by-one behavior may have been needed for code that associates an error message with a line #, but such code should instead add 1 to line_num to get the line # of the error if the error prevents the next line from being read by the LineCount*Stream.
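
The corrected counting can be sketched as below. This is an assumed, minimal reconstruction rather than the class in streams.py; in particular, leaving line_num unchanged at EOF is a choice made for this sketch.

```python
import io

class LineCountStream:
    """Hypothetical sketch: wraps a stream and tracks the number of the last line read."""

    def __init__(self, stream):
        self.stream = stream
        self.line_num = 0  # no line read yet; becomes 1 on the first line

    def readline(self):
        line = self.stream.readline()
        if line != "":  # '' signals EOF, which does not count as a line
            self.line_num += 1
        return line

stream = LineCountStream(io.StringIO("first\nsecond\n"))
before = stream.line_num
first = stream.readline()
after_first = stream.line_num
stream.readline()
stream.readline()  # EOF
after_eof = stream.line_num
```

Under this scheme, line_num always equals the number of the line most recently returned, so code reporting an error on a line that could not be read would use line_num + 1.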

5591 10/17/2012 11:04 AM Aaron Marcuse-Kubitza

sql_io.py: import_csv(): Take a reader and header rather than a stream to allow callers to pass in a wrapped CSV reader for filtering, etc.

5590 10/17/2012 11:00 AM Aaron Marcuse-Kubitza

sql_io.py: append_csv(): Take a reader and header rather than a stream_info and stream to allow callers to use the simpler csvs.reader_and_header() function. This also allows callers to pass in a wrapped CSV reader for filtering, etc.
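
The benefit of accepting a reader and header can be illustrated with a small sketch. The real signature of csvs.reader_and_header() is not shown in this log, so the version below is a hypothetical stand-in, and `skip_empty()` is an invented example of a wrapping filter.

```python
import csv
import io

def reader_and_header(stream):
    """Hypothetical stand-in: return a CSV reader and the header row it starts with."""
    reader = csv.reader(stream)
    header = next(reader)  # consume the first row as the header
    return reader, header

def skip_empty(reader):
    """Example filter: drop rows whose cells are all blank."""
    for row in reader:
        if any(cell.strip() for cell in row):
            yield row

stream = io.StringIO("genus,height\nAbies,10\n,\nPinus,20\n")
reader, header = reader_and_header(stream)
rows = list(skip_empty(reader))
```

A consumer that takes `(reader, header)` does not need to know whether the reader has been wrapped, which is what makes this kind of filtering possible.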

5589 10/17/2012 10:44 AM Aaron Marcuse-Kubitza

csv2db, tnrs_db: Removed ProgressInputStream wrapper around input stream, which is no longer needed (and causes overlapping output) now that sql_io.append_csv() prints # rows read

5588 10/17/2012 10:42 AM Aaron Marcuse-Kubitza

sql_io.py: append_csv(): Wrap input stream in a ProgressInputStream that reports rows (rather than lines) read

5587 10/17/2012 10:40 AM Aaron Marcuse-Kubitza

csvs.py: InputRewriter: Use new StreamFilter to translate StopIteration EOF to ''
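
The EOF translation described here could be sketched as follows. The StreamFilter class itself is not shown in this log, so the class name `EOFTranslatingStream` and its interface are assumptions. It adapts an iterator, which signals EOF by raising StopIteration, to a readline()-style API, which signals EOF by returning ''.

```python
class EOFTranslatingStream:
    """Hypothetical sketch: exposes an iterator of lines through readline()."""

    def __init__(self, lines):
        self.lines = iter(lines)

    def readline(self):
        try:
            return next(self.lines)
        except StopIteration:
            return ""  # file-like convention: empty string means EOF

s = EOFTranslatingStream(["a\n", "b\n"])
results = [s.readline() for _ in range(4)]
```

Repeated reads after the end of input keep returning '', matching how file objects behave.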

5586 10/17/2012 10:36 AM Aaron Marcuse-Kubitza

csvs.py: Added StreamFilter

5585 10/17/2012 10:36 AM Aaron Marcuse-Kubitza

csvs.py: InputRewriter: Also support stream inputs which report EOF as '' instead of StopIteration
