# Date Author Comment
14620 08/28/2014 07:57 PM Aaron Marcuse-Kubitza

bugfix: lib/csvs.py: JsonReader: need to pass col_order to row_dict_to_list_reader

14617 08/28/2014 07:10 PM Aaron Marcuse-Kubitza

lib/csvs.py: JsonReader: added support for values that are arrays

14616 08/28/2014 07:05 PM Aaron Marcuse-Kubitza

lib/csvs.py: MultiFilter: inherit from WrapReader instead of Filter to avoid needing to define a no-op filter_() function

14615 08/28/2014 06:49 PM Aaron Marcuse-Kubitza

bugfix: lib/csvs.py: row_dict_to_list_reader: need to override next() directly instead of just using Filter, because Filter doesn't support returning multiple rows for one input row (in this case, prepending a header row). Using Filter caused the 1st data row to be missing.
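Note: the fix above can be illustrated with a minimal sketch. This is not the code from lib/csvs.py. The class name `RowDictToListReader`, its constructor arguments, and the use of Python 3's `__next__()` (the log refers to `next()`) are assumptions made for illustration. The point it shows is that a reader which overrides the iteration method directly can emit the header row first without consuming a data row.

```python
def row_dict_to_list(row, col_order):
    """Translate a dict-based row to a list-based one (missing keys become None)."""
    return [row.get(col) for col in col_order]


class RowDictToListReader:
    """Hypothetical sketch: yields the header row first, then each dict row as a list."""

    def __init__(self, reader, col_order):
        self.reader = iter(reader)
        self.col_order = list(col_order)
        self.header_sent = False

    def __iter__(self):
        return self

    def __next__(self):
        # The header is returned on its own call, so no input row is consumed for it
        if not self.header_sent:
            self.header_sent = True
            return list(self.col_order)
        return row_dict_to_list(next(self.reader), self.col_order)


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
result = list(RowDictToListReader(rows, ["a", "b"]))
```

A one-row-in, one-row-out filter would have to replace the first data row with the header, which matches the symptom described in the commit message.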

14614 08/28/2014 06:47 PM Aaron Marcuse-Kubitza

lib/csvs.py: Filter: inherit from WrapReader, which separates out the CSV-reader API code

14613 08/28/2014 06:43 PM Aaron Marcuse-Kubitza

lib/csvs.py: added WrapReader

14612 08/28/2014 06:43 PM Aaron Marcuse-Kubitza

lib/csvs.py: added Reader

14600 08/28/2014 03:10 AM Aaron Marcuse-Kubitza

lib/csvs.py: JsonReader: factored out row-dict-to-list into new row_dict_to_list_reader so that JSON-specific preprocessing is kept separate from the row format translation

14599 08/27/2014 03:17 PM Aaron Marcuse-Kubitza

lib/csvs.py: added MultiFilter, which enables applying multiple filters by nesting

14595 08/26/2014 07:44 PM Aaron Marcuse-Kubitza

lib/csvs.py: added JsonReader, which reads parsed JSON data as row tuples

14594 08/26/2014 07:43 PM Aaron Marcuse-Kubitza

lib/csvs.py: added row_dict_to_list(), which translates a CSV dict-based row to a list-based one

14593 08/26/2014 07:43 PM Aaron Marcuse-Kubitza

lib/csvs.py: RowNumFilter: added support for filtering the header row as well

14592 08/26/2014 07:42 PM Aaron Marcuse-Kubitza

lib/csvs.py: ColInsertFilter: added support for filtering the header row as well

14591 08/26/2014 05:12 PM Aaron Marcuse-Kubitza

lib/csvs.py: InputRewriter: documented that this is also a stream (in addition to inheriting from StreamFilter)

14590 08/26/2014 05:11 PM Aaron Marcuse-Kubitza

bugfix: lib/csvs.py: InputRewriter: accept a reader, as would be expected, instead of a custom stream whose lines are tuples

14586 08/26/2014 04:49 PM Aaron Marcuse-Kubitza

lib/csvs.py: added ProgressInputFilter, analogous to streams.ProgressInputStream

14577 08/25/2014 10:16 PM Aaron Marcuse-Kubitza

lib/csvs.py: added header(stream)

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. Instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

9961 06/20/2013 06:20 AM Aaron Marcuse-Kubitza

lib/csvs.py: sniff(): support single-column spreadsheets by defaulting to the Excel dialect when the delimiter can't be determined
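Note: a minimal sketch of this fallback, using only the standard library. The function name `sniff_with_fallback` and the delimiter set are assumptions, not the actual sniff() in lib/csvs.py. `csv.Sniffer().sniff()` raises `csv.Error` when it cannot determine a delimiter, which is what happens for a single-column file.

```python
import csv


def sniff_with_fallback(sample, delimiters=",;\t|"):
    """Detect the dialect of `sample`, defaulting to Excel if detection fails."""
    try:
        return csv.Sniffer().sniff(sample, delimiters)
    except csv.Error:
        # No candidate delimiter appears in the sample (e.g. only one column)
        return csv.excel


single_column = sniff_with_fallback("single_column\nvalue\n")
semicolons = sniff_with_fallback("a;b\n1;2\n")
```

Falling back to the Excel dialect is harmless for a single column, since no delimiter is ever needed to split the row.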

9509 05/23/2013 12:55 PM Aaron Marcuse-Kubitza

lib/csvs.py: ColInsertFilter: support using a literal value instead of a function for the mk_value param, since this is the most common use case
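Note: the literal-or-function behavior might look roughly like the sketch below. The generator `col_insert_filter`, its `index` parameter, and the assumption that a callable `mk_value` takes no arguments are all hypothetical; the real ColInsertFilter is a class and its signature is not shown in this log.

```python
import itertools


def col_insert_filter(reader, mk_value, index=0):
    """Insert one column at `index` in every row.

    `mk_value` may be a literal, or a zero-argument callable that is
    invoked once per row (assumed calling convention).
    """
    for row in reader:
        value = mk_value() if callable(mk_value) else mk_value
        yield row[:index] + [value] + row[index:]


data = [["x"], ["y"]]
with_literal = list(col_insert_filter(data, "src1"))
# A row-number column is the callable case: each call returns the next number
with_row_num = list(col_insert_filter(data, itertools.count(1).__next__))
```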

8202 03/27/2013 08:03 PM Aaron Marcuse-Kubitza

lib/csvs.py: stream_info(): Fixed bug where headers with multiline columns were not supported because only the first line (not the first multiline row) is sniffed for the dialect

8071 03/16/2013 12:44 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader.next(): Fixed bug where empty line needs to be separately returned as [], because csv.reader would interpret it as EOF since the line ending has already been removed

8070 03/16/2013 12:25 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): TSVs: Turn off quoting because TSVs use \-escapes instead of quotes to escape delimiters, newlines, etc.

8069 03/16/2013 11:49 AM Aaron Marcuse-Kubitza

csvs.py: InputRewriter.readline(): Surround function in a try block that prints all exceptions, so that debugging information is available if an error occurs when this stream is used as input for psycopg's copy_expert() (COPY FROM)

7290 01/18/2013 07:13 AM Aaron Marcuse-Kubitza

csvs.py: ColInsertFilter: Support adding multiple, consecutive columns

7212 01/14/2013 12:15 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): TSVs: Don't turn off quoting, because some TSVs (such as Madidi.IndividualObservation) do quote fields

7211 01/14/2013 12:13 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Use csv.reader.next() when possible to support quoted fields, such as in Madidi.IndividualObservation

6589 12/04/2012 09:18 PM Aaron Marcuse-Kubitza

csvs.py: stream_info(): Use the Excel dialect and an empty header if the CSV file is empty

5736 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: RowNumFilter: Use new ColInsertFilter

5735 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: Added ColInsertFilter

5593 10/17/2012 11:43 AM Aaron Marcuse-Kubitza

csvs.py: Added RowNumFilter, which adds a row # column at the beginning of each row

5587 10/17/2012 10:40 AM Aaron Marcuse-Kubitza

csvs.py: InputRewriter: Use new StreamFilter to translate StopIteration EOF to ''

5586 10/17/2012 10:36 AM Aaron Marcuse-Kubitza

csvs.py: Added StreamFilter

5585 10/17/2012 10:36 AM Aaron Marcuse-Kubitza

csvs.py: InputRewriter: Also support stream inputs which report EOF as '' instead of StopIteration

5574 10/17/2012 09:14 AM Aaron Marcuse-Kubitza

csvs.py: Filter: Added empty close() method to support using it as a stream (such as with streams.ProgressInputStream)

5571 10/17/2012 08:56 AM Aaron Marcuse-Kubitza

csvs.py: Added InputRewriter, which wraps a reader, writing each row back to CSV
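Note: a rough sketch of the idea, assuming a file-like interface with `readline()` that reports EOF as '' (the convention mentioned elsewhere in this log). The implementation details, including the reusable buffer and the "\n" line terminator, are assumptions and not the code in csvs.py.

```python
import csv
import io


class InputRewriter:
    """Hypothetical sketch: wraps a row reader and re-emits each row as a CSV line."""

    def __init__(self, reader):
        self._reader = iter(reader)
        self._buf = io.StringIO()
        self._writer = csv.writer(self._buf, lineterminator="\n")

    def readline(self, size=-1):
        try:
            row = next(self._reader)
        except StopIteration:
            return ""  # streams signal EOF with an empty string
        # Reuse one buffer: clear it, write the row, return the encoded line
        self._buf.seek(0)
        self._buf.truncate()
        self._writer.writerow(row)
        return self._buf.getvalue()

    def close(self):
        pass  # nothing to release; present so it can be used as a stream


stream = InputRewriter([["a", "b c"], ["1", "x,y"]])
lines = [stream.readline(), stream.readline(), stream.readline()]
```

Because the rows pass back through `csv.writer`, fields containing the delimiter are quoted again, so a consumer that expects well-formed CSV text can read the output directly.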

5570 10/17/2012 08:54 AM Aaron Marcuse-Kubitza

csvs.py: Added ColCtFilter, which gives all rows the same # columns
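Note: one way to give all rows the same number of columns is to pad short rows and truncate long ones, as sketched below. Whether the actual ColCtFilter truncates, and what padding value it uses, is not stated in this log; the generator name and parameters here are hypothetical.

```python
def col_ct_filter(reader, cols_ct, pad=""):
    """Yield each row with exactly `cols_ct` columns (assumed pad/truncate policy)."""
    for row in reader:
        row = list(row)[:cols_ct]  # drop any extra columns
        yield row + [pad] * (cols_ct - len(row))  # fill any missing columns


fixed = list(col_ct_filter([["a"], ["b", "c", "d"]], 2))
```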

5439 10/11/2012 08:23 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): Support multi-char delims using \t, such as \t|\t used by NCBI. Support custom line suffixes, such as \t| used by NCBI.

5438 10/11/2012 08:18 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader.next(): Remove only the autodetected line ending instead of any standard line ending. Note that this requires all header override files to use the same line ending as the CSV they override, which is now the case.

5437 10/11/2012 08:15 PM Aaron Marcuse-Kubitza

csvs.py: is_tsv(): Support multi-char delimiters by checking only the first char of the delimiter

5436 10/11/2012 08:12 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): Also autodetect the line ending

5435 10/11/2012 08:11 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): Also autodetect the line ending

5433 10/11/2012 07:59 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader.next(): Renamed raw_contents var to line, since this is just the line with the ending removed

5431 10/11/2012 07:22 PM Aaron Marcuse-Kubitza

csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader

5430 10/11/2012 07:21 PM Aaron Marcuse-Kubitza

csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader

5429 10/11/2012 06:58 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Use str.split() instead of csv.reader().next() to parse the row, for efficiency and to support multi-char delimiters. This is possible because the TSV dialect doesn't use CSV parsing features other than the delimiter and newline-escaping (which is handled separately).
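Note: the split-based parsing can be sketched as follows. The function name and parameters are assumptions, and escape decoding is left out here. The second call uses the NCBI-style "\t|\t" delimiter and "\t|" line suffix mentioned in later revisions of this log; the sample values are made up.

```python
def split_tsv_line(line, delimiter="\t", line_ending="\n"):
    """Split one TSV line into fields using str.split().

    Works with multi-character delimiters, which csv.reader rejects.
    """
    if line_ending and line.endswith(line_ending):
        line = line[: -len(line_ending)]
    if line == "":
        return []  # treat an empty line as an empty row (assumption)
    return line.split(delimiter)


plain = split_tsv_line("a\tb\tc\n")
multi_char = split_tsv_line("9606\t|\tHomo sapiens\t|\n", "\t|\t", "\t|\n")
```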

5426 10/10/2012 11:43 AM Aaron Marcuse-Kubitza

csvs.py: delims: Added |

5170 10/02/2012 09:44 PM Aaron Marcuse-Kubitza

csvs.py: tsv_encode_map: Escape a newline as the two-character sequence \n (instead of as a \ followed by a literal newline) for clarity. Added escape for \r by using strings.json_encode_map. TsvReader: Decode all escapes in tsv_encode_map.
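Note: a minimal sketch of an encode map and its inverse. The exact contents of tsv_encode_map are not given in this log, so the entries below (including the escape for the backslash itself) and the helper names `tsv_encode` and `tsv_decode` are assumptions.

```python
import re

# Assumed escapes: each special character maps to a backslash sequence
TSV_ENCODE_MAP = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
TSV_DECODE_MAP = {escaped: raw for raw, escaped in TSV_ENCODE_MAP.items()}
_ESCAPE_RE = re.compile(r"\\[\\tnr]")


def tsv_encode(value):
    """Escape characters that would otherwise break a TSV field."""
    return "".join(TSV_ENCODE_MAP.get(ch, ch) for ch in value)


def tsv_decode(field):
    """Reverse tsv_encode() by scanning escape sequences left to right."""
    return _ESCAPE_RE.sub(lambda m: TSV_DECODE_MAP[m.group(0)], field)


original = "line 1\nline 2\twith tab and \\ backslash"
encoded = tsv_encode(original)
decoded = tsv_decode(encoded)
```

Keeping every escape on a single physical line means a reader can split the file on line endings first and decode fields afterwards.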

5146 10/01/2012 08:46 PM Aaron Marcuse-Kubitza

csvs.py: Added tsv_encode_map for use in creating TSVs parsed by TsvReader

5145 10/01/2012 06:42 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Also interpret '\t' as a tab, to provide a mechanism for encoding embedded tabs

4211 08/24/2012 07:00 PM Aaron Marcuse-Kubitza

csvs.py: delims: Added ";", which is phpMyAdmin's default CSV delimiter

3055 06/25/2012 06:13 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Prevent "new-line character seen in unquoted field" errors by replacing '\r' with '\n'

2114 05/09/2012 12:33 AM Aaron Marcuse-Kubitza

csvs.py: Added row filters

1958 04/23/2012 08:54 PM Aaron Marcuse-Kubitza

csvs.py: reader_and_header(): Use make_reader()

1923 04/20/2012 04:21 PM Aaron Marcuse-Kubitza

csvs.py: stream_info(): Added parse_header option. reader_and_header(): Use stream_info()'s new parse_header option.

1660 03/27/2012 08:30 PM Aaron Marcuse-Kubitza

csvs.py: stream_info(): If header_line == '', set dialect to None rather than trying (and failing) to auto-detect it

1623 03/26/2012 06:09 PM Aaron Marcuse-Kubitza

csvs.py: Added TsvReader to support TSV quirks. Added reader_class(). reader_and_header(): Use reader_class() to automatically use TsvReader instead of csv.reader for TSVs. Added is_tsv() and use it where `dialect.delimiter == '\t'` was used.

1621 03/26/2012 04:40 PM Aaron Marcuse-Kubitza

csvs.py: stream_info(): Set dialect.quoting = csv.QUOTE_NONE for TSVs because they usually don't quote fields. Factored dialect detecting code into new function sniff().

1446 03/18/2012 04:14 PM Aaron Marcuse-Kubitza

csvs.py: Added csv modifications to compare Dialect instances

1444 03/16/2012 06:25 PM Aaron Marcuse-Kubitza

csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().

1442 03/16/2012 06:04 PM Aaron Marcuse-Kubitza

csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently

1411 03/13/2012 07:41 PM Aaron Marcuse-Kubitza

csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default

1388 03/13/2012 04:08 PM Aaron Marcuse-Kubitza

Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line