# Date Author Comment
14577 08/25/2014 10:16 PM Aaron Marcuse-Kubitza

lib/csvs.py: added header(stream)

11970 01/20/2014 11:33 AM Aaron Marcuse-Kubitza

moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. Instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).

9961 06/20/2013 06:20 AM Aaron Marcuse-Kubitza

lib/csvs.py: sniff(): support single-column spreadsheets by defaulting to the Excel dialect when the delimiter can't be determined
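
A minimal sketch of the fallback described in this revision, not the actual csvs.py code. It assumes detection is done with the standard library's `csv.Sniffer`; the names `sniff_or_excel` and `COMMON_DELIMS` are hypothetical.

```python
import csv

# Assumed set of candidate delimiters (hypothetical; the real list lives in csvs.py)
COMMON_DELIMS = ",\t;|"

def sniff_or_excel(sample):
    """Return the detected dialect, or csv.excel if no delimiter can be found."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=COMMON_DELIMS)
    except csv.Error:
        # A single-column file contains no delimiter at all, so detection fails
        return csv.excel

single_column = "name\nalice\nbob\n"
print(sniff_or_excel(single_column) is csv.excel)
```

With a single-column sample, `Sniffer.sniff()` raises `csv.Error` because none of the candidate delimiters occur, and the Excel dialect is used instead.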

9509 05/23/2013 12:55 PM Aaron Marcuse-Kubitza

lib/csvs.py: ColInsertFilter: support using a literal value instead of a function for the mk_value param, since this is the most common use case
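
The idea of accepting either a literal or a function can be sketched as follows. This is an illustration only: `insert_col` is a hypothetical generator, and the assumption that the function form receives the row is not confirmed by the log.

```python
def insert_col(rows, mk_value, index=0):
    """Insert a column into each row at `index`.

    mk_value may be a literal, used for every row, or a callable that
    is given the row and returns the value (assumed signature).
    """
    if callable(mk_value):
        make = mk_value
    else:
        make = lambda row: mk_value  # wrap the literal in a function
    for row in rows:
        row = list(row)  # copy so the caller's row is not modified
        row.insert(index, make(row))
        yield row

rows = [["a"], ["b"]]
print(list(insert_col(rows, "x")))                    # literal value
print(list(insert_col(rows, lambda row: len(row))))   # function value
```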

8202 03/27/2013 08:03 PM Aaron Marcuse-Kubitza

lib/csvs.py: stream_info(): Fixed bug where headers with multiline columns were not supported because only the first line (not the first multiline row) is sniffed for the dialect

8071 03/16/2013 12:44 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader.next(): Fixed bug where empty line needs to be separately returned as [], because csv.reader would interpret it as EOF since the line ending has already been removed

8070 03/16/2013 12:25 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): TSVs: Turn off quoting because TSVs use \-escapes instead of quotes to escape delimiters, newlines, etc.

8069 03/16/2013 11:49 AM Aaron Marcuse-Kubitza

csvs.py: InputRewriter.readline(): Surround function in a try block that prints all exceptions, so that debugging information is available if an error occurs when this stream is used as input for psycopg's copy_expert() (COPY FROM)

7290 01/18/2013 07:13 AM Aaron Marcuse-Kubitza

csvs.py: ColInsertFilter: Support adding multiple, consecutive columns

7212 01/14/2013 12:15 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): TSVs: Don't turn off quoting, because some TSVs (such as Madidi.IndividualObservation) do quote fields

7211 01/14/2013 12:13 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Use csv.reader.next() when possible to support quoted fields, such as in Madidi.IndividualObservation

6589 12/04/2012 09:18 PM Aaron Marcuse-Kubitza

csvs.py: stream_info(): Use the Excel dialect and an empty header if the CSV file is empty

5736 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: RowNumFilter: Use new ColInsertFilter

5735 10/23/2012 09:08 AM Aaron Marcuse-Kubitza

csvs.py: Added ColInsertFilter

5593 10/17/2012 11:43 AM Aaron Marcuse-Kubitza

csvs.py: Added RowNumFilter, which adds a row # column at the beginning of each row

5587 10/17/2012 10:40 AM Aaron Marcuse-Kubitza

csvs.py: InputRewriter: Use new StreamFilter to translate StopIteration EOF to ''

5586 10/17/2012 10:36 AM Aaron Marcuse-Kubitza

csvs.py: Added StreamFilter

5585 10/17/2012 10:36 AM Aaron Marcuse-Kubitza

csvs.py: InputRewriter: Also support stream inputs which report EOF as '' instead of StopIteration

5574 10/17/2012 09:14 AM Aaron Marcuse-Kubitza

csvs.py: Filter: Added empty close() method to support using it as a stream (such as with streams.ProgressInputStream)

5571 10/17/2012 08:56 AM Aaron Marcuse-Kubitza

csvs.py: Added InputRewriter, which wraps a reader, writing each row back to CSV
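
A rough sketch of the wrapping idea, assuming a file-like interface with `readline()` that returns '' at EOF (as later revisions describe). The class name `RowsAsCsvStream` is hypothetical, and this is not the InputRewriter implementation itself.

```python
import csv
import io

class RowsAsCsvStream:
    """Expose an iterable of rows as a readline()-style stream of CSV text."""

    def __init__(self, rows, dialect=csv.excel):
        self._rows = iter(rows)
        self._buf = io.StringIO()
        self._writer = csv.writer(self._buf, dialect)

    def readline(self):
        try:
            row = next(self._rows)
        except StopIteration:
            return ""  # file-like streams report EOF as an empty string
        # Reset the buffer, then let csv.writer handle quoting of the row
        self._buf.seek(0)
        self._buf.truncate()
        self._writer.writerow(row)
        return self._buf.getvalue()

    def close(self):
        pass  # nothing to release; present so the object can be used as a stream

stream = RowsAsCsvStream([["a", "b,c"]])
print(repr(stream.readline()))
print(repr(stream.readline()))
```

A stream of this shape can be handed to consumers that only call `readline()`, which is the situation the later COPY FROM revisions refer to.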

5570 10/17/2012 08:54 AM Aaron Marcuse-Kubitza

csvs.py: Added ColCtFilter, which gives all rows the same # columns
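
One way to give all rows the same number of columns is sketched below. The log does not say how ColCtFilter treats short or long rows, so padding with an empty string and truncating extras are both assumptions, and `fix_col_count` is a hypothetical name.

```python
def fix_col_count(rows, col_ct, fill=""):
    """Yield each row padded or truncated to exactly col_ct columns."""
    for row in rows:
        row = list(row)[:col_ct]                   # drop extra columns (assumed)
        row.extend([fill] * (col_ct - len(row)))   # pad short rows (assumed)
        yield row

print(list(fix_col_count([["a"], ["a", "b", "c"]], 2)))
```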

5439 10/11/2012 08:23 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): Support multi-char delims using \t, such as \t|\t used by NCBI. Support custom line suffixes, such as \t| used by NCBI.
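
For illustration, a row in this style can be split with plain string operations, since the `csv` module only accepts one-character delimiters. The function `split_row` and its parameter names are hypothetical; only the `\t|\t` delimiter and `\t|` line suffix come from the log.

```python
def split_row(line, delimiter="\t|\t", line_suffix="\t|", line_ending="\n"):
    """Split one line that uses a multi-char delimiter and a line suffix."""
    if line_ending and line.endswith(line_ending):
        line = line[:-len(line_ending)]   # remove the line ending first
    if line_suffix and line.endswith(line_suffix):
        line = line[:-len(line_suffix)]   # then the trailing suffix
    return line.split(delimiter)

print(split_row("1\t|\troot\t|\tscientific name\t|\n"))
```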

5438 10/11/2012 08:18 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader.next(): Remove only the autodetected line ending instead of any standard line ending. Note that this requires all header override files to use the same line ending as the CSV they override, which is now the case.

5437 10/11/2012 08:15 PM Aaron Marcuse-Kubitza

csvs.py: is_tsv(): Support multi-char delimiters by checking only the first char of the delimiter

5436 10/11/2012 08:12 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): Also autodetect the line ending

5435 10/11/2012 08:11 PM Aaron Marcuse-Kubitza

csvs.py: sniff(): Also autodetect the line ending

5433 10/11/2012 07:59 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader.next(): Renamed raw_contents var to line, since this is just the line with the ending removed

5431 10/11/2012 07:22 PM Aaron Marcuse-Kubitza

csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader

5430 10/11/2012 07:21 PM Aaron Marcuse-Kubitza

csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader

5429 10/11/2012 06:58 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Use str.split() instead of csv.reader().next() to parse the row, for efficiency and to support multi-char delimiters. This is possible because the TSV dialect doesn't use CSV parsing features other than the delimiter and newline-escaping (which is handled separately).

5426 10/10/2012 11:43 AM Aaron Marcuse-Kubitza

csvs.py: delims: Added |

5170 10/02/2012 09:44 PM Aaron Marcuse-Kubitza

csvs.py: tsv_encode_map: Escape a newline as the two-character sequence \n (instead of as a \ followed by a literal newline) for clarity. Added escape for \r by using strings.json_encode_map. TsvReader: Decode all escapes in tsv_encode_map.
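
A sketch of how such an escape map might be applied in both directions. The exact contents of tsv_encode_map are not shown in the log, so the map below (backslash, tab, newline, carriage return) and the helper names `tsv_encode` and `tsv_decode` are assumptions.

```python
import re

# Assumed contents; the real map is built from strings.json_encode_map
TSV_ENCODE_MAP = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
TSV_DECODE_MAP = {esc: char for char, esc in TSV_ENCODE_MAP.items()}

_ENCODE_RE = re.compile(r"[\\\t\n\r]")   # any character that needs escaping
_DECODE_RE = re.compile(r"\\[\\tnr]")    # a backslash plus one escape letter

def tsv_encode(value):
    """Replace special characters with their backslash escapes."""
    return _ENCODE_RE.sub(lambda m: TSV_ENCODE_MAP[m.group()], value)

def tsv_decode(value):
    """Reverse tsv_encode() in a single left-to-right pass."""
    return _DECODE_RE.sub(lambda m: TSV_DECODE_MAP[m.group()], value)

field = "line one\nline two\twith a tab"
print(tsv_encode(field))
print(tsv_decode(tsv_encode(field)) == field)
```

Decoding in one pass matters here: replacing `\\` and `\t` in separate steps would misread an escaped backslash followed by the letter t.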

5146 10/01/2012 08:46 PM Aaron Marcuse-Kubitza

csvs.py: Added tsv_encode_map for use in creating TSVs parsed by TsvReader

5145 10/01/2012 06:42 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Also interpret '\t' as a tab, to provide a mechanism for encoding embedded tabs

4211 08/24/2012 07:00 PM Aaron Marcuse-Kubitza

csvs.py: delims: Added ";", which is phpMyAdmin's default CSV delimiter

3055 06/25/2012 06:13 PM Aaron Marcuse-Kubitza

csvs.py: TsvReader: Prevent "new-line character seen in unquoted field" errors by replacing '\r' with '\n'

2114 05/09/2012 12:33 AM Aaron Marcuse-Kubitza

csvs.py: Added row filters

1958 04/23/2012 08:54 PM Aaron Marcuse-Kubitza

csvs.py: reader_and_header(): Use make_reader()

1923 04/20/2012 04:21 PM Aaron Marcuse-Kubitza

csvs.py: stream_info(): Added parse_header option. reader_and_header(): Use stream_info()'s new parse_header option.

1660 03/27/2012 08:30 PM Aaron Marcuse-Kubitza

csvs.py: stream_info(): If header_line == '', set dialect to None rather than trying (and failing) to auto-detect it

1623 03/26/2012 06:09 PM Aaron Marcuse-Kubitza

csvs.py: Added TsvReader to support TSV quirks. Added reader_class(). reader_and_header(): Use reader_class() to automatically use TsvReader instead of csv.reader for TSVs. Added is_tsv() and use it where `dialect.delimiter == '\t'` was used.

1621 03/26/2012 04:40 PM Aaron Marcuse-Kubitza

csvs.py: stream_info(): Set dialect.quoting = csv.QUOTE_NONE for TSVs because they usually don't quote fields. Factored dialect detecting code into new function sniff().

1446 03/18/2012 04:14 PM Aaron Marcuse-Kubitza

csvs.py: Added csv modifications to compare Dialect instances

1444 03/16/2012 06:25 PM Aaron Marcuse-Kubitza

csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().

1442 03/16/2012 06:04 PM Aaron Marcuse-Kubitza

csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently

1411 03/13/2012 07:41 PM Aaron Marcuse-Kubitza

csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default

1388 03/13/2012 04:08 PM Aaron Marcuse-Kubitza

Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line