/trunk/lib/csvs.py - Changes - BIEN 3 - NCEAS Projects

root/trunk/lib/csvs.py @ 14843

Revision	Date	Author	Comment
14620	08/28/2014 07:57 PM	Aaron Marcuse-Kubitza	bugfix: lib/csvs.py: JsonReader: need to pass col_order to row_dict_to_list_reader
14617	08/28/2014 07:10 PM	Aaron Marcuse-Kubitza	lib/csvs.py: JsonReader: added support for values that are arrays
14616	08/28/2014 07:05 PM	Aaron Marcuse-Kubitza	lib/csvs.py: MultiFilter: inherit from WrapReader instead of Filter to avoid needing to define a no-op filter_() function
14615	08/28/2014 06:49 PM	Aaron Marcuse-Kubitza	bugfix: lib/csvs.py: row_dict_to_list_reader: need to override next() directly instead of just using Filter, because Filter doesn't support returning multiple rows for one input row (in this case, prepending a header row). this caused the 1st data row to be missing.
14614	08/28/2014 06:47 PM	Aaron Marcuse-Kubitza	lib/csvs.py: Filter: inherit from WrapReader, which separates out the CSV-reader API code
14613	08/28/2014 06:43 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added WrapReader
14612	08/28/2014 06:43 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added Reader
14600	08/28/2014 03:10 AM	Aaron Marcuse-Kubitza	lib/csvs.py: JsonReader: factored out row-dict-to-list into new row_dict_to_list_reader so that JSON-specific preprocessing is kept separate from the row format translation
14599	08/27/2014 03:17 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added MultiFilter, which enables applying multiple filters by nesting
14595	08/26/2014 07:44 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added JsonReader, which reads parsed JSON data as row tuples
14594	08/26/2014 07:43 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added row_dict_to_list(), which translates a CSV dict-based row to a list-based one
14593	08/26/2014 07:43 PM	Aaron Marcuse-Kubitza	lib/csvs.py: RowNumFilter: added support for filtering the header row as well
14592	08/26/2014 07:42 PM	Aaron Marcuse-Kubitza	lib/csvs.py: ColInsertFilter: added support for filtering the header row as well
14591	08/26/2014 05:12 PM	Aaron Marcuse-Kubitza	lib/csvs.py: InputRewriter: documented that this is also a stream (in addition to inheriting from StreamFilter)
14590	08/26/2014 05:11 PM	Aaron Marcuse-Kubitza	bugfix: lib/csvs.py: InputRewriter: accept a reader, as would be expected, instead of a custom stream whose lines are tuples
14586	08/26/2014 04:49 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added ProgressInputFilter, analogous to streams.ProgressInputStream
14577	08/25/2014 10:16 PM	Aaron Marcuse-Kubitza	lib/csvs.py: added header(stream)
11970	01/20/2014 11:33 AM	Aaron Marcuse-Kubitza	moved everything into /trunk/ to create the standard svn layout, for use with tools that require this (e.g. git-svn). IMPORTANT: do NOT do an `svn up`. Instead, re-use your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
9961	06/20/2013 06:20 AM	Aaron Marcuse-Kubitza	lib/csvs.py: sniff(): support single-column spreadsheets by defaulting to the Excel dialect when the delimiter can't be determined
9509	05/23/2013 12:55 PM	Aaron Marcuse-Kubitza	lib/csvs.py: ColInsertFilter: support using a literal value instead of a function for the mk_value param, since this is the most common use case
8202	03/27/2013 08:03 PM	Aaron Marcuse-Kubitza	lib/csvs.py: stream_info(): Fixed bug where headers with multiline columns were not supported because only the first line (not the first multiline row) is sniffed for the dialect
8071	03/16/2013 12:44 PM	Aaron Marcuse-Kubitza	csvs.py: TsvReader.next(): Fixed bug where empty line needs to be separately returned as [], because csv.reader would interpret it as EOF since the line ending has already been removed
8070	03/16/2013 12:25 PM	Aaron Marcuse-Kubitza	csvs.py: sniff(): TSVs: Turn off quoting because TSVs use \-escapes instead of quotes to escape delimiters, newlines, etc.
8069	03/16/2013 11:49 AM	Aaron Marcuse-Kubitza	csvs.py: InputRewriter.readline(): Surround function in a try block that prints all exceptions, so that debugging information is available if an error occurs when this stream is used as input for psycopg's copy_expert() (COPY FROM)
7290	01/18/2013 07:13 AM	Aaron Marcuse-Kubitza	csvs.py: ColInsertFilter: Support adding multiple, consecutive columns
7212	01/14/2013 12:15 PM	Aaron Marcuse-Kubitza	csvs.py: sniff(): TSVs: Don't turn off quoting, because some TSVs (such as Madidi.IndividualObservation) do quote fields
7211	01/14/2013 12:13 PM	Aaron Marcuse-Kubitza	csvs.py: TsvReader: Use csv.reader.next() when possible to support quoted fields, such as in Madidi.IndividualObservation
6589	12/04/2012 09:18 PM	Aaron Marcuse-Kubitza	csvs.py: stream_info(): Use the Excel dialect and an empty header if the CSV file is empty
5736	10/23/2012 09:08 AM	Aaron Marcuse-Kubitza	csvs.py: RowNumFilter: Use new ColInsertFilter
5735	10/23/2012 09:08 AM	Aaron Marcuse-Kubitza	csvs.py: Added ColInsertFilter
5593	10/17/2012 11:43 AM	Aaron Marcuse-Kubitza	csvs.py: Added RowNumFilter, which adds a row # column at the beginning of each row
5587	10/17/2012 10:40 AM	Aaron Marcuse-Kubitza	csvs.py: InputRewriter: Use new StreamFilter to translate StopIteration EOF to ''
5586	10/17/2012 10:36 AM	Aaron Marcuse-Kubitza	csvs.py: Added StreamFilter
5585	10/17/2012 10:36 AM	Aaron Marcuse-Kubitza	csvs.py: InputRewriter: Also support stream inputs which report EOF as '' instead of StopIteration
5574	10/17/2012 09:14 AM	Aaron Marcuse-Kubitza	csvs.py: Filter: Added empty close() method to support using it as a stream (such as with streams.ProgressInputStream)
5571	10/17/2012 08:56 AM	Aaron Marcuse-Kubitza	csvs.py: Added InputRewriter, which wraps a reader, writing each row back to CSV
5570	10/17/2012 08:54 AM	Aaron Marcuse-Kubitza	csvs.py: Added ColCtFilter, which gives all rows the same # columns
5439	10/11/2012 08:23 PM	Aaron Marcuse-Kubitza	csvs.py: sniff(): Support multi-char delims using \t, such as \t\|\t used by NCBI. Support custom line suffixes, such as \t\| used by NCBI.
5438	10/11/2012 08:18 PM	Aaron Marcuse-Kubitza	csvs.py: TsvReader.next(): Remove only the autodetected line ending instead of any standard line ending. Note that this requires all header override files to use the same line ending as the CSV they override, which is now the case.
5437	10/11/2012 08:15 PM	Aaron Marcuse-Kubitza	csvs.py: is_tsv(): Support multi-char delimiters by checking only the first char of the delimiter
5436	10/11/2012 08:12 PM	Aaron Marcuse-Kubitza	csvs.py: sniff(): Also autodetect the line ending
5435	10/11/2012 08:11 PM	Aaron Marcuse-Kubitza	csvs.py: sniff(): Also autodetect the line ending
5433	10/11/2012 07:59 PM	Aaron Marcuse-Kubitza	csvs.py: TsvReader.next(): Renamed raw_contents var to line, since this is just the line with the ending removed
5431	10/11/2012 07:22 PM	Aaron Marcuse-Kubitza	csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader
5430	10/11/2012 07:21 PM	Aaron Marcuse-Kubitza	csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader
5429	10/11/2012 06:58 PM	Aaron Marcuse-Kubitza	csvs.py: TsvReader: Use str.split() instead of csv.reader().next() to parse the row, for efficiency and to support multi-char delimiters. This is possible because the TSV dialect doesn't use CSV parsing features other than the delimiter and newline-escaping (which is handled separately).
5426	10/10/2012 11:43 AM	Aaron Marcuse-Kubitza	csvs.py: delims: Added \|
5170	10/02/2012 09:44 PM	Aaron Marcuse-Kubitza	csvs.py: tsv_encode_map: Escape \n as \n (instead of as a \ followed by a newline) for clarity. Added escape for \r by using strings.json_encode_map. TsvReader: Decode all escapes in tsv_encode_map.
5146	10/01/2012 08:46 PM	Aaron Marcuse-Kubitza	csvs.py: Added tsv_encode_map for use in creating TSVs parsed by TsvReader
5145	10/01/2012 06:42 PM	Aaron Marcuse-Kubitza	csvs.py: TsvReader: Also interpret '\t' as a tab, to provide a mechanism for encoding embedded tabs
4211	08/24/2012 07:00 PM	Aaron Marcuse-Kubitza	csvs.py: delims: Added ";", which is phpMyAdmin's default CSV delimiter
3055	06/25/2012 06:13 PM	Aaron Marcuse-Kubitza	csvs.py: TsvReader: Prevent "new-line character seen in unquoted field" errors by replacing '\r' with '\n'
2114	05/09/2012 12:33 AM	Aaron Marcuse-Kubitza	csvs.py: Added row filters
1958	04/23/2012 08:54 PM	Aaron Marcuse-Kubitza	csvs.py: reader_and_header(): Use make_reader()
1923	04/20/2012 04:21 PM	Aaron Marcuse-Kubitza	csvs.py: stream_info(): Added parse_header option. reader_and_header(): Use stream_info()'s new parse_header option.
1660	03/27/2012 08:30 PM	Aaron Marcuse-Kubitza	csvs.py: stream_info(): If header_line == '', set dialect to None rather than trying (and failing) to auto-detect it
1623	03/26/2012 06:09 PM	Aaron Marcuse-Kubitza	csvs.py: Added TsvReader to support TSV quirks. Added reader_class(). reader_and_header(): Use reader_class() to automatically use TsvReader instead of csv.reader for TSVs. Added is_tsv() and use it where `dialect.delimiter == '\t'` was used.
1621	03/26/2012 04:40 PM	Aaron Marcuse-Kubitza	csvs.py: stream_info(): Set dialect.quoting = csv.QUOTE_NONE for TSVs because they usually don't quote fields. Factored dialect detecting code into new function sniff().
1446	03/18/2012 04:14 PM	Aaron Marcuse-Kubitza	csvs.py: Added csv modifications to compare Dialect instances
1444	03/16/2012 06:25 PM	Aaron Marcuse-Kubitza	csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().
1442	03/16/2012 06:04 PM	Aaron Marcuse-Kubitza	csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
1411	03/13/2012 07:41 PM	Aaron Marcuse-Kubitza	csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default
1388	03/13/2012 04:08 PM	Aaron Marcuse-Kubitza	Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line
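
Several entries above describe how the dialect is detected and how TSV lines are parsed. Detection restricts the candidate delimiters, enables doublequote, and falls back to the Excel dialect for single-column files. Parsing splits on a possibly multi-character delimiter, strips only the detected line ending, decodes backslash escapes, and returns [] for an empty line. The following is a minimal Python 3 sketch of those ideas, reconstructed from the commit messages rather than taken from csvs.py. The names sniff_dialect, decode_field and parse_tsv_line, the COMMON_DELIMS set, and the escape table are assumptions. The sketch also ignores quoted TSV fields, which TsvReader hands to csv.reader instead.

```python
import csv
import re

# Assumption: a plausible set of common delimiters; the list in csvs.py may differ
COMMON_DELIMS = ',;\t|'

def sniff_dialect(header_line):
    """Guess the dialect of a header line, considering only COMMON_DELIMS."""
    try:
        base = csv.Sniffer().sniff(header_line, delimiters=COMMON_DELIMS)
    except csv.Error:
        # No delimiter found (e.g. a single-column file): default to Excel
        base = csv.excel

    # Subclass so that the shared csv.excel class is never modified
    class Sniffed(base):
        doublequote = True  # Sniffer does not turn this on by default

    return Sniffed

# Assumption: escape table in the spirit of tsv_encode_map
_ESCAPES = {'t': '\t', 'n': '\n', 'r': '\r', '\\': '\\'}
_ESCAPE_RE = re.compile(r'\\(.)', re.DOTALL)

def decode_field(field):
    """Decode backslash escapes; unknown escapes are kept verbatim."""
    return _ESCAPE_RE.sub(lambda m: _ESCAPES.get(m.group(1), m.group(0)), field)

def parse_tsv_line(raw_line, delim='\t', line_ending='\n'):
    """Split one TSV line on a (possibly multi-char) delimiter."""
    # Remove only the detected line ending, not any standard one
    if line_ending and raw_line.endswith(line_ending):
        line = raw_line[:-len(line_ending)]
    else:
        line = raw_line
    if line == '':
        return []  # an empty line is an empty row, not EOF
    return [decode_field(f) for f in line.split(delim)]
```

Because the line is split on the literal delimiter before escapes are decoded, an escaped tab (`\t` written as backslash + t) stays inside its field instead of starting a new column.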

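The reader classes mentioned in the history (WrapReader, Filter, ColInsertFilter, RowNumFilter and row_dict_to_list_reader) appear to share one pattern: each wraps another row iterator. Below is a minimal Python 3 sketch of that pattern, written from the commit messages rather than the actual csvs.py source. The index parameter, the RowDictToListReader name, and the exact behavior are assumptions. Header-row filtering and multi-column inserts are omitted. It shows why a one-row-in, one-row-out Filter cannot prepend a header, so the dict-to-list reader overrides __next__ (next() in Python 2) directly.

```python
class WrapReader:
    """Wraps a row iterator; subclasses override __next__."""
    def __init__(self, reader):
        self.reader = iter(reader)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.reader)

    def close(self):
        pass  # lets a reader stand in where a stream is expected


class Filter(WrapReader):
    """Produces exactly one output row per input row via filter_()."""
    def __next__(self):
        return self.filter_(next(self.reader))

    def filter_(self, row):
        return row


class ColInsertFilter(Filter):
    """Inserts a column at index; mk_value is a literal or callable(row, row_num)."""
    def __init__(self, reader, mk_value, index=0):  # index: hypothetical param
        super().__init__(reader)
        self.mk_value = mk_value
        self.index = index
        self.row_num = 0

    def filter_(self, row):
        self.row_num += 1
        if callable(self.mk_value):
            value = self.mk_value(row, self.row_num)
        else:
            value = self.mk_value  # literal value, the common case
        row = list(row)
        row.insert(self.index, value)
        return row


class RowNumFilter(ColInsertFilter):
    """Prepends a 1-based row number column."""
    def __init__(self, reader):
        super().__init__(reader, lambda row, row_num: row_num)


class RowDictToListReader(WrapReader):
    """Emits a header row first, then each dict row as a list in col_order."""
    def __init__(self, reader, col_order):
        super().__init__(reader)
        self.col_order = list(col_order)
        self.header_sent = False

    def __next__(self):
        # Filter cannot do this: it would need two output rows for one input row
        if not self.header_sent:
            self.header_sent = True
            return list(self.col_order)
        row = next(self.reader)
        return [row.get(col) for col in self.col_order]
```

Filters compose by nesting, e.g. `RowNumFilter(ColInsertFilter(rows, 'x'))`, which is the idea behind MultiFilter.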