lib/csvs.py: added header(stream)
moved everything into /trunk/ to create the standard svn layout, for use with tools that require it (e.g. git-svn). IMPORTANT: do NOT run `svn up`. Instead, reuse your working copy's existing files with `svn switch` (http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.switch.html).
lib/csvs.py: sniff(): support single-column spreadsheets by defaulting to the Excel dialect when the delimiter can't be determined
lib/csvs.py: ColInsertFilter: support using a literal value instead of a function for the mk_value param, since this is the most common use case
lib/csvs.py: stream_info(): Fixed bug where headers with multiline columns were not supported because only the first line (not the first multiline row) is sniffed for the dialect
csvs.py: TsvReader.next(): Fixed bug where an empty line was not returned as a row. It must be returned separately as [], because csv.reader would interpret it as EOF since the line ending has already been removed
csvs.py: sniff(): TSVs: Turn off quoting because TSVs use \-escapes instead of quotes to escape delimiters, newlines, etc.
csvs.py: InputRewriter.readline(): Surround function in a try block that prints all exceptions, so that debugging information is available if an error occurs when this stream is used as input for psycopg's copy_expert() (COPY FROM)
csvs.py: ColInsertFilter: Support adding multiple, consecutive columns
csvs.py: sniff(): TSVs: Don't turn off quoting, because some TSVs (such as Madidi.IndividualObservation) do quote fields
csvs.py: TsvReader: Use csv.reader.next() when possible to support quoted fields, such as in Madidi.IndividualObservation
csvs.py: stream_info(): Use the Excel dialect and an empty header if the CSV file is empty
csvs.py: RowNumFilter: Use new ColInsertFilter
csvs.py: Added ColInsertFilter
csvs.py: Added RowNumFilter, which adds a row # column at the beginning of each row
csvs.py: InputRewriter: Use new StreamFilter to translate StopIteration EOF to ''
csvs.py: Added StreamFilter
csvs.py: InputRewriter: Also support stream inputs which report EOF as '' instead of StopIteration
csvs.py: Filter: Added empty close() method to support using it as a stream (such as with streams.ProgressInputStream)
csvs.py: Added InputRewriter, which wraps a reader, writing each row back to CSV
csvs.py: Added ColCtFilter, which gives all rows the same # columns
csvs.py: sniff(): Support multi-char delims using \t, such as \t|\t used by NCBI. Support custom line suffixes, such as \t| used by NCBI.
csvs.py: TsvReader.next(): Remove only the autodetected line ending instead of any standard line ending. Note that this requires all header override files to use the same line ending as the CSV they override, which is now the case.
csvs.py: is_tsv(): Support multi-char delimiters by checking only the first char of the delimiter
csvs.py: sniff(): Also autodetect the line ending
csvs.py: TsvReader.next(): Renamed raw_contents var to line, since this is just the line with the ending removed
csvs.py: Modify csv.Dialect._validate() to ignore "delimiter must be a 1-character string" errors, in order to support multi-char delimiters used by TsvReader
csvs.py: TsvReader: Use str.split() instead of csv.reader().next() to parse the row, for efficiency and to support multi-char delimiters. This is possible because the TSV dialect doesn't use CSV parsing features other than the delimiter and newline-escaping (which is handled separately).
csvs.py: delims: Added |
csvs.py: tsv_encode_map: Escape a newline as the two-character sequence \n (instead of as a \ followed by a newline) for clarity. Added escape for \r by using strings.json_encode_map. TsvReader: Decode all escapes in tsv_encode_map.
csvs.py: Added tsv_encode_map for use in creating TSVs parsed by TsvReader
csvs.py: TsvReader: Also interpret '\t' as a tab, to provide a mechanism for encoding embedded tabs
csvs.py: delims: Added ";", which is phpMyAdmin's default CSV delimiter
csvs.py: TsvReader: Prevent "new-line character seen in unquoted field" errors by replacing '\r' with '\n'
csvs.py: Added row filters
csvs.py: reader_and_header(): Use make_reader()
csvs.py: stream_info(): Added parse_header option. reader_and_header(): Use stream_info()'s new parse_header option.
csvs.py: stream_info(): If header_line == '', set dialect to None rather than trying (and failing) to auto-detect it
csvs.py: Added TsvReader to support TSV quirks. Added reader_class(). reader_and_header(): Use reader_class() to automatically use TsvReader instead of csv.reader for TSVs. Added is_tsv() and use it where `dialect.delimiter == '\t'` was used.
csvs.py: stream_info(): Set dialect.quoting = csv.QUOTE_NONE for TSVs because they usually don't quote fields. Factored dialect detecting code into new function sniff().
csvs.py: Added csv modifications to compare Dialect instances
csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().
csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default
Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line
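
Note: the dialect-detection behavior described in the entries above (sniffing only the header line, restricting candidates to common delimiters, turning on doublequote, and falling back to the Excel dialect for empty or single-column input) can be sketched roughly as follows. This is an illustration only, not the actual csvs.py code: the names COMMON_DELIMS and sniff_header are hypothetical, and the delimiter set is an assumption pieced together from the log entries.

```python
import csv

# Assumed candidate set: the log mentions restricting to common delimiters
# and later adding ';' and '|'. The real list in csvs.py may differ.
COMMON_DELIMS = ',;\t|'

def sniff_header(header_line):
    """Guess a CSV dialect from a header line (hypothetical helper)."""
    # Empty input: nothing to sniff, so use the Excel dialect.
    if header_line.strip() == '':
        return csv.excel
    try:
        # Only consider common delimiters, so that frequent letters
        # are never mistaken for a delimiter.
        dialect = csv.Sniffer().sniff(header_line, delimiters=COMMON_DELIMS)
    except csv.Error:
        # Single-column input has no delimiter to detect.
        return csv.excel
    # Treat "" inside a quoted field as an escaped quote.
    dialect.doublequote = True
    return dialect
```

For example, `sniff_header("a\tb\tc\n").delimiter` is a tab, while a single-column header such as `"name\n"` falls back to `csv.excel`.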