Name        Size       Revision  Age              Author                 Comment
_archive               1598      almost 13 years  Aaron Marcuse-Kubitza  Moved _archive/tapir2flatClient/trunk/client/ t...
bin                    1718      almost 13 years  Aaron Marcuse-Kubitza  bin/map: process_rows(): When iterating over ea...
config                 272       about 13 years   Aaron Marcuse-Kubitza  Moved bien_password to new config dir
inputs                 1717      almost 13 years  Aaron Marcuse-Kubitza  inputs/XAL: Accepted initial test outputs
lib                    1712      almost 13 years  Aaron Marcuse-Kubitza  xml_parse.py: Added support for parsing consecu...
mappings               1625      almost 13 years  Aaron Marcuse-Kubitza  mappings/DwC2-VegBIEN.specimens.csv: minimumEle...
schemas                1657      almost 13 years  Aaron Marcuse-Kubitza  schemas/postgresql.conf: Decreased shared_buffe...
to_do                  811       almost 13 years  Aaron Marcuse-Kubitza  Added to_do/milestones.doc
Makefile    7.63 KB    1665      almost 13 years  Aaron Marcuse-Kubitza  main Makefile: php-Darwin: Added instruction to...
README.TXT  1.8 KB     1556      almost 13 years  Aaron Marcuse-Kubitza  README.TXT: Added instructions how to stop all ...
map         867 Bytes  1299      almost 13 years  Aaron Marcuse-Kubitza  map: On nimoy, use bien2_staging unless otherwi...

Latest revisions

# Date Author Comment
1718 04/02/2012 08:05 AM Aaron Marcuse-Kubitza

bin/map: process_rows(): When iterating over each row, only retrieve the next row if the end (limit of # of rows) has not been reached. This prevents the next row from being fetched, possibly causing an entire additional consecutive XML document to be parsed, if the limit has already been reached. This is primarily useful for XML inputs with a ".0.top" segment prepended before the other documents, which contains just the first two nodes for fast parsing of this smaller XML document when only the first two nodes are needed for testing. Without this fix, the ".0.top" segment would have needed to contain the first three nodes instead.
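The behavior described in r1718 can be sketched roughly as below. This is a minimal illustration, not the code in bin/map: the signature of process_rows(), the process callback, and the logged() helper used to record fetches are all hypothetical. The point is only that the limit is tested before the iterator is advanced, so a lazy source is never asked for a row beyond the limit.

```python
def process_rows(process, rows, limit=None):
    """Apply process() to at most `limit` rows and return the count processed.

    Hypothetical signature. The limit is checked *before* requesting another
    row, so a lazy source is never advanced past the last row needed.
    """
    rows = iter(rows)
    count = 0
    while limit is None or count < limit:
        try:
            row = next(rows)
        except StopIteration:
            break  # the input ran out before the limit was reached
        process(row)
        count += 1
    return count


def logged(items, fetch_log):
    """Hypothetical helper: yield items, recording each one as it is fetched."""
    for item in items:
        fetch_log.append(item)
        yield item


fetched, seen = [], []
process_rows(seen.append, logged(["a", "b", "c"], fetched), limit=2)
# Both `seen` and `fetched` now hold only the first two rows.
```

A plain `for` loop with a `break` placed after processing would behave the same way; the variant to avoid is one that fetches the next row first and checks the limit afterwards.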

1717 04/02/2012 07:55 AM Aaron Marcuse-Kubitza

inputs/XAL: Accepted initial test outputs

1716 04/02/2012 07:54 AM Aaron Marcuse-Kubitza

inputs/XAL: Added maps

1715 04/02/2012 07:52 AM Aaron Marcuse-Kubitza

bin/map: Extended consecutive XML document support to direct-XML inputs (without a map spreadsheet). Factored out consecutive XML document row-iteration code into helper method get_rows() which does the iters.flatten() and itertools.imap() calls.

1714 04/02/2012 07:37 AM Aaron Marcuse-Kubitza

bin/map: Fixed bug in iteration over consecutive XML documents where only the first element of the first document was processed. Use of iters.flatten() and itertools.imap() fixes this problem so that the consecutive XML documents are regarded as a continuous stream of rows.
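The flattening idea in r1714 and r1715 might look something like the sketch below. The implementation of iters.flatten() is not shown on this page, so the sketch assumes it behaves like itertools.chain.from_iterable(). itertools.imap() exists only in Python 2; in Python 3 the built-in map() is its lazy counterpart. The doc_rows() helper and the list-based "documents" are hypothetical stand-ins.

```python
import itertools


def doc_rows(doc):
    """Hypothetical stand-in: return an iterator over one document's rows."""
    return iter(doc)


def get_rows(docs):
    """Treat consecutive documents as one continuous, lazy stream of rows."""
    # map() is lazy in Python 3 (the counterpart of Python 2's itertools.imap()).
    per_doc = map(doc_rows, docs)
    # Assumed behavior of iters.flatten(): chain the per-document iterators.
    return itertools.chain.from_iterable(per_doc)


docs = [["row1", "row2"], ["row3"], ["row4", "row5"]]
rows = list(get_rows(docs))
```

Because both steps are lazy, a later document is only touched once the rows of the earlier ones have been consumed.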

1713 04/02/2012 07:16 AM Aaron Marcuse-Kubitza

bin/map: Use new xml_parse.docs_iter() to iterate over each consecutive XML document in stdin

1712 04/02/2012 07:16 AM Aaron Marcuse-Kubitza

xml_parse.py: Added support for parsing consecutive XML documents in a stream
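One possible way to read several XML documents from a single stream is sketched below using the standard library's xml.etree.ElementTree. It is not the approach taken in xml_parse.py, whose code is not shown here. The iter_docs() name is hypothetical, and the sketch assumes that every document begins with an XML declaration at the start of a line.

```python
import io
import xml.etree.ElementTree as ET

XML_DECL = "<?xml"


def iter_docs(stream):
    """Lazily yield the root element of each consecutive XML document.

    Assumption: each document starts with an XML declaration on its own
    line, which is used here as the boundary between documents.
    """
    buf = []
    for line in stream:
        # A new declaration closes the document collected so far.
        if line.lstrip().startswith(XML_DECL) and "".join(buf).strip():
            yield ET.fromstring("".join(buf).strip())
            buf = []
        buf.append(line)
    text = "".join(buf).strip()
    if text:
        yield ET.fromstring(text)  # last document in the stream


stream = io.StringIO(
    '<?xml version="1.0"?>\n<doc><row>1</row><row>2</row></doc>\n'
    '<?xml version="1.0"?>\n<doc><row>3</row></doc>\n'
)
row_texts = [row.text for doc in iter_docs(stream) for row in doc.iter("row")]
```

Since iter_docs() is a generator, a document is parsed only when the consumer asks for it.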

1711 04/02/2012 07:01 AM Aaron Marcuse-Kubitza

Added iters.py

1710 03/29/2012 10:33 PM Aaron Marcuse-Kubitza

streams.py: Added FilterStream. Changed TracedStream to use FilterStream.

1709 03/29/2012 10:24 PM Aaron Marcuse-Kubitza

Moved parse_str() from xml_dom.py to xml_parse.py
