Changes - BIEN 3 - NCEAS Projects

root @ 1712

#	Date	Author	Comment
1712	04/02/2012 07:16 AM	Aaron Marcuse-Kubitza	xml_parse.py: Added support for parsing consecutive XML documents in a stream
1711	04/02/2012 07:01 AM	Aaron Marcuse-Kubitza	Added iters.py
1710	03/29/2012 10:33 PM	Aaron Marcuse-Kubitza	streams.py: Added FilterStream. Changed TracedStream to use FilterStream.
1709	03/29/2012 10:24 PM	Aaron Marcuse-Kubitza	Moved parse_str() from xml_dom.py to xml_parse.py
1708	03/29/2012 10:24 PM	Aaron Marcuse-Kubitza	Added xml_parse.py
1707	03/29/2012 10:21 PM	Aaron Marcuse-Kubitza	streams.py: CaptureStream: Ignore start_str when recording and end_str when not recording
1706	03/29/2012 10:13 PM	Aaron Marcuse-Kubitza	streams.py: CaptureStream: Get each match as a separate array elem instead of concatenated together
1705	03/29/2012 09:59 PM	Aaron Marcuse-Kubitza	ch_root, repl, map: Use new maps.col_info() instead of parsing col name manually. This allows maps with prefixes containing ":" to be supported, without the ":" being misinterpreted as the label-root separator.
1704	03/29/2012 09:57 PM	Aaron Marcuse-Kubitza	maps.py: Added col_info() to get label, root, prefixes from col_name. Added col_formats() for use by combinable(). Use new col_formats() in combinable(). Removed no longer needed col_label().
1703	03/29/2012 09:55 PM	Aaron Marcuse-Kubitza	input.Makefile: Use with_cat instead of with_cat_csv for XML sources
1702	03/29/2012 09:54 PM	Aaron Marcuse-Kubitza	Renamed inputs/XAL/src/digir.xml.make to digir.specimens.xml.make so it would generate an output file with the proper table name
1701	03/29/2012 08:53 PM	Aaron Marcuse-Kubitza	bin/map: Support concatenated XML documents for XML inputs
1700	03/29/2012 08:46 PM	Aaron Marcuse-Kubitza	bin/map: Merged XML inputs with and without a map into the in_is_xml section
1699	03/29/2012 08:33 PM	Aaron Marcuse-Kubitza	digir_client: Output profiling information
1698	03/29/2012 08:21 PM	Aaron Marcuse-Kubitza	Added inputs/XAL/src/digir.xml.make
1697	03/29/2012 08:21 PM	Aaron Marcuse-Kubitza	digir_client: Import http to take advantage of httplib modifications to deal with IncompleteRead errors
1696	03/29/2012 08:20 PM	Aaron Marcuse-Kubitza	Added http.py with httplib modifications to deal with IncompleteRead errors
1695	03/29/2012 07:46 PM	Aaron Marcuse-Kubitza	digir_client: Fixed bug where chunk size was being adjusted even if count == None (indicating no determinable last chunk), causing a type mismatch between None and the integer total
1694	03/29/2012 07:28 PM	Aaron Marcuse-Kubitza	input.Makefile: Removed no longer needed "ifneq ($(wildcard test/),)" guard around Testing section because all inputs now have a test subdir
1693	03/29/2012 07:25 PM	Aaron Marcuse-Kubitza	Added inputs/XAL
1692	03/29/2012 07:22 PM	Aaron Marcuse-Kubitza	digir_client: Made chunk_size a configurable env var. Removed schema env var because schema is always the same for DiGIR (can be different for TAPIR). Make sure output ends in a newline so that consecutive XML documents are on different lines.
1691	03/29/2012 07:13 PM	Aaron Marcuse-Kubitza	digir_client: Fixed bug where chunk_size records would always be retrieved even in the last chunk, which ignored any manual count the user might have set via the "n" option
1690	03/29/2012 07:07 PM	Aaron Marcuse-Kubitza	digir_client: Repeatedly retrieve data in chunks. Provide match count. Added section comments.
1689	03/29/2012 06:52 PM	Aaron Marcuse-Kubitza	xpath.py: Added get_value() to run get_1() and return the value of any result node
1688	03/29/2012 06:51 PM	Aaron Marcuse-Kubitza	xml_dom.py: Added parse_str()
1687	03/29/2012 06:13 PM	Aaron Marcuse-Kubitza	digir_client: Use new streams.copy() to copy returned data to stdout
1686	03/29/2012 06:13 PM	Aaron Marcuse-Kubitza	streams.py: Added copy(). Added section comment for traced streams.
1685	03/29/2012 06:06 PM	Aaron Marcuse-Kubitza	digir_client: Label debugging output
1684	03/29/2012 05:54 PM	Aaron Marcuse-Kubitza	streams.py: Renamed LineCountOutputStream to LineCountStream since TracedStream now works on both input and output streams
1683	03/29/2012 05:52 PM	Aaron Marcuse-Kubitza	digir_client: Capture diagnostics for later use in determining next start/count values
1682	03/29/2012 05:51 PM	Aaron Marcuse-Kubitza	streams.py: Added CaptureStream to wrap a stream, capturing matching text. Renamed TracedOutputStream to TracedStream and made it work on both input and output streams. Made TracedStream inherit from WrapStream so that close() would be forwarded properly.
1681	03/29/2012 05:23 PM	Aaron Marcuse-Kubitza	bin/map: Changed XML input prefix handling to prepend prefix directly to XPath instead of separating it from the XPath with a "/". Changed get_with_prefix() to use new strings.with_prefixes().
1680	03/29/2012 05:21 PM	Aaron Marcuse-Kubitza	strings.py: Added with_prefixes()
1679	03/29/2012 04:56 PM	Aaron Marcuse-Kubitza	digir_client: Made schema customizable
1678	03/29/2012 04:35 PM	Aaron Marcuse-Kubitza	digir_client: Set header sendTime, source dynamically. In debug mode, print the request XML.
1677	03/29/2012 04:03 PM	Aaron Marcuse-Kubitza	Added local_ip to get local IP address
1676	03/29/2012 03:48 PM	Aaron Marcuse-Kubitza	bin/map: Added prefixes support for XML inputs
1675	03/28/2012 11:12 PM	Aaron Marcuse-Kubitza	digir_client: Filter by darwin:Kingdom=PLANTAE because presumably all records will have this. Don't debug-print URL.
1674	03/28/2012 11:07 PM	Aaron Marcuse-Kubitza	Added initial bin/digir_client
1673	03/28/2012 07:58 PM	Aaron Marcuse-Kubitza	Renamed timeout.py to timeouts.py. Renamed timeout_ vars to timeout.
1672	03/28/2012 07:52 PM	Aaron Marcuse-Kubitza	opts.py: get_env_var(): default defaults to None
1671	03/28/2012 06:35 PM	Aaron Marcuse-Kubitza	inputs/SpeciesLink: Accepted test outputs for new TAPIR download
1670	03/28/2012 06:03 PM	Aaron Marcuse-Kubitza	bin/tapir/tapir2flat.php: Output to specieslink.specimens.csv instead of specieslink.txt so that the output file can be used right away without renaming
1669	03/28/2012 05:52 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.make: Stop after a configurable # of empty responses (indicating no more nodes), instead of at a preset node ID, because there seem to be many more nodes than are listed on the web form
1668	03/27/2012 11:10 PM	Aaron Marcuse-Kubitza	input.Makefile: import/rotate: Add "." before the date
1667	03/27/2012 11:08 PM	Aaron Marcuse-Kubitza	input.Makefile: Added targets for editing import: import/rotate, import/rm
1666	03/27/2012 09:41 PM	Aaron Marcuse-Kubitza	bin/tapir/tapir2flat.php: Fixed XML parsing to strip control chars so DOMDocument::loadXML() wouldn't complain about "PCDATA invalid Char value 8 in Entity", etc.
1665	03/27/2012 09:07 PM	Aaron Marcuse-Kubitza	main Makefile: php-Darwin: Added instruction to set PHPRC if needed
1664	03/27/2012 09:03 PM	Aaron Marcuse-Kubitza	Added inputs/SpeciesLink/src/tapir.make
1663	03/27/2012 09:03 PM	Aaron Marcuse-Kubitza	input.Makefile: `src/%: src/%.make`: Don't tee recipe's stderr to make's stderr, because long-running make_scripts usually will be tracked using `tail -f`
1662	03/27/2012 09:00 PM	Aaron Marcuse-Kubitza	input.Makefile: `src/%: src/%.make`: Name the log file using the make_script name instead of the output file name
1661	03/27/2012 08:31 PM	Aaron Marcuse-Kubitza	cat_csv: If dialect == None, ignore that file because it's empty
1660	03/27/2012 08:30 PM	Aaron Marcuse-Kubitza	csvs.py: stream_info(): If header_line == '', set dialect to None rather than trying (and failing) to auto-detect it
1659	03/27/2012 08:19 PM	Aaron Marcuse-Kubitza	input.Makefile: Use new sort_filenames to put multiple numbered sources in the correct order, dealing correctly with embedded numbers that aren't padded with leading zeros
1658	03/27/2012 08:18 PM	Aaron Marcuse-Kubitza	Added sort_filenames to sort a list of filenames, comparing embedded numbers numerically instead of lexicographically
1657	03/27/2012 07:18 PM	Aaron Marcuse-Kubitza	schemas/postgresql.conf: Decreased shared_buffers again because 4000MB wasn't far enough below the 4GB SHMMAX
1656	03/27/2012 07:16 PM	Aaron Marcuse-Kubitza	schemas/postgresql.conf: Expressed shared_buffers in MB, since decimal GB doesn't seem to work anymore on 9.1
1655	03/27/2012 07:14 PM	Aaron Marcuse-Kubitza	schemas/postgresql.conf: Decreased shared_buffers to 3.9GB, slightly less than SHMMAX
1654	03/27/2012 07:11 PM	Aaron Marcuse-Kubitza	schemas/postgresql.conf: Optimized again using same changes as were applied to 8.4 version
1653	03/27/2012 07:10 PM	Aaron Marcuse-Kubitza	schemas/postgresql.conf: Replaced with original 9.1 version
1652	03/27/2012 07:03 PM	Aaron Marcuse-Kubitza	schemas/postgresql.conf: Optimized using settings analogous to those in postgresql.nimoy.conf
1651	03/27/2012 06:43 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.make: Don't abort entire import on empty response, because an empty response is also returned for nodes that are temporarily down, not just nodes that don't exist (assumed to be after the highest numbered node). Instead, stop import after 150 nodes if user did not specify an explicit # nodes.
1650	03/27/2012 05:50 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.make: Abort prefix on empty response using break, rather than just done = True, to avoid running any more code except the finally block. Moved metadata row validation outside metadata row retrieval try-except block.
1649	03/27/2012 05:41 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.make: If a read times out, abort the entire node rather than just the prefix to avoid waiting 20 sec for each of 26*26 prefixes
1648	03/27/2012 05:40 PM	Aaron Marcuse-Kubitza	profiling.py ItersProfiler, exc.py ExPercentTracker: Only output fraction of rows with errors if self.iter_ct > 0, to avoid divide-by-zero error
1647	03/27/2012 04:55 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.make: Fixed bug where row count was output in the middle of the row processing code, instead of after the first row is processed and the row count incremented. This removes "Processed 0 row(s)" messages at the beginning of every prefix.
1646	03/27/2012 04:40 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.make: Support custom starting node ID and # nodes processed via env vars
1645	03/27/2012 04:29 PM	Aaron Marcuse-Kubitza	Renamed inputs/REMIB/src/nodes.all.0.header.specimens.csv to node.0.header.specimens.csv so it would sort correctly with the new output file names
1644	03/27/2012 04:27 PM	Aaron Marcuse-Kubitza	Renamed inputs/REMIB/src/nodes.all.specimens.csv.make to inputs/REMIB/src/nodes.make since it will not be used to generate nodes.all.specimens.csv. However, it can still be used with the `src/%.make` make target, but will generate a dummy empty output file "nodes".
1643	03/27/2012 04:21 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.all.specimens.csv.make: Write each node to a separate output file
1642	03/27/2012 04:00 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.all.specimens.csv.make: Raise InputException instead of AssertionError if invalid metadata row, so that it will be caught and printed instead of aborting the program
1641	03/27/2012 03:56 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.all.specimens.csv.make: Moved header reading code inside TimeoutException try-except block since read sometimes times out before the header is even read
1640	03/27/2012 03:55 PM	Aaron Marcuse-Kubitza	schemas/postgresql.nimoy.conf: Increased shared_buffers to 1.5GB since kernel.shmmax has been increased to 2GB
1639	03/26/2012 11:07 PM	Aaron Marcuse-Kubitza	Renamed inputs/REMIB/src/remib_raw.0.header.specimens.txt to nodes.all.0.header.specimens.csv
1638	03/26/2012 10:57 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.all.specimens.csv.make: Increased read timeout
1637	03/26/2012 10:55 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.all.specimens.csv.make: Timeout stuck reads because sometimes nodes are offline, etc.
1636	03/26/2012 10:53 PM	Aaron Marcuse-Kubitza	exc.py: str_(): Strip trailing whitespace. print_ex(): Since str_() now strips trailing whitespace, strings.ensure_newl() is no longer necessary.
1635	03/26/2012 10:43 PM	Aaron Marcuse-Kubitza	streams.py: Added TimeoutInputStream and WrapStream. Changed StreamIter to use new WrapStream.
1634	03/26/2012 10:42 PM	Aaron Marcuse-Kubitza	Added timeout.py
1633	03/26/2012 10:25 PM	Aaron Marcuse-Kubitza	inputs/REMIB/src/nodes.all.specimens.csv.make: Download from all prefixes of all nodes. Stop when a node produces an empty response (not even an error), which indicates no more nodes. Changed status messages.
1632	03/26/2012 10:17 PM	Aaron Marcuse-Kubitza	input.Makefile: `src/%: src/%.make`: Append stderr to log file
1631	03/26/2012 09:21 PM	Aaron Marcuse-Kubitza	Added inputs/REMIB/src/nodes.all.specimens.csv.make to download REMIB data for all nodes
1630	03/26/2012 09:20 PM	Aaron Marcuse-Kubitza	Added streams.py for I/O, which contains StreamIter, TracedOutputStream, and LineCountOutputStream
1629	03/26/2012 09:20 PM	Aaron Marcuse-Kubitza	term.py: Added clear_line. Corrected file comment.
1628	03/26/2012 08:06 PM	Aaron Marcuse-Kubitza	Makefiles: Let subdir's Makefile decide whether to delete on error
1627	03/26/2012 08:05 PM	Aaron Marcuse-Kubitza	input.Makefile: Save partial outputs of aborted src make scripts
1626	03/26/2012 06:44 PM	Aaron Marcuse-Kubitza	input.Makefile: Fixed bug in `%: %.make` rule to use $< instead of $*
1625	03/26/2012 06:20 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: minimumElevationInMeters: Remove any "ca." prefix
1624	03/26/2012 06:19 PM	Aaron Marcuse-Kubitza	xml_func.py: _replace: Strip whitespace from the returned string
1623	03/26/2012 06:09 PM	Aaron Marcuse-Kubitza	csvs.py: Added TsvReader to support TSV quirks. Added reader_class(). reader_and_header(): Use reader_class() to automatically use TsvReader instead of csv.reader for TSVs. Added is_tsv() and use it where `dialect.delimiter == '\t'` was used.
1622	03/26/2012 06:06 PM	Aaron Marcuse-Kubitza	strings.py: Added extract_line_ending() and remove_line_ending(). ensure_newl(): Use new remove_line_ending(). Moved Parsing section to top since it is used by the other sections.
1621	03/26/2012 04:40 PM	Aaron Marcuse-Kubitza	csvs.py: stream_info(): Set dialect.quoting = csv.QUOTE_NONE for TSVs because they usually don't quote fields. Factored dialect detecting code into new function sniff().
1620	03/26/2012 03:45 PM	Aaron Marcuse-Kubitza	input.Makefile: verify: Added reverify option, which can be turned off to prevent regenerating the verify/%.out file from the DB (which can be time-consuming), and instead just diff verify/%.out with verify/%.ref
1619	03/24/2012 10:31 PM	Aaron Marcuse-Kubitza	count_error_rows: Allow input to be specified as last arg(s) in addition to as stdin
1618	03/24/2012 10:30 PM	Aaron Marcuse-Kubitza	exc.py: ExPercentTracker: When displaying fraction of iters that had errors, don't duplicate the iter_text ("row", etc.) in the numerator
1617	03/24/2012 10:27 PM	Aaron Marcuse-Kubitza	bin/map: Use new ExPercentTracker iter_num tracking to track distinct row #s with errors
1616	03/24/2012 10:27 PM	Aaron Marcuse-Kubitza	exc.py: ExPercentTracker: Track iter_nums of Exceptions as well, to distinguish how many distinct iters had errors
1615	03/24/2012 10:10 PM	Aaron Marcuse-Kubitza	Added bin/count_error_rows to count distinct rows with errors in `map` error messages
1614	03/24/2012 09:06 PM	Aaron Marcuse-Kubitza	input.Makefile: Changed "%.out: .make" rule to "%: %.make" so that any file can be built from a corresponding .make file. This will allow flat files to be retrieved dynamically by running an associated .make file.
1613	03/24/2012 09:01 PM	Aaron Marcuse-Kubitza	xml_func.py: FormatException: Inherit from ExceptionWithCause instead of SyntaxError because a FormatException signals a different kind of error condition (related to the input value rather than the function syntax)
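Revision 1658 describes sort_filenames as sorting filenames by comparing embedded numbers numerically instead of lexicographically, so that node.10 comes after node.2. The following is a minimal Python sketch of that kind of "natural" sort. It is not the project's actual script: the helper natural_key is hypothetical, and the example filenames are illustrative.

```python
import re

def natural_key(name):
    """Split a filename into alternating text and integer chunks.

    re.split() with a capturing group keeps the digit runs, so even
    indices are always text and odd indices are always digit runs.
    Elements at the same position therefore always have the same type.
    """
    parts = re.split(r'(\d+)', name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

def sort_filenames(names):
    """Hypothetical sketch: sort names so embedded numbers compare numerically."""
    return sorted(names, key=natural_key)

print(sort_filenames(['node.10.csv', 'node.2.csv', 'node.1.csv']))
```

A plain sorted() would place node.10.csv before node.2.csv. Comparing the integer chunks removes the need to pad numbers with leading zeros.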
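Revisions 1690, 1691 and 1695 describe digir_client retrieving data in chunks. The last chunk is shrunk so that a user-set count is respected. When count is None, there is no determinable last chunk, so no adjustment is made. The following is a minimal Python sketch of that chunking logic under those assumptions. The generator chunk_ranges and its parameters are hypothetical and are not taken from the actual digir_client code.

```python
def chunk_ranges(start, chunk_size, count=None):
    """Yield (start, size) pairs covering `count` records from `start`.

    If count is None, the total is unknown. The generator is then
    unbounded, and the caller must stop when the server returns no records.
    """
    end = None if count is None else start + count
    while end is None or start < end:
        size = chunk_size
        # Only shrink the final chunk when a total is known. This avoids
        # the None-vs-int comparison described in revision 1695.
        if end is not None:
            size = min(size, end - start)
        yield start, size
        start += size

print(list(chunk_ranges(0, 100, 250)))
```

Because the size adjustment is guarded by the end-is-known check, the same loop serves both the bounded and the unbounded case.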
