util.py: Added sort_by_len(), shortest(), longest()
join: Use new maps.join_combinable() to check if column names match
maps.py: Added cols_combinable() and use it in combinable(). Added join_combinable() and associates helper functions. Added documentation labels to each section.
xml_parse.py: ConsecXmlInputStream: Removed read() because that's now defined in streams.FilterStream
xml_parse.py: parse_next(): Strip control characters from input stream because they mess up the parser
streams.py: FilterStream: Forward all reads to readline()
strings.py: Added is_ctrl() and strip_ctrl()
xml_parse.py: parse_next(): On parser error, advance to next XML document since the rest of the current document is corrupted
streams.py: Added consume(). Added documentation labels to each section.
bin/map: For XML inputs, wrap sys.stdin in a LineCountStream and use new xml_parse.docs_iter() on_error() to add input line # to XML parsing exceptions
xml_parse.py: Added on_error() handler to parse_next() (passed through by docs_iter()), so that the caller can add useful info like the input line # to the exception message, and decide not to suppress rather than re-raising the exception
VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field identificationLabel2 to identificationLabel. Distinguish what are now two identificationLabel fields of the same name by tagging each one with [@id=2] or [@id=1]. inputs/SALVIAS-CSV/maps/VegX.organisms.csv: Merge tag1/stem_tag1 and tag2/stem_tag2 using _alt, since they are never set to different values when both are not NULL (although sometimes just one or just the other is not NULL).
VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field tag2 to identificationLabel2 to reflect that it will become a second instance of identificationLabel
VegX-VegBIEN.organisms.csv: Re-mapped individualOrganismObservation user-defined field lineCover to already existing volumeCanopy
VegX-VegBIEN.organisms.csv: Re-mapped individualOrganismObservation user-defined field cover to already existing attribute.coverPercent
VegX-VegBIEN.organisms.csv: Re-mapped individualOrganismObservation user-defined field count to already existing aggregateOrganismObservation.aggregateValue
vegbien.ERD.mwb: Fixed lines
README.TXT: Documented that `make reinstall_db` will delete your VegBIEN DB
README.TXT: Documented that `make empty_db` will delete your VegBIEN DB
root Makefile: empty_db: Confirm deletion just like for rm_db. rm_db: put $(confirmRmDb) on a separate line and move the $(error) call to the main $(confirm) macro since you always want to abort make if the user cancels (not just not run that command).
root Makefile: rm_db: If user cancels, abort in case target was reinstall_db to prevent installing
root Makefile: core, rm_core: Fixed bug where no longer existing prerequisites postgres_user, rm_postgres_user were not removed
root Makefile: rm_db: Confirm deletion with user. Merged postgres_user, rm_postgres_user into db, rm_db so that deletion confirmation applies to user deletion as well (which would indirectly cause the DB to be deleted).
README.TXT: Testing: Updated to add missing mappings
root Makefile: test-all: Added missing_mappings
Moved maps validation targets from main Makefile to input.Makefile. main Makefile: maps validation: Summarize the output of the inputs' maps validations.
Makefile: Also find missing input mappings, in addition to missing join mappings
join: Also produce warnings for no input mapping (if no comment explaining why no input mapping), in addition to no join mapping
inputs/NY/maps/DwC.specimens.csv: Documented why there is no input mapping for key
VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined fields stem* to remove the stem* prefix to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation/plotObservation user-defined fields sourceaccessioncode to sourceAccessionCode to be consistent with VegX case sensitivity
VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field interceptCm to lineCover to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field individualCode to authorPlantCode to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field htFirstBranchM to heightFirstBranch to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field coverPercent to cover to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field siltPercent to silt to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field sandPercent to sand to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field pottasium to potassium to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field organicPercent to organic to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field clayPercent to clay to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field cationCap to cationExchangeCapacity to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Renamed plotObservation user-defined field precipMm to precipitation to be consistent with VegBIEN
VegX-VegBIEN.organisms.csv: Changed plotObservation user-defined field plotMethodology to /simpleUserdefined[name=method]/*ID/method/name
schemas/postgresql.nimoy.conf: Increased default_statistics_target to 8.4 default value to improve execution query plans
Added schemas/postgresql.Mac.conf (for tuning developers' local testing DBs)
schemas/postgresql*.conf: Increased checkpoint_segments and checkpoint_completion_target so that checkpoints (performance intensive) are written less often and load-balanced better
xml_dom.py: Don't print whitespace from parsed XML document when pretty-printing XML. minidom modifications section: Added subsection labels for the class each modification applies to.
Parser.py: Renamed SyntaxException to SyntaxError because it's an unexpected condition that should exit the program, a.k.a. an error
bin/map: process_rows(): When iterating over each row, only retrieve the next row if the end (limit of # of rows) has not been reached. This prevents the next row from being fetched, possibly causing an entire additional consecutive XML document to be parsed, if the limit has already been reached. This is primarily useful for XML inputs with a ".0.top" segment prepended before the other documents, which contains just the first two nodes for fast parsing of this smaller XML document when only the first two nodes are needed for testing. Without this fix, the ".0.top" segment would have needed to contain the first three nodes instead.
inputs/XAL: Accepted initial test outputs
inputs/XAL: Added maps
bin/map: Extended consecutive XML document support to direct-XML inputs (without a map spreadsheet). Factored out consecutive XML document row-iteration code into helper method get_rows() which does the iters.flatten() and itertools.imap() calls.
bin/map: Fixed bug in iteration over consecutive XML documents where only the first element of the first document was processed. Use of iters.flatten() and itertools.imap() fixes this problem so that the consecutive XML documents are regarded as a continuous stream of rows.
bin/map: Use new xml_parse.docs_iter() to iterate over each consecutive XML document in stdin
xml_parse.py: Added support for parsing consecutive XML documents in a stream
Added iters.py
streams.py: Added FilterStream. Changed TracedStream to use FilterStream.
Moved parse_str() from xml_dom.py to xml_parse.py
Added xml_parse.py
streams.py: CaptureStream: Ignore start_str when recording and end_str when not recording
streams.py: CaptureStream: Get each match as a separate array elem instead of concatenated together
ch_root, repl, map: Use new maps.col_info() instead of parsing col name manually. This allows maps with prefixes containing ":" to be supported, without the ":" being misinterpreted as the label-root separator.
maps.py: Added col_info() to get label, root, prefixes from col_name. Added col_formats() for use by combinable(). Use new col_formats() in combinable(). Removed no longer needed col_label().
input.Makefile: Use with_cat instead of with_cat_csv for XML sources
Renamed inputs/XAL/src/digir.xml.make to digir.specimens.xml.make so it would generate an output file with the proper table name
bin/map: Support concatenated XML documents for XML inputs
bin/map: Merged XML inputs with and without a map into the in_is_xml section
digir_client: Output profiling information
Added inputs/XAL/src/digir.xml.make
digir_client: Import http to take advantage of httplib modifications to deal with IncompleteRead errors
Added http.py with httplib modifications to deal with IncompleteRead errors
digir_client: Fixed bug where chunk size was being adjusted even if count == None (indicating no determinable last chunk), causing a type mismatch between None and the integer total
input.Makefile: Removed no longer needed "ifneq ($(wildcard test/),)" guard around Testing section because all inputs now have a test subdir
Added inputs/XAL
digir_client: Made chunk_size a configurable env var. Removed schema env var because schema is always the same for DiGIR (can be different for TAPIR). Make sure output ends in a newline so that consecutive XML documents are on different lines.
digir_client: Fixed bug where chunk_size records would always be retrieved even in the last chunk, which ignored any manual count the user might have set via the "n" option
digir_client: Repeatedly retrieve data in chunks. Provide match count. Added section comments.
xpath.py: Added get_value() to run get_1() and returns the value of any result node
xml_dom.py: Added parse_str()
digir_client: Use new streams.copy() to copy returned data to stdout
streams.py: Added copy(). Added section comment for traced streams.
digir_client: Label debugging output
streams.py: Renamed LineCountOutputStream to LineCountStream since TracedStream now works on both input and output streams
digir_client: Capture diagnostics for later use in determining next start/count values
streams.py: Added CaptureStream to wrap a stream, capturing matching text. Renamed TracedOutputStream to TracedStream and made it work on both input and output streams. Made TracedStream inherit from WrapStream so that close() would be forwarded properly.
bin/map: Changed XML input prefix handling to prepend prefix directly to XPath instead of separating it from the XPath with a "/". Changed get_with_prefix() to use new strings.with_prefixes().
strings.py: Added with_prefixes()
digir_client: Made schema customizable
digir_client: Set header sendTime, source dynamically. In debug mode, print the request XML.
Added local_ip to get local IP address
bin/map: Added prefixes support for XML inputs
digir_client: Filter by darwin:Kingdom=PLANTAE because presumably all records will have this. Don't debug-print URL.
Added initial bin/digir_client
Renamed timeout.py to timeouts.py. Renamed timeout_ vars to timeout.
opts.py: get_env_var(): default defaults to None
inputs/SpeciesLink: Accepted test outputs for new TAPIR download
bin/tapir/tapir2flat.php: Output to specieslink.specimens.csv instead of specieslink.txt so that the output file can be used right away without renaming
inputs/REMIB/src/nodes.make: Stop after a configurable # of empty responses (indicating no more nodes), instead of at a preset node ID, because there seem to be many more nodes than are listed on the web form
input.Makefile: import/rotate: Add "." before the date