bin/map: Use dummy synchronous Pool implementation if not using parallel processing
bin/map: Use multiprocessing instead of pp for parallel processing because it's easier to use (it uses the Python threading API and doesn't require providing all the functions a task calls). Allow the user to set the cpus option to use all system CPUs (needed because in test mode, the default is 0 CPUs to turn off parallel processing).
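A minimal sketch of the dummy synchronous Pool idea from the two entries above. SyncPool, SyncResult and make_pool are hypothetical names, and treating cpus=None as "all CPUs" is an assumption that simply follows multiprocessing.Pool's own default. The actual bin/map code may differ.

```python
import multiprocessing

class SyncResult:
    """Mimics the parts of multiprocessing's AsyncResult used in this sketch."""
    def __init__(self, value=None, error=None):
        self._value = value
        self._error = error

    def get(self, timeout=None):
        # Re-raise the task's exception at get() time, as AsyncResult does
        if self._error is not None:
            raise self._error
        return self._value

class SyncPool:
    """Runs each task immediately in the calling process (no parallelism)."""
    def apply_async(self, func, args=(), kwds=None, callback=None):
        try:
            value = func(*args, **(kwds or {}))
        except Exception as e:
            return SyncResult(error=e)
        if callback is not None:
            callback(value)
        return SyncResult(value)

    def close(self):
        pass

    def join(self):
        pass

def make_pool(cpus):
    """cpus == 0: synchronous pool; cpus is None: one worker per system CPU."""
    if cpus == 0:
        return SyncPool()
    return multiprocessing.Pool(processes=cpus)
```

With this shape the calling code can use pool.apply_async(...).get() without checking whether parallel processing is enabled.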
disown_all, stop_imports: Use /bin/bash instead of /bin/sh because array subscripting is used
input.Makefile: Editing import: Use $(datasrc) instead of $(db) since $(db) is only set for DB-source inputs
input.Makefile: Import: If profile is on and test mode is on, output formatted profile stats to stdout
sql.py: index_cols(): Cache return values in db.index_cols
bin/map: Don't import pp unless cpus != 0 because it's slow and doesn't need to happen if we're not using parallelization. cpus option defaults to 0 in test mode so tests run faster.
sql.py: pkey(): Use pkeys cache from db object instead of parameter
sql.py: Wrapped db connection inside an object that can also store the cache of the pkeys and index_cols
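A rough sketch of the connection-wrapper-with-cache pattern described in the sql.py entries above. The class name DbConn, the queries counter and the use of sqlite3 are all assumptions made to keep the example runnable; only the attribute names pkeys and index_cols come from the log.

```python
import sqlite3

class DbConn:
    """Wraps a DB-API connection together with per-connection caches."""
    def __init__(self, conn):
        self.conn = conn
        self.pkeys = {}       # table name -> primary key column name
        self.index_cols = {}  # table name -> index column names (unused here)
        self.queries = 0      # instrumentation for this sketch only

def pkey(db, table):
    """Return the primary key column of table, querying at most once."""
    if table not in db.pkeys:
        db.queries += 1
        # table is assumed to be a trusted identifier (no user input)
        rows = db.conn.execute('PRAGMA table_info(%s)' % table).fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk)
        db.pkeys[table] = next(row[1] for row in rows if row[5] > 0)
    return db.pkeys[table]
```

Because the cache lives on the db object, callers only pass db around instead of a separate cache parameter.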
bin/map: If cpus is 0, run without Parallel Python
bin/map: Set up Parallel Python with an env-var-customizable # CPUs
root Makefile: python-Linux: Added `sudo pip install pp`
root Makefile: python-Linux: Added python-parallel to installs
mappings: Build VegX-VegBIEN.organisms.csv from VegX-VegBIEN.stems.csv instead of vice versa. This entails switching the roots around so stem points to organism instead of the other way around, which is a complex operation. Re-rooted VegX-VegBIEN.organisms.csv at /plantobservation instead of /taxonoccurrence to avoid traveling up the hierarchy to taxonoccurrence and back down again to plantobservation, etc. as would otherwise have been the case.
bin/map: When determining if outer elements are types, look for /*s/ anywhere in the string instead of just at the beginning, because there might be root attrs (namespaces), etc. before it
xpath.py: get(): forward (parent-to-child) pointers: If last target object exists but doesn't have an ID attr (which indicates a bug), recover gracefully by just assuming the ID is 0. (Any bug will be noticeable in the output, which needs to be generated through workarounds like this in order to be able to debug.)
VegX mappings: Updated stemParent mapping for VegX 1.5.3
VegX mappings: Changed taxonDetermination of role identifier to instead have explicitly no role, because data providers' VegX files generally do not provide role information and we don't want the default taxonDetermination XPaths to require this
inputs/CTFS/maps/VegX.organisms.csv: Connected plot to plotObservation by using new support for backward (child-to-parent) pointers whose target is a text element containing an ID
xml_dom.py: get_id(): If the node doesn't have an ID, assumes the node itself is the ID. This enables backward (child-to-parent) pointers whose target is a text element containing an ID, rather than a regular element with an ID attribute.
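A small sketch of the get_id() fallback described above, using xml.dom.minidom. The attribute name "id" and the helper text_value() are assumptions; the real xml_dom.py may look different.

```python
from xml.dom import minidom

def text_value(node):
    """Concatenate the text children of node, skipping comment nodes."""
    parts = [child.data for child in node.childNodes
             if child.nodeType == child.TEXT_NODE]
    return ''.join(parts).strip()

def get_id(node):
    """Use the ID attribute if present, else treat the node's text as the ID."""
    if node.hasAttribute('id'):
        return node.getAttribute('id')
    return text_value(node)
```

A backward (child-to-parent) pointer can then compare get_id() of a regular element with get_id() of a text element that merely contains the ID.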
VegX mappings: Map locationevent.sourceaccessioncode to plotUniqueIdentifier since this field is no longer being used by authorlocationcode
VegX mappings: Map the authorlocationcode to plotName instead of plotUniqueIdentifier because it's a better fit
inputs/CTFS/maps/VegX.organisms.csv: Fixed bug in Species taxonConcept mapping where the role was computer instead of identifier
xml_dom.py: value(): Skip comment nodes. This fixes a bug where comments inside text elements would prevent the value from being retrieved.
inputs/CTFS/test: Accepted test outputs for new VegX_CTFS_row_120000_bci.0.test.organisms.xml instead of VegX_CTFS_row_180000.0.test.organisms.xml, which didn't have <taxonNameUsageConcepts> that match up with <individualOrganisms>
inputs/CTFS/maps/VegX.organisms.csv: Added taxonConcept mappings
mappings/VegX-VegBIEN.organisms.csv: Added species taxonConcept mapping for identifier role
Added expand_xpath to expand XPath abbreviations
VegX mappings: Renamed taxonNameUsageConceptsID to taxonNameUsageConceptID (no plural) to match VegX 1.5.3
inputs/CTFS/maps/VegX.organisms.csv: Corrected CensusNumber input mapping
mappings/Makefile: Generate self maps for all core maps
mappings/Makefile: VegX-VegBIEN.stems.csv: Removed $(rootAttrs) from out root because stems don't use tcs namespace elements (stems don't have taxonDeterminations separate from the main organism)
VegX mappings: taxonConcept mappings: Added "tcs:" namespace prefix to appropriate elements. This will make the taxonConcept XPaths compatible with CTFS VegX.
input.Makefile: Vars/functions: Make: $(subMake): When forwarding to another dir based off of $(root), forward to $(root) rather than directly to the dir of the target. This ensures that any special targets that are only defined in the root Makefile still get run, even when the target is in a subdir with its own Makefile.
inputs/CTFS/test: Accepted initial test outputs. A lot of leaves are still unmapped with the default mappings.
inputs/CTFS/maps: Added initial maps
input.Makefile: Maps building: full via maps (maps/$(via).%.full.csv): $(makeFullCsv): Sort all maps so that rows are re-ordered whether or not a core self map exists. This way, if a core self map is created, it will not cause the sort order of the generated via-format XMLs to change. This makes it easier to accept any changes to test outputs that result from adding a core self map.
mappings/Makefile: VegX: Added VegX.self.organisms.csv. Added root attrs to chRoot maps, commented out since it's not ready to be checked in yet.
xpath.py: get(): Run xml_dom.by_tag_name() with ignore_namespace=False (possibly later set to True)
xml_dom.py: Comments: Added clean_comment() and mk_comment(). Searching child nodes: by_tag_name(): Added ignore_namespace option to ignore namespace of node name.
root Makefile: Added %-remake target
mappings/Makefile: Renamed joinMaps to dwcMaps and chrootMaps to vegxMaps. Added commented-out code to create VegX.self.organisms.csv (not ready to check in yet because it affects many dependent maps).
input.Makefile: Removed no longer needed $(noEmptyMap)
xml_func.py: process(): Use new xml_dom.mk_comment()
xml_dom.py: Added clean_comment() and mk_comment() to properly sanitize comment contents (comments can't contain '--')
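One way such sanitizing could look, as a sketch only. Collapsing runs of hyphens and padding a trailing hyphen are assumptions about the strategy; the log only states that comments can't contain '--'.

```python
from xml.dom import minidom

def clean_comment(text):
    """Make text safe for an XML comment: no '--' and no trailing '-'."""
    while '--' in text:
        text = text.replace('--', '-')
    if text.endswith('-'):
        text += ' '  # otherwise the comment would end in '--->'
    return text

def mk_comment(doc, text):
    """Create a comment node on doc with sanitized contents."""
    return doc.createComment(clean_comment(text))
```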
Added inputs/TRTE
inputs/QMOR/test: Added initial accepted test outputs
inputs/QMOR/maps: Added maps
Added inputs/QMOR
inputs/MT/test: Added initial accepted test outputs
inputs/MT/maps: Added maps
mappings/Makefile: DwC-VegBIEN.specimens.csv: Don't call remove_empty to produce it, because join now deals with empty mappings correctly by still raising a warning. Removed no longer needed intermediate DwC.ci-VegBIEN.specimens.csv.
join: Also print "No join mapping" warning if a join mapping was found but it was empty. The warning in that case is actually "No non-empty join mapping" to distinguish it from a mapping that's missing entirely. input.Makefile: missing_mappings: Support new "No join mapping" error message.
Added inputs/MT
Added disown_all to disown all running jobs
stop_imports: Call jobspecs relative to $selfDir, rather than assuming it will be run from the svn root dir
union: Call maps.merge_headers() using **dict(prefer=header_num) instead of just prefer=header_num in order to work on Python 2.5.2 (which nimoy is running)
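The log does not say what exactly failed on Python 2.5.2. One known difference is that Python 2.5 rejects a keyword argument placed after *args in a call (this became legal in 2.6), so the sketch below assumes that is the situation. The merge_headers() shown here is a hypothetical stand-in, not the real maps.merge_headers().

```python
def merge_headers(*headers, **kw_args):
    """Hypothetical stand-in: return the preferred header, else join them all."""
    prefer = kw_args.get('prefer')
    if prefer is not None:
        return headers[prefer]
    return ','.join(headers)

headers = ['a', 'b']
header_num = 1

# Keyword after *args: fine on Python 2.6+, a SyntaxError on Python 2.5
direct = merge_headers(*headers, prefer=header_num)

# Same call with the keyword packed into a dict: also accepted on Python 2.5
portable = merge_headers(*headers, **dict(prefer=header_num))
```

Both calls pass identical arguments at runtime; only the call syntax differs.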
inputs/ACAD/test: Accepted initial test outputs
Added inputs/ACAD/maps/ maps
Accepted new test outputs resulting from the addition of the id -> occurrenceID mapping in mappings/DwC1-DwC2.specimens.csv
inputs/SALVIAS*/maps: Cleaned up maps for the first time since all via maps became subject to cleanup
input.Makefile: Removed no longer needed default "maps/.$(via).%.csv.last_cleanup" rule
input.Makefile: Maps building: Via maps cleanup: Added `env ignore=1` since with the switch to subtracting $(coreMap), all inputs will attempt to subtract some map, even if it's not subtractable
input.Makefile: Don't clean src maps, only build them
inputs/ARIZ/maps/DwC.specimens.csv: Re-cleaned up to take advantage of additional entries now removed by subtract
input.Makefile: Maps building: Via maps cleanup: Subtract $(coreMap) instead of $(coreSelfMap) so that entries whose input and output map to the same place are subtracted as well
subtract: Also remove mappings whose input and output map to the same non-empty value in map_1
util.py: Added all_equal(), all_equal_ignore_none(), have_same_value()
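A possible reading of these three helpers, as a sketch. The function names come from the log, but the bodies are guesses; in particular, the semantics of have_same_value() (equal and not None) are an assumption based on the "same non-empty value" wording of the subtract entry.

```python
def all_equal(values):
    """True if every value equals the first one (vacuously true if empty)."""
    values = list(values)
    return all(v == values[0] for v in values[1:])

def all_equal_ignore_none(values):
    """Like all_equal(), but None entries are skipped."""
    return all_equal(v for v in values if v is not None)

def have_same_value(values):
    """Assumed meaning: non-empty, all equal, and the shared value is not None."""
    values = list(values)
    return bool(values) and values[0] is not None and all_equal(values)
```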
mappings/DwC1-DwC2.specimens.csv: Added id -> occurrenceID mapping
inputs/SALVIAS-CSV/maps/VegX.%.full.csv: Regenerated using new src maps
mappings/DwC1-DwC2.specimens.csv: Added mappings from dcterms elements without namespace to with namespace
inputs/SALVIAS-CSV: Built maps/src.%.csv
Added inputs/ACAD/maps/src.specimens.csv
input.Makefile: Maps building: Autogen src maps with known table names. Sources: $(withCatSrcs): Fixed bug where substitution pattern did not contain %.
Added src_map to make a source map spreadsheet from a CSV header
input.Makefile: Split Maps section into "Existing maps discovery" and "Maps building" sections. Sources: Added cat, cat-% to cat out sources.
input.Makefile: Factored out sources-related code to new Sources section
input.Makefile: $(srcMaps): Removed `$(filter-out maps/src.join.%.csv,...)` because maps/src.join.%.csv are no longer created
README.TXT: Schema changes: Split updating graphical ERD exports into separate section. Update graphical ERD exports: Added schemas/vegbien.ERD.core.pdf .
README.TXT: Added Datasource setup section with instructions to add a new datasource
Added inputs/ACAD
input.Makefile: Only setSvnIgnore the input dir, since it already exists and doesn't need to be added (inputs/Makefile adds it)
inputs/*/maps/DwC.specimens.csv: Removed extraneous XML meta info from DwC column root, since it now just needs to be present in the core via map mappings/DwC-VegBIEN.specimens.csv
union: Use new maps.merge_headers() to write properly combined header
maps.py: join_combinable(): Fixed roots_combinable() to run on col names instead of roots, which were passed in. merge_mappings(): Factored out mapping column combining into merge_mapping_cols(), which handles an optional prefer param as well to take the header_num env var. Added merge_headers().
util.py: Added sort_by_len(), shortest(), longest()
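These helpers are probably small wrappers around the built-ins; a sketch assuming they take any iterable of sized items:

```python
def sort_by_len(items):
    """Return items ordered from shortest to longest (stable for ties)."""
    return sorted(items, key=len)

def shortest(items):
    """Return the first item with the smallest length."""
    return min(items, key=len)

def longest(items):
    """Return the first item with the greatest length."""
    return max(items, key=len)
```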
join: Use new maps.join_combinable() to check if column names match
maps.py: Added cols_combinable() and use it in combinable(). Added join_combinable() and associated helper functions. Added documentation labels to each section.
xml_parse.py: ConsecXmlInputStream: Removed read() because that's now defined in streams.FilterStream
xml_parse.py: parse_next(): Strip control characters from input stream because they mess up the parser
streams.py: FilterStream: Forward all reads to readline()
strings.py: Added is_ctrl() and strip_ctrl()
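A sketch of what is_ctrl() and strip_ctrl() might do. Keeping tab, LF and CR is an assumption, chosen because XML 1.0 allows those three characters; the real strings.py may draw the line elsewhere.

```python
def is_ctrl(char):
    """True for ASCII control characters other than tab, LF and CR."""
    return ord(char) < 32 and char not in '\t\n\r'

def strip_ctrl(text):
    """Remove control characters that an XML parser would reject."""
    return ''.join(c for c in text if not is_ctrl(c))
```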
xml_parse.py: parse_next(): On parser error, advance to next XML document since the rest of the current document is corrupted
streams.py: Added consume(). Added documentation labels to each section.
bin/map: For XML inputs, wrap sys.stdin in a LineCountStream and use new xml_parse.docs_iter() on_error() to add input line # to XML parsing exceptions
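A minimal sketch of a line-counting stream wrapper in the spirit of the entry above. Only the name LineCountStream comes from the log; the line_num attribute and the with_line_num() helper are hypothetical. Forwarding read() to readline() follows the FilterStream entry, and the size argument is ignored here for simplicity.

```python
import io

class LineCountStream:
    """Wraps a text stream and counts the lines read so far."""
    def __init__(self, stream):
        self.stream = stream
        self.line_num = 0

    def readline(self):
        line = self.stream.readline()
        if line:  # an empty string means EOF, which is not a line
            self.line_num += 1
        return line

    def read(self, n=-1):
        # Forward every read to readline() so the count stays accurate
        return self.readline()

def with_line_num(stream, error):
    """Hypothetical helper: build a new error that names the input line."""
    return ValueError('%s (input line %d)' % (error, stream.line_num))
```

A parser's error callback could then wrap the original exception with with_line_num(stream, e) before re-raising it.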