/ - Changes - BIEN 3 - NCEAS Projects

root @ 1857

#	Date	Author	Comment
1857	04/14/2012 12:37 PM	Aaron Marcuse-Kubitza	bin/map: Use dummy synchronous Pool implementation if not using parallel processing
1856	04/14/2012 12:18 PM	Aaron Marcuse-Kubitza	bin/map: Use multiprocessing instead of pp for parallel processing because it's easier to use (it uses the Python threading API and doesn't require providing all the functions a task calls). Allow the user to set the cpus option to use all system CPUs (needed because in test mode, the default is 0 CPUs to turn off parallel processing).
1855	04/13/2012 04:41 PM	Aaron Marcuse-Kubitza	disown_all, stop_imports: Use /bin/bash instead of /bin/sh because array subscripting is used
1854	04/13/2012 04:38 PM	Aaron Marcuse-Kubitza	input.Makefile: Editing import: Use $(datasrc) instead of $(db) since $(db) is only set for DB-source inputs
1853	04/13/2012 04:31 PM	Aaron Marcuse-Kubitza	input.Makefile: Import: If profile is on and test mode is on, output formatted profile stats to stdout
1852	04/13/2012 03:00 PM	Aaron Marcuse-Kubitza	sql.py: index_cols(): Cache return values in db.index_cols
1851	04/13/2012 02:56 PM	Aaron Marcuse-Kubitza	bin/map: Don't import pp unless cpus != 0 because it's slow and doesn't need to happen if we're not using parallelization. cpus option defaults to 0 in test mode so tests run faster.
1850	04/13/2012 02:52 PM	Aaron Marcuse-Kubitza	sql.py: pkey(): Use pkeys cache from db object instead of parameter
1849	04/13/2012 02:44 PM	Aaron Marcuse-Kubitza	sql.py: Wrapped db connection inside an object that can also store the cache of the pkeys and index_cols
1848	04/13/2012 02:27 PM	Aaron Marcuse-Kubitza	bin/map: If cpus is 0, run without Parallel Python
1847	04/13/2012 02:19 PM	Aaron Marcuse-Kubitza	bin/map: Set up Parallel Python with an env-var-customizable # CPUs
1846	04/13/2012 02:18 PM	Aaron Marcuse-Kubitza	bin/map: Set up Parallel Python with an env-var-customizable # CPUs
1845	04/13/2012 12:58 PM	Aaron Marcuse-Kubitza	root Makefile: python-Linux: Added `sudo pip install pp`
1844	04/13/2012 12:47 PM	Aaron Marcuse-Kubitza	root Makefile: python-Linux: Added python-parallel to installs
1843	04/13/2012 12:19 PM	Aaron Marcuse-Kubitza	mappings: Build VegX-VegBIEN.organisms.csv from VegX-VegBIEN.stems.csv instead of vice versa. This entails switching the roots around so stem points to organism instead of the other way around, which is a complex operation. Re-rooted VegX-VegBIEN.organisms.csv at /plantobservation instead of /taxonoccurrence to avoid traveling up the hierarchy to taxonoccurrence and back down again to plantobservation, etc. as would otherwise have been the case.
1842	04/13/2012 11:43 AM	Aaron Marcuse-Kubitza	bin/map: When determining if outer elements are types, look for /*s/ anywhere in the string instead of just at the beginning, because there might be root attrs (namespaces), etc. before it
1841	04/13/2012 10:45 AM	Aaron Marcuse-Kubitza	bin/map: When determining if outer elements are types, look for /*s/ anywhere in the string instead of just at the beginning, because there might be root attrs (namespaces), etc. before it
1840	04/13/2012 10:44 AM	Aaron Marcuse-Kubitza	xpath.py: get(): forward (parent-to-child) pointers: If last target object exists but doesn't have an ID attr (which indicates a bug), recover gracefully by just assuming the ID is 0. (Any bug will be noticeable in the output, which needs to be generated through workarounds like this in order to be able to debug.)
1839	04/10/2012 05:18 PM	Aaron Marcuse-Kubitza	VegX mappings: Updated stemParent mapping for VegX 1.5.3
1838	04/10/2012 04:54 PM	Aaron Marcuse-Kubitza	VegX mappings: Changed taxonDetermination of role identifier to instead have explicitly no role, because data providers' VegX files generally do not provide role information and we don't want the default taxonDetermination XPaths to require this
1837	04/10/2012 04:34 PM	Aaron Marcuse-Kubitza	inputs/CTFS/maps/VegX.organisms.csv: Connected plot to plotObservation by using new support for backward (child-to-parent) pointers whose target is a text element containing an ID
1836	04/10/2012 04:33 PM	Aaron Marcuse-Kubitza	xml_dom.py: get_id(): If the node doesn't have an ID, assumes the node itself is the ID. This enables backward (child-to-parent) pointers whose target is a text element containing an ID, rather than a regular element with an ID attribute.
1835	04/10/2012 04:04 PM	Aaron Marcuse-Kubitza	VegX mappings: Map locationevent.sourceaccessioncode to plotUniqueIdentifier since this field is no longer being used by authorlocationcode
1834	04/10/2012 03:48 PM	Aaron Marcuse-Kubitza	VegX mappings: Map the authorlocationcode to plotName instead of plotUniqueIdentifier because it's a better fit
1833	04/10/2012 03:13 PM	Aaron Marcuse-Kubitza	inputs/CTFS/maps/VegX.organisms.csv: Fixed bug in Species taxonConcept mapping where the role was computer instead of identifier
1832	04/10/2012 03:11 PM	Aaron Marcuse-Kubitza	xml_dom.py: value(): Skip comment nodes. This fixes a bug where comments inside text elements would prevent the value from being retrieved.
1831	04/10/2012 03:02 PM	Aaron Marcuse-Kubitza	inputs/CTFS/test: Accepted test outputs for new VegX_CTFS_row_120000_bci.0.test.organisms.xml instead of VegX_CTFS_row_180000.0.test.organisms.xml, which didn't have <taxonNameUsageConcepts> that match up with <individualOrganisms>
1830	04/10/2012 02:16 PM	Aaron Marcuse-Kubitza	inputs/CTFS/test: Accepted test outputs for new VegX_CTFS_row_120000_bci.0.test.organisms.xml instead of VegX_CTFS_row_180000.0.test.organisms.xml, which didn't have <taxonNameUsageConcepts> that match up with <individualOrganisms>
1829	04/10/2012 01:59 PM	Aaron Marcuse-Kubitza	inputs/CTFS/maps/VegX.organisms.csv: Added taxonConcept mappings
1828	04/10/2012 01:59 PM	Aaron Marcuse-Kubitza	mappings/VegX-VegBIEN.organisms.csv: Added species taxonConcept mapping for identifier role
1827	04/10/2012 01:33 PM	Aaron Marcuse-Kubitza	Added expand_xpath to expand XPath abbreviations
1826	04/10/2012 12:43 PM	Aaron Marcuse-Kubitza	VegX mappings: Renamed taxonNameUsageConceptsID to taxonNameUsageConceptID (no plural) to match VegX 1.5.3
1825	04/10/2012 12:33 PM	Aaron Marcuse-Kubitza	inputs/CTFS/maps/VegX.organisms.csv: Corrected CensusNumber input mapping
1824	04/10/2012 12:24 PM	Aaron Marcuse-Kubitza	mappings/Makefile: Generate self maps for all core maps
1823	04/10/2012 12:19 PM	Aaron Marcuse-Kubitza	mappings/Makefile: VegX-VegBIEN.stems.csv: Removed $(rootAttrs) from out root because stems don't use tcs namespace elements (stems don't have taxonDeterminations separate from the main organism)
1822	04/10/2012 12:13 PM	Aaron Marcuse-Kubitza	VegX mappings: taxonConcept mappings: Added "tcs:" namespace prefix to appropriate elements. This will make the taxonConcept XPaths compatible with CTFS VegX.
1821	04/09/2012 06:52 PM	Aaron Marcuse-Kubitza	input.Makefile: Vars/functions: Make: $(subMake): When forwarding to another dir based off of $(root), forward to $(root) rather than directly to the dir of the target. This ensures that any special targets that are only defined in the root Makefile still get run, even when the target is in a subdir with its own Makefile.
1820	04/09/2012 06:41 PM	Aaron Marcuse-Kubitza	inputs/CTFS/test: Accepted initial test outputs. A lot of leaves are still unmapped with the default mappings.
1819	04/09/2012 06:40 PM	Aaron Marcuse-Kubitza	inputs/CTFS/maps: Added initial maps
1818	04/09/2012 06:39 PM	Aaron Marcuse-Kubitza	VegX mappings: taxonConcept mappings: Added "tcs:" namespace prefix to appropriate elements. This will make the taxonConcept XPaths compatible with CTFS VegX.
1817	04/09/2012 06:13 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: full via maps (maps/$(via).%.full.csv): $(makeFullCsv): Sort all maps so that rows are re-ordered whether or not a core self map exists. This way, if a core self map is created, it will not cause the sort order of the generated via-format XMLs to change. This makes it easier to accept any changes to test outputs that result from adding a core self map.
1816	04/09/2012 05:53 PM	Aaron Marcuse-Kubitza	mappings/Makefile: VegX: Added VegX.self.organisms.csv. Added root attrs to chRoot maps, commented out since it's not ready to be checked in yet.
1815	04/09/2012 05:34 PM	Aaron Marcuse-Kubitza	xpath.py: get(): Run xml_dom.by_tag_name() with ignore_namespace=False (possibly later set to True)
1814	04/09/2012 05:32 PM	Aaron Marcuse-Kubitza	xml_dom.py: Comments: Added clean_comment() and mk_comment(). Searching child nodes: by_tag_name(): Added ignore_namespace option to ignore namespace of node name.
1813	04/09/2012 05:26 PM	Aaron Marcuse-Kubitza	root Makefile: Added %-remake target
1812	04/09/2012 04:53 PM	Aaron Marcuse-Kubitza	mappings/Makefile: Renamed joinMaps to dwcMaps and chrootMaps to vegxMaps. Added commented-out code to create VegX.self.organisms.csv (not ready to check in yet because it affects many dependent maps).
1811	04/09/2012 02:52 PM	Aaron Marcuse-Kubitza	input.Makefile: Removed no longer needed $(noEmptyMap)
1810	04/09/2012 12:40 PM	Aaron Marcuse-Kubitza	xml_func.py: process(): Use new xml_dom.mk_comment()
1809	04/09/2012 12:40 PM	Aaron Marcuse-Kubitza	xml_dom.py: Added clean_comment() and mk_comment() to properly sanitize comment contents (comments can't contain '--')
1808	04/09/2012 12:14 PM	Aaron Marcuse-Kubitza	Added inputs/TRTE
1807	04/03/2012 08:26 PM	Aaron Marcuse-Kubitza	inputs/QMOR/test: Added initial accepted test outputs
1806	04/03/2012 08:26 PM	Aaron Marcuse-Kubitza	inputs/QMOR/maps: Added maps
1805	04/03/2012 08:20 PM	Aaron Marcuse-Kubitza	Added inputs/QMOR
1804	04/03/2012 08:14 PM	Aaron Marcuse-Kubitza	inputs/MT/test: Added initial accepted test outputs
1803	04/03/2012 08:14 PM	Aaron Marcuse-Kubitza	inputs/MT/maps: Added maps
1802	04/03/2012 08:13 PM	Aaron Marcuse-Kubitza	mappings/Makefile: DwC-VegBIEN.specimens.csv: Don't call remove_empty to produce it, because join now deals with empty mappings correctly by still raising a warning. Removed no longer needed intermediate DwC.ci-VegBIEN.specimens.csv.
1801	04/03/2012 08:09 PM	Aaron Marcuse-Kubitza	join: Also print "No join mapping" warning if a join mapping was found but it was empty. The warning in that case is actually "No non-empty join mapping" to distinguish it from a mapping that's missing entirely. input.Makefile: missing_mappings: Support new "No join mapping" error message.
1800	04/03/2012 08:08 PM	Aaron Marcuse-Kubitza	join: Also print "No join mapping" warning if a join mapping was found but it was empty. The warning in that case is actually "No non-empty join mapping" to distinguish it from a mapping that's missing entirely. input.Makefile: missing_mappings: Support new "No join mapping" error message.
1799	04/03/2012 07:33 PM	Aaron Marcuse-Kubitza	Added inputs/MT
1798	04/03/2012 07:26 PM	Aaron Marcuse-Kubitza	Added disown_all to disown all running jobs
1797	04/03/2012 07:26 PM	Aaron Marcuse-Kubitza	stop_imports: Call jobspecs relative to $selfDir, rather than assuming it will be run from the svn root dir
1796	04/03/2012 07:18 PM	Aaron Marcuse-Kubitza	union: Call maps.merge_headers() using **dict(prefer=header_num) instead of just prefer=header_num in order to work on Python 2.5.2 (which nimoy is running)
1795	04/03/2012 07:00 PM	Aaron Marcuse-Kubitza	inputs/ACAD/test: Accepted initial test outputs
1794	04/03/2012 07:00 PM	Aaron Marcuse-Kubitza	Added inputs/ACAD/maps/ maps
1793	04/03/2012 06:59 PM	Aaron Marcuse-Kubitza	Accepted new test outputs resulting from the addition of the id -> occurrenceID mapping in mappings/DwC1-DwC2.specimens.csv
1792	04/03/2012 06:57 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS*/maps: Cleaned up maps for the first time since all via maps became subject to cleanup
1791	04/03/2012 06:55 PM	Aaron Marcuse-Kubitza	input.Makefile: Removed no longer needed default "maps/.$(via).%.csv.last_cleanup" rule
1790	04/03/2012 06:54 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: Via maps cleanup: Added `env ignore=1` since with the switch to subtracting $(coreMap), all inputs will attempt to subtract some map, even if it's not subtractable
1789	04/03/2012 06:47 PM	Aaron Marcuse-Kubitza	input.Makefile: Don't clean src maps, only build them
1788	04/03/2012 06:45 PM	Aaron Marcuse-Kubitza	inputs/ARIZ/maps/DwC.specimens.csv: Re-cleaned up to take advantage of additional entries now removed by subtract
1787	04/03/2012 06:36 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: Via maps cleanup: Subtract $(coreMap) instead of $(coreSelfMap) so that entries whose input and output maps to the same place are subtracted as well
1786	04/03/2012 06:35 PM	Aaron Marcuse-Kubitza	subtract: Also remove mappings whose input and output maps to the same non-empty value in map_1
1785	04/03/2012 06:32 PM	Aaron Marcuse-Kubitza	util.py: Added all_equal(), all_equal_ignore_none(), have_same_value()
1784	04/03/2012 05:45 PM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added id -> occurrenceID mapping
1783	04/03/2012 05:43 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS-CSV/maps/VegX.%.full.csv: Regenerated using new src maps
1782	04/03/2012 05:41 PM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added mappings from dcterms elements without namespace to with namespace
1781	04/03/2012 05:40 PM	Aaron Marcuse-Kubitza	inputs/SALVIAS-CSV: Built maps/src.%.csv
1780	04/03/2012 05:24 PM	Aaron Marcuse-Kubitza	Added inputs/ACAD/maps/src.specimens.csv
1779	04/03/2012 05:23 PM	Aaron Marcuse-Kubitza	input.Makefile: Maps building: Autogen src maps with known table names. Sources: $(withCatSrcs): Fixed bug where substitution pattern did not contain %.
1778	04/03/2012 05:22 PM	Aaron Marcuse-Kubitza	Added src_map to make a source map spreadsheet from a CSV header
1777	04/03/2012 04:32 PM	Aaron Marcuse-Kubitza	input.Makefile: Split Maps section into "Existing maps discovery" and "Maps building" sections. Sources: Added cat, cat-% to cat out sources.
1776	04/03/2012 04:17 PM	Aaron Marcuse-Kubitza	input.Makefile: Factored out sources-related code to new Sources section
1775	04/03/2012 04:08 PM	Aaron Marcuse-Kubitza	input.Makefile: $(srcMaps): Removed `$(filter-out maps/src.join.%.csv,...)` because maps/src.join.%.csv are no longer created
1774	04/03/2012 03:47 PM	Aaron Marcuse-Kubitza	README.TXT: Schema changes: Split updating graphical ERD exports into separate section. Update graphical ERD exports: Added schemas/vegbien.ERD.core.pdf .
1773	04/03/2012 03:42 PM	Aaron Marcuse-Kubitza	README.TXT: Added Datasource setup section with instructions to add a new datasource
1772	04/03/2012 03:38 PM	Aaron Marcuse-Kubitza	Added inputs/ACAD
1771	04/03/2012 03:37 PM	Aaron Marcuse-Kubitza	input.Makefile: Only setSvnIgnore the input dir, since it already exists and doesn't need to be added (inputs/Makefile adds it)
1770	04/03/2012 03:23 PM	Aaron Marcuse-Kubitza	inputs/*/maps/DwC.specimens.csv: Removed extraneous XML meta info from DwC column root, since it now just needs to be present in the core via map mappings/DwC-VegBIEN.specimens.csv
1769	04/03/2012 03:22 PM	Aaron Marcuse-Kubitza	union: Use new maps.merge_headers() to write properly combined header
1768	04/03/2012 03:21 PM	Aaron Marcuse-Kubitza	maps.py: join_combinable(): Fixed roots_combinable() to run on col names instead of roots, which were passed in. merge_mappings(): Factored out mapping column combining into merge_mapping_cols(), which handles an optional prefer param as well to take the header_num env var. Added merge_headers().
1767	04/03/2012 03:17 PM	Aaron Marcuse-Kubitza	util.py: Added sort_by_len(), shortest(), longest()
1766	04/03/2012 02:12 PM	Aaron Marcuse-Kubitza	join: Use new maps.join_combinable() to check if column names match
1765	04/03/2012 02:11 PM	Aaron Marcuse-Kubitza	maps.py: Added cols_combinable() and use it in combinable(). Added join_combinable() and associated helper functions. Added documentation labels to each section.
1764	04/03/2012 01:13 PM	Aaron Marcuse-Kubitza	xml_parse.py: ConsecXmlInputStream: Removed read() because that's now defined in streams.FilterStream
1763	04/03/2012 01:11 PM	Aaron Marcuse-Kubitza	xml_parse.py: parse_next(): Strip control characters from input stream because they mess up the parser
1762	04/03/2012 01:10 PM	Aaron Marcuse-Kubitza	streams.py: FilterStream: Forward all reads to readline()
1761	04/03/2012 01:08 PM	Aaron Marcuse-Kubitza	strings.py: Added is_ctrl() and strip_ctrl()
1760	04/03/2012 08:34 AM	Aaron Marcuse-Kubitza	xml_parse.py: parse_next(): On parser error, advance to next XML document since the rest of the current document is corrupted
1759	04/03/2012 08:33 AM	Aaron Marcuse-Kubitza	streams.py: Added consume(). Added documentation labels to each section.
1758	04/03/2012 08:23 AM	Aaron Marcuse-Kubitza	bin/map: For XML inputs, wrap sys.stdin in a LineCountStream and use new xml_parse.docs_iter() on_error() to add input line # to XML parsing exceptions
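
Revisions 1856 and 1857 describe bin/map switching to multiprocessing and falling back to a dummy synchronous Pool when parallel processing is off (cpus set to 0). The following is a minimal sketch of that pattern, not the actual bin/map code: the names SyncPool, SyncResult, make_pool, and square are hypothetical, and treating cpus=None as "use all system CPUs" is an assumption borrowed from how multiprocessing.Pool handles processes=None.

```python
import multiprocessing


class SyncResult:
    """Holds an eagerly computed value; mimics a small subset of AsyncResult."""

    def __init__(self, value):
        self._value = value

    def get(self, timeout=None):
        return self._value

    def ready(self):
        return True


class SyncPool:
    """Hypothetical stand-in for multiprocessing.Pool that runs tasks inline."""

    def apply_async(self, func, args=(), kwds=None, callback=None):
        # Run the task immediately in the calling process.
        value = func(*args, **(kwds or {}))
        if callback is not None:
            callback(value)
        return SyncResult(value)

    def close(self):
        pass

    def join(self):
        pass


def make_pool(cpus):
    """Return a pool for the requested CPU count.

    Assumed convention: 0 disables parallelism, None uses all CPUs.
    """
    if cpus == 0:
        return SyncPool()
    return multiprocessing.Pool(processes=cpus)


def square(x):
    return x * x


if __name__ == "__main__":
    pool = make_pool(0)  # 0 CPUs: tasks run synchronously
    results = [pool.apply_async(square, (n,)) for n in range(4)]
    pool.close()
    pool.join()
    print([r.get() for r in results])  # prints [0, 1, 4, 9]
```

Because both pools expose the same apply_async/close/join calls, the calling code does not need to branch on whether parallel processing is enabled. One difference in this sketch: an exception raised by a task surfaces at apply_async() rather than at get(), as it would with a real Pool.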
