sql.py: DbConn: Only cache exceptions for inserts since they are not idempotent, but an invalid insert will always be invalid. If a cached result is an exception, re-raise it in a separate method rather than in the constructor, to ensure that the cursor object is still created and that its query instance var is set.
sql.py: insert(): Cache insert queries by default. This works because any DuplicateKeyException, etc. would be cached as well. This saves many inserts for rows that we already know are in the database.
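A minimal sketch of the insert-caching pattern in the two entries above; the names (CachedResult, raise_if_exception, cached_insert) are hypothetical, not the actual sql.py API:

    class CachedResult:
        """Holds either a query's rows or the exception it raised."""
        def __init__(self, rows=None, exception=None):
            self.rows = rows
            self.exception = exception

        def raise_if_exception(self):
            # Re-raise outside the constructor so the caller's cursor
            # object still gets created and its query var set first
            if self.exception is not None:
                raise self.exception

    cache = {}

    def cached_insert(cache_key, do_insert):
        # Caching the exception is safe for inserts: an invalid insert
        # (e.g. a duplicate key) will always be invalid when re-run
        if cache_key not in cache:
            try:
                cache[cache_key] = CachedResult(rows=do_insert())
            except Exception as e:
                cache[cache_key] = CachedResult(exception=e)
        result = cache[cache_key]
        result.raise_if_exception()
        return result.rows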
sql.py: DbConn.run_query(): Cache exceptions raised by queries as well
sql.py: DbConn.run_query(): When debug logging, label queries with their cache status (hit/miss/non-cacheable)
sql.py: DbConn.run_query(): Also debug-log queries that produce exceptions
sql.py: DbConn: Allow creator to provide a log function to call on debug messages, instead of using stderr directly
bin/map: Pass debug mode to DbConn so that SQL query debugging works again
sql.py: DbConn: DbCursor: Fixed bug where caching was always turned on, by passing the cacheable setting to it from run_query(). Turned caching back on (uncommented it) since it's now working.
bin/map: map_rows()/map_table(): Pass kw_args to process_rows() so rows_start can be specified when using them. DB inputs: Skip the pre-start rows in the SQL query itself, so that they don't need to be iterated over by the cursor in the main loop.
bin/map: Fixed bug introduced in r1718 where the row # would not be incremented if i < start, causing a semi-infinite loop that only ended when the input rows were exhausted. process_rows(): Added optional rows_start parameter to use if the input rows already have the pre-start rows skipped.
input.Makefile: Sources: cat: Changed Usage message to use "--silent" make option
input.Makefile: Sources: cat: Added Usage message with instructions for removing echoed make commands
run_*query(): Fixed bug where INSERTs, etc. were cached by making callers (such as select()) explicitly turn on caching. DbConn.run_query(): Fixed bug where cur.mogrify() was not supported under MySQL by making the cache key a tuple of the unmogrified query and its params instead of the mogrified string query. CacheCursor: Store attributes of the original cursor that we use, such as query and rowcount.
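A sketch of the two fixes above; CacheCursor's attribute set follows the entry, and the helper names are illustrative:

    class CacheCursor:
        """Stores just the attributes of the original cursor that we
        use, so the cache entry doesn't hold the live cursor."""
        def __init__(self, cur):
            self.query = getattr(cur, 'query', None)  # set by psycopg2
            self.rowcount = cur.rowcount
            self.rows = cur.fetchall() if cur.description is not None else None

    def cache_key(query, params):
        # Key on the unmogrified query plus its params: cur.mogrify()
        # is psycopg2-specific and unavailable under MySQL, and the raw
        # (query, params) pair identifies the query just as well
        return (query, tuple(params or ()))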
sql.py: Made row() and value() cache the result by fetching all rows before returning the first row
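Roughly, that change amounts to this (function names per sql.py's row()/value(); exact signatures are assumed):

    def rows(cur):
        # Materialize the whole result set so it can be cached; a
        # lazily-consumed cursor would be exhausted before caching
        return cur.fetchall()

    def row(cur):
        return rows(cur)[0]  # fetch all rows, then return the first

    def value(cur):
        return row(cur)[0]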
iters.py: Added func_iter() and consume_iter()
sql.py: Cache the results of queries (when all rows are read)
Proxy.py: Fixed infinite recursion bug by removing setattr() (which prevents the class and subclasses from storing instance variables using "self." syntax)
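The pitfall, sketched (not the actual Proxy.py code): a forwarding __setattr__ recurses because storing the wrapped object itself goes through __setattr__, which needs the not-yet-stored attribute; removing the override restores normal "self." assignment:

    class Proxy:
        """Forwards attribute reads to a wrapped object."""
        def __init__(self, obj):
            self.obj = obj  # plain instance-var storage works again

        def __getattr__(self, name):
            # Only called when normal lookup fails, so self.obj and any
            # subclass instance vars are found before forwarding
            return getattr(self.obj, name)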
sql.py: DbConn: Added run_query(). run_raw_query(): Use new DbConn.run_query().
Added Proxy.py
parallel.py: MultiProducerPool: Added code to create a shared Namespace object, commented out. Updated share() doc comment to reflect that it will writably share the values as well.
bin/map: Share locals() with the pool at various times to try to get as many unpicklable values into the shared vars as possible
dicts.py: Turned id_dict() factory function into IdDict class. parallel.py: MultiProducerPool: Added share_vars(). main_loop(): Only consider the program to be done if the queue is empty and there are no running tasks.
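A minimal sketch of an identity-keyed dict like the IdDict described above (the add() method is an assumption):

    class IdDict(dict):
        """Keys objects by id() so they are tracked by reference, even
        if unhashable; useful for registering vars to share with worker
        processes."""
        def add(self, *values):
            for value in values:
                self[id(value)] = value
            return self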
collection.py: rmap(): Treat only built-in sequences specially instead of iterables. Pass whether the value is a leaf to the func. Added option to only recurse up to a certain # of levels.
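A sketch of rmap() with the behavior just described (leaf flag passed to func, optional depth limit); parameter names are assumptions, not the actual collection.py signature:

    def rmap(func, value, levels=None):
        # Only built-in sequences are recursed into, so strings and
        # other iterables are treated as leaves
        is_seq = isinstance(value, (list, tuple))
        if is_seq and levels != 0:  # levels=None means unlimited depth
            next_levels = None if levels is None else levels - 1
            return [rmap(func, v, next_levels) for v in value]
        # A sequence reached at the depth cutoff is still reported as a
        # non-leaf via is_leaf
        return func(value, is_leaf=not is_seq)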
Added lists.py
collection.py: rmap(): Fixed bugs: Made it recursive. Use iters.is_iterable() instead of isinstance(value, list) to work on all iterables. Use value and not nonexistent var list_.
iters.py: Added is_iterable()
parallel.py: prepickle(): Pickle all objects in vars_id_dict_ by ID, not just unpicklable ones. This ensures that a DB connection created in the main process will be shared with subprocesses by reference (id()) instead of by value, so that each process can take advantage of e.g. shared caches in the connection object. Note that this may require some synchronization.
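One way the by-ID sharing can work, sketched under the assumption of a fork-based multiprocessing start (so the registry dict is inherited by workers):

    import pickle

    shared = {}  # id() -> object, filled in the main process

    def prepickle(value):
        # Registered objects travel as their id(), everything else by
        # value; workers swap the id() back for the shared object, so
        # e.g. one DbConn (and its caches) is used by every task
        if id(value) in shared:
            return ('by_id', id(value))
        return ('by_value', pickle.dumps(value))

    def postpickle(tagged):
        tag, payload = tagged
        return shared[payload] if tag == 'by_id' else pickle.loads(payload)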
parallel.py: MultiProducerPool.main_loop(): Got rid of no longer correct doc comment
bin/map: Share on_error with the pool
parallel.py: MultiProducerPool: Pickle objects by ID if they're accessible to the main_loop process. This should allow e.g. DB connections and pools to be pickled, if they were defined in the main process.
Added dicts.py with id_dict() and MergeDict
Added collection.py with rmap()
db_xml.py: put(): Moved pool.apply_async() from put_child() to put_(), and don't use lambdas because they can't be pickled
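Why the lambdas had to go, in a standalone example (function and arg names are made up):

    import multiprocessing

    def put_child(row):  # module-level, so picklable by qualified name
        return row

    if __name__ == '__main__':
        pool = multiprocessing.Pool()
        # pool.apply_async(lambda: ..., ...) would fail: lambdas can't
        # be pickled, so pass a named function and its args instead
        result = pool.apply_async(put_child, ('child_row',))
        print(result.get())
        pool.close()
        pool.join()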
parallel.py: MultiProducerPool.apply_async(): Prepickle all function args. Try pickling the args before the queue pickles them, to get better debugging output.
sql.py: with_savepoint(): Use new rand.rand_int()
rand.py: rand_int(): Fixed bug where newly-created objects did not have unique IDs because they were on the stack, so we have to use random.randint() anyway.
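The failure mode, sketched (rand_int()'s real signature and range are assumptions):

    import random

    def rand_int(max_len=10):
        # An id()-based approach fails: CPython immediately reuses the
        # memory of short-lived objects, so id(object()) repeats across
        # calls; a real RNG is needed after all
        return random.randint(0, 10 ** max_len - 1)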
Added rand.py
sql.py: DbConn: Made it picklable by establishing a connection on demand
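The on-demand connection pattern, sketched (psycopg2 is assumed as the driver; the real DbConn wraps whatever DB-API module is configured):

    import psycopg2

    class DbConn:
        def __init__(self, **config):
            self.config = config  # picklable connection settings only
            self._db = None       # live connection, created on demand

        def db(self):
            if self._db is None:
                self._db = psycopg2.connect(**self.config)
            return self._db

        def __getstate__(self):
            # Drop the unpicklable live connection; the unpickling
            # process transparently reconnects on first use
            state = self.__dict__.copy()
            state['_db'] = None
            return state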
bin/map: Also consume asynchronous tasks before closing the DB connection (this is where most if not all tasks will be consumed)
Runnable.py: Made it picklable
Added eval_.py
Added Runnable.py
db_xml.py: put(): Added parallel processing support for inserting children with fkeys to parent asynchronously
parallel.py: Fixed bugs: Added self param to instance methods and inner classes where needed
parallel.py: Changed to use multi-producer pool, which requires calling pool.main_loop()
parallel.py: Pool: Added doc comment
parallel.py: Pool: apply_async(): Return a result object like multiprocessing.Pool.apply_async()
bin/map: Use new parallel.py for parallel processing
Added parallel.py for parallel processing
bin/map: Use dummy synchronous Pool implementation if not using parallel processing
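A sketch of such a dummy pool, including the AsyncResult-style object from the apply_async() entry above (class names are assumptions):

    class SyncResult:
        """Mimics multiprocessing.pool.AsyncResult for the sync case."""
        def __init__(self, value):
            self.value = value

        def get(self, timeout=None):
            return self.value

    class DummyPool:
        """Runs tasks synchronously behind the pool API, so callers use
        the same apply_async()/get() calls with parallelism off."""
        def apply_async(self, func, args=(), kw_args=None):
            return SyncResult(func(*args, **(kw_args or {})))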
bin/map: Use multiprocessing instead of pp for parallel processing because it's easier to use (it uses the Python threading API and doesn't require providing all the functions a task calls). Allow the user to set the cpus option to use all system CPUs (needed because in test mode, the default is 0 CPUs to turn off parallel processing).
disown_all, stop_imports: Use /bin/bash instead of /bin/sh because array subscripting is used
input.Makefile: Editing import: Use $(datasrc) instead of $(db) since $(db) is only set for DB-source inputs
input.Makefile: Import: If profile is on and test mode is on, output formatted profile stats to stdout
sql.py: index_cols(): Cache return values in db.index_cols
bin/map: Don't import pp unless cpus != 0 because it's slow and doesn't need to happen if we're not using parallelization. cpus option defaults to 0 in test mode so tests run faster.
sql.py: pkey(): Use pkeys cache from db object instead of parameter
sql.py: Wrapped db connection inside an object that can also store the cache of the pkeys and index_cols
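A sketch of the wrapper and a cached pkey() lookup; the catalog query shown is one possibility under PostgreSQL, not necessarily what sql.py runs:

    class DbConn:
        def __init__(self, db):
            self.db = db          # underlying DB-API connection
            self.pkeys = {}       # table -> primary key column
            self.index_cols = {}  # index -> column

    def pkey(db, table):
        if table not in db.pkeys:
            cur = db.db.cursor()
            cur.execute(
                "SELECT a.attname FROM pg_index i"
                " JOIN pg_attribute a ON a.attrelid = i.indrelid"
                "  AND a.attnum = ANY(i.indkey)"
                " WHERE i.indrelid = %s::regclass AND i.indisprimary",
                (table,))
            db.pkeys[table] = cur.fetchone()[0]
        return db.pkeys[table]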
bin/map: If cpus is 0, run without Parallel Python
bin/map: Set up Parallel Python with an env-var-customizable # of CPUs
root Makefile: python-Linux: Added `sudo pip install pp`
root Makefile: python-Linux: Added python-parallel to installs
mappings: Build VegX-VegBIEN.organisms.csv from VegX-VegBIEN.stems.csv instead of vice versa. This entails switching the roots around so stem points to organism instead of the other way around, which is a complex operation. Re-rooted VegX-VegBIEN.organisms.csv at /plantobservation instead of /taxonoccurrence to avoid traveling up the hierarchy to taxonoccurrence and back down again to plantobservation, etc. as would otherwise have been the case.
bin/map: When determining if outer elements are types, look for /*s/ anywhere in the string instead of just at the beginning, because there might be root attrs (namespaces), etc. before it
xpath.py: get(): forward (parent-to-child) pointers: If last target object exists but doesn't have an ID attr (which indicates a bug), recover gracefully by just assuming the ID is 0. (Any bug will be noticeable in the output, which needs to be generated through workarounds like this in order to be able to debug.)
VegX mappings: Updated stemParent mapping for VegX 1.5.3
VegX mappings: Changed the taxonDetermination with role identifier to instead have explicitly no role, because data providers' VegX files generally do not provide role information and we don't want the default taxonDetermination XPaths to require it
inputs/CTFS/maps/VegX.organisms.csv: Connected plot to plotObservation by using new support for backward (child-to-parent) pointers whose target is a text element containing an ID
xml_dom.py: get_id(): If the node doesn't have an ID, assumes the node itself is the ID. This enables backward (child-to-parent) pointers whose target is a text element containing an ID, rather than a regular element with an ID attribute.
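Sketched against xml.dom.minidom (the actual xml_dom.py helper may differ):

    from xml.dom.minidom import parseString

    def get_id(node):
        # Prefer an explicit ID attr; otherwise the node's own text
        # content is the ID, enabling backward pointers whose target is
        # a text element like <parentID>123</parentID>
        if node.hasAttribute('id'):
            return node.getAttribute('id')
        return ''.join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE)

    node = parseString('<parentID>123</parentID>').documentElement
    assert get_id(node) == '123'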
VegX mappings: Map locationevent.sourceaccessioncode to plotUniqueIdentifier since this field is no longer being used by authorlocationcode
VegX mappings: Map the authorlocationcode to plotName instead of plotUniqueIdentifier because it's a better fit
inputs/CTFS/maps/VegX.organisms.csv: Fixed bug in Species taxonConcept mapping where the role was computer instead of identifier
xml_dom.py: value(): Skip comment nodes. This fixes a bug where comments inside text elements would prevent the value from being retrieved.
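The fix, sketched against xml.dom.minidom:

    from xml.dom.minidom import parseString

    def value(node):
        # Join only the text children, skipping comments, so
        # <x>a<!-- note -->b</x> yields 'ab' instead of stopping at
        # (or failing on) the comment node
        return ''.join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE) or None

    assert value(parseString('<x>a<!-- note -->b</x>').documentElement) == 'ab'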
inputs/CTFS/test: Accepted test outputs for new VegX_CTFS_row_120000_bci.0.test.organisms.xml instead of VegX_CTFS_row_180000.0.test.organisms.xml, which didn't have <taxonNameUsageConcepts> that match up with <individualOrganisms>
inputs/CTFS/maps/VegX.organisms.csv: Added taxonConcept mappings
mappings/VegX-VegBIEN.organisms.csv: Added species taxonConcept mapping for identifier role
Added expand_xpath to expand XPath abbreviations
VegX mappings: Renamed taxonNameUsageConceptsID to taxonNameUsageConceptID (no plural) to match VegX 1.5.3
inputs/CTFS/maps/VegX.organisms.csv: Corrected CensusNumber input mapping
mappings/Makefile: Generate self maps for all core maps
mappings/Makefile: VegX-VegBIEN.stems.csv: Removed $(rootAttrs) from out root because stems don't use tcs namespace elements (stems don't have taxonDeterminations separate from the main organism)
VegX mappings: taxonConcept mappings: Added "tcs:" namespace prefix to appropriate elements. This will make the taxonConcept XPaths compatible with CTFS VegX.
input.Makefile: Vars/functions: Make: $(subMake): When forwarding to another dir based off of $(root), forward to $(root) rather than directly to the dir of the target. This ensures that any special targets that are only defined in the root Makefile still get run, even when the target is in a subdir with its own Makefile.
inputs/CTFS/test: Accepted initial test outputs. A lot of leaves are still unmapped with the default mappings.
inputs/CTFS/maps: Added initial maps
input.Makefile: Maps building: full via maps (maps/$(via).%.full.csv): $(makeFullCsv): Sort all maps so that rows are re-ordered whether or not a core self map exists. This way, if a core self map is created, it will not cause the sort order of the generated via-format XMLs to change. This makes it easier to accept any changes to test outputs that result from adding a core self map.
mappings/Makefile: VegX: Added VegX.self.organisms.csv. Added root attrs to chRoot maps, commented out since it's not ready to be checked in yet.
xpath.py: get(): Run xml_dom.by_tag_name() with ignore_namespace=False (possibly later set to True)
xml_dom.py: Comments: Added clean_comment() and mk_comment(). Searching child nodes: by_tag_name(): Added ignore_namespace option to ignore namespace of node name.
root Makefile: Added %-remake target
mappings/Makefile: Renamed joinMaps to dwcMaps and chrootMaps to vegxMaps. Added commented-out code to create VegX.self.organisms.csv (not ready to check in yet because it affects many dependent maps).
input.Makefile: Removed no longer needed $(noEmptyMap)
xml_func.py: process(): Use new xml_dom.mk_comment()
xml_dom.py: Added clean_comment() and mk_comment() to properly sanitize comment contents (comments can't contain '--')
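One safe sanitization, sketched (the exact replacement string the real clean_comment() uses is an assumption):

    from xml.dom.minidom import getDOMImplementation

    def clean_comment(text):
        # Per the XML spec, a comment may not contain '--' and may not
        # end with '-'
        text = text.replace('--', '- - ')
        if text.endswith('-'):
            text += ' '
        return text

    def mk_comment(doc, text):
        return doc.createComment(clean_comment(text))

    doc = getDOMImplementation().createDocument(None, 'root', None)
    comment = mk_comment(doc, 'ends with --')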
Added inputs/TRTE
inputs/QMOR/test: Added initial accepted test outputs