# Date Author Comment
1877 04/16/2012 04:05 PM Aaron Marcuse-Kubitza

parallel.py: MultiProducerPool: Pickle objects by ID if they're accessible to the main_loop process. This should allow e.g. DB connections and pools to be pickled, if they were defined in the main process.
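Pickling objects by ID can be done with the standard `pickle` persistent-ID hooks. The sketch below is an assumption about the approach, not the code in parallel.py. Objects registered in a hypothetical `shared` dict (keyed by `id()`) are written as a small token and looked up again on load. This only works in the process that owns the registry, e.g. the main_loop process.

```python
import io
import pickle


class IdPickler(pickle.Pickler):
    """Pickles registered objects as an ID token instead of by value."""

    def __init__(self, file, shared):
        super().__init__(file)
        self.shared = shared  # id(obj) -> obj, objects owned by this process

    def persistent_id(self, obj):
        # A non-None return value makes pickle store only this token.
        if id(obj) in self.shared:
            return ('shared', id(obj))
        return None


class IdUnpickler(pickle.Unpickler):
    """Resolves ID tokens back to the registered objects."""

    def __init__(self, file, shared):
        super().__init__(file)
        self.shared = shared

    def persistent_load(self, pid):
        kind, obj_id = pid
        if kind != 'shared':
            raise pickle.UnpicklingError('unknown persistent id %r' % (pid,))
        return self.shared[obj_id]


def dumps(obj, shared):
    buf = io.BytesIO()
    IdPickler(buf, shared).dump(obj)
    return buf.getvalue()


def loads(data, shared):
    return IdUnpickler(io.BytesIO(data), shared).load()
```

Because the registered object is never serialized, it does not need to be picklable itself (a DB connection, for instance); the unpickled value is the very same object.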

1876 04/14/2012 09:31 PM Aaron Marcuse-Kubitza

Added dicts.py with id_dict() and MergeDict

1875 04/14/2012 09:30 PM Aaron Marcuse-Kubitza

Added collection.py with rmap()

1874 04/14/2012 07:38 PM Aaron Marcuse-Kubitza

db_xml.py: put(): Moved pool.apply_async() from put_child() to put_(), and don't use lambdas because they can't be pickled

1873 04/14/2012 07:35 PM Aaron Marcuse-Kubitza

parallel.py: MultiProducerPool.apply_async(): Prepickle all function args. Try pickling the args before the queue pickles them, to get better debugging output.
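Prepickling can be sketched as a helper that pickles each argument separately, so the error names the argument that failed instead of surfacing later from inside the queue. `prepickle_args` is a hypothetical name. The usage below also shows why lambdas had to go: the standard `pickle` module cannot serialize them.

```python
import pickle


def prepickle_args(args, kwargs):
    """Pickle every argument up front so a failure names the culprit."""
    for key, value in list(enumerate(args)) + sorted(kwargs.items()):
        try:
            pickle.dumps(value)
        except Exception as e:  # PicklingError, TypeError, AttributeError, ...
            raise pickle.PicklingError(
                'argument %r (%r) is not picklable: %s' % (key, value, e))
```

For example, `prepickle_args((1, len), {})` passes, while `prepickle_args((), {'f': lambda x: x})` raises a `PicklingError` that mentions `'f'`.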

1872 04/14/2012 07:33 PM Aaron Marcuse-Kubitza

sql.py: with_savepoint(): Use new rand.rand_int()

1871 04/14/2012 07:33 PM Aaron Marcuse-Kubitza

rand.py: rand_int(): Fixed bug where newly created objects did not have unique IDs because they were on the stack, so we have to use random.randint() anyway.
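The `id()` of an object is only guaranteed unique among objects that exist at the same time, so two temporaries created one after another can share an ID. Below is a minimal sketch of a `rand_int()` along those lines and of a savepoint helper that uses it for unique savepoint names. It uses `sqlite3` as a stand-in for the project's database, and the `savepoint_` prefix is hypothetical.

```python
import random
import sqlite3
import sys
from contextlib import contextmanager


def rand_int():
    # Not id(object()): a freed temporary's ID can be handed out again.
    return random.randint(0, sys.maxsize)


@contextmanager
def with_savepoint(conn):
    """Run the block inside a uniquely named savepoint; roll back on error."""
    name = 'savepoint_%d' % rand_int()
    conn.execute('SAVEPOINT ' + name)
    try:
        yield
    except Exception:
        conn.execute('ROLLBACK TO SAVEPOINT ' + name)
        conn.execute('RELEASE SAVEPOINT ' + name)
        raise
    else:
        conn.execute('RELEASE SAVEPOINT ' + name)
```

The connection is opened with `isolation_level=None` in the usage so that `sqlite3` does not issue its own implicit transaction statements around the savepoints.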

1870 04/14/2012 07:27 PM Aaron Marcuse-Kubitza

Added rand.py

1869 04/14/2012 06:56 PM Aaron Marcuse-Kubitza

sql.py: DbConn: Made it picklable by establishing a connection on demand
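A connection wrapper can be made picklable by storing only what is needed to reconnect and opening the connection on first use. This sketch assumes `sqlite3` and a file path in place of the real connection parameters.

```python
import pickle
import sqlite3


class DbConn:
    """Holds connection parameters and connects lazily, so it pickles cleanly."""

    def __init__(self, path):
        self.path = path
        self._conn = None  # created on first use

    def conn(self):
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
        return self._conn

    def __getstate__(self):
        # Drop the live connection; the unpickled copy reconnects on demand.
        state = self.__dict__.copy()
        state['_conn'] = None
        return state
```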

1868 04/14/2012 06:54 PM Aaron Marcuse-Kubitza

bin/map: Also consume asynchronous tasks before closing the DB connection (this is where most if not all tasks will be consumed)

1867 04/14/2012 06:44 PM Aaron Marcuse-Kubitza

Runnable.py: Made it picklable

1866 04/14/2012 06:44 PM Aaron Marcuse-Kubitza

Added eval_.py

1865 04/14/2012 05:35 PM Aaron Marcuse-Kubitza

Added Runnable

1864 04/14/2012 03:05 PM Aaron Marcuse-Kubitza

db_xml.py: put(): Added parallel processing support for inserting children with fkeys to parent asynchronously

1863 04/14/2012 03:03 PM Aaron Marcuse-Kubitza

parallel.py: Fixed bugs: Added self param to instance methods and inner classes where needed

1862 04/14/2012 02:32 PM Aaron Marcuse-Kubitza

parallel.py: Changed to use multi-producer pool, which requires calling pool.main_loop()

1861 04/14/2012 01:04 PM Aaron Marcuse-Kubitza

parallel.py: Pool: Added doc comment

1860 04/14/2012 01:03 PM Aaron Marcuse-Kubitza

parallel.py: Pool: apply_async(): Return a result object like multiprocessing.Pool.apply_async()

1859 04/14/2012 12:53 PM Aaron Marcuse-Kubitza

bin/map: Use new parallel.py for parallel processing

1858 04/14/2012 12:51 PM Aaron Marcuse-Kubitza

Added parallel.py for parallel processing

1857 04/14/2012 12:37 PM Aaron Marcuse-Kubitza

bin/map: Use dummy synchronous Pool implementation if not using parallel processing
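A dummy synchronous pool only needs the parts of the `multiprocessing.Pool` interface that callers touch. This sketch (class names hypothetical) runs each task immediately and returns an object with the `get()`, `ready()`, `successful()` and `wait()` methods of `multiprocessing.pool.AsyncResult`.

```python
class SyncResult:
    """Mimics the parts of multiprocessing.pool.AsyncResult callers use."""

    def __init__(self, func, args, kwargs):
        try:
            self._value = func(*args, **kwargs)
            self._ok = True
        except Exception as e:
            self._value = e
            self._ok = False

    def ready(self):
        return True  # the work has already run

    def successful(self):
        return self._ok

    def wait(self, timeout=None):
        pass

    def get(self, timeout=None):
        if not self._ok:
            raise self._value
        return self._value


class SyncPool:
    """Dummy pool that runs each task at once in the calling process."""

    def apply_async(self, func, args=(), kwds=None, callback=None):
        result = SyncResult(func, args, kwds or {})
        if callback is not None and result.successful():
            callback(result.get())
        return result

    def close(self):
        pass

    def join(self):
        pass
```

Keeping the same method names lets bin/map call `apply_async()` the same way whether or not parallel processing is on.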

1856 04/14/2012 12:18 PM Aaron Marcuse-Kubitza

bin/map: Use multiprocessing instead of pp for parallel processing because it's easier to use (it uses the Python threading API and doesn't require providing all the functions a task calls). Allow the user to set the cpus option to use all system CPUs (needed because in test mode, the default is 0 CPUs, which turns off parallel processing).
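The cpus handling can be sketched as follows. The env var name `cpus` and the rule that an empty value means all CPUs are assumptions. `multiprocessing.Pool(None)` does start one worker per CPU reported by `os.cpu_count()`, and importing `multiprocessing` inside `make_pool()` keeps the serial path from paying for the import, in the spirit of the pp change in revision 1851.

```python
import os


def get_cpus(env=os.environ, test_mode=False):
    """Return 0 (no parallelism), None (all CPUs), or a positive CPU count."""
    value = env.get('cpus')  # hypothetical env var name
    if value is None:
        return 0 if test_mode else None  # test mode defaults to serial
    if value == '':
        return None  # assumed: empty value asks for every system CPU
    return int(value)


def make_pool(cpus):
    if cpus == 0:
        return None  # caller falls back to a synchronous dummy pool
    import multiprocessing  # imported only when parallelism is requested
    # processes=None means os.cpu_count() workers.
    return multiprocessing.Pool(cpus)
```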

1855 04/13/2012 04:41 PM Aaron Marcuse-Kubitza

disown_all, stop_imports: Use /bin/bash instead of /bin/sh because array subscripting is used

1854 04/13/2012 04:38 PM Aaron Marcuse-Kubitza

input.Makefile: Editing import: Use $(datasrc) instead of $(db) since $(db) is only set for DB-source inputs

1853 04/13/2012 04:31 PM Aaron Marcuse-Kubitza

input.Makefile: Import: If profile is on and test mode is on, output formatted profile stats to stdout

1852 04/13/2012 03:00 PM Aaron Marcuse-Kubitza

sql.py: index_cols(): Cache return values in db.index_cols

1851 04/13/2012 02:56 PM Aaron Marcuse-Kubitza

bin/map: Don't import pp unless cpus != 0 because it's slow and doesn't need to happen if we're not using parallelization. cpus option defaults to 0 in test mode so tests run faster.

1850 04/13/2012 02:52 PM Aaron Marcuse-Kubitza

sql.py: pkey(): Use pkeys cache from db object instead of parameter

1849 04/13/2012 02:44 PM Aaron Marcuse-Kubitza

sql.py: Wrapped db connection inside an object that can also store the cache of the pkeys and index_cols
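Caching catalog lookups on the connection wrapper means each table's primary key is queried only once per connection. In this sketch `sqlite3`'s `PRAGMA table_info` stands in for whatever catalog query sql.py actually runs, and `pkey()` returns `None` for a table without a primary key.

```python
import sqlite3


class DbConn:
    """Wraps a DB connection together with per-connection metadata caches."""

    def __init__(self, conn):
        self.conn = conn
        self.pkeys = {}       # table name -> primary-key column
        self.index_cols = {}  # second cache named in the commit; unused here


def pkey(db, table):
    """Return the table's primary-key column, querying the catalog once."""
    if table not in db.pkeys:
        # Table name is interpolated directly: assumed to be trusted input.
        rows = db.conn.execute('PRAGMA table_info(%s)' % table).fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        db.pkeys[table] = next((row[1] for row in rows if row[5]), None)
    return db.pkeys[table]
```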

1848 04/13/2012 02:27 PM Aaron Marcuse-Kubitza

bin/map: If cpus is 0, run without Parallel Python

1847 04/13/2012 02:19 PM Aaron Marcuse-Kubitza

bin/map: Set up Parallel Python with an env-var-customizable # CPUs

1846 04/13/2012 02:18 PM Aaron Marcuse-Kubitza

bin/map: Set up Parallel Python with an env-var-customizable # CPUs

1845 04/13/2012 12:58 PM Aaron Marcuse-Kubitza

root Makefile: python-Linux: Added `sudo pip install pp`

1844 04/13/2012 12:47 PM Aaron Marcuse-Kubitza

root Makefile: python-Linux: Added python-parallel to installs

1843 04/13/2012 12:19 PM Aaron Marcuse-Kubitza

mappings: Build VegX-VegBIEN.organisms.csv from VegX-VegBIEN.stems.csv instead of vice versa. This entails switching the roots around so stem points to organism instead of the other way around, which is a complex operation. Re-rooted VegX-VegBIEN.organisms.csv at /plantobservation instead of /taxonoccurrence to avoid traveling up the hierarchy to taxonoccurrence and back down again to plantobservation, etc. as would otherwise have been the case.

1842 04/13/2012 11:43 AM Aaron Marcuse-Kubitza

bin/map: When determining if outer elements are types, look for /*s/ anywhere in the string instead of just at the beginning, because there might be root attrs (namespaces), etc. before it
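The change amounts to searching anywhere in the string instead of anchoring the pattern at its start. The regex below is a hypothetical reading of `/*s/` as "a path step whose element name ends in s".

```python
import re

# Hypothetical pattern for "/*s/": a path step whose name ends in "s".
plural_step = re.compile(r'/[\w:]+s/')


def has_type_step_at_start(path):
    return plural_step.match(path) is not None   # old: only at position 0


def has_type_step(path):
    return plural_step.search(path) is not None  # new: anywhere in the path
```

With a root carrying a namespace attribute, such as `/VegX[@xmlns:tcs="urn:tcs"]/plots/plot`, only the search-based check finds `/plots/`.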

1841 04/13/2012 10:45 AM Aaron Marcuse-Kubitza

bin/map: When determining if outer elements are types, look for /*s/ anywhere in the string instead of just at the beginning, because there might be root attrs (namespaces), etc. before it

1840 04/13/2012 10:44 AM Aaron Marcuse-Kubitza

xpath.py: get(): forward (parent-to-child) pointers: If last target object exists but doesn't have an ID attr (which indicates a bug), recover gracefully by just assuming the ID is 0. (Any bug will be noticeable in the output, which needs to be generated through workarounds like this in order to be able to debug.)

1839 04/10/2012 05:18 PM Aaron Marcuse-Kubitza

VegX mappings: Updated stemParent mapping for VegX 1.5.3

1838 04/10/2012 04:54 PM Aaron Marcuse-Kubitza

VegX mappings: Changed the taxonDetermination with role identifier to explicitly have no role instead, because data providers' VegX files generally do not provide role information and we don't want the default taxonDetermination XPaths to require it

1837 04/10/2012 04:34 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps/VegX.organisms.csv: Connected plot to plotObservation by using new support for backward (child-to-parent) pointers whose target is a text element containing an ID

1836 04/10/2012 04:33 PM Aaron Marcuse-Kubitza

xml_dom.py: get_id(): If the node doesn't have an ID, assumes the node itself is the ID. This enables backward (child-to-parent) pointers whose target is a text element containing an ID, rather than a regular element with an ID attribute.
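A minimal `xml.dom.minidom` sketch of that fallback, assuming the ID attribute is named `id`:

```python
from xml.dom import minidom


def get_id(node, id_attr='id'):
    """Return node's ID attribute, or its own text if it has none."""
    if node.hasAttribute(id_attr):
        return node.getAttribute(id_attr)
    # No ID attribute: treat the element's text as the ID, so a pointer
    # can target an element such as <plotID>12</plotID>.
    return ''.join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE).strip()
```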

1835 04/10/2012 04:04 PM Aaron Marcuse-Kubitza

VegX mappings: Map locationevent.sourceaccessioncode to plotUniqueIdentifier since this field is no longer being used by authorlocationcode

1834 04/10/2012 03:48 PM Aaron Marcuse-Kubitza

VegX mappings: Map the authorlocationcode to plotName instead of plotUniqueIdentifier because it's a better fit

1833 04/10/2012 03:13 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps/VegX.organisms.csv: Fixed bug in Species taxonConcept mapping where the role was computer instead of identifier

1832 04/10/2012 03:11 PM Aaron Marcuse-Kubitza

xml_dom.py: value(): Skip comment nodes. This fixes a bug where comments inside text elements would prevent the value from being retrieved.
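A sketch of a comment-tolerant `value()` with `xml.dom.minidom`, assuming the function returns the text of the node's first meaningful child:

```python
from xml.dom import minidom


def value(node):
    """Return the text of node's first non-comment child, or None."""
    for child in node.childNodes:
        if child.nodeType == child.COMMENT_NODE:
            continue  # a comment before the text must not hide the value
        if child.nodeType == child.TEXT_NODE:
            return child.data
        return None  # first real child is not text
    return None
```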

1831 04/10/2012 03:02 PM Aaron Marcuse-Kubitza

inputs/CTFS/test: Accepted test outputs for new VegX_CTFS_row_120000_bci.0.test.organisms.xml instead of VegX_CTFS_row_180000.0.test.organisms.xml, which didn't have <taxonNameUsageConcepts> that match up with <individualOrganisms>

1830 04/10/2012 02:16 PM Aaron Marcuse-Kubitza

inputs/CTFS/test: Accepted test outputs for new VegX_CTFS_row_120000_bci.0.test.organisms.xml instead of VegX_CTFS_row_180000.0.test.organisms.xml, which didn't have <taxonNameUsageConcepts> that match up with <individualOrganisms>

1829 04/10/2012 01:59 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps/VegX.organisms.csv: Added taxonConcept mappings

1828 04/10/2012 01:59 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.organisms.csv: Added species taxonConcept mapping for identifier role

1827 04/10/2012 01:33 PM Aaron Marcuse-Kubitza

Added expand_xpath to expand XPath abbreviations

1826 04/10/2012 12:43 PM Aaron Marcuse-Kubitza

VegX mappings: Renamed taxonNameUsageConceptsID to taxonNameUsageConceptID (no plural) to match VegX 1.5.3

1825 04/10/2012 12:33 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps/VegX.organisms.csv: Corrected CensusNumber input mapping

1824 04/10/2012 12:24 PM Aaron Marcuse-Kubitza

mappings/Makefile: Generate self maps for all core maps

1823 04/10/2012 12:19 PM Aaron Marcuse-Kubitza

mappings/Makefile: VegX-VegBIEN.stems.csv: Removed $(rootAttrs) from out root because stems don't use tcs namespace elements (stems don't have taxonDeterminations separate from the main organism)

1822 04/10/2012 12:13 PM Aaron Marcuse-Kubitza

VegX mappings: taxonConcept mappings: Added "tcs:" namespace prefix to appropriate elements. This will make the taxonConcept XPaths compatible with CTFS VegX.

1821 04/09/2012 06:52 PM Aaron Marcuse-Kubitza

input.Makefile: Vars/functions: Make: $(subMake): When forwarding to another dir based off of $(root), forward to $(root) rather than directly to the dir of the target. This ensures that any special targets that are only defined in the root Makefile still get run, even when the target is in a subdir with its own Makefile.

1820 04/09/2012 06:41 PM Aaron Marcuse-Kubitza

inputs/CTFS/test: Accepted initial test outputs. A lot of leaves are still unmapped with the default mappings.

1819 04/09/2012 06:40 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps: Added initial maps

1818 04/09/2012 06:39 PM Aaron Marcuse-Kubitza

VegX mappings: taxonConcept mappings: Added "tcs:" namespace prefix to appropriate elements. This will make the taxonConcept XPaths compatible with CTFS VegX.

1817 04/09/2012 06:13 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: full via maps (maps/$(via).%.full.csv): $(makeFullCsv): Sort all maps so that rows are re-ordered whether or not a core self map exists. This way, if a core self map is created, it will not cause the sort order of the generated via-format XMLs to change. This makes it easier to accept any changes to test outputs that result from adding a core self map.

1816 04/09/2012 05:53 PM Aaron Marcuse-Kubitza

mappings/Makefile: VegX: Added VegX.self.organisms.csv. Added root attrs to chRoot maps, commented out since it's not ready to be checked in yet.

1815 04/09/2012 05:34 PM Aaron Marcuse-Kubitza

xpath.py: get(): Run xml_dom.by_tag_name() with ignore_namespace=False (possibly later set to True)

1814 04/09/2012 05:32 PM Aaron Marcuse-Kubitza

xml_dom.py: Comments: Added clean_comment() and mk_comment(). Searching child nodes: by_tag_name(): Added ignore_namespace option to ignore namespace of node name.
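With `xml.dom.minidom`, the ignore_namespace option can be sketched as comparing tag names with any `prefix:` removed. This is an illustration, not the project's implementation.

```python
from xml.dom import minidom


def by_tag_name(node, name, ignore_namespace=False):
    """Return the child elements of node whose tag name matches name."""
    def norm(tag):
        # Drop a "prefix:" part only when namespaces are ignored.
        return tag.split(':', 1)[-1] if ignore_namespace else tag

    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE
            and norm(child.tagName) == norm(name)]
```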

1813 04/09/2012 05:26 PM Aaron Marcuse-Kubitza

root Makefile: Added %-remake target

1812 04/09/2012 04:53 PM Aaron Marcuse-Kubitza

mappings/Makefile: Renamed joinMaps to dwcMaps and chrootMaps to vegxMaps. Added commented-out code to create VegX.self.organisms.csv (not ready to check in yet because it affects many dependent maps).

1811 04/09/2012 02:52 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer needed $(noEmptyMap)

1810 04/09/2012 12:40 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Use new xml_dom.mk_comment()

1809 04/09/2012 12:40 PM Aaron Marcuse-Kubitza

xml_dom.py: Added clean_comment() and mk_comment() to properly sanitize comment contents (comments can't contain '--')
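XML forbids `--` inside a comment and a comment whose text ends in `-`. One way to sanitize the text, assumed here rather than taken from xml_dom.py, is to break every `--` with a space:

```python
from xml.dom import minidom


def clean_comment(text):
    """Make text legal inside <!-- -->: no "--" and no trailing "-"."""
    while '--' in text:  # loop, since "---" leaves "- --" after one pass
        text = text.replace('--', '- -')
    if text.endswith('-'):
        text += ' '
    return text


def mk_comment(doc, text):
    return doc.createComment(clean_comment(text))
```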

1808 04/09/2012 12:14 PM Aaron Marcuse-Kubitza

Added inputs/TRTE

1807 04/03/2012 08:26 PM Aaron Marcuse-Kubitza

inputs/QMOR/test: Added initial accepted test outputs

1806 04/03/2012 08:26 PM Aaron Marcuse-Kubitza

inputs/QMOR/maps: Added maps

1805 04/03/2012 08:20 PM Aaron Marcuse-Kubitza

Added inputs/QMOR

1804 04/03/2012 08:14 PM Aaron Marcuse-Kubitza

inputs/MT/test: Added initial accepted test outputs

1803 04/03/2012 08:14 PM Aaron Marcuse-Kubitza

inputs/MT/maps: Added maps

1802 04/03/2012 08:13 PM Aaron Marcuse-Kubitza

mappings/Makefile: DwC-VegBIEN.specimens.csv: Don't call remove_empty to produce it, because join now deals with empty mappings correctly by still raising a warning. Removed no longer needed intermediate DwC.ci-VegBIEN.specimens.csv.

1801 04/03/2012 08:09 PM Aaron Marcuse-Kubitza

join: Also print "No join mapping" warning if a join mapping was found but it was empty. The warning in that case is actually "No non-empty join mapping" to distinguish it from a mapping that's missing entirely. input.Makefile: missing_mappings: Support new "No join mapping" error message.

1800 04/03/2012 08:08 PM Aaron Marcuse-Kubitza

join: Also print "No join mapping" warning if a join mapping was found but it was empty. The warning in that case is actually "No non-empty join mapping" to distinguish it from a mapping that's missing entirely. input.Makefile: missing_mappings: Support new "No join mapping" error message.

1799 04/03/2012 07:33 PM Aaron Marcuse-Kubitza

Added inputs/MT

1798 04/03/2012 07:26 PM Aaron Marcuse-Kubitza

Added disown_all to disown all running jobs

1797 04/03/2012 07:26 PM Aaron Marcuse-Kubitza

stop_imports: Call jobspecs relative to $selfDir, rather than assuming it will be run from the svn root dir

1796 04/03/2012 07:18 PM Aaron Marcuse-Kubitza

union: Call maps.merge_headers() using **dict(prefer=header_num) instead of just prefer=header_num in order to work on Python 2.5.2 (which nimoy is running)

1795 04/03/2012 07:00 PM Aaron Marcuse-Kubitza

inputs/ACAD/test: Accepted initial test outputs

1794 04/03/2012 07:00 PM Aaron Marcuse-Kubitza

Added inputs/ACAD/maps/ maps

1793 04/03/2012 06:59 PM Aaron Marcuse-Kubitza

Accepted new test outputs resulting from the addition of the id -> occurrenceID mapping in mappings/DwC1-DwC2.specimens.csv

1792 04/03/2012 06:57 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/maps: Cleaned up maps for the first time since all via maps became subject to cleanup

1791 04/03/2012 06:55 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer needed default "maps/.$(via).%.csv.last_cleanup" rule

1790 04/03/2012 06:54 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Via maps cleanup: Added `env ignore=1` since with the switch to subtracting $(coreMap), all inputs will attempt to subtract some map, even if it's not subtractable

1789 04/03/2012 06:47 PM Aaron Marcuse-Kubitza

input.Makefile: Don't clean src maps, only build them

1788 04/03/2012 06:45 PM Aaron Marcuse-Kubitza

inputs/ARIZ/maps/DwC.specimens.csv: Re-cleaned up to take advantage of additional entries now removed by subtract

1787 04/03/2012 06:36 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Via maps cleanup: Subtract $(coreMap) instead of $(coreSelfMap) so that entries whose input and output map to the same place are subtracted as well

1786 04/03/2012 06:35 PM Aaron Marcuse-Kubitza

subtract: Also remove mappings whose input and output map to the same non-empty value in map_1

1785 04/03/2012 06:32 PM Aaron Marcuse-Kubitza

util.py: Added all_equal(), all_equal_ignore_none(), have_same_value()
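Plausible definitions for these helpers, inferred from their names only; the exact semantics in util.py may differ.

```python
def all_equal(values):
    """True if every item equals the first one (True for an empty input)."""
    values = list(values)
    return all(v == values[0] for v in values[1:])


def all_equal_ignore_none(values):
    """Like all_equal(), but None entries are skipped."""
    return all_equal(v for v in values if v is not None)


def have_same_value(dict_, *keys):
    """True if all the given keys map to equal values in dict_."""
    return all_equal(dict_.get(key) for key in keys)
```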

1784 04/03/2012 05:45 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Added id -> occurrenceID mapping

1783 04/03/2012 05:43 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/maps/VegX.%.full.csv: Regenerated using new src maps

1782 04/03/2012 05:41 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Added mappings from dcterms elements without namespace to with namespace

1781 04/03/2012 05:40 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV: Built maps/src.%.csv

1780 04/03/2012 05:24 PM Aaron Marcuse-Kubitza

Added inputs/ACAD/maps/src.specimens.csv

1779 04/03/2012 05:23 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Autogen src maps with known table names. Sources: $(withCatSrcs): Fixed bug where substitution pattern did not contain %.

1778 04/03/2012 05:22 PM Aaron Marcuse-Kubitza

Added src_map to make a source map spreadsheet from a CSV header
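A sketch of the idea: read the first row of a CSV and emit one map row per column. The `src`/`out` column names and the empty output column are assumptions about the map format.

```python
import csv
import io


def src_map(header_csv, out):
    """Write a two-column map with one row per input column name."""
    header = next(csv.reader(header_csv))
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['src', 'out'])  # hypothetical map column names
    for name in header:
        writer.writerow([name, ''])  # output column left to be filled in
```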