# Date Author Comment
1856 04/14/2012 12:18 PM Aaron Marcuse-Kubitza

bin/map: Use multiprocessing instead of pp for parallel processing because it's easier to use (it uses the Python threading API and doesn't require providing all the functions a task calls). Allow the user to set the cpus option to use all system CPUs (needed because in test mode, the default is 0 CPUs to turn off parallel processing).
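A minimal sketch of the pattern this revision describes, not the actual bin/map code: a cpus setting of 0 runs serially, and a special value selects every CPU on the system. The names resolve_cpus, process_row and map_rows are hypothetical, and using None to mean "all CPUs" is an assumption, since the log does not say how that choice is spelled.

```python
import multiprocessing

def resolve_cpus(cpus):
    """Return the worker count. None (an assumed convention) means all CPUs."""
    if cpus is None:
        return multiprocessing.cpu_count()
    return int(cpus)

def process_row(row):
    # Hypothetical per-row task. It is defined at module level so that
    # multiprocessing can pickle a reference to it for the worker processes.
    return row * 2

def map_rows(rows, cpus=0):
    """Process rows serially when cpus is 0, otherwise in a process pool."""
    cpus = resolve_cpus(cpus)
    if cpus == 0:
        return [process_row(row) for row in rows]
    pool = multiprocessing.Pool(processes=cpus)
    try:
        return pool.map(process_row, rows)
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    print(map_rows([1, 2, 3], cpus=None))
```

Unlike pp, the pool only needs the task function itself to be importable, which matches the ease-of-use point made in the comment above.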

1855 04/13/2012 04:41 PM Aaron Marcuse-Kubitza

disown_all, stop_imports: Use /bin/bash instead of /bin/sh because array subscripting is used

1854 04/13/2012 04:38 PM Aaron Marcuse-Kubitza

input.Makefile: Editing import: Use $(datasrc) instead of $(db) since $(db) is only set for DB-source inputs

1853 04/13/2012 04:31 PM Aaron Marcuse-Kubitza

input.Makefile: Import: If profile is on and test mode is on, output formatted profile stats to stdout

1852 04/13/2012 03:00 PM Aaron Marcuse-Kubitza

sql.py: index_cols(): Cache return values in db.index_cols

1851 04/13/2012 02:56 PM Aaron Marcuse-Kubitza

bin/map: Don't import pp unless cpus != 0 because it's slow and doesn't need to happen if we're not using parallelization. cpus option defaults to 0 in test mode so tests run faster.

1850 04/13/2012 02:52 PM Aaron Marcuse-Kubitza

sql.py: pkey(): Use pkeys cache from db object instead of parameter

1849 04/13/2012 02:44 PM Aaron Marcuse-Kubitza

sql.py: Wrapped db connection inside an object that can also store the cache of the pkeys and index_cols
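The wrapper described here might look roughly like the following sketch. It is an illustration only: the class name DbConn is hypothetical, and sqlite3 with its PRAGMA table_info lookup stands in for whatever database and catalog query sql.py really uses.

```python
import sqlite3

class DbConn(object):
    """Wraps a DB-API connection together with per-connection caches."""
    def __init__(self, conn):
        self.conn = conn
        self.pkeys = {}       # table name -> primary key column
        self.index_cols = {}  # table name -> index column(s)

def pkey(db, table):
    """Return the table's primary key column, querying only on a cache miss."""
    if table not in db.pkeys:
        # SQLite-specific lookup, used here only to keep the sketch runnable.
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        rows = db.conn.execute('PRAGMA table_info(%s)' % table).fetchall()
        db.pkeys[table] = [row[1] for row in rows if row[5]][0]
    return db.pkeys[table]

db = DbConn(sqlite3.connect(':memory:'))
db.conn.execute('CREATE TABLE plot (plot_id INTEGER PRIMARY KEY, name TEXT)')
print(pkey(db, 'plot'))
```

Keeping the caches on the connection object means callers such as pkey() no longer have to thread a cache parameter through every call.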

1848 04/13/2012 02:27 PM Aaron Marcuse-Kubitza

bin/map: If cpus is 0, run without Parallel Python

1847 04/13/2012 02:19 PM Aaron Marcuse-Kubitza

bin/map: Set up Parallel Python with an env-var-customizable # CPUs

1846 04/13/2012 02:18 PM Aaron Marcuse-Kubitza

bin/map: Set up Parallel Python with an env-var-customizable # CPUs

1845 04/13/2012 12:58 PM Aaron Marcuse-Kubitza

root Makefile: python-Linux: Added `sudo pip install pp`

1844 04/13/2012 12:47 PM Aaron Marcuse-Kubitza

root Makefile: python-Linux: Added python-parallel to installs

1843 04/13/2012 12:19 PM Aaron Marcuse-Kubitza

mappings: Build VegX-VegBIEN.organisms.csv from VegX-VegBIEN.stems.csv instead of vice versa. This entails switching the roots around so stem points to organism instead of the other way around, which is a complex operation. Re-rooted VegX-VegBIEN.organisms.csv at /plantobservation instead of /taxonoccurrence to avoid traveling up the hierarchy to taxonoccurrence and back down again to plantobservation, etc. as would otherwise have been the case.

1842 04/13/2012 11:43 AM Aaron Marcuse-Kubitza

bin/map: When determining if outer elements are types, look for /*s/ anywhere in the string instead of just at the beginning, because there might be root attrs (namespaces), etc. before it

1841 04/13/2012 10:45 AM Aaron Marcuse-Kubitza

bin/map: When determining if outer elements are types, look for /*s/ anywhere in the string instead of just at the beginning, because there might be root attrs (namespaces), etc. before it

1840 04/13/2012 10:44 AM Aaron Marcuse-Kubitza

xpath.py: get(): forward (parent-to-child) pointers: If last target object exists but doesn't have an ID attr (which indicates a bug), recover gracefully by just assuming the ID is 0. (Any bug will be noticeable in the output, which needs to be generated through workarounds like this in order to be able to debug.)

1839 04/10/2012 05:18 PM Aaron Marcuse-Kubitza

VegX mappings: Updated stemParent mapping for VegX 1.5.3

1838 04/10/2012 04:54 PM Aaron Marcuse-Kubitza

VegX mappings: Changed taxonDetermination of role identifier to instead have explicitly no role, because data providers' VegX files generally do not provide role information and we don't want the default taxonDetermination XPaths to require this

1837 04/10/2012 04:34 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps/VegX.organisms.csv: Connected plot to plotObservation by using new support for backward (child-to-parent) pointers whose target is a text element containing an ID

1836 04/10/2012 04:33 PM Aaron Marcuse-Kubitza

xml_dom.py: get_id(): If the node doesn't have an ID, assume the node itself is the ID. This enables backward (child-to-parent) pointers whose target is a text element containing an ID, rather than a regular element with an ID attribute.
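As an illustration of this fallback, here is a sketch built on xml.dom.minidom. It is not the real xml_dom.get_id(): the attribute name "id" and the helper text_of() are assumptions made for the example.

```python
from xml.dom import minidom, Node

def text_of(node):
    """Hypothetical helper: concatenate the node's direct text children."""
    return ''.join(child.data for child in node.childNodes
                   if child.nodeType == Node.TEXT_NODE)

def get_id(node):
    """Return the ID attribute if present, else treat the node's text as the ID."""
    if node.hasAttribute('id'):  # attribute name is an assumption
        return node.getAttribute('id')
    return text_of(node)

with_attr = minidom.parseString('<plot id="7"/>').documentElement
text_only = minidom.parseString('<plotID>42</plotID>').documentElement
print(get_id(with_attr), get_id(text_only))
```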

1835 04/10/2012 04:04 PM Aaron Marcuse-Kubitza

VegX mappings: Map locationevent.sourceaccessioncode to plotUniqueIdentifier since this field is no longer being used by authorlocationcode

1834 04/10/2012 03:48 PM Aaron Marcuse-Kubitza

VegX mappings: Map the authorlocationcode to plotName instead of plotUniqueIdentifier because it's a better fit

1833 04/10/2012 03:13 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps/VegX.organisms.csv: Fixed bug in Species taxonConcept mapping where the role was computer instead of identifier

1832 04/10/2012 03:11 PM Aaron Marcuse-Kubitza

xml_dom.py: value(): Skip comment nodes. This fixes a bug where comments inside text elements would prevent the value from being retrieved.
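The bug and its fix can be sketched with xml.dom.minidom as follows. This is an assumed reconstruction, not the project's value(): a comment splits the element's text into several child nodes, so a reader that only accepts a single text child would fail to retrieve the value.

```python
from xml.dom import minidom, Node

def value(node):
    """Return the element's text, ignoring comment nodes among its children."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == Node.COMMENT_NODE:
            continue  # skip comments instead of letting them block the value
        if child.nodeType == Node.TEXT_NODE:
            parts.append(child.data)
    return ''.join(parts)

elem = minidom.parseString('<height>12<!-- metres -->3</height>').documentElement
print(value(elem))
```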

1831 04/10/2012 03:02 PM Aaron Marcuse-Kubitza

inputs/CTFS/test: Accepted test outputs for new VegX_CTFS_row_120000_bci.0.test.organisms.xml instead of VegX_CTFS_row_180000.0.test.organisms.xml, which didn't have <taxonNameUsageConcepts> that match up with <individualOrganisms>

1830 04/10/2012 02:16 PM Aaron Marcuse-Kubitza

inputs/CTFS/test: Accepted test outputs for new VegX_CTFS_row_120000_bci.0.test.organisms.xml instead of VegX_CTFS_row_180000.0.test.organisms.xml, which didn't have <taxonNameUsageConcepts> that match up with <individualOrganisms>

1829 04/10/2012 01:59 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps/VegX.organisms.csv: Added taxonConcept mappings

1828 04/10/2012 01:59 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.organisms.csv: Added species taxonConcept mapping for identifier role

1827 04/10/2012 01:33 PM Aaron Marcuse-Kubitza

Added expand_xpath to expand XPath abbreviations

1826 04/10/2012 12:43 PM Aaron Marcuse-Kubitza

VegX mappings: Renamed taxonNameUsageConceptsID to taxonNameUsageConceptID (no plural) to match VegX 1.5.3

1825 04/10/2012 12:33 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps/VegX.organisms.csv: Corrected CensusNumber input mapping

1824 04/10/2012 12:24 PM Aaron Marcuse-Kubitza

mappings/Makefile: Generate self maps for all core maps

1823 04/10/2012 12:19 PM Aaron Marcuse-Kubitza

mappings/Makefile: VegX-VegBIEN.stems.csv: Removed $(rootAttrs) from out root because stems don't use tcs namespace elements (stems don't have taxonDeterminations separate from the main organism)

1822 04/10/2012 12:13 PM Aaron Marcuse-Kubitza

VegX mappings: taxonConcept mappings: Added "tcs:" namespace prefix to appropriate elements. This will make the taxonConcept XPaths compatible with CTFS VegX.

1821 04/09/2012 06:52 PM Aaron Marcuse-Kubitza

input.Makefile: Vars/functions: Make: $(subMake): When forwarding to another dir based off of $(root), forward to $(root) rather than directly to the dir of the target. This ensures that any special targets that are only defined in the root Makefile still get run, even when the target is in a subdir with its own Makefile.

1820 04/09/2012 06:41 PM Aaron Marcuse-Kubitza

inputs/CTFS/test: Accepted initial test outputs. A lot of leaves are still unmapped with the default mappings.

1819 04/09/2012 06:40 PM Aaron Marcuse-Kubitza

inputs/CTFS/maps: Added initial maps

1818 04/09/2012 06:39 PM Aaron Marcuse-Kubitza

VegX mappings: taxonConcept mappings: Added "tcs:" namespace prefix to appropriate elements. This will make the taxonConcept XPaths compatible with CTFS VegX.

1817 04/09/2012 06:13 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: full via maps (maps/$(via).%.full.csv): $(makeFullCsv): Sort all maps so that rows are re-ordered whether or not a core self map exists. This way, if a core self map is created, it will not cause the sort order of the generated via-format XMLs to change. This makes it easier to accept any changes to test outputs that result from adding a core self map.

1816 04/09/2012 05:53 PM Aaron Marcuse-Kubitza

mappings/Makefile: VegX: Added VegX.self.organisms.csv. Added root attrs to chRoot maps, commented out since it's not ready to be checked in yet.

1815 04/09/2012 05:34 PM Aaron Marcuse-Kubitza

xpath.py: get(): Run xml_dom.by_tag_name() with ignore_namespace=False (possibly later set to True)

1814 04/09/2012 05:32 PM Aaron Marcuse-Kubitza

xml_dom.py: Comments: Added clean_comment() and mk_comment(). Searching child nodes: by_tag_name(): Added ignore_namespace option to ignore namespace of node name.

1813 04/09/2012 05:26 PM Aaron Marcuse-Kubitza

root Makefile: Added %-remake target

1812 04/09/2012 04:53 PM Aaron Marcuse-Kubitza

mappings/Makefile: Renamed joinMaps to dwcMaps and chrootMaps to vegxMaps. Added commented-out code to create VegX.self.organisms.csv (not ready to check in yet because it affects many dependent maps).

1811 04/09/2012 02:52 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer needed $(noEmptyMap)

1810 04/09/2012 12:40 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Use new xml_dom.mk_comment()

1809 04/09/2012 12:40 PM Aaron Marcuse-Kubitza

xml_dom.py: Added clean_comment() and mk_comment() to properly sanitize comment contents (comments can't contain '--')
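One possible way to sanitize comment contents is sketched below. The real clean_comment() may handle '--' differently; collapsing hyphen runs and padding a trailing hyphen are assumptions chosen to satisfy the XML rule that a comment may not contain '--' or end with '-'.

```python
import re
from xml.dom import minidom

def clean_comment(text):
    """Make text safe to embed in an XML comment."""
    text = re.sub(r'-{2,}', '-', text)  # collapse runs of hyphens
    if text.endswith('-'):
        text += ' '  # avoid producing '--->' at the end of the comment
    return text

def mk_comment(doc, text):
    """Create a comment node whose contents have been sanitized."""
    return doc.createComment(clean_comment(text))

doc = minidom.Document()
print(mk_comment(doc, 'skipped --verbose flag').toxml())
```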

1808 04/09/2012 12:14 PM Aaron Marcuse-Kubitza

Added inputs/TRTE

1807 04/03/2012 08:26 PM Aaron Marcuse-Kubitza

inputs/QMOR/test: Added initial accepted test outputs

1806 04/03/2012 08:26 PM Aaron Marcuse-Kubitza

inputs/QMOR/maps: Added maps

1805 04/03/2012 08:20 PM Aaron Marcuse-Kubitza

Added inputs/QMOR

1804 04/03/2012 08:14 PM Aaron Marcuse-Kubitza

inputs/MT/test: Added initial accepted test outputs

1803 04/03/2012 08:14 PM Aaron Marcuse-Kubitza

inputs/MT/maps: Added maps

1802 04/03/2012 08:13 PM Aaron Marcuse-Kubitza

mappings/Makefile: DwC-VegBIEN.specimens.csv: Don't call remove_empty to produce it, because join now deals with empty mappings correctly by still raising a warning. Removed no longer needed intermediate DwC.ci-VegBIEN.specimens.csv.

1801 04/03/2012 08:09 PM Aaron Marcuse-Kubitza

join: Also print "No join mapping" warning if a join mapping was found but it was empty. The warning in that case is actually "No non-empty join mapping" to distinguish it from a mapping that's missing entirely. input.Makefile: missing_mappings: Support new "No join mapping" error message.

1800 04/03/2012 08:08 PM Aaron Marcuse-Kubitza

join: Also print "No join mapping" warning if a join mapping was found but it was empty. The warning in that case is actually "No non-empty join mapping" to distinguish it from a mapping that's missing entirely. input.Makefile: missing_mappings: Support new "No join mapping" error message.

1799 04/03/2012 07:33 PM Aaron Marcuse-Kubitza

Added inputs/MT

1798 04/03/2012 07:26 PM Aaron Marcuse-Kubitza

Added disown_all to disown all running jobs

1797 04/03/2012 07:26 PM Aaron Marcuse-Kubitza

stop_imports: Call jobspecs relative to $selfDir, rather than assuming it will be run from the svn root dir

1796 04/03/2012 07:18 PM Aaron Marcuse-Kubitza

union: Call maps.merge_headers() using **dict(prefer=header_num) instead of just prefer=header_num in order to work on Python 2.5.2 (which nimoy is running)
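A likely reason for this workaround, assuming the call also unpacks a positional *args list (the log does not say so): Python 2.5 rejects a keyword argument written after *args as a syntax error, whereas Python 2.6 and later accept it. The merge_headers() below is a hypothetical stand-in that only echoes its arguments, not the real maps.merge_headers().

```python
def merge_headers(*headers, **kw_args):
    # Hypothetical stand-in: returns what it received so the two call
    # styles can be compared.
    prefer = kw_args.get('prefer', None)
    return headers, prefer

headers = [['a', 'b'], ['a', 'c']]
header_num = 1

# Python 2.6+ accepts: merge_headers(*headers, prefer=header_num)
# Python 2.5 does not, so the keyword is passed through ** instead:
result = merge_headers(*headers, **dict(prefer=header_num))
print(result)
```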

1795 04/03/2012 07:00 PM Aaron Marcuse-Kubitza

inputs/ACAD/test: Accepted initial test outputs

1794 04/03/2012 07:00 PM Aaron Marcuse-Kubitza

Added inputs/ACAD/maps/ maps

1793 04/03/2012 06:59 PM Aaron Marcuse-Kubitza

Accepted new test outputs resulting from the addition of the id -> occurrenceID mapping in mappings/DwC1-DwC2.specimens.csv

1792 04/03/2012 06:57 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/maps: Cleaned up maps for the first time since all via maps became subject to cleanup

1791 04/03/2012 06:55 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer needed default "maps/.$(via).%.csv.last_cleanup" rule

1790 04/03/2012 06:54 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Via maps cleanup: Added `env ignore=1` since with the switch to subtracting $(coreMap), all inputs will attempt to subtract some map, even if it's not subtractable

1789 04/03/2012 06:47 PM Aaron Marcuse-Kubitza

input.Makefile: Don't clean src maps, only build them

1788 04/03/2012 06:45 PM Aaron Marcuse-Kubitza

inputs/ARIZ/maps/DwC.specimens.csv: Re-cleaned up to take advantage of additional entries now removed by subtract

1787 04/03/2012 06:36 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Via maps cleanup: Subtract $(coreMap) instead of $(coreSelfMap) so that entries whose input and output map to the same place are subtracted as well

1786 04/03/2012 06:35 PM Aaron Marcuse-Kubitza

subtract: Also remove mappings whose input and output map to the same non-empty value in map_1

1785 04/03/2012 06:32 PM Aaron Marcuse-Kubitza

util.py: Added all_equal(), all_equal_ignore_none(), have_same_value()
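The log gives only the function names, so the following is a guess at what such helpers could look like rather than the util.py implementations. In particular, the signature of have_same_value() (a mapping plus the keys to compare) is an assumption.

```python
def all_equal(vals):
    """True if every value equals the first one (vacuously true if empty)."""
    vals = list(vals)
    return all(v == vals[0] for v in vals[1:])

def all_equal_ignore_none(vals):
    """Like all_equal(), but None entries are left out of the comparison."""
    return all_equal(v for v in vals if v is not None)

def have_same_value(mapping, *keys):
    """Assumed meaning: the given keys map to the same value, ignoring
    keys that are missing or map to None."""
    return all_equal_ignore_none(mapping.get(key) for key in keys)

row = {'in': 'plotName', 'out': 'plotName', 'via': None}
print(have_same_value(row, 'in', 'out', 'via'))
```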

1784 04/03/2012 05:45 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Added id -> occurrenceID mapping

1783 04/03/2012 05:43 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/maps/VegX.%.full.csv: Regenerated using new src maps

1782 04/03/2012 05:41 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Added mappings from dcterms elements without namespace to with namespace

1781 04/03/2012 05:40 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV: Built maps/src.%.csv

1780 04/03/2012 05:24 PM Aaron Marcuse-Kubitza

Added inputs/ACAD/maps/src.specimens.csv

1779 04/03/2012 05:23 PM Aaron Marcuse-Kubitza

input.Makefile: Maps building: Autogen src maps with known table names. Sources: $(withCatSrcs): Fixed bug where substitution pattern did not contain %.

1778 04/03/2012 05:22 PM Aaron Marcuse-Kubitza

Added src_map to make a source map spreadsheet from a CSV header

1777 04/03/2012 04:32 PM Aaron Marcuse-Kubitza

input.Makefile: Split Maps section into "Existing maps discovery" and "Maps building" sections. Sources: Added cat, cat-% to cat out sources.

1776 04/03/2012 04:17 PM Aaron Marcuse-Kubitza

input.Makefile: Factored out sources-related code to new Sources section

1775 04/03/2012 04:08 PM Aaron Marcuse-Kubitza

input.Makefile: $(srcMaps): Removed `$(filter-out maps/src.join.%.csv,...)` because maps/src.join.%.csv are no longer created

1774 04/03/2012 03:47 PM Aaron Marcuse-Kubitza

README.TXT: Schema changes: Split updating graphical ERD exports into separate section. Update graphical ERD exports: Added schemas/vegbien.ERD.core.pdf .

1773 04/03/2012 03:42 PM Aaron Marcuse-Kubitza

README.TXT: Added Datasource setup section with instructions to add a new datasource

1772 04/03/2012 03:38 PM Aaron Marcuse-Kubitza

Added inputs/ACAD

1771 04/03/2012 03:37 PM Aaron Marcuse-Kubitza

input.Makefile: Only setSvnIgnore the input dir, since it already exists and doesn't need to be added (inputs/Makefile adds it)

1770 04/03/2012 03:23 PM Aaron Marcuse-Kubitza

inputs/*/maps/DwC.specimens.csv: Removed extraneous XML meta info from DwC column root, since it now just needs to be present in the core via map mappings/DwC-VegBIEN.specimens.csv

1769 04/03/2012 03:22 PM Aaron Marcuse-Kubitza

union: Use new maps.merge_headers() to write properly combined header

1768 04/03/2012 03:21 PM Aaron Marcuse-Kubitza

maps.py: join_combinable(): Fixed roots_combinable() to run on col names instead of roots, which were passed in. merge_mappings(): Factored out mapping column combining into merge_mapping_cols(), which handles an optional prefer param as well to take the header_num env var. Added merge_headers().

1767 04/03/2012 03:17 PM Aaron Marcuse-Kubitza

util.py: Added sort_by_len(), shortest(), longest()
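These helpers are small enough to sketch directly. The signatures are assumptions (the log names the functions but not their parameters), with shortest() and longest() taking their candidates as positional arguments.

```python
def sort_by_len(strs):
    """Return the strings ordered from shortest to longest (stable sort)."""
    return sorted(strs, key=len)

def shortest(*strs):
    return sort_by_len(strs)[0]

def longest(*strs):
    return sort_by_len(strs)[-1]

print(shortest('/plot/name', '/plot'), longest('/plot/name', '/plot'))
```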

1766 04/03/2012 02:12 PM Aaron Marcuse-Kubitza

join: Use new maps.join_combinable() to check if column names match

1765 04/03/2012 02:11 PM Aaron Marcuse-Kubitza

maps.py: Added cols_combinable() and use it in combinable(). Added join_combinable() and associated helper functions. Added documentation labels to each section.

1764 04/03/2012 01:13 PM Aaron Marcuse-Kubitza

xml_parse.py: ConsecXmlInputStream: Removed read() because that's now defined in streams.FilterStream

1763 04/03/2012 01:11 PM Aaron Marcuse-Kubitza

xml_parse.py: parse_next(): Strip control characters from input stream because they mess up the parser

1762 04/03/2012 01:10 PM Aaron Marcuse-Kubitza

streams.py: FilterStream: Forward all reads to readline()

1761 04/03/2012 01:08 PM Aaron Marcuse-Kubitza

strings.py: Added is_ctrl() and strip_ctrl()
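A sketch of what these two helpers might do, under the assumption that "control character" means a code point below 0x20 other than tab, line feed and carriage return (the three that XML 1.0 permits). The real strings.py definitions may draw the line elsewhere.

```python
XML_SAFE_WHITESPACE = '\t\n\r'  # assumption: these are kept, not stripped

def is_ctrl(char):
    """True for ASCII control characters that are not ordinary whitespace."""
    return ord(char) < 0x20 and char not in XML_SAFE_WHITESPACE

def strip_ctrl(text):
    """Remove control characters from a string."""
    return ''.join(char for char in text if not is_ctrl(char))

print(repr(strip_ctrl('Quercus\x00 alba\x1f\n')))
```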

1760 04/03/2012 08:34 AM Aaron Marcuse-Kubitza

xml_parse.py: parse_next(): On parser error, advance to next XML document since the rest of the current document is corrupted

1759 04/03/2012 08:33 AM Aaron Marcuse-Kubitza

streams.py: Added consume(). Added documentation labels to each section.

1758 04/03/2012 08:23 AM Aaron Marcuse-Kubitza

bin/map: For XML inputs, wrap sys.stdin in a LineCountStream and use new xml_parse.docs_iter() on_error() to add input line # to XML parsing exceptions

1757 04/03/2012 08:21 AM Aaron Marcuse-Kubitza

xml_parse.py: Added on_error() handler to parse_next() (passed through by docs_iter()), so that the caller can add useful info like the input line # to the exception message, and decide whether to suppress the exception rather than re-raise it
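The interplay between a line-counting stream and an on_error() callback can be sketched as below. This is a simplified illustration: the LineCountStream here is a guess at the idea behind the class of that name, and read_numbers() is a hypothetical stand-in for the XML parsing loop.

```python
import io

class LineCountStream(object):
    """Wraps a stream and counts the lines read through readline()."""
    def __init__(self, stream):
        self.stream = stream
        self.line_num = 0

    def readline(self):
        line = self.stream.readline()
        if line:
            self.line_num += 1
        return line

def read_numbers(stream, on_error):
    """Hypothetical parser loop: yields ints and hands failures to on_error,
    which may record and suppress the error or raise it again."""
    while True:
        line = stream.readline()
        if not line:
            break
        try:
            yield int(line)
        except ValueError as e:
            on_error(e)

stream = LineCountStream(io.StringIO(u'1\nx\n3\n'))
errors = []

def on_error(e):
    # The caller knows the stream, so it can attach the input line number.
    errors.append('input line %d: %s' % (stream.line_num, e))

values = list(read_numbers(stream, on_error))
print(values, errors)
```

Because the callback lives in the caller, the parser stays unaware of line counting, and the caller alone decides whether a bad record stops the run.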