Activity

From 03/05/2012 to 04/03/2012

04/03/2012

08:26 PM Revision 1807: inputs/QMOR/test: Added initial accepted test outputs
Aaron Marcuse-Kubitza
08:26 PM Revision 1806: inputs/QMOR/maps: Added maps
Aaron Marcuse-Kubitza
08:20 PM Revision 1805: Added inputs/QMOR
Aaron Marcuse-Kubitza
08:14 PM Revision 1804: inputs/MT/test: Added initial accepted test outputs
Aaron Marcuse-Kubitza
08:14 PM Revision 1803: inputs/MT/maps: Added maps
Aaron Marcuse-Kubitza
08:13 PM Revision 1802: mappings/Makefile: DwC-VegBIEN.specimens.csv: Don't call remove_empty to produce it, because join now deals with empty mappings correctly by still raising a warning. Removed no longer needed intermediate DwC.ci-VegBIEN.specimens.csv.
Aaron Marcuse-Kubitza
08:09 PM Revision 1801: join: Also print "No join mapping" warning if a join mapping was found but it was empty. The warning in that case is actually "No non-empty join mapping" to distinguish it from a mapping that's missing entirely. input.Makefile: missing_mappings: Support new "No join mapping" error message.
Aaron Marcuse-Kubitza
08:08 PM Revision 1800: join: Also print "No join mapping" warning if a join mapping was found but it was empty. The warning in that case is actually "No non-empty join mapping" to distinguish it from a mapping that's missing entirely. input.Makefile: missing_mappings: Support new "No join mapping" error message.
Aaron Marcuse-Kubitza
07:33 PM Revision 1799: Added inputs/MT
Aaron Marcuse-Kubitza
07:26 PM Revision 1798: Added disown_all to disown all running jobs
Aaron Marcuse-Kubitza
07:26 PM Revision 1797: stop_imports: Call jobspecs relative to $selfDir, rather than assuming it will be run from the svn root dir
Aaron Marcuse-Kubitza
07:18 PM Revision 1796: union: Call maps.merge_headers() using **dict(prefer=header_num) instead of just prefer=header_num in order to work on Python 2.5.2 (which nimoy is running)
Aaron Marcuse-Kubitza
07:00 PM Revision 1795: inputs/ACAD/test: Accepted initial test outputs
Aaron Marcuse-Kubitza
07:00 PM Revision 1794: Added inputs/ACAD/maps/ maps
Aaron Marcuse-Kubitza
06:59 PM Revision 1793: Accepted new test outputs resulting from the addition of the id -> occurrenceID mapping in mappings/DwC1-DwC2.specimens.csv
Aaron Marcuse-Kubitza
06:57 PM Revision 1792: inputs/SALVIAS*/maps: Cleaned up maps for the first time since all via maps became subject to cleanup
Aaron Marcuse-Kubitza
06:55 PM Revision 1791: input.Makefile: Removed no longer needed default "maps/.$(via).%.csv.last_cleanup" rule
Aaron Marcuse-Kubitza
06:54 PM Revision 1790: input.Makefile: Maps building: Via maps cleanup: Added `env ignore=1` since with the switch to subtracting $(coreMap), all inputs will attempt to subtract some map, even if it's not subtractable
Aaron Marcuse-Kubitza
06:47 PM Revision 1789: input.Makefile: Don't clean src maps, only build them
Aaron Marcuse-Kubitza
06:45 PM Revision 1788: inputs/ARIZ/maps/DwC.specimens.csv: Re-cleaned up to take advantage of additional entries now removed by subtract
Aaron Marcuse-Kubitza
06:36 PM Revision 1787: input.Makefile: Maps building: Via maps cleanup: Subtract $(coreMap) instead of $(coreSelfMap) so that entries whose input and output map to the same place are subtracted as well
Aaron Marcuse-Kubitza
06:35 PM Revision 1786: subtract: Also remove mappings whose input and output map to the same non-empty value in map_1
Aaron Marcuse-Kubitza
06:32 PM Revision 1785: util.py: Added all_equal(), all_equal_ignore_none(), have_same_value()
Aaron Marcuse-Kubitza
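A minimal sketch of what equality helpers like the ones named in revision 1785 might look like. Only the names all_equal() and all_equal_ignore_none() come from the log; the signatures and behavior below are assumptions, not the project's actual util.py code.

```python
def all_equal(values):
    """Return True if every value equals the first one (True for empty input).

    Assumed behavior; the real util.all_equal() may differ.
    """
    values = list(values)
    return all(value == values[0] for value in values[1:])


def all_equal_ignore_none(values):
    """Like all_equal(), but skip None entries before comparing (assumed)."""
    return all_equal(value for value in values if value is not None)
```

A check of this kind fits the subtract change in revision 1786, where a mapping is dropped when its input and output resolve to the same non-empty value.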
05:45 PM Revision 1784: mappings/DwC1-DwC2.specimens.csv: Added id -> occurrenceID mapping
Aaron Marcuse-Kubitza
05:43 PM Revision 1783: inputs/SALVIAS-CSV/maps/VegX.%.full.csv: Regenerated using new src maps
Aaron Marcuse-Kubitza
05:41 PM Revision 1782: mappings/DwC1-DwC2.specimens.csv: Added mappings from dcterms elements without namespace to with namespace
Aaron Marcuse-Kubitza
05:40 PM Revision 1781: inputs/SALVIAS-CSV: Built maps/src.%.csv
Aaron Marcuse-Kubitza
05:24 PM Revision 1780: Added inputs/ACAD/maps/src.specimens.csv
Aaron Marcuse-Kubitza
05:23 PM Revision 1779: input.Makefile: Maps building: Autogen src maps with known table names. Sources: $(withCatSrcs): Fixed bug where substitution pattern did not contain %.
Aaron Marcuse-Kubitza
05:22 PM Revision 1778: Added src_map to make a source map spreadsheet from a CSV header
Aaron Marcuse-Kubitza
04:32 PM Revision 1777: input.Makefile: Split Maps section into "Existing maps discovery" and "Maps building" sections. Sources: Added cat, cat-% to cat out sources.
Aaron Marcuse-Kubitza
04:17 PM Revision 1776: input.Makefile: Factored out sources-related code to new Sources section
Aaron Marcuse-Kubitza
04:08 PM Revision 1775: input.Makefile: $(srcMaps): Removed `$(filter-out maps/src.join.%.csv,...)` because maps/src.join.%.csv are no longer created
Aaron Marcuse-Kubitza
03:47 PM Revision 1774: README.TXT: Schema changes: Split updating graphical ERD exports into separate section. Update graphical ERD exports: Added schemas/vegbien.ERD.core.pdf .
Aaron Marcuse-Kubitza
03:42 PM Revision 1773: README.TXT: Added Datasource setup section with instructions to add a new datasource
Aaron Marcuse-Kubitza
03:38 PM Revision 1772: Added inputs/ACAD
Aaron Marcuse-Kubitza
03:37 PM Revision 1771: input.Makefile: Only setSvnIgnore the input dir, since it already exists and doesn't need to be added (inputs/Makefile adds it)
Aaron Marcuse-Kubitza
03:23 PM Revision 1770: inputs/*/maps/DwC.specimens.csv: Removed extraneous XML meta info from DwC column root, since it now just needs to be present in the core via map mappings/DwC-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
03:22 PM Revision 1769: union: Use new maps.merge_headers() to write properly combined header
Aaron Marcuse-Kubitza
03:21 PM Revision 1768: maps.py: join_combinable(): Fixed roots_combinable() to run on col names instead of roots, which were passed in. merge_mappings(): Factored out mapping column combining into merge_mapping_cols(), which handles an optional prefer param as well to take the header_num env var. Added merge_headers().
Aaron Marcuse-Kubitza
03:17 PM Revision 1767: util.py: Added sort_by_len(), shortest(), longest()
Aaron Marcuse-Kubitza
02:12 PM Revision 1766: join: Use new maps.join_combinable() to check if column names match
Aaron Marcuse-Kubitza
02:11 PM Revision 1765: maps.py: Added cols_combinable() and use it in combinable(). Added join_combinable() and associated helper functions. Added documentation labels to each section.
Aaron Marcuse-Kubitza
01:13 PM Revision 1764: xml_parse.py: ConsecXmlInputStream: Removed read() because that's now defined in streams.FilterStream
Aaron Marcuse-Kubitza
01:11 PM Revision 1763: xml_parse.py: parse_next(): Strip control characters from input stream because they mess up the parser
Aaron Marcuse-Kubitza
01:10 PM Revision 1762: streams.py: FilterStream: Forward all reads to readline()
Aaron Marcuse-Kubitza
01:08 PM Revision 1761: strings.py: Added is_ctrl() and strip_ctrl()
Aaron Marcuse-Kubitza
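A hedged sketch of control-character stripping in the spirit of revisions 1761 and 1763. The function names come from the log, but the definition of "control character" used here (code points below 32, excluding tab, line feed, and carriage return, which XML 1.0 permits) is an assumption.

```python
def is_ctrl(char):
    """Return True for a control character that XML 1.0 does not allow.

    Assumption: code points below 32 other than tab, LF, and CR.
    """
    return ord(char) < 32 and char not in '\t\n\r'


def strip_ctrl(text):
    """Remove every character for which is_ctrl() is true."""
    return ''.join(char for char in text if not is_ctrl(char))
```

Applied to input text before it reaches the parser, this would drop characters such as the backspace (value 8) that revision 1666 reports as a parser error.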
08:34 AM Revision 1760: xml_parse.py: parse_next(): On parser error, advance to next XML document since the rest of the current document is corrupted
Aaron Marcuse-Kubitza
08:33 AM Revision 1759: streams.py: Added consume(). Added documentation labels to each section.
Aaron Marcuse-Kubitza
08:23 AM Revision 1758: bin/map: For XML inputs, wrap sys.stdin in a LineCountStream and use new xml_parse.docs_iter() on_error() to add input line # to XML parsing exceptions
Aaron Marcuse-Kubitza
08:21 AM Revision 1757: xml_parse.py: Added on_error() handler to parse_next() (passed through by docs_iter()), so that the caller can add useful info like the input line # to the exception message, and decide whether to suppress or re-raise the exception
Aaron Marcuse-Kubitza
07:19 AM Revision 1756: VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field identificationLabel2 to identificationLabel. Distinguish what are now two identificationLabel fields of the same name by tagging each one with [@id=2] or [@id=1]. inputs/SALVIAS-CSV/maps/VegX.organisms.csv: Merge tag1/stem_tag1 and tag2/stem_tag2 using _alt, since they are never set to different values when both are not NULL (although sometimes just one or just the other is not NULL).
Aaron Marcuse-Kubitza

04/02/2012

05:37 PM Revision 1755: VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field tag2 to identificationLabel2 to reflect that it will become a second instance of identificationLabel
Aaron Marcuse-Kubitza
05:31 PM Revision 1754: VegX-VegBIEN.organisms.csv: Re-mapped individualOrganismObservation user-defined field lineCover to already existing volumeCanopy
Aaron Marcuse-Kubitza
05:29 PM Revision 1753: VegX-VegBIEN.organisms.csv: Re-mapped individualOrganismObservation user-defined field cover to already existing attribute.coverPercent
Aaron Marcuse-Kubitza
05:13 PM Revision 1752: VegX-VegBIEN.organisms.csv: Re-mapped individualOrganismObservation user-defined field count to already existing aggregateOrganismObservation.aggregateValue
Aaron Marcuse-Kubitza
04:44 PM Revision 1751: vegbien.ERD.mwb: Fixed lines
Aaron Marcuse-Kubitza
01:50 PM Revision 1750: README.TXT: Documented that `make reinstall_db` will delete your VegBIEN DB
Aaron Marcuse-Kubitza
01:48 PM Revision 1749: README.TXT: Documented that `make empty_db` will delete your VegBIEN DB
Aaron Marcuse-Kubitza
01:44 PM Revision 1748: root Makefile: empty_db: Confirm deletion just like for rm_db. rm_db: put $(confirmRmDb) on a separate line and move the $(error) call to the main $(confirm) macro since you always want to abort make if the user cancels (not just not run that command).
Aaron Marcuse-Kubitza
01:34 PM Revision 1747: root Makefile: rm_db: If user cancels, abort in case target was reinstall_db to prevent installing
Aaron Marcuse-Kubitza
01:28 PM Revision 1746: root Makefile: core, rm_core: Fixed bug where no longer existing prerequisites postgres_user, rm_postgres_user were not removed
Aaron Marcuse-Kubitza
01:25 PM Revision 1745: root Makefile: rm_db: Confirm deletion with user. Merged postgres_user, rm_postgres_user into db, rm_db so that deletion confirmation applies to user deletion as well (which would indirectly cause the DB to be deleted).
Aaron Marcuse-Kubitza
01:04 PM Revision 1744: README.TXT: Testing: Updated to add missing mappings
Aaron Marcuse-Kubitza
01:03 PM Revision 1743: root Makefile: test-all: Added missing_mappings
Aaron Marcuse-Kubitza
01:00 PM Revision 1742: Moved maps validation targets from main Makefile to input.Makefile. main Makefile: maps validation: Summarize the output of the inputs' maps validations.
Aaron Marcuse-Kubitza
12:22 PM Revision 1741: Makefile: Also find missing input mappings, in addition to missing join mappings
Aaron Marcuse-Kubitza
12:21 PM Revision 1740: join: Also produce warnings for no input mapping (if no comment explaining why no input mapping), in addition to no join mapping
Aaron Marcuse-Kubitza
12:21 PM Revision 1739: join: Also produce warnings for no input mapping (if no comment explaining why no input mapping), in addition to no join mapping
Aaron Marcuse-Kubitza
12:20 PM Revision 1738: inputs/NY/maps/DwC.specimens.csv: Documented why there is no input mapping for key
Aaron Marcuse-Kubitza
11:29 AM Revision 1737: VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined fields stem* to remove the stem* prefix to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:23 AM Revision 1736: VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation/plotObservation user-defined fields sourceaccessioncode to sourceAccessionCode to be consistent with VegX case sensitivity
Aaron Marcuse-Kubitza
11:19 AM Revision 1735: VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field interceptCm to lineCover to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:18 AM Revision 1734: VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field individualCode to authorPlantCode to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:17 AM Revision 1733: VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field htFirstBranchM to heightFirstBranch to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:15 AM Revision 1732: VegX-VegBIEN.organisms.csv: Renamed individualOrganismObservation user-defined field coverPercent to cover to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:12 AM Revision 1731: VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field siltPercent to silt to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:11 AM Revision 1730: VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field sandPercent to sand to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:10 AM Revision 1729: VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field pottasium to potassium to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:08 AM Revision 1728: VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field organicPercent to organic to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:07 AM Revision 1727: VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field clayPercent to clay to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:06 AM Revision 1726: VegX-VegBIEN.organisms.csv: Renamed abioticObservation user-defined field cationCap to cationExchangeCapacity to be consistent with VegBIEN
Aaron Marcuse-Kubitza
11:02 AM Revision 1725: VegX-VegBIEN.organisms.csv: Renamed plotObservation user-defined field precipMm to precipitation to be consistent with VegBIEN
Aaron Marcuse-Kubitza
10:56 AM Revision 1724: VegX-VegBIEN.organisms.csv: Changed plotObservation user-defined field plotMethodology to /simpleUserdefined[name=method]/*ID/method/name
Aaron Marcuse-Kubitza
10:46 AM Task #304 (Resolved): Complete full dataset imports to VegBIEN via VegX of NYBG and SALVIAS
Aaron Marcuse-Kubitza
10:45 AM Task #319 (Resolved): Update statistics/lists of user-defined fields in use in VegX and VegBIEN
* *[[VegX]]*: "Convert user-defined fields to first-class fields"
* *[[VegBIEN schema]]*: "Remaining user-defined fi...
Aaron Marcuse-Kubitza
10:43 AM Task #320: Convert user-defined VegX fields to first-class fields
user-defined fields to convert: *[[VegX]]*: "Convert user-defined fields to first-class fields"
Aaron Marcuse-Kubitza
10:42 AM Task #321 (Resolved): Convert user-defined VegBIEN fields to first-class fields
Aaron Marcuse-Kubitza
10:42 AM Task #373 (Resolved): map all specimens data in raw_data
Aaron Marcuse-Kubitza
09:47 AM Revision 1723: schemas/postgresql.nimoy.conf: Increased default_statistics_target to 8.4 default value to improve execution query plans
Aaron Marcuse-Kubitza
09:43 AM Revision 1722: Added schemas/postgresql.Mac.conf (for tuning developers' local testing DBs)
Aaron Marcuse-Kubitza
09:42 AM Revision 1721: schemas/postgresql*.conf: Increased checkpoint_segments and checkpoint_completion_target so that checkpoints (performance intensive) are written less often and load-balanced better
Aaron Marcuse-Kubitza
08:55 AM Task #289 (Resolved): look for formal mapping mechanism
Aaron Marcuse-Kubitza
08:35 AM Revision 1720: xml_dom.py: Don't print whitespace from parsed XML document when pretty-printing XML. minidom modifications section: Added subsection labels for the class each modification applies to.
Aaron Marcuse-Kubitza
08:20 AM Revision 1719: Parser.py: Renamed SyntaxException to SyntaxError because it's an unexpected condition that should exit the program, a.k.a. an error
Aaron Marcuse-Kubitza
08:05 AM Revision 1718: bin/map: process_rows(): When iterating over each row, only retrieve the next row if the end (limit of # of rows) has not been reached. This prevents the next row from being fetched, possibly causing an entire additional consecutive XML document to be parsed, if the limit has already been reached. This is primarily useful for XML inputs with a ".0.top" segment prepended before the other documents, which contains just the first two nodes for fast parsing of this smaller XML document when only the first two nodes are needed for testing. Without this fix, the ".0.top" segment would have needed to contain the first three nodes instead.
Aaron Marcuse-Kubitza
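The fix described in revision 1718 amounts to checking the row limit before pulling the next row, so a lazy source is never asked for a row that will be discarded. A minimal sketch of that pattern follows; the function body and the limit parameter are illustrative assumptions, not the actual bin/map code.

```python
def process_rows(rows, limit=None):
    """Collect rows from a lazy source, never fetching past the limit.

    The limit is tested *before* next() is called, so reaching the limit
    does not trigger one extra (possibly expensive) fetch.
    """
    rows = iter(rows)
    processed = []
    while limit is None or len(processed) < limit:
        try:
            row = next(rows)
        except StopIteration:
            break  # source exhausted before the limit was reached
        processed.append(row)
    return processed
```

With a limit of 2, only two rows are requested from the source, which is why a ".0.top" segment holding just the first two nodes is sufficient.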
07:55 AM Revision 1717: inputs/XAL: Accepted initial test outputs
Aaron Marcuse-Kubitza
07:54 AM Revision 1716: inputs/XAL: Added maps
Aaron Marcuse-Kubitza
07:52 AM Revision 1715: bin/map: Extended consecutive XML document support to direct-XML inputs (without a map spreadsheet). Factored out consecutive XML document row-iteration code into helper method get_rows() which does the iters.flatten() and itertools.imap() calls.
Aaron Marcuse-Kubitza
07:37 AM Revision 1714: bin/map: Fixed bug in iteration over consecutive XML documents where only the first element of the first document was processed. Use of iters.flatten() and itertools.imap() fixes this problem so that the consecutive XML documents are regarded as a continuous stream of rows.
Aaron Marcuse-Kubitza
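To illustrate how consecutive documents can be treated as one continuous stream of rows, here is a small sketch using itertools.chain.from_iterable. It stands in for the iters.flatten() and itertools.imap() combination named in revision 1714; the helper names and the list-of-rows document representation are assumptions.

```python
import itertools


def flatten(iterables):
    """Lazily chain an iterable of iterables into one stream."""
    return itertools.chain.from_iterable(iterables)


def all_rows(docs, rows_of=iter):
    """Yield the rows of every document in order.

    rows_of is a hypothetical per-document row extractor; map() is lazy in
    Python 3, playing the role of itertools.imap() in Python 2.
    """
    return flatten(map(rows_of, docs))
```

Because both steps are lazy, a later document is only touched once the rows of the earlier ones have been consumed.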
07:16 AM Revision 1713: bin/map: Use new xml_parse.docs_iter() to iterate over each consecutive XML document in stdin
Aaron Marcuse-Kubitza
07:16 AM Revision 1712: xml_parse.py: Added support for parsing consecutive XML documents in a stream
Aaron Marcuse-Kubitza
07:01 AM Revision 1711: Added iters.py
Aaron Marcuse-Kubitza

03/29/2012

10:33 PM Revision 1710: streams.py: Added FilterStream. Changed TracedStream to use FilterStream.
Aaron Marcuse-Kubitza
10:24 PM Revision 1709: Moved parse_str() from xml_dom.py to xml_parse.py
Aaron Marcuse-Kubitza
10:24 PM Revision 1708: Added xml_parse.py
Aaron Marcuse-Kubitza
10:21 PM Revision 1707: streams.py: CaptureStream: Ignore start_str when recording and end_str when not recording
Aaron Marcuse-Kubitza
10:13 PM Revision 1706: streams.py: CaptureStream: Get each match as a separate array elem instead of concatenated together
Aaron Marcuse-Kubitza
09:59 PM Revision 1705: ch_root, repl, map: Use new maps.col_info() instead of parsing col name manually. This allows maps with prefixes containing ":" to be supported, without the ":" being misinterpreted as the label-root separator.
Aaron Marcuse-Kubitza
09:57 PM Revision 1704: maps.py: Added col_info() to get label, root, prefixes from col_name. Added col_formats() for use by combinable(). Use new col_formats() in combinable(). Removed no longer needed col_label().
Aaron Marcuse-Kubitza
09:55 PM Revision 1703: input.Makefile: Use with_cat instead of with_cat_csv for XML sources
Aaron Marcuse-Kubitza
09:54 PM Revision 1702: Renamed inputs/XAL/src/digir.xml.make to digir.specimens.xml.make so it would generate an output file with the proper table name
Aaron Marcuse-Kubitza
08:53 PM Revision 1701: bin/map: Support concatenated XML documents for XML inputs
Aaron Marcuse-Kubitza
08:46 PM Revision 1700: bin/map: Merged XML inputs with and without a map into the in_is_xml section
Aaron Marcuse-Kubitza
08:33 PM Revision 1699: digir_client: Output profiling information
Aaron Marcuse-Kubitza
08:21 PM Revision 1698: Added inputs/XAL/src/digir.xml.make
Aaron Marcuse-Kubitza
08:21 PM Revision 1697: digir_client: Import http to take advantage of httplib modifications to deal with IncompleteRead errors
Aaron Marcuse-Kubitza
08:20 PM Revision 1696: Added http.py with httplib modifications to deal with IncompleteRead errors
Aaron Marcuse-Kubitza
07:46 PM Revision 1695: digir_client: Fixed bug where chunk size was being adjusted even if count == None (indicating no determinable last chunk), causing a type mismatch between None and the integer total
Aaron Marcuse-Kubitza
07:28 PM Revision 1694: input.Makefile: Removed no longer needed "ifneq ($(wildcard test/),)" guard around Testing section because all inputs now have a test subdir
Aaron Marcuse-Kubitza
07:25 PM Revision 1693: Added inputs/XAL
Aaron Marcuse-Kubitza
07:22 PM Revision 1692: digir_client: Made chunk_size a configurable env var. Removed schema env var because schema is always the same for DiGIR (can be different for TAPIR). Make sure output ends in a newline so that consecutive XML documents are on different lines.
Aaron Marcuse-Kubitza
07:13 PM Revision 1691: digir_client: Fixed bug where chunk_size records would always be retrieved even in the last chunk, which ignored any manual count the user might have set via the "n" option
Aaron Marcuse-Kubitza
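The chunking behavior described in revisions 1690 and 1691 can be sketched as follows: each request asks for at most chunk_size records, and the final request is trimmed so the total never exceeds the requested count. The function and parameter names are hypothetical, not taken from digir_client.

```python
def chunk_ranges(total, chunk_size, start=0):
    """Yield (start, count) pairs covering records start..total-1.

    The last chunk is shortened so that no more than `total` records
    are requested overall.
    """
    while start < total:
        count = min(chunk_size, total - start)
        yield (start, count)
        start += count
```

For example, 25 records with a chunk size of 10 produce two full chunks followed by one chunk of 5.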
07:07 PM Revision 1690: digir_client: Repeatedly retrieve data in chunks. Provide match count. Added section comments.
Aaron Marcuse-Kubitza
06:52 PM Revision 1689: xpath.py: Added get_value() to run get_1() and return the value of any result node
Aaron Marcuse-Kubitza
06:51 PM Revision 1688: xml_dom.py: Added parse_str()
Aaron Marcuse-Kubitza
06:13 PM Revision 1687: digir_client: Use new streams.copy() to copy returned data to stdout
Aaron Marcuse-Kubitza
06:13 PM Revision 1686: streams.py: Added copy(). Added section comment for traced streams.
Aaron Marcuse-Kubitza
06:06 PM Revision 1685: digir_client: Label debugging output
Aaron Marcuse-Kubitza
05:54 PM Revision 1684: streams.py: Renamed LineCountOutputStream to LineCountStream since TracedStream now works on both input and output streams
Aaron Marcuse-Kubitza
05:52 PM Revision 1683: digir_client: Capture diagnostics for later use in determining next start/count values
Aaron Marcuse-Kubitza
05:51 PM Revision 1682: streams.py: Added CaptureStream to wrap a stream, capturing matching text. Renamed TracedOutputStream to TracedStream and made it work on both input and output streams. Made TracedStream inherit from WrapStream so that close() would be forwarded properly.
Aaron Marcuse-Kubitza
05:23 PM Revision 1681: bin/map: Changed XML input prefix handling to prepend prefix directly to XPath instead of separating it from the XPath with a "/". Changed get_with_prefix() to use new strings.with_prefixes().
Aaron Marcuse-Kubitza
05:21 PM Revision 1680: strings.py: Added with_prefixes()
Aaron Marcuse-Kubitza
04:56 PM Revision 1679: digir_client: Made schema customizable
Aaron Marcuse-Kubitza
04:35 PM Revision 1678: digir_client: Set header sendTime, source dynamically. In debug mode, print the request XML.
Aaron Marcuse-Kubitza
04:03 PM Revision 1677: Added local_ip to get local IP address
Aaron Marcuse-Kubitza
03:48 PM Revision 1676: bin/map: Added prefixes support for XML inputs
Aaron Marcuse-Kubitza

03/28/2012

11:12 PM Revision 1675: digir_client: Filter by darwin:Kingdom=PLANTAE because presumably all records will have this. Don't debug-print URL.
Aaron Marcuse-Kubitza
11:07 PM Revision 1674: Added initial bin/digir_client
Aaron Marcuse-Kubitza
07:58 PM Revision 1673: Renamed timeout.py to timeouts.py. Renamed timeout_ vars to timeout.
Aaron Marcuse-Kubitza
07:52 PM Revision 1672: opts.py: get_env_var(): default defaults to None
Aaron Marcuse-Kubitza
06:35 PM Revision 1671: inputs/SpeciesLink: Accepted test outputs for new TAPIR download
Aaron Marcuse-Kubitza
06:03 PM Revision 1670: bin/tapir/tapir2flat.php: Output to specieslink.specimens.csv instead of specieslink.txt so that the output file can be used right away without renaming
Aaron Marcuse-Kubitza
05:52 PM Revision 1669: inputs/REMIB/src/nodes.make: Stop after a configurable # of empty responses (indicating no more nodes), instead of at a preset node ID, because there seem to be many more nodes than are listed on the web form
Aaron Marcuse-Kubitza

03/27/2012

11:10 PM Revision 1668: input.Makefile: import/rotate: Add "." before the date
Aaron Marcuse-Kubitza
11:08 PM Revision 1667: input.Makefile: Added targets for editing import: import/rotate, import/rm
Aaron Marcuse-Kubitza
09:41 PM Revision 1666: bin/tapir/tapir2flat.php: Fixed XML parsing to strip control chars so DOMDocument::loadXML() wouldn't complain about "PCDATA invalid Char value 8 in Entity", etc.
Aaron Marcuse-Kubitza
09:07 PM Revision 1665: main Makefile: php-Darwin: Added instruction to set PHPRC if needed
Aaron Marcuse-Kubitza
09:03 PM Revision 1664: Added inputs/SpeciesLink/src/tapir.make
Aaron Marcuse-Kubitza
09:03 PM Revision 1663: input.Makefile: `src/%: src/%.make`: Don't tee recipe's stderr to make's stderr, because long-running make_scripts usually will be tracked using `tail -f`
Aaron Marcuse-Kubitza
09:00 PM Revision 1662: input.Makefile: `src/%: src/%.make`: Name the log file using the make_script name instead of the output file name
Aaron Marcuse-Kubitza
08:31 PM Revision 1661: cat_csv: If dialect == None, ignore that file because it's empty
Aaron Marcuse-Kubitza
08:30 PM Revision 1660: csvs.py: stream_info(): If header_line == '', set dialect to None rather than trying (and failing) to auto-detect it
Aaron Marcuse-Kubitza
08:19 PM Revision 1659: input.Makefile: Use new sort_filenames to put multiple numbered sources in the correct order, dealing correctly with embedded numbers that aren't padded with leading zeros
Aaron Marcuse-Kubitza
08:18 PM Revision 1658: Added sort_filenames to sort a list of filenames, comparing embedded numbers numerically instead of lexicographically
Aaron Marcuse-Kubitza
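A minimal sketch of numeric-aware filename sorting as described in revision 1658. The project's sort_filenames is a standalone script; the function below only illustrates the comparison technique and its name and signature are assumptions.

```python
import re


def natural_key(filename):
    """Split a name into text and number parts so numbers compare numerically.

    re.split() with a capturing group alternates text, digits, text, ...,
    so parts at the same position always have the same type.
    """
    parts = re.split(r'(\d+)', filename)
    return [int(part) if part.isdigit() else part for part in parts]


def sort_filenames(filenames):
    """Return the filenames sorted with embedded numbers compared as integers."""
    return sorted(filenames, key=natural_key)
```

A plain lexicographic sort would place node.10 before node.2; with this key, node.2 comes first even without zero padding.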
07:18 PM Revision 1657: schemas/postgresql.conf: Decreased shared_buffers again because 4000MB wasn't far enough below the 4GB SHMMAX
Aaron Marcuse-Kubitza
07:16 PM Revision 1656: schemas/postgresql.conf: Expressed shared_buffers in MB, since decimal GB doesn't seem to work anymore on 9.1
Aaron Marcuse-Kubitza
07:14 PM Revision 1655: schemas/postgresql.conf: Decreased shared_buffers to 3.9GB, slightly less than SHMMAX
Aaron Marcuse-Kubitza
07:11 PM Revision 1654: schemas/postgresql.conf: Optimized again using same changes as were applied to 8.4 version
Aaron Marcuse-Kubitza
07:10 PM Revision 1653: schemas/postgresql.conf: Replaced with original 9.1 version
Aaron Marcuse-Kubitza
07:03 PM Revision 1652: schemas/postgresql.conf: Optimized using analogous settings as postgresql.nimoy.conf
Aaron Marcuse-Kubitza
06:43 PM Revision 1651: inputs/REMIB/src/nodes.make: Don't abort entire import on empty response, because an empty response is also returned for nodes that are temporarily down, not just nodes that don't exist (assumed to be after the highest numbered node). Instead, stop import after 150 nodes if user did not specify an explicit # nodes.
Aaron Marcuse-Kubitza
05:50 PM Revision 1650: inputs/REMIB/src/nodes.make: Abort prefix on empty response using break, rather than just done = True, to avoid running any more code except the finally block. Moved metadata row validation outside metadata row retrieval try-except block.
Aaron Marcuse-Kubitza
05:41 PM Revision 1649: inputs/REMIB/src/nodes.make: If a read times out, abort the entire node rather than just the prefix to avoid waiting 20 sec for each of 26*26 prefixes
Aaron Marcuse-Kubitza
05:40 PM Revision 1648: profiling.py ItersProfiler, exc.py ExPercentTracker: Only output fraction of rows with errors if self.iter_ct > 0, to avoid divide-by-zero error
Aaron Marcuse-Kubitza
04:55 PM Revision 1647: inputs/REMIB/src/nodes.make: Fixed bug where row count was output in the middle of the row processing code, instead of after the first row is processed and the row count incremented. This removes "Processed 0 row(s)" messages at the beginning of every prefix.
Aaron Marcuse-Kubitza
04:40 PM Revision 1646: inputs/REMIB/src/nodes.make: Support custom starting node ID and # nodes processed via env vars
Aaron Marcuse-Kubitza
04:29 PM Revision 1645: Renamed inputs/REMIB/src/nodes.all.0.header.specimens.csv to node.0.header.specimens.csv so it would sort correctly with the new output file names
Aaron Marcuse-Kubitza
04:27 PM Revision 1644: Renamed inputs/REMIB/src/nodes.all.specimens.csv.make to inputs/REMIB/src/nodes.make since it will not be used to generate nodes.all.specimens.csv. However, it can still be used with the `src/%.make` make target, but will generate a dummy empty output file "nodes".
Aaron Marcuse-Kubitza
04:21 PM Revision 1643: inputs/REMIB/src/nodes.all.specimens.csv.make: Write each node to a separate output file
Aaron Marcuse-Kubitza
04:00 PM Revision 1642: inputs/REMIB/src/nodes.all.specimens.csv.make: Raise InputException instead of AssertionError if invalid metadata row, so that it will be caught and printed instead of aborting the program
Aaron Marcuse-Kubitza
03:56 PM Revision 1641: inputs/REMIB/src/nodes.all.specimens.csv.make: Moved header reading code inside TimeoutException try-except block since read sometimes times out before the header is even read
Aaron Marcuse-Kubitza
03:55 PM Revision 1640: schemas/postgresql.nimoy.conf: Increased shared_buffers to 1.5GB since kernel.shmmax has been increased to 2GB
Aaron Marcuse-Kubitza

03/26/2012

11:07 PM Revision 1639: Renamed inputs/REMIB/src/remib_raw.0.header.specimens.txt to nodes.all.0.header.specimens.csv
Aaron Marcuse-Kubitza
10:57 PM Revision 1638: inputs/REMIB/src/nodes.all.specimens.csv.make: Increased read timeout
Aaron Marcuse-Kubitza
10:55 PM Revision 1637: inputs/REMIB/src/nodes.all.specimens.csv.make: Timeout stuck reads because sometimes nodes are offline, etc.
Aaron Marcuse-Kubitza
10:53 PM Revision 1636: exc.py: str_(): Strip trailing whitespace. print_ex(): Since str_() now strips trailing whitespace, strings.ensure_newl() is no longer necessary.
Aaron Marcuse-Kubitza
10:43 PM Revision 1635: streams.py: Added TimeoutInputStream and WrapStream. Changed StreamIter to use new WrapStream.
Aaron Marcuse-Kubitza
10:42 PM Revision 1634: Added timeout.py
Aaron Marcuse-Kubitza
10:25 PM Revision 1633: inputs/REMIB/src/nodes.all.specimens.csv.make: Download from all prefixes of all nodes. Stop when a node produces an empty response (not even an error), which indicates no more nodes. Changed status messages.
Aaron Marcuse-Kubitza
10:17 PM Revision 1632: input.Makefile: `src/%: src/%.make`: Append stderr to log file
Aaron Marcuse-Kubitza
09:21 PM Revision 1631: Added inputs/REMIB/src/nodes.all.specimens.csv.make to download REMIB data for all nodes
Aaron Marcuse-Kubitza
09:20 PM Revision 1630: Added streams.py for I/O, which contains StreamIter, TracedOutputStream, and LineCountOutputStream
Aaron Marcuse-Kubitza
09:20 PM Revision 1629: term.py: Added clear_line. Corrected file comment.
Aaron Marcuse-Kubitza
08:06 PM Revision 1628: Makefiles: Let subdir's Makefile decide whether to delete on error
Aaron Marcuse-Kubitza
08:05 PM Revision 1627: input.Makefile: Save partial outputs of aborted src make scripts
Aaron Marcuse-Kubitza
06:44 PM Revision 1626: input.Makefile: Fixed bug in `%: %.make` rule to use $< instead of $*
Aaron Marcuse-Kubitza
06:20 PM Revision 1625: mappings/DwC2-VegBIEN.specimens.csv: minimumElevationInMeters: Remove any "ca." prefix
Aaron Marcuse-Kubitza
06:19 PM Revision 1624: xml_func.py: _replace: Strip whitespace from the returned string
Aaron Marcuse-Kubitza
06:09 PM Revision 1623: csvs.py: Added TsvReader to support TSV quirks. Added reader_class(). reader_and_header(): Use reader_class() to automatically use TsvReader instead of csv.reader for TSVs. Added is_tsv() and use it where `dialect.delimiter == '\t'` was used.
Aaron Marcuse-Kubitza
06:06 PM Revision 1622: strings.py: Added extract_line_ending() and remove_line_ending(). ensure_newl(): Use new remove_line_ending(). Moved Parsing section to top since it is used by the other sections.
Aaron Marcuse-Kubitza
04:40 PM Revision 1621: csvs.py: stream_info(): Set dialect.quoting = csv.QUOTE_NONE for TSVs because they usually don't quote fields. Factored dialect detecting code into new function sniff().
Aaron Marcuse-Kubitza
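As an illustration of the TSV handling in revision 1621, the sketch below reads tab-separated data with quoting disabled, so that a stray quote character inside a field is kept as literal text. The read_tsv() helper is hypothetical; csv.reader, the delimiter parameter, and csv.QUOTE_NONE are standard library features.

```python
import csv
import io


def read_tsv(stream):
    """Parse tab-separated rows, treating quote characters as ordinary text."""
    return list(csv.reader(stream, delimiter='\t', quoting=csv.QUOTE_NONE))


# A field that begins with an unbalanced quote stays a single, literal field.
sample = io.StringIO('a\t"b\nc\td\n')
rows = read_tsv(sample)
```

With the default quoting mode, the leading quote would open a quoted field and swallow the following line.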
03:45 PM Revision 1620: input.Makefile: verify: Added reverify option, which can be turned off to prevent regenerating the verify/%.out file from the DB (which can be time-consuming), and instead just diff verify/%.out with verify/%.ref
Aaron Marcuse-Kubitza

03/24/2012

10:31 PM Revision 1619: count_error_rows: Allow input to be specified as last arg(s) in addition to as stdin
Aaron Marcuse-Kubitza
10:30 PM Revision 1618: exc.py: ExPercentTracker: When displaying fraction of iters that had errors, don't duplicate the iter_text ("row", etc.) in the numerator
Aaron Marcuse-Kubitza
10:27 PM Revision 1617: bin/map: Use new ExPercentTracker iter_num tracking to track distinct row #s with errors
Aaron Marcuse-Kubitza
10:27 PM Revision 1616: exc.py: ExPercentTracker: Track iter_nums of Exceptions as well, to distinguish how many distinct iters had errors
Aaron Marcuse-Kubitza
10:10 PM Revision 1615: Added bin/count_error_rows to count distinct rows with errors in `map` error messages
Aaron Marcuse-Kubitza
09:06 PM Revision 1614: input.Makefile: Changed "%.out: %.make" rule to "%: %.make" so that any file can be built from a corresponding .make file. This will allow flat files to be retrieved dynamically by running an associated .make file.
Aaron Marcuse-Kubitza
09:01 PM Revision 1613: xml_func.py: FormatException: Inherit from ExceptionWithCause instead of SyntaxError because a FormatException signals a different kind of error condition (related to the input value rather than the function syntax)
Aaron Marcuse-Kubitza
08:57 PM Revision 1612: xml_func.py: Renamed SyntaxException to SyntaxError because it's a user error signaling invalid mappings syntax
Aaron Marcuse-Kubitza
08:55 PM Revision 1611: xml_func.py: SyntaxException: Use ExceptionWithCause to combine msg and cause's msg because it now combines them on one line, which is needed for bin/error_stats to work properly
Aaron Marcuse-Kubitza
08:54 PM Revision 1610: exc.py: ExceptionWithCause: Prepend msg to cause's msg separated by ': ' instead of '\ncause: '
Aaron Marcuse-Kubitza
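Note on revision 1610: a minimal Python sketch of the one-line message format described there. This is an illustration only, not the project's exc.py; the constructor signature and the `cause` attribute are assumptions.

```python
class ExceptionWithCause(Exception):
    """Hypothetical sketch of an exception that wraps another exception."""

    def __init__(self, msg, cause=None):
        if cause is not None:
            # Join the wrapper's message and the cause's message with ': '
            # so that the combined message stays on a single line.
            msg = '%s: %s' % (msg, cause)
        Exception.__init__(self, msg)
        self.cause = cause


err = ExceptionWithCause('Invalid date', ValueError('month must be in 1..12'))
print(err)
```

Keeping the combined message on one line lets line-oriented tools count each error once.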
08:47 PM Revision 1609: xml_func.py: Changed SyntaxException to FormatException where the error was with the input data format rather than the mapping syntax
Aaron Marcuse-Kubitza
08:41 PM Revision 1608: mappings/VegX-VegBIEN.organisms.csv: slopeaspect: Apply new conversion _compass
Aaron Marcuse-Kubitza
08:40 PM Revision 1607: xml_func.py: Added _compass to convert a compass direction (N, NE, NNE, etc.) into a degree heading
Aaron Marcuse-Kubitza
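Note on revision 1607: one possible way to turn a compass direction into a degree heading, assuming the 16-point compass rose and degrees measured clockwise from north. The names `COMPASS_POINTS` and `compass2heading` are hypothetical and are not taken from angles.py or xml_func.py.

```python
# The 16 compass points in clockwise order, starting at north.
COMPASS_POINTS = ['N', 'NNE', 'NE', 'ENE', 'E', 'ESE', 'SE', 'SSE',
                  'S', 'SSW', 'SW', 'WSW', 'W', 'WNW', 'NW', 'NNW']


def compass2heading(direction):
    """Return the heading in degrees clockwise from north (N = 0)."""
    key = direction.strip().upper()
    try:
        index = COMPASS_POINTS.index(key)
    except ValueError:
        raise ValueError('Unknown compass direction: %r' % direction)
    # Each of the 16 points spans 360/16 = 22.5 degrees.
    return index * 360.0 / len(COMPASS_POINTS)
```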
08:38 PM Revision 1606: Added angles.py
Aaron Marcuse-Kubitza
07:37 PM Revision 1605: inputs/SpeciesLink/maps: Updated to use new TAPIR download
Aaron Marcuse-Kubitza
07:29 PM Revision 1604: input.Makefile: All targets can be specified with an optional trailing slash. This enables using tab completion to complete a target name which is also a subdir name, since tab completion appends a trailing slash.
Aaron Marcuse-Kubitza
07:23 PM Revision 1603: bin/tapir/tapir2flat.php: Fixed bug in row assembly where XML elements that weren't found were left out of the array, causing the columns to shift to the left
Aaron Marcuse-Kubitza
07:03 PM Revision 1602: xml_func.py: _map: Factored replacing code out into new function repl(), which can also be used by other XML funcs
Aaron Marcuse-Kubitza
06:46 PM Revision 1601: bin/tapir/tapir2flat.php: Turned off exiting after 3 successive failures, because it causes the import to abort and it doesn't seem to restart where it left off
Aaron Marcuse-Kubitza
03:41 PM Revision 1600: main Makefile: Added instructions to install PHP PEAR and HTTP_Request on Mac OS X
Aaron Marcuse-Kubitza
03:10 PM Revision 1599: Makefile: Added PHP section, which installs php-http-request
Aaron Marcuse-Kubitza
03:05 PM Revision 1598: Moved _archive/tapir2flatClient/trunk/client/ to bin/tapir/
Aaron Marcuse-Kubitza
03:03 PM Revision 1597: _archive/tapir2flatClient/trunk/client/tapir2flat.php: Upgraded to use fputcsv(). This should fix errors caused by embedded delimiters. configurableParams.php: Set default delimiter to ','.
Aaron Marcuse-Kubitza
02:42 PM Revision 1596: mappings/verify.specimens.sql: # species: Don't join at all on genus because DISTINCT is on the plantname_id rather than the plantname, which is already unique for a given genus because plantname_unique includes parent_id
Aaron Marcuse-Kubitza
02:39 PM Revision 1595: mappings/verify.specimens.sql: # species: Fixed to join separately on plantname_ancestor for genus and species
Aaron Marcuse-Kubitza
02:14 PM Revision 1594: input.Makefile: Moved log and trace files to new import subdir. Moved subdir-adding code from inputs/Makefile to input.Makefile.
Aaron Marcuse-Kubitza
01:49 PM Revision 1593: mappings/verify.specimens.sql: Updated for schema changes
Aaron Marcuse-Kubitza
01:36 PM Revision 1592: inputs/*: Added any missing standard subdirs
Aaron Marcuse-Kubitza
01:35 PM Revision 1591: inputs/Makefile: Added %/-add to re-add existing dirs
Aaron Marcuse-Kubitza
01:29 PM Revision 1590: inputs/Makefile: %-add: `svn mkdir` the datasource's standard subdirs
Aaron Marcuse-Kubitza

03/23/2012

06:52 PM Revision 1589: schemas/postgresql.nimoy.conf: Increased work_mem (for sorting) and maintenance_work_mem (for vacuum)
Aaron Marcuse-Kubitza
06:45 PM Revision 1588: schemas/postgresql.nimoy.conf: Reset shared_buffers to initial value 24MB because although kernel.shmmax is 32MB, only values up to 26MB seem to work
Aaron Marcuse-Kubitza
06:33 PM Revision 1587: schemas/postgresql.nimoy.conf: Set shared_buffers to SHMMAX
Aaron Marcuse-Kubitza
06:27 PM Revision 1586: Optimized schemas/postgresql.nimoy.conf
Aaron Marcuse-Kubitza
06:04 PM Revision 1585: Added schemas/postgresql.nimoy.conf
Aaron Marcuse-Kubitza
05:59 PM Revision 1584: bin/map: When profiling, print the profile_to destination file
Aaron Marcuse-Kubitza
05:53 PM Revision 1583: Added schemas/postgresql.conf
Aaron Marcuse-Kubitza
05:38 PM Revision 1582: xml_func.py: _date: When converting month name to number, wrap any ValueError in a SyntaxException
Aaron Marcuse-Kubitza
05:33 PM Revision 1581: xml_func.py: XML functions that assume their last argument is a value (_map, etc.): Use new helper function pop_value() to retrieve this value. Return None if value is None because this indicates the input is empty.
Aaron Marcuse-Kubitza
05:22 PM Revision 1580: xml_func.py: _date: Use format.str2int instead of int to convert date parts to int so that strange formatting will be parsed correctly
Aaron Marcuse-Kubitza
05:21 PM Revision 1579: format.py: clean_numeric(): Also fix some OCR errors
Aaron Marcuse-Kubitza
05:15 PM Revision 1578: filter_errors: Default to outputting only the first match
Aaron Marcuse-Kubitza
04:59 PM Revision 1577: xpath.py: Added append() to recursively append subpath to every leaf of a path tree. parse(): Use append() to fix bug in split path parsing where subpath was not added to every leaf of the tree, only the main leaf of the main branch and the main leaves of the other branches of the last element.
Aaron Marcuse-Kubitza
04:27 PM Revision 1576: exc.py: Changed to store multiple tracebacks in an exception, in case an exception is caught and re-raised inside an ExceptionWithCause wrapper. This preserves more of the traceback in this situation, because you get the ExceptionWithCause's traceback as well.
Aaron Marcuse-Kubitza
03:53 PM Revision 1575: input.Makefile: import: Removed verbose=1 because verbose mode is now automatically on (except in test mode)
Aaron Marcuse-Kubitza
03:52 PM Revision 1574: bin/map: verbose mode defaults to off in test mode and on otherwise
Aaron Marcuse-Kubitza
03:48 PM Revision 1573: bin/map: In verbose mode, print which input rows will be processed
Aaron Marcuse-Kubitza
03:40 PM Revision 1572: bin/map: n option: Defaults to 1 in test mode. Empty string "" is interpreted as None (previously n would have to be unset to specify None).
Aaron Marcuse-Kubitza
03:32 PM Revision 1571: bin/map: Added section comments to env var config retrieval. Reordered env var config retrieval to put DB config last, since these options are input-type specific and complex, and putting them first hides the more general other options.
Aaron Marcuse-Kubitza
03:31 PM Revision 1570: bin/map: Added section comments to env var config retrieval. Reordered env var config retrieval to put DB config last, since these options are input-type specific and complex, and putting them first hides the more general other options.
Aaron Marcuse-Kubitza
03:29 PM Revision 1569: inputs/SALVIAS*/maps/VegX.plots.csv: Updated _units for % -> decimal conversion to use new syntax
Aaron Marcuse-Kubitza
03:20 PM Revision 1568: inputs/SALVIAS*/maps/VegX.plots.csv: Updated _units for % -> decimal conversion to use new syntax
Aaron Marcuse-Kubitza
03:19 PM Revision 1567: xml_func.py: _units: If value can't be converted to float, wrap the ValueError in a SyntaxException
Aaron Marcuse-Kubitza
03:18 PM Revision 1566: units.py: convert(): Added support for unit conversions. Added initial unit conversion for % -> unitless. str2quantity(): Fixed regexp to match % as units. Set Quantity.__repr__ to quantity2str.
Aaron Marcuse-Kubitza
03:03 PM Revision 1565: units.py: convert(): Put "units == None" test after "quantity.units == units" test because a destination of no units might require a conversion for some input units (e.g. % -> unitless requires a division by 100)
Aaron Marcuse-Kubitza
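Note on revisions 1565 and 1566: a rough sketch of why the order of the tests matters in convert(). The `Quantity` tuple, the `CONVERSIONS` table, and the error type are assumptions made for illustration; only the ordering of the checks and the % -> unitless division by 100 come from the log messages.

```python
from collections import namedtuple

# Hypothetical stand-in for the units-tagged value.
Quantity = namedtuple('Quantity', ['value', 'units'])

# Hypothetical conversion table: (from_units, to_units) -> function.
CONVERSIONS = {('%', None): lambda value: value / 100.0}


def convert(quantity, units):
    """Convert quantity to the given units (None means unitless)."""
    if quantity.units == units:
        return quantity  # matching units pass through unchanged
    func = CONVERSIONS.get((quantity.units, units))
    if func is not None:
        return Quantity(func(quantity.value), units)
    # Only now treat "no destination units" as plain unit removal.
    # Testing this first would skip the % -> unitless division by 100.
    if units is None:
        return Quantity(quantity.value, None)
    raise ValueError('No conversion from %r to %r' % (quantity.units, units))
```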
02:51 PM Revision 1564: inputs/SALVIAS*/maps/VegX.organisms.csv: Habit: Ignore invalid values instead of generating a SyntaxException
Aaron Marcuse-Kubitza
02:47 PM Revision 1563: xml_dom.py: minidom modifications: Escape as many text strings as we use directly. This still leaves the tagName used by xml.dom.minidom.Element.writexml: It uses 'writer.write(indent+"<" + self.tagName)' and doesn't escape the tagName.
Aaron Marcuse-Kubitza
02:39 PM Revision 1562: xml_func.py: Made everything Unicode-safe by using strings.ustr instead of str
Aaron Marcuse-Kubitza
02:15 PM Task #369 (Resolved): get CTFS data dictionary
Aaron Marcuse-Kubitza
02:14 PM Task #384 (Resolved): prototype tree traversal algorithm
Aaron Marcuse-Kubitza
02:14 PM Task #385 (Resolved): implement mechanism to determine which specimenreplicates refer to the same specimen
Aaron Marcuse-Kubitza
12:48 PM Revision 1561: schemas/tree_cross-links.sql: Added comment for how to get the namedplace trigger from the provided plantname trigger
Aaron Marcuse-Kubitza
12:44 PM Revision 1560: vegbien.sql: Fixed bug in tree cross-link algorithm where recursion to descendants' ancestors did not use new to refer to the current node's plantname_id
Aaron Marcuse-Kubitza
12:39 PM Revision 1559: vegbien.sql: Fixed bug in tree cross-link algorithm to also insert ancestors for top-level nodes, because they now need an ancestor entry for themselves
Aaron Marcuse-Kubitza
12:28 PM Revision 1558: Added separate SQL file for tree cross-links code. A link to this can be e-mailed to people to review.
Aaron Marcuse-Kubitza
12:21 PM Revision 1557: vegbien.sql: Modified tree cross-link algorithm to add an "ancestor" for this node. This is useful for queries, because you don't have to separately test if the leaf node is the one you're looking for, in addition to that leaf node's ancestors.
Aaron Marcuse-Kubitza

03/22/2012

07:08 PM Revision 1556: README.TXT: Added instructions how to stop all running imports
Aaron Marcuse-Kubitza
06:59 PM Revision 1555: vegbien.sql: Added namedplace_update_ancestors and plantname_update_ancestors triggers to populate ancestor cross-links in new namedplace_ancestor and plantname_ancestor tables
Aaron Marcuse-Kubitza
06:07 PM Revision 1554: sql.py: insert() (and try_insert()): Added optional returning param to provide name of an inserted column (usually pkey) to return
Aaron Marcuse-Kubitza
05:41 PM Revision 1553: env_password: Print Usage message if run without initial "."
Aaron Marcuse-Kubitza
05:34 PM Revision 1552: Added bin/stop_imports to stop all running imports
Aaron Marcuse-Kubitza
05:33 PM Revision 1551: import_all: Print Usage message if it was run without initial "."
Aaron Marcuse-Kubitza
04:52 PM Revision 1550: Renamed import-all to import_all to match convention of using underscores
Aaron Marcuse-Kubitza
04:39 PM Revision 1549: inputs/CTFS: Added remaining non-data src files
Aaron Marcuse-Kubitza
04:35 PM Revision 1548: Added CTFS data dictionary inputs/CTFS/src/ctfs-comments_worksheet.xls
Aaron Marcuse-Kubitza
04:33 PM Revision 1547: import-all: Fixed to display the datasource name in the job name instead of 'make ${input}import &'
Aaron Marcuse-Kubitza

03/20/2012

11:13 PM Revision 1546: import-all: disown each new import process to ignore SIGHUP
Aaron Marcuse-Kubitza
11:06 PM Revision 1545: Added jobspecs to extract jobspecs (%#) from (possibly filtered) `jobs` output
Aaron Marcuse-Kubitza
11:05 PM Revision 1544: README.TXT: Changed `make import &` to `. bin/import-all`
Aaron Marcuse-Kubitza
11:05 PM Revision 1543: README.TXT: Changed `make import &` to `. bin/import-all`
Aaron Marcuse-Kubitza
10:39 PM Revision 1542: main Makefile: import: Before running imports, print message that `. bin/import-all` can be used to import all inputs at once
Aaron Marcuse-Kubitza
10:38 PM Revision 1541: Added import-all to import all inputs at once
Aaron Marcuse-Kubitza
10:20 PM Revision 1540: mappings/DwC2-VegBIEN.specimens.csv: Mapped establishmentMeans, which contains growthform, iscultivated, isnative, etc. combined
Aaron Marcuse-Kubitza
10:11 PM Revision 1539: inputs/SALVIAS-CSV/maps/VegX.organisms.csv: habit: Updated mapping to match equivalent SALVIAS mapping
Aaron Marcuse-Kubitza
10:10 PM Revision 1538: xml_func.py: _map: Instead of _closed special entry, make all maps closed by default and open them if special entry "*=*" is present. Support using a _map to filter values by interpreting special entry "*=" as removing all values not explicitly specified, and by interpreting special value "*" as keeping input value the same.
Aaron Marcuse-Kubitza
10:08 PM Revision 1537: xml_func.py: _map: Instead of _closed special entry, make all maps closed by default and open them if special entry "*=*" is present. Support using a _map to filter values by interpreting special entry "*=" as removing all values not explicitly specified, and by interpreting special value "*" as keeping input value the same.
Aaron Marcuse-Kubitza
09:19 PM Revision 1536: xml_func.py: _date: On error "month must be in 1..12", try swapping month and day
Aaron Marcuse-Kubitza
09:13 PM Revision 1535: xml_func.py: _date: On error "month must be in 1..12", try swapping month and day
Aaron Marcuse-Kubitza
08:36 PM Revision 1534: row: Support getting multiple rows. Document that it does *not* handle embedded newlines.
Aaron Marcuse-Kubitza
08:19 PM Revision 1533: mappings/Makefile: Removed no longer needed DwC-VegBIEN.specimens.no_empty.csv
Aaron Marcuse-Kubitza
08:18 PM Revision 1532: input.Makefile: Removed no longer needed $(join) command
Aaron Marcuse-Kubitza
08:15 PM Revision 1531: input.Makefile: Removed no longer needed src join maps
Aaron Marcuse-Kubitza
08:12 PM Revision 1530: input.Makefile: Generate VegBIEN maps from full via maps in order to include all input columns if a src map was provided. This causes the VegBIEN join process to produce *all* the "No join mapping" errors for that datasource, not just those for fields in the (non-full) via map. maps/src.join.*.csv should no longer be needed for producing "No join mapping" errors.
Aaron Marcuse-Kubitza
08:03 PM Revision 1529: mappings/Makefile: Generate DwC-VegBIEN.specimens.csv from new intermediate DwC.ci-VegBIEN.specimens.csv using $(removeEmpty) so that "No join mapping" errors will be reported when maps are joined to it. Deprecate DwC-VegBIEN.specimens.no_empty.csv because it's now identical to DwC-VegBIEN.specimens.csv.
Aaron Marcuse-Kubitza
07:45 PM Revision 1528: Added inputs/NY/maps/src.specimens.csv
Aaron Marcuse-Kubitza
07:41 PM Revision 1527: Added reverse_join to inner-join two map spreadsheets in the opposite order they are specified in
Aaron Marcuse-Kubitza
07:36 PM Revision 1526: input.Makefile: Intersect the generated VegBIEN and full via maps with the src map, if it exists. This reduces the size of the autogen maps significantly by including only the entries used by the datasource.
Aaron Marcuse-Kubitza
07:34 PM Revision 1525: intersect: Compare columns based on specified compare_col_nums, just like subtract
Aaron Marcuse-Kubitza
06:50 PM Revision 1524: input.Makefile: Use var $(selfMap) instead of spelling out $(bin)/cols 0 0
Aaron Marcuse-Kubitza
06:36 PM Revision 1523: mappings/DwC2-VegBIEN.specimens.csv: Mapped continent
Aaron Marcuse-Kubitza
06:20 PM Revision 1522: inputs/SpeciesLink/maps/DwC.specimens.csv: Mapped remaining fields
Aaron Marcuse-Kubitza
06:19 PM Revision 1521: inputs/SpeciesLink/maps/DwC.specimens.csv: Mapped remaining fields
Aaron Marcuse-Kubitza
06:08 PM Revision 1520: inputs/SpeciesLink/maps/src.specimens.csv: Fixed bug where prefixes had not been removed from fields, which prevented join mappings from being found for any of the fields
Aaron Marcuse-Kubitza
06:08 PM Revision 1519: main Makefile: Added missing_joins to determine which input fields are missing join mappings
Aaron Marcuse-Kubitza
05:47 PM Revision 1518: xml_func.py: SyntaxException: Inherit from exc.ExceptionWithCause so the traceback will be populated with the cause's traceback instead of the SyntaxException wrapper's traceback
Aaron Marcuse-Kubitza
05:35 PM Revision 1517: Added inputs/UNCC/test with accepted test outputs
Aaron Marcuse-Kubitza
05:35 PM Revision 1516: Added inputs/UNCC/maps
Aaron Marcuse-Kubitza
05:34 PM Revision 1515: xml_func.py: _date: month: Convert month names to numbers before casting everything to int
Aaron Marcuse-Kubitza
05:27 PM Revision 1514: xml_func.py: _date: Refactored to convert items to dict right away, and use iteritems() for later type conversion. This will enable month names to be converted before casting everything to int.
Aaron Marcuse-Kubitza
04:47 PM Revision 1513: mappings/Makefile: Sort mappings/DwC.self.specimens.csv so that entries can more easily be found when using it as a DwC terms reference
Aaron Marcuse-Kubitza

03/19/2012

09:55 PM Revision 1512: Added inputs/UNCC
Aaron Marcuse-Kubitza
09:50 PM Revision 1511: Added inputs/U/test with accepted test outputs
Aaron Marcuse-Kubitza
09:49 PM Revision 1510: inputs/U/maps/DwC.specimens.csv: Mapped most of the remaining fields
Aaron Marcuse-Kubitza
09:34 PM Revision 1509: input.Makefile: Clean up via maps when they change by subtracting the via format's self map from the via map (the comments column is ignored in determining which entries are redundant, and empty entries with a matching input column are also removed)
Aaron Marcuse-Kubitza
09:29 PM Revision 1508: subtract: Fixed bug where entries were removed even if maps were not combinable and ignore was off
Aaron Marcuse-Kubitza
09:27 PM Revision 1507: union: Fixed bug where combinable was not saved for use in deciding whether to add entries in map 1 that weren't already defined
Aaron Marcuse-Kubitza
09:25 PM Revision 1506: inputs/U/maps: Set svn props
Aaron Marcuse-Kubitza
09:20 PM Revision 1505: subtract: Also remove nonexplicit empty mappings whose input col is in map 1
Aaron Marcuse-Kubitza
09:15 PM Revision 1504: maps.py: Added is_nonexplicit_empty_mapping()
Aaron Marcuse-Kubitza
09:03 PM Revision 1503: subtract: Use new maps.combinable() to compare column headers, which allows more flexibility in combining maps
Aaron Marcuse-Kubitza
09:01 PM Revision 1502: union: Use new maps.combinable()
Aaron Marcuse-Kubitza
09:01 PM Revision 1501: maps.py: Added col_label() and combinable()
Aaron Marcuse-Kubitza
08:54 PM Revision 1500: union: Use new strings.overlaps()
Aaron Marcuse-Kubitza
08:53 PM Revision 1499: strings.py: Added overlaps()
Aaron Marcuse-Kubitza
08:46 PM Revision 1498: vegbien.sql: Fixed syntax error in taxonclass enum: missing comma at end of element
Aaron Marcuse-Kubitza
08:38 PM Revision 1497: inputs/*/maps/DwC.specimens.csv: Ran through `cols *` to standardize CSV format to that generated by Python
Aaron Marcuse-Kubitza
08:35 PM Revision 1496: cols: If column number of "*" given, get all columns
Aaron Marcuse-Kubitza
08:32 PM Revision 1495: bin/subtract: If no compare columns given, compare on all columns instead of column 0
Aaron Marcuse-Kubitza
08:31 PM Revision 1494: util.py: list_subset(): Support special idxs value None, which returns entire list
Aaron Marcuse-Kubitza
08:22 PM Revision 1493: cat_csv: Added support for using - to cat stdin
Aaron Marcuse-Kubitza
08:18 PM Revision 1492: Added inputs/U/maps
Aaron Marcuse-Kubitza
07:32 PM Revision 1491: Added inputs/U
Aaron Marcuse-Kubitza
07:29 PM Revision 1490: Put inputs/REMIB/src/remib_raw.0.header.specimens.txt under version control
Aaron Marcuse-Kubitza
07:24 PM Revision 1489: Added inputs/REMIB/test with accepted test outputs
Aaron Marcuse-Kubitza
07:22 PM Revision 1488: Added inputs/REMIB/maps
Aaron Marcuse-Kubitza
07:20 PM Revision 1487: inputs/NCU-NCSC/maps/DwC.specimens.csv: Removed State->StateProvince mapping because that is now in mappings/DwC1-DwC2.specimens.csv
Aaron Marcuse-Kubitza
07:13 PM Revision 1486: mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
Aaron Marcuse-Kubitza
06:51 PM Revision 1485: Added inputs/REMIB
Aaron Marcuse-Kubitza
06:09 PM Revision 1484: bin/map: Deal with fields that may be in the dataset under more than one prefix by getting all fields and coalesce()ing them (e.g. SpeciesLink has dwcore* and darwin1* columns for the same DwC field)
Aaron Marcuse-Kubitza
06:06 PM Revision 1483: util.py: Added coalesce()
Aaron Marcuse-Kubitza
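Note on revisions 1483 and 1484: a small sketch of the coalesce idea, assuming it returns the first value that is not None. The helper `get_prefixed`, the dict-shaped row, and the example prefixes are hypothetical; the real bin/map logic may differ.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def get_prefixed(row, name, prefixes):
    """Look up name under each prefix and keep the first value present."""
    candidates = [row.get(prefix + name) for prefix in prefixes]
    return coalesce(*candidates)


# Hypothetical row in which the same field appears under two prefixes.
row = {'a_genus': None, 'b_genus': 'Quercus'}
print(get_prefixed(row, 'genus', ['a_', 'b_']))
```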
05:40 PM Revision 1482: xpath_func.py: process(): Fixed bug where XPath elem's other_branches were not also processed
Aaron Marcuse-Kubitza
05:28 PM Revision 1481: row: Don't prepend header row because this feature prevents the program from being used on a pipeline. Sheets may be constructed in a pipeline if multiple segments need to be joined, e.g. with cat_csv.
Aaron Marcuse-Kubitza
05:09 PM Revision 1480: Added row to get a row of a spreadsheet, preceded by the header row
Aaron Marcuse-Kubitza
05:09 PM Revision 1479: bin programs: Fixed bug in Usage message where program name was not printed because unset variable $self was used instead of $0
Aaron Marcuse-Kubitza
05:08 PM Revision 1478: xml_func.py: _nullIf: types_by_name: Use strings.ustr instead of str to support Unicode values
Aaron Marcuse-Kubitza
04:40 PM Revision 1477: xml_func.py: _nullIf: If value not convertible, return it, because can't equal null. Refactored to store types by name in a dict instead of using if statements.
Aaron Marcuse-Kubitza
04:31 PM Revision 1476: units.py: convert(): raise MissingUnitsException if quantity doesn't have units. MissingUnitsException: Take Quantity input instead of str.
Aaron Marcuse-Kubitza
04:27 PM Revision 1475: inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.
Aaron Marcuse-Kubitza
04:26 PM Revision 1474: inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.
Aaron Marcuse-Kubitza
04:21 PM Revision 1473: xml_func.py: _map: empty map entry means None
Aaron Marcuse-Kubitza
04:10 PM Revision 1472: xml_func.py: _avg: Support empty inputs by returning None. Moved _range after _rangeStart/_rangeEnd since it's less frequently used.
Aaron Marcuse-Kubitza
04:07 PM Revision 1471: units.py: Restructured to use a Quantity object for the units-tagged value and conversion functions quantity2str() and str2quantity() to convert between that and a raw string. Added convert() with basic support for removing units and passing through matching units. xml_func.py: _units: Added "to" attr. VegBIEN mappings: Remove units using new _units "to" attr instead of temporary workaround in _units.
Aaron Marcuse-Kubitza
03:13 PM Revision 1470: xml_func.py: _units: default units attr renamed to default to clarify that it's not the units you're converting to
Aaron Marcuse-Kubitza
03:06 PM Revision 1469: xml_func.py: Added documentation labels to each section of XML functions
Aaron Marcuse-Kubitza
03:01 PM Revision 1468: Moved units-related functions from format.py to new units.py
Aaron Marcuse-Kubitza
02:55 PM Revision 1467: lib/*.py: Removed svn:executable property to turn execute bit off
Aaron Marcuse-Kubitza
02:45 PM Revision 1466: vegbien.sql: growthform (and taxonclass) enum: Added options suggested by Michael Lee. Removed "woody". establishmentmeans_dwc (and taxonclass) enum: Reordered to match order of taxonoccurrence boolean fields, and to place each option next to its opposite. taxonclass enum: Moved "woody" to bottom because it's no longer part of growthform.
Aaron Marcuse-Kubitza

03/18/2012

09:10 PM Revision 1465: VegBIEN mappings: distance fields: Remove units
Aaron Marcuse-Kubitza
09:08 PM Revision 1464: xml_func.py: _units: Allow value to be NULL
Aaron Marcuse-Kubitza
08:44 PM Revision 1463: xml_func.py: _units: Use new format.cleanup_units() to do units parsing
Aaron Marcuse-Kubitza
08:43 PM Revision 1462: format.py: Added clean_numeric(), str2int(), str2float(). Added units-related functions. Added documentation labels to each section.
Aaron Marcuse-Kubitza
06:42 PM Revision 1461: Added filter_errors to filter `map` error messages
Aaron Marcuse-Kubitza
06:40 PM Revision 1460: Renamed bin/errors_filter_* to filter_errors_* to sound more natural and to have a different prefix than error_stats so that both can easily be tab-completed at the command line
Aaron Marcuse-Kubitza
06:27 PM Revision 1459: README.TXT: Testing: Added instructions for testing just mapping process, just map spreadsheet generation, and everything
Aaron Marcuse-Kubitza
06:26 PM Revision 1458: root Makefile: Added test-all for most complete coverage. Removed extraneous ";" at the end of the prerequisites line of rules with a recipe.
Aaron Marcuse-Kubitza
06:02 PM Revision 1457: mappings/Makefile: Use new ci_map to make DwC.cs-VegBIEN.specimens.csv case-insensitive
Aaron Marcuse-Kubitza
06:02 PM Revision 1456: Added ci_map to make a map spreadsheet case-insensitive.
Aaron Marcuse-Kubitza
05:53 PM Revision 1455: mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.
Aaron Marcuse-Kubitza
05:52 PM Revision 1454: inputs/SpeciesLink: Switched to using flat files instead of DB
Aaron Marcuse-Kubitza
05:52 PM Revision 1453: inputs/MO: Switched to using flat files instead of DB
Aaron Marcuse-Kubitza
05:51 PM Revision 1452: mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.
Aaron Marcuse-Kubitza
04:55 PM Revision 1451: input.Makefile: Mapping: Support multiple segments of a source table flat file. Use with_cat_csv if flat file segment(s) are available; otherwise use the input file in $+ or the input database, if any. Don't look for an explicit CSV header file because it can now be handled as the first segment if appropriately named.
Aaron Marcuse-Kubitza
04:50 PM Revision 1450: Added with_cat_csv
Aaron Marcuse-Kubitza
04:50 PM Revision 1449: with_cat: Added support for custom cat command in env var
Aaron Marcuse-Kubitza
04:49 PM Revision 1448: cat_csv: Abort if output stream closed instead of exiting with an IOError
Aaron Marcuse-Kubitza
04:16 PM Revision 1447: cat_csv: Ignore any duplicated headers instead of requiring each CSV to have a header identical to the first. Rewrote to pass the CSVs through as lines rather than parsing each row. Because the CSVs are not parsed, checked that all CSVs have the same dialect.
Aaron Marcuse-Kubitza
04:14 PM Revision 1446: csvs.py: Added csv modifications to compare Dialect instances
Aaron Marcuse-Kubitza
04:13 PM Revision 1445: util.py: Added classes_eq()
Aaron Marcuse-Kubitza

03/16/2012

06:25 PM Revision 1444: csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().
Aaron Marcuse-Kubitza
06:23 PM Revision 1443: util.py: Added NamedTuple
Aaron Marcuse-Kubitza
06:04 PM Revision 1442: csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
Aaron Marcuse-Kubitza
05:38 PM Revision 1441: Renamed inputs/NYBG to inputs/NY to match herbarium code
Aaron Marcuse-Kubitza
05:35 PM Revision 1440: Renamed inputs/UNC-NCSC to inputs/NCU-NCSC to match herbarium code
Aaron Marcuse-Kubitza
05:32 PM Revision 1439: Renamed inputs/UArizona to inputs/ARIZ to match herbarium code
Aaron Marcuse-Kubitza
05:31 PM Revision 1438: Regenerated inputs/MO/maps/src.join.specimens.csv
Aaron Marcuse-Kubitza
05:26 PM Revision 1437: Renamed inputs/MOBOT to inputs/MO to match herbarium code
Aaron Marcuse-Kubitza
05:11 PM Revision 1436: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
05:08 PM Revision 1435: vegbien.sql: taxonoccurrence: Added cultivatedbasis
Aaron Marcuse-Kubitza
05:03 PM Revision 1434: vegbien.sql: Moved all accessioncode fields to the bottom of their tables. vegbien.ERD.mwb: Adjusted lines to remove overlaps.
Aaron Marcuse-Kubitza
04:52 PM Revision 1433: vegbien.sql: taxonoccurrence: Added iscultivated, isnative. Moved accessioncode to bottom.
Aaron Marcuse-Kubitza
04:36 PM Revision 1432: vegbien.sql: Changed taxonoccurrence.growthform type to more specific growthform
Aaron Marcuse-Kubitza
04:34 PM Revision 1431: vegbien.sql: Added growthform and establishmentmeans_dwc enums using values from taxonclass. Documented that taxonclass is growthform + establishmentmeans_dwc + some other values.
Aaron Marcuse-Kubitza
04:22 PM Revision 1430: VegBIEN: Moved aggregateoccurrence.growthform to taxonoccurrence
Aaron Marcuse-Kubitza
04:21 PM Revision 1429: Added inputs/UNC-NCSC/maps/src.join.specimens.csv
Aaron Marcuse-Kubitza
04:15 PM Revision 1428: VegBIEN: Merged aggregateoccurrence.verbatimcollectorname and specimenreplicate.verbatimcollectorname into taxonoccurrence
Aaron Marcuse-Kubitza
03:58 PM Revision 1427: xml_func.py: parse_range(): Handle negative numbers by treating them as not a range
Aaron Marcuse-Kubitza
03:31 PM Revision 1426: Added inputs/UNC-NCSC/test with initial accepted test outputs
Aaron Marcuse-Kubitza
03:31 PM Revision 1425: Added inputs/UNC-NCSC/maps
Aaron Marcuse-Kubitza
03:31 PM Revision 1424: xml_func.py: _replace: Fixed bug where value entry was not unpacked
Aaron Marcuse-Kubitza
02:59 PM Task #387 (New): count how many duplicates between Canadensys and GBIF
Aaron Marcuse-Kubitza
02:59 PM Task #386 (Resolved): load Canadensys data
http://data.canadensys.net/ipt/
Aaron Marcuse-Kubitza
02:58 PM Task #385 (Resolved): implement mechanism to determine which specimenreplicates refer to the same specimen
* or are the same record, from different data sources
Aaron Marcuse-Kubitza
02:58 PM Task #384 (Resolved): prototype tree traversal algorithm
Aaron's alternative algorithm which cross-links each node to its ancestors using a many:many table
Aaron Marcuse-Kubitza
02:56 PM Task #383 (New): convert VegBank data dictionary to database comments
* VegBank data dictionary source code is in svn at https://code.ecoinformatics.org/code/vegbank/trunk/docs/xml/db_mod...
Aaron Marcuse-Kubitza
12:36 PM Revision 1423: Added inputs/UNC-NCSC
Aaron Marcuse-Kubitza

03/15/2012

07:12 PM Revision 1422: Added inputs/MOBOT/test with initial accepted test outputs
Aaron Marcuse-Kubitza
07:11 PM Revision 1421: Added inputs/MOBOT/maps
Aaron Marcuse-Kubitza
06:51 PM Revision 1420: Added inputs/MOBOT
Aaron Marcuse-Kubitza
06:41 PM Revision 1419: VegX mappings: Updated plot place mappings to VegX 1.5.3 method of place type-tagged place names. This removes the userdef fields in plot.
Aaron Marcuse-Kubitza
06:18 PM Revision 1418: VegX mappings: Changed userdef xPosition, yPosition to /relativePlotPosition/relativeX, /relativePlotPosition/relativeY
Aaron Marcuse-Kubitza
06:16 PM Revision 1417: Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv
Aaron Marcuse-Kubitza
05:36 PM Revision 1416: bin/map: map_table(): wrap_row(): Use util.list_as_length() to handle CSV rows of different lengths
Aaron Marcuse-Kubitza
05:35 PM Revision 1415: util.py: Added list_as_length(). Documented that list_set_length() takes a list, not a tuple. Documented that ListDict must have len(list_) == len(keys).
Aaron Marcuse-Kubitza
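
A rough sketch of how the two helpers named in revisions 1414 and 1415 might fit together when CSV rows have different lengths. The signatures, the fill parameter, and the pad-with-None and truncate behavior are assumptions. The log only states that list_set_length() takes a list rather than a tuple, which is consistent with it modifying its argument in place.

```python
def list_set_length(list_, length, fill=None):
    """Pad or truncate list_ in place to exactly `length` items (assumed behavior)."""
    if len(list_) < length:
        list_.extend([fill] * (length - len(list_)))
    else:
        del list_[length:]


def list_as_length(seq, length, fill=None):
    """Return a new list of exactly `length` items, leaving seq untouched."""
    list_ = list(seq)  # copy, so tuples and other sequences are accepted
    list_set_length(list_, length, fill)
    return list_
```

With helpers like these, a short row can be padded to the header's length before it is zipped with the column names, and an overlong row can be cut down to it.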
05:19 PM Revision 1414: util.py: Added list_set_length(). Changed list_set() to use list_set_length().
Aaron Marcuse-Kubitza

03/13/2012

07:48 PM Revision 1413: mappings/DwC2-VegBIEN.specimens.csv: Added empty *_id/taxonoccurrence attr to primary keys to ensure that a taxonoccurrence is always created for the specimenreplicate
Aaron Marcuse-Kubitza
07:41 PM Revision 1412: xml_func.py: _label: Use ustr instead of str when checking types
Aaron Marcuse-Kubitza
07:41 PM Revision 1411: csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default
Aaron Marcuse-Kubitza
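
The doublequote fix in revision 1411 can be illustrated with Python's standard csv module. This is a sketch in Python 3, not the contents of csvs.py: the function body and the candidate delimiter list are assumptions, and only the name reader_and_header() comes from the log. When the sniffed sample contains no quoted fields, which is typical of a header line, csv.Sniffer leaves doublequote off, so it is switched on explicitly.

```python
import csv
import io


def reader_and_header(stream, delimiters=",\t;|"):
    """Sniff the dialect from the header line; return (reader, header)."""
    header_line = stream.readline()
    # Sniffer.sniff() returns a Dialect subclass that can be adjusted.
    dialect = csv.Sniffer().sniff(header_line, delimiters=delimiters)
    # A header rarely contains quotes, so the sniffer cannot detect that
    # embedded quotes are escaped by doubling them. Turn this on explicitly.
    dialect.doublequote = True
    header = next(csv.reader([header_line], dialect))
    return csv.reader(stream, dialect), header


# Usage: a tab-separated input whose data row contains doubled quotes.
sample = io.StringIO('name\tnotes\n1\t"say ""hi"""\n')
rows, header = reader_and_header(sample)
data = list(rows)
```

Here the doubled quotes in the second column are read back as single quote characters inside one field.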
07:23 PM Revision 1410: Merged inputs/NYBG-CSV into NYBG
Aaron Marcuse-Kubitza
07:16 PM Revision 1409: Merged inputs/UArizona-CSV into UArizona
Aaron Marcuse-Kubitza
07:02 PM Revision 1408: Added inputs/SpeciesLink/test
Aaron Marcuse-Kubitza
07:02 PM Revision 1407: Added inputs/SpeciesLink/maps
Aaron Marcuse-Kubitza
07:02 PM Revision 1406: xml_func.py: range-related funcs: Made inputs optional in case they get set to NULL by _nullIf
Aaron Marcuse-Kubitza
06:48 PM Revision 1405: mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
Aaron Marcuse-Kubitza
06:31 PM Revision 1404: bin/map: Added support for getting columns with an optional prefix list for DB/CSV inputs
Aaron Marcuse-Kubitza
06:21 PM Revision 1403: bin/map: Factored out code common to DB and CSV inputs into map_table()
Aaron Marcuse-Kubitza
06:00 PM Revision 1402: bin/map: Parse any prefixes in map input column name. They will later be used to check for versions of columns with a prefix added when processing CSV/DB inputs.
Aaron Marcuse-Kubitza
05:58 PM Revision 1401: strings.py: Added split(), remove_prefix(), remove_suffix(), and remove_prefixes(). Added section comments.
Aaron Marcuse-Kubitza
05:06 PM Revision 1400: mappings/DwC2-VegBIEN.specimens.csv: minimumElevationInMeters: Handle embedded ranges using _rangeStart and _rangeEnd
Aaron Marcuse-Kubitza
05:05 PM Revision 1399: xml_func.py: Added _rangeStart and _rangeEnd
Aaron Marcuse-Kubitza
05:04 PM Revision 1398: xpath.py: parse(): Split paths: Raise a SyntaxException if can't attach a split path because there is no parent element to attach to
Aaron Marcuse-Kubitza
05:02 PM Revision 1397: Parser.py: Renamed _syntax_err() to syntax_err() to make it a public method
Aaron Marcuse-Kubitza
04:38 PM Revision 1396: mappings/DwC2-VegBIEN.specimens.csv: Mapped fieldNotes and taxonRemarks to description using _merge. inputs/UArizona*/maps/DwC.specimens.csv: Mapped Remarks to taxonRemarks, which now has a VegBIEN mapping.
Aaron Marcuse-Kubitza
04:24 PM Revision 1395: Added inputs/GBIF/src with small files that can be under version control
Aaron Marcuse-Kubitza
04:23 PM Revision 1394: input.Makefile: svn_props: Ignore everything in the src/ subdir that hasn't been explicitly checked in
Aaron Marcuse-Kubitza
04:18 PM Revision 1393: Added inputs/GBIF/test with accepted test outputs
Aaron Marcuse-Kubitza
04:18 PM Revision 1392: Added inputs/GBIF/maps
Aaron Marcuse-Kubitza
04:17 PM Revision 1391: Regenerated inputs/UArizona*/maps VegBIEN maps
Aaron Marcuse-Kubitza
04:13 PM Revision 1390: Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv
Aaron Marcuse-Kubitza
04:09 PM Revision 1389: bin/map: Use new csvs.reader_and_header() to support CSVs/TSVs with other than the default Excel dialect
Aaron Marcuse-Kubitza
04:08 PM Revision 1388: Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line
Aaron Marcuse-Kubitza
04:07 PM Revision 1387: join: Don't append suffix to empty output mappings, so that they stay empty ("NULL")
Aaron Marcuse-Kubitza
04:00 PM Revision 1386: input.Makefile: Added tsv to $(exts). Strip extra whitespace from $(inputs) so that it's the empty string if $(<in) (and $(<in).header) don't exist, and can be used in $(if ...).
Aaron Marcuse-Kubitza

03/12/2012

07:08 PM Revision 1385: input.Makefile: Fixed bug in inputFiles wildcard where extensions were manually listed instead of dynamically determined from the $(exts) config var
Aaron Marcuse-Kubitza
06:56 PM Revision 1384: README.TXT: Tell user to `disown -h %1` after running `make import &` so that it won't be sent a SIGHUP if the user logs out
Aaron Marcuse-Kubitza
06:55 PM Revision 1383: README.TXT: Tell user to `disown -h %1` after running `make import &` so that it won't be sent a SIGHUP if the user logs out
Aaron Marcuse-Kubitza
06:39 PM Revision 1382: input.Makefile: Prepend separate CSV header when available
Aaron Marcuse-Kubitza
06:24 PM Revision 1381: input.Makefile: Use with_cat in map to later support prepending separate CSV headers
Aaron Marcuse-Kubitza
06:21 PM Revision 1380: Added with_cat to run a command, taking input from the concatenation of files
Aaron Marcuse-Kubitza
05:48 PM Revision 1379: input.Makefile: Set mapEnv if $(dbEngine) is set, to eventually support pre-existing DB connections
Aaron Marcuse-Kubitza
05:14 PM Revision 1378: input.Makefile: Changed $(dbFile) to $(dbExport) to make it unambiguous that it refers to a SQL export, not a pre-existing DB, which will be supported later
Aaron Marcuse-Kubitza
05:10 PM Revision 1377: input.Makefile: Added .txt to list of input file extensions
Aaron Marcuse-Kubitza
04:34 PM Revision 1376: Added inputs/SpeciesLink
Aaron Marcuse-Kubitza
03:57 PM Revision 1375: root Makefile: python-Linux: Added pymetrics
Aaron Marcuse-Kubitza
03:54 PM Revision 1374: bin/map: Consider \N to be None
Aaron Marcuse-Kubitza
03:49 PM Revision 1373: util.py: none_if(): Allow multiple none_vals using varargs
Aaron Marcuse-Kubitza
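
A minimal sketch of a varargs none_if() as described in revision 1373. The exact signature in util.py is an assumption. The two placeholder values in the usage lines come from this log: the empty string, and the \N marker that bin/map treats as None.

```python
def none_if(value, *none_vals):
    """Return None if value equals any of none_vals, else value unchanged."""
    for none_val in none_vals:
        if value == none_val:
            return None
    return value


# Usage: treat both the empty string and the "\N" marker as missing values.
cleaned = [none_if(v, "", "\\N") for v in ["Quercus", "", "\\N", "0"]]
```

Values that merely look falsy, such as "0", pass through unchanged because the comparison is by equality with the listed placeholders only.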
03:36 PM Revision 1372: Added inputs/GBIF
Aaron Marcuse-Kubitza
03:28 PM Revision 1371: exc.py: Fixed bug in traceback-saving mechanism that didn't deal with nested Exceptions (such as Exceptions with causes in ExceptionWithCause). Renamed add_exc_info() to add_traceback() since we really only need to store the traceback.
Aaron Marcuse-Kubitza
12:41 PM Revision 1370: dates.py: parse_date_range(): Fixed bug where the date parts were not joined back together into a string for each date range element. Use strings.single_space() after the date has been split into range parts so that whitespace around the range separator is removed instead of being replaced with a single space.
Aaron Marcuse-Kubitza
12:25 PM Revision 1369: xml_func.py: process(): Also catch XML func internal errors to assist in debugging. Use new exc.add_exc_info() to save traceback in case later code throws exception, overwriting exc_info().
Aaron Marcuse-Kubitza
12:23 PM Revision 1368: exc.py: str_(): Add the traceback at the end of the exception string. Added add_exc_info() and get_exc_info() for providing traceback info for str_().
Aaron Marcuse-Kubitza

03/11/2012

07:33 PM Revision 1367: mappings/DwC2-VegBIEN.specimens.csv: eventDate, dateIdentified: Use _dateRangeStart and _dateRangeEnd
Aaron Marcuse-Kubitza
07:32 PM Revision 1366: xml_func.py: Added _dateRangeStart and _dateRangeEnd
Aaron Marcuse-Kubitza
07:32 PM Revision 1365: dates.py: Added parse_date_range() and helper funcs could_be_year() and could_be_day()
Aaron Marcuse-Kubitza
07:31 PM Revision 1364: strings.py: Added single_space()
Aaron Marcuse-Kubitza
06:12 PM Revision 1363: inputs/UArizona*: Map the ScientificNameAuthor to the binomial instead since it contains the binomial in addition to the authority
Aaron Marcuse-Kubitza
05:28 PM Revision 1362: Added inputs/UArizona-CSV/test
Aaron Marcuse-Kubitza
05:23 PM Revision 1361: input.Makefile: Use .PRECIOUS to save outputs of failed tests so they can be accepted (needed now that .DELETE_ON_ERROR is turned on globally)
Aaron Marcuse-Kubitza
05:14 PM Revision 1360: bin/map: Moved string-cleanup code from get_value() to cleanup(), called by process_row(). process_row() now cleans up the string before checking if it's None, because cleanup() uses none_if() to map "" to None.
Aaron Marcuse-Kubitza
05:12 PM Revision 1359: util.py: Added do_ignore_none()
Aaron Marcuse-Kubitza
04:25 PM Revision 1358: Added inputs/UArizona-CSV/verify
Aaron Marcuse-Kubitza
04:24 PM Revision 1357: Added inputs/UArizona-CSV/maps
Aaron Marcuse-Kubitza
04:23 PM Revision 1356: mappings/DwC2-VegBIEN.specimens.csv: Mapped coordinateUncertaintyInMeters to the same place as coordinatePrecision (input sources generally use only one of these columns, which is most likely the accuracy regardless of what it's named)
Aaron Marcuse-Kubitza
04:18 PM Revision 1355: join: In error message when map column names don't match, include the actual column names
Aaron Marcuse-Kubitza
04:17 PM Revision 1354: Makefiles: Added .DELETE_ON_ERROR to delete target if recipe fails
Aaron Marcuse-Kubitza
03:18 PM Revision 1353: VegBIEN mappings: plantnames: Nest taxons hierarchically using plantname.parent_id. Mappings using _forEach: Append a "," to the `in` list so that mappings will sort from shortest to longest `in` list ("]" comes after "," in ASCII, causing this not to happen without the trailing ",").
Aaron Marcuse-Kubitza
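
The sorting effect described in revision 1353 can be checked with plain string comparison. The path strings below are hypothetical and do not reproduce the real _forEach syntax. Only the bracket-and-comma shape of the `in` list matters.

```python
# "," is ASCII 44 and "]" is ASCII 93, so without a trailing comma the
# longer list sorts first: "...genus,species]" < "...genus]".
without_comma = ["in:[genus]", "in:[genus,species]"]

# With a trailing comma, the shorter list ends in ",]" where the longer one
# continues with ",s...". "]" (93) sorts before lowercase letters (97-122),
# so the shorter list now comes first. This relies on the next item starting
# with a character that sorts after "]".
with_comma = ["in:[genus,]", "in:[genus,species,]"]

sorted_without = sorted(without_comma)
sorted_with = sorted(with_comma)
```

Here sorted_without places the two-item list first, while sorted_with runs from the shortest list to the longest.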
03:14 PM Revision 1352: xpath.py: parse(): _paths(): Remove trailing ","
Aaron Marcuse-Kubitza
02:38 PM Revision 1351: xpath_func.py: _forEach: Made syntax more natural-looking by using values instead of names for string args and attrs instead of branches for array args
Aaron Marcuse-Kubitza
02:36 PM Revision 1350: xpath.py: parse(): Fixed bug in _paths() where empty lists would be parsed as a list containing a single empty path, instead of as an empty list
Aaron Marcuse-Kubitza
01:26 PM Revision 1349: VegBIEN mappings: Place names: Use _forEach to simplify XPaths for recursively nested places
Aaron Marcuse-Kubitza
01:22 PM Revision 1348: bin/map: In debug mode, print output XPaths
Aaron Marcuse-Kubitza

03/09/2012

07:51 PM Revision 1347: xpath_func.py: _forEach: Fixed to support _val replacements anywhere, by doing a string-based search-and-replace on a quoted XPath instead of a list-based search-and-replace on an already-parsed XPath
Aaron Marcuse-Kubitza
07:41 PM Revision 1346: xpath_func.py: Renamed _for to _forEach. Finished implementing _forEach.
Aaron Marcuse-Kubitza
07:41 PM Revision 1345: xpath.py: Import xpath_func after defining XpathElem because xpath_func depends on XpathElem and it hasn't yet been factored into a separate file
Aaron Marcuse-Kubitza
07:39 PM Revision 1344: util.py: Added list_replace()
Aaron Marcuse-Kubitza
07:14 PM Revision 1343: xpath_func.py: Changed XPath function signature to take arguments (args, path), and process() to parse out the args. Implemented basic _for that repeats its do arg as many times as there are in_ elements.
Aaron Marcuse-Kubitza
06:44 PM Revision 1342: xpath.py: parse(): Run xpath_func.process() on the parsed XPath
Aaron Marcuse-Kubitza
06:43 PM Revision 1341: Added xpath_func.py for XPath "function" elements that transform their subpaths
Aaron Marcuse-Kubitza
06:23 PM Revision 1340: VegBIEN mappings: Removed no longer needed taxondetermination.determinationtype values, because they can be determined from the new role closed list
Aaron Marcuse-Kubitza
06:19 PM Revision 1339: filter_ERD.csv: Removed no longer needed references to role
Aaron Marcuse-Kubitza
06:18 PM Revision 1338: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
06:17 PM Revision 1337: VegBIEN: Changed role table to a closed list
Aaron Marcuse-Kubitza
06:14 PM Revision 1336: PostgreSQL-MySQL.csv: custom types: Consider everything except a set of accepted types to be a custom type
Aaron Marcuse-Kubitza
05:40 PM Revision 1335: VegBIEN: taxonrank enum: Made values lowercase to match case convention in other enums
Aaron Marcuse-Kubitza
05:33 PM Revision 1334: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
05:32 PM Revision 1333: vegbien.sql: Renamed plantconceptscope to plantnamescope because it's now attached to plantname
Aaron Marcuse-Kubitza
05:26 PM Revision 1332: vegbien.sql: Moved parent_id from plantconcept to plantname, since plantnames themselves are unique according to their parent taxons (a species under one genus is not the same as a species under another genus)
Aaron Marcuse-Kubitza
05:03 PM Revision 1331: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
04:59 PM Revision 1330: vegbien.ERD.mwb: Fixed lines
Aaron Marcuse-Kubitza
04:57 PM Revision 1329: vegbien.sql: Moved scope_id from plantconcept to plantname, since plantnames themselves are scoped, not just the plantconcepts that use them (e.g. "sp. 1" has different meanings in different scopes, so it should not be shared between scopes). plantname: Added accessioncode.
Aaron Marcuse-Kubitza
04:38 PM Revision 1328: vegbien.sql: Moved plantconcept parent_id from plantstatus to plantconcept. plantconcept: Removed datasource-specific fields to make it globally unique (one plantconcept for each assigned parent taxon of a plantname, of which there will usually be just one)
Aaron Marcuse-Kubitza
04:22 PM Revision 1327: vegbien.sql: plantname: Removed datasource-specific fields to make this a globally-unique table (the datasource-specific fields belong in plantconcept)
Aaron Marcuse-Kubitza
04:16 PM Revision 1326: Added inputs/UArizona/verify
Aaron Marcuse-Kubitza
04:15 PM Revision 1325: mappings/verify.specimens.sql: Updated for schema changes
Aaron Marcuse-Kubitza
04:06 PM Revision 1324: vegbien.sql: placerank enum: Added "village"
Aaron Marcuse-Kubitza
04:00 PM Revision 1323: VegBIEN mappings: lat/long locationdetermination: Removed [!namedplace_id] key so that it's merged into the namedplace locationdetermination
Aaron Marcuse-Kubitza
03:54 PM Revision 1322: VegBIEN mappings: Changed namedplace mappings to use new nested format for storing place containment relationships
Aaron Marcuse-Kubitza
03:44 PM Revision 1321: xml_func.py: Added _simplifyPath
Aaron Marcuse-Kubitza
03:25 PM Revision 1320: xpath.py: Added get_1()
Aaron Marcuse-Kubitza
02:50 PM Revision 1319: vegbien.sql: namedplace: Removed parent_id from unique constraint because some data might be missing intervening links (e.g. state for a county, country), but the place (e.g. county) should still be attached to the existing place of the same name and rank (which will hopefully already have the correct parent_id link)
Aaron Marcuse-Kubitza
02:46 PM Revision 1318: vegbien.sql: namedplace: Made rank required
Aaron Marcuse-Kubitza
02:33 PM Revision 1317: vegbien.sql: namedplace: Removed no longer needed placesystem, which has been replaced by rank closed list
Aaron Marcuse-Kubitza
02:30 PM Revision 1316: VegBIEN mappings: Map namedplaces using new rank field
Aaron Marcuse-Kubitza
02:25 PM Revision 1315: vegbien.sql: namedplace: Added rank. Do duplicate elimination using rank and parent_id instead of placesystem
Aaron Marcuse-Kubitza
02:20 PM Revision 1314: vegbien.sql: placerank: Standardized names to DwC/GML
Aaron Marcuse-Kubitza
01:58 PM Task #378 (New): create automated feedback mechanism
* triggered when an import is run
Aaron Marcuse-Kubitza
01:57 PM Task #377 (Resolved): ask NYBG for direct access to server
Aaron Marcuse-Kubitza
01:06 PM Revision 1313: vegbien.sql: Added placerank enum
Aaron Marcuse-Kubitza
12:35 PM Revision 1312: vegbien.sql: namedplace: Removed VegBank internal fields and datasource scoping fields (namedplaces are globally unique). Added parent_id to point to containing namedplace.
Aaron Marcuse-Kubitza
12:21 PM Revision 1311: xml_func.py: Added _dateRangePart with partial implementation (only works on strings with no range)
Aaron Marcuse-Kubitza
12:20 PM Revision 1310: DwC mappings: Moved date _date filter outside _alt so it would run only on the string that was actually chosen, and not produce date format errors when a pre-parsed year/month/day is already available
Aaron Marcuse-Kubitza

03/08/2012

06:30 PM Revision 1309: xml_func.py: _date: Map date with only empty fields to NULL (occurs when all fields were e.g. 0 and were filtered to NULL by _nullIf)
Aaron Marcuse-Kubitza
06:00 PM Revision 1308: xml_func.py: _date: Removed mapping year/month/day of 0 to NULL because that is now handled on a case-by-case basis in the mappings
Aaron Marcuse-Kubitza
05:58 PM Revision 1307: mappings/DwC1-DwC2.specimens.csv: Map year/month/day of 0 to NULL
Aaron Marcuse-Kubitza
05:13 PM Revision 1306: inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Fixed syntax error in growthForm map
Aaron Marcuse-Kubitza
05:11 PM Revision 1305: inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Removed input values from growthForm map that Brad said were invalid
Aaron Marcuse-Kubitza
05:10 PM Revision 1304: xml_func.py: _map: Added option to make map a closed list
Aaron Marcuse-Kubitza
04:56 PM Revision 1303: mappings/DwC2-VegBIEN.specimens.csv: Fixed waterdepth mappings to use _avg
Aaron Marcuse-Kubitza

03/06/2012

06:48 PM Revision 1302: mappings/verify.specimens.sql: Use ORDER BY ... NULLS FIRST to match MySQL
Aaron Marcuse-Kubitza
06:42 PM Revision 1301: input.Makefile: verify: Time the verification since it can take a long time
Aaron Marcuse-Kubitza
06:34 PM Revision 1300: specimens verification: Added duplicate catalog numbers test
Aaron Marcuse-Kubitza
06:27 PM Revision 1299: map: On nimoy, use bien2_staging unless otherwise specified
Aaron Marcuse-Kubitza
06:21 PM Revision 1298: specimens verification: Added # counties test
Aaron Marcuse-Kubitza
05:34 PM Revision 1297: specimens verification: Added collection codes and # catalog numbers tests
Aaron Marcuse-Kubitza
05:33 PM Revision 1296: inputs/SALVIAS/maps/VegX.organisms.csv: Mapped custom Habit values not listed in the SALVIAS data dictionary
Aaron Marcuse-Kubitza
05:32 PM Revision 1295: strings.py: Added unicode_reader for later use in handling Unicode characters in map spreadsheets
Aaron Marcuse-Kubitza
03:45 PM Revision 1294: xpath.py: Removed unnecessary copy.deepcopy()'s and instead changed set_value() and set_id() to make copies of any elements they change. This should result in up to a 17% speed increase in the import, because deepcopy() was taking a lot of time. Added documentation to set_value() and set_id() that caller must make a shallow copy of the path to prevent modifications from propagating to other copies of the path. (Previously, a deep copy was needed, but there was no comment specifying this.)
Aaron Marcuse-Kubitza
03:40 PM Revision 1293: mappings/VegX-VegBIEN.organisms.csv: Removed unneeded lookahead assertions from stemtag mappings. They relied on a bug ("feature"?) in the XPath engine that made the value of the lookahead assertion's path the same as the value of the main path, even though the value is set after the path is parsed.
Aaron Marcuse-Kubitza
02:45 PM Revision 1292: xml_func.py: _date: For year/month/day dates, require the year (it would not make sense to default to a particular year)
Aaron Marcuse-Kubitza
01:29 PM Revision 1291: inputs/UArizona: Added test outputs
Aaron Marcuse-Kubitza
01:28 PM Revision 1290: mappings/DwC1-DwC2.specimens.csv: Fixed to allow datasource to define custom date mappings that don't pass through the default date mapping
Aaron Marcuse-Kubitza

03/05/2012

05:31 PM Revision 1289: input.Makefile: Generate maps/src.join.*.csv, which can be used to determine which DwC fields for a particular dataset do not yet have a join mapping to VegBIEN
Aaron Marcuse-Kubitza
05:26 PM Revision 1288: Makefile: Fixed subdir remake target to work for nested subdirs as well
Aaron Marcuse-Kubitza
04:51 PM Revision 1287: inputs/UArizona: Renamed maps/src.csv to maps/src.specimens.csv because there will be one for each input table
Aaron Marcuse-Kubitza
04:41 PM Revision 1286: inputs/UArizona: Added maps/src.csv with columns from source data
Aaron Marcuse-Kubitza
04:40 PM Revision 1285: Added autogen mappings/DwC-VegBIEN.specimens.no_empty.csv, which will be used for determining which DwC fields for a particular dataset do not yet have a join mapping to VegBIEN
Aaron Marcuse-Kubitza
04:35 PM Revision 1284: Added remove_empty to remove empty mappings in a map spreadsheet
Aaron Marcuse-Kubitza
04:35 PM Revision 1283: join: Don't raise "No join mapping" error for empty mappings because you only want the error for empty mappings for your particular dataset, which requires more information (namely, the subset of the mappings used by your dataset, some of which will not be in the mappings if standard fields have been subtracted out)
Aaron Marcuse-Kubitza
04:10 PM Revision 1282: join: Fixed bug in "No join mapping" error generation where rows with no existing comments column would cause an IndexError
Aaron Marcuse-Kubitza
04:09 PM Revision 1281: util.py: Added list_set() and list_setdefault()
Aaron Marcuse-Kubitza
03:44 PM Revision 1280: inputs/UArizona/maps/DwC.specimens.csv: Merge FieldNotes and Remarks
Aaron Marcuse-Kubitza
03:35 PM Revision 1279: inputs/UArizona/maps/DwC.specimens.csv: Finished mappings
Aaron Marcuse-Kubitza
03:08 PM Revision 1278: inputs/UArizona/maps/DwC.specimens.csv: Removed fields already present in DwC mappings
Aaron Marcuse-Kubitza
03:05 PM Revision 1277: inputs/NYBG-CSV/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping
Aaron Marcuse-Kubitza
03:03 PM Revision 1276: inputs/NYBG/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping
Aaron Marcuse-Kubitza
02:48 PM Revision 1275: mappings/DwC1-DwC2.specimens.csv: Removed fields already present in DwC2.ci-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
02:38 PM Revision 1274: Makefiles: Moved remake into main Makefile. Fixed remake to run `make all` in a new make so that cache of existing files is reset. Have main remake run clean and then all instead of forwarding remake to subdirs, so that everything is cleaned before everything is remade.
Aaron Marcuse-Kubitza
02:21 PM Revision 1273: input.Makefile: maps: maps/$(via).%.full.csv: Fixed bug where $(selfMap) would be ignored if it had not yet been made
Aaron Marcuse-Kubitza
02:02 PM Revision 1272: mappings/Makefile: Reorganized into DwC and VegX sections
Aaron Marcuse-Kubitza
02:02 PM Revision 1271: Added autogenerated mappings/DwC2.ci-VegBIEN.specimens.csv. Use it to include DwC2 fields with first letter uppercased in the full DwC mapping, so that datasources that use DwC2 terms with a different case can still use the DwC2 mapping.
Aaron Marcuse-Kubitza
01:57 PM Revision 1270: Added autogenerated mappings/DwC2.ci-VegBIEN.specimens.csv. Use it to include DwC2 fields with first letter uppercased in the full DwC mapping, so that datasources that use DwC2 terms with a different case can still use the DwC2 mapping.
Aaron Marcuse-Kubitza
01:54 PM Revision 1269: inputs/UArizona/maps/DwC.specimens.csv: Mapped CollectedDate to eventDate/_alt/2 even though it's not used because other datasources might copy these mappings and want it already filled in
Aaron Marcuse-Kubitza
01:52 PM Revision 1268: Added ucase_first to uppercase the first character of columns in a spreadsheet
Aaron Marcuse-Kubitza
01:21 PM Revision 1267: Added inputs/UArizona/maps/DwC.specimens.csv autogen maps
Aaron Marcuse-Kubitza
01:20 PM Revision 1266: inputs/UArizona/maps/DwC.specimens.csv: Mapped more fields
Aaron Marcuse-Kubitza
01:14 PM Revision 1265: mappings/DwC1-DwC2.specimens.csv: Remove date -> date/_alt/2 mappings because they prevent the original DwC2 date field from being mapped to without an extra /_alt/2 appended
Aaron Marcuse-Kubitza
01:10 PM Revision 1264: xml_func.py: Use new dates.strtotime(). When component date parts specified, year defaults to dates.epoch.year.
Aaron Marcuse-Kubitza
01:09 PM Revision 1263: dates.py: Added strtotime() to wrap dateutil.parser.parse() with default defaulting to epoch, so that e.g. months with day missing default to day 1 instead of the current day of the month
Aaron Marcuse-Kubitza
12:38 PM Revision 1262: mappings/DwC1-DwC2.specimens.csv: Map eventDate,dateIdentified using /_alt/2 and year/month/day using /_alt/1 so that inputs with both a date and date parts will select between the two
Aaron Marcuse-Kubitza
11:43 AM Revision 1261: input.Makefile: Added comment that self map must be made first if it's needed for maps/$(via).%.full.csv
Aaron Marcuse-Kubitza
11:40 AM Revision 1260: Makefiles: Use .SECONDARY with no prerequisites instead of setting a .PRECIOUS for each intermediate, to simplify turning off automatic deletion of intermediate files
Aaron Marcuse-Kubitza
11:23 AM Revision 1259: inputs/UArizona: Added initial maps/DwC.specimens.csv
Aaron Marcuse-Kubitza
11:10 AM Revision 1258: DwC mappings: Map datasource name via institutionID to avoid conflicting with existing institutionCode fields that many DwC data sources have
Aaron Marcuse-Kubitza
10:57 AM Revision 1257: input.Makefile: Don't profile by default because it appears to slow things down significantly on long imports
Aaron Marcuse-Kubitza
10:56 AM Revision 1256: Added inputs/UArizona/maps
Aaron Marcuse-Kubitza
10:33 AM Task #372 (Resolved): talk to Nick about proposed changes to VegX
Aaron Marcuse-Kubitza
 
