
# Date Author Comment
1624 03/26/2012 06:19 PM Aaron Marcuse-Kubitza

xml_func.py: _replace: Strip whitespace from the returned string

1623 03/26/2012 06:09 PM Aaron Marcuse-Kubitza

csvs.py: Added TsvReader to support TSV quirks. Added reader_class(). reader_and_header(): Use reader_class() to automatically use TsvReader instead of csv.reader for TSVs. Added is_tsv() and use it where `dialect.delimiter == '\t'` was used.
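
The commit does not say which TSV quirks TsvReader handles. The sketch below assumes the main one is that TSV fields are unquoted, so each line is split on tabs with no quote processing. The names come from the commit message, but their bodies and signatures here are hypothetical.

```python
import csv


def is_tsv(dialect):
    # The check that is_tsv() replaces: is the delimiter a tab?
    return dialect.delimiter == '\t'


class TsvReader:
    """Hypothetical TSV reader: splits each line on the delimiter with no
    quote handling, since TSV fields are usually unquoted."""

    def __init__(self, stream, dialect):
        self.lines = iter(stream)
        self.delimiter = dialect.delimiter

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self.lines)  # StopIteration at end of input ends iteration
        return line.rstrip('\r\n').split(self.delimiter)


def reader_class(dialect):
    # TSVs get TsvReader; everything else uses the standard csv.reader
    return TsvReader if is_tsv(dialect) else csv.reader


def reader_and_header(stream, dialect):
    reader = reader_class(dialect)(stream, dialect)
    header = next(reader)
    return reader, header
```

Because both readers take `(stream, dialect)`, callers can switch between them without knowing which one they received.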

1622 03/26/2012 06:06 PM Aaron Marcuse-Kubitza

strings.py: Added extract_line_ending() and remove_line_ending(). ensure_newl(): Use new remove_line_ending(). Moved Parsing section to top since it is used by the other sections.
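
A minimal sketch of how these helpers could fit together, assuming "line ending" means a single trailing CRLF, CR or LF and that ensure_newl() replaces it with one LF. The actual strings.py implementations are not shown in this log.

```python
import re

# A single trailing CRLF, CR or LF at the very end of the string
_LINE_ENDING_RE = re.compile(r'(\r\n|\r|\n)\Z')


def extract_line_ending(line):
    """Return the trailing line ending of line, or '' if it has none."""
    match = _LINE_ENDING_RE.search(line)
    return match.group(1) if match else ''


def remove_line_ending(line):
    """Strip exactly one trailing line ending, if present."""
    return line[:len(line) - len(extract_line_ending(line))]


def ensure_newl(line):
    """Replace the line's trailing line ending (if any) with a single LF."""
    return remove_line_ending(line) + '\n'
```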

1621 03/26/2012 04:40 PM Aaron Marcuse-Kubitza

csvs.py: stream_info(): Set dialect.quoting = csv.QUOTE_NONE for TSVs because they usually don't quote fields. Factored dialect detecting code into new function sniff().
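
The sniffing step can be approximated with the standard library's `csv.Sniffer`. This is a sketch, not the csvs.py code: it assumes sniff() receives a text sample of the input and that a tab delimiter is what marks a file as TSV.

```python
import csv


def sniff(sample):
    """Detect the dialect of a CSV/TSV sample (sketch)."""
    dialect = csv.Sniffer().sniff(sample)
    if dialect.delimiter == '\t':
        # TSVs usually don't quote fields, so treat quote characters as data
        dialect.quoting = csv.QUOTE_NONE
    return dialect
```

With `QUOTE_NONE`, a stray `"` at the start of a TSV field stays in the value instead of starting a quoted field that could swallow the following delimiters.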

1620 03/26/2012 03:45 PM Aaron Marcuse-Kubitza

input.Makefile: verify: Added reverify option, which can be turned off to prevent regenerating the verify/%.out file from the DB (which can be time-consuming), and instead just diff verify/%.out with verify/%.ref

1619 03/24/2012 10:31 PM Aaron Marcuse-Kubitza

count_error_rows: Allow input to be specified as last arg(s) in addition to as stdin

1618 03/24/2012 10:30 PM Aaron Marcuse-Kubitza

exc.py: ExPercentTracker: When displaying the fraction of iters that had errors, don't duplicate the iter_text ("row", etc.) in the numerator

1617 03/24/2012 10:27 PM Aaron Marcuse-Kubitza

bin/map: Use new ExPercentTracker iter_num tracking to track distinct row #s with errors

1616 03/24/2012 10:27 PM Aaron Marcuse-Kubitza

exc.py: ExPercentTracker: Track iter_nums of Exceptions as well, to distinguish how many distinct iters had errors
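
The idea behind r1616 and r1618 is that one row can raise several exceptions, so the number of errors and the number of distinct rows with errors differ. The class below is a hypothetical sketch of that bookkeeping. Its method names (`add_iters`, `track`) and message format are assumptions, not the exc.py API.

```python
class ExPercentTracker:
    """Hypothetical sketch: counts exceptions by type and remembers which
    iteration numbers (e.g. row numbers) had at least one error."""

    def __init__(self, iter_text='row'):
        self.iter_text = iter_text
        self.iter_count = 0
        self.err_counts = {}    # exception type name -> occurrences
        self.err_iters = set()  # distinct iter_nums that had an error

    def add_iters(self, n):
        self.iter_count += n

    def track(self, e, iter_num):
        name = type(e).__name__
        self.err_counts[name] = self.err_counts.get(name, 0) + 1
        self.err_iters.add(iter_num)

    def __str__(self):
        errors = sum(self.err_counts.values())
        # iter_text appears once, after the denominator only
        return '%d errors in %d/%d %ss' % (
            errors, len(self.err_iters), self.iter_count, self.iter_text)
```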

1615 03/24/2012 10:10 PM Aaron Marcuse-Kubitza

Added bin/count_error_rows to count distinct rows with errors in `map` error messages

1614 03/24/2012 09:06 PM Aaron Marcuse-Kubitza

input.Makefile: Changed "%.out: .make" rule to ": %.make" so that any file can be built from a corresponding .make file. This will allow flat files to be retrieved dynamically by running an associated .make file.

1613 03/24/2012 09:01 PM Aaron Marcuse-Kubitza

xml_func.py: FormatException: Inherit from ExceptionWithCause instead of SyntaxError because a FormatException signals a different kind of error condition (related to the input value rather than the function syntax)

1612 03/24/2012 08:57 PM Aaron Marcuse-Kubitza

xml_func.py: Renamed SyntaxException to SyntaxError because it's a user error signaling invalid mappings syntax

1611 03/24/2012 08:55 PM Aaron Marcuse-Kubitza

xml_func.py: SyntaxException: Use ExceptionWithCause to combine msg and cause's msg because it now combines them on one line, which is needed for bin/error_stats to work properly

1610 03/24/2012 08:54 PM Aaron Marcuse-Kubitza

exc.py: ExceptionWithCause: Prepend msg to cause's msg separated by ': ' instead of '\ncause: '
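
A minimal sketch of the new message format, assuming the wrapper takes a message and an optional cause exception. The exc.py class also stores tracebacks (see r1576), which is omitted here.

```python
class ExceptionWithCause(Exception):
    """Hypothetical sketch: an exception that wraps a cause and reports
    both messages on a single line as 'msg: cause message'."""

    def __init__(self, msg, cause=None):
        self.cause = cause
        if cause is not None:
            # One line, so line-based tools such as bin/error_stats can parse it
            msg = msg + ': ' + str(cause)
        Exception.__init__(self, msg)
```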

1609 03/24/2012 08:47 PM Aaron Marcuse-Kubitza

xml_func.py: Changed SyntaxException to FormatException where the error was with the input data format rather than the mapping syntax

1608 03/24/2012 08:41 PM Aaron Marcuse-Kubitza

mappings/VegX-VegBIEN.organisms.csv: slopeaspect: Apply new conversion _compass

1607 03/24/2012 08:40 PM Aaron Marcuse-Kubitza

xml_func.py: Added _compass to convert a compass direction (N, NE, NNE, etc.) into a degree heading
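
On a 16-point compass rose, each point is 22.5° clockwise from the previous one, starting at N = 0°. The sketch below shows that mapping. The function name `compass2degrees` is hypothetical, and the actual _compass and angles.py code may handle more formats.

```python
# 16-point compass rose, clockwise from north
_POINTS = ['N', 'NNE', 'NE', 'ENE', 'E', 'ESE', 'SE', 'SSE',
           'S', 'SSW', 'SW', 'WSW', 'W', 'WNW', 'NW', 'NNW']
# Each point is 360 / 16 = 22.5 degrees from the previous one
_DEGREES = {point: i * 360.0 / len(_POINTS) for i, point in enumerate(_POINTS)}


def compass2degrees(direction):
    """Convert a compass direction such as 'NE' or 'nnw' to a heading in degrees."""
    try:
        return _DEGREES[direction.strip().upper()]
    except KeyError:
        raise ValueError('Unrecognized compass direction: %r' % direction) from None
```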

1606 03/24/2012 08:38 PM Aaron Marcuse-Kubitza

Added angles.py

1605 03/24/2012 07:37 PM Aaron Marcuse-Kubitza

inputs/SpeciesLink/maps: Updated to use new TAPIR download

1604 03/24/2012 07:29 PM Aaron Marcuse-Kubitza

input.Makefile: All targets can be specified with an optional trailing slash. This enables using tab completion to complete a target name which is also a subdir name, since tab completion appends a trailing slash.

1603 03/24/2012 07:23 PM Aaron Marcuse-Kubitza

bin/tapir/tapir2flat.php: Fixed bug in row assembly where XML elements that weren't found were left out of the array, causing the columns to shift to the left

1602 03/24/2012 07:03 PM Aaron Marcuse-Kubitza

xml_func.py: _map: Factored replacing code out into new function repl(), which can also be used by other XML funcs

1601 03/24/2012 06:46 PM Aaron Marcuse-Kubitza

bin/tapir/tapir2flat.php: Turned off exiting after 3 successive failures, because it causes the import to abort and it doesn't seem to restart where it left off

1600 03/24/2012 03:41 PM Aaron Marcuse-Kubitza

main Makefile: Added instructions to install PHP PEAR and HTTP_Request on Mac OS X

1599 03/24/2012 03:10 PM Aaron Marcuse-Kubitza

Makefile: Added PHP section, which installs php-http-request

1598 03/24/2012 03:05 PM Aaron Marcuse-Kubitza

Moved _archive/tapir2flatClient/trunk/client/ to bin/tapir/

1597 03/24/2012 03:03 PM Aaron Marcuse-Kubitza

_archive/tapir2flatClient/trunk/client/tapir2flat.php: Upgraded to use fputcsv(). This should fix errors caused by embedded delimiters. configurableParams.php: Set default delimiter to ','.

1596 03/24/2012 02:42 PM Aaron Marcuse-Kubitza

mappings/verify.specimens.sql: # species: Don't join at all on genus because DISTINCT is on the plantname_id rather than the plantname, which is already unique for a given genus because plantname_unique includes parent_id

1595 03/24/2012 02:39 PM Aaron Marcuse-Kubitza

mappings/verify.specimens.sql: # species: Fixed to join separately on plantname_ancestor for genus and species

1594 03/24/2012 02:14 PM Aaron Marcuse-Kubitza

input.Makefile: Moved log and trace files to new import subdir. Moved subdir-adding code from inputs/Makefile to input.Makefile.

1593 03/24/2012 01:49 PM Aaron Marcuse-Kubitza

mappings/verify.specimens.sql: Updated for schema changes

1592 03/24/2012 01:36 PM Aaron Marcuse-Kubitza

inputs/*: Added any missing standard subdirs

1591 03/24/2012 01:35 PM Aaron Marcuse-Kubitza

inputs/Makefile: Added %/-add to re-add existing dirs

1590 03/24/2012 01:29 PM Aaron Marcuse-Kubitza

inputs/Makefile: %-add: `svn mkdir` the datasource's standard subdirs

1589 03/23/2012 06:52 PM Aaron Marcuse-Kubitza

schemas/postgresql.nimoy.conf: Increased work_mem (for sorting) and maintenance_work_mem (for vacuum)

1588 03/23/2012 06:45 PM Aaron Marcuse-Kubitza

schemas/postgresql.nimoy.conf: Reset shared_buffers to initial value 24MB because although kernel.shmmax is 32MB, only values up to 26MB seem to work

1587 03/23/2012 06:33 PM Aaron Marcuse-Kubitza

schemas/postgresql.nimoy.conf: Set shared_buffers to SHMMAX

1586 03/23/2012 06:27 PM Aaron Marcuse-Kubitza

Optimized schemas/postgresql.nimoy.conf

1585 03/23/2012 06:04 PM Aaron Marcuse-Kubitza

Added schemas/postgresql.nimoy.conf

1584 03/23/2012 05:59 PM Aaron Marcuse-Kubitza

bin/map: When profiling, print the profile_to destination file

1583 03/23/2012 05:53 PM Aaron Marcuse-Kubitza

Added schemas/postgresql.conf

1582 03/23/2012 05:38 PM Aaron Marcuse-Kubitza

xml_func.py: _date: When converting month name to number, wrap any ValueError in a SyntaxException

1581 03/23/2012 05:33 PM Aaron Marcuse-Kubitza

xml_func.py: XML functions that assume their last argument is a value (_map, etc.): Use new helper function pop_value() to retrieve this value. Return None if value is None because this indicates the input is empty.

1580 03/23/2012 05:22 PM Aaron Marcuse-Kubitza

xml_func.py: _date: Use format.str2int instead of int to convert date parts to int so that strange formatting will be parsed correctly

1579 03/23/2012 05:21 PM Aaron Marcuse-Kubitza

format.py: clean_numeric(): Also fix some OCR errors

1578 03/23/2012 05:15 PM Aaron Marcuse-Kubitza

filter_errors: Default to outputting only the first match

1577 03/23/2012 04:59 PM Aaron Marcuse-Kubitza

xpath.py: Added append() to recursively append subpath to every leaf of a path tree. parse(): Use append() to fix bug in split path parsing where subpath was not added to every leaf of the tree, only the main leaf of the main branch and the main leaves of the other branches of the last element.
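
The fix amounts to visiting every leaf recursively rather than only the main leaves. The sketch below shows that on a generic tree. The `Node` class and `leaf_paths()` helper are hypothetical stand-ins for xpath.py's own path representation, which this log does not show.

```python
import copy


class Node:
    """Hypothetical path-tree node: a name plus zero or more child branches."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def append(node, subpath):
    """Recursively append a copy of subpath to every leaf under node."""
    if not node.children:
        node.children.append(copy.deepcopy(subpath))  # copy: leaves must not share it
    else:
        for child in node.children:
            append(child, subpath)


def leaf_paths(node, prefix=()):
    """List the root-to-leaf name sequences, for inspecting the tree."""
    path = prefix + (node.name,)
    if not node.children:
        return [path]
    return [p for child in node.children for p in leaf_paths(child, path)]
```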

1576 03/23/2012 04:27 PM Aaron Marcuse-Kubitza

exc.py: Changed to store multiple tracebacks in an exception, in case an exception is caught and re-raised inside an ExceptionWithCause wrapper. This preserves more of the traceback in this situation, because you get the ExceptionWithCause's traceback as well.

1575 03/23/2012 03:53 PM Aaron Marcuse-Kubitza

input.Makefile: import: Removed verbose=1 because verbose mode is now automatically on (except in test mode)

1574 03/23/2012 03:52 PM Aaron Marcuse-Kubitza

bin/map: verbose mode defaults to off in test mode and on otherwise

1573 03/23/2012 03:48 PM Aaron Marcuse-Kubitza

bin/map: In verbose mode, print which input rows will be processed

1572 03/23/2012 03:40 PM Aaron Marcuse-Kubitza

bin/map: n option: Defaults to 1 in test mode. Empty string "" is interpreted as None (previously n would have to be unset to specify None).
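
The three cases (unset, empty, numeric) can be sketched as follows. The function name `parse_n` and its parameters are hypothetical. The sketch assumes None means "no row limit" and that the option arrives as an environment variable, as the surrounding bin/map commits suggest.

```python
import os


def parse_n(env=None, test_mode=False):
    """Hypothetical sketch of reading bin/map's n (row limit) option."""
    if env is None:
        env = os.environ
    if 'n' not in env:
        return 1 if test_mode else None  # unset: 1 row in test mode, else no limit
    value = env['n']
    if value == '':
        return None  # n="" explicitly requests no limit, even in test mode
    return int(value)
```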

1571 03/23/2012 03:32 PM Aaron Marcuse-Kubitza

bin/map: Added section comments to env var config retrieval. Reordered env var config retrieval to put DB config last, since these options are input-type specific and complex, and putting them first hides the more general other options.

1570 03/23/2012 03:31 PM Aaron Marcuse-Kubitza

bin/map: Added section comments to env var config retrieval. Reordered env var config retrieval to put DB config last, since these options are input-type specific and complex, and putting them first hides the more general other options.

1569 03/23/2012 03:29 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/maps/VegX.plots.csv: Updated _units for % -> decimal conversion to use new syntax

1568 03/23/2012 03:20 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/maps/VegX.plots.csv: Updated _units for % -> decimal conversion to use new syntax

1567 03/23/2012 03:19 PM Aaron Marcuse-Kubitza

xml_func.py: _units: If value can't be converted to float, wrap the ValueError in a SyntaxException

1566 03/23/2012 03:18 PM Aaron Marcuse-Kubitza

units.py: convert(): Added support for unit conversions. Added initial unit conversion for % -> unitless. str2quantity(): Fixed regexp to match % as units. Set Quantity.__repr__ to quantity2str.

1565 03/23/2012 03:03 PM Aaron Marcuse-Kubitza

units.py: convert(): Put "units None" test after "quantity.units units" test because a destination of no units might require a conversion for some input units (e.g. % -> unitless requires a division by 100)
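
The ordering point is that a target of "no units" must not simply drop the units before a real conversion has been looked up. Otherwise, 45 % would become 45 instead of 0.45. The conversion table, `Quantity` type and error message below are assumptions for illustration, not the units.py code.

```python
from collections import namedtuple

Quantity = namedtuple('Quantity', ['value', 'units'])

# Hypothetical conversion table: (from_units, to_units) -> function
_CONVERSIONS = {
    ('%', None): lambda value: value / 100.0,  # percent -> unitless fraction
}


def convert(quantity, units):
    if quantity.units == units:
        return quantity  # already in the requested units
    func = _CONVERSIONS.get((quantity.units, units))
    if func is not None:
        return Quantity(func(quantity.value), units)
    # Checked only after the conversion lookup: otherwise % would just be dropped
    if units is None:
        return Quantity(quantity.value, None)
    raise ValueError('Cannot convert %s to %s' % (quantity.units, units))
```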

1564 03/23/2012 02:51 PM Aaron Marcuse-Kubitza

inputs/SALVIAS*/maps/VegX.organisms.csv: Habit: Ignore invalid values instead of generating a SyntaxException

1563 03/23/2012 02:47 PM Aaron Marcuse-Kubitza

xml_dom.py: minidom modifications: Escape as many text strings as we use directly. This still leaves the tagName used by xml.dom.minidom.Element.writexml: It uses 'writer.write(indent+"<" + self.tagName)' and doesn't escape the tagName.

1562 03/23/2012 02:39 PM Aaron Marcuse-Kubitza

xml_func.py: Made everything Unicode-safe by using strings.ustr instead of str

1561 03/23/2012 12:48 PM Aaron Marcuse-Kubitza

schemas/tree_cross-links.sql: Added comment for how to get the namedplace trigger from the provided plantname trigger

1560 03/23/2012 12:44 PM Aaron Marcuse-Kubitza

vegbien.sql: Fixed bug in tree cross-link algorithm where recursion to descendants' ancestors did not use new to refer to the current node's plantname_id

1559 03/23/2012 12:39 PM Aaron Marcuse-Kubitza

vegbien.sql: Fixed bug in tree cross-link algorithm to also insert ancestors for top-level nodes, because they now need an ancestor entry for themselves

1558 03/23/2012 12:28 PM Aaron Marcuse-Kubitza

Added separate SQL file for tree cross-links code. A link to this can be e-mailed to people to review.

1557 03/23/2012 12:21 PM Aaron Marcuse-Kubitza

vegbien.sql: Modified tree cross-link algorithm to add an "ancestor" for this node. This is useful for queries, because you don't have to separately test if the leaf node is the one you're looking for, in addition to that leaf node's ancestors.
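
The actual cross-links are maintained by SQL triggers. The Python model below only illustrates which (node, ancestor) rows result when each node also counts as its own ancestor, as r1557 and r1559 describe. It assumes an acyclic parent map, and the names are hypothetical.

```python
def ancestor_rows(parent_of):
    """Return (node_id, ancestor_id) pairs for every node, including the
    node itself and top-level nodes (whose parent is None)."""
    rows = []
    for node in parent_of:
        ancestor = node
        while ancestor is not None:  # walk up to the root
            rows.append((node, ancestor))
            ancestor = parent_of.get(ancestor)
    return rows
```

With the self-entry, "the subtree rooted at X" is a single lookup on ancestor_id = X, with no separate test for X itself.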

1556 03/22/2012 07:08 PM Aaron Marcuse-Kubitza

README.TXT: Added instructions how to stop all running imports

1555 03/22/2012 06:59 PM Aaron Marcuse-Kubitza

vegbien.sql: Added namedplace_update_ancestors and plantname_update_ancestors triggers to populate ancestor cross-links in new namedplace_ancestor and plantname_ancestor tables

1554 03/22/2012 06:07 PM Aaron Marcuse-Kubitza

sql.py: insert() (and try_insert()): Added optional returning param to provide name of an inserted column (usually pkey) to return
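
PostgreSQL's `INSERT ... RETURNING` clause returns a column from the inserted row in the same statement. The sketch only builds the query text. The helper name `insert_query` is hypothetical, and identifiers are not escaped here, which real code would need to do.

```python
def insert_query(table, row, returning=None):
    """Build a parameterized INSERT for row (a dict of column -> value);
    if returning names a column (usually the pkey), add a RETURNING clause."""
    cols = list(row.keys())
    query = 'INSERT INTO %s (%s) VALUES (%s)' % (
        table, ', '.join(cols), ', '.join(['%s'] * len(cols)))
    if returning is not None:
        query += ' RETURNING ' + returning
    return query, [row[col] for col in cols]
```

With a DB-API driver such as psycopg2, the pair would be run with `cur.execute(query, params)`, and the returned value read with `cur.fetchone()[0]`.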

1553 03/22/2012 05:41 PM Aaron Marcuse-Kubitza

env_password: Print Usage message if run without initial "."

1552 03/22/2012 05:34 PM Aaron Marcuse-Kubitza

Added bin/stop_imports to stop all running imports

1551 03/22/2012 05:33 PM Aaron Marcuse-Kubitza

import_all: Print Usage message if run without initial "."

1550 03/22/2012 04:52 PM Aaron Marcuse-Kubitza

Renamed import-all to import_all to match convention of using underscores

1549 03/22/2012 04:39 PM Aaron Marcuse-Kubitza

inputs/CTFS: Added remaining non-data src files

1548 03/22/2012 04:35 PM Aaron Marcuse-Kubitza

Added CTFS data dictionary inputs/CTFS/src/ctfs-comments_worksheet.xls

1547 03/22/2012 04:33 PM Aaron Marcuse-Kubitza

import-all: Fixed to display the datasource name in the job name instead of 'make ${input}import &'

1546 03/20/2012 11:13 PM Aaron Marcuse-Kubitza

import-all: disown each new import process to ignore SIGHUP

1545 03/20/2012 11:06 PM Aaron Marcuse-Kubitza

Added jobspecs to extract jobspecs (%#) from (possibly filtered) `jobs` output
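
The jobspecs utility is probably a shell script. The Python sketch below only shows the extraction step, assuming each `jobs` line starts with the job number in brackets, as in bash's `[2]+  Running  make import &`. The function name mirrors the tool, but the code is hypothetical.

```python
import re

# `jobs` lines start with the job number in brackets, e.g. "[2]+  Running ..."
_JOB_NUM_RE = re.compile(r'^\[(\d+)\]')


def jobspecs(jobs_output):
    """Return the %N jobspec for every job line in (possibly filtered) jobs output."""
    specs = []
    for line in jobs_output.splitlines():
        match = _JOB_NUM_RE.match(line)
        if match:
            specs.append('%' + match.group(1))
    return specs
```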

1544 03/20/2012 11:05 PM Aaron Marcuse-Kubitza

README.TXT: Changed `make import &` to `. bin/import-all`

1543 03/20/2012 11:05 PM Aaron Marcuse-Kubitza

README.TXT: Changed `make import &` to `. bin/import-all`

1542 03/20/2012 10:39 PM Aaron Marcuse-Kubitza

main Makefile: import: Before running imports, print message that `. bin/import-all` can be used to import all inputs at once

1541 03/20/2012 10:38 PM Aaron Marcuse-Kubitza

Added import-all to import all inputs at once

1540 03/20/2012 10:20 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Mapped establishmentMeans, which contains growthform, iscultivated, isnative, etc. combined

1539 03/20/2012 10:11 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/maps/VegX.organisms.csv: habit: Updated mapping to match equivalent SALVIAS mapping

1538 03/20/2012 10:10 PM Aaron Marcuse-Kubitza

xml_func.py: _map: Instead of _closed special entry, make all maps closed by default and open them if special entry "*=*" is present. Support using a _map to filter values by interpreting special entry "*=" as removing all values not explicitly specified, and by interpreting special value "*" as keeping input value the same.

1537 03/20/2012 10:08 PM Aaron Marcuse-Kubitza

xml_func.py: _map: Instead of _closed special entry, make all maps closed by default and open them if special entry "*=*" is present. Support using a _map to filter values by interpreting special entry "*=" as removing all values not explicitly specified, and by interpreting special value "*" as keeping input value the same.
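
A sketch of the closed-by-default rules, assuming the map arrives as a dict parsed from `key=value` entries: `*=*` becomes `{'*': '*'}` and `*=` becomes `{'*': ''}`. The function name and the choice of ValueError for unmapped values are assumptions.

```python
def map_value(mapping, value):
    """Hypothetical sketch of _map's closed-by-default semantics."""
    if value in mapping:
        out = mapping[value]
    elif '*' in mapping:
        out = mapping['*']  # '*' key: rule for all unlisted values
    else:
        raise ValueError('Value not in closed map: %r' % value)
    if out == '*':
        return value  # '*' as the output keeps the input value unchanged
    if out == '':
        return None  # empty output removes the value
    return out
```

Under these rules, `*=*` reopens a map (unlisted values pass through), while `*=` combined with `tree=*` entries turns the map into a whitelist filter.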

1536 03/20/2012 09:19 PM Aaron Marcuse-Kubitza

xml_func.py: _date: On error "month must be in 1..12", try swapping month and day

1535 03/20/2012 09:13 PM Aaron Marcuse-Kubitza

xml_func.py: _date: On error "month must be in 1..12", try swapping month and day
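
With Python's `datetime.date`, an out-of-range month raises a ValueError whose message contains "month must be in 1..12". The sketch retries with day and month swapped only for that error. The helper name `make_date` is hypothetical.

```python
import datetime


def make_date(year, month, day):
    try:
        return datetime.date(year, month, day)
    except ValueError as e:
        if 'month must be in 1..12' not in str(e):
            raise  # some other problem, e.g. day out of range
        # Month > 12: the input was probably day-first, so swap them
        return datetime.date(year, day, month)
```

The swap only happens when the month is impossible, so dates where both values are 12 or less are never reinterpreted.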

1534 03/20/2012 08:36 PM Aaron Marcuse-Kubitza

row: Support getting multiple rows. Document that it does not handle embedded newlines.

1533 03/20/2012 08:19 PM Aaron Marcuse-Kubitza

mappings/Makefile: Removed no longer needed DwC-VegBIEN.specimens.no_empty.csv

1532 03/20/2012 08:18 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer needed $(join) command

1531 03/20/2012 08:15 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer needed src join maps

1530 03/20/2012 08:12 PM Aaron Marcuse-Kubitza

input.Makefile: Generate VegBIEN maps from full via maps in order to include all input columns if a src map was provided. This causes the VegBIEN join process to produce all the "No join mapping" errors for that datasource, not just those for fields in the (non-full) via map. maps/src.join.*.csv should no longer be needed for producing "No join mapping" errors.

1529 03/20/2012 08:03 PM Aaron Marcuse-Kubitza

mappings/Makefile: Generate DwC-VegBIEN.specimens.csv from new intermediate DwC.ci-VegBIEN.specimens.csv using $(removeEmpty) so that "No join mapping" errors will be reported when maps are joined to it. Deprecate DwC-VegBIEN.specimens.no_empty.csv because it's now identical to DwC-VegBIEN.specimens.csv.

1528 03/20/2012 07:45 PM Aaron Marcuse-Kubitza

Added inputs/NY/maps/src.specimens.csv

1527 03/20/2012 07:41 PM Aaron Marcuse-Kubitza

Added reverse_join to inner-join two map spreadsheets in the opposite order they are specified in

1526 03/20/2012 07:36 PM Aaron Marcuse-Kubitza

input.Makefile: Intersect the generated VegBIEN and full via maps with the src map, if it exists. This reduces the size of the autogen maps significantly by including only the entries used by the datasource.

1525 03/20/2012 07:34 PM Aaron Marcuse-Kubitza

intersect: Compare columns based on specified compare_col_nums, just like subtract
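
A sketch of comparing map rows only on selected columns, with subtract as the complement. The signatures and the default of comparing column 0 are assumptions, not the actual intersect and subtract scripts.

```python
def _key(row, compare_col_nums):
    # The values a row is compared on
    return tuple(row[i] for i in compare_col_nums)


def intersect(rows, other_rows, compare_col_nums=(0,)):
    """Keep rows whose compared columns also occur in some row of other_rows."""
    other_keys = {_key(row, compare_col_nums) for row in other_rows}
    return [row for row in rows if _key(row, compare_col_nums) in other_keys]


def subtract(rows, other_rows, compare_col_nums=(0,)):
    """Keep rows whose compared columns do not occur in other_rows."""
    other_keys = {_key(row, compare_col_nums) for row in other_rows}
    return [row for row in rows if _key(row, compare_col_nums) not in other_keys]
```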