# Date Author Comment
1557 03/23/2012 12:21 PM Aaron Marcuse-Kubitza

vegbien.sql: Modified tree cross-link algorithm to also add each node as an "ancestor" of itself. This is useful for queries, because you don't have to separately test whether the leaf node is the one you're looking for, in addition to testing that leaf node's ancestors.

1556 03/22/2012 07:08 PM Aaron Marcuse-Kubitza

README.TXT: Added instructions on how to stop all running imports

1555 03/22/2012 06:59 PM Aaron Marcuse-Kubitza

vegbien.sql: Added namedplace_update_ancestors and plantname_update_ancestors triggers to populate ancestor cross-links in new namedplace_ancestor and plantname_ancestor tables

1554 03/22/2012 06:07 PM Aaron Marcuse-Kubitza

sql.py: insert() (and try_insert()): Added optional returning param to provide name of an inserted column (usually pkey) to return

1553 03/22/2012 05:41 PM Aaron Marcuse-Kubitza

env_password: Print Usage message if run without initial "."

1552 03/22/2012 05:34 PM Aaron Marcuse-Kubitza

Added bin/stop_imports to stop all running imports

1551 03/22/2012 05:33 PM Aaron Marcuse-Kubitza

import_all: Print Usage message if run without initial "."

1550 03/22/2012 04:52 PM Aaron Marcuse-Kubitza

Renamed import-all to import_all to match convention of using underscores

1549 03/22/2012 04:39 PM Aaron Marcuse-Kubitza

inputs/CTFS: Added remaining non-data src files

1548 03/22/2012 04:35 PM Aaron Marcuse-Kubitza

Added CTFS data dictionary inputs/CTFS/src/ctfs-comments_worksheet.xls

1547 03/22/2012 04:33 PM Aaron Marcuse-Kubitza

import-all: Fixed to display the datasource name in the job name instead of 'make ${input}import &'

1546 03/20/2012 11:13 PM Aaron Marcuse-Kubitza

import-all: disown each new import process to ignore SIGHUP

1545 03/20/2012 11:06 PM Aaron Marcuse-Kubitza

Added jobspecs to extract jobspecs (%#) from (possibly filtered) `jobs` output

1544 03/20/2012 11:05 PM Aaron Marcuse-Kubitza

README.TXT: Changed `make import &` to `. bin/import-all`

1543 03/20/2012 11:05 PM Aaron Marcuse-Kubitza

README.TXT: Changed `make import &` to `. bin/import-all`

1542 03/20/2012 10:39 PM Aaron Marcuse-Kubitza

main Makefile: import: Before running imports, print message that `. bin/import-all` can be used to import all inputs at once

1541 03/20/2012 10:38 PM Aaron Marcuse-Kubitza

Added import-all to import all inputs at once

1540 03/20/2012 10:20 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Mapped establishmentMeans, which contains growthform, iscultivated, isnative, etc. combined

1539 03/20/2012 10:11 PM Aaron Marcuse-Kubitza

inputs/SALVIAS-CSV/maps/VegX.organisms.csv: habit: Updated mapping to match equivalent SALVIAS mapping

1538 03/20/2012 10:10 PM Aaron Marcuse-Kubitza

xml_func.py: _map: Instead of _closed special entry, make all maps closed by default and open them if special entry "*=*" is present. Support using a _map to filter values by interpreting special entry "*=" as removing all values not explicitly specified, and by interpreting special value "*" as keeping input value the same.

1537 03/20/2012 10:08 PM Aaron Marcuse-Kubitza

xml_func.py: _map: Instead of _closed special entry, make all maps closed by default and open them if special entry "*=*" is present. Support using a _map to filter values by interpreting special entry "*=" as removing all values not explicitly specified, and by interpreting special value "*" as keeping input value the same.

1536 03/20/2012 09:19 PM Aaron Marcuse-Kubitza

xml_func.py: _date: On error "month must be in 1..12", try swapping month and day

1535 03/20/2012 09:13 PM Aaron Marcuse-Kubitza

xml_func.py: _date: On error "month must be in 1..12", try swapping month and day

1534 03/20/2012 08:36 PM Aaron Marcuse-Kubitza

row: Support getting multiple rows. Document that it does not handle embedded newlines.

1533 03/20/2012 08:19 PM Aaron Marcuse-Kubitza

mappings/Makefile: Removed no longer needed DwC-VegBIEN.specimens.no_empty.csv

1532 03/20/2012 08:18 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer needed $(join) command

1531 03/20/2012 08:15 PM Aaron Marcuse-Kubitza

input.Makefile: Removed no longer needed src join maps

1530 03/20/2012 08:12 PM Aaron Marcuse-Kubitza

input.Makefile: Generate VegBIEN maps from full via maps in order to include all input columns if a src map was provided. This causes the VegBIEN join process to produce all the "No join mapping" errors for that datasource, not just those for fields in the (non-full) via map. maps/src.join.*.csv should no longer be needed for producing "No join mapping" errors.

1529 03/20/2012 08:03 PM Aaron Marcuse-Kubitza

mappings/Makefile: Generate DwC-VegBIEN.specimens.csv from new intermediate DwC.ci-VegBIEN.specimens.csv using $(removeEmpty) so that "No join mapping" errors will be reported when maps are joined to it. Deprecate DwC-VegBIEN.specimens.no_empty.csv because it's now identical to DwC-VegBIEN.specimens.csv.

1528 03/20/2012 07:45 PM Aaron Marcuse-Kubitza

Added inputs/NY/maps/src.specimens.csv

1527 03/20/2012 07:41 PM Aaron Marcuse-Kubitza

Added reverse_join to inner-join two map spreadsheets in the opposite of the order in which they are specified

1526 03/20/2012 07:36 PM Aaron Marcuse-Kubitza

input.Makefile: Intersect the generated VegBIEN and full via maps with the src map, if it exists. This reduces the size of the autogen maps significantly by including only the entries used by the datasource.

1525 03/20/2012 07:34 PM Aaron Marcuse-Kubitza

intersect: Compare columns based on specified compare_col_nums, just like subtract

1524 03/20/2012 06:50 PM Aaron Marcuse-Kubitza

input.Makefile: Use var $(selfMap) instead of spelling out $(bin)/cols 0 0

1523 03/20/2012 06:36 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Mapped continent

1522 03/20/2012 06:20 PM Aaron Marcuse-Kubitza

inputs/SpeciesLink/maps/DwC.specimens.csv: Mapped remaining fields

1521 03/20/2012 06:19 PM Aaron Marcuse-Kubitza

inputs/SpeciesLink/maps/DwC.specimens.csv: Mapped remaining fields

1520 03/20/2012 06:08 PM Aaron Marcuse-Kubitza

inputs/SpeciesLink/maps/src.specimens.csv: Fixed bug where prefixes had not been removed from fields, which prevented join mappings from being found for any of the fields

1519 03/20/2012 06:08 PM Aaron Marcuse-Kubitza

main Makefile: Added missing_joins to determine which input fields are missing join mappings

1518 03/20/2012 05:47 PM Aaron Marcuse-Kubitza

xml_func.py: SyntaxException: Inherit from exc.ExceptionWithCause so the traceback will be populated with the cause's traceback instead of the SyntaxException wrapper's traceback

1517 03/20/2012 05:35 PM Aaron Marcuse-Kubitza

Added inputs/UNCC/test with accepted test outputs

1516 03/20/2012 05:35 PM Aaron Marcuse-Kubitza

Added inputs/UNCC/maps

1515 03/20/2012 05:34 PM Aaron Marcuse-Kubitza

xml_func.py: _date: month: Convert month names to numbers before casting everything to int

1514 03/20/2012 05:27 PM Aaron Marcuse-Kubitza

xml_func.py: _date: Refactored to convert items to dict right away, and use iteritems() for later type conversion. This will enable month names to be converted before casting everything to int.

1513 03/20/2012 04:47 PM Aaron Marcuse-Kubitza

mappings/Makefile: Sort mappings/DwC.self.specimens.csv so that entries can more easily be found when using it as a DwC terms reference

1512 03/19/2012 09:55 PM Aaron Marcuse-Kubitza

Added inputs/UNCC

1511 03/19/2012 09:50 PM Aaron Marcuse-Kubitza

Added inputs/U/test with accepted test outputs

1510 03/19/2012 09:49 PM Aaron Marcuse-Kubitza

inputs/U/maps/DwC.specimens.csv: Mapped most of the remaining fields

1509 03/19/2012 09:34 PM Aaron Marcuse-Kubitza

input.Makefile: Clean up via maps when they change by subtracting the via format's self map from the via map (the comments column is ignored in determining which entries are redundant, and empty entries with a matching input column are also removed)

1508 03/19/2012 09:29 PM Aaron Marcuse-Kubitza

subtract: Fixed bug where entries were removed even if maps were not combinable and ignore was off

1507 03/19/2012 09:27 PM Aaron Marcuse-Kubitza

union: Fixed bug where combinable was not saved for use in deciding whether to add entries in map 1 that weren't already defined

1506 03/19/2012 09:25 PM Aaron Marcuse-Kubitza

inputs/U/maps: Set svn props

1505 03/19/2012 09:20 PM Aaron Marcuse-Kubitza

subtract: Also remove nonexplicit empty mappings whose input col is in map 1

1504 03/19/2012 09:15 PM Aaron Marcuse-Kubitza

maps.py: Added is_nonexplicit_empty_mapping()

1503 03/19/2012 09:03 PM Aaron Marcuse-Kubitza

subtract: Use new maps.combinable() to compare column headers, which allows more flexibility in combining maps

1502 03/19/2012 09:01 PM Aaron Marcuse-Kubitza

union: Use new maps.combinable()

1501 03/19/2012 09:01 PM Aaron Marcuse-Kubitza

maps.py: Added col_label() and combinable()

1500 03/19/2012 08:54 PM Aaron Marcuse-Kubitza

union: Use new strings.overlaps()

1499 03/19/2012 08:53 PM Aaron Marcuse-Kubitza

strings.py: Added overlaps()

1498 03/19/2012 08:46 PM Aaron Marcuse-Kubitza

vegbien.sql: Fixed syntax error in taxonclass enum: missing comma at end of element

1497 03/19/2012 08:38 PM Aaron Marcuse-Kubitza

inputs/*/maps/DwC.specimens.csv: Ran through `cols *` to standardize CSV format to that generated by Python

1496 03/19/2012 08:35 PM Aaron Marcuse-Kubitza

cols: If column number of "*" given, get all columns

1495 03/19/2012 08:32 PM Aaron Marcuse-Kubitza

bin/subtract: If no compare columns given, compare on all columns instead of column 0

1494 03/19/2012 08:31 PM Aaron Marcuse-Kubitza

util.py: list_subset(): Support special idxs value None, which returns entire list

1493 03/19/2012 08:22 PM Aaron Marcuse-Kubitza

cat_csv: Added support for using - to cat stdin

1492 03/19/2012 08:18 PM Aaron Marcuse-Kubitza

Added inputs/U/maps

1491 03/19/2012 07:32 PM Aaron Marcuse-Kubitza

Added inputs/U

1490 03/19/2012 07:29 PM Aaron Marcuse-Kubitza

Put inputs/REMIB/src/remib_raw.0.header.specimens.txt under version control

1489 03/19/2012 07:24 PM Aaron Marcuse-Kubitza

Added inputs/REMIB/test with accepted test outputs

1488 03/19/2012 07:22 PM Aaron Marcuse-Kubitza

Added inputs/REMIB/maps

1487 03/19/2012 07:20 PM Aaron Marcuse-Kubitza

inputs/NCU-NCSC/maps/DwC.specimens.csv: Removed State->StateProvince mapping because that is now in mappings/DwC1-DwC2.specimens.csv

1486 03/19/2012 07:13 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema

1485 03/19/2012 06:51 PM Aaron Marcuse-Kubitza

Added inputs/REMIB

1484 03/19/2012 06:09 PM Aaron Marcuse-Kubitza

bin/map: Deal with fields that may be in the dataset under more than one prefix by getting all fields and coalesce()ing them (e.g. SpeciesLink has dwcore* and darwin1* columns for the same DwC field)

1483 03/19/2012 06:06 PM Aaron Marcuse-Kubitza

util.py: Added coalesce()

1482 03/19/2012 05:40 PM Aaron Marcuse-Kubitza

xpath_func.py: process(): Fixed bug where XPath elem's other_branches were not also processed

1481 03/19/2012 05:28 PM Aaron Marcuse-Kubitza

row: Don't prepend header row because this feature prevents the program from being used in a pipeline. Sheets may be constructed in a pipeline if multiple segments need to be joined, e.g. with cat_csv.

1480 03/19/2012 05:09 PM Aaron Marcuse-Kubitza

Added row to get a row of a spreadsheet, preceded by the header row

1479 03/19/2012 05:09 PM Aaron Marcuse-Kubitza

bin programs: Fixed bug in Usage message where program name was not printed because unset variable $self was used instead of $0

1478 03/19/2012 05:08 PM Aaron Marcuse-Kubitza

xml_func.py: _nullIf: types_by_name: Use strings.ustr instead of str to support Unicode values

1477 03/19/2012 04:40 PM Aaron Marcuse-Kubitza

xml_func.py: _nullIf: If value not convertible, return it, because can't equal null. Refactored to store types by name in a dict instead of using if statements.

1476 03/19/2012 04:31 PM Aaron Marcuse-Kubitza

units.py: convert(): raise MissingUnitsException if quantity doesn't have units. MissingUnitsException: Take Quantity input instead of str.

1475 03/19/2012 04:27 PM Aaron Marcuse-Kubitza

inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.

1474 03/19/2012 04:26 PM Aaron Marcuse-Kubitza

inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.

1473 03/19/2012 04:21 PM Aaron Marcuse-Kubitza

xml_func.py: _map: empty map entry means None

1472 03/19/2012 04:10 PM Aaron Marcuse-Kubitza

xml_func.py: _avg: Support empty inputs by returning None. Moved _range after _rangeStart/_rangeEnd since it's less frequently used.

1471 03/19/2012 04:07 PM Aaron Marcuse-Kubitza

units.py: Restructured to use a Quantity object for the units-tagged value and conversion functions quantity2str() and str2quantity() to convert between that and a raw string. Added convert() with basic support for removing units and passing through matching units. xml_func.py: _units: Added "to" attr. VegBIEN mappings: Remove units using new _units "to" attr instead of temporary workaround in _units.

1470 03/19/2012 03:13 PM Aaron Marcuse-Kubitza

xml_func.py: _units: default units attr renamed to default to clarify that it's not the units you're converting to

1469 03/19/2012 03:06 PM Aaron Marcuse-Kubitza

xml_func.py: Added documentation labels to each section of XML functions

1468 03/19/2012 03:01 PM Aaron Marcuse-Kubitza

Moved units-related functions from format.py to new units.py

1467 03/19/2012 02:55 PM Aaron Marcuse-Kubitza

lib/*.py: Removed svn:executable property to turn execute bit off

1466 03/19/2012 02:45 PM Aaron Marcuse-Kubitza

vegbien.sql: growthform (and taxonclass) enum: Added options suggested by Michael Lee. Removed "woody". establishmentmeans_dwc (and taxonclass) enum: Reordered to match order of taxonoccurrence boolean fields, and to place each option next to its opposite. taxonclass enum: Moved "woody" to bottom because it's no longer part of growthform.

1465 03/18/2012 09:10 PM Aaron Marcuse-Kubitza

VegBIEN mappings: distance fields: Remove units

1464 03/18/2012 09:08 PM Aaron Marcuse-Kubitza

xml_func.py: _units: Allow value to be NULL

1463 03/18/2012 08:44 PM Aaron Marcuse-Kubitza

xml_func.py: _units: Use new format.cleanup_units() to do units parsing

1462 03/18/2012 08:43 PM Aaron Marcuse-Kubitza

format.py: Added clean_numeric(), str2int(), str2float(). Added units-related functions. Added documentation labels to each section.

1461 03/18/2012 06:42 PM Aaron Marcuse-Kubitza

Added filter_errors to filter `map` error messages

1460 03/18/2012 06:40 PM Aaron Marcuse-Kubitza

Renamed bin/errors_filter_* to filter_errors_* to sound more natural and to have a different prefix than error_stats so that both can easily be tab-completed at the command line

1459 03/18/2012 06:27 PM Aaron Marcuse-Kubitza

README.TXT: Testing: Added instructions for testing just mapping process, just map spreadsheet generation, and everything

1458 03/18/2012 06:26 PM Aaron Marcuse-Kubitza

root Makefile: Added test-all for most complete coverage. Removed extraneous ";" at the end of the prerequisites line of rules with a recipe.