
# Date Author Comment
1497 03/19/2012 08:38 PM Aaron Marcuse-Kubitza

inputs/*/maps/DwC.specimens.csv: Ran through `cols *` to standardize CSV format to that generated by Python

1496 03/19/2012 08:35 PM Aaron Marcuse-Kubitza

cols: If column number of "*" given, get all columns

1495 03/19/2012 08:32 PM Aaron Marcuse-Kubitza

bin/subtract: If no compare columns given, compare on all columns instead of column 0

1494 03/19/2012 08:31 PM Aaron Marcuse-Kubitza

util.py: list_subset(): Support special idxs value None, which returns entire list
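
A minimal sketch of the behavior this entry describes, written for Python 3. Only the name `list_subset()`, the `idxs` parameter, and the special value `None` come from the log; the argument order and the handling of ordinary index lists are assumptions.

```python
def list_subset(list_, idxs):
    """Return the items of list_ at the given indexes.

    Assumption: idxs is an iterable of valid integer indexes, or None.
    None is the special value meaning "the entire list".
    """
    if idxs is None:
        return list(list_)  # copy so the caller cannot mutate the input
    return [list_[i] for i in idxs]


row = ['MO', 'Quercus alba', 'USA']
print(list_subset(row, [0, 2]))  # selected columns only
print(list_subset(row, None))    # every column
```

A `None` default of this kind would let a caller such as bin/subtract compare on all columns when no compare columns are given.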

1493 03/19/2012 08:22 PM Aaron Marcuse-Kubitza

cat_csv: Added support for using - to cat stdin

1492 03/19/2012 08:18 PM Aaron Marcuse-Kubitza

Added inputs/U/maps

1491 03/19/2012 07:32 PM Aaron Marcuse-Kubitza

Added inputs/U

1490 03/19/2012 07:29 PM Aaron Marcuse-Kubitza

Put inputs/REMIB/src/remib_raw.0.header.specimens.txt under version control

1489 03/19/2012 07:24 PM Aaron Marcuse-Kubitza

Added inputs/REMIB/test with accepted test outputs

1488 03/19/2012 07:22 PM Aaron Marcuse-Kubitza

Added inputs/REMIB/maps

1487 03/19/2012 07:20 PM Aaron Marcuse-Kubitza

inputs/NCU-NCSC/maps/DwC.specimens.csv: Removed State->StateProvince mapping because that is now in mappings/DwC1-DwC2.specimens.csv

1486 03/19/2012 07:13 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema

1485 03/19/2012 06:51 PM Aaron Marcuse-Kubitza

Added inputs/REMIB

1484 03/19/2012 06:09 PM Aaron Marcuse-Kubitza

bin/map: Deal with fields that may be in the dataset under more than one prefix by getting all fields and coalesce()ing them (e.g. SpeciesLink has dwcore* and darwin1* columns for the same DwC field)

1483 03/19/2012 06:06 PM Aaron Marcuse-Kubitza

util.py: Added coalesce()
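
One plausible shape for `coalesce()`, sketched for Python 3 by analogy with SQL's COALESCE. The log gives only the function name and its use for columns that appear under more than one prefix; the signature, the treatment of `None` as the only "missing" value, and the column names in the example are hypothetical.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if every value is None.

    Assumption: only None counts as missing; empty strings are returned as-is.
    """
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical row in which the same DwC field appears under two prefixes.
row = {'dwcore:Country': None, 'darwin1:Country': 'Brazil'}
country = coalesce(row['dwcore:Country'], row['darwin1:Country'])
print(country)
```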

1482 03/19/2012 05:40 PM Aaron Marcuse-Kubitza

xpath_func.py: process(): Fixed bug where XPath elem's other_branches were not also processed

1481 03/19/2012 05:28 PM Aaron Marcuse-Kubitza

row: Don't prepend header row because this feature prevents the program from being used in a pipeline. Sheets may be constructed in a pipeline if multiple segments need to be joined, e.g. with cat_csv.

1480 03/19/2012 05:09 PM Aaron Marcuse-Kubitza

Added row to get a row of a spreadsheet, preceded by the header row

1479 03/19/2012 05:09 PM Aaron Marcuse-Kubitza

bin programs: Fixed bug in Usage message where program name was not printed because unset variable $self was used instead of $0

1478 03/19/2012 05:08 PM Aaron Marcuse-Kubitza

xml_func.py: _nullIf: types_by_name: Use strings.ustr instead of str to support Unicode values

1477 03/19/2012 04:40 PM Aaron Marcuse-Kubitza

xml_func.py: _nullIf: If value not convertible, return it, because can't equal null. Refactored to store types by name in a dict instead of using if statements.
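
The two ideas in this entry (a type lookup table in place of if statements, and passing through values that cannot be converted) can be sketched as follows for Python 3. The name `types_by_name` appears in the log; the function name `null_if`, its parameters, and the set of supported type names are assumptions.

```python
# Lookup table replacing a chain of if statements.
# Assumption: the supported type names; None means "no type given".
types_by_name = {None: str, 'str': str, 'float': float}


def null_if(value, null, type_name=None):
    """Return None if value equals the null marker after conversion, else value."""
    type_ = types_by_name[type_name]
    try:
        converted = type_(value)
    except ValueError:
        return value  # not convertible, so it cannot equal the null marker
    if converted == type_(null):
        return None
    return value


print(null_if('0.0', '0', 'float'))  # numerically equal to the null marker
print(null_if('n/a', '0', 'float'))  # not a number, passed through unchanged
```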

1476 03/19/2012 04:31 PM Aaron Marcuse-Kubitza

units.py: convert(): Raise MissingUnitsException if quantity doesn't have units. MissingUnitsException: Take Quantity input instead of str.

1475 03/19/2012 04:27 PM Aaron Marcuse-Kubitza

inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.

1474 03/19/2012 04:26 PM Aaron Marcuse-Kubitza

inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.

1473 03/19/2012 04:21 PM Aaron Marcuse-Kubitza

xml_func.py: _map: empty map entry means None

1472 03/19/2012 04:10 PM Aaron Marcuse-Kubitza

xml_func.py: _avg: Support empty inputs by returning None. Moved _range after _rangeStart/_rangeEnd since it's less frequently used.

1471 03/19/2012 04:07 PM Aaron Marcuse-Kubitza

units.py: Restructured to use a Quantity object for the units-tagged value and conversion functions quantity2str() and str2quantity() to convert between that and a raw string. Added convert() with basic support for removing units and passing through matching units. xml_func.py: _units: Added "to" attr. VegBIEN mappings: Remove units using new _units "to" attr instead of temporary workaround in _units.
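
A rough Python 3 sketch of the structure this entry describes: a `Quantity` value plus `str2quantity()`, `quantity2str()`, and a `convert()` limited to removing units and passing through matching units. The names come from the log; the field names, the parsing regex, the output format, and the error raised for unsupported conversions are all assumptions.

```python
import re
from collections import namedtuple

# Assumption: a quantity is a value string plus an optional units string.
Quantity = namedtuple('Quantity', ['value', 'units'])


def str2quantity(str_):
    """Parse e.g. '1500 m' or '20ft' into a Quantity; units may be absent."""
    match = re.fullmatch(r'\s*([-+]?[\d.,]+)\s*([^\d\s]*)\s*', str_)
    if match is None:
        raise ValueError('not a quantity: %r' % str_)
    value, units = match.groups()
    return Quantity(value, units or None)


def quantity2str(quantity):
    """Format a Quantity back into a raw string."""
    if quantity.units is None:
        return quantity.value
    return quantity.value + ' ' + quantity.units


def convert(quantity, units):
    """Basic conversion: remove units (units=None) or pass through a match."""
    if units is None:
        return Quantity(quantity.value, None)
    if quantity.units == units:
        return quantity
    raise NotImplementedError('cannot convert %r to %r' % (quantity.units, units))


print(quantity2str(convert(str2quantity('1500 m'), None)))
```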

1470 03/19/2012 03:13 PM Aaron Marcuse-Kubitza

xml_func.py: _units: default units attr renamed to default to clarify that it's not the units you're converting to

1469 03/19/2012 03:06 PM Aaron Marcuse-Kubitza

xml_func.py: Added documentation labels to each section of XML functions

1468 03/19/2012 03:01 PM Aaron Marcuse-Kubitza

Moved units-related functions from format.py to new units.py

1467 03/19/2012 02:55 PM Aaron Marcuse-Kubitza

lib/*.py: Removed svn:executable property to turn execute bit off

1466 03/19/2012 02:45 PM Aaron Marcuse-Kubitza

vegbien.sql: growthform (and taxonclass) enum: Added options suggested by Michael Lee. Removed "woody". establishmentmeans_dwc (and taxonclass) enum: Reordered to match order of taxonoccurrence boolean fields, and to place each option next to its opposite. taxonclass enum: Moved "woody" to bottom because it's no longer part of growthform.

1465 03/18/2012 09:10 PM Aaron Marcuse-Kubitza

VegBIEN mappings: distance fields: Remove units

1464 03/18/2012 09:08 PM Aaron Marcuse-Kubitza

xml_func.py: _units: Allow value to be NULL

1463 03/18/2012 08:44 PM Aaron Marcuse-Kubitza

xml_func.py: _units: Use new format.cleanup_units() to do units parsing

1462 03/18/2012 08:43 PM Aaron Marcuse-Kubitza

format.py: Added clean_numeric(), str2int(), str2float(). Added units-related functions. Added documentation labels to each section.

1461 03/18/2012 06:42 PM Aaron Marcuse-Kubitza

Added filter_errors to filter `map` error messages

1460 03/18/2012 06:40 PM Aaron Marcuse-Kubitza

Renamed bin/errors_filter_* to filter_errors_* to sound more natural and to have a different prefix than error_stats so that both can easily be tab-completed at the command line

1459 03/18/2012 06:27 PM Aaron Marcuse-Kubitza

README.TXT: Testing: Added instructions for testing just mapping process, just map spreadsheet generation, and everything

1458 03/18/2012 06:26 PM Aaron Marcuse-Kubitza

root Makefile: Added test-all for most complete coverage. Removed extraneous ";" at the end of the prerequisites line of rules with a recipe.

1457 03/18/2012 06:02 PM Aaron Marcuse-Kubitza

mappings/Makefile: Use new ci_map to make DwC.cs-VegBIEN.specimens.csv case-insensitive

1456 03/18/2012 06:02 PM Aaron Marcuse-Kubitza

Added ci_map to make a map spreadsheet case-insensitive.

1455 03/18/2012 05:53 PM Aaron Marcuse-Kubitza

mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.

1454 03/18/2012 05:52 PM Aaron Marcuse-Kubitza

inputs/SpeciesLink: Switched to using flat files instead of DB

1453 03/18/2012 05:52 PM Aaron Marcuse-Kubitza

inputs/MO: Switched to using flat files instead of DB

1452 03/18/2012 05:51 PM Aaron Marcuse-Kubitza

mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.

1451 03/18/2012 04:55 PM Aaron Marcuse-Kubitza

input.Makefile: Mapping: Support multiple segments of a source table flat file. Use with_cat_csv if flat file segment(s) are available; otherwise use the input file in $+ or the input database, if any. Don't look for an explicit CSV header file because it can now be handled as the first segment if appropriately named.

1450 03/18/2012 04:50 PM Aaron Marcuse-Kubitza

Added with_cat_csv

1449 03/18/2012 04:50 PM Aaron Marcuse-Kubitza

with_cat: Added support for custom cat command in env var

1448 03/18/2012 04:49 PM Aaron Marcuse-Kubitza

cat_csv: Abort if output stream closed instead of exiting with an IOError

1447 03/18/2012 04:16 PM Aaron Marcuse-Kubitza

cat_csv: Ignore any duplicated headers instead of requiring each CSV to have a header identical to the first. Rewrote to pass the CSVs through as lines rather than parsing each row. Because the CSVs are not parsed, check that all CSVs have the same dialect.
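
The line-based approach can be sketched as below for Python 3. This is not the cat_csv source: the function name is hypothetical, the dialect check mentioned in the entry is omitted, and it assumes that a segment whose first line differs from the first header is headerless data.

```python
import io


def cat_csv_lines(streams):
    """Yield the lines of each stream, dropping repeats of the first header.

    Lines are passed through unparsed. Assumption: a segment whose first line
    is not identical to the first header has no header of its own.
    """
    header = None
    for stream in streams:
        first = True
        for line in stream:
            if first:
                first = False
                if header is None:
                    header = line  # first line seen becomes the header
                elif line == header:
                    continue  # duplicated header: skip it
            yield line


segments = [
    io.StringIO('id,name\n1,x\n'),
    io.StringIO('id,name\n2,y\n'),  # repeats the header
    io.StringIO('3,z\n'),           # headerless segment
]
print(''.join(cat_csv_lines(segments)), end='')
```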

1446 03/18/2012 04:14 PM Aaron Marcuse-Kubitza

csvs.py: Added csv modifications to compare Dialect instances

1445 03/18/2012 04:13 PM Aaron Marcuse-Kubitza

util.py: Added classes_eq()

1444 03/16/2012 06:25 PM Aaron Marcuse-Kubitza

csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().

1443 03/16/2012 06:23 PM Aaron Marcuse-Kubitza

util.py: Added NamedTuple

1442 03/16/2012 06:04 PM Aaron Marcuse-Kubitza

csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
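
Python's standard `csv.Sniffer.sniff()` accepts a `delimiters` argument for this purpose. The sketch below (Python 3) shows the idea; the helper name and the particular set of "common" delimiters are assumptions, not the values used in csvs.py.

```python
import csv

# Assumption: which characters count as "common" delimiters.
COMMON_DELIMITERS = ',;\t|'


def sniff_dialect(sample):
    """Guess the CSV dialect of sample, considering only common delimiters."""
    return csv.Sniffer().sniff(sample, delimiters=COMMON_DELIMITERS)


# In this sample the letter 'a' occurs exactly once per line, just like ';',
# so restricting the candidates keeps it from being considered a delimiter.
sample = 'name;count\nabies;3\nacer;5\n'
dialect = sniff_dialect(sample)
print(repr(dialect.delimiter))
```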

1441 03/16/2012 05:38 PM Aaron Marcuse-Kubitza

Renamed inputs/NYBG to inputs/NY to match herbarium code

1440 03/16/2012 05:35 PM Aaron Marcuse-Kubitza

Renamed inputs/UNC-NCSC to inputs/NCU-NCSC to match herbarium code

1439 03/16/2012 05:32 PM Aaron Marcuse-Kubitza

Renamed inputs/UArizona to inputs/ARIZ to match herbarium code

1438 03/16/2012 05:31 PM Aaron Marcuse-Kubitza

Regenerated inputs/MO/maps/src.join.specimens.csv

1437 03/16/2012 05:26 PM Aaron Marcuse-Kubitza

Renamed inputs/MOBOT to inputs/MO to match herbarium code

1436 03/16/2012 05:11 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

1435 03/16/2012 05:08 PM Aaron Marcuse-Kubitza

vegbien.sql: taxonoccurrence: Added cultivatedbasis

1434 03/16/2012 05:03 PM Aaron Marcuse-Kubitza

vegbien.sql: Moved all accessioncode fields to the bottom of their tables. vegbien.ERD.mwb: Adjusted lines to remove overlaps.

1433 03/16/2012 04:52 PM Aaron Marcuse-Kubitza

vegbien.sql: taxonoccurrence: Added iscultivated, isnative. Moved accessioncode to bottom.

1432 03/16/2012 04:36 PM Aaron Marcuse-Kubitza

vegbien.sql: Changed taxonoccurrence.growthform type to more specific growthform

1431 03/16/2012 04:34 PM Aaron Marcuse-Kubitza

vegbien.sql: Added growthform and establishmentmeans_dwc enums using values from taxonclass. Documented that taxonclass is growthform + establishmentmeans_dwc + some other values.

1430 03/16/2012 04:22 PM Aaron Marcuse-Kubitza

VegBIEN: Moved aggregateoccurrence.growthform to taxonoccurrence

1429 03/16/2012 04:21 PM Aaron Marcuse-Kubitza

Added inputs/UNC-NCSC/maps/src.join.specimens.csv

1428 03/16/2012 04:15 PM Aaron Marcuse-Kubitza

VegBIEN: Merged aggregateoccurrence.verbatimcollectorname and specimenreplicate.verbatimcollectorname into taxonoccurrence

1427 03/16/2012 03:58 PM Aaron Marcuse-Kubitza

xml_func.py: parse_range(): Handle negative numbers by treating them as not a range
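
A hedged Python 3 sketch of the rule in this entry: a value that begins with the range separator is a negative number, not a range. Only the name `parse_range()` is from the log; the separator, the return type, and the convention of returning the same value for both ends of a non-range are assumptions.

```python
def parse_range(value, range_sep='-'):
    """Split 'start-end' into (start, end).

    Assumption: a value that is not a range is returned as both ends.
    """
    value = value.strip()
    if value.startswith(range_sep):
        return (value, value)  # leading sign, e.g. '-5': not a range
    start, sep, end = value.partition(range_sep)
    if not sep:
        return (value, value)  # no separator at all
    return (start.strip(), end.strip())


print(parse_range('100-200'))
print(parse_range('-5'))
```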

1426 03/16/2012 03:31 PM Aaron Marcuse-Kubitza

Added inputs/UNC-NCSC/test with initial accepted test outputs

1425 03/16/2012 03:31 PM Aaron Marcuse-Kubitza

Added inputs/UNC-NCSC/maps

1424 03/16/2012 03:31 PM Aaron Marcuse-Kubitza

xml_func.py: _replace: Fixed bug where value entry was not unpacked

1423 03/16/2012 12:36 PM Aaron Marcuse-Kubitza

Added inputs/UNC-NCSC

1422 03/15/2012 07:12 PM Aaron Marcuse-Kubitza

Added inputs/MOBOT/test with initial accepted test outputs

1421 03/15/2012 07:11 PM Aaron Marcuse-Kubitza

Added inputs/MOBOT/maps

1420 03/15/2012 06:51 PM Aaron Marcuse-Kubitza

Added inputs/MOBOT

1419 03/15/2012 06:41 PM Aaron Marcuse-Kubitza

VegX mappings: Updated plot place mappings to VegX 1.5.3 method of place type-tagged place names. This removes the userdef fields in plot.

1418 03/15/2012 06:18 PM Aaron Marcuse-Kubitza

VegX mappings: Changed userdef xPosition, yPosition to /relativePlotPosition/relativeX, /relativePlotPosition/relativeY

1417 03/15/2012 06:16 PM Aaron Marcuse-Kubitza

Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv

1416 03/15/2012 05:36 PM Aaron Marcuse-Kubitza

bin/map: map_table(): wrap_row(): Use util.list_as_length() to handle CSV rows of different lengths

1415 03/15/2012 05:35 PM Aaron Marcuse-Kubitza

util.py: Added list_as_length(). Documented that list_set_length() takes a list, not a tuple. Documented that ListDict must have len(list_) == len(keys).

1414 03/15/2012 05:19 PM Aaron Marcuse-Kubitza

util.py: Added list_set_length(). Changed list_set() to use list_set_length().
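
The three list helpers named in revisions 1414 and 1415 might look roughly like this in Python 3. The function names are from the log; the fill value, the truncation behavior, and the signatures are assumptions.

```python
def list_set_length(list_, length, fill=None):
    """Pad or truncate list_ in place to exactly length items.

    Takes a list, not a tuple, because it mutates its argument.
    """
    if len(list_) < length:
        list_.extend([fill] * (length - len(list_)))
    else:
        del list_[length:]


def list_as_length(list_, length, fill=None):
    """Return a new list of exactly length items; the input is left untouched."""
    copy = list(list_)
    list_set_length(copy, length, fill)
    return copy


def list_set(list_, idx, value, fill=None):
    """Set list_[idx], growing the list first if idx is past the end."""
    if idx >= len(list_):
        list_set_length(list_, idx + 1, fill)
    list_[idx] = value


header = ['id', 'name', 'country']
short_row = ['7', 'Acer']
print(list_as_length(short_row, len(header)))  # padded to match the header
```

Normalizing each row to the header's length is one way a caller such as bin/map could cope with CSV rows of different lengths.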

1413 03/13/2012 07:48 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Added empty *_id/taxonoccurrence attr to primary keys to ensure that a taxonoccurrence is always created for the specimenreplicate

1412 03/13/2012 07:41 PM Aaron Marcuse-Kubitza

xml_func.py: _label: Use ustr instead of str when checking types

1411 03/13/2012 07:41 PM Aaron Marcuse-Kubitza

csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default

1410 03/13/2012 07:23 PM Aaron Marcuse-Kubitza

Merged inputs/NYBG-CSV into NYBG

1409 03/13/2012 07:16 PM Aaron Marcuse-Kubitza

Merged inputs/UArizona-CSV into UArizona

1408 03/13/2012 07:02 PM Aaron Marcuse-Kubitza

Added inputs/SpeciesLink/test

1407 03/13/2012 07:02 PM Aaron Marcuse-Kubitza

Added inputs/SpeciesLink/maps

1406 03/13/2012 07:02 PM Aaron Marcuse-Kubitza

xml_func.py: range-related funcs: Made inputs optional in case they get set to NULL by _nullIf

1405 03/13/2012 06:48 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema

1404 03/13/2012 06:31 PM Aaron Marcuse-Kubitza

bin/map: Added support for getting columns with an optional prefix list for DB/CSV inputs

1403 03/13/2012 06:21 PM Aaron Marcuse-Kubitza

bin/map: Factored out code common to DB and CSV inputs into map_table()

1402 03/13/2012 06:00 PM Aaron Marcuse-Kubitza

bin/map: Parse any prefixes in map input column name. They will later be used to check for versions of columns with a prefix added when processing CSV/DB inputs.

1401 03/13/2012 05:58 PM Aaron Marcuse-Kubitza

strings.py: Added split(), remove_prefix(), remove_suffix(), and remove_prefixes(). Added section comments.
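
A possible reading of the prefix and suffix helpers, sketched for Python 3. The function names are from the log; the argument order and the choice to return the string unchanged when nothing matches are assumptions. `split()` is left out because the log does not say how it differs from `str.split()`.

```python
def remove_prefix(prefix, str_):
    """Return str_ without prefix if it starts with it, else str_ unchanged."""
    if prefix and str_.startswith(prefix):
        return str_[len(prefix):]
    return str_


def remove_suffix(suffix, str_):
    """Return str_ without suffix if it ends with it, else str_ unchanged."""
    if suffix and str_.endswith(suffix):
        return str_[:-len(suffix)]
    return str_


def remove_prefixes(prefixes, str_):
    """Remove each prefix in turn, in the order given."""
    for prefix in prefixes:
        str_ = remove_prefix(prefix, str_)
    return str_


# Hypothetical column name carrying a dataset-specific prefix.
print(remove_prefixes(['dwcore', ':'], 'dwcore:Country'))
```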

1400 03/13/2012 05:06 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: minimumElevationInMeters: Handle embedded ranges using _rangeStart and _rangeEnd

1399 03/13/2012 05:05 PM Aaron Marcuse-Kubitza

xml_func.py: Added _rangeStart and _rangeEnd

1398 03/13/2012 05:04 PM Aaron Marcuse-Kubitza

xpath.py: parse(): Split paths: Raise a SyntaxException if can't attach a split path because there is no parent element to attach to