/ - Changes - BIEN 3 - NCEAS Projects

root @ 1497

#	Date	Author	Comment
1497	03/19/2012 08:38 PM	Aaron Marcuse-Kubitza	inputs//maps/DwC.specimens.csv: Ran through `cols` to standardize CSV format to that generated by Python
1496	03/19/2012 08:35 PM	Aaron Marcuse-Kubitza	cols: If column number of "*" given, get all columns
1495	03/19/2012 08:32 PM	Aaron Marcuse-Kubitza	bin/subtract: If no compare columns given, compare on all columns instead of column 0
1494	03/19/2012 08:31 PM	Aaron Marcuse-Kubitza	util.py: list_subset(): Support special idxs value None, which returns entire list
1493	03/19/2012 08:22 PM	Aaron Marcuse-Kubitza	cat_csv: Added support for using - to cat stdin
1492	03/19/2012 08:18 PM	Aaron Marcuse-Kubitza	Added inputs/U/maps
1491	03/19/2012 07:32 PM	Aaron Marcuse-Kubitza	Added inputs/U
1490	03/19/2012 07:29 PM	Aaron Marcuse-Kubitza	Put inputs/REMIB/src/remib_raw.0.header.specimens.txt under version control
1489	03/19/2012 07:24 PM	Aaron Marcuse-Kubitza	Added inputs/REMIB/test with accepted test outputs
1488	03/19/2012 07:22 PM	Aaron Marcuse-Kubitza	Added inputs/REMIB/maps
1487	03/19/2012 07:20 PM	Aaron Marcuse-Kubitza	inputs/NCU-NCSC/maps/DwC.specimens.csv: Removed State->StateProvince mapping because that is now in mappings/DwC1-DwC2.specimens.csv
1486	03/19/2012 07:13 PM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
1485	03/19/2012 06:51 PM	Aaron Marcuse-Kubitza	Added inputs/REMIB
1484	03/19/2012 06:09 PM	Aaron Marcuse-Kubitza	bin/map: Deal with fields that may be in the dataset under more than one prefix by getting all fields and coalesce()ing them (e.g. SpeciesLink has dwcore* and darwin1* columns for the same DwC field)
1483	03/19/2012 06:06 PM	Aaron Marcuse-Kubitza	util.py: Added coalesce()
1482	03/19/2012 05:40 PM	Aaron Marcuse-Kubitza	xpath_func.py: process(): Fixed bug where XPath elem's other_branches were not also processed
1481	03/19/2012 05:28 PM	Aaron Marcuse-Kubitza	row: Don't prepend header row because this feature prevents the program from being used in a pipeline. Sheets may be constructed in a pipeline if multiple segments need to be joined, e.g. with cat_csv.
1480	03/19/2012 05:09 PM	Aaron Marcuse-Kubitza	Added row to get a row of a spreadsheet, preceded by the header row
1479	03/19/2012 05:09 PM	Aaron Marcuse-Kubitza	bin programs: Fixed bug in Usage message where program name was not printed because unset variable $self was used instead of $0
1478	03/19/2012 05:08 PM	Aaron Marcuse-Kubitza	xml_func.py: _nullIf: types_by_name: Use strings.ustr instead of str to support Unicode values
1477	03/19/2012 04:40 PM	Aaron Marcuse-Kubitza	xml_func.py: _nullIf: If value not convertible, return it, because can't equal null. Refactored to store types by name in a dict instead of using if statements.
1476	03/19/2012 04:31 PM	Aaron Marcuse-Kubitza	units.py: convert(): raise MissingUnitsException if quantity doesn't have units. MissingUnitsException: Take Quantity input instead of str.
1475	03/19/2012 04:27 PM	Aaron Marcuse-Kubitza	inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.
1474	03/19/2012 04:26 PM	Aaron Marcuse-Kubitza	inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.
1473	03/19/2012 04:21 PM	Aaron Marcuse-Kubitza	xml_func.py: _map: empty map entry means None
1472	03/19/2012 04:10 PM	Aaron Marcuse-Kubitza	xml_func.py: _avg: Support empty inputs by returning None. Moved _range after _rangeStart/_rangeEnd since it's less frequently used.
1471	03/19/2012 04:07 PM	Aaron Marcuse-Kubitza	units.py: Restructured to use a Quantity object for the units-tagged value and conversion functions quantity2str() and str2quantity() to convert between that and a raw string. Added convert() with basic support for removing units and passing through matching units. xml_func.py: _units: Added "to" attr. VegBIEN mappings: Remove units using new _units "to" attr instead of temporary workaround in _units.
1470	03/19/2012 03:13 PM	Aaron Marcuse-Kubitza	xml_func.py: _units: default units attr renamed to default to clarify that it's not the units you're converting to
1469	03/19/2012 03:06 PM	Aaron Marcuse-Kubitza	xml_func.py: Added documentation labels to each section of XML functions
1468	03/19/2012 03:01 PM	Aaron Marcuse-Kubitza	Moved units-related functions from format.py to new units.py
1467	03/19/2012 02:55 PM	Aaron Marcuse-Kubitza	lib/*.py: Removed svn:executable property to turn execute bit off
1466	03/19/2012 02:45 PM	Aaron Marcuse-Kubitza	vegbien.sql: growthform (and taxonclass) enum: Added options suggested by Michael Lee. Removed "woody". establishmentmeans_dwc (and taxonclass) enum: Reordered to match order of taxonoccurrence boolean fields, and to place each option next to its opposite. taxonclass enum: Moved "woody" to bottom because it's no longer part of growthform.
1465	03/18/2012 09:10 PM	Aaron Marcuse-Kubitza	VegBIEN mappings: distance fields: Remove units
1464	03/18/2012 09:08 PM	Aaron Marcuse-Kubitza	xml_func.py: _units: Allow value to be NULL
1463	03/18/2012 08:44 PM	Aaron Marcuse-Kubitza	xml_func.py: _units: Use new format.cleanup_units() to do units parsing
1462	03/18/2012 08:43 PM	Aaron Marcuse-Kubitza	format.py: Added clean_numeric(), str2int(), str2float(). Added units-related functions. Added documentation labels to each section.
1461	03/18/2012 06:42 PM	Aaron Marcuse-Kubitza	Added filter_errors to filter `map` error messages
1460	03/18/2012 06:40 PM	Aaron Marcuse-Kubitza	Renamed bin/errors_filter_* to filter_errors_* to sound more natural and to have a different prefix than error_stats so that both can easily be tab-completed at the command line
1459	03/18/2012 06:27 PM	Aaron Marcuse-Kubitza	README.TXT: Testing: Added instructions for testing just mapping process, just map spreadsheet generation, and everything
1458	03/18/2012 06:26 PM	Aaron Marcuse-Kubitza	root Makefile: Added test-all for most complete coverage. Removed extraneous ";" at the end of the prerequisites line of rules with a recipe.
1457	03/18/2012 06:02 PM	Aaron Marcuse-Kubitza	mappings/Makefile: Use new ci_map to make DwC.cs-VegBIEN.specimens.csv case-insensitive
1456	03/18/2012 06:02 PM	Aaron Marcuse-Kubitza	Added ci_map to make a map spreadsheet case-insensitive.
1455	03/18/2012 05:53 PM	Aaron Marcuse-Kubitza	mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.
1454	03/18/2012 05:52 PM	Aaron Marcuse-Kubitza	inputs/SpeciesLink: Switched to using flat files instead of DB
1453	03/18/2012 05:52 PM	Aaron Marcuse-Kubitza	inputs/MO: Switched to using flat files instead of DB
1452	03/18/2012 05:51 PM	Aaron Marcuse-Kubitza	mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.
1451	03/18/2012 04:55 PM	Aaron Marcuse-Kubitza	input.Makefile: Mapping: Support multiple segments of a source table flat file. Use with_cat_csv if flat file segment(s) are available; otherwise use the input file in $+ or the input database, if any. Don't look for an explicit CSV header file because it can now be handled as the first segment if appropriately named.
1450	03/18/2012 04:50 PM	Aaron Marcuse-Kubitza	Added with_cat_csv
1449	03/18/2012 04:50 PM	Aaron Marcuse-Kubitza	with_cat: Added support for custom cat command in env var
1448	03/18/2012 04:49 PM	Aaron Marcuse-Kubitza	cat_csv: Abort if output stream closed instead of exiting with an IOError
1447	03/18/2012 04:16 PM	Aaron Marcuse-Kubitza	cat_csv: Ignore any duplicated headers instead of requiring each CSV to have a header identical to the first. Rewrote to pass the CSVs through as lines rather than parsing each row. Because the CSVs are not parsed, checked that all CSVs have the same dialect.
1446	03/18/2012 04:14 PM	Aaron Marcuse-Kubitza	csvs.py: Added csv modifications to compare Dialect instances
1445	03/18/2012 04:13 PM	Aaron Marcuse-Kubitza	util.py: Added classes_eq()
1444	03/16/2012 06:25 PM	Aaron Marcuse-Kubitza	csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().
1443	03/16/2012 06:23 PM	Aaron Marcuse-Kubitza	util.py: Added NamedTuple
1442	03/16/2012 06:04 PM	Aaron Marcuse-Kubitza	csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
1441	03/16/2012 05:38 PM	Aaron Marcuse-Kubitza	Renamed inputs/NYBG to inputs/NY to match herbarium code
1440	03/16/2012 05:35 PM	Aaron Marcuse-Kubitza	Renamed inputs/UNC-NCSC to inputs/NCU-NCSC to match herbarium code
1439	03/16/2012 05:32 PM	Aaron Marcuse-Kubitza	Renamed inputs/UArizona to inputs/ARIZ to match herbarium code
1438	03/16/2012 05:31 PM	Aaron Marcuse-Kubitza	Regenerated inputs/MO/maps/src.join.specimens.csv
1437	03/16/2012 05:26 PM	Aaron Marcuse-Kubitza	Renamed inputs/MOBOT to inputs/MO to match herbarium code
1436	03/16/2012 05:11 PM	Aaron Marcuse-Kubitza	Regenerated vegbien.ERD exports
1435	03/16/2012 05:08 PM	Aaron Marcuse-Kubitza	vegbien.sql: taxonoccurrence: Added cultivatedbasis
1434	03/16/2012 05:03 PM	Aaron Marcuse-Kubitza	vegbien.sql: Moved all accessioncode fields to the bottom of their tables. vegbien.ERD.mwb: Adjusted lines to remove overlaps.
1433	03/16/2012 04:52 PM	Aaron Marcuse-Kubitza	vegbien.sql: taxonoccurrence: Added iscultivated, isnative. Moved accessioncode to bottom.
1432	03/16/2012 04:36 PM	Aaron Marcuse-Kubitza	vegbien.sql: Changed taxonoccurrence.growthform type to more specific growthform
1431	03/16/2012 04:34 PM	Aaron Marcuse-Kubitza	vegbien.sql: Added growthform and establishmentmeans_dwc enums using values from taxonclass. Documented that taxonclass is growthform + establishmentmeans_dwc + some other values.
1430	03/16/2012 04:22 PM	Aaron Marcuse-Kubitza	VegBIEN: Moved aggregateoccurrence.growthform to taxonoccurrence
1429	03/16/2012 04:21 PM	Aaron Marcuse-Kubitza	Added inputs/UNC-NCSC/maps/src.join.specimens.csv
1428	03/16/2012 04:15 PM	Aaron Marcuse-Kubitza	VegBIEN: Merged aggregateoccurrence.verbatimcollectorname and specimenreplicate.verbatimcollectorname into taxonoccurrence
1427	03/16/2012 03:58 PM	Aaron Marcuse-Kubitza	xml_func.py: parse_range(): Handle negative numbers by treating them as not a range
1426	03/16/2012 03:31 PM	Aaron Marcuse-Kubitza	Added inputs/UNC-NCSC/test with initial accepted test outputs
1425	03/16/2012 03:31 PM	Aaron Marcuse-Kubitza	Added inputs/UNC-NCSC/maps
1424	03/16/2012 03:31 PM	Aaron Marcuse-Kubitza	xml_func.py: _replace: Fixed bug where value entry was not unpacked
1423	03/16/2012 12:36 PM	Aaron Marcuse-Kubitza	Added inputs/UNC-NCSC
1422	03/15/2012 07:12 PM	Aaron Marcuse-Kubitza	Added inputs/MOBOT/test with initial accepted test outputs
1421	03/15/2012 07:11 PM	Aaron Marcuse-Kubitza	Added inputs/MOBOT/maps
1420	03/15/2012 06:51 PM	Aaron Marcuse-Kubitza	Added inputs/MOBOT
1419	03/15/2012 06:41 PM	Aaron Marcuse-Kubitza	VegX mappings: Updated plot place mappings to VegX 1.5.3 method of place type-tagged place names. This removes the userdef fields in plot.
1418	03/15/2012 06:18 PM	Aaron Marcuse-Kubitza	VegX mappings: Changed userdef xPosition, yPosition to /relativePlotPosition/relativeX, /relativePlotPosition/relativeY
1417	03/15/2012 06:16 PM	Aaron Marcuse-Kubitza	Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv
1416	03/15/2012 05:36 PM	Aaron Marcuse-Kubitza	bin/map: map_table(): wrap_row(): Use util.list_as_length() to handle CSV rows of different lengths
1415	03/15/2012 05:35 PM	Aaron Marcuse-Kubitza	util.py: Added list_as_length(). Documented that list_set_length() takes a list, not a tuple. Documented that ListDict must have len(list_) == len(keys).
1414	03/15/2012 05:19 PM	Aaron Marcuse-Kubitza	util.py: Added list_set_length(). Changed list_set() to use list_set_length().
1413	03/13/2012 07:48 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: Added empty *_id/taxonoccurrence attr to primary keys to ensure that a taxonoccurrence is always created for the specimenreplicate
1412	03/13/2012 07:41 PM	Aaron Marcuse-Kubitza	xml_func.py: _label: Use ustr instead of str when checking types
1411	03/13/2012 07:41 PM	Aaron Marcuse-Kubitza	csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default
1410	03/13/2012 07:23 PM	Aaron Marcuse-Kubitza	Merged inputs/NYBG-CSV into NYBG
1409	03/13/2012 07:16 PM	Aaron Marcuse-Kubitza	Merged inputs/UArizona-CSV into UArizona
1408	03/13/2012 07:02 PM	Aaron Marcuse-Kubitza	Added inputs/SpeciesLink/test
1407	03/13/2012 07:02 PM	Aaron Marcuse-Kubitza	Added inputs/SpeciesLink/maps
1406	03/13/2012 07:02 PM	Aaron Marcuse-Kubitza	xml_func.py: range-related funcs: Made inputs optional in case they get set to NULL by _nullIf
1405	03/13/2012 06:48 PM	Aaron Marcuse-Kubitza	mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
1404	03/13/2012 06:31 PM	Aaron Marcuse-Kubitza	bin/map: Added support for getting columns with an optional prefix list for DB/CSV inputs
1403	03/13/2012 06:21 PM	Aaron Marcuse-Kubitza	bin/map: Factored out code common to DB and CSV inputs into map_table()
1402	03/13/2012 06:00 PM	Aaron Marcuse-Kubitza	bin/map: Parse any prefixes in map input column name. They will later be used to check for versions of columns with a prefix added when processing CSV/DB inputs.
1401	03/13/2012 05:58 PM	Aaron Marcuse-Kubitza	strings.py: Added split(), remove_prefix(), remove_suffix(), and remove_prefixes(). Added section comments.
1400	03/13/2012 05:06 PM	Aaron Marcuse-Kubitza	mappings/DwC2-VegBIEN.specimens.csv: minimumElevationInMeters: Handle embedded ranges using _rangeStart and _rangeEnd
1399	03/13/2012 05:05 PM	Aaron Marcuse-Kubitza	xml_func.py: Added _rangeStart and _rangeEnd
1398	03/13/2012 05:04 PM	Aaron Marcuse-Kubitza	xpath.py: parse(): Split paths: Raise a SyntaxException if can't attach a split path because there is no parent element to attach to
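
Several of the util.py entries above (coalesce() in 1483, the None value for list_subset() idxs in 1494, list_as_length() in 1415) describe small list helpers without showing them. The following is a minimal sketch of what such helpers might look like, written only from the commit messages. The signatures, the `fill` parameter, and the truncating behavior of list_as_length() are assumptions, not the repository's actual code.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if every value is None."""
    for value in values:
        if value is not None:
            return value
    return None


def list_subset(list_, idxs):
    """Return the items of list_ at idxs; idxs=None returns the entire list."""
    if idxs is None:
        return list(list_)
    return [list_[i] for i in idxs]


def list_as_length(list_, length, fill=None):
    """Return a copy of list_ truncated or padded with fill to exactly length items.

    Assumption: short rows are padded and long rows are truncated.
    """
    result = list(list_[:length])
    result.extend([fill] * (length - len(result)))
    return result
```

Under these assumptions, a short CSV row could be padded to the header's width with `list_as_length(row, len(header))`, and a field that appears under more than one prefix (as in 1484) could be resolved with `coalesce(row_value_a, row_value_b)`, where both argument names are hypothetical.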

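Revision 1447 describes cat_csv passing CSVs through as lines and ignoring any duplicated headers. A rough sketch of that idea follows, assuming each input is a text stream and that a duplicated header is a first line identical to the first stream's first line. The function name `cat_csv_lines` is hypothetical, and the dialect check mentioned in the commit message is omitted.

```python
import io


def cat_csv_lines(streams):
    """Yield the lines of each stream in order.

    The first line of the first stream is treated as the header. A later
    stream's first line is skipped only if it is identical to that header,
    so segments without a header pass through unchanged.
    """
    header = None
    for stream in streams:
        for line_num, line in enumerate(stream):
            if line_num == 0:
                if header is None:
                    header = line  # remember the header of the first stream
                elif line == header:
                    continue  # duplicated header: drop it
            yield line


# Usage: join a headed segment, a second headed segment, and a headerless one.
segments = [
    io.StringIO("id,name\n1,a\n"),
    io.StringIO("id,name\n2,b\n"),
    io.StringIO("3,c\n"),
]
joined = "".join(cat_csv_lines(segments))
```

Because the rows are never parsed, this approach only works when all segments share one CSV dialect, which is presumably why the original commit compares dialects before concatenating.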