Activity

From 02/21/2012 to 03/21/2012

03/20/2012

11:13 PM Revision 1546: import-all: disown each new import process to ignore SIGHUP
Aaron Marcuse-Kubitza
11:06 PM Revision 1545: Added jobspecs to extract jobspecs (%#) from (possibly filtered) `jobs` output
Aaron Marcuse-Kubitza
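The jobspecs helper from Revision 1545 is presumably a shell script. Its parsing step can be sketched in Python, the language of the project's lib/*.py modules. This assumes bash's usual `[N]+  Running  command &` line format. The function name mirrors the script, but this implementation is hypothetical.

```python
import re


def jobspecs(jobs_output):
    """Return the jobspecs (%N) of every job listed in (possibly filtered) `jobs` output."""
    # Each job line starts with "[N]", optionally followed by "+" or "-"
    return ["%" + num for num in re.findall(r"^\[(\d+)\]", jobs_output, flags=re.MULTILINE)]
```

Each resulting jobspec can then be passed to `disown -h`, as the README instructions from Revision 1384 suggest for a single job.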
11:05 PM Revision 1544: README.TXT: Changed `make import &` to `. bin/import-all`
Aaron Marcuse-Kubitza
11:05 PM Revision 1543: README.TXT: Changed `make import &` to `. bin/import-all`
Aaron Marcuse-Kubitza
10:39 PM Revision 1542: main Makefile: import: Before running imports, print message that `. bin/import-all` can be used to import all inputs at once
Aaron Marcuse-Kubitza
10:38 PM Revision 1541: Added import-all to import all inputs at once
Aaron Marcuse-Kubitza
10:20 PM Revision 1540: mappings/DwC2-VegBIEN.specimens.csv: Mapped establishmentMeans, which contains growthform, iscultivated, isnative, etc. combined
Aaron Marcuse-Kubitza
10:11 PM Revision 1539: inputs/SALVIAS-CSV/maps/VegX.organisms.csv: habit: Updated mapping to match equivalent SALVIAS mapping
Aaron Marcuse-Kubitza
10:10 PM Revision 1538: xml_func.py: _map: Instead of _closed special entry, make all maps closed by default and open them if special entry "*=*" is present. Support using a _map to filter values by interpreting special entry "*=" as removing all values not explicitly specified, and by interpreting special value "*" as keeping input value the same.
Aaron Marcuse-Kubitza
10:08 PM Revision 1537: xml_func.py: _map: Instead of _closed special entry, make all maps closed by default and open them if special entry "*=*" is present. Support using a _map to filter values by interpreting special entry "*=" as removing all values not explicitly specified, and by interpreting special value "*" as keeping input value the same.
Aaron Marcuse-Kubitza
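The _map rules from Revisions 1537/1538, together with Revision 1473 ("empty map entry means None"), can be summarized in a small sketch. A plain dict stands in for the map's entries. `map_value` is a hypothetical name, and raising KeyError for a value missing from a closed map is an assumption, not necessarily what xml_func.py does.

```python
def map_value(value, mapping):
    """Translate value through a map whose keys and values are strings.

    Hypothetical sketch of the _map rules: maps are closed unless they
    contain a "*" key; "*=*" passes unmapped values through, "*=" drops
    them; an output of "*" keeps the input; an empty output means None.
    """
    if value in mapping:
        out = mapping[value]
    elif "*" in mapping:
        out = mapping["*"]  # wildcard entry decides what unmapped values become
    else:
        raise KeyError(f"{value!r} is not in the closed map")  # assumed error type
    if out == "*":
        return value  # special value "*": keep the input value the same
    if out == "":
        return None  # empty map entry means None
    return out
```

For example, a map like `{"Yes": "cultivated", "No": "wild"}` (input spellings hypothetical) mirrors the NCU-NCSC "Cultivated?" translation from Revisions 1474/1475.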
09:19 PM Revision 1536: xml_func.py: _date: On error "month must be in 1..12", try swapping month and day
Aaron Marcuse-Kubitza
09:13 PM Revision 1535: xml_func.py: _date: On error "month must be in 1..12", try swapping month and day
Aaron Marcuse-Kubitza
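The month/day swap from Revisions 1535/1536 can be sketched with the standard datetime module. Python's `datetime.date` raises a ValueError ("month must be in 1..12") for an out-of-range month. This sketch checks the month range directly rather than matching the message text. `make_date` is a hypothetical name, not the _date function itself.

```python
import datetime


def make_date(year, month, day):
    """Build a date, retrying with month and day swapped if the month is out of range."""
    try:
        return datetime.date(year, month, day)
    except ValueError:
        if 1 <= month <= 12:
            raise  # the problem is not the month (e.g. day out of range), so don't guess
        # Month out of range: assume the source wrote day and month in the other order
        return datetime.date(year, day, month)
```

If the swapped values are still invalid (e.g. month 13 and day 13), the second constructor call raises its own ValueError.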
08:36 PM Revision 1534: row: Support getting multiple rows. Document that it does *not* handle embedded newlines.
Aaron Marcuse-Kubitza
08:19 PM Revision 1533: mappings/Makefile: Removed no longer needed DwC-VegBIEN.specimens.no_empty.csv
Aaron Marcuse-Kubitza
08:18 PM Revision 1532: input.Makefile: Removed no longer needed $(join) command
Aaron Marcuse-Kubitza
08:15 PM Revision 1531: input.Makefile: Removed no longer needed src join maps
Aaron Marcuse-Kubitza
08:12 PM Revision 1530: input.Makefile: Generate VegBIEN maps from full via maps in order to include all input columns if a src map was provided. This causes the VegBIEN join process to produce *all* the "No join mapping" errors for that datasource, not just those for fields in the (non-full) via map. maps/src.join.*.csv should no longer be needed for producing "No join mapping" errors.
Aaron Marcuse-Kubitza
08:03 PM Revision 1529: mappings/Makefile: Generate DwC-VegBIEN.specimens.csv from new intermediate DwC.ci-VegBIEN.specimens.csv using $(removeEmpty) so that "No join mapping" errors will be reported when maps are joined to it. Deprecate DwC-VegBIEN.specimens.no_empty.csv because it's now identical to DwC-VegBIEN.specimens.csv.
Aaron Marcuse-Kubitza
07:45 PM Revision 1528: Added inputs/NY/maps/src.specimens.csv
Aaron Marcuse-Kubitza
07:41 PM Revision 1527: Added reverse_join to inner-join two map spreadsheets in the opposite order they are specified in
Aaron Marcuse-Kubitza
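An inner join of two maps, and the reversed-order variant added in Revision 1527, can be sketched with dicts standing in for two-column map spreadsheets. The real join/reverse_join tools read CSV files with header rows, so this only illustrates the composition order and is not their implementation.

```python
def join(map1, map2):
    """Inner-join map1 (a -> b) with map2 (b -> c), giving a -> c.

    Entries of map1 whose output has no entry in map2 are dropped.
    """
    return {in_: map2[out] for in_, out in map1.items() if out in map2}


def reverse_join(map1, map2):
    """Join the two maps in the opposite order from how they are given."""
    return join(map2, map1)
```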
07:36 PM Revision 1526: input.Makefile: Intersect the generated VegBIEN and full via maps with the src map, if it exists. This reduces the size of the autogen maps significantly by including only the entries used by the datasource.
Aaron Marcuse-Kubitza
07:34 PM Revision 1525: intersect: Compare columns based on specified compare_col_nums, just like subtract
Aaron Marcuse-Kubitza
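Comparing rows on selected columns, as intersect does after Revision 1525, amounts to matching on a tuple of those column values. The sketch below takes rows as lists of strings. Only the parameter name compare_col_nums comes from the log entry; the rest is a hypothetical illustration.

```python
def intersect(rows1, rows2, compare_col_nums):
    """Keep the rows of rows1 whose compare columns also occur in some row of rows2."""
    def key(row):
        return tuple(row[i] for i in compare_col_nums)

    keys2 = {key(row) for row in rows2}
    return [row for row in rows1 if key(row) in keys2]
```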
06:50 PM Revision 1524: input.Makefile: Use var $(selfMap) instead of spelling out $(bin)/cols 0 0
Aaron Marcuse-Kubitza
06:36 PM Revision 1523: mappings/DwC2-VegBIEN.specimens.csv: Mapped continent
Aaron Marcuse-Kubitza
06:20 PM Revision 1522: inputs/SpeciesLink/maps/DwC.specimens.csv: Mapped remaining fields
Aaron Marcuse-Kubitza
06:19 PM Revision 1521: inputs/SpeciesLink/maps/DwC.specimens.csv: Mapped remaining fields
Aaron Marcuse-Kubitza
06:08 PM Revision 1520: inputs/SpeciesLink/maps/src.specimens.csv: Fixed bug where prefixes had not been removed from fields, which prevented join mappings from being found for any of the fields
Aaron Marcuse-Kubitza
06:08 PM Revision 1519: main Makefile: Added missing_joins to determine which input fields are missing join mappings
Aaron Marcuse-Kubitza
05:47 PM Revision 1518: xml_func.py: SyntaxException: Inherit from exc.ExceptionWithCause so the traceback will be populated with the cause's traceback instead of the SyntaxException wrapper's traceback
Aaron Marcuse-Kubitza
05:35 PM Revision 1517: Added inputs/UNCC/test with accepted test outputs
Aaron Marcuse-Kubitza
05:35 PM Revision 1516: Added inputs/UNCC/maps
Aaron Marcuse-Kubitza
05:34 PM Revision 1515: xml_func.py: _date: month: Convert month names to numbers before casting everything to int
Aaron Marcuse-Kubitza
05:27 PM Revision 1514: xml_func.py: _date: Refactored to convert items to dict right away, and use iteritems() for later type conversion. This will enable month names to be converted before casting everything to int.
Aaron Marcuse-Kubitza
04:47 PM Revision 1513: mappings/Makefile: Sort mappings/DwC.self.specimens.csv so that entries can more easily be found when using it as a DwC terms reference
Aaron Marcuse-Kubitza

03/19/2012

09:55 PM Revision 1512: Added inputs/UNCC
Aaron Marcuse-Kubitza
09:50 PM Revision 1511: Added inputs/U/test with accepted test outputs
Aaron Marcuse-Kubitza
09:49 PM Revision 1510: inputs/U/maps/DwC.specimens.csv: Mapped most of the remaining fields
Aaron Marcuse-Kubitza
09:34 PM Revision 1509: input.Makefile: Clean up via maps when they change by subtracting the via format's self map from the via map (the comments column is ignored in determining which entries are redundant, and empty entries with a matching input column are also removed)
Aaron Marcuse-Kubitza
09:29 PM Revision 1508: subtract: Fixed bug where entries were removed even if maps were not combinable and ignore was off
Aaron Marcuse-Kubitza
09:27 PM Revision 1507: union: Fixed bug where combinable was not saved for use in deciding whether to add entries in map 1 that weren't already defined
Aaron Marcuse-Kubitza
09:25 PM Revision 1506: inputs/U/maps: Set svn props
Aaron Marcuse-Kubitza
09:20 PM Revision 1505: subtract: Also remove nonexplicit empty mappings whose input col is in map 1
Aaron Marcuse-Kubitza
09:15 PM Revision 1504: maps.py: Added is_nonexplicit_empty_mapping()
Aaron Marcuse-Kubitza
09:03 PM Revision 1503: subtract: Use new maps.combinable() to compare column headers, which allows more flexibility in combining maps
Aaron Marcuse-Kubitza
09:01 PM Revision 1502: union: Use new maps.combinable()
Aaron Marcuse-Kubitza
09:01 PM Revision 1501: maps.py: Added col_label() and combinable()
Aaron Marcuse-Kubitza
08:54 PM Revision 1500: union: Use new strings.overlaps()
Aaron Marcuse-Kubitza
08:53 PM Revision 1499: strings.py: Added overlaps()
Aaron Marcuse-Kubitza
08:46 PM Revision 1498: vegbien.sql: Fixed syntax error in taxonclass enum: missing comma at end of element
Aaron Marcuse-Kubitza
08:38 PM Revision 1497: inputs/*/maps/DwC.specimens.csv: Ran through `cols *` to standardize CSV format to that generated by Python
Aaron Marcuse-Kubitza
08:35 PM Revision 1496: cols: If column number of "*" given, get all columns
Aaron Marcuse-Kubitza
08:32 PM Revision 1495: bin/subtract: If no compare columns given, compare on all columns instead of column 0
Aaron Marcuse-Kubitza
08:31 PM Revision 1494: util.py: list_subset(): Support special idxs value None, which returns entire list
Aaron Marcuse-Kubitza
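Revisions 1494 and 1495 fit together: if list_subset() returns the whole list when idxs is None, then subtract can compare on all columns simply by passing None for its compare columns. This is a hypothetical sketch. It leaves out the combinable() header checks and the empty-mapping handling the real subtract also performs.

```python
def list_subset(list_, idxs):
    """Return the items at idxs; the special value None returns the entire list."""
    if idxs is None:
        return list_
    return [list_[i] for i in idxs]


def subtract(rows1, rows2, compare_col_nums=None):
    """Remove from rows1 every row whose compare columns match some row of rows2.

    With no compare columns given, all columns are compared.
    """
    keys2 = {tuple(list_subset(row, compare_col_nums)) for row in rows2}
    return [row for row in rows1 if tuple(list_subset(row, compare_col_nums)) not in keys2]
```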
08:22 PM Revision 1493: cat_csv: Added support for using - to cat stdin
Aaron Marcuse-Kubitza
08:18 PM Revision 1492: Added inputs/U/maps
Aaron Marcuse-Kubitza
07:32 PM Revision 1491: Added inputs/U
Aaron Marcuse-Kubitza
07:29 PM Revision 1490: Put inputs/REMIB/src/remib_raw.0.header.specimens.txt under version control
Aaron Marcuse-Kubitza
07:24 PM Revision 1489: Added inputs/REMIB/test with accepted test outputs
Aaron Marcuse-Kubitza
07:22 PM Revision 1488: Added inputs/REMIB/maps
Aaron Marcuse-Kubitza
07:20 PM Revision 1487: inputs/NCU-NCSC/maps/DwC.specimens.csv: Removed State->StateProvince mapping because that is now in mappings/DwC1-DwC2.specimens.csv
Aaron Marcuse-Kubitza
07:13 PM Revision 1486: mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
Aaron Marcuse-Kubitza
06:51 PM Revision 1485: Added inputs/REMIB
Aaron Marcuse-Kubitza
06:09 PM Revision 1484: bin/map: Deal with fields that may be in the dataset under more than one prefix by getting all fields and coalesce()ing them (e.g. SpeciesLink has dwcore* and darwin1* columns for the same DwC field)
Aaron Marcuse-Kubitza
06:06 PM Revision 1483: util.py: Added coalesce()
Aaron Marcuse-Kubitza
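The prefix handling from Revisions 1483 and 1484 (one DwC field stored under several column prefixes) can be sketched as a lookup under every prefix followed by coalesce(). A row is modeled as a dict. The `get_prefixed` helper and the "dwcore:"/"darwin1:" column spellings are hypothetical, because the real column naming is not shown here.

```python
def coalesce(*vals):
    """Return the first value that is not None, or None if all are None."""
    for val in vals:
        if val is not None:
            return val
    return None


def get_prefixed(row, name, prefixes):
    """Look up a field that may appear under several prefixes and coalesce the results."""
    return coalesce(*(row.get(prefix + name) for prefix in prefixes))
```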
05:40 PM Revision 1482: xpath_func.py: process(): Fixed bug where XPath elem's other_branches were not also processed
Aaron Marcuse-Kubitza
05:28 PM Revision 1481: row: Don't prepend header row because this feature prevents the program from being used in a pipeline. Sheets may be constructed in a pipeline if multiple segments need to be joined, e.g. with cat_csv.
Aaron Marcuse-Kubitza
05:09 PM Revision 1480: Added row to get a row of a spreadsheet, preceded by the header row
Aaron Marcuse-Kubitza
05:09 PM Revision 1479: bin programs: Fixed bug in Usage message where program name was not printed because unset variable $self was used instead of $0
Aaron Marcuse-Kubitza
05:08 PM Revision 1478: xml_func.py: _nullIf: types_by_name: Use strings.ustr instead of str to support Unicode values
Aaron Marcuse-Kubitza
04:40 PM Revision 1477: xml_func.py: _nullIf: If value is not convertible, return it, because it can't equal null. Refactored to store types by name in a dict instead of using if statements.
Aaron Marcuse-Kubitza
04:31 PM Revision 1476: units.py: convert(): raise MissingUnitsException if quantity doesn't have units. MissingUnitsException: Take Quantity input instead of str.
Aaron Marcuse-Kubitza
04:27 PM Revision 1475: inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.
Aaron Marcuse-Kubitza
04:26 PM Revision 1474: inputs/NCU-NCSC/maps/DwC.specimens.csv: "Cultivated?": For clarity, use _map instead of _if to translate boolean to "cultivated". Translate "No" to "wild" (the opposite of "cultivated") to store an explicit not-cultivated as such.
Aaron Marcuse-Kubitza
04:21 PM Revision 1473: xml_func.py: _map: empty map entry means None
Aaron Marcuse-Kubitza
04:10 PM Revision 1472: xml_func.py: _avg: Support empty inputs by returning None. Moved _range after _rangeStart/_rangeEnd since it's less frequently used.
Aaron Marcuse-Kubitza
04:07 PM Revision 1471: units.py: Restructured to use a Quantity object for the units-tagged value and conversion functions quantity2str() and str2quantity() to convert between that and a raw string. Added convert() with basic support for removing units and passing through matching units. xml_func.py: _units: Added "to" attr. VegBIEN mappings: Remove units using new _units "to" attr instead of temporary workaround in _units.
Aaron Marcuse-Kubitza
03:13 PM Revision 1470: xml_func.py: _units: default units attr renamed to default to clarify that it's not the units you're converting to
Aaron Marcuse-Kubitza
03:06 PM Revision 1469: xml_func.py: Added documentation labels to each section of XML functions
Aaron Marcuse-Kubitza
03:01 PM Revision 1468: Moved units-related functions from format.py to new units.py
Aaron Marcuse-Kubitza
02:55 PM Revision 1467: lib/*.py: Removed svn:executable property to turn execute bit off
Aaron Marcuse-Kubitza
02:45 PM Revision 1466: vegbien.sql: growthform (and taxonclass) enum: Added options suggested by Michael Lee. Removed "woody". establishmentmeans_dwc (and taxonclass) enum: Reordered to match order of taxonoccurrence boolean fields, and to place each option next to its opposite. taxonclass enum: Moved "woody" to bottom because it's no longer part of growthform.
Aaron Marcuse-Kubitza

03/18/2012

09:10 PM Revision 1465: VegBIEN mappings: distance fields: Remove units
Aaron Marcuse-Kubitza
09:08 PM Revision 1464: xml_func.py: _units: Allow value to be NULL
Aaron Marcuse-Kubitza
08:44 PM Revision 1463: xml_func.py: _units: Use new format.cleanup_units() to do units parsing
Aaron Marcuse-Kubitza
08:43 PM Revision 1462: format.py: Added clean_numeric(), str2int(), str2float(). Added units-related functions. Added documentation labels to each section.
Aaron Marcuse-Kubitza
06:42 PM Revision 1461: Added filter_errors to filter `map` error messages
Aaron Marcuse-Kubitza
06:40 PM Revision 1460: Renamed bin/errors_filter_* to filter_errors_* to sound more natural and to have a different prefix than error_stats so that both can easily be tab-completed at the command line
Aaron Marcuse-Kubitza
06:27 PM Revision 1459: README.TXT: Testing: Added instructions for testing just mapping process, just map spreadsheet generation, and everything
Aaron Marcuse-Kubitza
06:26 PM Revision 1458: root Makefile: Added test-all for most complete coverage. Removed extraneous ";" at the end of the prerequisites line of rules with a recipe.
Aaron Marcuse-Kubitza
06:02 PM Revision 1457: mappings/Makefile: Use new ci_map to make DwC.cs-VegBIEN.specimens.csv case-insensitive
Aaron Marcuse-Kubitza
06:02 PM Revision 1456: Added ci_map to make a map spreadsheet case-insensitive.
Aaron Marcuse-Kubitza
05:53 PM Revision 1455: mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.
Aaron Marcuse-Kubitza
05:52 PM Revision 1454: inputs/SpeciesLink: Switched to using flat files instead of DB
Aaron Marcuse-Kubitza
05:52 PM Revision 1453: inputs/MO: Switched to using flat files instead of DB
Aaron Marcuse-Kubitza
05:51 PM Revision 1452: mappings: DwC: Generate case-insensitive map of DwC1 and DwC2 together, rather than just DwC2. DwC1-DwC2.specimens.csv: Make input columns lowercase so that case-insensitization will work properly.
Aaron Marcuse-Kubitza
04:55 PM Revision 1451: input.Makefile: Mapping: Support multiple segments of a source table flat file. Use with_cat_csv if flat file segment(s) are available; otherwise use the input file in $+ or the input database, if any. Don't look for an explicit CSV header file because it can now be handled as the first segment if appropriately named.
Aaron Marcuse-Kubitza
04:50 PM Revision 1450: Added with_cat_csv
Aaron Marcuse-Kubitza
04:50 PM Revision 1449: with_cat: Added support for custom cat command in env var
Aaron Marcuse-Kubitza
04:49 PM Revision 1448: cat_csv: Abort if output stream closed instead of exiting with an IOError
Aaron Marcuse-Kubitza
04:16 PM Revision 1447: cat_csv: Ignore any duplicated headers instead of requiring each CSV to have a header identical to the first. Rewrote to pass the CSVs through as lines rather than parsing each row. Because the CSVs are not parsed, checked that all CSVs have the same dialect.
Aaron Marcuse-Kubitza
04:14 PM Revision 1446: csvs.py: Added csv modifications to compare Dialect instances
Aaron Marcuse-Kubitza
04:13 PM Revision 1445: util.py: Added classes_eq()
Aaron Marcuse-Kubitza

03/16/2012

06:25 PM Revision 1444: csvs.py: Added stream_info() to return NamedTuple {header_line, dialect} for later use in cat_csv. Changed reader_and_header() to use stream_info().
Aaron Marcuse-Kubitza
06:23 PM Revision 1443: util.py: Added NamedTuple
Aaron Marcuse-Kubitza
06:04 PM Revision 1442: csvs.py: reader_and_header(): Restrict delimiters to common delimiters so that e.g. letters are not considered delimiters just because they appear frequently
Aaron Marcuse-Kubitza
05:38 PM Revision 1441: Renamed inputs/NYBG to inputs/NY to match herbarium code
Aaron Marcuse-Kubitza
05:35 PM Revision 1440: Renamed inputs/UNC-NCSC to inputs/NCU-NCSC to match herbarium code
Aaron Marcuse-Kubitza
05:32 PM Revision 1439: Renamed inputs/UArizona to inputs/ARIZ to match herbarium code
Aaron Marcuse-Kubitza
05:31 PM Revision 1438: Regenerated inputs/MO/maps/src.join.specimens.csv
Aaron Marcuse-Kubitza
05:26 PM Revision 1437: Renamed inputs/MOBOT to inputs/MO to match herbarium code
Aaron Marcuse-Kubitza
05:11 PM Revision 1436: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
05:08 PM Revision 1435: vegbien.sql: taxonoccurrence: Added cultivatedbasis
Aaron Marcuse-Kubitza
05:03 PM Revision 1434: vegbien.sql: Moved all accessioncode fields to the bottom of their tables. vegbien.ERD.mwb: Adjusted lines to remove overlaps.
Aaron Marcuse-Kubitza
04:52 PM Revision 1433: vegbien.sql: taxonoccurrence: Added iscultivated, isnative. Moved accessioncode to bottom.
Aaron Marcuse-Kubitza
04:36 PM Revision 1432: vegbien.sql: Changed taxonoccurrence.growthform type to more specific growthform
Aaron Marcuse-Kubitza
04:34 PM Revision 1431: vegbien.sql: Added growthform and establishmentmeans_dwc enums using values from taxonclass. Documented that taxonclass is growthform + establishmentmeans_dwc + some other values.
Aaron Marcuse-Kubitza
04:22 PM Revision 1430: VegBIEN: Moved aggregateoccurrence.growthform to taxonoccurrence
Aaron Marcuse-Kubitza
04:21 PM Revision 1429: Added inputs/UNC-NCSC/maps/src.join.specimens.csv
Aaron Marcuse-Kubitza
04:15 PM Revision 1428: VegBIEN: Merged aggregateoccurrence.verbatimcollectorname and specimenreplicate.verbatimcollectorname into taxonoccurrence
Aaron Marcuse-Kubitza
03:58 PM Revision 1427: xml_func.py: parse_range(): Handle negative numbers by treating them as not a range
Aaron Marcuse-Kubitza
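The negative-number rule from Revision 1427 matters because a leading "-" would otherwise split "-5" into an empty start and an end of "5". A minimal sketch follows, assuming "-" is the only range separator and that a non-range value becomes both start and end. It is not the actual parse_range() from xml_func.py.

```python
def parse_range(str_, sep="-"):
    """Return (start, end) strings; a value that is not a range gives (value, value)."""
    str_ = str_.strip()
    start, found, end = str_.partition(sep)
    if not found or start == "":  # a leading "-" means a negative number, not a range
        return str_, str_
    return start.strip(), end.strip()
```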
03:31 PM Revision 1426: Added inputs/UNC-NCSC/test with initial accepted test outputs
Aaron Marcuse-Kubitza
03:31 PM Revision 1425: Added inputs/UNC-NCSC/maps
Aaron Marcuse-Kubitza
03:31 PM Revision 1424: xml_func.py: _replace: Fixed bug where value entry was not unpacked
Aaron Marcuse-Kubitza
02:59 PM Task #387 (New): count how many duplicates between Canadensys and GBIF
Aaron Marcuse-Kubitza
02:59 PM Task #386 (Resolved): load Canadensys data
http://data.canadensys.net/ipt/
Aaron Marcuse-Kubitza
02:58 PM Task #385 (Resolved): implement mechanism to determine which specimenreplicates refer to the same specimen
* or are the same record, from different data sources
Aaron Marcuse-Kubitza
02:58 PM Task #384 (Resolved): prototype tree traversal algorithm
Aaron's alternative algorithm which cross-links each node to its ancestors using a many:many table
Aaron Marcuse-Kubitza
02:56 PM Task #383 (New): convert VegBank data dictionary to database comments
* VegBank data dictionary source code is in svn at https://code.ecoinformatics.org/code/vegbank/trunk/docs/xml/db_mod...
Aaron Marcuse-Kubitza
12:36 PM Revision 1423: Added inputs/UNC-NCSC
Aaron Marcuse-Kubitza

03/15/2012

07:12 PM Revision 1422: Added inputs/MOBOT/test with initial accepted test outputs
Aaron Marcuse-Kubitza
07:11 PM Revision 1421: Added inputs/MOBOT/maps
Aaron Marcuse-Kubitza
06:51 PM Revision 1420: Added inputs/MOBOT
Aaron Marcuse-Kubitza
06:41 PM Revision 1419: VegX mappings: Updated plot place mappings to VegX 1.5.3 method of place type-tagged place names. This removes the userdef fields in plot.
Aaron Marcuse-Kubitza
06:18 PM Revision 1418: VegX mappings: Changed userdef xPosition, yPosition to /relativePlotPosition/relativeX, /relativePlotPosition/relativeY
Aaron Marcuse-Kubitza
06:16 PM Revision 1417: Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv
Aaron Marcuse-Kubitza
05:36 PM Revision 1416: bin/map: map_table(): wrap_row(): Use util.list_as_length() to handle CSV rows of different lengths
Aaron Marcuse-Kubitza
05:35 PM Revision 1415: util.py: Added list_as_length(). Documented that list_set_length() takes a list, not a tuple. Documented that ListDict must have len(list_) == len(keys).
Aaron Marcuse-Kubitza
05:19 PM Revision 1414: util.py: Added list_set_length(). Changed list_set() to use list_set_length().
Aaron Marcuse-Kubitza
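Revisions 1414 and 1415 describe an in-place list_set_length() that takes a list (not a tuple) and a copying list_as_length() suited to CSV rows of varying length (Revision 1416). A sketch of that split follows. The pad value None and the truncation of long rows are assumptions.

```python
def list_set_length(list_, len_, fill=None):
    """Pad or truncate list_ in place so that it has exactly len_ items."""
    del list_[len_:]  # truncate if too long (no-op otherwise)
    list_.extend([fill] * (len_ - len(list_)))  # pad if too short


def list_as_length(list_, len_, fill=None):
    """Return a copy of list_ (any sequence, e.g. a tuple) with exactly len_ items."""
    list_ = list(list_)  # copy, so tuples and the caller's list are left untouched
    list_set_length(list_, len_, fill)
    return list_
```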

03/13/2012

07:48 PM Revision 1413: mappings/DwC2-VegBIEN.specimens.csv: Added empty *_id/taxonoccurrence attr to primary keys to ensure that a taxonoccurrence is always created for the specimenreplicate
Aaron Marcuse-Kubitza
07:41 PM Revision 1412: xml_func.py: _label: Use ustr instead of str when checking types
Aaron Marcuse-Kubitza
07:41 PM Revision 1411: csvs.py: Set dialect.doublequote to True because Sniffer doesn't turn this on by default
Aaron Marcuse-Kubitza
07:23 PM Revision 1410: Merged inputs/NYBG-CSV into NYBG
Aaron Marcuse-Kubitza
07:16 PM Revision 1409: Merged inputs/UArizona-CSV into UArizona
Aaron Marcuse-Kubitza
07:02 PM Revision 1408: Added inputs/SpeciesLink/test
Aaron Marcuse-Kubitza
07:02 PM Revision 1407: Added inputs/SpeciesLink/maps
Aaron Marcuse-Kubitza
07:02 PM Revision 1406: xml_func.py: range-related funcs: Made inputs optional in case they get set to NULL by _nullIf
Aaron Marcuse-Kubitza
06:48 PM Revision 1405: mappings/DwC1-DwC2.specimens.csv: Added common DwC1 fields that are not part of the official DwC1 schema
Aaron Marcuse-Kubitza
06:31 PM Revision 1404: bin/map: Added support for getting columns with an optional prefix list for DB/CSV inputs
Aaron Marcuse-Kubitza
06:21 PM Revision 1403: bin/map: Factored out code common to DB and CSV inputs into map_table()
Aaron Marcuse-Kubitza
06:00 PM Revision 1402: bin/map: Parse any prefixes in map input column name. They will later be used to check for versions of columns with a prefix added when processing CSV/DB inputs.
Aaron Marcuse-Kubitza
05:58 PM Revision 1401: strings.py: Added split(), remove_prefix(), remove_suffix(), and remove_prefixes(). Added section comments.
Aaron Marcuse-Kubitza
05:06 PM Revision 1400: mappings/DwC2-VegBIEN.specimens.csv: minimumElevationInMeters: Handle embedded ranges using _rangeStart and _rangeEnd
Aaron Marcuse-Kubitza
05:05 PM Revision 1399: xml_func.py: Added _rangeStart and _rangeEnd
Aaron Marcuse-Kubitza
05:04 PM Revision 1398: xpath.py: parse(): Split paths: Raise a SyntaxException if can't attach a split path because there is no parent element to attach to
Aaron Marcuse-Kubitza
05:02 PM Revision 1397: Parser.py: Renamed _syntax_err() to syntax_err() to make it a public method
Aaron Marcuse-Kubitza
04:38 PM Revision 1396: mappings/DwC2-VegBIEN.specimens.csv: Mapped fieldNotes and taxonRemarks to description using _merge. inputs/UArizona*/maps/DwC.specimens.csv: Mapped Remarks to taxonRemarks, which now has a VegBIEN mapping.
Aaron Marcuse-Kubitza
04:24 PM Revision 1395: Added inputs/GBIF/src with small files that can be under version control
Aaron Marcuse-Kubitza
04:23 PM Revision 1394: input.Makefile: svn_props: Ignore everything in the src/ subdir that hasn't been explicitly checked in
Aaron Marcuse-Kubitza
04:18 PM Revision 1393: Added inputs/GBIF/test with accepted test outputs
Aaron Marcuse-Kubitza
04:18 PM Revision 1392: Added inputs/GBIF/maps
Aaron Marcuse-Kubitza
04:17 PM Revision 1391: Regenerated inputs/UArizona*/maps VegBIEN maps
Aaron Marcuse-Kubitza
04:13 PM Revision 1390: Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv
Aaron Marcuse-Kubitza
04:09 PM Revision 1389: bin/map: Use new csvs.reader_and_header() to support CSVs/TSVs with other than the default Excel dialect
Aaron Marcuse-Kubitza
04:08 PM Revision 1388: Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line
Aaron Marcuse-Kubitza
04:07 PM Revision 1387: join: Don't append suffix to empty output mappings, so that they stay empty ("NULL")
Aaron Marcuse-Kubitza
04:00 PM Revision 1386: input.Makefile: Added tsv to $(exts). Strip extra whitespace from $(inputs) so that it's the empty string if $(<in) (and $(<in).header) don't exist, and can be used in $(if ...).
Aaron Marcuse-Kubitza

03/12/2012

07:08 PM Revision 1385: input.Makefile: Fixed bug in inputFiles wildcard where extensions were manually listed instead of dynamically determined from the $(exts) config var
Aaron Marcuse-Kubitza
06:56 PM Revision 1384: README.TXT: Tell user to `disown -h %1` after running `make import &` so that it won't be sent a SIGHUP if the user logs out
Aaron Marcuse-Kubitza
06:55 PM Revision 1383: README.TXT: Tell user to `disown -h %1` after running `make import &` so that it won't be sent a SIGHUP if the user logs out
Aaron Marcuse-Kubitza
06:39 PM Revision 1382: input.Makefile: Prepend separate CSV header when available
Aaron Marcuse-Kubitza
06:24 PM Revision 1381: input.Makefile: Use with_cat in map to later support prepending separate CSV headers
Aaron Marcuse-Kubitza
06:21 PM Revision 1380: Added with_cat to run a command, taking input from the concatenation of files
Aaron Marcuse-Kubitza
05:48 PM Revision 1379: input.Makefile: Set mapEnv if $(dbEngine) is set, to eventually support pre-existing DB connections
Aaron Marcuse-Kubitza
05:14 PM Revision 1378: input.Makefile: Changed $(dbFile) to $(dbExport) to make it unambiguous that it refers to a SQL export, not a pre-existing DB, which will be supported later
Aaron Marcuse-Kubitza
05:10 PM Revision 1377: input.Makefile: Added .txt to list of input file extensions
Aaron Marcuse-Kubitza
04:34 PM Revision 1376: Added inputs/SpeciesLink
Aaron Marcuse-Kubitza
03:57 PM Revision 1375: root Makefile: python-Linux: Added pymetrics
Aaron Marcuse-Kubitza
03:54 PM Revision 1374: bin/map: Consider \N to be None
Aaron Marcuse-Kubitza
03:49 PM Revision 1373: util.py: none_if(): Allow multiple none_vals using varargs
Aaron Marcuse-Kubitza
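With varargs (Revision 1373), a single none_if() call can map several sentinel strings to None. This is how both "" (Revision 1360) and the SQL-export null marker \N (Revision 1374) could be handled at once. A sketch with an assumed argument order follows.

```python
def none_if(val, *none_vals):
    """Return None if val equals any of none_vals, else val unchanged."""
    return None if val in none_vals else val
```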
03:36 PM Revision 1372: Added inputs/GBIF
Aaron Marcuse-Kubitza
03:28 PM Revision 1371: exc.py: Fixed bug in traceback-saving mechanism that didn't deal with nested Exceptions (such as Exceptions with causes in ExceptionWithCause). Renamed add_exc_info() to add_traceback() since we really only need to store the traceback.
Aaron Marcuse-Kubitza
12:41 PM Revision 1370: dates.py: parse_date_range(): Fixed bug where the date parts were not joined back together into a string for each date range element. Use strings.single_space() after the date has been split into range parts so that whitespace around the range separator is removed instead of being replaced with a single space.
Aaron Marcuse-Kubitza
12:25 PM Revision 1369: xml_func.py: process(): Also catch XML func internal errors to assist in debugging. Use new exc.add_exc_info() to save traceback in case later code throws exception, overwriting exc_info().
Aaron Marcuse-Kubitza
12:23 PM Revision 1368: exc.py: str_(): Add the traceback at the end of the exception string. Added add_exc_info() and get_exc_info() for providing traceback info for str_().
Aaron Marcuse-Kubitza

03/11/2012

07:33 PM Revision 1367: mappings/DwC2-VegBIEN.specimens.csv: eventDate, dateIdentified: Use _dateRangeStart and _dateRangeEnd
Aaron Marcuse-Kubitza
07:32 PM Revision 1366: xml_func.py: Added _dateRangeStart and _dateRangeEnd
Aaron Marcuse-Kubitza
07:32 PM Revision 1365: dates.py: Added parse_date_range() and helper funcs could_be_year() and could_be_day()
Aaron Marcuse-Kubitza
07:31 PM Revision 1364: strings.py: Added single_space()
Aaron Marcuse-Kubitza
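A plausible single_space() collapses whitespace runs with a regular expression. The sketch also trims the ends, which is an assumption. Applying it to each part after splitting a date range (the order Revision 1370 later adopts) removes the spaces around the separator instead of leaving single spaces behind.

```python
import re


def single_space(str_):
    """Collapse each run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", str_).strip()
```

For example, `[single_space(part) for part in "1 May 2011 -  3  May 2011".split("-")]` yields `["1 May 2011", "3 May 2011"]`.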
06:12 PM Revision 1363: inputs/UArizona*: Map the ScientificNameAuthor to the binomial instead since it contains the binomial in addition to the authority
Aaron Marcuse-Kubitza
05:28 PM Revision 1362: Added inputs/UArizona-CSV/test
Aaron Marcuse-Kubitza
05:23 PM Revision 1361: input.Makefile: Use .PRECIOUS to save outputs of failed tests so they can be accepted (needed now that .DELETE_ON_ERROR is turned on globally)
Aaron Marcuse-Kubitza
05:14 PM Revision 1360: bin/map: Moved string-cleanup code from get_value() to cleanup(), called by process_row(). process_row() now cleans up the string before checking if it's None, because cleanup() uses none_if() to map "" to None.
Aaron Marcuse-Kubitza
05:12 PM Revision 1359: util.py: Added do_ignore_none()
Aaron Marcuse-Kubitza
04:25 PM Revision 1358: Added inputs/UArizona-CSV/verify
Aaron Marcuse-Kubitza
04:24 PM Revision 1357: Added inputs/UArizona-CSV/maps
Aaron Marcuse-Kubitza
04:23 PM Revision 1356: mappings/DwC2-VegBIEN.specimens.csv: Mapped coordinateUncertaintyInMeters to the same place as coordinatePrecision (input sources generally use only one of these columns, which is most likely the accuracy regardless of what it's named)
Aaron Marcuse-Kubitza
04:18 PM Revision 1355: join: In error message when map column names don't match, include the actual column names
Aaron Marcuse-Kubitza
04:17 PM Revision 1354: Makefiles: Added .DELETE_ON_ERROR to delete target if recipe fails
Aaron Marcuse-Kubitza
03:18 PM Revision 1353: VegBIEN mappings: plantnames: Nest taxons hierarchically using plantname.parent_id. Mappings using _forEach: Append a "," to the `in` list so that mappings will sort from shortest to longest `in` list ("]" comes after "," in ASCII, causing this not to happen without the trailing ",").
Aaron Marcuse-Kubitza
03:14 PM Revision 1352: xpath.py: parse(): _paths(): Remove trailing ","
Aaron Marcuse-Kubitza
02:38 PM Revision 1351: xpath_func.py: _forEach: Made syntax more natural-looking by using values instead of names for string args and attrs instead of branches for array args
Aaron Marcuse-Kubitza
02:36 PM Revision 1350: xpath.py: parse() Fixed bug in _paths() where empty lists would be parsed as a list containing a single empty path, instead of as an empty list
Aaron Marcuse-Kubitza
01:26 PM Revision 1349: VegBIEN mappings: Place names: Use _forEach to simplify XPaths for recursively nested places
Aaron Marcuse-Kubitza
01:22 PM Revision 1348: bin/map: In debug mode, print output XPaths
Aaron Marcuse-Kubitza

03/09/2012

07:51 PM Revision 1347: xpath_func.py: _forEach: Fixed to support _val replacements anywhere, by doing a string-based search-and-replace on a quoted XPath instead of a list-based search-and-replace on an already-parsed XPath
Aaron Marcuse-Kubitza
07:41 PM Revision 1346: xpath_func.py: Renamed _for to _forEach. Finished implementing _forEach.
Aaron Marcuse-Kubitza
07:41 PM Revision 1345: xpath.py: Import xpath_func after defining XpathElem because xpath_func depends on XpathElem and it hasn't yet been factored into a separate file
Aaron Marcuse-Kubitza
07:39 PM Revision 1344: util.py: Added list_replace()
Aaron Marcuse-Kubitza
07:14 PM Revision 1343: xpath_func.py: Changed XPath function signature to take arguments (args, path), and process() to parse out the args. Implemented basic _for that repeats its do arg as many times as there are in_ elements.
Aaron Marcuse-Kubitza
06:44 PM Revision 1342: xpath.py: parse(): Run xpath_func.process() on the parsed XPath
Aaron Marcuse-Kubitza
06:43 PM Revision 1341: Added xpath_func.py for XPath "function" elements that transform their subpaths
Aaron Marcuse-Kubitza
06:23 PM Revision 1340: VegBIEN mappings: Removed no longer needed taxondetermination.determinationtype values, because they can be determined from the new role closed list
Aaron Marcuse-Kubitza
06:19 PM Revision 1339: filter_ERD.csv: Removed no longer needed references to role
Aaron Marcuse-Kubitza
06:18 PM Revision 1338: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
06:17 PM Revision 1337: VegBIEN: Changed role table to a closed list
Aaron Marcuse-Kubitza
06:14 PM Revision 1336: PostgreSQL-MySQL.csv: custom types: Consider everything except a set of accepted types to be a custom type
Aaron Marcuse-Kubitza
05:40 PM Revision 1335: VegBIEN: taxonrank enum: Made values lowercase to match case convention in other enums
Aaron Marcuse-Kubitza
05:33 PM Revision 1334: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
05:32 PM Revision 1333: vegbien.sql: Renamed plantconceptscope to plantnamescope because it's now attached to plantname
Aaron Marcuse-Kubitza
05:26 PM Revision 1332: vegbien.sql: Moved parent_id from plantconcept to plantname, since plantnames themselves are unique according to their parent taxons (a species under one genus is not the same as a species under another genus)
Aaron Marcuse-Kubitza
05:03 PM Revision 1331: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
04:59 PM Revision 1330: vegbien.ERD.mwb: Fixed lines
Aaron Marcuse-Kubitza
04:57 PM Revision 1329: vegbien.sql: Moved scope_id from plantconcept to plantname, since plantnames themselves are scoped, not just the plantconcepts that use them (e.g. "sp. 1" has different meanings in different scopes, so it should not be shared between scopes). plantname: Added accessioncode.
Aaron Marcuse-Kubitza
04:38 PM Revision 1328: vegbien.sql: Moved plantconcept parent_id from plantstatus to plantconcept. plantconcept: Removed datasource-specific fields to make it globally unique (one plantconcept for each assigned parent taxon of a plantname, of which there will usually be just one)
Aaron Marcuse-Kubitza
04:22 PM Revision 1327: vegbien.sql: plantname: Removed datasource-specific fields to make this a globally-unique table (the datasource-specific fields belong in plantconcept)
Aaron Marcuse-Kubitza
04:16 PM Revision 1326: Added inputs/UArizona/verify
Aaron Marcuse-Kubitza
04:15 PM Revision 1325: mappings/verify.specimens.sql: Updated for schema changes
Aaron Marcuse-Kubitza
04:06 PM Revision 1324: vegbien.sql: placerank enum: Added "village"
Aaron Marcuse-Kubitza
04:00 PM Revision 1323: VegBIEN mappings: lat/long locationdetermination: Removed [!namedplace_id] key so that it's merged into the namedplace locationdetermination
Aaron Marcuse-Kubitza
03:54 PM Revision 1322: VegBIEN mappings: Changed namedplace mappings to use new nested format for storing place containment relationships
Aaron Marcuse-Kubitza
03:44 PM Revision 1321: xml_func.py: Added _simplifyPath
Aaron Marcuse-Kubitza
03:25 PM Revision 1320: xpath.py: Added get_1()
Aaron Marcuse-Kubitza
02:50 PM Revision 1319: vegbien.sql: namedplace: Removed parent_id from unique constraint because some data might be missing intervening links (e.g. state for a county, country), but the place (e.g. county) should still be attached to the existing place of the same name and rank (which will hopefully already have the correct parent_id link)
Aaron Marcuse-Kubitza
02:46 PM Revision 1318: vegbien.sql: namedplace: Made rank required
Aaron Marcuse-Kubitza
02:33 PM Revision 1317: vegbien.sql: namedplace: Removed no longer needed placesystem, which has been replaced by rank closed list
Aaron Marcuse-Kubitza
02:30 PM Revision 1316: VegBIEN mappings: Map namedplaces using new rank field
Aaron Marcuse-Kubitza
02:25 PM Revision 1315: vegbien.sql: namedplace: Added rank. Do duplicate elimination using rank and parent_id instead of placesystem
Aaron Marcuse-Kubitza
02:20 PM Revision 1314: vegbien.sql: placerank: Standardized names to DwC/GML
Aaron Marcuse-Kubitza
01:58 PM Task #378 (New): create automated feedback mechanism
* triggered when an import is run
Aaron Marcuse-Kubitza
01:57 PM Task #377 (Resolved): ask NYBG for direct access to server
Aaron Marcuse-Kubitza
01:06 PM Revision 1313: vegbien.sql: Added placerank enum
Aaron Marcuse-Kubitza
12:35 PM Revision 1312: vegbien.sql: namedplace: Removed VegBank internal fields and datasource scoping fields (namedplaces are globally unique). Added parent_id to point to containing namedplace.
Aaron Marcuse-Kubitza
12:21 PM Revision 1311: xml_func.py: Added _dateRangePart with partial implementation (only works on strings with no range)
Aaron Marcuse-Kubitza
12:20 PM Revision 1310: DwC mappings: Moved date _date filter outside _alt so it would run only on the string that was actually chosen, and not produce date format errors when a pre-parsed year/month/day is already available
Aaron Marcuse-Kubitza

03/08/2012

06:30 PM Revision 1309: xml_func.py: _date: Map date with only empty fields to NULL (occurs when all fields were e.g. 0 and were filtered to NULL by _nullIf)
Aaron Marcuse-Kubitza
06:00 PM Revision 1308: xml_func.py: _date: Removed mapping year/month/day of 0 to NULL because that is now handled on a case-by-case basis in the mappings
Aaron Marcuse-Kubitza
05:58 PM Revision 1307: mappings/DwC1-DwC2.specimens.csv: Map year/month/day of 0 to NULL
Aaron Marcuse-Kubitza
05:13 PM Revision 1306: inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Fixed syntax error in growthForm map
Aaron Marcuse-Kubitza
05:11 PM Revision 1305: inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Removed input values from growthForm map that Brad said were invalid
Aaron Marcuse-Kubitza
05:10 PM Revision 1304: xml_func.py: _map: Added option to make map a closed list
Aaron Marcuse-Kubitza
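The difference between an open map and a closed list can be sketched as follows. The `map_value` name, the `closed` flag, and the choice to raise `ValueError` for unmapped values in a closed list are all assumptions. The log does not say how the real `_map` reports rejected values. The growth-form codes are illustrative and are not SALVIAS's actual data dictionary.

```python
def map_value(value, mapping, closed=False):
    """Translate value through mapping (hypothetical sketch of a _map option).

    Open map: unmapped values pass through unchanged.
    Closed map: unmapped values are rejected.
    """
    if value in mapping:
        return mapping[value]
    if closed:
        raise ValueError(f"value {value!r} is not in the closed list")
    return value


# Made-up example codes.
growth_forms = {"T": "tree", "S": "shrub", "L": "liana"}
```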
04:56 PM Revision 1303: mappings/DwC2-VegBIEN.specimens.csv: Fixed waterdepth mappings to use _avg
Aaron Marcuse-Kubitza

03/06/2012

06:48 PM Revision 1302: mappings/verify.specimens.sql: Use ORDER BY ... NULLS FIRST to match MySQL
Aaron Marcuse-Kubitza
06:42 PM Revision 1301: input.Makefile: verify: Time the verification since it can take a long time
Aaron Marcuse-Kubitza
06:34 PM Revision 1300: specimens verification: Added duplicate catalog numbers test
Aaron Marcuse-Kubitza
06:27 PM Revision 1299: map: On nimoy, use bien2_staging unless otherwise specified
Aaron Marcuse-Kubitza
06:21 PM Revision 1298: specimens verification: Added # counties test
Aaron Marcuse-Kubitza
05:34 PM Revision 1297: specimens verification: Added collection codes and # catalog numbers tests
Aaron Marcuse-Kubitza
05:33 PM Revision 1296: inputs/SALVIAS/maps/VegX.organisms.csv: Mapped custom Habit values not listed in the SALVIAS data dictionary
Aaron Marcuse-Kubitza
05:32 PM Revision 1295: strings.py: Added unicode_reader for later use in handling Unicode characters in map spreadsheets
Aaron Marcuse-Kubitza
03:45 PM Revision 1294: xpath.py: Removed unnecessary copy.deepcopy()'s and instead changed set_value() and set_id() to make copies of any elements they change. This should result in up to a 17% speed increase in the import, because deepcopy() was taking a lot of time. Added documentation to set_value() and set_id() that caller must make a shallow copy of the path to prevent modifications from propagating to other copies of the path. (Previously, a deep copy was needed, but there was no comment specifying this.)
Aaron Marcuse-Kubitza
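The copy-on-write contract from Revision 1294 can be illustrated in a few lines. The caller makes a shallow copy of the path (a new list that shares its element objects), and `set_value()` replaces the element it changes instead of mutating it. The `XpathElem` dataclass below is a simplified, hypothetical stand-in for the real class in `xpath.py`.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class XpathElem:  # hypothetical stand-in for xpath.XpathElem
    name: str
    value: Optional[str] = None


def set_value(path: List[XpathElem], value: str) -> None:
    # Swap in a modified copy of the last element rather than mutating it,
    # so other paths that share the same element objects are unaffected.
    path[-1] = replace(path[-1], value=value)


original = [XpathElem("plot"), XpathElem("name")]
working = list(original)  # caller's shallow copy: new list, shared elements
set_value(working, "P1")
```

Only the changed element is copied, which avoids the cost of `copy.deepcopy()` on the whole path.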
03:40 PM Revision 1293: mappings/VegX-VegBIEN.organisms.csv: Removed unneeded lookahead assertions from stemtag mappings. They relied on a bug ("feature"?) in the XPath engine that made the value of the lookahead assertion's path the same as the value of the main path, even though the value is set after the path is parsed.
Aaron Marcuse-Kubitza
02:45 PM Revision 1292: xml_func.py: _date: For year/month/day dates, require the year (it would not make sense to default to a particular year)
Aaron Marcuse-Kubitza
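The year/month/day rules in Revision 1292 and Revision 1309 can be combined in a short sketch: the year is required, and a date whose fields are all empty becomes NULL (`None`). The function name is hypothetical. Defaulting a missing month or day to 1 is an assumption that matches the epoch-based defaults described in Revision 1263.

```python
import datetime


def date_from_parts(year=None, month=None, day=None):
    """Build a date from year/month/day strings (hypothetical sketch).

    - All parts empty -> None (stored as NULL).
    - Year missing -> error, since no particular year is a sensible default.
    - Missing month or day defaults to 1.
    """
    parts = [year, month, day]
    if all(p in (None, "") for p in parts):
        return None
    if year in (None, ""):
        raise ValueError("year is required when building a date from parts")
    return datetime.date(int(year), int(month or 1), int(day or 1))
```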
01:29 PM Revision 1291: inputs/UArizona: Added test outputs
Aaron Marcuse-Kubitza
01:28 PM Revision 1290: mappings/DwC1-DwC2.specimens.csv: Fixed to allow datasource to define custom date mappings that don't pass through the default date mapping
Aaron Marcuse-Kubitza

03/05/2012

05:31 PM Revision 1289: input.Makefile: Generate maps/src.join.*.csv, which can be used to determine which DwC fields for a particular dataset do not yet have a join mapping to VegBIEN
Aaron Marcuse-Kubitza
05:26 PM Revision 1288: Makefile: Fixed subdir remake target to work for nested subdirs as well
Aaron Marcuse-Kubitza
04:51 PM Revision 1287: inputs/UArizona: Renamed maps/src.csv to maps/src.specimens.csv because there will be one for each input table
Aaron Marcuse-Kubitza
04:41 PM Revision 1286: inputs/UArizona: Added maps/src.csv with columns from source data
Aaron Marcuse-Kubitza
04:40 PM Revision 1285: Added autogen mappings/DwC-VegBIEN.specimens.no_empty.csv, which will be used for determining which DwC fields for a particular dataset do not yet have a join mapping to VegBIEN
Aaron Marcuse-Kubitza
04:35 PM Revision 1284: Added remove_empty to remove empty mappings in a map spreadsheet
Aaron Marcuse-Kubitza
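Here is a minimal sketch of what removing empty mappings from a map spreadsheet could look like. It assumes a hypothetical layout in which the first row is the header, column 0 is the input field, and column 1 is the output path. A row counts as "empty" when its output column is blank. The actual `remove_empty` script may define this differently.

```python
import csv
import io


def remove_empty(rows, out_col=1):
    """Yield the header and every mapping row whose output column is non-blank."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return
    yield header
    for row in rows:
        if len(row) > out_col and row[out_col].strip():
            yield row


sample = "DwC,VegBIEN\ncatalogNumber,specimenreplicate/catalognumber\nhabitat,\n"
kept = list(remove_empty(csv.reader(io.StringIO(sample))))
```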
04:35 PM Revision 1283: join: Don't raise "No join mapping" error for empty mappings because you only want the error for empty mappings for your particular dataset, which requires more information (namely, the subset of the mappings used by your dataset, some of which will not be in the mappings if standard fields have been subtracted out)
Aaron Marcuse-Kubitza
04:10 PM Revision 1282: join: Fixed bug in "No join mapping" error generation where rows with no existing comments column would cause an IndexError
Aaron Marcuse-Kubitza
04:09 PM Revision 1281: util.py: Added list_set() and list_setdefault()
Aaron Marcuse-Kubitza
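The log does not specify the semantics of `list_set()` and `list_setdefault()`. A plausible sketch, modeled on `dict.setdefault`, pads a short list instead of raising `IndexError`. That is the failure mode Revision 1282 mentions for rows with no comments column. Non-negative indexes are assumed, and padding with `default` is a design choice of this sketch.

```python
def list_set(lst, idx, value, default=None):
    """Set lst[idx] = value, padding with default if lst is too short."""
    if idx >= len(lst):
        lst.extend([default] * (idx + 1 - len(lst)))
    lst[idx] = value


def list_setdefault(lst, idx, default=None):
    """Like dict.setdefault: return lst[idx], first setting it to default if missing."""
    if idx >= len(lst):
        list_set(lst, idx, default, default)
    return lst[idx]


row = ["habitat", "plot/habitat"]  # a map row with no comments column
comment = list_setdefault(row, 2, "")
```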
03:44 PM Revision 1280: inputs/UArizona/maps/DwC.specimens.csv: Merge FieldNotes and Remarks
Aaron Marcuse-Kubitza
03:35 PM Revision 1279: inputs/UArizona/maps/DwC.specimens.csv: Finished mappings
Aaron Marcuse-Kubitza
03:08 PM Revision 1278: inputs/UArizona/maps/DwC.specimens.csv: Removed fields already present in DwC mappings
Aaron Marcuse-Kubitza
03:05 PM Revision 1277: inputs/NYBG-CSV/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping
Aaron Marcuse-Kubitza
03:03 PM Revision 1276: inputs/NYBG/maps/DwC.specimens.csv: Removed mappings already present in case-insensitive DwC2 mapping
Aaron Marcuse-Kubitza
02:48 PM Revision 1275: mappings/DwC1-DwC2.specimens.csv: Removed fields already present in DwC2.ci-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
02:38 PM Revision 1274: Makefiles: Moved remake into main Makefile. Fixed remake to run `make all` in a new make so that cache of existing files is reset. Have main remake run clean and then all instead of forwarding remake to subdirs, so that everything is cleaned before everything is remade.
Aaron Marcuse-Kubitza
02:21 PM Revision 1273: input.Makefile: maps: maps/$(via).%.full.csv: Fixed bug where $(selfMap) would be ignored if it had not yet been made
Aaron Marcuse-Kubitza
02:02 PM Revision 1272: mappings/Makefile: Reorganized into DwC and VegX sections
Aaron Marcuse-Kubitza
02:02 PM Revision 1271: Added autogenerated mappings/DwC2.ci-VegBIEN.specimens.csv. Use it to include DwC2 fields with first letter uppercased in the full DwC mapping, so that datasources that use DwC2 terms with a different case can still use the DwC2 mapping.
Aaron Marcuse-Kubitza
01:57 PM Revision 1270: Added autogenerated mappings/DwC2.ci-VegBIEN.specimens.csv. Use it to include DwC2 fields with first letter uppercased in the full DwC mapping, so that datasources that use DwC2 terms with a different case can still use the DwC2 mapping.
Aaron Marcuse-Kubitza
01:54 PM Revision 1269: inputs/UArizona/maps/DwC.specimens.csv: Mapped CollectedDate to eventDate/_alt/2 even though it's not used because other datasources might copy these mappings and want it already filled in
Aaron Marcuse-Kubitza
01:52 PM Revision 1268: Added ucase_first to uppercase the first character of columns in a spreadsheet
Aaron Marcuse-Kubitza
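Uppercasing only the first character, as Revision 1271 needs when it turns DwC2 terms like `eventDate` into `EventDate`, can be sketched like this. The `col_nums` parameter is a hypothetical way to choose the columns. Note that `str.capitalize()` would be wrong here because it also lowercases the rest of the string (`"eventDate"` would become `"Eventdate"`).

```python
def ucase_first(rows, col_nums=(0,)):
    """Yield rows with the first character of the given columns uppercased."""
    for row in rows:
        row = list(row)  # copy so the caller's rows are left unchanged
        for col in col_nums:
            if col < len(row) and row[col]:
                row[col] = row[col][0].upper() + row[col][1:]
        yield row


result = list(ucase_first([["eventDate", "locationevent/obsstartdate"]]))
```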
01:21 PM Revision 1267: Added inputs/UArizona/maps/DwC.specimens.csv autogen maps
Aaron Marcuse-Kubitza
01:20 PM Revision 1266: inputs/UArizona/maps/DwC.specimens.csv: Mapped more fields
Aaron Marcuse-Kubitza
01:14 PM Revision 1265: mappings/DwC1-DwC2.specimens.csv: Remove date -> date/_alt/2 mappings because they prevent the original DwC2 date field from being mapped to without an extra /_alt/2 appended
Aaron Marcuse-Kubitza
01:10 PM Revision 1264: xml_func.py: Use new dates.strtotime(). When component date parts specified, year defaults to dates.epoch.year.
Aaron Marcuse-Kubitza
01:09 PM Revision 1263: dates.py: Added strtotime() to wrap dateutil.parser.parse() with its default set to the epoch, so that e.g. months with the day missing default to day 1 instead of the current day of the month
Aaron Marcuse-Kubitza
12:38 PM Revision 1262: mappings/DwC1-DwC2.specimens.csv: Map eventDate,dateIdentified using /_alt/2 and year/month/day using /_alt/1 so that inputs with both a date and date parts will select between the two
Aaron Marcuse-Kubitza
11:43 AM Revision 1261: input.Makefile: Added comment that self map must be made first if it's needed for maps/$(via).%.full.csv
Aaron Marcuse-Kubitza
11:40 AM Revision 1260: Makefiles: Use .SECONDARY with no prerequisites instead of setting a .PRECIOUS for each intermediate, to simplify turning off automatic deletion of intermediate files
Aaron Marcuse-Kubitza
11:23 AM Revision 1259: inputs/UArizona: Added initial maps/DwC.specimens.csv
Aaron Marcuse-Kubitza
11:10 AM Revision 1258: DwC mappings: Map datasource name via institutionID to avoid conflicting with existing institutionCode fields that many DwC data sources have
Aaron Marcuse-Kubitza
10:57 AM Revision 1257: input.Makefile: Don't profile by default because it appears to slow things down significantly on long imports
Aaron Marcuse-Kubitza
10:56 AM Revision 1256: Added inputs/UArizona/maps
Aaron Marcuse-Kubitza
10:33 AM Task #372 (Resolved): talk to Nick about proposed changes to VegX
Aaron Marcuse-Kubitza

03/03/2012

05:56 PM Revision 1255: Makefile: python-Linux: Added python-profiler
Aaron Marcuse-Kubitza
05:44 PM Revision 1254: specimens verification: Added # binomials test
Aaron Marcuse-Kubitza
05:35 PM Revision 1253: vegbien.sql: specimenreplicate: Removed specimenreplicate_unique_collectionnumber index because the collectionnumber (NYBG FieldNumber) is not always unique within a collector, even though it should be. Changed specimenreplicate_unique_catalognumber to only operate on rows with no sourceaccessioncode (of which there are 8 in NYBG).
Aaron Marcuse-Kubitza
05:09 PM Revision 1252: mappings/verify.specimens.sql: # species test: Fixed to join separately on taxondeterminations for genus and species. # genera test: Removed no longer needed join on party.
Aaron Marcuse-Kubitza
05:04 PM Revision 1251: vegbien.sql: specimenreplicate: Added fki index on taxonoccurrence_id
Aaron Marcuse-Kubitza
04:25 PM Revision 1250: vegbien.sql: plantname: Added index on rank to speed up specimens verifications, where the query planner insists on joining from plantname to specimenreplicate instead of the other way around (which takes much longer without the index)
Aaron Marcuse-Kubitza
03:33 PM Revision 1249: mappings/verify.*: Use nested SELECT instead of JOIN on party to get datasource_id, so that party will not be joined on after other joins have already occurred (which slows things down)
Aaron Marcuse-Kubitza
03:26 PM Revision 1248: vegbien.sql: party: Changed party_unique_name to ignore NULL values and the organizationname (a first(+middle)+last name is considered unique)
Aaron Marcuse-Kubitza
03:15 PM Revision 1247: vegbien.sql: party: Added party_unique_organizationname constraint
Aaron Marcuse-Kubitza
02:11 PM Revision 1246: Specimens verification: Added # genera and # species
Aaron Marcuse-Kubitza
01:50 PM Revision 1245: input.Makefile: verify: Create target dir if it doesn't exist
Aaron Marcuse-Kubitza
01:42 PM Revision 1244: inputs/NYBG: Added verify/specimens.ref.sql
Aaron Marcuse-Kubitza
01:41 PM Revision 1243: Added mappings/verify.specimens.sql
Aaron Marcuse-Kubitza
01:41 PM Revision 1242: Added inputs/NYBG-CSV/verify/
Aaron Marcuse-Kubitza
01:40 PM Revision 1241: Makefile: Print done message after verify
Aaron Marcuse-Kubitza
01:29 PM Revision 1240: VegX-VegBIEN mapping: Use new lookup-only element syntax to ensure that stemtag 1 is not created if it doesn't exist when stemtag 2 tries to set its iscurrent status to false. This should fix the 136 "NullValueException: columns: tag" errors in the SALVIAS organisms import.
Aaron Marcuse-Kubitza
01:27 PM Revision 1239: xpath.py: get(): Added support for lookup-only elements which are not created if they don't exist
Aaron Marcuse-Kubitza
01:25 PM Revision 1238: xpath.py: parse(): Added support for lookup-only elements which are not created if they don't exist
Aaron Marcuse-Kubitza
01:15 PM Revision 1237: VegX-VegBIEN mapping: Map stemtags using [] instead of :[] for attrs that are really keys
Aaron Marcuse-Kubitza

03/02/2012

07:54 PM Revision 1236: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
07:52 PM Revision 1235: VegX-VegBIEN mapping: Handle user-defined field voucherType (SALVIAS DetType) by mapping specimenreplicates for voucherTypes other than direct via voucher
Aaron Marcuse-Kubitza
06:58 PM Revision 1234: xml_func.py: Added _if and _eq. Added cast() to throw SyntaxException if it can't cast, and use it in conv_items(). _merge: Check types of input using conv_items(strings.ustr, items).
Aaron Marcuse-Kubitza
06:53 PM Revision 1233: util.py: Added all_not_none() and bool2str()
Aaron Marcuse-Kubitza
06:52 PM Revision 1232: strings.py: Added ustr() (like built-in str() but converts to unicode object)
Aaron Marcuse-Kubitza
05:32 PM Revision 1231: PostgreSQL-MySQL.csv: Fixed bug in removal of casts of default values, which treated NOT NULL as part of the datatype
Aaron Marcuse-Kubitza
05:30 PM Revision 1230: VegBIEN: soilobs: Added default value for horizon. Adjusted mappings to remove now-unnecessary horizon value.
Aaron Marcuse-Kubitza
05:26 PM Revision 1229: repl: Removed automatic case-insensitivity because Python apparently only supports turning *on* case-insensitivity via (?i) but not off via (?-i) (as Java does)
Aaron Marcuse-Kubitza
05:09 PM Revision 1228: VegBIEN: soilobs: Removed soil* prefix from fields
Aaron Marcuse-Kubitza
05:05 PM Revision 1227: VegX-VegBIEN mapping: Map to new soilobs fields
Aaron Marcuse-Kubitza
04:57 PM Revision 1226: SALVIAS inputs: Use new _units:[units="%"] on soil fields that are percents. Replace "<..." values with 0.
Aaron Marcuse-Kubitza
04:55 PM Revision 1225: xml_func.py: Added _units
Aaron Marcuse-Kubitza
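The sketch below shows one way to apply percent units to soil values together with the `"<..."` replacement from Revision 1226. It assumes that `_units:[units="%"]` ultimately yields a 0 to 1 fraction, which would be consistent with Revision 1224 labeling the soilobs fields as "fraction". The log does not confirm this, and the function name is hypothetical.

```python
def percent_to_fraction(raw):
    """Convert a percent string to a 0-1 fraction (hypothetical sketch).

    Values of the form "<..." become 0, as Revision 1226 describes
    for the SALVIAS soil fields.
    """
    text = raw.strip()
    if text.startswith("<"):
        return 0.0
    return float(text.rstrip("%")) / 100
```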
04:30 PM Revision 1224: vegbien.sql: soilobs: Converted user-defined fields to first-class. Labeled appropriate fields as "fraction".
Aaron Marcuse-Kubitza
04:08 PM Revision 1223: VegBIEN mappings: Changed tableRecord_ID to tablerecord_id to match PostgreSQL field name
Aaron Marcuse-Kubitza
04:05 PM Revision 1222: DwC2-VegBIEN mapping: Adjusted user-defined mappings
Aaron Marcuse-Kubitza
04:00 PM Revision 1221: vegbien.sql: userdefined: Made userdefinedname NOT NULL. userdefined, definedvalue: Added unique constraints.
Aaron Marcuse-Kubitza
03:54 PM Revision 1220: VegX-VegBIEN mapping: Mapped userdefined fields to new first-class fields
Aaron Marcuse-Kubitza
03:46 PM Revision 1219: xml_func.py: Added _map and _replace
Aaron Marcuse-Kubitza
02:33 PM Revision 1218: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
02:30 PM Revision 1217: vegbien.ERD.mwb: Fixed lines. Expanded truncated tables where there was room.
Aaron Marcuse-Kubitza
02:01 PM Task #366: refactor VegX
See [[VegX]] schema
Aaron Marcuse-Kubitza
02:00 PM Task #370 (Resolved): create ERD of final schema
Ready for sign-off on first draft
Aaron Marcuse-Kubitza
01:59 PM Task #374 (Resolved): mechanism to export VegBIEN data to flat file
Aaron Marcuse-Kubitza
01:58 PM Task #373 (Resolved): map all specimens data in raw_data
Aaron Marcuse-Kubitza
01:58 PM Task #372 (Resolved): talk to Nick about proposed changes to VegX
Aaron Marcuse-Kubitza
01:58 PM Task #371 (Resolved): formalize proposed changes to VegX
Aaron Marcuse-Kubitza
12:51 PM Revision 1216: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
12:51 PM Revision 1215: vegbien.sql: locationevent: Added temperature and precipitation
Aaron Marcuse-Kubitza
12:45 PM Revision 1214: vegbien.sql: aggregateoccurrence: Added growthform
Aaron Marcuse-Kubitza
12:39 PM Revision 1213: vegbien.ERD.mwb: Reversed the locations of soiltaxon and soilobs to give soilobs room to add new fields
Aaron Marcuse-Kubitza
12:36 PM Revision 1212: vegbien.sql: Removed embargo table and emb_* fields because we're using a central field, location.confidentialitystatus, for embargo information and coordinate fuzzing
Aaron Marcuse-Kubitza
12:22 PM Revision 1211: vegbien.sql: stemobservation: Added heightfirstbranch
Aaron Marcuse-Kubitza
12:17 PM Revision 1210: vegbien.sql: stemobservation: Added diameteraccuracy. Reordered fields.
Aaron Marcuse-Kubitza

03/01/2012

05:55 PM Revision 1209: VegBIEN: stemobservation: Renamed diameter to diameterbreastheight to be more accurate
Aaron Marcuse-Kubitza
05:45 PM Revision 1208: vegbien.ERD.mwb: Expanded tables where there was room
Aaron Marcuse-Kubitza
05:34 PM Revision 1207: DwC mappings: Fixed user-defined field mappings according to Brad Boyle's changes
Aaron Marcuse-Kubitza
05:33 PM Revision 1206: vegbien.sql: Changed specimenreplicate_unique_collectionnumber constraint to include verbatimcollectorname because collection number is assigned by collector
Aaron Marcuse-Kubitza

02/28/2012

07:41 PM Revision 1205: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
07:39 PM Revision 1204: vegbien.sql: Changed specimenreplicate_unique_collectionnumber constraint to include verbatimcollectorname because collection number is assigned by collector
Aaron Marcuse-Kubitza
07:36 PM Revision 1203: VegBIEN: Moved taxonoccurrence.verbatimcollectorname to specimenreplicate and aggregateoccurrence so that it can be used in specimenreplicate duplicate elimination
Aaron Marcuse-Kubitza
07:21 PM Revision 1202: mappings/DwC1-DwC2.specimens.csv: Notes mapping: Removed extraneous /_merge/1
Aaron Marcuse-Kubitza
05:51 PM Revision 1201: input.Makefile: svn_props: Removed no longer needed items from input dir svn:ignore
Aaron Marcuse-Kubitza
05:49 PM Revision 1200: input.Makefile: verify: Fixed bug for inputs without a .ref where $(wildcard) wouldn't recheck the file after verify/%.out is run, so the verify output wasn't printed
Aaron Marcuse-Kubitza
05:45 PM Revision 1199: input.Makefile: Moved verify files into separate subdir
Aaron Marcuse-Kubitza
04:46 PM Task #348: 1st draft of schema
Waiting for confirmation from Bob Peet
Aaron Marcuse-Kubitza
04:44 PM Task #353 (Resolved): add terms from previous versions of DwC to DwC-BIEN
@mappings/DwC-VegBIEN.specimens.csv@ now contains duplicate mappings for DwC1 terms, which are automatically added fr...
Aaron Marcuse-Kubitza
04:43 PM Task #367 (Resolved): get Univ of Arizona DwC data
See @vegbiendev:/home/bien/svn/inputs/UArizona-CSV/src/ARIZ_DiGIR_21012010.csv.tar.gz@
Aaron Marcuse-Kubitza
04:30 PM Revision 1198: bin/map: Changed root label data format convention to datasrc[data_format] so datasource names containing hyphens would not have the part after the - treated as the data format
Aaron Marcuse-Kubitza
04:25 PM Revision 1197: inputs maps: Changed input root labels to match dir names since verify expects these to be the same
Aaron Marcuse-Kubitza
04:22 PM Revision 1196: input.Makefile: verify: Fixed bug where datasource name was not set for non-DB inputs
Aaron Marcuse-Kubitza
04:18 PM Revision 1195: input.Makefile: Removed no longer needed default verify action for dirs with no verify.ref's
Aaron Marcuse-Kubitza
04:15 PM Revision 1194: input.Makefile: verify: Made verifications table-specific
Aaron Marcuse-Kubitza
03:27 PM Revision 1193: input.Makefile: import: Merged import and import-all because they do the same thing
Aaron Marcuse-Kubitza
03:26 PM Revision 1192: input.Makefile: verify: Started rearranging to allow different verifies for each table
Aaron Marcuse-Kubitza
03:19 PM Revision 1191: Moved verify.sql to mappings since it's mapping-related
Aaron Marcuse-Kubitza
02:31 PM Revision 1190: input.Makefile: Changed option nolog to log so that options aren't specified in the negative
Aaron Marcuse-Kubitza
01:43 PM Revision 1189: input.Makefile: svn ignore .trace files
Aaron Marcuse-Kubitza
01:41 PM Revision 1188: input.Makefile: Profile imports into a .trace file unless env var profile=""
Aaron Marcuse-Kubitza
01:28 PM Revision 1187: xml_func.py: _alt: On empty input, return None instead of raising SyntaxException because empty input should be OK
Aaron Marcuse-Kubitza
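Here is a minimal sketch of the `_alt` behavior after Revisions 1186 and 1187. Alternatives are tried in numeric order (`/_alt/1` before `/_alt/2`), and empty input yields `None` instead of an error. Treating both `None` and `""` as missing is an assumption of this sketch.

```python
def alt(items):
    """Return the first non-empty alternative (hypothetical sketch of _alt).

    items maps alternative numbers ("1", "2", ...) to values, mirroring
    the /_alt/1, /_alt/2 paths in the mappings. Lower numbers win.
    """
    for key in sorted(items, key=int):  # numeric order, so "10" sorts after "2"
        value = items[key]
        if value not in (None, ""):
            return value
    return None
```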

02/27/2012

05:37 PM Revision 1186: xml_func.py: _alt: Fixed bug where not specifying any item would crash the program instead of raising a SyntaxException
Aaron Marcuse-Kubitza
05:33 PM Revision 1185: Factored verify.sql out into schemas dir
Aaron Marcuse-Kubitza
05:26 PM Revision 1184: input.Makefile: verify: Print diff in two columns if verbose=1
Aaron Marcuse-Kubitza
05:03 PM Revision 1183: inputs/SALVIAS/verify.sql: When filtering by datasource name, use an AND clause in the JOIN party's ON condition instead of a separate WHERE statement, so that the datasource filtering code is all on the same line
Aaron Marcuse-Kubitza
04:58 PM Revision 1182: inputs/SALVIAS/verify.sql: Use new :datasource variable instead of literal 'SALVIAS'
Aaron Marcuse-Kubitza
04:58 PM Revision 1181: input.Makefile: Provide the verify.sql script a :datasource variable set to the datasource name (in quotes)
Aaron Marcuse-Kubitza
04:39 PM Revision 1180: vegbien.ERD.mwb: Re-marked aggregateoccurrence:plantobservation relationship as 1:1 in the ERD
Aaron Marcuse-Kubitza
03:55 PM Revision 1179: bin/map: DB, CSV inputs: Use column indexes instead of column names to look up each field (optimization to avoid repeated dict lookups of the same key)
Aaron Marcuse-Kubitza
03:47 PM Revision 1178: util.py: ListDict: __str__(): Print each entry on its own line, in the order the keys were provided
Aaron Marcuse-Kubitza
03:37 PM Revision 1177: NYBG-DwC maps: Filter out MinimumElevation = "."
Aaron Marcuse-Kubitza
03:37 PM Revision 1176: xml_dom.py: NodeTextEntryIter: Filter out empty entries (instead of producing an entry with an explicit None value, which causes problems with XML funcs that can't handle Nones)
Aaron Marcuse-Kubitza
03:34 PM Revision 1175: NYBG-DwC maps: Map to input fields with XML func appended whenever possible (DwC1->DwC2 translation is done by DwC-VegBIEN.specimens.csv)
Aaron Marcuse-Kubitza
02:57 PM Revision 1174: vegbien.sql: Renamed methodtaxonclass.description to methodtaxonclass.taxonclass and changed it to a closed list (enum taxonclass). method.description can still be used for freeform taxonclass inclusions/exclusions.
Aaron Marcuse-Kubitza
02:47 PM Revision 1173: DwC1-DwC2.specimens.csv: Removed no longer needed /_alt/2 XML func from date mappings (you will only ever map either the full date or the year/month/day)
Aaron Marcuse-Kubitza
02:43 PM Revision 1172: DwC mappings: Moved DwC1's CoordinatePrecision /_noCV/value XML func suffix to DwC2-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
02:38 PM Revision 1171: mappings: Removed mappings for XML func suffixes of a path because they are now automatically created heuristically by join
Aaron Marcuse-Kubitza
02:37 PM Revision 1170: join: Added heuristic search for a match on a parent path, so that every XML func suffix of a path doesn't need its own mapping
Aaron Marcuse-Kubitza
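The parent-path heuristic in Revision 1170 can be sketched as a longest-prefix lookup. If a full path has no mapping, its trailing components are dropped one by one until a mapped ancestor is found. The remaining XML func suffix is returned to the caller. The function name and return shape are hypothetical. The naive split on `/` also ignores slashes that could appear inside XPath predicates.

```python
def find_parent_mapping(path, mappings):
    """Return (mapped_value, remaining_suffix) for the longest prefix of path in mappings."""
    parts = path.split("/")
    for end in range(len(parts), 0, -1):
        prefix = "/".join(parts[:end])
        if prefix in mappings:
            return mappings[prefix], parts[end:]
    return None, []


mappings = {"CoordinatePrecision": "coordinatePrecision"}
match = find_parent_mapping("CoordinatePrecision/_noCV/value", mappings)
```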
02:03 PM Revision 1169: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
02:01 PM Revision 1168: vegbien.sql: Added method.pointsperline. Rearranged ERD after removing role fkeys.
Aaron Marcuse-Kubitza
02:00 PM Revision 1167: filter_ERD.csv: Remove role fkeys
Aaron Marcuse-Kubitza
01:45 PM Revision 1166: vegbien.sql: aggregateoccurrence: Added linecover
Aaron Marcuse-Kubitza
01:37 PM Revision 1165: vegbien.sql: methodtaxonclass: Added description comment with list of values (which may become a closed list)
Aaron Marcuse-Kubitza
01:10 PM Revision 1164: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
01:02 PM Revision 1163: vegbien.sql: Changed lengthunits to m in all comments
Aaron Marcuse-Kubitza
12:56 PM Revision 1162: vegbien.sql: method: Added subplotspacing and subplotmethod_id
Aaron Marcuse-Kubitza
12:36 PM Revision 1161: vegbien.sql: method: Removed lengthunits and instead require all length- or area-related measurements throughout VegBIEN to be converted to SI base units, e.g. cm -> m, ha -> m^2. Adjusted ERD to avoid some densely packed lines.
Aaron Marcuse-Kubitza
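Converting measurements to SI base units amounts to a table of multipliers, for example 1 cm = 0.01 m and 1 ha = 100 m x 100 m = 10,000 m^2. The table and function below are a hypothetical illustration and are not part of vegbien.sql.

```python
# Hypothetical conversion table: unit -> (SI unit, multiplier).
TO_SI = {
    "mm": ("m", 0.001),
    "cm": ("m", 0.01),
    "m": ("m", 1.0),
    "km": ("m", 1000.0),
    "m^2": ("m^2", 1.0),
    "ha": ("m^2", 10_000.0),  # 1 ha = 100 m x 100 m
    "km^2": ("m^2", 1_000_000.0),
}


def to_si(value, unit):
    """Return (value in SI units, SI unit name)."""
    si_unit, factor = TO_SI[unit]
    return value * factor, si_unit
```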
12:17 PM Revision 1160: vegbien.sql: methodtaxonclass: Added description field for taxon classes that don't fit well into a plantconcept. Made at least one of plantconcept_id or description required. Added unique constraint.
Aaron Marcuse-Kubitza
12:07 PM Revision 1159: SALVIAS verifications: Use count(DISTINCT) instead of nested SELECT DISTINCT
Aaron Marcuse-Kubitza
12:05 PM Revision 1158: VegBIEN verifications: Select only the records for the datasource being verified
Aaron Marcuse-Kubitza
11:46 AM Revision 1157: SALVIAS verifications: Fixed to exclude subplots from locations/location events and uniquify locations based on coords
Aaron Marcuse-Kubitza
11:25 AM Revision 1156: inputs/SALVIAS/verify.sql: Updated for schema changes
Aaron Marcuse-Kubitza
10:24 AM Revision 1155: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
10:22 AM Revision 1154: vegbien.ERD.mwb: Re-marked aggregateoccurrence:plantobservation relationship as 1:1 in the ERD. (I think this will need to be manually re-marked whenever either of those tables is updated.)
Aaron Marcuse-Kubitza
10:18 AM Revision 1153: vegbien.sql: Removed methodgrowthform and growthform, since growthforms can be accommodated by plantconcept in a similar way as higher-order taxonomic ranks
Aaron Marcuse-Kubitza
10:09 AM Revision 1152: vegbien.sql: methodgrowthform, methodtaxonclass: Removed "included" default value so it's always obvious whether the author intended the classes to be inclusions or exclusions
Aaron Marcuse-Kubitza
10:04 AM Revision 1151: vegbien.sql: aggregateoccurrence: Removed unneeded fields. Added aggregateoccurrence->coverindex fkey.
Aaron Marcuse-Kubitza
09:54 AM Revision 1150: vegbien.sql: Added constraint to enforce 1:1 aggregateoccurrence:plantobservation relationship
Aaron Marcuse-Kubitza

02/25/2012

08:16 PM Revision 1149: vegbien.sql: Added plantname unique constraint
Aaron Marcuse-Kubitza
08:01 PM Revision 1148: bin/map: Use new util.ListDict and util.WrapIter to simplify getting rows by column name instead of index, and to enable a row to be printed with its column names in error messages
Aaron Marcuse-Kubitza
08:00 PM Revision 1147: util.py: Added WrapIter to wrap an iterator and ListDict to view a list as a dict
Aaron Marcuse-Kubitza
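The idea behind `ListDict` and `WrapIter`, viewing a positional row as a dict keyed by column name, can be sketched as follows. The class body is hypothetical. `WrapIter` is modeled here as a simple generator, and `__str__` follows the one-entry-per-line, key-order format that Revision 1178 later describes.

```python
class ListDict:
    """View a row (list) as a read-only dict keyed by column name (hypothetical sketch)."""

    def __init__(self, values, names):
        self.values = values
        self.names = list(names)
        # "Flip" the name list into a name -> index lookup.
        self.index = {name: i for i, name in enumerate(self.names)}

    def __getitem__(self, name):
        return self.values[self.index[name]]

    def __str__(self):
        # One "name: value" line per entry, in the order the names were given.
        return "\n".join(f"{name}: {self.values[i]!r}" for i, name in enumerate(self.names))


def wrap_rows(rows, names):
    """Wrap an iterator of lists so each item comes out as a ListDict."""
    names = list(names)
    return (ListDict(row, names) for row in rows)


rows = list(wrap_rows([["P1", 12.5]], ["plotName", "elevation"]))
```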
07:38 PM Revision 1146: bin/map: Use new util.list_flip()
Aaron Marcuse-Kubitza
07:37 PM Revision 1145: util.py: Added list_flip()
Aaron Marcuse-Kubitza
07:02 PM Revision 1144: env_password: Fixed to set the environment variable in the calling shell. Do this by cc-ing the tty only on messages before the "Enter password" prompt, because the redirect creates a subshell which causes the env var to only be set within that subshell.
Aaron Marcuse-Kubitza
06:18 PM Revision 1143: inputs/NYBG-CSV/maps/DwC.specimens.csv: Removed mappings that are already present in mappings/DwC1-DwC2.specimens.csv. This map now contains only the mappings where NYBG-CSV differs from standard DwC1.
Aaron Marcuse-Kubitza
06:14 PM Revision 1142: inputs/NYBG/maps/DwC.specimens.csv: Removed mappings that are already present in mappings/DwC1-DwC2.specimens.csv. This map now contains only the mappings where NYBG differs from standard DwC1.
Aaron Marcuse-Kubitza
05:58 PM Revision 1141: Remove accidentally-committed temp file inputs/NYBG/DwC.specimens2.csv
Aaron Marcuse-Kubitza
05:56 PM Revision 1140: mappings/Makefile: Generate DwC.self.specimens.csv from DwC-VegBIEN.specimens.csv for use in creating full via maps for inputs
Aaron Marcuse-Kubitza
05:40 PM Revision 1139: input.Makefile: Generate full via maps from input via maps by appending mappings from the via format to itself when available
Aaron Marcuse-Kubitza
04:30 PM Revision 1138: inputs/NYBG/maps/DwC.specimens.csv: Changed label to "NYBG-DwC" to take advantage of automatic filling in of DwC mappings not specified in the NYBG map
Aaron Marcuse-Kubitza
04:28 PM Revision 1137: subtract: Support custom column numbers to compare on (instead of just input col). Added ignore option to continue even if input columns don't match.
Aaron Marcuse-Kubitza
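Here is a hedged sketch of what "custom column numbers to compare on" and the ignore option might look like. It assumes subtract removes from map 0 every row whose compared columns also appear in map 1. The function signature and the list-of-rows map representation (header first) are hypothetical.

```python
def subtract(map0, map1, cols=(0,), ignore=False):
    """Hypothetical: return map0 minus the rows whose values in `cols`
    also occur in map1. Maps are lists of rows; row 0 is the header."""
    header0, *body0 = map0
    header1, *body1 = map1

    def key(row):
        return tuple(row[c] for c in cols)

    # The compared columns must describe the same thing unless ignore is set
    if key(header0) != key(header1) and not ignore:
        raise ValueError('column labels differ: %r != %r' % (key(header0), key(header1)))
    seen = {key(row) for row in body1}
    return [header0] + [row for row in body0 if key(row) not in seen]


nybg = [['NYBG', 'VegBIEN'], ['Collector', 'collector'], ['CatalogNumber', 'catalognumber']]
dwc1 = [['DwC1', 'VegBIEN'], ['CatalogNumber', 'catalognumber']]
print(subtract(nybg, dwc1, ignore=True))  # input labels differ, so ignore is needed
print(subtract(nybg, dwc1, cols=(1,)))    # compare on the output column instead
```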
04:26 PM Revision 1136: bin/map: DB inputs: Get all rows in one query (hopefully a significant optimization). Allow maps to contain entries for columns that are not in the DB table.
Aaron Marcuse-Kubitza
04:22 PM Revision 1135: sql.py: select(): Select all fields if fields == None. Replaced col(cur, idx) with col_names(cur) because an iterator is easier to use than getting by index.
Aaron Marcuse-Kubitza
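The select()/col_names() change can be illustrated with Python's built-in sqlite3 module standing in for the project's database. This sketch assumes that select() builds a plain SELECT and that col_names() reads the DB-API cursor.description attribute. The function signatures are illustrative only.

```python
import sqlite3


def col_names(cur):
    """Iterate over the column names of the cursor's current result set."""
    # DB-API 2.0: cursor.description holds one 7-item sequence per column, name first
    return (col[0] for col in cur.description)


def select(db, table, fields=None):
    """Hypothetical sketch: SELECT fields from table, or every column if fields is None.
    Table and field names are interpolated, so they must come from trusted code."""
    field_list = '*' if fields is None else ', '.join(fields)
    cur = db.cursor()
    cur.execute('SELECT %s FROM %s' % (field_list, table))
    return cur


db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE plot (plotcode TEXT, area REAL)')
db.execute("INSERT INTO plot VALUES ('P1', 0.1)")
cur = select(db, 'plot')
for row in cur:
    print(dict(zip(col_names(cur), row)))
```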
03:57 PM Revision 1134: bin/map: Fixed bug in previous implementation of allowing maps for CSV inputs to contain entries for columns that are not in the CSV file
Aaron Marcuse-Kubitza
03:45 PM Revision 1133: bin/map: Allow maps for CSV inputs to contain entries for columns that are not in the CSV file
Aaron Marcuse-Kubitza
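One way to let a map name columns that the CSV file lacks is to drop those map entries once the header has been read. The sketch below assumes maps are dicts from input column to output path. map_csv_rows() and the output paths are hypothetical names.

```python
import csv
import io


def map_csv_rows(csv_text, mappings):
    """Hypothetical: apply {input_col: output_path} mappings to CSV rows,
    ignoring map entries whose input column is absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []  # None for an empty file
    present = {in_col: out for in_col, out in mappings.items() if in_col in fieldnames}
    for row in reader:
        yield {out: row[in_col] for in_col, out in present.items()}


csv_text = 'catalogNumber,genus\n1,Quercus\n'
mappings = {'genus': 'plantname/genus', 'habit': 'growthform'}  # 'habit' is not a CSV column
for out_row in map_csv_rows(csv_text, mappings):
    print(out_row)
```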
02:54 PM Revision 1132: Use new sort_map instead of manually specifying the sort order
Aaron Marcuse-Kubitza
02:54 PM Revision 1131: Added sort_map to sort a map spreadsheet in the standard order
Aaron Marcuse-Kubitza
02:43 PM Revision 1130: Removed no longer needed join_passthru, because join_union_sort now serves its purpose
Aaron Marcuse-Kubitza
02:42 PM Revision 1129: Don't generate mappings/for_review/DwC-VegBIEN.specimens.csv because it's a derived map with lots of duplicated mappings for the various DwC versions
Aaron Marcuse-Kubitza
02:41 PM Revision 1128: mappings/Makefile: Generate DwC-VegBIEN.specimens.csv directly from DwC1-DwC2 and DwC2-VegBIEN mappings by using join_union_sort with header_num=1, rather than via intermediate DwC1-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
02:37 PM Revision 1127: union: Added header_num option to select which map's header to use as the output header
Aaron Marcuse-Kubitza
02:28 PM Revision 1126: Rename join_sort to join_union_sort and have it run union in ignore mode. This will automatically append the joined map when the input map is a derivative of the joined map, such as for NYBG-DwC.
Aaron Marcuse-Kubitza
02:25 PM Revision 1125: union: Pass through map 0, so that if ignore is set, the input map will still be output. Allow either map's input label to contain the other's input label to enable e.g. appending mappings for an older input version to those for a newer input version.
Aaron Marcuse-Kubitza
01:43 PM Revision 1124: DwC1-DwC2 mapping: Changed input label to DwC1, which is allowed by the now relaxed label constraints imposed by union
Aaron Marcuse-Kubitza
01:42 PM Revision 1123: union: Check if two maps can be combined based on whether map 0 column 0 label *contains* map 1 column 0 label instead of being equal. This allows map 0's input 0 root to contain the datasource name as well as a format that allows it to be combined with a more general map. Added ignore flag to not print an error if column labels don't match.
Aaron Marcuse-Kubitza
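The relaxed label check from Revision 1123 can be sketched as a substring test on the two maps' column 0 labels. This assumes maps are lists of rows with the header first, and that union keeps map 0's mapping whenever both maps define the same input column. Revision 1125 later allowed containment in either direction, but this sketch shows only the one-directional check.

```python
def union(map0, map1, ignore=False):
    """Hypothetical: append map1's rows to map0 when map0's input label
    contains map1's (e.g. 'NYBG-DwC' contains 'DwC'). Row 0 is the header."""
    header0, *body0 = map0
    header1, *body1 = map1
    if header1[0] not in header0[0]:  # containment, not equality
        if ignore:
            return map0  # labels don't match: pass map 0 through unchanged
        raise ValueError('cannot combine %r with %r' % (header0[0], header1[0]))
    # map0 wins for any input column both maps define
    mapped = {row[0] for row in body0}
    return [header0] + body0 + [row for row in body1 if row[0] not in mapped]


nybg_map = [['NYBG-DwC', 'VegBIEN'], ['Collector', 'collector']]
dwc_map = [['DwC', 'VegBIEN'], ['Collector', 'recordedby'], ['genus', 'plantname']]
print(union(nybg_map, dwc_map))
```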
01:39 PM Revision 1122: bin/map: Support optional data format tag in map spreadsheet labels, used by union to check if two maps can be combined
Aaron Marcuse-Kubitza
01:01 PM Revision 1121: mappings: Added DwC1-DwC2.specimens.csv to core maps so it gets cleaned up
Aaron Marcuse-Kubitza
12:57 PM Revision 1120: Only generate for_review mappings of core maps and end products
Aaron Marcuse-Kubitza
12:56 PM Revision 1119: Generate DwC-VegBIEN mapping as union of DwC1 and DwC2 mappings
Aaron Marcuse-Kubitza

02/24/2012

08:00 PM Revision 1118: Generate DwC-VegBIEN mapping as union of DwC1 and DwC2 mappings
Aaron Marcuse-Kubitza
07:40 PM Revision 1117: NYBG DB mapping: Removed IdentifiedDate and CollectedDate mappings because they are generated from the year/month/day
Aaron Marcuse-Kubitza
07:39 PM Revision 1116: Added mappings/for_review/DwC1-VegBIEN.specimens.csv
Aaron Marcuse-Kubitza
07:35 PM Revision 1115: Added DwC1-DwC mapping. Generate DwC1-VegBIEN mapping automatically.
Aaron Marcuse-Kubitza
07:11 PM Revision 1114: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
07:08 PM Revision 1113: vegbien.sql: Renamed _keys unique constraints/unique indexes to _unique to better reflect their purpose
Aaron Marcuse-Kubitza
06:54 PM Revision 1112: vegbien.sql: Added method.diameterheight to store DBH height
Aaron Marcuse-Kubitza
06:44 PM Revision 1111: VegBIEN: Moved plantstatus.plantlevel to plantname.rank because the taxonomic rank is a property of the name itself
Aaron Marcuse-Kubitza
06:43 PM Revision 1110: PostgreSQL-MySQL.csv: Fixed custom types translation to match shorter type names
Aaron Marcuse-Kubitza
06:09 PM Revision 1109: vegbien.sql: Added plantstatus unique constraint
Aaron Marcuse-Kubitza
06:07 PM Revision 1108: DwC-VegBIEN mapping: Map datasource name via DwC institutionCode
Aaron Marcuse-Kubitza
05:42 PM Revision 1107: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
05:40 PM Revision 1106: vegbien.ERD.mwb: Lined up logo and legend with other ERD elements
Aaron Marcuse-Kubitza
05:35 PM Revision 1105: vegbien.sql: Renamed methodgrowthform.growthformmethod_id to submethod_id. Added methodtaxonclass.submethod_id (similar to methodgrowthform.submethod_id).
Aaron Marcuse-Kubitza
05:27 PM Revision 1104: vegbien.sql: Added methodgrowthform.growthformmethod_id for specifying a method used by just the growthform
Aaron Marcuse-Kubitza
05:14 PM Revision 1103: vegbien.ERD.mwb: Rearranged legend to more closely match layout of ERD
Aaron Marcuse-Kubitza
04:51 PM Revision 1102: vegbien.sql: Reordered plantstatus fields to put the most important fields at the top, which will be visible in the ERD
Aaron Marcuse-Kubitza
04:42 PM Revision 1101: vegbien.sql: Replaced method.taxonclassincluded,taxonclassexcluded with new many:many methodtaxonclass table. Added methodgrowthform, growthform tables to do the same thing as methodtaxonclass for growth forms.
Aaron Marcuse-Kubitza
03:53 PM Revision 1100: vegbien.sql: method: Added comment on reference_id
Aaron Marcuse-Kubitza
03:44 PM Revision 1099: VegBIEN: Moved plotmethod fields to method because they can also apply to strata. Removed no longer used plotmethod table.
Aaron Marcuse-Kubitza
03:13 PM Revision 1098: input.Makefile: input DB creation: Removed "IF NOT EXISTS" because that check is handled by $(dbExists)
Aaron Marcuse-Kubitza
03:02 PM Revision 1097: input.Makefile: Don't try to recreate an input DB if it already exists
Aaron Marcuse-Kubitza
03:01 PM Revision 1096: Added UArizona DB input
Aaron Marcuse-Kubitza
02:42 PM Revision 1095: Renaming UArizona to UArizona-CSV because there is also a DB input in bien2_staging.ariz_raw on nimoy
Aaron Marcuse-Kubitza
02:31 PM Revision 1094: Added UArizona input
Aaron Marcuse-Kubitza
02:08 PM Task #330 (Rejected): DwC extension to VegX
We're mapping DwC data separately from VegX.
Aaron Marcuse-Kubitza
02:07 PM Task #370 (Resolved): create ERD of final schema
See [[VegBIEN schema]] > Proposed changes for remaining changes.
Aaron Marcuse-Kubitza
02:06 PM Task #369 (Resolved): get CTFS data dictionary
Aaron Marcuse-Kubitza
02:06 PM Task #368 (Rejected): get TEAM VegX data
Aaron Marcuse-Kubitza
02:06 PM Task #367 (Resolved): get Univ of Arizona DwC data
Aaron Marcuse-Kubitza
02:05 PM Task #366 (Rejected): refactor VegX
Aaron Marcuse-Kubitza
02:04 PM Task #365 (Rejected): retrieve taxonomic hierarchy in analytical layer by using dynamic queries to external sources
Aaron Marcuse-Kubitza
12:46 PM Revision 1093: env_password: Fixed bug where exit command would not cause it to exit, because pipefail shell option was not set. Moved automatic exiting of the calling script into env_password itself.
Aaron Marcuse-Kubitza
12:26 PM Revision 1092: map: Exit if password not set
Aaron Marcuse-Kubitza
12:18 PM Revision 1091: env_password: cc stderr if it's a log file
Aaron Marcuse-Kubitza

02/23/2012

06:49 PM Revision 1090: env_password: Print all messages to /dev/tty so the user sees them even if stderr is redirected to a log file. Exit if password not already set, because e.g. scripts run in the background will not be able to prompt for it.
Aaron Marcuse-Kubitza
05:32 PM Revision 1089: input.Makefile: Don't have make import call verify, because the user often runs import as a test and will not want the output cluttered with verification information. Also, the full imports for which this was intended are often run asynchronously, so the user will not see the output anyway.
Aaron Marcuse-Kubitza
05:28 PM Revision 1088: input.Makefile: Don't abort on verification errors, which are expected during development
Aaron Marcuse-Kubitza

02/21/2012

06:21 PM Revision 1087: SALVIAS tests: Fixed invalid accepted test outputs due to not running `make empty_db` before running tests when using the no-redo optimization shortcut
Aaron Marcuse-Kubitza
06:14 PM Revision 1086: SALVIAS mappings: Fixed plot key mappings to map the correct values to subplot and parent plot
Aaron Marcuse-Kubitza
05:36 PM Revision 1085: vegbien.sql: locationevent: Added unique constraint for subplots based on subplot location
Aaron Marcuse-Kubitza
05:02 PM Revision 1084: SALVIAS-db VegX mapping: Map subplots correctly the way SALVIAS-CSV does
Aaron Marcuse-Kubitza
04:54 PM Revision 1083: SALVIAS verification: Updated to schema changes
Aaron Marcuse-Kubitza
04:42 PM Revision 1082: input.Makefile: Fixed syntax error in verify %.ref target (outdated variable name)
Aaron Marcuse-Kubitza
04:33 PM Revision 1081: input.Makefile: Halt psql commands on first error
Aaron Marcuse-Kubitza
04:27 PM Revision 1080: vegbien.sql: Removed location.authorlocationcode because it's now stored in locationevent as an author-specific setting
Aaron Marcuse-Kubitza
04:24 PM Revision 1079: vegbien.sql: locationevent: Redid unique constraints to avoid applying authorlocationcode-only duplicate elimination to subplots
Aaron Marcuse-Kubitza
04:16 PM Revision 1078: SALVIAS mappings: Map SiteCode/plot_code to locationevent.authorlocationcode because locationevent is now the place to store author-specific plot information
Aaron Marcuse-Kubitza
04:10 PM Revision 1077: SALVIAS mappings: Fixed PlotID mapping to go to locationevent.sourceaccessioncode
Aaron Marcuse-Kubitza
04:06 PM Revision 1076: VegBIEN: Renamed locationevent.authoreventcode to authorlocationcode to reflect that datasources usually use an author-defined code for a plot rather than a plot event
Aaron Marcuse-Kubitza
04:03 PM Revision 1075: vegbien.sql: locationevent: Redid unique constraints to handle datasources that treat the authoreventcode as an authorlocationcode. Eventually, authoreventcode will be renamed to authorlocationcode.
Aaron Marcuse-Kubitza
03:51 PM Revision 1074: vegbien.sql: locationevent: Redid unique constraints to work properly for all fully-specified combinations of keys
Aaron Marcuse-Kubitza
03:31 PM Revision 1073: VegBIEN mappings: Mapped datasource name to new project.datasource. Fixes project duplicate elimination.
Aaron Marcuse-Kubitza
03:16 PM Revision 1072: vegbien.sql: Renamed project.reference_id to datasource_id and pointed it to party, to match locationevent, etc.
Aaron Marcuse-Kubitza
03:02 PM Revision 1071: VegBIEN mappings: Mapped current lat/long to centerlat/long as well so location duplicate elimination will work properly
Aaron Marcuse-Kubitza
03:01 PM Revision 1070: xpath.py: Added support for common subpath after split path's {}
Aaron Marcuse-Kubitza
01:30 PM Revision 1069: sql.py: put(): When encountering a DuplicateKeyException, use dict_subset_right_join to fill in explicit NULL values for columns which don't have data. This causes the database to use the UNIQUE constraint's index to look up the record, instead of relying on individual column indexes for the columns that did have data, which may or may not be available.
Aaron Marcuse-Kubitza
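The effect of filling in explicit NULLs can be sketched with sqlite3 standing in for PostgreSQL. Every unique-key column appears in the lookup condition, and the missing ones become IS NULL tests. This illustration rests on several assumptions: find_existing() is a hypothetical name, and the DuplicateKeyException handling itself is not shown.

```python
import sqlite3


def find_existing(cur, table, row, unique_cols):
    """Hypothetical sketch: fetch the stored row that collides with `row`
    on unique_cols, treating unique columns absent from `row` as NULL."""
    # Explicit None for every unique-key column, as dict_subset_right_join() would give
    cond = {col: row.get(col) for col in unique_cols}
    clauses, params = [], []
    for col, value in cond.items():
        if value is None:
            clauses.append('%s IS NULL' % col)  # "col = NULL" never matches in SQL
        else:
            clauses.append('%s = ?' % col)
            params.append(value)
    cur.execute('SELECT * FROM %s WHERE %s' % (table, ' AND '.join(clauses)), params)
    return cur.fetchone()


db = sqlite3.connect(':memory:')
cur = db.cursor()
cur.execute('CREATE TABLE plantname (plantname TEXT, rank TEXT, plantname_id INTEGER)')
cur.execute("INSERT INTO plantname VALUES ('Quercus', NULL, 1)")
print(find_existing(cur, 'plantname', {'plantname': 'Quercus'}, ['plantname', 'rank']))
```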
01:27 PM Revision 1068: util.py: Added DefaultDict to wrap collections.defaultdict with a simple value passed in the constructor, defaulting to None. Added dict_subset_right_join() to fill in None for subset keys that don't exist.
Aaron Marcuse-Kubitza
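Here is a minimal Python 3 sketch of the two helpers. It assumes DefaultDict takes an initial dict plus a plain default value, and that dict_subset_right_join() returns exactly the subset keys. The signatures are guesses, not the util.py originals.

```python
import collections


class DefaultDict(collections.defaultdict):
    """defaultdict that takes a plain default value (None unless given)
    instead of a factory function."""

    def __init__(self, dict_=(), default=None):
        super().__init__(lambda: default, dict_)

    def copy(self):
        # defaultdict.copy() would call DefaultDict(factory, self); rebuild explicitly
        return DefaultDict(self, self.default_factory())

    __copy__ = copy


def dict_subset_right_join(dict_, subset):
    """Return a dict with exactly the keys in subset; keys missing from dict_ map to None."""
    lookup = DefaultDict(dict_)
    return {key: lookup[key] for key in subset}


print(dict_subset_right_join({'plantname': 'Quercus', 'extra': 1}, ['plantname', 'rank']))
```

Because the result contains every key in subset, a caller can pass it straight to a query builder that needs a value (possibly None) for each unique-key column.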
01:09 PM Task #351 (Resolved): list of fields and method attributes needed to know whether can combine data from different plots
Aaron Marcuse-Kubitza
01:08 PM Task #350 (Resolved): implement methods in VegBIEN
Aaron Marcuse-Kubitza
01:08 PM Task #352 (Resolved): create way to represent methods hierarchically in schema
Aaron Marcuse-Kubitza
01:08 PM Task #358 (Resolved): make shortlist of 1st-level fields a method should have
Aaron Marcuse-Kubitza
01:06 PM Revision 1067: vegbien.sql: Added method and plotmethod UNIQUE indexes
Aaron Marcuse-Kubitza
01:04 PM Revision 1066: vegbien.ERD.mwb: Removed embargo table from ERD because its functionality is provided in location.confidentialitystatus,confidentialityreason
Aaron Marcuse-Kubitza
12:36 PM Revision 1065: Regenerated vegbien.ERD exports
Aaron Marcuse-Kubitza
12:34 PM Revision 1064: vegbien.sql: Moved locationevent method fields to plotmethod and method. Added comments to method/plotmethod fields, as provided by Michael Lee.
Aaron Marcuse-Kubitza
12:15 PM Revision 1063: VegX-VegBIEN mapping: Mapped locationevent.methodnarrative to new plotmethod table
Aaron Marcuse-Kubitza
 
