
# Date Author Comment
1393 03/13/2012 04:18 PM Aaron Marcuse-Kubitza

Added inputs/GBIF/test with accepted test outputs

1392 03/13/2012 04:18 PM Aaron Marcuse-Kubitza

Added inputs/GBIF/maps

1391 03/13/2012 04:17 PM Aaron Marcuse-Kubitza

Regenerated inputs/UArizona*/maps VegBIEN maps

1390 03/13/2012 04:13 PM Aaron Marcuse-Kubitza

Regenerated mappings/DwC-VegBIEN.specimens.no_empty.csv

1389 03/13/2012 04:09 PM Aaron Marcuse-Kubitza

bin/map: Use new csvs.reader_and_header() to support CSVs/TSVs with other than the default Excel dialect

1388 03/13/2012 04:08 PM Aaron Marcuse-Kubitza

Added csvs.py for CSV I/O such as automatically detecting the dialect based on the header line
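
As a rough illustration of header-based dialect detection, here is a minimal sketch. It is not the project's csvs.py: the detection rule (tab-delimited if the header contains a tab, otherwise the Excel dialect) and the helper name detect_dialect() are assumptions, and only the name reader_and_header() comes from the log above.

```python
import csv
import io

def detect_dialect(header_line):
    """Guess the dialect from the header line (assumed rule: a tab means TSV)."""
    if '\t' in header_line:
        return csv.excel_tab
    return csv.excel

def reader_and_header(stream):
    """Return (reader over the remaining rows, parsed header row)."""
    header_line = stream.readline()
    dialect = detect_dialect(header_line)
    # Parse the header with the same dialect that the data rows will use
    header = next(csv.reader([header_line], dialect))
    return csv.reader(stream, dialect), header

# Usage: a TSV stream is detected without the caller naming the dialect
reader, header = reader_and_header(io.StringIO("genus\tspecies\nQuercus\talba\n"))
rows = list(reader)
```

Detecting from the header alone keeps the check cheap and avoids reading ahead into the data rows.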

1387 03/13/2012 04:07 PM Aaron Marcuse-Kubitza

join: Don't append suffix to empty output mappings, so that they stay empty ("NULL")

1386 03/13/2012 04:00 PM Aaron Marcuse-Kubitza

input.Makefile: Added tsv to $(exts). Strip extra whitespace from $(inputs) so that it's the empty string if $(<in) (and $(<in).header) don't exist, and can be used in $(if ...).

1385 03/12/2012 07:08 PM Aaron Marcuse-Kubitza

input.Makefile: Fixed bug in inputFiles wildcard where extensions were manually listed instead of dynamically determined from the $(exts) config var

1384 03/12/2012 06:56 PM Aaron Marcuse-Kubitza

README.TXT: Tell user to `disown -h 1` after running `make import x%x` so that it won't be sent a SIGHUP if the user logs out

1383 03/12/2012 06:55 PM Aaron Marcuse-Kubitza

README.TXT: Tell user to `disown -h 1` after running `make import x%x` so that it won't be sent a SIGHUP if the user logs out

1382 03/12/2012 06:39 PM Aaron Marcuse-Kubitza

input.Makefile: Prepend separate CSV header when available

1381 03/12/2012 06:24 PM Aaron Marcuse-Kubitza

input.Makefile: Use with_cat in map to later support prepending separate CSV headers

1380 03/12/2012 06:21 PM Aaron Marcuse-Kubitza

Added with_cat to run a command, taking input from the concatenation of files

1379 03/12/2012 05:48 PM Aaron Marcuse-Kubitza

input.Makefile: Set mapEnv if $(dbEngine) is set, to eventually support pre-existing DB connections

1378 03/12/2012 05:14 PM Aaron Marcuse-Kubitza

input.Makefile: Changed $(dbFile) to $(dbExport) to make it unambiguous that it refers to a SQL export, not a pre-existing DB, which will be supported later

1377 03/12/2012 05:10 PM Aaron Marcuse-Kubitza

input.Makefile: Added .txt to list of input file extensions

1376 03/12/2012 04:34 PM Aaron Marcuse-Kubitza

Added inputs/SpeciesLink

1375 03/12/2012 03:57 PM Aaron Marcuse-Kubitza

root Makefile: python-Linux: Added pymetrics

1374 03/12/2012 03:54 PM Aaron Marcuse-Kubitza

bin/map: Consider \N to be None

1373 03/12/2012 03:49 PM Aaron Marcuse-Kubitza

util.py: none_if(): Allow multiple none_vals using varargs
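
A varargs none_if() might look like the sketch below. The body is an assumption based only on the description above, not the project's util.py; the example values "" and \N are the ones this log mentions elsewhere.

```python
def none_if(val, *none_vals):
    """Return None if val equals any of none_vals, otherwise return val unchanged."""
    return None if val in none_vals else val

# Usage: treat both the empty string and the \N marker as missing values
cleaned = [none_if(v, '', r'\N') for v in ['Quercus', '', r'\N']]
```

Taking the sentinel values as varargs lets a caller add a new "missing" marker without a second call or a wrapper.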

1372 03/12/2012 03:36 PM Aaron Marcuse-Kubitza

Added inputs/GBIF

1371 03/12/2012 03:28 PM Aaron Marcuse-Kubitza

exc.py: Fixed bug in traceback-saving mechanism that didn't deal with nested Exceptions (such as Exceptions with causes in ExceptionWithCause). Renamed add_exc_info() to add_traceback() since we really only need to store the traceback.
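
The idea of saving a traceback on the exception itself can be sketched as follows. This is a simplified illustration, not the project's exc.py: the attribute name traceback_str is hypothetical, and the handling of nested causes is left out. (In Python 3, exceptions also carry a `__traceback__` attribute, which reduces the need for this.)

```python
import traceback

def add_traceback(e):
    """Attach the text of the currently handled traceback to e, unless already saved."""
    if not hasattr(e, 'traceback_str'):  # hypothetical attribute name
        e.traceback_str = traceback.format_exc()
    return e

def str_(e):
    """Format e with its saved traceback, if any, appended at the end."""
    msg = '%s: %s' % (type(e).__name__, e)
    tb = getattr(e, 'traceback_str', None)
    return msg if tb is None else msg + '\n' + tb

def demo():
    try:
        try:
            raise ValueError('bad date')
        except ValueError as e:
            add_traceback(e)  # save now, while this traceback is current
            saved = e
        raise KeyError('later failure')
    except KeyError:
        # A different exception is being handled here, but the first one
        # still carries its own traceback text.
        return str_(saved)
```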

1370 03/12/2012 12:41 PM Aaron Marcuse-Kubitza

dates.py: parse_date_range(): Fixed bug where the date parts were not joined back together into a string for each date range element. Use strings.single_space() after the date has been split into range parts so that whitespace around the range separator is removed instead of being replaced with a single space.
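
The ordering issue described above (normalize whitespace after splitting, not before) can be shown with a small sketch. The single_space() body, the helper name split_range(), and the "-" separator are assumptions for illustration; the real parse_date_range() does more than this.

```python
def single_space(s):
    """Collapse runs of whitespace to one space and strip both ends."""
    return ' '.join(s.split())

def split_range(s, sep='-'):
    """Split a range string into parts, normalizing whitespace in each part."""
    return [single_space(part) for part in s.split(sep)]

# Normalizing first leaves a space on each side of the separator:
before = single_space('1 Jan  2012 - 5 Jan 2012').split('-')
# Normalizing each part after the split removes it:
after = split_range('1 Jan  2012 - 5 Jan 2012')
```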

1369 03/12/2012 12:25 PM Aaron Marcuse-Kubitza

xml_func.py: process(): Also catch XML func internal errors to assist in debugging. Use new exc.add_exc_info() to save traceback in case later code throws exception, overwriting exc_info().

1368 03/12/2012 12:23 PM Aaron Marcuse-Kubitza

exc.py: str_(): Add the traceback at the end of the exception string. Added add_exc_info() and get_exc_info() for providing traceback info for str_().

1367 03/11/2012 07:33 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: eventDate, dateIdentified: Use _dateRangeStart and _dateRangeEnd

1366 03/11/2012 07:32 PM Aaron Marcuse-Kubitza

xml_func.py: Added _dateRangeStart and _dateRangeEnd

1365 03/11/2012 07:32 PM Aaron Marcuse-Kubitza

dates.py: Added parse_date_range() and helper funcs could_be_year() and could_be_day()

1364 03/11/2012 07:31 PM Aaron Marcuse-Kubitza

strings.py: Added single_space()

1363 03/11/2012 06:12 PM Aaron Marcuse-Kubitza

inputs/UArizona*: Map the ScientificNameAuthor to the binomial instead since it contains the binomial in addition to the authority

1362 03/11/2012 05:28 PM Aaron Marcuse-Kubitza

Added inputs/UArizona-CSV/test

1361 03/11/2012 05:23 PM Aaron Marcuse-Kubitza

input.Makefile: Use .PRECIOUS to save outputs of failed tests so they can be accepted (needed now that .DELETE_ON_ERROR is turned on globally)

1360 03/11/2012 05:14 PM Aaron Marcuse-Kubitza

bin/map: Moved string-cleanup code from get_value() to cleanup(), called by process_row(). process_row() now cleans up the string before checking if it's None, because cleanup() uses none_if() to map "" to None.

1359 03/11/2012 05:12 PM Aaron Marcuse-Kubitza

util.py: Added do_ignore_none()

1358 03/11/2012 04:25 PM Aaron Marcuse-Kubitza

Added inputs/UArizona-CSV/verify

1357 03/11/2012 04:24 PM Aaron Marcuse-Kubitza

Added inputs/UArizona-CSV/maps

1356 03/11/2012 04:23 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Mapped coordinateUncertaintyInMeters to the same place as coordinatePrecision (input sources generally use only one of these columns, which is most likely the accuracy regardless of what it's named)

1355 03/11/2012 04:18 PM Aaron Marcuse-Kubitza

join: In error message when map column names don't match, include the actual column names

1354 03/11/2012 04:17 PM Aaron Marcuse-Kubitza

Makefiles: Added .DELETE_ON_ERROR to delete target if recipe fails

1353 03/11/2012 03:18 PM Aaron Marcuse-Kubitza

VegBIEN mappings: plantnames: Nest taxons hierarchically using plantname.parent_id. Mappings using _forEach: Append a "," to the `in` list so that mappings will sort from shortest to longest `in` list ("]" comes after "," in ASCII, causing this not to happen without the trailing ",").
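
The sort-order effect of the trailing "," can be checked directly. The list contents below are made up for illustration; only the ASCII ordering of "," (0x2C) and "]" (0x5D) matters.

```python
# Without a trailing comma, the longer list sorts first,
# because "," (0x2C) compares lower than "]" (0x5D).
without_comma = sorted(['[genus]', '[genus,species]'])

# With a trailing comma, both strings share the "," at that position,
# and the shorter list sorts first.
with_comma = sorted(['[genus,]', '[genus,species,]'])
```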

1352 03/11/2012 03:14 PM Aaron Marcuse-Kubitza

xpath.py: parse(): _paths(): Remove trailing ","

1351 03/11/2012 02:38 PM Aaron Marcuse-Kubitza

xpath_func.py: _forEach: Made syntax more natural-looking by using values instead of names for string args and attrs instead of branches for array args

1350 03/11/2012 02:36 PM Aaron Marcuse-Kubitza

xpath.py: parse(): Fixed bug in _paths() where empty lists would be parsed as a list containing a single empty path, instead of as an empty list
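
This kind of bug is easy to reproduce with plain string splitting, since splitting an empty string on a separator yields one empty item. The sketch below is only an analogy: the helper name split_paths() is hypothetical, and the real parser does not necessarily split on "," this way (for example, it has to cope with nested brackets).

```python
def split_paths(s):
    """Split a comma-separated list of paths, treating '' as an empty list."""
    s = s.rstrip(',')  # tolerate a trailing ","
    return s.split(',') if s else []

naive = ''.split(',')  # one empty path rather than no paths
```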

1349 03/11/2012 01:26 PM Aaron Marcuse-Kubitza

VegBIEN mappings: Place names: Use _forEach to simplify XPaths for recursively nested places

1348 03/11/2012 01:22 PM Aaron Marcuse-Kubitza

bin/map: In debug mode, print output XPaths

1347 03/09/2012 07:51 PM Aaron Marcuse-Kubitza

xpath_func.py: _forEach: Fixed to support _val replacements anywhere, by doing a string-based search-and-replace on a quoted XPath instead of a list-based search-and-replace on an already-parsed XPath

1346 03/09/2012 07:41 PM Aaron Marcuse-Kubitza

xpath_func.py: Renamed _for to _forEach. Finished implementing _forEach.

1345 03/09/2012 07:41 PM Aaron Marcuse-Kubitza

xpath.py: Import xpath_func after defining XpathElem because xpath_func depends on XpathElem and it hasn't yet been factored into a separate file

1344 03/09/2012 07:39 PM Aaron Marcuse-Kubitza

util.py: Added list_replace()

1343 03/09/2012 07:14 PM Aaron Marcuse-Kubitza

xpath_func.py: Changed XPath function signature to take arguments (args, path), and process() to parse out the args. Implemented a basic _for that repeats its `do` arg as many times as there are `in` elements.

1342 03/09/2012 06:44 PM Aaron Marcuse-Kubitza

xpath.py: parse(): Run xpath_func.process() on the parsed XPath

1341 03/09/2012 06:43 PM Aaron Marcuse-Kubitza

Added xpath_func.py for XPath "function" elements that transform their subpaths

1340 03/09/2012 06:23 PM Aaron Marcuse-Kubitza

VegBIEN mappings: Removed no longer needed taxondetermination.determinationtype values, because they can be determined from the new role closed list

1339 03/09/2012 06:19 PM Aaron Marcuse-Kubitza

filter_ERD.csv: Removed no longer needed references to role

1338 03/09/2012 06:18 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

1337 03/09/2012 06:17 PM Aaron Marcuse-Kubitza

VegBIEN: Changed role table to a closed list

1336 03/09/2012 06:14 PM Aaron Marcuse-Kubitza

PostgreSQL-MySQL.csv: custom types: Consider everything except a set of accepted types to be a custom type

1335 03/09/2012 05:40 PM Aaron Marcuse-Kubitza

VegBIEN: taxonrank enum: Made values lowercase to match case convention in other enums

1334 03/09/2012 05:33 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

1333 03/09/2012 05:32 PM Aaron Marcuse-Kubitza

vegbien.sql: Renamed plantconceptscope to plantnamescope because it's now attached to plantname

1332 03/09/2012 05:26 PM Aaron Marcuse-Kubitza

vegbien.sql: Moved parent_id from plantconcept to plantname, since plantnames themselves are unique according to their parent taxons (a species under one genus is not the same as a species under another genus)

1331 03/09/2012 05:03 PM Aaron Marcuse-Kubitza

Regenerated vegbien.ERD exports

1330 03/09/2012 04:59 PM Aaron Marcuse-Kubitza

vegbien.ERD.mwb: Fixed lines

1329 03/09/2012 04:57 PM Aaron Marcuse-Kubitza

vegbien.sql: Moved scope_id from plantconcept to plantname, since plantnames themselves are scoped, not just the plantconcepts that use them (e.g. "sp. 1" has different meanings in different scopes, so it should not be shared between scopes). plantname: Added accessioncode.

1328 03/09/2012 04:38 PM Aaron Marcuse-Kubitza

vegbien.sql: Moved plantconcept parent_id from plantstatus to plantconcept. plantconcept: Removed datasource-specific fields to make it globally unique (one plantconcept for each assigned parent taxon of a plantname, of which there will usually be just one)

1327 03/09/2012 04:22 PM Aaron Marcuse-Kubitza

vegbien.sql: plantname: Removed datasource-specific fields to make this a globally-unique table (the datasource-specific fields belong in plantconcept)

1326 03/09/2012 04:16 PM Aaron Marcuse-Kubitza

Added inputs/UArizona/verify

1325 03/09/2012 04:15 PM Aaron Marcuse-Kubitza

mappings/verify.specimens.sql: Updated for schema changes

1324 03/09/2012 04:06 PM Aaron Marcuse-Kubitza

vegbien.sql: placerank enum: Added "village"

1323 03/09/2012 04:00 PM Aaron Marcuse-Kubitza

VegBIEN mappings: lat/long locationdetermination: Removed [!namedplace_id] key so that it's merged into the namedplace locationdetermination

1322 03/09/2012 03:54 PM Aaron Marcuse-Kubitza

VegBIEN mappings: Changed namedplace mappings to use new nested format for storing place containment relationships

1321 03/09/2012 03:44 PM Aaron Marcuse-Kubitza

xml_func.py: Added _simplifyPath

1320 03/09/2012 03:25 PM Aaron Marcuse-Kubitza

xpath.py: Added get_1()

1319 03/09/2012 02:50 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Removed parent_id from unique constraint because some data might be missing intervening links (e.g. state for a county, country), but the place (e.g. county) should still be attached to the existing place of the same name and rank (which will hopefully already have the correct parent_id link)

1318 03/09/2012 02:46 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Made rank required

1317 03/09/2012 02:33 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Removed no longer needed placesystem, which has been replaced by rank closed list

1316 03/09/2012 02:30 PM Aaron Marcuse-Kubitza

VegBIEN mappings: Map namedplaces using new rank field

1315 03/09/2012 02:25 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Added rank. Do duplicate elimination using rank and parent_id instead of placesystem

1314 03/09/2012 02:20 PM Aaron Marcuse-Kubitza

vegbien.sql: placerank: Standardized names to DwC/GML

1313 03/09/2012 01:06 PM Aaron Marcuse-Kubitza

vegbien.sql: Added placerank enum

1312 03/09/2012 12:35 PM Aaron Marcuse-Kubitza

vegbien.sql: namedplace: Removed VegBank internal fields and datasource scoping fields (namedplaces are globally unique). Added parent_id to point to containing namedplace.

1311 03/09/2012 12:21 PM Aaron Marcuse-Kubitza

xml_func.py: Added _dateRangePart with partial implementation (only works on strings with no range)

1310 03/09/2012 12:20 PM Aaron Marcuse-Kubitza

DwC mappings: Moved date _date filter outside _alt so it would run only on the string that was actually chosen, and not produce date format errors when a pre-parsed year/month/day is already available

1309 03/08/2012 06:30 PM Aaron Marcuse-Kubitza

xml_func.py: _date: Map date with only empty fields to NULL (occurs when all fields were e.g. 0 and were filtered to NULL by _nullIf)

1308 03/08/2012 06:00 PM Aaron Marcuse-Kubitza

xml_func.py: _date: Removed mapping year/month/day of 0 to NULL because that is now handled on a case-by-case basis in the mappings

1307 03/08/2012 05:58 PM Aaron Marcuse-Kubitza

mappings/DwC1-DwC2.specimens.csv: Map year/month/day of 0 to NULL

1306 03/08/2012 05:13 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Fixed syntax error in growthForm map

1305 03/08/2012 05:11 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/maps/VegX.organisms.csv: Habit: Removed input values from growthForm map that Brad said were invalid

1304 03/08/2012 05:10 PM Aaron Marcuse-Kubitza

xml_func.py: _map: Added option to make map a closed list

1303 03/08/2012 04:56 PM Aaron Marcuse-Kubitza

mappings/DwC2-VegBIEN.specimens.csv: Fixed waterdepth mappings to use _avg

1302 03/06/2012 06:48 PM Aaron Marcuse-Kubitza

mappings/verify.specimens.sql: Use ORDER BY ... NULLS FIRST to match MySQL

1301 03/06/2012 06:42 PM Aaron Marcuse-Kubitza

input.Makefile: verify: Time the verification since it can take a long time

1300 03/06/2012 06:34 PM Aaron Marcuse-Kubitza

specimens verification: Added duplicate catalog numbers test

1299 03/06/2012 06:27 PM Aaron Marcuse-Kubitza

map: On nimoy, use bien2_staging unless otherwise specified

1298 03/06/2012 06:21 PM Aaron Marcuse-Kubitza

specimens verification: Added # counties test

1297 03/06/2012 05:34 PM Aaron Marcuse-Kubitza

specimens verification: Added collection codes and # catalog numbers tests

1296 03/06/2012 05:33 PM Aaron Marcuse-Kubitza

inputs/SALVIAS/maps/VegX.organisms.csv: Mapped custom Habit values not listed in the SALVIAS data dictionary

1295 03/06/2012 05:32 PM Aaron Marcuse-Kubitza

strings.py: Added unicode_reader for later use in handling Unicode characters in map spreadsheets

1294 03/06/2012 03:45 PM Aaron Marcuse-Kubitza

xpath.py: Removed unnecessary copy.deepcopy()'s and instead changed set_value() and set_id() to make copies of any elements they change. This should result in up to a 17% speed increase in the import, because deepcopy() was taking a lot of time. Added documentation to set_value() and set_id() that caller must make a shallow copy of the path to prevent modifications from propagating to other copies of the path. (Previously, a deep copy was needed, but there was no comment specifying this.)
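
The copy-on-change pattern described above can be sketched as follows. The XpathElem fields and the set_value() signature are assumptions for illustration and are not taken from xpath.py; the point is that the caller makes a shallow copy of the path and the setter copies only the one element it modifies.

```python
import copy

class XpathElem:
    """Stand-in path element with assumed fields."""
    def __init__(self, name, value=None):
        self.name = name
        self.value = value

def set_value(path, value):
    """Set the value of the last element of path.

    The caller must pass a shallow copy of the path, so that replacing the
    last element does not propagate to other copies of the path.
    """
    path[-1] = copy.copy(path[-1])  # copy only the element being changed
    path[-1].value = value

original = [XpathElem('plantname'), XpathElem('rank')]
changed = original[:]  # shallow copy: new list, same element objects
set_value(changed, 'species')
```

Unchanged elements stay shared between the two paths, which is where the saving over copy.deepcopy() comes from.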