Task #916
Updated by Aaron Marcuse-Kubitza over 10 years ago
Hi Aaron,
Bob only had time to get part way through the VegBank taxon validation file you sent, but there are some errors to correct. It'll be best for him if you fix these, rescrub the TNRS names as described in #917, and send a new extract before he invests more time, so I'm going to go ahead and create issues for them. Please fix these issues and then send Bob a new file in the format described in issue #915.
Line numbers refer to the csv file you sent him.
Line 617: TNRS gives a synonym for Aronia prunifolia in a different genus, but this is missed here
Likely cause: The BIEN scripts may be using only Tropicos as the taxonomic source, not USDA. Tropicos matches only the genus.
Fix: Use all sources for the next round of scrubbing. Use them in the order: TPL, Tropicos, GCC, USDA.
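As an illustration of the fix above, here is a minimal Python sketch of choosing among matches from several sources: prefer the most specific match, and break ties using the source order. The record fields (`source`, `matched_rank`), the lowercase source keys, the specificity table, and the `pick_best` helper are all hypothetical. This is not how the BIEN scripts or TNRS actually rank matches.

```python
# Source priority from the fix above (earlier = preferred).
SOURCE_ORDER = ["tpl", "tropicos", "gcc", "usda"]

# Hypothetical specificity ranking: lower is more specific.
RANK_SPECIFICITY = {"species": 0, "genus": 1, "family": 2}


def pick_best(matches):
    """Return the most specific match, breaking ties by SOURCE_ORDER."""
    return min(
        matches,
        key=lambda m: (RANK_SPECIFICITY[m["matched_rank"]],
                       SOURCE_ORDER.index(m["source"])),
    )


# Hypothetical records mirroring line 617: Tropicos matches only the genus,
# while USDA matches the full species name.
matches = [
    {"source": "tropicos", "matched_rank": "genus"},
    {"source": "usda", "matched_rank": "species"},
]
best = pick_best(matches)
print(best["source"])
```

With only Tropicos consulted, the genus-level match is all that is available. With all sources consulted, the species-level USDA match is selected in this sketch even though USDA is last in the order.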
Line 897: Diacritical marks on authors' names are often messed up
Likely cause: Character set problem. These names were checked in the online version of TNRS, which renders the diacritics correctly.
Fix: Find where character set problems need to be handled in the BIEN scripts. (TNRS code works so look at what's done there.)
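One common way diacritics get garbled is when UTF-8 bytes are decoded with a single-byte character set such as Latin-1. The Python sketch below reproduces that failure with an author abbreviation containing "é" and shows that it is reversible. Whether this is the actual cause in the BIEN scripts is an assumption, and the variable names are illustrative.

```python
# An author abbreviation with a diacritic.
author = "L'Hér."

# Correct: encode as UTF-8 ("é" becomes the two bytes 0xC3 0xA9).
utf8_bytes = author.encode("utf-8")

# Wrong: decode those bytes as Latin-1, so each byte becomes its own character.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # L'HÃ©r.

# This kind of damage can be undone by reversing the wrong step.
repaired = garbled.encode("latin-1").decode("utf-8")
assert repaired == author
```

If the garbled names in the extract show this pattern ("Ã" followed by another symbol where an accented letter should be), a step in the pipeline is probably reading or writing UTF-8 data with the wrong encoding.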
Line 1049: There are two spellings of Erechtites hieraciifolia and TNRS knows of both of them, so why is one rejected here?
Same issue as Line 617. The two spellings are in USDA and GCC, but not in Tropicos.
Please fix these problems and then send Bob a new file in the format described in issue #915.
---
my e-mail response:
> Fix: Use all sources for the next round of scrubbing. Use them in the order:
> TPL, Tropicos, GCC, USDA.
The next round of TNRS re-scrubbing is currently planned to happen *after* the aggregating validations are complete, so we will not be able to fix this issue for the validations, unless we move up the re-scrubbing. Note that re-scrubbing all the names is expected to take at least a week.
> TPL
This is not currently available as a TNRS source; we will have to wait until Brad adds it if we need that included.
> Line 897: Diacritical marks on authors' names are often messed up
The name on this row, "Asteraceae Boltonia L'Hér.", actually *is* rendered correctly by our TNRS client [1], so this is just a problem with the cached name, which has since been resolved.
[1] ssh vegbiendev.nceas.ucsb.edu <<<"bin/tnrs_client \"Asteraceae Boltonia L'Hér.\""
> Line 1049: There are two spellings of Erechtites hieraciifolia and TNRS
> knows of both of them, so why is one rejected here?
This is a bug in TNRS, which has been reported.