Task #916

Updated by Aaron Marcuse-Kubitza almost 10 years ago

Hi Aaron, 

 Bob only had time to get part way through the VegBank taxon validation file you sent, but there are some errors to correct. It'll be best for him if you fix these, rescrub the TNRS names as described in #917, and send a new extract before he invests more time, so I'm going to go ahead and create issues for them. Please fix these issues and then send Bob a new file in the format described in issue #915. 

 Line numbers refer to the csv file you sent him. 

 Line 617: TNRS gives a synonym for Aronia prunifolia in a different genus, but this is missed here 

 Likely cause: The BIEN scripts may be using only Tropicos as the taxonomic source, not USDA. Tropicos matches only the genus.  

 Fix: Use all sources for the next round of scrubbing. Use them in the order: TPL, Tropicos, GCC, USDA. 
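 A minimal sketch of why the source list matters, assuming each source reports at most one candidate per submitted name along with the rank it matched at. The function name @pick_match@, the dictionary layout, and the placeholder synonym are hypothetical, and this is not how TNRS itself ranks results; only the source order comes from the fix above.

```python
from typing import Optional

# Source order taken from the fix above.
SOURCE_ORDER = ["TPL", "Tropicos", "GCC", "USDA"]


def pick_match(candidates: dict, order=SOURCE_ORDER) -> Optional[tuple]:
    """Pick a match from per-source candidates (hypothetical layout).

    candidates maps a source name to {"rank": ..., "name": ...}.
    The first source in priority order with a species-level match wins;
    if none has one, fall back to the first source with any match.
    """
    fallback = None
    for source in order:
        match = candidates.get(source)
        if match is None:
            continue
        if match["rank"] == "species":
            return source, match
        if fallback is None:
            fallback = (source, match)
    return fallback


# Illustrative data only: Tropicos matches just the genus, while USDA
# has a species-level synonym (placeholder text, not the real name).
candidates = {
    "Tropicos": {"rank": "genus", "name": "Aronia"},
    "USDA": {"rank": "species", "name": "<synonym in another genus>"},
}
print(pick_match(candidates))
```

 With Tropicos as the only source, the same call can return nothing better than the genus-level match, which is the behavior reported for line 617.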


 Line 897: Diacritical marks in authors' names are often garbled 

 Likely cause: Character set problem. These names were checked with the online version of TNRS, which renders the diacritics correctly, so the problem is on the BIEN side. 

 Fix: Find where character set problems need to be handled in the BIEN scripts. (TNRS code works so look at what's done there.) 
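 One common form of character set problem is UTF-8 text being decoded as Latin-1 somewhere in a pipeline. The sketch below only illustrates that failure mode and a cautious repair; it assumes this particular mix-up, which has not been confirmed for the BIEN scripts, and the helper name @repair_mojibake@ is hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up, if that is what happened.

    Text that does not fit the pattern is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


name = "Asteraceae Boltonia L'Hér."

# Simulate the suspected bug: UTF-8 bytes decoded with the wrong codec.
garbled = name.encode("utf-8").decode("latin-1")

print(garbled)                   # the "é" shows up as two characters
print(repair_mojibake(garbled))  # round-trips back to the original name
```

 Correctly encoded names pass through untouched, because their Latin-1 bytes are not valid UTF-8 and the helper falls back to the input.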


 Line 1049: There are two spellings of Erechtites hieraciifolia and TNRS knows of both of them, so why is one rejected here? 
 Same issue as Line 617. The two spellings are in USDA and GCC, but not in Tropicos.  


 Please fix these problems and then send Bob a new file in the format described in issue #915. 

 --- 

 my e-mail response from 2014-5-12: 

 > Fix: Use all sources for the next round of scrubbing. Use them in the order: 
 > TPL, Tropicos, GCC, USDA. 

 -The next round of TNRS re-scrubbing is currently planned to happen *after* the aggregating validations are complete, so we will not be able to fix this issue for the validations, unless we move up the re-scrubbing. Note that re-scrubbing all the names is expected to take at least a week.- _rescrubbing done_ 

 > TPL 

 -This is not currently available as a TNRS source; we will have to wait until Brad adds it if we need that included.- _now added_ 



 > Line 897: Diacritical marks in authors' names are often garbled 

 The name on this row, "Asteraceae Boltonia L'Hér.", actually *is* rendered correctly by our TNRS client [1], so this is just a problem with the cached name, which has since been resolved. 

 [1] ssh vegbiendev.nceas.ucsb.edu <<<"bin/tnrs_client \"Asteraceae Boltonia L'Hér.\"" 
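 The name in [1] contains both spaces and an apostrophe, so it has to be quoted with care before it reaches the remote shell. A minimal sketch of building the same command line with Python's @shlex.quote@; the helper name @tnrs_command@ is hypothetical, and the sketch only builds the string rather than running ssh.

```python
import shlex


def tnrs_command(name: str, client: str = "bin/tnrs_client") -> str:
    """Return a shell command line that passes name as a single argument."""
    return f"{client} {shlex.quote(name)}"


cmd = tnrs_command("Asteraceae Boltonia L'Hér.")
print(cmd)

# Splitting the command the way a POSIX shell would gives back exactly
# two words: the client path and the unmodified name.
print(shlex.split(cmd))
```

 The resulting string could be fed to the ssh session's standard input in the same way the here-string in [1] does.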



 > Line 1049: There are two spellings of Erechtites hieraciifolia and TNRS 
 > knows of both of them, so why is one rejected here? 

 -This is a bug in TNRS, which has been reported.- _we are now using a workaround instead_
