Task #917
Updated by Aaron Marcuse-Kubitza over 10 years ago
From Brad:
I’ve given some thought to the TPL matter. The algorithm isn’t hard, but Aaron will have to do the sorting himself.
1. -Make sure sources are selected in the following order: GCC, TPL, Tropicos, USDA-
2. -When downloading names, do NOT sort by source _(i.e. don't limit results to just the best match when sorted by source)_-
3. -Download all results (not just best matches)-
4. -Apply the usual TNRS sort order to the matches for a given name.- _obtained by sorting by @Overall_score@. note that this is *not* the same as the order the matches are returned in, because Constrain by Source is broken and can't be turned off._
Aaron, Brad said there are no additional steps to apply here (in step 4). Just proceed to step 5. (-Martha)
5. If the best match (indicated by Selected=TRUE) has source=Tropicos and acceptance=accepted AND another match is available where source is not Tropicos and acceptance=synonym, use the latter name (we don't need this until after the names are scrubbed (Martha))
_TNRS appears to do this sorting automatically when @Constrain by Source@ is on. however, for some names, using @Selected=true@ produces an incorrect result, because @Selected@ is derived from the @Overall_score@, which does not take into account that there might be a better match at a higher rank (and thus there should be a higher match score)._
_*we should probably develop our own formula for determining the best match.* we don't know what formula TNRS uses to mark matches as @Selected@, and therefore can't verify that TNRS's best-match formula is correct._
6. In all other cases, use the best match as flagged
That should filter out most Tropicos nomenclatural synonyms incorrectly labeled accepted. I can unpack #4 for Aaron when the time arrives.
Brad
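The selection rule in steps 4 to 6 can be sketched roughly as follows. This is only an illustration, not the code actually used: the @Match@ class, its field names, and the functions @sort_matches@ and @choose_match@ are hypothetical. It also assumes each match carries a single source value, that comparisons are case-insensitive, and that when several non-Tropicos synonyms are available the one with the highest @Overall_score@ is preferred (the steps above do not say which to take).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    """One TNRS match for a submitted name (hypothetical field names)."""
    name: str
    source: str           # assumed single value, e.g. "GCC", "TPL", "Tropicos", "USDA"
    acceptance: str       # e.g. "accepted" or "synonym"
    overall_score: float  # TNRS Overall_score
    selected: bool        # TNRS Selected flag


def sort_matches(matches: List[Match]) -> List[Match]:
    """Step 4: order matches by Overall_score, highest first."""
    return sorted(matches, key=lambda m: m.overall_score, reverse=True)


def choose_match(matches: List[Match]) -> Optional[Match]:
    """Steps 5 and 6: pick the name to use for one submitted name."""
    # The best match is the one TNRS flagged as Selected.
    best = next((m for m in matches if m.selected), None)
    if best is None:
        return None

    # Step 5: a Tropicos "accepted" best match is overridden by a
    # non-Tropicos synonym, if one exists.
    if best.source.lower() == "tropicos" and best.acceptance.lower() == "accepted":
        for m in sort_matches(matches):  # assumption: prefer the highest score
            if m.source.lower() != "tropicos" and m.acceptance.lower() == "synonym":
                return m

    # Step 6: in all other cases, use the best match as flagged.
    return best
```

Because the sketch relies on the @Selected@ flag to find the best match, it inherits the caveat noted under step 5 about @Selected@ sometimes being wrong.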