Task #917
Updated by Aaron Marcuse-Kubitza over 7 years ago
From Brad:

I’ve given some thought to the TPL matter. The algorithm isn’t hard, but Aaron will have to do the sorting himself.

1. -Make sure sources are selected in the following order: GCC, TPL, Tropicos, USDA-
2. -When downloading names, do NOT sort by source _(i.e. don't limit results to just the best match when sorted by source)_-
3. -Download all results (not just best matches)-
3a. -fix anomaly where there were multiple @Selected@ names for some input names (to avoid breaking constraints)-
3b. -reimplement the parsed-rank columns for the all-matches strategy, which does not have a single scrubbed name per input name to parse- _see "@taxon_match@ derived columns":http://vegbiendev.nceas.ucsb.edu/VegBIEN/TNRS/taxon_match/:constraints (also http://vegpath.org/VegBIEN/TNRS/taxon_match/:constraints ) ._
3c. -create table and algorithm to store a selected best match for each input name-
4. Apply the usual TNRS sort order (see the "README":https://github.com/iPlantCollaborativeOpenSource/TNRS/blob/master/README_TNRSBestMatchAlgorithm.docx?raw=true [1]) to the matches for a given name. -Aaron, Brad said there are no additional steps to apply here (in step 4). Just proceed to step 5. (--Martha)- _Brad now says that actually the TNRS sort order is incorrect because of the "Constrain by Source bug":https://pods.iplantcollaborative.org/jira/browse/TNRS-188 ._
5. If the best match (indicated by Selected=TRUE) has source=Tropicos and acceptance=accepted AND another match is available where source != Tropicos and acceptance=synonym, use the latter name (we don't need this until after the names are scrubbed (Martha))
6. -In all other cases, use the best match as flagged-

That should filter out most Tropicos nomenclatural synonyms incorrectly labeled accepted. I can unpack #4 for Aaron when the time arrives.

Brad

fn1. Note that @edit_distance = (1 - specific_epithet_score)*greatest_length@
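
A minimal sketch of steps 5 and 6, plus the footnote formula, might look like the following. It is illustrative only and not the actual VegBIEN implementation: the @Match@ class, its field names, and the @choose_match@ function are hypothetical, acceptance values are assumed to compare case-insensitively, and when several non-Tropicos synonyms exist the first one in input order is taken (the note above does not say how to break that tie).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    """One TNRS match for an input name (hypothetical field names)."""
    name: str
    source: str       # e.g. "GCC", "TPL", "Tropicos", "USDA"
    acceptance: str   # e.g. "accepted", "synonym"
    selected: bool    # TNRS's Selected flag for its own best match


def choose_match(matches: List[Match]) -> Optional[Match]:
    """Pick the match to use for one input name, per steps 5 and 6."""
    # The best match as flagged by TNRS (Selected=TRUE).
    best = next((m for m in matches if m.selected), None)
    if best is None:
        return None

    # Step 5: a Tropicos "accepted" best match is overridden by a
    # non-Tropicos synonym, if one is available.
    if best.source == "Tropicos" and best.acceptance.lower() == "accepted":
        for m in matches:
            if m.source != "Tropicos" and m.acceptance.lower() == "synonym":
                # Assumption: first such match in input order wins.
                return m

    # Step 6: in all other cases, use the best match as flagged.
    return best


def edit_distance(specific_epithet_score: float, greatest_length: int) -> float:
    """Footnote 1: edit_distance = (1 - specific_epithet_score) * greatest_length."""
    return (1 - specific_epithet_score) * greatest_length
```

Calling @choose_match@ once per input name, on the full list of downloaded matches for that name, would give the selected best match that step 3c stores.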