Project

General

Profile

Result filtering

TNRS results

is_homonym

Author_matched IS NOT NULL: never a homonym, because the author disambiguates the name
otherwise: Family_matched is in the family homonyms OR Genus_matched is in the genus homonyms
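
The rule above can be sketched in Python as follows. This is a minimal sketch, not the project's implementation: the function name, the dict-based row, and the two homonym sets passed in are hypothetical; only the column names come from these notes.

```python
def is_homonym(row, family_homonyms, genus_homonyms):
    """Return True if the matched name counts as a homonym.

    row: dict keyed by TNRS column name (hypothetical representation).
    family_homonyms, genus_homonyms: sets of homonymous names
    (hypothetical inputs; how they are built is not covered here).
    """
    # A matched author disambiguates the name, so it is never a homonym.
    if row.get("Author_matched") is not None:
        return False
    # Otherwise a homonymous family or genus makes the name a homonym.
    return (row.get("Family_matched") in family_homonyms
            or row.get("Genus_matched") in genus_homonyms)
```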

taxon_is_plant

note that this is different from tnrs.is_valid_match: specimen data (which uses this field) has more stringent requirements for a valid match than plot data, which needs a separate validity field that ignores homonyms

from Brad, with modifications:

Use this algorithm to populate column 'isPlant' for each TNRS result:

Family_matched IS NOT NULL AND NOT is_homonym --> 'plant'
otherwise:
        Genus_score=1
                NOT is_homonym -->  'plant'
                is_homonym:
                        Infraspecific_epithet_matched IS NOT NULL OR Infraspecific_epithet_2_matched
                          IS NOT NULL OR Specific_epithet_score>=0.9 -->  'plant'
                        Infraspecific_epithet_matched IS NULL AND Infraspecific_epithet_2_matched IS
                          NULL AND Specific_epithet_score <0.9 -->  'ambiguous'
        Genus_score<1 AND Genus_score>=0.85
                Infraspecific_epithet_matched IS NOT NULL OR Infraspecific_epithet_2_matched
                  IS NOT NULL OR Specific_epithet_score>=0.9 -->  'plant'
                Infraspecific_epithet_matched IS NULL AND Infraspecific_epithet_2_matched IS
                  NULL AND Specific_epithet_score <0.9 -->  'ambiguous'
        Genus_score<0.85 -->  'ambiguous'
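
The decision tree above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the function name and the dict-based row are hypothetical, the middle branch is assumed to cover 0.85 <= Genus_score < 1, and missing (NULL) scores are assumed to count as 0, which the notes do not specify.

```python
def taxon_is_plant(row, is_homonym):
    """Classify one TNRS result as 'plant' or 'ambiguous'.

    row: dict keyed by TNRS column name (hypothetical representation).
    is_homonym: bool, the is_homonym value already computed for this row.
    """
    def has_epithet_evidence():
        # Assumption: a missing Specific_epithet_score counts as 0.
        score = row.get("Specific_epithet_score") or 0.0
        return (row.get("Infraspecific_epithet_matched") is not None
                or row.get("Infraspecific_epithet_2_matched") is not None
                or score >= 0.9)

    if row.get("Family_matched") is not None and not is_homonym:
        return "plant"

    # Assumption: a missing Genus_score counts as 0.
    genus_score = row.get("Genus_score") or 0.0
    if genus_score >= 1:  # exact genus match (scores top out at 1)
        if not is_homonym:
            return "plant"
        return "plant" if has_epithet_evidence() else "ambiguous"
    if genus_score >= 0.85:  # fuzzy genus match, assumed range [0.85, 1)
        return "plant" if has_epithet_evidence() else "ambiguous"
    return "ambiguous"  # Genus_score < 0.85
```

For example, a fuzzy genus match (Genus_score 0.9) with no family match and no epithet evidence comes out as 'ambiguous', while the same row with a matched infraspecific epithet comes out as 'plant'.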

observation_is_plant

this additional filter needs to add names back, because by default the join to TNRS includes only valid matches, which are required to be unambiguous

from Brad:

dataSourceType='plot' --> Keep
        Name_matched IS NOT NULL --> 'plant'
        Name_matched IS NULL --> 'ambiguous'
dataSourceType='specimen' --> use isPlant from TNRS
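
The rule above can be sketched in Python as follows. This is a minimal sketch, not the project's implementation: the function name and parameters are hypothetical, and raising an error for any other dataSourceType is an assumption, since the notes cover only 'plot' and 'specimen'.

```python
def observation_is_plant(data_source_type, name_matched, tnrs_is_plant):
    """Return 'plant' or 'ambiguous' for one observation.

    tnrs_is_plant: the isPlant value from the TNRS result, used
    unchanged for specimens.
    """
    if data_source_type == "plot":
        # Plots only require that some name was matched.
        return "plant" if name_matched is not None else "ambiguous"
    if data_source_type == "specimen":
        return tnrs_is_plant
    # Assumption: other source types are not covered by the notes.
    raise ValueError(f"unhandled dataSourceType: {data_source_type!r}")
```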

observations to include in range modeling

range_model_include

this would actually be a filter in the range_modeling view

from Brad:

Use this algorithm to filter each observation:

dataSourceType='plot' --> Keep
dataSourceType='specimen'
        Combined_species IS NULL --> Discard
        Combined_species IS NOT NULL
                isPlant='plant' --> Keep
                isPlant='ambiguous' --> Discard
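
The filter above can be sketched in Python as a predicate. This is a minimal sketch, not the project's implementation: the function name and parameters are hypothetical, and raising an error for any other dataSourceType is an assumption, since the notes cover only 'plot' and 'specimen'.

```python
def range_model_include(data_source_type, combined_species, is_plant):
    """Return True to keep an observation for range modeling, False to discard."""
    if data_source_type == "plot":
        return True  # plots are always kept
    if data_source_type == "specimen":
        # Specimens need a species name and an unambiguous plant match.
        return combined_species is not None and is_plant == "plant"
    # Assumption: other source types are not covered by the notes.
    raise ValueError(f"unhandled dataSourceType: {data_source_type!r}")
```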