2013-11-07 conference call
Martha's notes
Upcoming
- call next week at the usual time (Thursday 9am PT & Tucson / 12pm ET)
- will discuss VegBank validation
Availability
- Paul is back on his previous project
- see the *Google spreadsheet* (and please add your availability for future weeks once it's known)
Decisions made
VegBank validation
- remove CVS plots, rather than adding them back for the validation
- the unique aspects of CVS plots are in other sample plots, too: the datasets for the Rocky Mountains (Colorado) and Smoky Mountains National Park include stem data
  - but these datasets were in the very first validation round; the subset has since changed and may no longer include them
taxonomic names in extracts
- except for taxonMorphospecies, the species binomial should always be the scrubbed name, not the verbatim name, so that scientists can be sure it represents an actual species. It can, however, be a matched name rather than an accepted name, because both are considered scrubbed.
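The rule above can be sketched as a small selection function. This is a minimal illustration, assuming each taxon record is a Python dict keyed by Brad's column names; the notes only say that either scrubbed name is acceptable, so preferring the accepted name over the matched name is an assumption, and `species_binomial` is a hypothetical helper.

```python
from typing import Optional


def species_binomial(record: dict) -> Optional[str]:
    """Return the scrubbed name to use as the species binomial.

    Assumption: prefer the accepted name and fall back to the matched name.
    The verbatim name is never used, even when no scrubbed name exists.
    """
    for field in ("scientificNameAccepted", "scientificNameMatched"):
        name = record.get(field)
        if name:  # skip None and empty strings
            return name
    # No scrubbed name: deliberately do NOT fall back to scientificNameVerbatim.
    return None
```

Returning `None` instead of the verbatim name keeps unscrubbed strings out of the binomial column, which is the point of the rule.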
- Brad wants us to include the following fields:
  - familyVerbatim - family as submitted by the data provider; may be null for some datasets
  - scientificNameVerbatim - scientificName as submitted by the data provider
  - familyMatched - family matched by the TNRS, if any
  - scientificNameMatched - lowest-level scientific name matched by the TNRS, without the author
  - scientificNameAuthorshipMatched - author of the lowest-level scientific name matched by the TNRS
  - familyAccepted - accepted family provided by the TNRS
  - scientificNameAccepted - lowest-level accepted scientific name provided by the TNRS, without the author
  - scientificNameAuthorshipAccepted - author of the lowest-level accepted scientific name provided by the TNRS
  - annotations - annotation terms such as "cf." and "aff.", as extracted by the TNRS
  - unmatchedTerms - trailing strings, if any, not recognized by the TNRS (as returned by the TNRS in column unmatched_terms)
  - morphospecies - the resolved scientific name (minus the author) concatenated with the unmatched terms, formed according to the algorithm I sent you previously. I think it's helpful to list both unmatchedTerms and morphospecies; doing so clarifies how morphospecies is formed. As morphospecies are not relevant to specimen validation extracts, tell whoever is doing the validation to ignore the morphospecies column.
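The morphospecies description above can be illustrated as a plain concatenation. This sketch is not the algorithm Brad sent previously (which is not reproduced in these notes); the single-space separator, the whitespace trimming, and the handling of missing values are all assumptions, and `make_morphospecies` is a hypothetical helper name.

```python
from typing import Optional


def make_morphospecies(resolved_name: Optional[str],
                       unmatched_terms: Optional[str]) -> Optional[str]:
    """Join the resolved name (author already removed) with the TNRS
    unmatched terms. Separator and null handling are assumptions."""
    # Drop missing or blank parts, trimming surrounding whitespace.
    parts = [p.strip() for p in (resolved_name, unmatched_terms) if p and p.strip()]
    return " ".join(parts) if parts else None
```

For example, a resolved name "Carex" with unmatched terms "sp. 3" yields "Carex sp. 3", which shows why listing unmatchedTerms next to morphospecies makes the derivation easy to follow.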
Regarding scientificNameMatched (lowest-level scientific name matched by the TNRS, without the author):
We actually call this taxonName instead of scientificName, to indicate that it doesn't include the author; Darwin Core (DwC) defines scientificName as including the author. (Or perhaps we should come up with a different term for "taxonomic name with author" to avoid the ambiguity?)
In that case, substitute "taxon" for "scientificName" in all column names except "scientificNameVerbatim" (which can contain the authors, depending on the data source).
The column headings would be:
- familyVerbatim
- scientificNameVerbatim
- familyMatched
- taxonMatched
- taxonAuthorshipMatched
- familyAccepted
- taxonAccepted
- taxonAuthorshipAccepted
- annotations
- unmatchedTerms
- morphospecies
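The substitution rule can be checked mechanically. A minimal Python sketch, assuming the columns are handled as a plain list of names: it replaces the `scientificName` prefix with `taxon` in every column except `scientificNameVerbatim`, which reproduces the column headings listed above. `rename_column` is a hypothetical helper.

```python
# Brad's original column names, in the order given in these notes.
ORIGINAL_COLUMNS = [
    "familyVerbatim",
    "scientificNameVerbatim",
    "familyMatched",
    "scientificNameMatched",
    "scientificNameAuthorshipMatched",
    "familyAccepted",
    "scientificNameAccepted",
    "scientificNameAuthorshipAccepted",
    "annotations",
    "unmatchedTerms",
    "morphospecies",
]

PREFIX = "scientificName"


def rename_column(name: str) -> str:
    """Substitute "taxon" for "scientificName", except in scientificNameVerbatim,
    which may still contain authors depending on the data source."""
    if name == "scientificNameVerbatim" or not name.startswith(PREFIX):
        return name
    return "taxon" + name[len(PREFIX):]


RENAMED_COLUMNS = [rename_column(c) for c in ORIGINAL_COLUMNS]
```

Keeping the exception explicit (rather than, say, renaming every column that contains "Name") makes it obvious that only the verbatim column keeps the DwC-style name.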
geoscrubbing
- if the geoscrubbing scripts need to be re-run, test them on a sample of just 1000 rows
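A minimal sketch of drawing a reproducible 1000-row test sample, assuming the rows can be loaded into a Python list before being fed to the scripts. The geoscrubbing scripts themselves are not described in these notes; `sample_rows` is a hypothetical helper and the default seed is arbitrary.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_rows(rows: Sequence[T], n: int = 1000, seed: int = 0) -> List[T]:
    """Return up to n rows drawn without replacement; the same seed gives
    the same sample, so test runs can be repeated and compared."""
    if len(rows) <= n:
        return list(rows)
    rng = random.Random(seed)  # private generator; does not touch global state
    return rng.sample(list(rows), n)
```

A random sample rather than the first 1000 rows is a design choice: extracts are often sorted by data source, so the first rows may all come from one dataset.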
To do for Bob and Mike Lee
- review the current VegBank extract
  - ignore the CVS rows (rows 239-541)
- check that the new VegBank extract has the CVS data properly removed
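The removal check could be scripted along these lines. A sketch only: it assumes each extract row is a dict with a data-source field, and both the field name `datasource` and the value `"CVS"` are hypothetical placeholders for however the extract actually labels CVS plots.

```python
from typing import Iterable, List, Mapping


def find_cvs_rows(rows: Iterable[Mapping[str, str]],
                  source_field: str = "datasource",  # hypothetical column name
                  cvs_value: str = "CVS") -> List[int]:
    """Return the 0-based indexes of rows still attributed to CVS."""
    return [i for i, row in enumerate(rows)
            if row.get(source_field) == cvs_value]
```

An empty result means no CVS rows remain in the new extract.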
To do for Aaron
VegBank validation
- completely remove CVS plots
- send new extract
taxonomic names in extracts
- ensure all of Brad's taxonomic columns are in the validation view
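One way to confirm this is a set difference between the expected columns and the view's actual columns. A minimal sketch, assuming the view's column names have already been fetched into a list (for example from the database catalog); the expected list uses the `taxon` headings proposed above, and `missing_columns` is a hypothetical helper.

```python
from typing import Iterable, List

# Column headings as proposed in these notes (taxon* naming).
EXPECTED_COLUMNS = [
    "familyVerbatim", "scientificNameVerbatim",
    "familyMatched", "taxonMatched", "taxonAuthorshipMatched",
    "familyAccepted", "taxonAccepted", "taxonAuthorshipAccepted",
    "annotations", "unmatchedTerms", "morphospecies",
]


def missing_columns(view_columns: Iterable[str],
                    expected: Iterable[str] = EXPECTED_COLUMNS) -> List[str]:
    """Return expected columns absent from the view, in the expected order."""
    present = set(view_columns)
    return [c for c in expected if c not in present]
```

An empty list means every one of Brad's taxonomic columns is present in the validation view.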