2013-06-06 conference call

To-dos from Martha

From Martha, on the iPlant wiki:

Mark

Regarding the item "postprocess TNRS results to exclude animals with genus homonyms":
Contact Tony at CSIRO about giving Aaron direct access to the IRMNG database's animal/plant genus homonyms, so that he doesn't have to resort to parsing web pages.

Martha

On Monday, advise Aaron on how to proceed with the animal/plant genus homonyms.

Aaron

1) Fix the bug in the GBIF filtering script; we expect the fix to reduce the number of plant records to a believable number.

2) Regarding postprocessing TNRS results to exclude animals with genus homonyms:

Wait until Monday to see whether Tony provides you with direct access to the IRMNG database's animal/plant genus homonyms.

3) Fix higherPlantGroup to match on the genus when there is no family match:

create a genus->higherPlantGroup lookup table
the lookup table must exclude internal plant homonyms (as distinct from animal/plant homonyms)
get these from TNRS's Tropicos DB

4) Add a COALESCE of the TNRS accepted and matched names to analytical_stem_view

5) FIA filtering
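Item 3 above (genus fallback for higherPlantGroup) can be sketched in Python as follows. This is a minimal sketch, not the project's code: it assumes the Tropicos data can be exported as (genus, family) pairs and that a family-to-higherPlantGroup table already exists. The function names, the group labels, and the rule that a genus listed under more than one family counts as an internal plant homonym are all illustrative assumptions.

```python
from collections import defaultdict


def build_genus_lookup(family_to_group, genus_family_rows):
    """Build a hypothetical genus -> higherPlantGroup lookup table.

    genus_family_rows: iterable of (genus, family) pairs, e.g. exported from
    TNRS's Tropicos DB (the export format is an assumption).
    A genus that appears under more than one family is treated as an
    internal plant homonym and left out, since its group is ambiguous.
    """
    families_by_genus = defaultdict(set)
    for genus, family in genus_family_rows:
        families_by_genus[genus].add(family)

    lookup = {}
    for genus, families in families_by_genus.items():
        if len(families) != 1:
            continue  # internal plant homonym: exclude it
        (family,) = families
        group = family_to_group.get(family)
        if group is not None:
            lookup[genus] = group
    return lookup


def higher_plant_group(family, genus, family_to_group, genus_to_group):
    """Prefer the family match; fall back to the genus lookup table."""
    if family is not None and family in family_to_group:
        return family_to_group[family]
    return genus_to_group.get(genus)
```

Excluding ambiguous genera outright means those records simply keep a NULL higherPlantGroup instead of risking a wrong assignment.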

Mark and Jim

Help Aaron determine at which points to provide concrete results.

Aaron and Brad

Work together to define the column name changes for the data dictionary (resulting from the 'coalesced TNRS name' item).

To do for Mark

To do for Aaron

GBIF subsetting: fix plant_fraction SQL bug

FIXED

  • COUNT(boolean) counts non-NULL values rather than true values
    • and BOOLEAN is actually an integer datatype in MySQL (a synonym for TINYINT(1)), so MySQL cannot tell that you meant a boolean
  • you need to wrap the expression in inputs/GBIF/raw_occurrence_record/run > COUNT(family LIKE ...) in NULLIF(..., false), which turns false into NULL so that only true values are counted
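The COUNT/NULLIF behaviour described above can be reproduced with Python's built-in sqlite3 module. SQLite stands in for MySQL here: both return a comparison result as the integer 1 or 0, and COUNT() counts every non-NULL value. The table, the rows, and the '%aceae' pattern are hypothetical stand-ins for the real script's elided expression, and 0 is used in place of `false` for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurrence (family TEXT)")
conn.executemany(
    "INSERT INTO occurrence (family) VALUES (?)",
    [("Rosaceae",), ("Pinaceae",), ("Felidae",), (None,)],
)

# Buggy: the LIKE yields 0 for Felidae, which is non-NULL and therefore
# counted; only the NULL family is skipped.
buggy = conn.execute(
    "SELECT COUNT(family LIKE '%aceae') FROM occurrence"
).fetchone()[0]

# Fixed: NULLIF(..., 0) turns false (0) into NULL, so only true rows count.
fixed = conn.execute(
    "SELECT COUNT(NULLIF(family LIKE '%aceae', 0)) FROM occurrence"
).fetchone()[0]

print(buggy, fixed)  # 3 2
conn.close()
```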

GBIF subsetting: fix raw_occurrence_record filter formula

FIXED

  • for records from the herbaria_filter institution_codes, NULL families are OK but non-plant families are not
  • the WHERE clause needs to recheck each record's family to ensure that it is a plant family or an ambiguous one
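A minimal Python sketch of the corrected filter logic, written as a per-record predicate. It assumes that records outside the herbaria institution codes pass only when they have a plant or ambiguous family. The function name, the example institution codes, and the family sets are hypothetical.

```python
def keep_record(institution_code, family, herbaria_codes,
                plant_families, ambiguous_families):
    """Hypothetical predicate mirroring the raw_occurrence_record WHERE clause."""
    if family is None:
        # A NULL family is acceptable only for herbaria institution codes.
        return institution_code in herbaria_codes
    # Every non-NULL family is rechecked, even for herbaria records, so a
    # herbarium record with an animal family is still excluded.
    return family in plant_families or family in ambiguous_families
```

Rechecking the family for herbaria records is what closes the gap: institution code alone no longer lets non-plant families through.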

animal/plant genus and family homonyms

Waiting to see if Tony will provide the species and family homonyms in the same delimited format as the genera (he said he would work on it this Monday). If he doesn't change the format, we'll still need a screenscraper for the other homonym ranks.

  • note: there are also family homonyms between plants and the kingdoms Fungi, Bacteria, and Protista (search for Plantae on the IRMNG page)
    • GBIF may contain data from these kingdoms, especially fungi (e.g. mushrooms growing on a tree), so we do need to deal with family homonyms in general, even though there aren't animal/plant family homonyms.
  • if we haven't heard from Tony by next Monday, implement screenscraping of their homonyms web interface
  • automate the download of each letter's page
  • use regular expressions to extract homonyms
  • ensure that the genera of all species-level homonyms are in the genus-level homonyms list
  • when matching the species binomial against homonyms, use just the species-level homonyms rather than the genus-level homonyms, to include more unambiguous taxa
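The last two steps above (checking genus coverage and matching binomials against species-level homonyms only) can be sketched in Python. This assumes the homonym lists have already been extracted as plain strings; the function names and the fictional taxon names are illustrative, and the downloading and regex extraction steps are not shown because the page format is not specified here.

```python
def missing_genera(species_homonyms, genus_homonyms):
    """Genera of species-level homonyms that are absent from the genus-level list."""
    return {binomial.split()[0] for binomial in species_homonyms} - set(genus_homonyms)


def is_homonym(name, species_homonyms, genus_homonyms):
    """Decide whether a scientific name is an animal/plant homonym.

    For a binomial, only the species-level list is consulted, so a species
    in a homonymous genus still counts as unambiguous unless the binomial
    itself is listed. A bare genus name falls back to the genus-level list.
    """
    parts = name.split()
    if not parts:
        return False
    if len(parts) >= 2:
        return " ".join(parts[:2]) in species_homonyms
    return parts[0] in genus_homonyms
```

A non-empty result from missing_genera would signal that the scraped genus list is incomplete.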

higherPlantGroup population

analytical_stem_view: add disambiguating prefix for TNRS accepted name terms

  • family -> acceptedFamily, etc.

analytical_stem_view: add combination of TNRS accepted and matched name

  • instead of combined_*, name this scrubbed_*, because users need to know that this is the final output name from TNRS, and because it is not the combination of the scrubbed and verbatim names
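A minimal sketch of the scrubbed_* column, using Python's sqlite3 to stand in for the real database. The source table, its columns, the view name, and the sample rows are hypothetical; only the accepted* prefix and the COALESCE of accepted and matched names come from the notes above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding TNRS output per stem.
conn.execute(
    "CREATE TABLE tnrs_result "
    "(stem_id INTEGER, matchedFamily TEXT, acceptedFamily TEXT)"
)
conn.executemany(
    "INSERT INTO tnrs_result VALUES (?, ?, ?)",
    [
        (1, "Leguminosae", "Fabaceae"),  # illustrative: accepted name differs
        (2, "Rosaceae", None),           # illustrative: no accepted name
    ],
)
# scrubbedFamily = accepted name when present, otherwise the matched name.
conn.execute("""
    CREATE VIEW analytical_stem_view_sketch AS
    SELECT stem_id,
           matchedFamily,
           acceptedFamily,
           COALESCE(acceptedFamily, matchedFamily) AS scrubbedFamily
    FROM tnrs_result
""")
rows = conn.execute(
    "SELECT stem_id, scrubbedFamily FROM analytical_stem_view_sketch "
    "ORDER BY stem_id"
).fetchall()
print(rows)  # [(1, 'Fabaceae'), (2, 'Rosaceae')]
conn.close()
```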

document TNRS terms in VegCore data dictionary

  • mapping from analytical_stem_view (VegCore) name to TNRS name
  • first add links to the TNRS data dictionary
  • then get Brad's input on the definitions
  • Bob should review the names for clarity to scientists

Availability

  • Mark will be away next Monday (6/10)
  • Brad is unreachable all of this week but will be back next week