2013-06-13 conference call

To do for Brad

determine higher_plant_group node names in Tropicos/APGIII backbone

see higher_plant_group node names in Tropicos APGIII

  • is the list in the BIEN2 analytical DB overview (p. 12 bottom > higherPlantGroup) complete?
    "bryophytes", "ferns and allies", "flowering plants", "gymnosperms (conifers)", "gymnosperms (non-conifer)"
    • e.g. there is an entry for "seed plants and ferns" under polyphyletic clades (p. 13 bottom > last ¶), but it is not in the list above
  • on Monday?

ask Naim to include TNRS version columns in CSV download

Naim has been e-mailed, and a feature request has been submitted as *TNRS-185*.

In order to know when to refresh our TNRS cache, it would be useful to have the following CSV columns indicating when the TNRS source code or database has changed:

e-mail out which meeting dates he is not available for this summer

To do for Aaron

include GCC when running TNRS

  • it provides more synonyms than Tropicos for Asteraceae, and the accepted names still match the Tropicos backbone
  • position it before Tropicos in the sources list, so that GCC will be used instead when it provides a result
  • we had originally removed this along with USDA because "GCC is for only one family (Asteraceae)" (r5691)
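
The source-order rule above can be sketched as "first source in the list that returns a result wins". This is only an illustration of that precedence idea, not TNRS's actual implementation; the function name, the source keys "gcc" and "tropicos", and the dict-shaped results are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def pick_result(
    results_by_source: dict,
    source_order: Sequence[str],
) -> Optional[Tuple[str, str]]:
    """Return (source, result) for the first source in source_order with a result."""
    for source in source_order:
        result = results_by_source.get(source)
        if result is not None:
            return source, result
    # No source in the list provided a result.
    return None


# Hypothetical source keys: GCC is listed first, so it takes precedence
# over Tropicos whenever it has a result for the name.
SOURCE_ORDER = ["gcc", "tropicos"]
```

With this ordering, a name resolved by both sources takes the GCC result, and a name GCC does not cover falls through to Tropicos.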

fix higher_plant_group_nodes mapping

  • contrary to the list in the BIEN2 analytical DB overview (p. 13 bottom > last ¶), "ferns and allies" should not include all the nodes in bryophytes
    "ferns and allies": bryophytes (see above) + "Moniliformopses"

plant/non-plant genus/family homonyms

  • genus and family homonyms are now both available in a delimited format
  • can't use species homonyms to whitelist binomials because the list is not exhaustive

observation filtering

switch from NCBI backbone to Tropicos

see higher_plant_group node names in Tropicos APGIII

  • this will avoid family+genus mismatch problems due to NCBI using a different family classification (needed when determining higher_plant_group)
  • NCBI is also missing a number of genera from Tropicos
  • use Tropicos name and classification tables, joined together
  • using the TNRS copy of Tropicos at ssh://arjuna.iplantcollaborative.org:1657:
    ssh -p 1657 aaronmk@arjuna.iplantcollaborative.org
    
  • TNRS batch-downloads the names from the Tropicos web service once a year (script runs overnight)

analytical_stem_view: add disambiguating prefix for TNRS accepted name terms

analytical_stem_view: add combination of TNRS accepted and matched name

  • TNRS "no opinion" names don't have a taxon concept (accepted name), just a matched name

add species_binomial

  • species_binomial: (from Brad)
    IF(Accepted_species IS NOT NULL, Accepted_species,
      IF(Specific_epithet_matched IS NOT NULL, CONCAT(Genus_matched, ' ', Specific_epithet_matched),
        NULL
      )
    )
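
For checking the expression outside the database, the same fallback logic can be sketched in Python. This is a minimal sketch, assuming None stands in for SQL NULL; the function and parameter names are lowercase renderings of the column names, chosen for illustration.

```python
from typing import Optional


def species_binomial(
    accepted_species: Optional[str],
    genus_matched: Optional[str],
    specific_epithet_matched: Optional[str],
) -> Optional[str]:
    """Mirror the nested IF() expression, with None standing in for NULL."""
    # Prefer the TNRS accepted species when there is one.
    if accepted_species is not None:
        return accepted_species
    # Otherwise fall back to the matched binomial, if an epithet was matched.
    if specific_epithet_matched is not None:
        # MySQL CONCAT() returns NULL if any argument is NULL.
        if genus_matched is None:
            return None
        return f"{genus_matched} {specific_epithet_matched}"
    return None
```

A "no opinion" name with no accepted species therefore still gets a binomial as long as both the genus and the specific epithet were matched.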
    

FIA filtering

document TNRS terms in VegCore data dictionary

include TNRS version and settings in TNRS cache

  • this helps determine when the TNRS cache needs to be reloaded (and which names to reload)
  • retrieve this from the TNRS web app's download settings file (using the Download settings button, which is displayed once results are returned)
  • the following attributes should be included as cache table columns:
    • TNRS URL
    • Job type
    • Contains Id
    • Start time
    • TNRS version
    • github revision (from https://github.com/iPlantCollaborativeOpenSource/TNRS)
    • TNRS DB version (not yet included)
    • Sources selected
    • Match threshold
    • Classification
    • Allow partial matches?
    • Constrain by higher taxonomy
  • the following attributes are not needed:
    • E-mail (always set to tnrs@lka5jjs.orv when using the web app download)
    • Id (the session key)
    • Finish time (unused, always null)
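
The attribute lists above could be applied as a simple mapping from setting names to cache columns. This is a rough sketch: the setting names come from the lists above, but the column names, the function name, and the assumption that the settings file has already been parsed into a dict are all hypothetical.

```python
# Setting name (as listed above) -> cache table column (hypothetical names).
KEEP = {
    "TNRS URL": "tnrs_url",
    "Job type": "job_type",
    "Contains Id": "contains_id",
    "Start time": "start_time",
    "TNRS version": "tnrs_version",
    "github revision": "github_revision",
    "TNRS DB version": "tnrs_db_version",  # not yet provided by TNRS
    "Sources selected": "sources_selected",
    "Match threshold": "match_threshold",
    "Classification": "classification",
    "Allow partial matches?": "allow_partial_matches",
    "Constrain by higher taxonomy": "constrain_by_higher_taxonomy",
}

# Settings that are deliberately not stored in the cache.
SKIP = {"E-mail", "Id", "Finish time"}


def settings_to_cache_columns(settings: dict) -> dict:
    """Map parsed TNRS settings to cache columns, dropping the unneeded ones."""
    unknown = set(settings) - set(KEEP) - SKIP
    if unknown:
        # A new setting probably means the TNRS download format has changed.
        raise ValueError(f"unrecognized TNRS settings: {sorted(unknown)}")
    # Settings that are absent (e.g. TNRS DB version) are stored as None.
    return {column: settings.get(name) for name, column in KEEP.items()}
```

Raising on unrecognized settings is a design choice in this sketch: it flags a changed download format early, which is itself a hint that the cache may need reloading.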

include our TNRS client's version in TNRS cache

  • in addition to the TNRS web service's version, the client version is needed to track changes to the format we encode data in
  • the following columns are needed:
    • global svn revision
    • /lib/tnrs.py revision (currently r9525)
    • /bin/tnrs_db revision (currently r9530)
  • for existing rows, this information can be reconstructed from the Time_submitted column
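
One way to reconstruct the revision for existing rows is to look up the latest commit made at or before each row's Time_submitted. The sketch below assumes a per-file commit history is available as (commit time, revision) pairs, for example extracted from the svn log; the function name and the history format are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple


def revision_at(
    time_submitted: datetime,
    history: List[Tuple[datetime, int]],
) -> Optional[int]:
    """Return the revision in effect at time_submitted.

    history must be sorted by commit time, oldest first.
    """
    commit_times = [commit_time for commit_time, _ in history]
    # Index of the first commit made strictly after time_submitted.
    i = bisect_right(commit_times, time_submitted)
    if i == 0:
        # The row predates the first known revision.
        return None
    return history[i - 1][1]
```

This assumes the deployed client was always at the latest committed revision when a name was submitted, which may not hold if the working copy lagged behind the repository.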

future GBIF exports

  • change runscripts to not hardcode the date in the export filename
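
As an illustration of computing the date instead of hardcoding it, the filename could be built from the current date at run time. This is a sketch only: the runscripts may not be written in Python, and the function name, prefix, and filename pattern are hypothetical.

```python
from datetime import date
from typing import Optional


def export_filename(
    prefix: str = "export",
    day: Optional[date] = None,
    ext: str = "csv",
) -> str:
    """Build an export filename containing an ISO date (YYYY-MM-DD)."""
    # Default to today's date so that nothing needs editing per export.
    if day is None:
        day = date.today()
    return f"{prefix}.{day.isoformat()}.{ext}"
```

Passing day explicitly keeps old exports reproducible, while the default covers the usual case of a fresh export.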

Availability

  • Brad won't be available for some meetings this summer
    (he's full-time iPlant until end of June, then traveling in Canada and doing consulting)
    • Brad will send the meeting dates when he won't be available
  • Bob was not on the call (getting ready for a trip)