2011 working group Tu iPToL-BIEN Phylogenetics

  • together with BIEN, doing phylogenetic hypothesis testing
  • slow to get a tree to do this
  • first tree from iPToL last month
  • blending tree with BIEN data
  • perpetual tree
  • OTU mapping
  • use TNRS to take BIEN taxa, iPToL taxa and normalize them so they can be mapped together
  • iPToL has ~116,000 names; BIEN has ~22,000 names
  • periods, hyphens turned into underscores
  • names don't map precisely to the taxonomy b/c they have gone through encoding for RAxML
  • Phylocom (phylomatic software) crashes on invalid names
  • using Phylocom for initial pruning of tree
  • Biophylo libraries: powerful, but took 5 hours to parse a 70,000-name tree
  • BIEN names are genus and species for which hyphens are retained
  • deleted iPToL names whose genera are not in the BIEN list (iPToL down to ~44,000 names) and vice versa (BIEN down to ~21,000 names)
  • sp taxa handled individually
  • TNRS doesn't have sp fungi
  • iPToL using NCBI names
  • clip tree using MRCA for first pass
  • remove suffixes after f, var, subsp, subpp, var nov, x, sp -> canonicalize (sp preserved if no duplicate)
  • which iPToL congeneric species can be deleted b/c not in BIEN (BIEN down to ~19,000 names)
  • removes names from tree if missing
  • don't keep species not in BIEN
  • replace species with zero-length polytomy and use for clade (BIEN up to ~20,000 names)
  • can't join by higher taxa
  • 7000 BIEN names can't be added accurately b/c >1 congeneric name in iPToL
  • where to attach nodes
  • software says node attaches in a certain place, doesn't give conservative placement
  • cultivated hybrids, varieties under one another
  • put in clade that gives misleading evolution signal: put at base or tip?
  • points to root of subtree, not physically attached
  • queries that find reciprocal monophyly
  • Dendroscope for viewing phylogenetic trees
  • name prefixed w/ O: original tree
  • preserve structure inside clades
  • interact with database instead of taking tree as input
  • Perl script query_vtol.pl: finds MRCA of two species
  • finds mapped names that might descend from the MRCA or a direct ancestor
  • percentages in output tell whether a name could be in the clade
  • Felsenstein's independent contrasts
  • discrete or continuous character value for node
  • clip out subtree w/ branch names -> Bayesian filter -> randomly put missing species in tree according to branch length
  • use that w/ independent contrasts or other calculations
  • clade by clade analysis to get scored value for node
  • Phylocom creates giant polytomies, soft polytomies
  • instead calculate optimized value w/ optimized trees
  • never have tree that perfectly has all BIEN species
  • technique to randomly resolve polytomies
  • add missing species using skeletal backbone
  • TB of data
  • missing species: ~19,000 names
  • TROPICOS classification
  • deeper mapping than genera
  • use TNRS API to script it all
  • but want whole NCBI tree to clip out non-? plants
  • script into pipeline
  • forks output into fine, coarse mapping trees
  • make pipeline completely automated
  • for some eco-phylo questions, more conservative tree
  • biases like rare species more likely to be floaters b/c not sequenced
  • single vs multi-sequenced trees
  • shrunk iPToL tree and then expanded it with BIEN
  • get original data from matrices of subset to generate support values
  • inflate w/ missing BIEN species
  • degree of conservatism in geographic trait, range values
  • need both trees
  • trait evolution: more highly resolved taxa
  • communities: catch by genus/family rather than throwing out species
  • better tree than what ecologists have available/typically work with
  • Phylocom software: genus phylogeny -> create polytomies at tips for species
    • assumes species put into genera correctly
  • species-level tree from iPToL -> add missing species
  • iPToL: someone works on perpetual tree engine? will e-mail out
  • scripts could be moved over to service tacked on to perpetual tree to get BIEN tree
  • phylo people don't like their own trees b/c worried they are wrong
  • supermatrix approach, not supertree
  • pull out tree-based data -> parallel supertree
  • want to have script that takes big tree and adds timepoints, rate smoothing, age
  • data much better than what currently have access to
  • NCBI classification -> tree (molecular biology side)
  • get use case for what would need to do this on the fly from BIEN
  • API to framework?
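
The name-normalization steps in the notes above (periods and hyphens turned into underscores, rank suffixes removed, sp preserved) can be sketched roughly as follows. This is a minimal illustration, not the group's script and not the TNRS API: the function name `canonicalize`, the marker set, and the sample names are all assumptions made for the example. It builds a matching key rather than a display name, so a hyphenated epithet is cut at the hyphen on both the iPToL and BIEN side and the two still compare equal.

```python
import re

# Assumed subset of the rank/hybrid markers listed in the notes.
RANK_MARKERS = {"f", "var", "subsp", "x", "sp"}

def canonicalize(label):
    """Reduce a taxon label to a 'Genus_epithet' matching key (hypothetical helper)."""
    # Treat whitespace, periods, underscores and hyphens as equivalent
    # separators, mirroring the underscore encoding described in the notes.
    tokens = [t for t in re.split(r"[\s._-]+", label.strip()) if t]
    if not tokens:
        return ""
    kept = [tokens[0].capitalize()]  # genus
    for tok in tokens[1:]:
        low = tok.lower()
        if low in RANK_MARKERS:
            # Everything after a marker is dropped; 'sp' right after the
            # genus is kept so 'Genus sp.' stays distinguishable.
            if low == "sp" and len(kept) == 1:
                kept.append("sp")
            break
        kept.append(low)
        if len(kept) == 2:  # genus + one epithet is enough for the key
            break
    return "_".join(kept)

# Made-up sample names, only to show the set intersection step.
iptol_names = {"Quercus_alba_var_latiloba", "Pinus_strobus"}
bien_names = {"Quercus alba", "Acer rubrum"}
shared = {canonicalize(n) for n in iptol_names} & {canonicalize(n) for n in bien_names}
```

Under these assumptions a hybrid such as "Mentha x piperita" collapses to the genus alone, which matches the "remove suffixes after x" rule in the notes but may be too aggressive for real data.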

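The morning notes mention clipping the tree by MRCA and querying a database rather than taking a tree file as input. A rough sketch of that idea, assuming the tree is stored as a child-to-parent table, is shown here. It is not the logic of query_vtol.pl, which the notes do not describe; the `parent` mapping, the function names, and the toy taxa are invented for illustration.

```python
def ancestors(node, parent):
    """Return the path from `node` up to the root, starting with `node` itself."""
    path = [node]
    while node in parent:  # the root has no entry in `parent`
        node = parent[node]
        path.append(node)
    return path

def mrca(a, b, parent):
    """Most recent common ancestor of `a` and `b`, or None if they share none."""
    seen = set(ancestors(a, parent))
    # Walk upward from b; the first node also above a is the MRCA.
    for node in ancestors(b, parent):
        if node in seen:
            return node
    return None

# Toy child -> parent table standing in for a database table (assumed layout).
parent = {
    "Quercus_alba": "Quercus",
    "Quercus_rubra": "Quercus",
    "Quercus": "Fagaceae",
    "Fagus_grandifolia": "Fagus",
    "Fagus": "Fagaceae",
}

same_genus = mrca("Quercus_alba", "Quercus_rubra", parent)
same_family = mrca("Quercus_alba", "Fagus_grandifolia", parent)
```

With a table like this, clipping a subtree for a first pass amounts to keeping every node whose ancestor path passes through the returned MRCA.
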
Afternoon

  • two groups: science-related and BIEN 3.0
  • details of things for BIEN domain: data discussion (across the hall)
    • walk through data structure
    • warehouse database
    • outline of primary constraints
  • science group
    • focal areas to continue working
    • outline primary goals