2011 working group Tu iPToL-BIEN Phylogenetics
- together with BIEN, doing phylogenetic hypothesis testing
- slow to get a tree to do this
- first tree from iPToL last month
- blending tree with BIEN data
- perpetual tree
- OTU mapping
- use TNRS to normalize BIEN and iPToL taxon names so the two sets can be mapped to each other
- iPToL has ~116,000 names; BIEN has ~22,000 names
- periods and hyphens turned into underscores
- names don't map precisely to the taxonomy b/c they have gone through encoding for RAxML
- Phylocom (Phylomatic software) crashes on invalid names
- using Phylocom for initial pruning of tree
- Bio::Phylo libraries: powerful, but took 5 hours to parse a 70,000-name tree
- BIEN names are genus + species, with hyphens retained
- deleted iPToL names whose genera are not in the BIEN list (iPToL down to ~44,000 names), and vice versa (BIEN down to ~21,000 names)
- sp. taxa handled individually
- TNRS doesn't have sp. fungi
- iPToL uses NCBI names
- clip tree using MRCA for first pass
- remove suffixes after f., var., subsp., subpp., var. nov., x, sp. -> canonicalize (sp. preserved if no duplicate)
- determine which iPToL congeneric species can be deleted b/c not in BIEN (BIEN down to ~19,000 names)
- removes names from tree if missing
- don't keep species not in BIEN
- replace species with zero-length polytomy and use for clade (BIEN up to ~20,000 names)
- can't join by higher taxa
- 7,000 BIEN names can't be added accurately b/c they have >1 congeneric name in iPToL
- where to attach nodes
- software says node attaches in a certain place, doesn't give conservative placement
- cultivated hybrids, varieties under one another
- putting them in a clade gives a misleading evolutionary signal: put at base or tip?
- points to root of subtree, not physically attached
- queries that find reciprocal monophyly
- Dendroscope for viewing phylogenetic trees
- name prefixed w/ O: original tree
- preserve structure inside clades
- interact with database instead of taking tree as input
- Perl script query_vtol.pl: finds MRCA of two species
- find mapped names that might descend from the MRCA or its direct ancestor
- percentages (%) in the output indicate whether a name could be in the clade
- Felsenstein's independent contrasts
- discrete or continuous character value for node
- clip out subtree w/ branch names -> Bayesian filter -> randomly put missing species in tree according to branch length
- use that w/ independent contrasts or other calculations
- clade by clade analysis to get scored value for node
- Phylocom creates giant polytomies, soft polytomies
- instead calculate optimized value w/ optimized trees
- will never have a tree that perfectly includes all BIEN species
- technique to randomly resolve polytomies
- add missing species using skeletal backbone
- TB of data
- missing species: ~19,000 names
- TROPICOS classification
- deeper mapping than genera
- use TNRS API to script it all
- but want whole NCBI tree to clip out non-? plants
- script into pipeline
- forks output into fine, coarse mapping trees
- make pipeline completely automated
- for some eco-phylo questions, more conservative tree
- biases like rare species more likely to be floaters b/c not sequenced
- single vs multi-sequenced trees
- shrunk iPToL tree and then expanded it with BIEN
- get original data from matrices of subset to generate support values
- inflate w/ missing BIEN species
- degree of conservatism in geographic trait, range values
- need both trees
- trait evolution: more highly resolved taxa
- communities: catch by genus/family rather than throwing out species
- better tree than what ecologists have available/typically work with
- Phylocom software: genus phylogeny -> create polytomies at tips for species
- assumes species put into genera correctly
- species-level tree from iPToL -> add missing species
- iPToL: someone works on perpetual tree engine? will e-mail out
- scripts could be moved over to service tacked on to perpetual tree to get BIEN tree
- phylo people don't like own trees b/c worried they are wrong
- supermatrix approach, not supertree
- pull out tree-based data -> parallel supertree
- want to have script that takes big tree and adds timepoints, rate smoothing, age
- data much better than what currently have access to
- NCBI classification -> tree (molecular biology side)
- get use case for what would need to do this on the fly from BIEN
- API to framework?
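The name matching described in these notes (TNRS normalization, RAxML-style underscores in iPToL labels, hyphens kept in BIEN names, stripping suffixes after f/var/subsp/subpp/x/sp, and pruning each list to shared genera) can be sketched as follows. This is a minimal Python illustration, not TNRS and not the group's scripts: `name_key`, `genus` and `prune_to_shared_genera` are hypothetical names, and the key format (lowercase, underscore-joined) is an assumption chosen only so that both naming styles collapse to the same string.

```python
from __future__ import annotations

import re

# Markers from the notes: the marker and everything after it are dropped.
CUT_MARKERS = {"f", "var", "subsp", "subpp", "x"}
# "sp" is kept (giving "genus_sp"), but anything after it is dropped.
KEEP_MARKER = "sp"


def name_key(name: str) -> str:
    """Build a lowercase match key for a BIEN- or iPToL-style name (sketch)."""
    # RAxML-encoded iPToL labels use underscores where BIEN names have
    # spaces, periods or hyphens, so all four are treated as separators.
    tokens = [t for t in re.split(r"[\s_.\-]+", name.strip().lower()) if t]
    kept = tokens[:1]  # the genus
    for tok in tokens[1:]:
        if tok in CUT_MARKERS:
            break
        kept.append(tok)
        if tok == KEEP_MARKER:
            break
    return "_".join(kept)


def genus(key: str) -> str:
    """Genus part of a match key."""
    return key.split("_", 1)[0]


def prune_to_shared_genera(
    names_a: list[str], names_b: list[str]
) -> tuple[set[str], set[str]]:
    """Drop keys from each list whose genus does not occur in the other list."""
    keys_a = {name_key(n) for n in names_a}
    keys_b = {name_key(n) for n in names_b}
    shared = {genus(k) for k in keys_a} & {genus(k) for k in keys_b}
    return (
        {k for k in keys_a if genus(k) in shared},
        {k for k in keys_b if genus(k) in shared},
    )
```

Treating hyphens as separators lets `Capsella bursa-pastoris` (BIEN style) and `Capsella_bursa_pastoris` (iPToL style) produce the same key. The cost is that a hybrid marker `x` directly after the genus reduces the key to the genus alone, so such names would fall back to genus-level handling.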
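The MRCA queries in these notes (query_vtol.pl finds the MRCA of two species, the first-pass clip takes the subtree below an MRCA, and some queries look for monophyly) reduce to a few operations on a rooted tree. The sketch below assumes the tree is stored as a hypothetical child-to-parent dictionary and is written in Python rather than Perl; it is not a port of query_vtol.pl, whose interface isn't described here.

```python
from __future__ import annotations

from functools import reduce


def ancestors(node: str, parent: dict[str, str]) -> list[str]:
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(a: str, b: str, parent: dict[str, str]) -> str:
    """Most recent common ancestor of two nodes in a rooted tree."""
    above_a = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in above_a:
            return node
    raise ValueError(f"{a!r} and {b!r} are not in the same tree")


def descendant_tips(node: str, parent: dict[str, str]) -> set[str]:
    """Tips of the subtree rooted at node (the clipped clade)."""
    children: dict[str, list[str]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    tips: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        kids = children.get(current)
        if kids:
            stack.extend(kids)
        else:
            tips.add(current)
    return tips


def is_monophyletic(names: set[str], parent: dict[str, str]) -> bool:
    """True if the MRCA of names has no tips below it other than names."""
    root = reduce(lambda x, y: mrca(x, y, parent), names)
    return descendant_tips(root, parent) == set(names)
```

Checking reciprocal monophyly of two groups would amount to calling `is_monophyletic` on each group.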
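The placement rules in these notes (a missing BIEN species replaces its lone iPToL congener with a zero-length polytomy; names with >1 congener, or none, can't be placed unambiguously; polytomies can then be resolved at random) can be sketched on a topology-only tree. Assumptions: trees are nested tuples of tip names, names are `Genus_species` with an underscore, `missing` contains only names not already in the tree, and `graft_missing` and `resolve_polytomies` are hypothetical helpers. Random pairwise joining is one simple resolution scheme; it does not sample binary topologies uniformly.

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict
from typing import Union

# A tree is a tip name (str) or a tuple of child trees; branch lengths omitted.
Tree = Union[str, tuple]


def tips(tree: Tree) -> list[str]:
    """All tip names, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [t for child in tree for t in tips(child)]


def genus_of(name: str) -> str:
    return name.split("_", 1)[0]


def graft_missing(tree: Tree, missing: list[str]) -> tuple[Tree, list[str]]:
    """Turn a lone congener tip into a polytomy with its missing congeners."""
    counts = Counter(genus_of(t) for t in tips(tree))
    additions: dict[str, list[str]] = defaultdict(list)
    unplaced = []
    for name in missing:
        if counts[genus_of(name)] == 1:
            additions[genus_of(name)].append(name)
        else:
            # No congener, or >1 congener: no single unambiguous attachment point.
            unplaced.append(name)

    def rebuild(node: Tree) -> Tree:
        if isinstance(node, str):
            extra = additions.get(genus_of(node))
            return (node, *extra) if extra else node
        return tuple(rebuild(child) for child in node)

    return rebuild(tree), unplaced


def resolve_polytomies(tree: Tree, rng: random.Random) -> Tree:
    """Randomly join children of every multifurcation into binary pairs."""
    if isinstance(tree, str):
        return tree
    nodes = [resolve_polytomies(child, rng) for child in tree]
    while len(nodes) > 2:
        # Pop the higher index first so the lower index stays valid.
        i, j = sorted(rng.sample(range(len(nodes)), 2), reverse=True)
        left, right = nodes.pop(i), nodes.pop(j)
        nodes.append((left, right))
    return tuple(nodes)
```

Drawing several resolutions with different seeds gives a set of trees over which a downstream statistic could be averaged, instead of relying on one arbitrary resolution.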
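Felsenstein's independent contrasts, mentioned in these notes for scoring node values, follow a short recursion. At each bifurcating node with child values x1, x2 and branch lengths v1, v2, the standardized contrast is (x1 - x2)/sqrt(v1 + v2), the node value is the inverse-length-weighted mean of x1 and x2, and the node's own branch is lengthened by v1*v2/(v1 + v2). Below is a minimal Python sketch for a continuous trait, assuming a fully bifurcating tree with positive branch lengths; `Node` and `pic` are hypothetical names, and this is not Phylocom's implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float                      # branch length to the parent
    name: str | None = None            # set on tips only
    children: list[Node] = field(default_factory=list)


def pic(node: Node, traits: dict[str, float], out: list[float]) -> tuple[float, float]:
    """Return (node value, adjusted branch length); append contrasts to out."""
    if not node.children:
        return traits[node.name], node.length
    if len(node.children) != 2:
        raise ValueError("independent contrasts need a fully bifurcating tree")
    (x1, v1), (x2, v2) = (pic(child, traits, out) for child in node.children)
    out.append((x1 - x2) / math.sqrt(v1 + v2))   # standardized contrast
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    return value, node.length + v1 * v2 / (v1 + v2)
```

A tree with n tips yields n - 1 contrasts. Zero-length branches, such as those produced by resolving polytomies with zero-length edges, cause a division by zero here; one common workaround is to give such branches a small positive length before computing contrasts.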
Afternoon
- two groups: science-related and BIEN 3.0
- details of things for BIEN domain: data discussion (across the hall)
- walk through data structure
- warehouse database
- outline of primary constraints
- science group
- focal areas to continue working
- outline primary goals