Abstract
We consider the problem of estimating species trees from unrooted gene tree topologies in the presence of incomplete lineage sorting, a common phenomenon that creates gene tree heterogeneity in multilocus datasets. One popular class of reconstruction methods in this setting is based on internode distances, i.e. the average graph distance between pairs of species across gene trees. While statistical consistency in the limit of large numbers of loci has been established in some cases, little is known about the sample complexity of such methods. Here we make progress on this question by deriving a lower bound on the worst-case variance of internode distance which depends linearly on the corresponding graph distance in the species tree. We also discuss some algorithmic implications.
This work was supported by funding from the U.S. National Science Foundation DMS-1149312 (CAREER), DMS-1614242 and CCF-1740707 (TRIPODS). We thank Tandy Warnow for suggesting the problem and for helpful discussions.
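As an illustration of the internode distance described in the abstract, the following minimal sketch averages graph distances between species pairs over a set of unrooted gene tree topologies. It assumes that graph distance means the number of edges on the path between two leaves. The edge-list encoding, the function names `graph_distances` and `internode_distances`, and the two quartet trees are hypothetical choices made for this example and are not taken from the paper.

```python
from collections import deque
from itertools import combinations


def graph_distances(edges, source):
    """Return edge counts from `source` to every node of an unrooted tree (BFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def internode_distances(gene_trees, species):
    """Average the graph distance of each species pair across all gene trees."""
    totals = {pair: 0 for pair in combinations(sorted(species), 2)}
    for edges in gene_trees:
        for x in species:
            d = graph_distances(edges, x)
            for y in species:
                if x < y:
                    totals[(x, y)] += d[y]
    return {pair: total / len(gene_trees) for pair, total in totals.items()}


# Two hypothetical quartet gene trees (internal nodes "u" and "v"):
# the first has topology AB|CD, the second AC|BD.
tree_ab_cd = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
tree_ac_bd = [("A", "u"), ("C", "u"), ("u", "v"), ("B", "v"), ("D", "v")]

avg = internode_distances([tree_ab_cd, tree_ac_bd], {"A", "B", "C", "D"})
```

In this toy input, A and B are two edges apart in the first tree and three edges apart in the second, so their internode distance is 2.5. The sample-complexity question raised in the abstract concerns how much such averages fluctuate when the gene trees are drawn under the multispecies coalescent.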
Notes
1. In keeping with much of the literature on the MSC, we use the generic term gene to refer to any genomic region experiencing low rates of internal recombination, not necessarily a protein-coding region.
2. Note however that the associated branch lengths may differ from \(\varGamma\).
3. Note that it is trivial that \((d^{\mathcal{T}_j}_\mathrm{g}(x,y))_{x,y}\) is an additive metric associated to gene tree \(\mathcal{T}_j\). On the other hand it is far from trivial that averaging over the MSC leads to an additive metric associated to the species tree.
4. A straightforward modification of the argument also works for odd n.
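The additivity mentioned in note 3 can be checked numerically through the four-point condition: for every four taxa, the two largest of the three pairwise sums must be equal. The sketch below is an illustrative check on distinct quadruples only, not the paper's argument. The function name `is_additive`, the tolerance, and both distance tables are assumptions made for this example.

```python
from itertools import combinations


def is_additive(dist, taxa, tol=1e-9):
    """Check the four-point condition on every set of four distinct taxa."""
    def d(x, y):
        # Distances are stored once per unordered pair.
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    for x, y, z, w in combinations(sorted(taxa), 4):
        sums = sorted([d(x, y) + d(z, w), d(x, z) + d(y, w), d(x, w) + d(y, z)])
        # The two largest sums must coincide for a tree metric.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


taxa = ["A", "B", "C", "D"]

# Graph distances on a single hypothetical quartet tree AB|CD.
single_tree = {("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 3,
               ("B", "C"): 3, ("B", "D"): 3, ("C", "D"): 2}

# Plain average of the graph distances of two hypothetical quartet
# trees, AB|CD and AC|BD (an arbitrary pair, not an expectation under the MSC).
two_tree_average = {("A", "B"): 2.5, ("A", "C"): 2.5, ("A", "D"): 3.0,
                    ("B", "C"): 3.0, ("B", "D"): 2.5, ("C", "D"): 2.5}

single_ok = is_additive(single_tree, taxa)
average_ok = is_additive(two_tree_average, taxa)
```

The single-tree distances pass the check, while the average of two arbitrarily chosen conflicting topologies does not. This suggests why additivity of the average is not automatic and why the corresponding statement for the expectation under the MSC requires proof.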
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Roch, S. (2018). On the Variance of Internode Distance Under the Multispecies Coalescent. In: Blanchette, M., Ouangraoua, A. (eds) Comparative Genomics. RECOMB-CG 2018. Lecture Notes in Computer Science, vol 11183. Springer, Cham. https://doi.org/10.1007/978-3-030-00834-5_11
DOI: https://doi.org/10.1007/978-3-030-00834-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00833-8
Online ISBN: 978-3-030-00834-5
eBook Packages: Computer Science (R0)