
On the Variance of Internode Distance Under the Multispecies Coalescent

  • Conference paper
Comparative Genomics (RECOMB-CG 2018)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11183)

Abstract

We consider the problem of estimating species trees from unrooted gene tree topologies in the presence of incomplete lineage sorting, a common phenomenon that creates gene tree heterogeneity in multilocus datasets. One popular class of reconstruction methods in this setting is based on internode distances, i.e., the average graph distance between pairs of species across gene trees. While statistical consistency in the limit of large numbers of loci has been established in some cases, little is known about the sample complexity of such methods. Here we make progress on this question by deriving a lower bound on the worst-case variance of internode distance that depends linearly on the corresponding graph distance in the species tree. We also discuss some algorithmic implications.
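The internode distance defined above can be illustrated with a minimal sketch. It assumes that each unrooted gene tree is given as a plain edge list whose leaves are labeled by species (one sampled lineage per species) and whose internal nodes have arbitrary names. The function names `graph_distances` and `internode_summary` and the toy trees are hypothetical. The variance computed here is the empirical variance across the supplied gene trees, not the worst-case variance under the MSC that the paper bounds.

```python
from collections import deque
from itertools import combinations
from statistics import mean, pvariance

# Sketch only: hypothetical helpers, not the paper's or ASTRID's implementation.


def graph_distances(edges, leaves):
    """Return {frozenset({x, y}): number of edges on the x-y path} for one unrooted tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {}
    for src in leaves:
        # Breadth-first search; in a tree, BFS depth equals path length from src.
        depth = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in depth:
                    depth[w] = depth[u] + 1
                    queue.append(w)
        for dst in leaves:
            if dst != src:
                dist[frozenset((src, dst))] = depth[dst]
    return dist


def internode_summary(gene_trees, leaves):
    """Mean and population variance of the internode distance for each species pair."""
    per_tree = [graph_distances(edges, leaves) for edges in gene_trees]
    summary = {}
    for x, y in combinations(sorted(leaves), 2):
        values = [d[frozenset((x, y))] for d in per_tree]
        summary[(x, y)] = (mean(values), pvariance(values))
    return summary


# Hypothetical input: three gene trees on species A-D with internal nodes u, v.
ab_cd = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
ac_bd = [("A", "u"), ("C", "u"), ("u", "v"), ("B", "v"), ("D", "v")]
leaves = ["A", "B", "C", "D"]
summary = internode_summary([ab_cd, ac_bd, ab_cd], leaves)
for pair, (avg, var) in summary.items():
    print(pair, round(avg, 3), round(var, 3))
```

Breadth-first search is used because, in a tree, the unique path between two leaves is also the shortest one, so BFS depth gives the graph distance directly.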

This work was supported by funding from the U.S. National Science Foundation DMS-1149312 (CAREER), DMS-1614242 and CCF-1740707 (TRIPODS). We thank Tandy Warnow for suggesting the problem and for helpful discussions.


Notes

  1. In keeping with much of the literature on the MSC, we use the generic term "gene" to refer to any genomic region experiencing low rates of internal recombination, not necessarily a protein-coding region.

  2. Note, however, that the associated branch lengths may differ from \(\varGamma \).

  3. Note that it is trivial that \((d^{\mathcal {T}_j}_\mathrm {g}(x,y))_{x,y}\) is an additive metric associated with gene tree \(\mathcal {T}_j\). On the other hand, it is far from trivial that averaging over the MSC leads to an additive metric associated with the species tree.

  4. A straightforward modification of the argument also works for odd n.
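Note 3 contrasts the graph distance of a single gene tree, which is trivially additive, with its average. As a rough illustration, the sketch below checks the four-point condition, a necessary condition for additivity, over every set of four distinct leaves. It compares one gene tree's graph distances with an exact average over a small, hypothetical sample of gene trees. The helper names `satisfies_four_point` and `as_metric` and the toy distances are assumptions made for illustration. Note 3's nontrivial claim concerns averaging under the MSC (an expectation), whereas a finite-sample average need not be exactly additive, as this example shows.

```python
from fractions import Fraction
from itertools import combinations

# Sketch only: hypothetical helpers and toy data, not taken from the paper.


def satisfies_four_point(d, leaves):
    """Check the four-point condition on every set of four distinct leaves:
    among the three pairwise sums, the two largest must be equal."""
    def dist(a, b):
        return d[frozenset((a, b))]

    for x, y, z, w in combinations(leaves, 4):
        sums = sorted([dist(x, y) + dist(z, w),
                       dist(x, z) + dist(y, w),
                       dist(x, w) + dist(y, z)])
        if sums[1] != sums[2]:
            return False
    return True


def as_metric(pairs):
    """Convert {(x, y): value} into {frozenset({x, y}): Fraction(value)}."""
    return {frozenset(k): Fraction(v) for k, v in pairs.items()}


# Graph distances in two hypothetical gene trees with topologies AB|CD and AC|BD.
tree_ab_cd = as_metric({("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 3,
                        ("A", "D"): 3, ("B", "C"): 3, ("B", "D"): 3})
tree_ac_bd = as_metric({("A", "C"): 2, ("B", "D"): 2, ("A", "B"): 3,
                        ("A", "D"): 3, ("B", "C"): 3, ("C", "D"): 3})
sample = [tree_ab_cd, tree_ac_bd, tree_ab_cd]
# Exact (Fraction) average over the sample, so equality tests are not affected by rounding.
average = {k: sum(t[k] for t in sample) / len(sample) for k in tree_ab_cd}

leaves = ["A", "B", "C", "D"]
print(satisfies_four_point(tree_ab_cd, leaves))  # True: single gene tree
print(satisfies_four_point(average, leaves))     # False: finite-sample average
```

For the averaged distances, the three sums are 14/3, 16/3 and 6, so the two largest differ and the condition fails.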


Author information

Correspondence to Sébastien Roch.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Roch, S. (2018). On the Variance of Internode Distance Under the Multispecies Coalescent. In: Blanchette, M., Ouangraoua, A. (eds) Comparative Genomics. RECOMB-CG 2018. Lecture Notes in Computer Science, vol. 11183. Springer, Cham. https://doi.org/10.1007/978-3-030-00834-5_11


  • DOI: https://doi.org/10.1007/978-3-030-00834-5_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00833-8

  • Online ISBN: 978-3-030-00834-5

  • eBook Packages: Computer Science, Computer Science (R0)
