Hands-on Introduction to Sequence-Length Requirements in Phylogenetics

Bioinformatics and Phylogenetics

Part of the book series: Computational Biology (COBO, volume 29)

Abstract

In this tutorial, through a series of analytical computations and numerical simulations, we review many known insights into a fundamental question: how much data is needed to reconstruct the Tree of Life? A Jupyter notebook and code for this tutorial are provided in Python.

Acknowledgements

This work is supported by NSF grants DMS-1149312 (CAREER), DMS-1614242, and CCF-1740707 (TRIPODS).

When I was first introduced to the field of computational phylogenetics in graduate school, I had the privilege of being supported by the NSF-funded CIPRES project, of which Bernard Moret was a leader. That support had a significant impact on my early career.

Author information

Corresponding author

Correspondence to Sébastien Roch.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Roch, S. (2019). Hands-on Introduction to Sequence-Length Requirements in Phylogenetics. In: Warnow, T. (eds) Bioinformatics and Phylogenetics. Computational Biology, vol 29. Springer, Cham. https://doi.org/10.1007/978-3-030-10837-3_4

  • DOI: https://doi.org/10.1007/978-3-030-10837-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-10836-6

  • Online ISBN: 978-3-030-10837-3

  • eBook Packages: Computer Science, Computer Science (R0)
