Abstract
In this tutorial, through a series of analytical computations and numerical simulations, we review many known insights into a fundamental question: how much data is needed to reconstruct the Tree of Life? A Jupyter notebook and code for this tutorial are provided in Python.
Acknowledgements
This work is supported by NSF grants DMS-1149312 (CAREER), DMS-1614242, and CCF-1740707 (TRIPODS).
When I was first introduced to the field of computational phylogenetics in graduate school, I had the privilege of being supported by the NSF-funded CIPRES project (of which Bernard Moret was a leader), which had a significant impact on my early career.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Roch, S. (2019). Hands-on Introduction to Sequence-Length Requirements in Phylogenetics. In: Warnow, T. (ed.) Bioinformatics and Phylogenetics. Computational Biology, vol 29. Springer, Cham. https://doi.org/10.1007/978-3-030-10837-3_4
DOI: https://doi.org/10.1007/978-3-030-10837-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10836-6
Online ISBN: 978-3-030-10837-3
eBook Packages: Computer Science, Computer Science (R0)