Inferring Trees

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 452)

Abstract

Molecular phylogenetics examines how biological sequences evolve and the historical relationships between them. An important aspect of many such studies is the estimation of a phylogenetic tree, which explicitly describes the evolutionary relationships between the sequences. This chapter introduces evolutionary trees and some commonly used inferential methodology, focusing on the assumptions made and how they affect an analysis. It also discusses in detail some common algorithms used for phylogenetic tree estimation. Finally, it offers a few practical guidelines, including how to combine multiple software packages to improve inference, and a comparison between Bayesian and maximum likelihood phylogenetics.

Acknowledgments

S.W. is funded by EMBL. Comments and suggestions from Nick Goldman, Lars Jermiin, Ari Loytynoja, and Fabio Pardi all helped improve previous versions of the manuscript.

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Whelan, S. (2008). Inferring Trees. In: Keith, J.M. (ed.) Bioinformatics. Methods in Molecular Biology™, vol 452. Humana Press. https://doi.org/10.1007/978-1-60327-159-2_14

  • DOI: https://doi.org/10.1007/978-1-60327-159-2_14

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-707-5

  • Online ISBN: 978-1-60327-159-2

  • eBook Packages: Springer Protocols