Computational Approaches to Studying Molecular Phylogenetics

  • Benu Atri
  • Olivier Lichtarge


Molecular phylogenetics is the study of evolutionary relationships among organisms as inferred from molecular sequence data. After sequences have been selected and optimally aligned to expose patterns of divergence, the next step is to build a graphical representation called a phylogenetic tree, in which each sequence occupies its own branch. A tree is a prediction of the evolutionary relationships among the organisms it contains. Beyond uncovering these relationships, phylogenetic analysis has practical applications such as guiding mutagenesis in the laboratory, peptide design, and quantification of gene variants. This chapter focuses on the methodology of building a phylogenetic tree, which requires careful selection of parameters as well as statistical analysis of the accuracy and robustness of the predictions. We also discuss a number of tools, which are based on algorithms with different underlying assumptions. These tools support the different steps of a phylogenetic analysis, including inferring and visualizing phylogenetic trees, estimating divergence times, mining online databases, estimating rates of molecular evolution, inferring ancestral sequences, and testing evolutionary hypotheses.
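
As a rough illustration of the tree-building step, the sketch below clusters a toy alignment with UPGMA, one of the distance methods named in the keywords. It is a minimal sketch under stated assumptions, not the chapter's implementation: the four sequences and their labels are invented for illustration, distances are uncorrected p-distances (no substitution model), and the result is a Newick topology without branch lengths or bootstrap support.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences with UPGMA; return a Newick topology string."""
    # Each cluster is keyed by its Newick label and stores its member count.
    sizes = {name: 1 for name in seqs}
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        # Pick the closest pair; ties are broken by label so runs are repeatable.
        pair = min(dist, key=lambda k: (dist[k], sorted(k)))
        a, b = sorted(pair)
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances (the UPGMA rule).
        for c in sizes:
            if c in (a, b):
                continue
            d_ac = dist[frozenset((a, c))]
            d_bc = dist[frozenset((b, c))]
            dist[frozenset((merged, c))] = (
                d_ac * sizes[a] + d_bc * sizes[b]
            ) / new_size
        # Drop every distance that still refers to the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes)) + ";"


# Hypothetical toy alignment; these are NOT real sequences.
toy_alignment = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACGAACTTAT",
    "chicken": "TCGAAGTTGT",
}

tree = upgma(toy_alignment)
print(tree)  # (((chimp,human),mouse),chicken);
```

UPGMA assumes a constant rate of change along all lineages, so a method such as neighbor joining, or a distance corrected by an explicit model of evolution, may be more appropriate when rates differ between lineages.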


Keywords: Phylogenetics, Molecular evolution, Neighbor joining, UPGMA, Tree building, Bootstrapping, Models of evolution, Phylogenetic tools, Evolutionary action, Evolutionary trace



This work is supported by a grant from the NIH Research Project Grant Program (2R01GM079656). The authors are grateful to Dr. David C. Marciano, Dr. Angela Wilkins, and Dr. Rhonald C. Lua for their helpful comments.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Benu Atri (1)
  • Olivier Lichtarge (1, 2, 3)

  1. Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, USA
  2. Center for Computational and Integrative Biomedical Research (CIBR), Baylor College of Medicine, Houston, USA
  3. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, USA
