Large-Scale Multiple Sequence Alignment and Phylogeny Estimation

Part of the book series: Computational Biology (COBO, volume 19)

Abstract

With the advent of next-generation sequencing technologies, alignment and phylogeny estimation of datasets with thousands of sequences are now being attempted. To address the resulting challenges, new algorithmic approaches have been developed that substantially improve on standard methods. This paper focuses on new approaches for ultra-large tree estimation, including methods for co-estimation of alignments and trees, methods for estimating trees without a full sequence alignment, and phylogenetic placement. While the main focus is on methods with empirical performance advantages, we also discuss the theoretical guarantees of methods under Markov models of evolution. Finally, we discuss the future of large-scale phylogenetic analysis.

Notes

  1. The famous quote by Dobzhansky, “Nothing in biology makes sense except in the light of evolution” [1], echoes the lesser-known quote by the Jesuit priest Pierre Teilhard de Chardin [2], who wrote, “Evolution is a light which illuminates all facts, a curve that all lines must follow.”

  2. Morgan Price, personal communication, 1 May 2013.

  3. Alexis Stamatakis, personal communication, 1 May 2013.

  4. The p-distance between two aligned sequences is the number of positions at which the two sequences differ, divided by the number of positions compared, which gives a number between 0 and 1.
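The p-distance definition in the note above can be illustrated with a minimal Python sketch. The function name `p_distance`, the `gap_chars` parameter, and the choice to skip columns containing a gap in either sequence are assumptions made for illustration; the note itself does not say how gapped positions are treated.

```python
def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-?") -> float:
    """Return the fraction of compared alignment columns at which two
    aligned sequences differ (a value between 0 and 1)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0    # columns where both sequences have a residue
    mismatches = 0  # compared columns where the residues differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: a column with a gap in either sequence is ignored.
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared
```

For instance, `p_distance("ACGT", "ACGA")` compares four columns and finds one difference. Other conventions for gaps (for example, counting a gap against a residue as a mismatch) would give different values on gapped alignments.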

References

  1. Dobzhansky, T.: Nothing in biology makes sense except in the light of evolution. Am. Biol. Teach. 35, 125–129 (1973)

  2. de Chardin, P.T.: Le Phénomène Humain. Harper Perennial, New York (1959)

  3. Eisen, J.A.: Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8, 163–167 (1998)

  4. Wang, L.-S., Leebens-Mack, J., Wall, K., Beckmann, K., de Pamphilis, C., et al.: The impact of protein multiple sequence alignment on phylogeny estimation. IEEE/ACM Trans. Comput. Biol. Bioinform. 8, 1108–1119 (2011)

  5. Simmons, M., Freudenstein, J.: The effects of increasing genetic distance on alignment of, and tree construction from, rDNA internal transcribed spacer sequences. Mol. Phylogenet. Evol. 26, 444–451 (2003)

  6. Liu, K., Linder, C.R., Warnow, T.: Multiple sequence alignment: a major challenge to large-scale phylogenetics. PLoS Currents: Tree of Life (2010)

  7. Hall, B.G.: Comparison of the accuracies of several phylogenetic methods using protein and DNA sequences. Mol. Biol. Evol. 22, 792–802 (2005)

  8. Kumar, S., Filipski, A.: Multiple sequence alignment: in pursuit of homologous DNA positions. Genome Res. 17, 127–135 (2007)

  9. Ogden, T., Rosenberg, M.: Multiple sequence alignment accuracy and phylogenetic inference. Syst. Biol. 55, 314–328 (2006)

  10. Liu, K., Raghavan, S., Nelesen, S., Linder, C.R., Warnow, T.: Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science 324, 1561–1564 (2009)

  11. Morrison, D.: Multiple sequence alignment for phylogenetic purposes. Aust. Syst. Bot. 19, 479–539 (2006)

  12. Graybeal, A.: Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47, 9–17 (1998)

  13. Pollock, D., Zwickl, D., McGuire, J., Hillis, D.: Increased taxon sampling is advantageous for phylogenetic inference. Syst. Biol. 51, 664–671 (2002)

  14. Zwickl, D., Hillis, D.: Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51, 588–598 (2002)

  15. Hillis, D.: Inferring complex phylogenies. Nature 383, 130–131 (1996)

  16. Felsenstein, J.: Inferring Phylogenies. Sinauer Associates, Sunderland (2003)

  17. Kim, J., Warnow, T.: Tutorial on phylogenetic tree estimation. Presented at the ISMB 1999 Conference (1999). Available on-line at http://www.cs.utexas.edu/users/tandy/tutorial.ps

  18. Linder, C.R., Warnow, T.: An overview of phylogeny reconstruction. In: Aluru, S. (ed.) Handbook of Computational Molecular Biology. Chapman and Hall/CRC Computer and Information Science Series, vol. 9. CRC Press, Boca Raton (2005)

  19. Semple, C., Steel, M.: Phylogenetics. Oxford University Press, London (2003)

  20. Hillis, D., Moritz, C., Mable, B. (eds.): Molecular Systematics. Sinauer Associates, Sunderland (1996)

  21. Ortuno, F., Valenzuela, O., Pomares, H., Rojas, F., Florido, J., et al.: Predicting the accuracy of multiple sequence alignment algorithms by using computational intelligent techniques. Nucleic Acids Res. 41 (2013)

  22. Whelan, S., Lin, P., Goldman, N.: Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet. 17, 262–272 (2001)

  23. Goldman, N., Yang, Z.: Introduction: statistical and computational challenges in molecular phylogenetics and evolution. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 363, 3889–3892 (2008)

  24. Kemena, C., Notredame, C.: Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics 25, 2455–2465 (2009)

  25. Do, C., Katoh, K.: Protein multiple sequence alignment. In: Methods in Molecular Biology: Functional Proteomics, Methods and Protocols, vol. 484, pp. 379–413. Humana Press, Clifton (2008)

  26. Mokaddem, A., Elloumi, M.: Algorithms for the alignment of biological sequences. In: Elloumi, M., Zomaya, A. (eds.) Algorithms in Computational Molecular Biology. Wiley, New York (2011). doi:10.1002/9780470892107.ch12

  27. Pei, J.: Multiple protein sequence alignment. Curr. Opin. Struct. Biol. 18, 382–386 (2008)

  28. Sievers, F., Wilm, A., Dineen, D., Gibson, T., Karplus, K., et al.: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7 (2011)

  29. Katoh, K., Toh, H.: PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23(3), 372–374 (2007)

  30. Nelesen, S., Liu, K., Wang, L.S., Linder, C.R., Warnow, T.: DACTAL: divide-and-conquer trees (almost) without alignments. Bioinformatics 28, i274–i282 (2012)

  31. Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., Mcgettigan, P.A., et al.: ClustalW and ClustalX version 2.0. Bioinformatics 23, 2947–2948 (2007)

  32. Lassmann, T., Frings, O., Sonnhammer, E.: Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Res. 37, 858–865 (2009)

  33. Neuwald, A.: Rapid detection, classification, and accurate alignment of up to a million or more related protein sequences. Bioinformatics 25, 1869–1875 (2009)

  34. Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010). doi:10.1371/journal.pone.0009490

  35. Smith, S., Beaulieu, J., Stamatakis, A., Donoghue, M.: Understanding angiosperm diversification using small and large phylogenetic trees. Am. J. Bot. 98, 404–414 (2011)

  36. Stamatakis, A.: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006)

  37. Goloboff, P.A., Catalano, S.A., Mirande, J.M., Szumik, C.A., Arias, J.S., et al.: Phylogenetic analysis of 73,060 taxa corroborates major eukaryotic groups. Cladistics 25, 211–230 (2009)

  38. Goloboff, P., Farris, J., Nixon, K.: TNT, a free program for phylogenetic analysis. Cladistics 24, 774–786 (2008)

  39. Liu, K., Warnow, T., Holder, M., Nelesen, S., Yu, J., et al.: SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Syst. Biol. 61, 90–106 (2011)

  40. Maddison, W.: Gene trees in species trees. Syst. Biol. 46, 523–536 (1997)

  41. Delsuc, F., Brinkmann, H., Philippe, H.: Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 6, 361–375 (2005)

  42. Edwards, S.V.: Is a new and general theory of molecular systematics emerging? Evolution 63, 1–19 (2009)

  43. Dunn, C.W., Hejnol, A., Matus, D.Q., Pang, K., Browne, W.E., et al.: Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452, 745–749 (2008)

  44. Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., et al.: A phylogeny-driven genomic encyclopedia of bacteria and archaea. Nature 462, 1056–1060 (2009)

  45. Eisen, J., Fraser, C.: Phylogenomics: intersection of evolution and genomics. Science 300, 1706–1707 (2003)

  46. Bininda-Emonds, O. (ed.): Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life. Kluwer Academic, Dordrecht (2004)

  47. Baum, B., Ragan, M.A.: The MRP method. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, pp. 17–34. Kluwer Academic, Dordrecht (2004)

  48. Chen, D., Eulenstein, O., Fernández-Baca, D., Sanderson, M.: Minimum-flip supertrees: complexity and algorithms. IEEE/ACM Trans. Comput. Biol. Bioinform. 3, 165–173 (2006)

  49. Bininda-Emonds, O.R.P.: The evolution of supertrees. Trends Ecol. Evol. 19, 315–322 (2004)

  50. Snir, S., Rao, S.: Quartets MaxCut: a divide and conquer quartets algorithm. IEEE/ACM Trans. Comput. Biol. Bioinform. 7, 704–718 (2010)

  51. Steel, M., Rodrigo, A.: Maximum likelihood supertrees. Syst. Biol. 57, 243–250 (2008)

  52. Swenson, M., Suri, R., Linder, C., Warnow, T.: An experimental study of quartets MaxCut and other supertree methods. Algorithms Mol. Biol. 6(1), 7 (2011)

  53. Swenson, M., Suri, R., Linder, C., Warnow, T.: SuperFine: fast and accurate supertree estimation. Syst. Biol. 61, 214–227 (2012)

  54. Nguyen, N., Mirarab, S., Warnow, T.: MRL and SuperFine+MRL: new supertree methods. Algorithms Mol. Biol. 7(3) (2012)

  55. Than, C.V., Nakhleh, L.: Species tree inference by minimizing deep coalescences. PLoS Comput. Biol. 5 (2009)

  56. Boussau, B., Szollosi, G., Duret, L., Gouy, M., Tannier, E., et al.: Genome-scale co-estimation of species and gene trees. Genome Res. 23(2), 323–330 (2013)

  57. Degnan, J.H., Rosenberg, N.A.: Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24, 332–340 (2009)

  58. Chaudhary, R., Bansal, M.S., Wehe, A., Fernández-Baca, D., Eulenstein, O.: IGTP: a software package for large-scale gene tree parsimony analysis. BMC Bioinform. 11, 574 (2010)

  59. Larget, B., Kotha, S.K., Dewey, C.N., Ané, C.: BUCKy: gene tree/species tree reconciliation with the Bayesian concordance analysis. Bioinformatics 26, 2910–2911 (2010)

  60. Yu, Y., Warnow, T., Nakhleh, L.: Algorithms for MDC-based multi-locus phylogeny inference: beyond rooted binary gene trees on single alleles. J. Comput. Biol. 18, 1543–1559 (2011)

  61. Yang, J., Warnow, T.: Fast and accurate methods for phylogenomic analyses. BMC Bioinform. 12(Suppl 9), S4 (2011). doi:10.1186/1471-2105-12-S9-S4

  62. Liu, L., Yu, L., Edwards, S.: A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10, 302 (2010)

  63. Chauve, C., Doyon, J.P., El-Mabrouk, N.: Gene family evolution by duplication, speciation, and loss. J. Comput. Biol. 15, 1043–1062 (2008)

  64. Hallett, M.T., Lagergren, J.: New algorithms for the duplication-loss model. In: Proceedings RECOMB 2000, pp. 138–146. ACM Press, New York (2000)

  65. Doyon, J.P., Chauve, C.: Branch-and-bound approach for parsimonious inference of a species tree from a set of gene family trees. Adv. Exp. Med. Biol. 696, 287–295 (2011)

  66. Ma, B., Li, M., Zhang, L.: From gene trees to species trees. SIAM J. Comput. 30, 729–752 (2000)

  67. Zhang, L.: From gene trees to species trees II: species tree inference by minimizing deep coalescence events. IEEE/ACM Trans. Comput. Biol. Bioinform. 8, 1685–1691 (2011)

  68. Arvestad, L., Berglund, A.C., Lagergren, J., Sennblad, B.: Gene tree reconstruction and orthology analysis based on an integrated model for duplications and sequence evolution. In: Bininda-Emonds, O. (ed.) Proc. RECOMB 2004, pp. 238–252 (2004)

  69. Sennblad, B., Lagergren, J.: Probabilistic orthology analysis. Syst. Biol. 58, 411–424 (2009)

  70. Edwards, S., Liu, L., Pearl, D.: High-resolution species trees without concatenation. Proc. Natl. Acad. Sci. USA 104, 5936–5941 (2007)

  71. Heled, J., Drummond, A.J.: Bayesian inference of species trees from multilocus data. Mol. Biol. Evol. 27, 570–580 (2010)

  72. Roch, S.: An analytical comparison of multilocus methods under the multispecies coalescent: the three-taxon case. In: Proc. Pacific Symposium on Biocomputing, vol. 18, pp. 297–306 (2013)

  73. Kopelman, N.M., Stone, L., Gascuel, O., Rosenberg, N.A.: The behavior of admixed populations in neighbor-joining inference of population trees. In: Proc. Pacific Symposium on Biocomputing, vol. 18 (2013)

  74. Degnan, J.H.: Evaluating variations on the STAR algorithm for relative efficiency and sample sizes needed to reconstruct species trees. In: Proc. Pacific Symposium on Biocomputing, vol. 18, pp. 262–272 (2013)

  75. Bayzid, M., Mirarab, S., Warnow, T.: Inferring optimal species trees under gene duplication and loss. In: Proc. Pacific Symposium on Biocomputing, vol. 18, pp. 250–261 (2013)

  76. Pei, J., Grishin, N.: PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 23, 802–808 (2007)

  77. Edgar, R.C., Sjölander, K.: SATCHMO: sequence alignment and tree construction using hidden Markov models. Bioinformatics 19, 1404–1411 (2003)

  78. Hagopian, R., Davidson, J., Datta, R., Jarvis, G., Sjölander, K.: SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction. Nucleic Acids Res. 38(Web Server Issue), W29–W34 (2010)

  79. O’Sullivan, O., Suhre, K., Abergel, C., Higgins, D., Notredame, C.: 3DCoffee: combining protein sequences and structure within multiple sequence alignments. J. Mol. Biol. 340, 385–395 (2004)

  80. Zhou, H., Zhou, Y.: SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics 21, 3615–3621 (2005)

  81. Deng, X., Cheng, J.: MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts. BMC Bioinform. 12, 472 (2011)

  82. Roshan, U., Livesay, D.R.: Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22, 2715–2721 (2006)

  83. Roshan, U., Chikkagoudar, S., Livesay, D.R.: Searching for RNA homologs within large genomic sequences using partition function posterior probabilities. BMC Bioinform. 9, 61 (2008)

  84. Do, C., Mahabhashyam, M., Brudno, M., Batzoglou, S.: PROBCONS: probabilistic consistency-based multiple sequence alignment of amino acid sequences. Software available at http://probcons.stanford.edu/download.html (2006)

  85. Nawrocki, E.P., Kolbe, D.L., Eddy, S.R.: Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009)

  86. Nawrocki, E.P.: Structural RNA homology search and alignment using covariance models. Ph.D. thesis, Washington University in Saint Louis, School of Medicine (2009)

  87. Gardner, D., Xu, W., Miranker, D., Ozer, S., Cannone, J., et al.: An accurate scalable template-based alignment algorithm. In: Proc. International Conference on Bioinformatics and Biomedicine, 2012, pp. 237–243 (2012)

  88. Edgar, R.C.: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 5, 113 (2004)

  89. Mirarab, S., Warnow, T.: FastSP: linear-time calculation of alignment accuracy. Bioinformatics 27, 3250–3258 (2011)

  90. Blackburne, B., Whelan, S.: Measuring the distance between multiple sequence alignments. Bioinformatics 28, 495–502 (2012)

  91. Stojanovic, N., Florea, L., Riemer, C., Gumucio, D., Slightom, J., et al.: Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions. Nucleic Acids Res. 27, 3899–3910 (1999)

  92. Edgar, R.: Quality measures for protein alignment benchmarks. Nucleic Acids Res. 38, 2145–2153 (2010)

  93. Thompson, J.D., Plewniak, F., Poch, O.: A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27, 2682–2690 (1999)

  94. Thompson, J., Plewniak, F., Poch, O.: BAliBASE: a benchmark alignments database for the evaluation of multiple sequence alignment programs. Bioinformatics 15, 87–88 (1999)

  95. Raghava, G., Searle, S.M., Audley, P.C., Barber, J.D., Barton, G.J.: Oxbench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinform. 4, 47 (2003)

  96. Gardner, P., Wilm, A., Washietl, S.: A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res. 33, 2433–2439 (2005)

  97. Walle, I.L.V., Wyns, L.: SABmark-a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 21, 1267–1268 (2005)

  98. Carroll, H., Beckstead, W., O’Connor, T., Ebbert, M., Clement, M., et al.: DNA reference alignment benchmarks based on tertiary structure of encoded proteins. Bioinformatics 23, 2648–2649 (2007)

  99. Blazewicz, J., Formanowicz, P., Wojciechowski, P.: Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark. Int. J. Appl. Math. Comput. Sci. 19, 675–678 (2009)

  100. Iantorno, S., Gori, K., Goldman, N., Gil, M., Dessimoz, C.: Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment. arXiv:1211.2160 [q-bio.QM] (2012)

  101. Aniba, M., Poch, O., Thompson, J.D.: Issues in bioinformatics benchmarking: the case study of multiple sequence alignment. Nucleic Acids Res. 38, 7353–7363 (2010)

  102. Morrison, D.A.: Why would phylogeneticists ignore computerized sequence alignment? Syst. Biol. 58, 150–158 (2009)

  103. Reeck, G., de Haen, C., Teller, D., Doolittle, R., Fitch, W., et al.: “Homology” in proteins and nucleic acids: a terminology muddle and a way out of it. Cell 50, 667 (1987)

  104. Galperin, M., Koonin, E.: Divergence and convergence in enzyme evolution. J. Biol. Chem. 287, 21–28 (2012)

  105. Sjölander, K.: Getting started in structural phylogenomics. PLoS Comput. Biol. 6, e1000621 (2010)

  106. Katoh, K., Kuma, K., Miyata, T., Toh, H.: Improvement in the accuracy of multiple sequence alignment MAFFT. Genome Inf. 16, 22–33 (2005)

  107. Do, C., Mahabhashyam, M., Brudno, M., Batzoglou, S.: PROBCONS: probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330–340 (2005)

  108. Löytynoja, A., Goldman, N.: An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl. Acad. Sci. 102, 10557–10562 (2005)

  109. Nelesen, S., Liu, K., Zhao, D., Linder, C.R., Warnow, T.: The effect of the guide tree on multiple sequence alignments and subsequent phylogenetic analyses. In: Proc. Pacific Symposium on Biocomputing, vol. 13, pp. 15–24 (2008)

  110. Fletcher, W., Yang, Z.: The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol. Biol. Evol. 27, 2257–2267 (2010)

  111. Penn, O., Privman, E., Landan, G., Graur, D., Pupko, T.: An alignment confidence score capturing robustness to guide tree uncertainty. Mol. Biol. Evol. 27, 1759–1767 (2010)

  112. Toth, A., Hausknecht, A., Krisai-Greilhuber, I., Papp, T., Vagvolgyi, C., et al.: Iteratively refined guide trees help improving alignment and phylogenetic inference in the mushroom family Bolbitiaceae. PLoS ONE 8, e56143 (2013)

  113. Capella-Gutiérrez, S., Gabaldón, T.: Measuring guide-tree dependency of inferred gaps for progressive aligners. Bioinformatics 29(8), 1011–1017 (2013)

  114. Pruesse, E., Quast, C., Knittel, K., Fuchs, B., Ludwig, W., et al.: SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35, 7188–7196 (2007)

  115. DeSantis, T., Hugenholtz, P., Keller, K., Brodie, E., Larsen, N., et al.: NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res. 34, W394–W399 (2006)

  116. Löytynoja, A., Vilella, A.J., Goldman, N.: Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm. Bioinformatics 28, 1685–1691 (2012)

  117. Papadopoulos, J.S., Agarwala, R.: COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23, 1073–1079 (2007)

  118. Berger, S.A., Stamatakis, A.: Aligning short reads to reference alignments and trees. Bioinformatics 27, 2068–2075 (2011)

  119. Sievers, F., Dineen, D., Wilm, A., Higgins, D.G.: Making automated multiple alignments of very large numbers of protein sequences. Bioinformatics 29(8), 989–995 (2013)

  120. Smith, S., Beaulieu, J., Donoghue, M.: Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches. BMC Evol. Biol. 9, 37 (2009)

  121. Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987)

  122. Roquet, C., Thuiller, W., Lavergne, S.: Building megaphylogenies for macroecology: taking up the challenge. Ecography 36, 13–26 (2013)

  123. Steel, M.A.: Recovering a tree from the leaf colourations it generates under a Markov model. Appl. Math. Lett. 7, 19–24 (1994)

  124. Evans, S., Warnow, T.: Unidentifiable divergence times in rates-across-sites models. IEEE/ACM Trans. Comput. Biol. Bioinform. 1, 130–134 (2005)

  125. Tavaré, S.: Some probabilistic and statistical problems in the analysis of DNA sequences. In: Lectures on Mathematics in the Life Sciences, vol. 17, pp. 57–86 (1986)

  126. Dayhoff, M., Schwartz, R., Orcutt, B.: A model of evolutionary change in proteins. In: Dayhoff, M. (ed.) Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, pp. 345–352 (1978)

  127. Lakner, C., Holder, M., Goldman, N., Naylor, G.: What’s in a likelihood? Simple models of protein evolution and the contribution of structurally viable reconstructions to the likelihood. Syst. Biol. 60, 161–174 (2011)

  128. Le, S., Gascuel, O.: An improved general amino acid replacement matrix. Mol. Biol. Evol. 25, 1307–1320 (2008)

  129. Whelan, S., Goldman, N.: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699 (2001)

  130. Kosiol, C., Goldman, N.: Different versions of the Dayhoff rate matrix. Mol. Biol. Evol. 22, 193–199 (2005)

  131. Thorne, J.: Models of protein sequence evolution and their applications. Curr. Opin. Genet. Dev. 10, 602–605 (2000)

  132. Thorne, J., Goldman, N.: Probabilistic models for the study of protein evolution. In: Balding, D., Bishop, M., Cannings, C. (eds.) Handbook of Statistical Genetics, pp. 209–226. Wiley, New York (2003)

  133. Adachi, J., Hasegawa, M.: Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol. 42, 459–468 (1996)

  134. Goldman, N., Yang, Z.: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11, 725–736 (1994)

  135. Scherrer, M., Meyer, A., Wilke, C.: Modeling coding-sequence evolution within the context of residue solvent accessibility. BMC Evol. Biol. 12, 179 (2012)

  136. Mayrose, I., Doron-Faigenbom, A., Bacharach, E., Pupko, T.: Towards realistic codon models: among site variability and dependency of synonymous and non-synonymous rates. Bioinformatics 23, i319–i327 (2007)

  137. Abascal, F., Zardoya, R., Posada, D.: ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21, 2104–2105 (2005)

  138. Wilke, C.: Bringing molecules back into molecular evolution. PLoS Comput. Biol. 8, e1002572 (2012)

  139. Liberles, D., Teichmann, S., et al.: The inference of protein structure, protein biophysics, and molecular evolution. Protein Sci. 21, 769–785 (2012)

  140. Lopez, P., Casane, D., Philippe, H.: Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19, 1–7 (2002)

  141. Whelan, S.: Spatial and temporal heterogeneity in nucleotide sequence evolution. Mol. Biol. Evol. 25, 1683–1694 (2008)

  142. Tuffley, C., Steel, M.: Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull. Math. Biol. 59, 581–607 (1997)

  143. Steel, M.A.: Can we avoid ‘SIN’ in the house of ‘No Common Mechanism’? Syst. Biol. 60, 96–109 (2011)

  144. Lobkovsky, A., Wolf, Y., Koonin, E.: Gene frequency distributions reject a neutral model of genome evolution. Genome Biol. Evol. 5, 233–242 (2013)

  145. Galtier, N., Gouy, M.: Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15, 871–879 (1998)

  146. Foulds, L.R., Graham, R.L.: The Steiner problem in phylogeny is NP-complete. Adv. Appl. Math. 3, 43–49 (1982)

  147. Felsenstein, J.: Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376 (1981)

  148. Allman, E.S., Ané, C., Rhodes, J.: Identifiability of a Markovian model of molecular evolution with gamma-distributed rates. Adv. Appl. Probab. 40, 229–249 (2008)

  149. Allman, E.S., Rhodes, J.: Identifying evolutionary trees and substitution parameters for the general Markov model with invariable sites. Math. Biosci. 211, 18–33 (2008)

  150. Allman, E.S., Rhodes, J.A.: The identifiability of tree topology for phylogenetic models, including covariant and mixture models. J. Comput. Biol. 13, 1101–1113 (2006)

  151. Atteson, K.: The performance of neighbor-joining methods of phylogenetic reconstruction. Algorithmica 25, 251–278 (1999)

  152. Chang, J.: Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math. Biosci. 137, 51–73 (1996)

  153. Steel, M.A.: Consistency of Bayesian inference of resolved phylogenetic trees. arXiv:1001.2864 [q-bio.PE] (2010)

  154. Felsenstein, J.: Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27, 401–410 (1978)

  155. Chang, J.T.: Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters. Math. Biosci. 134, 189–215 (1996)

  156. Matsen, F., Steel, M.: Phylogenetic mixtures on a single tree can mimic a tree of another topology. Syst. Biol. 56, 767–775 (2007)

  157. Allman, E., Rhodes, J., Sullivant, S.: When do phylogenetic mixture models mimic other phylogenetic models? Syst. Biol. 61, 1049–1059 (2012)

  158. Erdos, P., Steel, M., Szekely, L., Warnow, T.: Local quartet splits of a binary tree infer all quartet splits via one dyadic inference rule. Comput. Artif. Intell. 16, 217–227 (1997)

  159. Erdos, P., Steel, M., Szekely, L., Warnow, T.: A few logs suffice to build (almost) all trees (i). Random Struct. Algorithms 14, 153–184 (1999)

  160. Erdos, P., Steel, M., Szekely, L., Warnow, T.: A few logs suffice to build (almost) all trees (ii). Theor. Comput. Sci. 221, 77–118 (1999)

  161. Lacey, M.R., Chang, J.T.: A signal-to-noise analysis of phylogeny estimation by neighbor-joining: insufficiency of polynomial length sequences. Math. Biosci. 199, 188–215 (2006)

  162. Csűrös, M., Kao, M.Y.: Recovering evolutionary trees through harmonic greedy triplets. Proc. SODA 99, 261–270 (1999)

  163. Csűrös, M.: Fast recovery of evolutionary trees with thousands of nodes. J. Comput. Biol. 9, 277–297 (2002)

  164. Huson, D., Nettles, S., Warnow, T.: Disk-covering, a fast converging method for phylogenetic tree reconstruction. J. Comput. Biol. 6, 369–386 (1999)

  165. Steel, M.A., Székely, L.A.: Inverting random functions. Ann. Comb. 3, 103–113 (1999)

  166. Steel, M.A., Székely, L.A.: Inverting random functions—II: explicit bounds for discrete maximum likelihood estimation, with applications. SIAM J. Discrete Math. 15, 562–575 (2002)

  167. King, V., Zhang, L., Zhou, Y.: On the complexity of distance-based evolutionary tree reconstruction. In: SODA: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. 444–453 (2003)

  168. Mossel, E., Roch, S.: Learning nonsingular phylogenies and hidden Markov models. In: Proc. 37th Symp. on the Theory of Computing (STOC’05), pp. 366–376 (2005)

  169. Mossel, E., Roch, S.: Learning nonsingular phylogenies and hidden Markov models. Ann. Appl. Probab. 16, 538–614 (2006)

  170. Daskalakis, C., Mossel, E., Roch, S.: Optimal phylogenetic reconstruction. In: STOC’06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pp. 159–168 (2006)

  171. Daskalakis, C., Hill, C., Jaffe, A., Mihaescu, R., Mossel, E., et al.: Maximal accurate forests from distance matrices. In: RECOMB, pp. 281–295 (2006)

  172. Mossel, E.: Distorted metrics on trees and phylogenetic forests. IEEE/ACM Trans. Comput. Biol. Bioinform. 4, 108–116 (2007)

  173. Gronau, I., Moran, S., Snir, S.: Fast and reliable reconstruction of phylogenetic trees with very short edges. In: SODA (ACM/SIAM Symp. Disc. Alg), pp. 379–388 (2008)

  174. Roch, S.: Sequence-length requirement for distance-based phylogeny reconstruction: breaking the polynomial barrier. In: FOCS (Foundations of Computer Science), pp. 729–738 (2008)

  175. Daskalakis, C., Mossel, E., Roch, S.: Phylogenies without branch bounds: contracting the short, pruning the deep. In: RECOMB, pp. 451–465 (2009)

  176. Lin, Y., Rajan, V., Moret, B.: A metric for phylogenetic trees based on matching. IEEE/ACM Trans. Comput. Biol. Bioinform. 9, 1014–1022 (2012)

  177. Rannala, B., Huelsenbeck, J., Yang, Z., Nielsen, R.: Taxon sampling and the accuracy of large phylogenies. Syst. Biol. 47, 702–710 (1998)

  178. Robinson, D., Foulds, L.: Comparison of phylogenetic trees. Math. Biosci. 53, 131–147 (1981)

  179. Huelsenbeck, J., Hillis, D.: Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42, 247–265 (1993)

  180. Hillis, D.: Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst. Biol. 47, 3–8 (1998)

  181. Nakhleh, L., Moret, B., Roshan, U., St John, K., Sun, J., et al.: The accuracy of fast phylogenetic methods for large datasets. In: Proc. 7th Pacific Symposium on BioComputing, pp. 211–222. World Scientific, Singapore (2002)

  182. Zwickl, D.J., Hillis, D.M.: Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51, 588–598 (2002)

  183. Pollock, D.D., Zwickl, D.J., McGuire, J.A., Hillis, D.M.: Increased taxon sampling is advantageous for phylogenetic inference. Syst. Biol. 51, 664–671 (2002)

  184. Wiens, J.: Missing data and the design of phylogenetic analyses. J. Biomed. Inform. 39, 36–42 (2006)

  185. Lemmon, A., Brown, J., Stanger-Hall, K., Lemmon, E.: The effect of ambiguous data on phylogenetic estimates obtained by maximum-likelihood and Bayesian inference. Syst. Biol. 58, 130–145 (2009)

  186. Wiens, J., Morrill, M.: Missing data in phylogenetic analysis: reconciling results from simulations and empirical data. Syst. Biol. 60, 719–731 (2011)

  187. Simmons, M.: Misleading results of likelihood-based phylogenetic analyses in the presence of missing data. Cladistics 28, 208–222 (2012)

  188. Moret, B., Roshan, U., Warnow, T.: Sequence-length requirements for phylogenetic methods. In: Guigo, R., Gusfield, D. (eds.) Proc. 2nd International Workshop on Algorithms in Bioinformatics. Lecture Notes in Computer Science, vol. 2452, pp. 343–356. Springer, Berlin (2002)

  189. Gascuel, O.: BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol. 14, 685–695 (1997)

  190. Bruno, W.J., Socci, N.D., Halpern, A.L.: Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol. 17, 189–197 (2000)

  191. Wheeler, T.: Large-scale neighbor-joining with NINJA. In: Proc. Workshop Algorithms in Bioinformatics (WABI), vol. 5724, pp. 375–389 (2009)

  192. Desper, R., Gascuel, O.: Fast and accurate phylogeny reconstruction algorithm based on the minimum-evolution principle. J. Comput. Biol. 9, 687–705 (2002)

  193. Price, M., Dehal, P., Arkin, A.: FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26(7), 1641–1650 (2009)

  194. Brown, D., Truszkowski, J.: Towards a practical O(nlogn) phylogeny algorithm. In: Proc. Workshop Algorithms in Bioinformatics (WABI), pp. 14–25 (2011)

  195. Rice, K., Warnow, T.: Parsimony is hard to beat! In: Jiang, T., Lee, D. (eds.) Proceedings, Third Annual International Conference of Computing and Combinatorics (COCOON), pp. 124–133 (1997)

  196. Hillis, D., Huelsenbeck, J., Swofford, D.: Hobgoblin of phylogenetics. Nature 369, 363–364 (1994)

  197. Swofford, D.: PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods), Version 4.0. Sinauer Associates, Sunderland (1996)

  198. Roch, S.: A short proof that phylogenetic tree reconstruction by maximum likelihood is hard. IEEE/ACM Trans. Comput. Biol. Bioinform. 3, 92–94 (2006)

  199. Guindon, S., Gascuel, O.: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003)

  200. Zwickl, D.: Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. thesis, The University of Texas at Austin (2006)

  201. Liu, K., Linder, C., Warnow, T.: RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS ONE 6, e27731 (2011)

  202. Claesson, M.J., Cusack, S., O’Sullivan, O., Greene-Diniz, R., de Weerd, H., et al.: Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proc. Natl. Acad. Sci. 108, 4586–4591 (2011)

  203. McDonald, D., Price, M.N., Goodrich, J., Nawrocki, E.P., DeSantis, T.Z., et al.: An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J. 6, 610–618 (2012)

  204. Boussau, B., Gouy, M.: Efficient likelihood computations with non-reversible models of evolution. Syst. Biol. 55, 756–768 (2006)

  205. Whelan, S., Money, D.: The prevalence of multifurcations in tree-space and their implications for tree-search. Mol. Biol. Evol. 27, 2674–2677 (2010)

  206. Whelan, S., Money, D.: Characterizing the phylogenetic tree-search problem. Syst. Biol. 61, 228–239 (2012)

  207. Ronquist, F., Huelsenbeck, J.: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003)

  208. Drummond, A., Rambaut, A.: BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007)

  209. Lartillot, N., Philippe, H.: A Bayesian mixture model for across-site heterogeneities in the amino acid replacement process. Mol. Biol. Evol. 21 (2004)

  210. Foster, P.: Modeling compositional heterogeneity. Syst. Biol. 53, 485–495 (2004)

  211. Pagel, M., Meade, A.: A phylogenetic mixture model for detecting pattern heterogeneity in gene sequence or character state data. Syst. Biol. 53, 571–581 (2004)

  212. Huelsenbeck, J., Ronquist, F.: MrBayes: Bayesian inference of phylogeny. Bioinformatics 17, 754–755 (2001)

  213. Ronquist, F., Deans, A.: Bayesian phylogenetics and its influence on insect systematics. Annu. Rev. Entomol. 55, 189–206 (2010)

  214. Huelsenbeck, J.P., Ronquist, F., Nielsen, R., Bollback, J.P.: Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294, 2310–2314 (2001)

  215. Holder, M., Lewis, P.: Phylogeny estimation: traditional and Bayesian approaches. Nat. Rev. Genet. 4, 275–284 (2003)

  216. Lewis, P., Holder, M., Holsinger, K.: Polytomies and Bayesian phylogenetic inference. Syst. Biol. 54, 241–253 (2005)

  217. Ganapathy, G., Ramachandran, V., Warnow, T.: On contract-and-refine-transformations between phylogenetic trees. In: ACM/SIAM Symposium on Discrete Algorithms (SODA’04), pp. 893–902. SIAM Press, Philadelphia (2004)

  218. Ganapathy, G., Ramachandran, V., Warnow, T.: Better hill-climbing searches for parsimony. In: Proceedings of the Third International Workshop on Algorithms in Bioinformatics (WABI), pp. 245–258 (2003)

  219. Bonet, M., Steel, M., Warnow, T., Yooseph, S.: Faster algorithms for solving parsimony and compatibility. J. Comput. Biol. 5, 409–422 (1999)

  220. Nixon, K.C.: The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15, 407–414 (1999)

  221. Vos, R.: Accelerated likelihood surface exploration: the likelihood ratchet. Syst. Biol. 52, 368–373 (2003)

  222. Warnow, T., Moret, B.M.E., St John, K.: Absolute phylogeny: true trees from short sequences. In: Proc. 12th Ann. ACM/SIAM Symp. on Discr. Algs., SODA01, pp. 186–195. SIAM Press, Philadelphia (2001)

  223. Nakhleh, L., Roshan, U., St John, K., Sun, J., Warnow, T.: Designing fast converging phylogenetic methods. Bioinformatics 17, 190–198 (2001)

  224. Warnow, T.: Large-scale phylogenetic reconstruction. In: Aluru, S. (ed.) Handbook of Computational Molecular Biology. Chapman and Hall/CRC Computer and Information Science Series, vol. 9. CRC Press, Boca Raton (2005)

  225. Roshan, U., Moret, B., Williams, T., Warnow, T.: Rec-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic trees. In: Proc. 3rd Computational Systems Biology Conf. (CSB’05). Proceedings of the IEEE, pp. 98–109 (2004)

  226. Steel, M.: The maximum likelihood point for a phylogenetic tree is not unique. Syst. Biol. 43, 560–564 (1994)

  227. Blair, C., Murphy, R.: Recent trends in molecular phylogenetic analysis: where to next? J. Heredity 102, 130–138 (2011)

  228. Nagy, L., Kocsube, S., Csanadi, Z., Kovacs, G., Petkovits, T., et al.: Re-mind the gap! Insertion and deletion data reveal neglected phylogenetic potential of the nuclear ribosomal internal transcribed spacer (ITS) of fungi. PLoS ONE 7, e49794 (2012)

  229. Barriel, V.: Molecular phylogenies and nucleotide insertion-deletions. C. R. Acad. Sci. III 7, 693–701 (1994)

  230. Young, N., Healy, J.: GapCoder automates the use of indel characters in phylogenetic analysis. BMC Bioinform. 4 (2003)

  231. Muller, K.: Incorporating information from length-mutational events into phylogenetic analysis. Mol. Phylogenet. Evol. 38, 667–676 (2006)

  232. Ogden, T., Rosenberg, M.: How should gaps be treated in parsimony? A comparison of approaches using simulation. Mol. Phylogenet. Evol. 42, 817–826 (2007)

  233. Dwivedi, B., Gadagkar, S.: Phylogenetic inference under varying proportions of indel-induced alignment gaps. BMC Evol. Biol. 9, 211 (2009)

  234. Dessimoz, C., Gil, M.: Phylogenetic assessment of alignments reveals neglected tree signal in gaps. Genome Biol. 11, R37 (2010)

  235. Yuri, T., Kimball, R.T., Harshman, J., Bowie, R.C.K., Braun, M.J., et al.: Parsimony and model-based analyses of indels in avian nuclear genes reveal congruent and incongruent phylogenetic signals. Biology 2, 419–444 (2013)

  236. Warnow, T.: Standard maximum likelihood analyses of alignments with gaps can be statistically inconsistent. PLoS Currents Tree of Life (2012)

  237. Daskalakis, C., Roch, S.: Alignment-free phylogenetic reconstruction. In: Berger, B. (ed.) Proc. RECOMB 2010. Lecture Notes in Computer Science, vol. 6044, pp. 123–137. Springer, Berlin (2010). http://dx.doi.org/10.1007/978-3-642-12683-3_9

  238. Thatte, B.: Invertibility of the TKF model of sequence evolution. Math. Biosci. 200, 58–75 (2006)

  239. Hartmann, S., Vision, T.: Using ESTs for phylogenomics: can one accurately infer a phylogenetic tree from a gappy alignment? BMC Evol. Biol. 8, 95 (2008)

  240. Mirarab, S., Nguyen, N., Warnow, T.: SEPP: SATé-enabled phylogenetic placement. In: Pacific Symposium on Biocomputing, pp. 247–258 (2012)

  241. Matsen, F.A., Kodner, R.B., Armbrust, E.V.: pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinform. 11, 538 (2010)

  242. Berger, S.A., Krompass, D., Stamatakis, A.: Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Syst. Biol. 60, 291–302 (2011)

  243. Eddy, S.: A new generation of homology search tools based on probabilistic inference. Genome Inform. 23, 205–211 (2009)

  244. Finn, R., Clements, J., Eddy, S.: HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011)

  245. Brown, D.G., Truszkowski, J.: LSHPlace: fast phylogenetic placement using locality-sensitive hashing. In: Pacific Symposium on Biocomputing, vol. 18, pp. 310–319 (2013)

  246. Stark, M., Berger, S., Stamatakis, A., von Mering, C.: MLTreeMap—accurate maximum likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies. BMC Genomics 11, 461 (2010)

  247. Droge, J., McHardy, A.: Taxonomic binning of metagenome samples generated by next-generation sequencing technologies. Brief. Bioinform. (2012)

  248. Giribet, G.: Exploring the behavior of POY, a program for direct optimization of molecular data. Cladistics 17, S60–S70 (2001)

  249. Hartigan, J.: Minimum mutation fits to a given tree. Biometrics 29, 53–65 (1973)

  250. Sankoff, D.: Minimal mutation trees of sequences. SIAM J. Appl. Math. 28, 35–42 (1975)

  251. Sankoff, D., Cedergren, R.J.: Simultaneous comparison of three or more sequences related by a tree. In: Sankoff, D., Kruskal, J.B. (eds.) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, pp. 253–263. Addison Wesley, New York (1993)

  252. Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. J. Comput. Biol. 1, 337–348 (1994)

  253. Wang, L., Jiang, T., Lawler, E.: Approximation algorithms for tree alignment with a given phylogeny. Algorithmica 16, 302–315 (1996)

  254. Wang, L., Gusfield, D.: Improved approximation algorithms for tree alignment. J. Algorithms 25(2), 255–273 (1997)

  255. Wang, L., Jiang, T., Gusfield, D.: A more efficient approximation scheme for tree alignment. SIAM J. Comput. 30(1), 283–299 (2000)

  256. Liu, K., Warnow, T.: Treelength optimization for phylogeny estimation. PLoS ONE 7, e33104 (2012)

  257. Varón, A., Vinh, L., Bomash, I., Wheeler, W.: POY software. Documentation by Varon, A., Vinh, L.S., Bomash, I., Wheeler, W., Pickett, K., Temkin, I., Faivovich, J., Grant, T., Smith, W.L. Available for download at http://research.amnh.org/scicomp/projects/poy.php (2007)

  258. Kjer, K., Gillespie, J., Ober, K.: Opinions on multiple sequence alignment, and an empirical comparison on repeatability and accuracy between POY and structural alignment. Syst. Biol. 56, 133–146 (2007)

  259. Ogden, T.H., Rosenberg, M.: Alignment and topological accuracy of the direct optimization approach via POY and traditional phylogenetics via ClustalW+PAUP*. Syst. Biol. 56, 182–193 (2007)

  260. Yoshizawa, K.: Direct optimization overly optimizes data. Syst. Entomol. 35, 199–206 (2010)

  261. Wheeler, W., Giribet, G.: Phylogenetic hypotheses and the utility of multiple sequence alignment. In: Rosenberg, M. (ed.) Sequence Alignment: Methods, Models, Concepts and Strategies, pp. 95–104. University of California Press, Berkeley (2009)

  262. Lehtonen, S.: Phylogeny estimation and alignment via POY versus Clustal + PAUP*: a response to Ogden and Rosenberg. Syst. Biol. 57, 653–657 (2008)

  263. Liu, K., Nelesen, S., Raghavan, S., Linder, C., Warnow, T.: Barking up the wrong treelength: the impact of gap penalty on alignment and tree accuracy. IEEE/ACM Trans. Comput. Biol. Bioinform. 6, 7–21 (2009)

  264. Gu, X., Li, W.H.: The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J. Mol. Evol. 40, 464–473 (1995)

  265. Altschul, S.F.: Generalized affine gap costs for protein sequence alignment. Proteins, Struct. Funct. Genomics 32, 88–96 (1998)

  266. Gill, O., Zhou, Y., Mishra, B.: Aligning sequences with non-affine gap penalty: PLAINS algorithm, a practical implementation, and its biological applications in comparative genomics. In: Proc. ICBA 2004 (2004)

  267. Qian, B., Goldstein, R.: Distribution of indel lengths. Proteins 45, 102–104 (2001)

  268. Chang, M., Benner, S.: Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments. J. Mol. Biol. 341, 617–631 (2004)

  269. Thorne, J.L., Kishino, H., Felsenstein, J.: An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 33, 114–124 (1991)

  270. Thorne, J.L., Kishino, H., Felsenstein, J.: Inching toward reality: an improved likelihood model of sequence evolution. J. Mol. Evol. 34, 3–16 (1992)

  271. Thorne, J.L., Kishino, H., Felsenstein, J.: Erratum, an evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 34, 91–92 (1992)

  272. Rivas, E.: Evolutionary models for insertions and deletions in a probabilistic modeling framework. BMC Bioinform. 6, 30 (2005)

  273. Rivas, E., Eddy, S.: Probabilistic phylogenetic inference with insertions and deletions. PLoS Comput. Biol. 4, e1000172 (2008)

  274. Holmes, I., Bruno, W.J.: Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics 17, 803–820 (2001)

  275. Miklós, I., Lunter, G.A., Holmes, I.: A “long indel model” for evolutionary sequence alignment. Mol. Biol. Evol. 21, 529–540 (2004)

  276. Redelings, B., Suchard, M.: Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. 54, 401–418 (2005)

  277. Suchard, M.A., Redelings, B.D.: BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny. Bioinformatics 22, 2047–2048 (2006)

  278. Redelings, B., Suchard, M.: Incorporating indel information into phylogeny estimation for rapidly emerging pathogens. BMC Evol. Biol. 7, 40 (2007)

  279. Fleissner, R., Metzler, D., von Haeseler, A.: Simultaneous statistical multiple alignment and phylogeny reconstruction. Syst. Biol. 54, 548–561 (2005)

  280. Novák, A., Miklós, I., Lyngso, R., Hein, J.: StatAlign: an extendable software package for joint Bayesian estimation of alignments and evolutionary trees. Bioinformatics 24, 2403–2404 (2008)

  281. Lunter, G.A., Miklós, I., Song, Y.S., Hein, J.: An efficient algorithm for statistical multiple alignment on arbitrary phylogenetic trees. J. Comput. Biol. 10, 869–889 (2003)

  282. Lunter, G., Miklós, I., Drummond, A., Jensen, J.L., Hein, J.: Bayesian phylogenetic inference under a statistical indel model. In: Benson, G., Page, R. (eds.) Third International Workshop (WABI 2003). Lecture Notes in Bioinformatics vol. 2812, pp. 228–244. Springer, Berlin (2003)

  283. Lunter, G., Drummond, A., Miklós, I., Hein, J.: Statistical alignment: recent progress, new applications, and challenges. In: Nielsen, R. (ed.) Statistical Methods in Molecular Evolution (Statistics for Biology and Health), pp. 375–406. Springer, Berlin (2005)

  284. Metzler, D.: Statistical alignment based on fragment insertion and deletion models. Bioinformatics 19, 490–499 (2003)

  285. Miklós, I.: Algorithm for statistical alignment of sequences derived from a Poisson sequence length distribution. Discrete Appl. Math. 127, 79–84 (2003)

  286. Arunapuram, P., Edvardsson, I., Golden, M., Anderson, J., Novak, A., et al.: StatAlign 2.0: combining statistical alignment with RNA secondary structure prediction. Bioinformatics 29(5), 654–655 (2013)

  287. Lunter, G., Miklós, I., Drummond, A., Jensen, J.L., Hein, J.: Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinform. 6, 83 (2005)

  288. Bouchard-Côté, A., Jordan, M.I.: Evolutionary inference via the Poisson indel process. Proc. Natl. Acad. Sci. 110, 1160–1166 (2013)

  289. Brown, D., Krishnamurthy, N., Sjolander, K.: Automated protein subfamily identification and classification. PLoS Comput. Biol. 3, e160 (2007)

  290. Vinga, S., Almeida, J.: Alignment-free sequence comparison—a review. Bioinformatics 19, 513–523 (2003)

  291. Chan, C., Ragan, M.: Next-generation phylogenomics. Biol. Direct 8 (2013)

  292. Blaisdell, B.: A measure of the similarity of sets of sequences not requiring sequence alignment. Proc. Natl. Acad. Sci. USA 83, 5155–5159 (1986)

  293. Sims, G., Jun, S.R., Wu, G., Kim, S.H.: Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc. Natl. Acad. Sci. USA 106, 2677–2682 (2009)

  294. Jun, S.R., Sims, G., Wu, G., Kim, S.H.: Whole-proteome phylogeny of prokaryotes by feature frequency profiles: an alignment-free method with optimal feature resolution. Proc. Natl. Acad. Sci. USA 107, 133–138 (2010)

  295. Liu, X., Wan, L., Li, J., Reinert, G., Waterman, M., et al.: New powerful statistics for alignment-free sequence comparison under a pattern transfer model. J. Theor. Biol. 284, 106–116 (2011)

  296. Yang, K., Zhang, L.: Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction. Nucleic Acids Res. 36, e33 (2008)

  297. Roshan, U., Moret, B.M.E., Williams, T.L., Warnow, T.: Performance of supertree methods on various dataset decompositions. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, pp. 301–328. Kluwer Academic, Dordrecht (2004)

  298. Nelesen, S.: Improved methods for phylogenetics. Ph.D. thesis, The University of Texas at Austin (2009)

  299. Swenson, M.: Phylogenetic supertree methods. Ph.D. thesis, The University of Texas at Austin (2008)

  300. Neves, D., Warnow, T., Sobral, J., Pingali, K.: Parallelizing SuperFine. In: 27th Symposium on Applied Computing (ACM-SAC) (2012)

  301. Cannone, J., Subramanian, S., Schnare, M., Collett, J., D’Souza, L., et al.: The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron and other RNAs. BMC Bioinform. 3 (2002)

  302. Roch, S.: Towards extracting all phylogenetic information from matrices of evolutionary distances. Science 327, 1376–1379 (2010)

  303. Darling, A., Mau, B., Blatter, F., Perna, N.: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394–1403 (2004)

  304. Darling, A., Mau, B., Perna, N.: progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5, e11147 (2010)

  305. Raphael, B., Zhi, D., Tang, H., Pevzner, P.: A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res. 14, 2336–2346 (2004)

  306. Dubchak, I., Poliakov, A., Kislyuk, A., Brudno, M.: Multiple whole-genome alignments without a reference organism. Genome Res. 19, 682–689 (2009)

  307. Brudno, M., Do, C., Cooper, G., Kim, M., Davydov, E., et al.: LAGAN and multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13, 721–731 (2003)

  308. Phuong, T., Do, C., Edgar, R., Batzoglou, S.: Multiple alignment of protein sequences with repeats and rearrangements. Nucleic Acids Res. 34, 5932–5942 (2006)

  309. Paten, B., Earl, D., Nguyen, N., Diekhans, M., Zerbino, D., et al.: Cactus: algorithms for genome multiple sequence alignment. Genome Res. 21, 1512–1528 (2011)

  310. Angiuoli, S., Salzberg, S.: Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics (2011). doi:10.1093/bioinformatics/btq665

  311. Agren, J., Sundstrom, A., Hafstrom, T., Segerman, B.: Gegenees: fragmented alignment of multiple genomes for determining phylogenomic distances and genetic signatures unique for specified target groups. PLoS ONE 7, e39107 (2012)

  312. Gogarten, J., Doolittle, W., Lawrence, J.: Prokaryotic evolution in light of gene transfer. Mol. Biol. Evol. 19, 2226–2238 (2002)

  313. Gogarten, J., Townsend, J.: Horizontal gene transfer, genome innovation and evolution. Nat. Rev. Microbiol. 3, 679–687 (2005)

  314. Bergthorsson, U., Richardson, A., Young, G., Goertzen, L., Palmer, J.: Massive horizontal transfer of mitochondrial genes from diverse land plant donors to basal angiosperm Amborella. Proc. Natl. Acad. Sci. USA 101, 17747–17752 (2004)

  315. Bergthorsson, U., Adams, K., Thomason, B., Palmer, J.: Widespread horizontal transfer of mitochondrial genes in flowering plants. Nature 424, 197–201 (2003)

  316. Wolf, Y., Rogozin, I., Grishin, N., Koonin, E.: Genome trees and the tree of life. Trends Genet. 18, 472–478 (2002)

  317. Koonin, E., Makarova, K., Aravind, L.: Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55, 709–742 (2001)

  318. Linder, C., Rieseberg, L.: Reconstructing patterns of reticulate evolution in plants. Am. J. Bot. 91, 1700–1708 (2004)

  319. Sessa, E., Zimmer, E., Givnish, T.: Reticulate evolution on a global scale: a nuclear phylogeny for New World Dryopteris (Dryopteridaceae). Mol. Phylogenet. Evol. 64, 563–581 (2012)

  320. Moody, M., Rieseberg, L.: Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus). Mol. Phylogenet. Evol. 64, 145–155 (2012)

  321. Mindell, D.: The tree of life: metaphor, model, and heuristic device. Syst. Biol. 62(3), 479–489 (2013)

  322. Warnow, T., Evans, S., Ringe, D., Nakhleh, L.: A stochastic model of language evolution that incorporates homoplasy and borrowing. In: Phylogenetic Methods and the Prehistory of Languages, pp. 75–90. Cambridge University Press, Cambridge (2006)

  323. Nakhleh, L., Ringe, D.A., Warnow, T.: Perfect phylogenetic networks: a new methodology for reconstructing the evolutionary history of natural languages. Language 81, 382–420 (2005)

  324. Huson, D., Rupp, R., Scornavacca, C.: Phylogenetic Networks: Concepts, Algorithms and Applications. Cambridge University Press, Cambridge (2010)

  325. Morrison, D.: Introduction to Phylogenetic Networks. RJR Productions, Uppsala (2011)

  326. Nakhleh, L.: Evolutionary phylogenetic networks: models and issues. In: Problem Solving Handbook in Computational Biology and Bioinformatics, pp. 125–158. Springer, Berlin (2011)

  327. van Iersel, L., Kelk, S., Rupp, R., Huson, D.: Phylogenetic networks do not need to be complex: using fewer reticulations to represent conflicting clusters. Bioinformatics 26, i124–i131 (2010)

  328. Wu, Y.: An algorithm for constructing parsimonious hybridization networks with multiple phylogenetic trees. In: Proc. RECOMB (2013)

  329. Jin, G., Nakhleh, L., Snir, S., Tuller, T.: Maximum likelihood of phylogenetic networks. Bioinformatics 22, 2604–2611 (2006)

  330. Jin, G., Nakhleh, L., Snir, S., Tuller, T.: Inferring phylogenetic networks by the maximum parsimony criterion: a case study. Mol. Biol. Evol. 24, 324–337 (2007)

  331. Nakhleh, L., Warnow, T., Linder, C.: Reconstructing reticulate evolution in species—theory and practice. In: Proc. 8th Conf. Comput. Mol. Biol. (RECOMB’04), pp. 337–346. ACM Press, New York (2004)

  332. Nakhleh, L., Ruths, D., Wang, L.S.: RIATA-HGT: a fast and accurate heuristic for reconstructing horizontal gene transfer. In: Proc. 11th Conf. Computing and Combinatorics (COCOON’05). Lecture Notes in Computer Science. Springer, Berlin (2005)

  333. Yu, Y., Than, C., Degnan, J., Nakhleh, L.: Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Syst. Biol. 60, 138–149 (2011)

  334. Lapierre, P., Lasek-Nesselquist, E., Gogarten, J.: The impact of HGT on phylogenomic reconstruction methods. Brief. Bioinform. (2012). doi:10.1093/bib/bbs050

  335. Roch, S., Snir, S.: Recovering the tree-like trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. In: Proceedings RECOMB 2012 (2012)

  336. Gerard, D., Gibbs, H., Kubatko, L.: Estimating hybridization in the presence of coalescence using phylogenetic intraspecific sampling. BMC Evol. Biol. 11, 291 (2011)

  337. Yu, Y., Degnan, J., Nakhleh, L.: The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection. PLoS Genet. 8, e1002660 (2012)

  338. Chowdhury, R., Ramachandran, V.: Cache-oblivious dynamic programming. In: Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 591–600 (2006)

Acknowledgements

While I was writing this paper, I was a Program Director at the National Science Foundation working on the BigData program; however, the research discussed in this paper took place over a span of many years. This research was therefore supported by the U.S. National Science Foundation, Microsoft New England, the Guggenheim Foundation, the David and Lucile Packard Foundation, the Radcliffe Institute for Advanced Study, the Program for Evolutionary Dynamics at Harvard, the David Bruton Jr. Centennial Professorship in Computer Sciences at U.T. Austin, and two Faculty Research Assignments from the University of Texas at Austin.

It makes sense now to tell how some of the work in this paper came about. I was working with Randy Linder (UT-Austin Integrative Biology) on various problems, including large-scale alignment and phylogeny estimation. During our initial attempts to design a fast and accurate co-estimation method, we began by trying to come up with a better solution to the treelength optimization problem. Our interest in treelength optimization convinced a colleague, Vijaya Ramachandran (UT-Austin Computer Science), to develop a fast exact median calculator [338], which led to an improved treelength estimator; however, our subsequent studies [263] suggested that improving the treelength would not lead to improved alignments and trees. This led us to look for other approaches to obtain more accurate alignments and trees from large datasets. Our next attempts considered the impact of guide trees, which gave a small benefit [109], but even iterating in this manner did not lead to substantial improvements. Finally, we developed SATé, the co-estimation method described earlier. In a very real sense, therefore, much of the work in this chapter was inspired by David Sankoff, since he introduced the treelength optimization problem. And so, I end by thanking David Sankoff for this, as well as many other things.

Author information

Corresponding author

Correspondence to Tandy Warnow.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Warnow, T. (2013). Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. In: Chauve, C., El-Mabrouk, N., Tannier, E. (eds) Models and Algorithms for Genome Evolution. Computational Biology, vol 19. Springer, London. https://doi.org/10.1007/978-1-4471-5298-9_6

  • DOI: https://doi.org/10.1007/978-1-4471-5298-9_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5297-2

  • Online ISBN: 978-1-4471-5298-9

  • eBook Packages: Computer Science, Computer Science (R0)
