Ultra-Large Alignments Using Ensembles of Hidden Markov Models

  • Nam-phuong Nguyen
  • Siavash Mirarab
  • Keerthana Kumar
  • Tandy Warnow
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9029)


Many biological questions rely upon multiple sequence alignments (MSAs) and phylogenetic trees of large datasets. However, accurate MSA estimation is difficult for large datasets, especially when the dataset evolved under high rates of evolution or contains fragmentary sequences.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Nam-phuong Nguyen (1)
  • Siavash Mirarab (2)
  • Keerthana Kumar (2)
  • Tandy Warnow (1, 3)
  1. Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, USA
  2. Department of Computer Science, University of Texas at Austin, Austin, USA
  3. Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA
