Heuristic Alignment Methods

Multiple Sequence Alignment Methods

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1079))

Abstract

Computation of a multiple sequence alignment (MSA) is usually formulated as a combinatorial optimization problem over an objective function. Solving this problem exactly is known to be NP-complete for virtually all sensible objective functions, which implies that heuristics must be adopted. Several general strategies have proven effective for obtaining accurate MSAs at reasonable computational cost. This chapter is devoted to a brief summary of the most successful heuristic approaches.
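As an illustration of what such an objective function can look like, the following is a minimal sketch of a sum-of-pairs score, one commonly used objective for MSA. The function name `sum_of_pairs` and the match, mismatch, and gap values are assumptions chosen for illustration, not parameters taken from this chapter. Practical programs typically use substitution matrices and affine gap penalties instead of these flat scores.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment by summing pairwise scores over every column.

    `alignment` is a list of equal-length strings with '-' marking gaps.
    The scoring values are illustrative assumptions only.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")

    total = 0
    for column in zip(*alignment):
        # Compare every unordered pair of residues in this column.
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap aligned with a gap contributes nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

Evaluating the score of a given alignment is cheap. The hard part is searching over all possible alignments for the one that maximizes it, which is why the heuristic strategies summarized in the chapter are needed.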



Acknowledgment

I thank Dr. Kentaro Tomii for instruction about protein fold recognition methods. This work was partly supported by Kakenhi (Grant-in-Aid for Scientific Research) B (grant number 22310124) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.


Copyright information

© 2014 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Gotoh, O. (2014). Heuristic Alignment Methods. In: Russell, D. (eds) Multiple Sequence Alignment Methods. Methods in Molecular Biology, vol 1079. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-646-7_2


  • DOI: https://doi.org/10.1007/978-1-62703-646-7_2


  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-645-0

  • Online ISBN: 978-1-62703-646-7

  • eBook Packages: Springer Protocols
