Multiple Sequence Alignment

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 452)

Abstract

Multiple sequence alignment (MSA) has assumed a key role in comparative structure and function analysis of biological sequences. It often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. Significant advances have been achieved in this field, and many useful tools have been developed for constructing alignments. It should be stressed, however, that many complex biological and methodological issues are still open. This chapter first provides some background information and considerations associated with MSA techniques, concentrating on the alignment of protein sequences. Then, a practical overview of currently available methods and a description of their specific advantages and limitations are given, so that this chapter might constitute a helpful guide or starting point for researchers who aim to construct a reliable MSA.

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Pirovano, W., Heringa, J. (2008). Multiple Sequence Alignment. In: Keith, J.M. (ed.) Bioinformatics. Methods in Molecular Biology™, vol 452. Humana Press. https://doi.org/10.1007/978-1-60327-159-2_7

  • DOI: https://doi.org/10.1007/978-1-60327-159-2_7

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-707-5

  • Online ISBN: 978-1-60327-159-2

  • eBook Packages: Springer Protocols
