
Faster Algorithms for Optimal Multiple Sequence Alignment Based on Pairwise Comparisons

  • Pankaj K. Agarwal
  • Yonatan Bilu
  • Rachel Kolodny
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3692)

Abstract

Multiple Sequence Alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, many heuristics have been proposed for the problem. We consider the following version of the MSA problem: in a preprocessing stage, pairwise alignments are found for every pair of sequences. The goal is to find an optimal alignment in which matches are restricted to positions that were matched at the preprocessing stage. We present several techniques for making the dynamic programming algorithm more efficient while still finding an optimal solution under these restrictions. That is, in our formulation the MSA must conform to the pairwise (local) alignments, and in return it can be solved more efficiently. We prove that it suffices to find an optimal alignment of sequence segments rather than single letters, which reduces the input size and thus improves the running time.
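To make the exponential cost and the "restricted matches" idea concrete, here is a minimal Python sketch of exact sum-of-pairs MSA by dynamic programming over the k-dimensional lattice, where each cell has 2^k − 1 predecessors. It is an illustration under stated assumptions, not the authors' algorithm: the scoring constants (`MATCH`, `MISMATCH`, `GAP`, linear gaps) and the hypothetical `allowed` parameter are our own choices. `allowed` models one reading of the restriction: two letters may share a column only if their positions were matched in the precomputed pairwise alignment. The paper's segment-based speedups are not reproduced here.

```python
from itertools import product

# Hypothetical scoring, for illustration only (not the paper's scheme).
MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(a, b):
    """Score one pair of entries in a column (gap-gap scores 0)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def msa_dp(seqs, allowed=None):
    """Exact sum-of-pairs MSA by DP over the k-dimensional lattice.

    allowed (hypothetical): optional dict mapping a sequence-index pair
    (s, t), s < t, to a set of (p, q) position pairs. If given, letter p
    of seqs[s] and letter q of seqs[t] may share a column only when
    (p, q) is in that set. Returns (best score, aligned rows).
    """
    k = len(seqs)
    lens = [len(s) for s in seqs]
    # A move advances a nonempty subset of sequences: 2^k - 1 choices.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    best = {tuple([0] * k): (0, None)}
    # Lexicographic order guarantees every predecessor (cell - move)
    # has already been filled in. Single-sequence moves are never
    # restricted, so every cell stays reachable.
    for cell in product(*(range(n + 1) for n in lens)):
        if not any(cell):
            continue
        top, arg = float("-inf"), None
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if min(prev) < 0:
                continue
            col = [seqs[i][prev[i]] if m[i] else "-" for i in range(k)]
            ok, s = True, best[prev][0]
            for a in range(k):
                for b in range(a + 1, k):
                    # Restriction: letters may share a column only if
                    # the pairwise preprocessing matched those positions.
                    if allowed is not None and m[a] and m[b]:
                        if (prev[a], prev[b]) not in allowed.get((a, b), set()):
                            ok = False
                    s += pair_score(col[a], col[b])
            if ok and s > top:
                top, arg = s, m
        best[cell] = (top, arg)
    # Trace back from the far corner to recover the alignment columns.
    rows = [[] for _ in range(k)]
    cell = tuple(lens)
    while any(cell):
        m = best[cell][1]
        prev = tuple(c - d for c, d in zip(cell, m))
        for i in range(k):
            rows[i].append(seqs[i][prev[i]] if m[i] else "-")
        cell = prev
    return best[tuple(lens)][0], ["".join(reversed(r)) for r in rows]
```

For example, `msa_dp(["AC", "AC", "AC"])` aligns the three copies column by column. With `allowed = {(0, 1): {(0, 0)}, (0, 2): {(0, 0)}, (1, 2): {(0, 0)}}`, only the leading A's may be matched, so the C's are forced into separate columns. The table has ∏(n_i + 1) cells and each cell examines 2^k − 1 moves, which is the exponential dependence on k mentioned above. In this sketch the restriction only prunes moves; it does not shrink the table the way the segment-based approach does.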

Keywords

Dynamic Programming, Multiple Sequence Alignment, Optimal Path, Dynamic Programming Algorithm, Pairwise Alignment



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Pankaj K. Agarwal (1)
  • Yonatan Bilu (2)
  • Rachel Kolodny (3)
  1. Department of Computer Science, Duke University, Durham, USA
  2. Department of Molecular Genetics, Weizmann Institute, Rehovot, Israel
  3. Department of Biochemistry and Molecular Biophysics, Columbia University
