Fast algorithms for aligning sequences with restricted affine gap penalties

  • Session 8: Computational Biology II
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1276)

Abstract

Affine gap penalties are generally considered appropriate for aligning DNA and protein sequences. (“Affine” means that a gap of length k is penalized α + kβ; that is, it costs α to open a gap plus β for each symbol in the gap.) For certain applications, such as aligning a cDNA sequence with a genomic DNA sequence, it may be adequate to use restricted affine gap penalties, which charge long gaps a constant penalty. As it turns out, several techniques developed for the approximate string matching problem can be used to obtain efficient algorithms for computing an optimal alignment with restricted affine gap penalties. In particular, efficient algorithms can be derived from the suffix automaton with failure transitions and from the diagonalwise monotonicity of the cost tables. To speed up the computation, the q-gram paradigm can be used to locate the interval in the longer sequence that should be aligned with the shorter sequence. We have implemented these methods in C on Sun workstations running SunOS Unix. Preliminary experiments show that these approaches are very promising for aligning a cDNA sequence with a genomic DNA sequence.
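
To make the penalty scheme concrete, the following is a minimal sketch in Python (not the C implementation the abstract mentions). It assumes that a restricted affine penalty can be written as min(α + kβ, cap), where `cap` is a hypothetical name for the constant charged to long gaps; the function names, the mismatch cost, and the default parameter values are likewise illustrative assumptions. The alignment routine is a naive cubic-time dynamic program for arbitrary gap functions, included only as a reference point; it is not one of the fast algorithms the paper describes.

```python
def restricted_affine_gap(k, alpha, beta, cap):
    """Penalty for a gap of k symbols: alpha + k*beta, but never more than cap.

    `cap` is an assumed name for the constant penalty applied to long gaps.
    """
    if k <= 0:
        return 0
    return min(alpha + k * beta, cap)


def align_cost(a, b, alpha=4, beta=1, cap=10, mismatch=3):
    """Minimum global alignment cost of a and b under the capped gap penalty.

    Naive baseline: every cell tries every possible length for the final gap,
    so the running time is cubic in the sequence lengths.
    """
    m, n = len(a), len(b)

    def w(k):
        return restricted_affine_gap(k, alpha, beta, cap)

    # D[i][j] = cheapest alignment of a[:i] with b[:j]
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = w(i)  # a[:i] aligned against a single gap
    for j in range(1, n + 1):
        D[0][j] = w(j)  # b[:j] aligned against a single gap

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Case 1: the alignment ends with a[i-1] paired with b[j-1].
            best = D[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            # Case 2: it ends with the last k symbols of a opposite a gap.
            for k in range(1, i + 1):
                best = min(best, D[i - k][j] + w(k))
            # Case 3: it ends with the last k symbols of b opposite a gap.
            for k in range(1, j + 1):
                best = min(best, D[i][j - k] + w(k))
            D[i][j] = best
    return D[m][n]
```

With these assumed defaults, a 50-symbol insertion costs the same as a 6-symbol one, which is the behavior one would want when a long gap stands for an intron that is present in genomic DNA but absent from the cDNA.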

This work was supported in part by grant R01 LM05110 from the National Library of Medicine, National Institutes of Health, USA, and grant NSC86-2213-E-126-002 from the National Science Council, Taiwan.

Editor information

Tao Jiang, D. T. Lee

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chao, KM. (1997). Fast algorithms for aligning sequences with restricted affine gap penalties. In: Jiang, T., Lee, D.T. (eds) Computing and Combinatorics. COCOON 1997. Lecture Notes in Computer Science, vol 1276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0045093

  • DOI: https://doi.org/10.1007/BFb0045093

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63357-0

  • Online ISBN: 978-3-540-69522-6

  • eBook Packages: Springer Book Archive
