Fast algorithms for aligning sequences with restricted affine gap penalties

Chao, Kun-Mao

doi:10.1007/BFb0045093

Session 8: Computational Biology II
Conference paper
First Online: 01 January 2006

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1276)

Abstract

Affine gap penalties are generally considered appropriate for aligning DNA and protein sequences. (“Affine” means that a gap of length k incurs a penalty of α + kβ: it costs α to open a gap plus β for each symbol in the gap.) For certain applications, such as aligning a cDNA sequence with a genomic DNA sequence, it may be adequate to use restricted affine gap penalties, which charge long gaps a constant penalty. As it turns out, several techniques developed for the approximate string matching problem can be used to obtain efficient algorithms for computing an optimal alignment with restricted affine gap penalties. In particular, efficient algorithms can be derived from the suffix automaton with failure transitions and from the diagonalwise monotonicity of the cost tables. To speed up the computation, the q-gram paradigm can be used to locate the interval of the longer sequence that should be aligned with the shorter sequence. We have implemented these methods in C on Sun workstations running SunOS Unix. Preliminary experiments show that these approaches are very promising for aligning a cDNA sequence with a genomic DNA sequence.
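
The two penalty schemes described in the abstract can be illustrated with a minimal sketch. The affine form α + kβ is taken directly from the text. The restricted form is an assumption: the abstract says only that long gaps receive a constant penalty, so the sketch caps the affine penalty at a hypothetical constant `cap`. The function names and the numeric values are illustrative and are not taken from the paper or its C implementation.

```python
def affine_gap_penalty(k, alpha, beta):
    """Penalty for a gap of k symbols: alpha to open it, beta per symbol."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return alpha + k * beta


def restricted_affine_gap_penalty(k, alpha, beta, cap):
    """Affine penalty that never exceeds the constant `cap`.

    Assumption: the cap is modelled as min(alpha + k*beta, cap); the
    abstract does not spell out the exact form of the restriction.
    """
    return min(affine_gap_penalty(k, alpha, beta), cap)


if __name__ == "__main__":
    alpha, beta, cap = 4, 1, 10  # illustrative values only
    for k in (1, 3, 6, 50, 5000):
        plain = affine_gap_penalty(k, alpha, beta)
        restricted = restricted_affine_gap_penalty(k, alpha, beta, cap)
        print(k, plain, restricted)
```

Under this reading, a very long gap, such as one spanning an intron when a cDNA sequence is aligned with genomic DNA, costs no more than a moderately long gap, whereas the plain affine penalty keeps growing with the gap length.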

This work was supported in part by grant R01 LM05110 from the National Library of Medicine, National Institutes of Health, USA, and grant NSC86-2213-E-126-002 from the National Science Council, Taiwan.

Author information

Authors and Affiliations

Department of Computer Science and Information Management, Providence University, Shalu, 43309, Taichung, Taiwan
Kun-Mao Chao

Editor information

Tao Jiang, D. T. Lee

Cite this paper

Chao, KM. (1997). Fast algorithms for aligning sequences with restricted affine gap penalties. In: Jiang, T., Lee, D.T. (eds) Computing and Combinatorics. COCOON 1997. Lecture Notes in Computer Science, vol 1276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0045093

DOI: https://doi.org/10.1007/BFb0045093
Published: 24 January 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63357-0
Online ISBN: 978-3-540-69522-6
eBook Packages: Springer Book Archive
