Aligning coding DNA in the presence of frame-shift errors

  • Lars Arvestad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1264)


This paper discusses the problem of aligning two DNA sequences while taking into account that they code for proteins. Criteria for a good alignment of coding DNA are presented, together with an algorithm that satisfies them. The algorithm is robust against frame-shifts and forgiving towards silent substitutions. The choice of objective function, which strongly affects the result, is examined and several variants are proposed.
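To make the two properties named in the abstract concrete, the following is a minimal sketch of a codon-aware dynamic programming score. It is not the algorithm or objective function of the paper, which the abstract does not specify. The function names (`codon_score`, `align_score`) and all score values are assumptions chosen for illustration, and the input is assumed to be uppercase A, C, G, T only. Codon pairs that translate to the same amino acid are penalised only slightly (forgiving towards silent substitutions), and single or double nucleotide skips are allowed at a fixed cost (robust against frame-shifts).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_score(c1, c2, match=3, silent=2, mismatch=-2):
    """Score a codon pair; the three score values are illustrative assumptions."""
    if c1 == c2:
        return match
    if CODON_TABLE[c1] == CODON_TABLE[c2]:
        return silent  # silent substitution: same amino acid, different codon
    return mismatch


def align_score(a, b, gap=-4, frameshift=-6):
    """Best global score for aligning coding sequences a and b.

    Moves: codon against codon, a whole-codon gap in either sequence,
    or a frame-shift that skips 1 or 2 nucleotides in either sequence.
    The penalties `gap` and `frameshift` are hypothetical values.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # D[i][j] holds the best score for aligning a[:i] with b[:j].
    D = [[neg] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg
            if i >= 3 and j >= 3:
                best = max(best, D[i - 3][j - 3] + codon_score(a[i - 3:i], b[j - 3:j]))
            if i >= 3:
                best = max(best, D[i - 3][j] + gap)
            if j >= 3:
                best = max(best, D[i][j - 3] + gap)
            for k in (1, 2):  # frame-shift moves
                if i >= k:
                    best = max(best, D[i - k][j] + frameshift)
                if j >= k:
                    best = max(best, D[i][j - k] + frameshift)
            D[i][j] = best
    return D[n][m]
```

With these assumed scores, `align_score("ATGGCTAAA", "ATGGCCAAA")` treats the GCT/GCC difference as a silent substitution, and `align_score("ATGGCTAAA", "ATGGGCTAAA")` recovers the reading frame after the inserted G by paying one frame-shift penalty. The sketch uses quadratic space and returns only the score; a traceback would be needed to recover the alignment itself.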


Keywords: Generalize Substitution, Silent Mutation, Silent Substitution, Dynamic Programming Matrix, Ambiguity Code





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Lars Arvestad, Department of Numerical Analysis and Computing Science, Royal Institute of Technology, Stockholm, Sweden
