Computing all suboptimal alignments in linear space

  • Conference paper
Combinatorial Pattern Matching (CPM 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 807)

Abstract

Recently, a new compact representation for suboptimal alignments was proposed by Naor and Brutlag (1993). The kernel of that representation is a minimal directed acyclic graph (DAG) containing all suboptimal alignments. In this paper, we propose a method that computes such a DAG in space linear in the graph size. Let F be the area of the region of the dynamic-programming matrix bounded by the suboptimal alignments and W the maximum width of that region. For two sequences of lengths M and N, it is shown that the worst-case running time is O(MN + F log log W). To exploit the computed DAG, we employ a variant of the Aho-Corasick pattern matching machine (Aho and Corasick, 1975) to locate all occurrences of specified patterns, and then find a path in the DAG that maximizes the sum of the scores of the non-overlapping patterns occurring in it. An example illustrates the method's utility.

This work was supported by grant RO1 LM05110 from the National Library of Medicine.
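
To make the quantities F and W from the abstract concrete, the following is a minimal sketch and not the paper's linear-space algorithm. It uses the textbook forward/backward dynamic-programming test in quadratic space: a matrix node lies on some alignment scoring within delta of the optimum exactly when its best prefix score plus its best suffix score reaches that threshold. The function name `suboptimal_region`, the linear gap scoring (match +1, mismatch -1, gap -1), and the choice to measure area and width in matrix nodes per row are all assumptions made for illustration.

```python
def suboptimal_region(a, b, delta, match=1, mismatch=-1, gap=-1):
    """Return (opt, spans, area, width) for global alignments of a and b.

    spans[i] is the (leftmost, rightmost) column in row i that lies on some
    alignment scoring at least opt - delta. Quadratic space, illustration only.
    """
    m, n = len(a), len(b)
    neg_inf = float("-inf")

    def sub(x, y):
        # Hypothetical substitution score: +1 for a match, -1 otherwise.
        return match if x == y else mismatch

    # fwd[i][j]: best score of aligning a[:i] with b[:j].
    fwd = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0:
                best = max(best, fwd[i - 1][j] + gap)
            if j > 0:
                best = max(best, fwd[i][j - 1] + gap)
            if i > 0 and j > 0:
                best = max(best, fwd[i - 1][j - 1] + sub(a[i - 1], b[j - 1]))
            fwd[i][j] = best

    # bwd[i][j]: best score of aligning a[i:] with b[j:].
    bwd = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i == m and j == n:
                continue
            best = neg_inf
            if i < m:
                best = max(best, bwd[i + 1][j] + gap)
            if j < n:
                best = max(best, bwd[i][j + 1] + gap)
            if i < m and j < n:
                best = max(best, bwd[i + 1][j + 1] + sub(a[i], b[j]))
            bwd[i][j] = best

    opt = fwd[m][n]
    spans = []
    for i in range(m + 1):
        # Columns of row i that lie on at least one suboptimal alignment.
        cols = [j for j in range(n + 1) if fwd[i][j] + bwd[i][j] >= opt - delta]
        spans.append((cols[0], cols[-1]))

    # Region bounded by the suboptimal alignments, measured row by row.
    widths = [hi - lo + 1 for lo, hi in spans]
    return opt, spans, sum(widths), max(widths)
```

For example, `suboptimal_region("ACGT", "ACGT", 0)` confines the region to the main diagonal, while a larger delta widens the band around it. The sketch stores both full matrices, so its space is O(MN); the point of the paper is to avoid exactly that cost.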

References

  • Aho, A. V. and Corasick, M. J. (1975) Efficient string matching: an aid to bibliographic search. Comm. ACM, 18, 333–340.

  • Altschul, S. F. and Lipman, D. J. (1989) Trees, stars, and multiple biological sequence alignment. SIAM J. Appl. Math., 49, 197–209.

  • Carrillo, H., and Lipman, D. J. (1988) The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48, 1073–1082.

  • Chao, K.-M., Hardison, R. C. and Miller, W. (1993) Locating well-conserved regions within a pairwise alignment. CABIOS, 9, 387–396.

  • Gumucio, D. L., Shelton, D. A., Bailey, W. J., Slightom, J. L., and Goodman, M. (1993) Phylogenetic footprinting reveals unexpected complexity in trans factor binding upstream from the ε-globin gene. Proc. Natl. Acad. Sci. USA, 90, 6018–6022.

  • Hardison, R. C., Chao, K.-M., Adamkiewicz, M., Price, D., Jackson, J., Zeigler, T., Stojanovic, N., and Miller, W. (1993) Positive and negative regulatory elements of the rabbit embryonic ε-globin gene revealed by an improved multiple alignment program and functional analysis. DNA Sequence, 4, 163–176.

  • Hirschberg, D. S. (1975) A linear space algorithm for computing maximal common subsequences. Comm. ACM, 18, 341–343.

  • Kececioglu, J. D. (1989) Notes on a multiple sequence alignment cost bound of Carrillo and Lipman. Manuscript.

  • Lawrence, C. B., Goldman, D. A., and Hood, R. T. (1986) Optimized homology searches of the gene and protein sequence data banks. Bull. Math. Biol., 48, 569–583.

  • Myers, E. W. and Miller, W. (1988) Optimal alignments in linear space. CABIOS, 4, 11–17.

  • Myers, E. W. and Miller, W. (1989) Approximate matching of regular expressions. Bull. Math. Biol., 51, 5–37.

  • Naor, D. and Brutlag, D. (1993) On suboptimal alignments of biological sequences. In Proceedings of the 4th Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, 684, 179–196.

  • Saqi, M. and Sternberg, M. (1991) A simple method to generate non-trivial alternative alignments of protein sequences. J. Mol. Biol., 219, 727–732.

  • Tagle, D. A., Koop, B. F., Goodman, M., Slightom, J., Hess, D. L. and Jones, R. T. (1988) Embryonic ε and γ globin genes of a prosimian primate (Galago crassicaudatus): Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol., 203, 7469–7480.

  • Vingron, M. and Argos, P. (1990) Determination of reliable regions in protein sequence alignment. Protein Engineering, 3, 565–569.

  • Waterman, M., and Byers, T. (1985) A dynamic programming algorithm to find all solutions in a neighborhood of the optimum. Math. Biosciences, 77, 179–185.

  • Zuker, M. (1991) Suboptimal sequence alignment in molecular biology: alignment with error analysis. J. Mol. Biol., 221, 403–420.


Editor information

Maxime Crochemore, Dan Gusfield

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chao, K.M. (1994). Computing all suboptimal alignments in linear space. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_3

  • DOI: https://doi.org/10.1007/3-540-58094-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58094-2

  • Online ISBN: 978-3-540-48450-9

  • eBook Packages: Springer Book Archive
