Sparse LCS Common Substring Alignment

  • Gad M. Landau
  • Baruch Schieber
  • Michal Ziv-Ukelson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2676)

Abstract

The “Common Substring Alignment” problem is defined as follows. The input consists of a set of strings $S_1, S_2, \ldots, S_c$, each containing at least one occurrence of a common substring, and a target string $T$. The goal is to compute the similarity of each string $S_i$ with $T$ without recomputing the part of the alignment that involves the common substring over and over again. In this paper we consider the Common Substring Alignment problem for the LCS (Longest Common Subsequence) similarity metric. Our algorithm gains its efficiency by exploiting the sparsity inherent to the LCS problem. Let $Y$ be the common substring, $n$ be the size of the compared sequences, $L_y$ be the length of the LCS of $T$ and $Y$, denoted $|LCS[T, Y]|$, and $L$ be $\max_i\{|LCS[T, S_i]|\}$. Our algorithm consists of an $O(nL_y)$ time encoding stage that is executed once per common substring, and an $O(L)$ time alignment stage that is executed once for each appearance of the common substring in each source string. The additional running time depends only on the length of the parts of the strings that are not in any common substring.
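To make the problem statement concrete, here is a deliberately naive Python sketch of the decomposition that common substring reuse relies on. It is not the authors' sparse algorithm, and every function name in it (`lcs_row`, `encode_common_substring`, `cross_common_substring`, `lcs_with_shared_block`) is hypothetical. It assumes each source string is processed left to right while keeping one DP row of $|LCS|$ values against every prefix of $T$. A table of $|LCS[Y, T[i..j]]|$ values is built once for $Y$, and each occurrence of $Y$ is then crossed by maximizing over split columns instead of re-running the DP over the characters of $Y$. With $m = |T|$, this naive table costs $O(|Y| m^2)$ to build and $O(m^2)$ per occurrence, far from the $O(nL_y)$ encoding and $O(L)$ alignment bounds stated above, and it does not exploit sparsity.

```python
from typing import List, Optional


def lcs_row(a: str, t: str, row: List[int]) -> List[int]:
    """Advance an LCS DP row through the characters of `a`.

    On entry row[j] == |LCS(P, t[:j])| for some already-processed string P;
    on exit row[j] == |LCS(P + a, t[:j])| for j = 0..len(t).
    """
    for ch in a:
        new = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if ch == t[j - 1]:
                new[j] = row[j - 1] + 1           # extend along a match
            else:
                new[j] = max(new[j - 1], row[j])  # skip a char of t or of a
        row = new
    return row


def lcs_length(s: str, t: str) -> int:
    """Plain O(|s|*|t|) LCS length, used here only as a reference."""
    return lcs_row(s, t, [0] * (len(t) + 1))[-1]


def encode_common_substring(y: str, t: str) -> List[List[int]]:
    """Naive encoding (hypothetical): dist[i][j] = |LCS(y, t[i:j])|, 0 <= i <= j <= len(t).

    Built once per common substring y; entries with j < i are unused (0).
    """
    m = len(t)
    dist = []
    for i in range(m + 1):
        suffix_row = lcs_row(y, t[i:], [0] * (m - i + 1))
        dist.append([0] * i + suffix_row)
    return dist


def cross_common_substring(row: List[int], dist: List[List[int]]) -> List[int]:
    """Given row[i] = |LCS(P, t[:i])|, return the row for |LCS(P + y, t[:j])|.

    A common subsequence of P + y and t[:j] splits at some column i into a part
    matched inside t[:i] (from P) and a part matched inside t[i:j] (from y).
    """
    m = len(row) - 1
    return [max(row[i] + dist[i][j] for i in range(j + 1)) for j in range(m + 1)]


def lcs_with_shared_block(parts: List[Optional[str]], t: str,
                          dist: List[List[int]]) -> int:
    """|LCS(S, t)| where S is given as segments; None marks an occurrence of y."""
    row = [0] * (len(t) + 1)
    for seg in parts:
        if seg is None:
            row = cross_common_substring(row, dist)  # reuse the encoding of y
        else:
            row = lcs_row(seg, t, row)               # ordinary DP on unshared text
    return row[-1]


if __name__ == "__main__":
    t = "ABCBDAB"
    y = "BDCAB"
    dist = encode_common_substring(y, t)  # computed once for y
    sources = [["A", None, "D"], ["CC", None, "", None, "BA"]]
    for parts in sources:
        full = "".join(y if seg is None else seg for seg in parts)
        # Reusing the encoding must agree with a plain DP on the expanded string.
        assert lcs_with_shared_block(parts, t, dist) == lcs_length(full, t)
        print(full, lcs_with_shared_block(parts, t, dist))
```

The demo only checks that reusing the encoding of $Y$ gives the same answer as a plain DP over the fully expanded source string; the sparse encoding and $O(L)$ alignment described in the paper would replace `encode_common_substring` and `cross_common_substring`.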

Keywords

Common Substring, Longest Common Subsequence, Target String, Encoding Stage
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Gad M. Landau (1, 2)
  • Baruch Schieber (3)
  • Michal Ziv-Ukelson (2, 3)

  1. Department of Computer and Information Science, Polytechnic University, Six MetroTech Center, Brooklyn
  2. Department of Computer Science, Haifa University, Haifa, Israel
  3. IBM T.J. Watson Research Center, Yorktown Heights