An Improved Algorithm for Sequence Comparison with Block Reversals

  • S. Muthukrishnan
  • S. Cenk Sahinalp
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2286)

Abstract

Given two sequences X and Y that are strings over some alphabet, we consider the distance d(X, Y) between them, defined to be the minimum number of character replacements and block (substring) reversals needed to transform X into Y (or vice versa). This is the “simplest” sequence comparison problem we know of that allows natural block edit operations. Block reversals arise naturally in genomic sequence comparison; they are also of interest in matching music data. We present an improved algorithm for exactly computing the distance d(X, Y); it takes time O(|X| log^2 |X|) and hence is near-linear. The trivial approach takes quadratic time, and the best known previous algorithm for this problem takes time ω(|X| log^3 |X|).
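
To make the definition concrete, the following is a minimal sketch of a quadratic-time baseline, not the near-linear algorithm of the paper. It rests on an assumption the abstract does not state: operations act on disjoint positions, so a reversed block of X must equal the corresponding block of Y exactly. The function name `block_reversal_distance` and the choice to return `None` for strings of different lengths are illustrative only.

```python
def block_reversal_distance(x, y):
    """Quadratic-time baseline for d(X, Y).

    Assumption (not from the abstract): every position is touched by at
    most one operation, so a reversed block must match Y exactly.
    """
    if len(x) != len(y):
        return None  # replacements and reversals both preserve length
    n = len(x)

    # ok[j][i] is True when reversing x[j:i] yields exactly y[j:i].
    ok = [[False] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        ok[j][j] = True  # empty block
        if j < n:
            ok[j][j + 1] = x[j] == y[j]  # single character
    for length in range(2, n + 1):
        for j in range(n - length + 1):
            i = j + length
            # The outer characters must swap; the inner block must already match.
            ok[j][i] = ok[j + 1][i - 1] and x[j] == y[i - 1] and x[i - 1] == y[j]

    # dp[i] is the distance between the prefixes x[:i] and y[:i].
    dp = [0] * (n + 1)
    for i in range(1, n + 1):
        # Either keep or replace the last character ...
        best = dp[i - 1] + (x[i - 1] != y[i - 1])
        # ... or end a reversed block of length >= 2 at position i.
        for j in range(i - 1):
            if ok[j][i]:
                best = min(best, dp[j] + 1)
        dp[i] = best
    return dp[n]
```

Under this assumption, `block_reversal_distance("abcd", "dcba")` costs one reversal, while `block_reversal_distance("ab", "cd")` costs two replacements. The table `ok` uses quadratic space; it is kept here only for readability.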

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • S. Muthukrishnan (1)
  • S. Cenk Sahinalp (2)
  1. AT&T Labs - Research, Florham Park
  2. Dept of EECS, Dept of Genetics, and Cntr for Computational Genomics, Case Western Reserve University, Cleveland
