An Improved Algorithm for Sequence Comparison with Block Reversals
Given two sequences X and Y over some alphabet, we consider the distance d(X, Y) between them, defined to be the minimum number of character replacements and block (substring) reversals needed to transform X into Y (or vice versa). This is the “simplest” sequence comparison problem we know of that allows natural block edit operations. Block reversals arise naturally in genomic sequence comparison; they are also of interest in matching music data. We present an improved algorithm for exactly computing the distance d(X, Y); it takes time $O(|X| \log^2 |X|)$ and hence is near-linear. The trivial approach takes quadratic time, and the best previously known algorithm for this problem takes time $\Omega(|X| \log^3 |X|)$.
Keywords: Edit Distance, Improved Algorithm, Edit Operation, Character Replacement, Approximate String Match
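The abstract does not spell out how the operations may interact. The sketch below assumes a restricted model in which every position of X is touched by at most one operation, and a reversed block of length at least 2 must match the corresponding block of Y exactly. Under that assumption, a dynamic program over prefixes computes d(X, Y) in quadratic time. It is meant only as an illustration of a quadratic baseline, not necessarily the one the authors have in mind, and it is not the paper's $O(|X| \log^2 |X|)$ algorithm. The function name `reversal_distance` is hypothetical.

```python
def reversal_distance(x: str, y: str) -> int:
    """Minimum number of character replacements and block reversals turning x
    into y, under the ASSUMED model that operations act on disjoint positions
    (a hypothetical helper, not the paper's algorithm)."""
    if len(x) != len(y):
        # Both operations preserve length, so the distance needs |X| == |Y|.
        raise ValueError("x and y must have the same length")
    n = len(x)

    # ok[j][i] is True when reversing x[j:i] yields exactly y[j:i].
    # A block matches iff its outer pair matches crosswise and the inner
    # block x[j+1:i-1] matches, so filling by length takes O(n^2) time.
    ok = [[False] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        ok[j][j] = True  # empty block
    for j in range(n):
        ok[j][j + 1] = x[j] == y[j]  # single character
    for length in range(2, n + 1):
        for j in range(n - length + 1):
            i = j + length
            ok[j][i] = x[j] == y[i - 1] and x[i - 1] == y[j] and ok[j + 1][i - 1]

    # best[i] = minimum cost of turning the prefix x[:i] into y[:i].
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        # Either position i-1 is handled alone (replace it if it differs) ...
        best[i] = best[i - 1] + (x[i - 1] != y[i - 1])
        # ... or it ends a reversed block x[j:i] of length at least 2.
        for j in range(i - 1):
            if ok[j][i]:
                best[i] = min(best[i], best[j] + 1)
    return best[n]
```

For example, `reversal_distance("abcdef", "abedcf")` returns 1, since reversing the block "cde" suffices, whereas replacements alone would need 2.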