Abstract
Approximate string matching is about finding a given string pattern in a text by allowing some degree of errors. In this paper we present a space efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over a fixed alphabet A, we can preprocess T and give an \(O(n\sqrt{{\rm log} n})\)-bit space data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in O(m log log n + occ) time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is o(n log2 n) bits.
The space of our data structure can be further reduced to O(n) if we can afford a slow down factor of logε n, for 0 < ε ≤ 1. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in O(|A|k m k(k+log log n) + occ) and O(logε n (|A|k m k(k+log log n) + occ)) query time using an \(O(n\sqrt{{\rm log} n})\)-bit and an O(n)-bit indexing data structures, respectively.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. Journal of Algorithms 37(2), 309–325 (2000)
Buchsbaum, A.L., Goodrich, M.T., Westbrook, J.R.: Range searching over tree cross products. In: Paterson, M. (ed.) ESA 2000. LNCS, vol. 1879, pp. 120–131. Springer, Heidelberg (2000)
Cobbs, A.L.: Fast approximate matching using suffix trees. In: Galil, Z., Ukkonen, E. (eds.) CPM 1995. LNCS, vol. 937, pp. 41–54. Springer, Heidelberg (1995)
Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matcing and indexing with errors and don’t cares. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pp. 91–100 (2004)
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In: Proceedings of the 32nd ACM Symposium on Theory of Computing, pp. 397–406 (2000)
Jokinen, P., Ukkonen, E.: Two algorithms for approximate string matching in static texts. In: Tarlecki, A. (ed.) MFCS 1991. LNCS, vol. 520, pp. 240–248. Springer, Heidelberg (1991)
Munro, J.I., Raman, V., Rao, S.S.: Space efficient suffix trees. Journal of Algorithms 39, 205–222 (2001)
Navarro, G., Baeza-Yates, R.: A hybrid indexing method for approximate string matching. Journal of Discrete Algorithms 1(1), 205–239 (2000)
Navarro, G., Sutinen, E., Tanninen, J., Tarhio, J.: Indexing text with approximate q-grams. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 350–363. Springer, Heidelberg (2000)
Rao, S.S.: Time-space trade-offs for compressed suffix arrays. Information Processing Letters 82, 307–311 (2002)
Sadakane, K.: Compressed suffix trees with full functionality. Theory of Computing Systems (accepted)
Trinh, H.N.D., Hon, W.K., Lam, T.W., Sung, W.K.: Approximate string matching using compressed suffix arrays. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 434–444. Springer, Heidelberg (2004)
Ukkonen, E.: Approximate string-matching over suffix trees. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds.) CPM 1993. LNCS, vol. 684, pp. 228–242. Springer, Heidelberg (1993)
Willard, D.E.: Log-logarithmic worst-case range queries are possible in space θ(n). Information Processing Letters 17, 81–84 (1983)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lam, TW., Sung, WK., Wong, SS. (2005). Improved Approximate String Matching Using Compressed Suffix Data Structures. In: Deng, X., Du, DZ. (eds) Algorithms and Computation. ISAAC 2005. Lecture Notes in Computer Science, vol 3827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11602613_35
Download citation
DOI: https://doi.org/10.1007/11602613_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30935-2
Online ISBN: 978-3-540-32426-3
eBook Packages: Computer ScienceComputer Science (R0)