Improved Approximate String Matching Using Compressed Suffix Data Structures

Lam, Tak-Wah; Sung, Wing-Kin; Wong, Swee-Seong

doi:10.1007/11602613_35

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3827)

Abstract

Approximate string matching is the problem of finding occurrences of a pattern in a text while allowing some degree of error. In this paper we present a space-efficient data structure for the 1-mismatch and 1-difference problems. Given a text T of length n over a fixed alphabet A, we preprocess T into an \(O(n\sqrt{\log n})\)-bit data structure so that, for any query pattern P of length m, all 1-mismatch (or 1-difference) occurrences of P can be found in \(O(m \log\log n + occ)\) time, where occ is the number of occurrences. This is the fastest known query time among data structures that use \(o(n \log^2 n)\) bits of space.
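To make the two problem definitions concrete, here is a brute-force reference sketch in Python. It is not the paper's index: it scans every text position, so its running time is roughly proportional to n·m rather than the \(O(m \log\log n + occ)\) bound above. All function names are hypothetical, and a 1-difference occurrence is assumed here to be a text substring T[i:j] within edit distance 1 of P (one substitution, insertion, or deletion), reported as a half-open (i, j) pair; the paper may report occurrences differently.

```python
def hamming(a, b):
    # Number of positions where equal-length strings a and b differ.
    return sum(x != y for x, y in zip(a, b))


def one_mismatch_occurrences(text, pattern):
    """Start positions i where text[i:i+m] differs from pattern in at most one position."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if hamming(text[i:i + m], pattern) <= 1]


def within_one_edit(a, b):
    """True if a can be turned into b by at most one substitution, insertion or deletion."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return hamming(a, b) <= 1
    if len(a) > len(b):
        a, b = b, a  # make a the shorter string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    # Skip one character of the longer string at the first mismatch.
    return a[i:] == b[i + 1:]


def one_difference_occurrences(text, pattern):
    """Half-open spans (i, j) with text[i:j] within edit distance 1 of pattern."""
    n, m = len(text), len(pattern)
    occ = []
    for i in range(n):
        for length in (m - 1, m, m + 1):
            # Skip empty substrings and substrings that run past the end of text.
            if length < 1 or i + length > n:
                continue
            if within_one_edit(text[i:i + length], pattern):
                occ.append((i, i + length))
    return occ


print(one_mismatch_occurrences("ACGTACGA", "ACGT"))    # [0, 4]
print(one_difference_occurrences("ACGTACGA", "ACGT"))  # [(0, 3), (0, 4), (0, 5), (1, 4), (4, 7), (4, 8)]
```

A sketch like this is mainly useful as a correctness oracle when testing a faster index: both must report the same occurrences.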

The space of our data structure can be further reduced to \(O(n)\) bits if we can afford a slow-down factor of \(\log^\varepsilon n\), for \(0 < \varepsilon \le 1\). Furthermore, our solution generalizes to the k-mismatch (and k-difference) problem, with query times of \(O(|A|^k m^k (k + \log\log n) + occ)\) and \(O(\log^\varepsilon n \, (|A|^k m^k (k + \log\log n) + occ))\) using an \(O(n\sqrt{\log n})\)-bit and an \(O(n)\)-bit index, respectively.
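The \(|A|^k m^k\) factor in the k-mismatch bound matches the size of the Hamming neighbourhood of P: choosing at most k of the m positions and one of the |A| - 1 other letters for each gives at most \(\sum_{d=0}^{k}\binom{m}{d}(|A|-1)^d = O(|A|^k m^k)\) candidate strings for fixed k. The sketch below illustrates that neighbourhood-enumeration idea under simplifying assumptions: it uses a plain, naively built suffix array instead of a compressed suffix structure and looks up each candidate by binary search, so it does not achieve the stated time or space bounds. The abstract does not say that the paper enumerates candidates exactly this way, and all function names are hypothetical.

```python
from itertools import combinations, product


def build_suffix_array(text):
    # Naive construction by sorting all suffixes (illustration only;
    # practical indexes use linear-time or compressed constructions).
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_range(text, sa, p):
    """Return (lo, hi) such that sa[lo:hi] are exactly the suffixes starting with p."""
    m = len(p)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def hamming_neighbourhood(pattern, alphabet, k):
    """All strings at Hamming distance <= k from pattern (hypothetical helper)."""
    result = {pattern}
    for d in range(1, k + 1):
        for positions in combinations(range(len(pattern)), d):
            # At each chosen position, try every letter except the original one.
            choices = [[c for c in alphabet if c != pattern[p]] for p in positions]
            for letters in product(*choices):
                s = list(pattern)
                for p, c in zip(positions, letters):
                    s[p] = c
                result.add("".join(s))
    return result


def k_mismatch_occurrences(text, pattern, alphabet, k):
    """Start positions i with at most k mismatches between text[i:i+m] and pattern."""
    sa = build_suffix_array(text)
    occ = set()
    for candidate in hamming_neighbourhood(pattern, alphabet, k):
        lo, hi = suffix_range(text, sa, candidate)
        occ.update(sa[lo:hi])
    return sorted(occ)


print(k_mismatch_occurrences("ACGTACGA", "ACGT", "ACGT", 1))  # [0, 4]
```

The design choice being illustrated is the trade-off the bound makes explicit: query time grows with the neighbourhood size, which is exponential in k, while each individual lookup only needs an exact-match search in the index.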

Author information

Authors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Tak-Wah Lam
School of Computing, National University of Singapore, Singapore
Wing-Kin Sung & Swee-Seong Wong

Editor information

Editors and Affiliations

Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Xiaotie Deng
Department of Computer Science, University of Texas at Dallas, EE/CS Building, 75083, Richardson, TX, USA
Ding-Zhu Du

About this paper

Cite this paper

Lam, TW., Sung, WK., Wong, SS. (2005). Improved Approximate String Matching Using Compressed Suffix Data Structures. In: Deng, X., Du, DZ. (eds) Algorithms and Computation. ISAAC 2005. Lecture Notes in Computer Science, vol 3827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11602613_35

DOI: https://doi.org/10.1007/11602613_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30935-2
Online ISBN: 978-3-540-32426-3
eBook Packages: Computer Science, Computer Science (R0)