Improved Approximate String Matching Using Compressed Suffix Data Structures

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3827)

Abstract

Approximate string matching is the problem of finding a given pattern in a text while allowing some degree of error. In this paper we present a space-efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over a fixed alphabet A, we can preprocess T into an \(O(n\sqrt{\log n})\)-bit data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in \(O(m \log\log n + occ)\) time, where occ is the number of occurrences. This is the fastest known query time among data structures that use \(o(n \log^{2} n)\) bits.
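To make the two problem statements concrete, here is a minimal brute-force sketch in Python. It is not the indexed data structure of the paper: it scans the text directly in O(nm) time, and the function names are hypothetical. A 1-mismatch occurrence is taken to be a length-m window of T at Hamming distance at most 1 from P; a 1-difference occurrence is taken to be a substring of T at edit distance at most 1 from P, reported here by its end position using the standard dynamic-programming recurrence.

```python
def mismatch_occurrences(text: str, pattern: str, k: int = 1) -> list[int]:
    """Start positions where pattern matches a window of text with <= k mismatches."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this window already has too many mismatches
        if mismatches <= k:
            hits.append(i)
    return hits


def difference_end_positions(text: str, pattern: str, k: int = 1) -> list[int]:
    """End positions in text of substrings within edit distance k of pattern."""
    m = len(pattern)
    # col[j] = min edit distance between pattern[:j] and any substring
    # of text ending at the current position (col[0] stays 0).
    col = list(range(m + 1))
    ends = []
    for i, c in enumerate(text):
        prev_diag = col[0]
        for j in range(1, m + 1):
            old = col[j]
            col[j] = min(
                prev_diag + (pattern[j - 1] != c),  # match or substitution
                old + 1,                            # extra character in text
                col[j - 1] + 1,                     # missing pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(i)
    return ends


print(mismatch_occurrences("abracadabra", "abda"))  # [0, 7]
print(difference_end_positions("abcd", "bd"))       # [1, 2, 3]
```

The point of an index such as the one described in the abstract is to avoid this full scan, so that query time depends on m and occ rather than on n.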

The space of our data structure can be further reduced to O(n) bits if we can afford a slowdown factor of \(\log^{\epsilon} n\), for \(0 < \epsilon \le 1\). Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in \(O(|A|^{k} m^{k} (k + \log\log n) + occ)\) and \(O(\log^{\epsilon} n \, (|A|^{k} m^{k} (k + \log\log n) + occ))\) query time using an \(O(n\sqrt{\log n})\)-bit and an O(n)-bit indexing data structure, respectively.
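One way to get a feel for the \(|A|^{k} m^{k}\) factor is to count the strings within Hamming distance k of a pattern: choosing up to k positions out of m and a replacement character for each gives a count on the order of \(m^{k}|A|^{k}\) for fixed k. The abstract does not describe the algorithm, so the sketch below is only this counting intuition under that assumption, not the paper's method; the function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def hamming_neighborhood(pattern: str, alphabet: str, k: int):
    """Yield every string over alphabet within Hamming distance k of pattern."""
    m = len(pattern)
    for d in range(k + 1):
        for positions in combinations(range(m), d):
            # At each chosen position, try every character that differs
            # from the original one, so the distance is exactly d.
            choices = [[c for c in alphabet if c != pattern[p]] for p in positions]
            for replacement in product(*choices):
                chars = list(pattern)
                for p, c in zip(positions, replacement):
                    chars[p] = c
                yield "".join(chars)


def neighborhood_size(m: int, sigma: int, k: int) -> int:
    """Closed-form count: sum over d <= k of C(m, d) * (sigma - 1)^d."""
    return sum(comb(m, d) * (sigma - 1) ** d for d in range(k + 1))


variants = list(hamming_neighborhood("abc", "abc", 1))
print(len(variants), neighborhood_size(3, 3, 1))  # 7 7
```

Since each term satisfies \(\binom{m}{d}(|A|-1)^{d} \le m^{d}|A|^{d}\), the candidate count stays within a constant factor of \(m^{k}|A|^{k}\) for fixed k.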




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lam, TW., Sung, WK., Wong, SS. (2005). Improved Approximate String Matching Using Compressed Suffix Data Structures. In: Deng, X., Du, DZ. (eds) Algorithms and Computation. ISAAC 2005. Lecture Notes in Computer Science, vol 3827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11602613_35

  • DOI: https://doi.org/10.1007/11602613_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30935-2

  • Online ISBN: 978-3-540-32426-3

  • eBook Packages: Computer Science, Computer Science (R0)
