Fast approximate dictionary matching

  • Session 7B
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1004)

Abstract

In the approximate dictionary matching problem, a dictionary containing a set of pattern strings is given. The user presents a text string and a tolerance k (a positive integer) and asks for all occurrences of all dictionary patterns that appear in the text with at most k differences from the original patterns. We present two algorithms for the problem. The first algorithm assumes that all patterns in the dictionary have the same length. The second algorithm removes this assumption at the expense of somewhat more complicated preprocessing of the dictionary and slower query time. The basic idea behind our algorithms is to represent each dictionary pattern by one or two points in a $|\Sigma|^q$-dimensional real space under the $L_1$-metric, where Σ is the underlying alphabet and q is a fixed integer, and then to organize these points in a spatial data structure so that subsequent searches, with texts of different lengths and different tolerance values, are fast. Although approximate dictionary matching would be of great importance in molecular biology applications, no previous results for the problem are known.
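The point representation can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each pattern is mapped to its q-gram profile (the count of every length-q substring), it replaces the spatial data structure with a plain linear scan, and it handles only a single text window of the same length as the patterns. The search radius 2qk relies on the standard q-gram argument that one edit operation changes a profile by at most 2q in the L1 metric. The function names `qgram_profile`, `l1_distance`, `edit_distance`, and `candidates` are hypothetical.

```python
from collections import Counter


def qgram_profile(s, q):
    """Count every length-q substring of s.

    The Counter is a sparse view of a point in |Sigma|^q-dimensional space:
    q-grams that do not occur have coordinate 0.
    """
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def l1_distance(p1, p2):
    """L1 (Manhattan) distance between two sparse q-gram profiles."""
    return sum(abs(p1[g] - p2[g]) for g in p1.keys() | p2.keys())


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, used to verify candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def candidates(dictionary, window, k, q):
    """Patterns whose profile lies within L1 radius 2*q*k of the window's profile.

    Assumption: a linear scan stands in for the spatial index; the filter
    can return false positives, so each candidate still needs verification.
    """
    w = qgram_profile(window, q)
    return [p for p in dictionary
            if l1_distance(qgram_profile(p, q), w) <= 2 * q * k]


dictionary = ["GATTACA", "CATTAGA", "TTTTTTT"]
survivors = candidates(dictionary, "GATTAGA", k=1, q=2)
matches = [p for p in survivors if edit_distance(p, "GATTAGA") <= 1]
```

In this example the filter discards `"TTTTTTT"` without computing any edit distance, and the two remaining patterns are confirmed by the verification step. A spatial index over the profile points would serve the same purpose as the filter while avoiding a scan of the whole dictionary.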

Editor information

John Staples, Peter Eades, Naoki Katoh, Alistair Moffat

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shi, F. (1995). Fast approximate dictionary matching. In: Staples, J., Eades, P., Katoh, N., Moffat, A. (eds) Algorithms and Computations. ISAAC 1995. Lecture Notes in Computer Science, vol 1004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015431

  • DOI: https://doi.org/10.1007/BFb0015431

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60573-7

  • Online ISBN: 978-3-540-47766-2

  • eBook Packages: Springer Book Archive
