Fast approximate dictionary matching

Shi, Fei

doi:10.1007/BFb0015431

Session 7B
Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1004)

Abstract

In the approximate dictionary matching problem, a dictionary that contains a set of pattern strings is given. The user presents a text string and a tolerance k (k is a positive integer) and asks for all occurrences of all dictionary patterns that appear in the text with at most k differences from the original patterns. We present two algorithms for the problem. The first algorithm assumes that all patterns in the dictionary are of the same length. The second algorithm removes this assumption at the expense of somewhat more complicated preprocessing of the dictionary and a slower query time. The basic idea behind our algorithms is to represent each dictionary pattern with one or two points in a |Σ|^q-dimensional real space under the L₁-metric, where Σ is the underlying alphabet and q is a fixed integer. These points are then organized with a spatial data structure so that subsequent searches, with texts of different lengths and different tolerance values, are fast. Although approximate dictionary matching would be of enormous importance in molecular biological applications, no previous results for the problem are known.
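The abstract does not say how a pattern is mapped to a point, so the following is only a minimal sketch under stated assumptions. It assumes the point is the q-gram profile of the string (one coordinate per string in Σ^q, holding its number of occurrences) and that "differences" means edit distance. Under those assumptions a known bound applies: the L₁ distance between two q-gram profiles is at most 2qk when the strings are within edit distance k. The function names are hypothetical, and a linear scan stands in for the spatial data structure the paper uses.

```python
from collections import Counter


def qgram_profile(s, q):
    """Sparse form of the |Sigma|^q-dimensional vector: count each length-q substring."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def l1_distance(p1, p2):
    """L1 distance between two sparse profiles (missing coordinates count as 0)."""
    return sum(abs(p1[g] - p2[g]) for g in p1.keys() | p2.keys())


def edit_distance(a, b):
    """Classic dynamic-programming edit distance, used only to verify candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def candidates(dictionary, window, k, q):
    """Keep patterns whose profile lies within L1 radius 2*q*k of the window's profile.

    Assumption: a linear scan replaces the spatial index described in the abstract.
    The filter can admit false positives but no false negatives.
    """
    w = qgram_profile(window, q)
    return [p for p in dictionary if l1_distance(qgram_profile(p, q), w) <= 2 * q * k]


def matches(dictionary, window, k, q):
    """Filter by profile distance, then confirm each candidate by edit distance."""
    return [p for p in candidates(dictionary, window, k, q) if edit_distance(p, window) <= k]
```

For example, `matches(["ACGT", "TTTT"], "ACCT", k=1, q=2)` discards `"TTTT"` in the filter step and confirms `"ACGT"` in the verification step. Replacing the scan in `candidates` with a range query on a metric index is where the speedup described in the abstract would come from.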

Author information

Authors and Affiliations

Department of Computer Science, ETH Zentrum, CH-8092, Zurich, Switzerland
Fei Shi

Editor information

John Staples, Peter Eades, Naoki Katoh, Alistair Moffat

Cite this paper

Shi, F. (1995). Fast approximate dictionary matching. In: Staples, J., Eades, P., Katoh, N., Moffat, A. (eds) Algorithms and Computations. ISAAC 1995. Lecture Notes in Computer Science, vol 1004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015431

DOI: https://doi.org/10.1007/BFb0015431
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60573-7
Online ISBN: 978-3-540-47766-2
eBook Packages: Springer Book Archive