Abstract
We consider the problem of finding all approximate occurrences of a given string q, with at most k differences, in a finite database or dictionary of strings. The strings can be, e.g., natural language words, such as the vocabulary of some document or set of documents. This problem has many important applications in both off-line (indexed) and on-line string matching. More precisely, we have a universe \({\mathbb U}\) of strings and a non-negative distance function \(d: {\mathbb U} \times {\mathbb U} \rightarrow {\mathbb N}\). The distance function is a metric if it satisfies (i) \(d(x,y) = 0 ~ \Leftrightarrow ~ x = y\); (ii) \(d(x,y) = d(y,x)\); (iii) \(d(x,y) \le d(x,z) + d(z,y)\). The last property is called the triangle inequality, and it is the most important one in our case. Many useful distance functions are known to be metrics; in particular, edit (Levenshtein) distance is a metric, and we use it for d.
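To illustrate why the triangle inequality matters for dictionary search, here is a minimal Python sketch of one classic metric index, a BK-tree, built over edit distance. It is not taken from the paper and is not claimed to be the index the paper proposes; the names `edit_distance`, `BKTree`, `add`, and `search` are hypothetical, chosen only for this example. Every word x stored under a node p via an edge labelled e satisfies d(x, p) = e, so property (iii) gives d(q, x) ≥ |d(q, p) − e|, and any subtree with |d(q, p) − e| > k can be skipped without computing further distances.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


class Node:
    """A dictionary word plus its children, keyed by distance to this word."""

    def __init__(self, word: str) -> None:
        self.word = word
        self.children: Dict[int, "Node"] = {}


class BKTree:
    """Illustrative BK-tree; an assumption of this sketch, not the paper's index."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = Node(word)
            return
        node = self.root
        while True:
            d = edit_distance(word, node.word)
            if d == 0:
                return  # word is already stored
            child = node.children.get(d)
            if child is None:
                node.children[d] = Node(word)
                return
            node = child

    def search(self, q: str, k: int) -> List[Tuple[str, int]]:
        """Return all (word, distance) pairs with distance to q at most k."""
        results: List[Tuple[str, int]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = edit_distance(q, node.word)
            if d <= k:
                results.append((node.word, d))
            for edge, child in node.children.items():
                # Triangle inequality: words below this edge are at distance
                # at least |d - edge| from q, so skip if that exceeds k.
                if abs(d - edge) <= k:
                    stack.append(child)
        return sorted(results)
```

With the words book, books, cake, boo, cape, cart, boon, and cook inserted, searching for q = "bok" with k = 1 returns boo and book, the same answer a linear scan of the dictionary gives. The pruning only reduces the number of distance computations; it never changes the result.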
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fredriksson, K. (2004). Metric Indexes for Approximate String Matching in a Dictionary. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_30
DOI: https://doi.org/10.1007/978-3-540-30213-1_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive