Abstract
Approximate string matching with k differences is considered. Filtration of the text is a widely used technique for reducing the text area that must be processed by dynamic programming. A sublinear filtration algorithm is presented. The method is based on the locations of the q-grams in the pattern. Samples of q-grams are drawn from the text at fixed periods, and a text area is examined with dynamic programming only if consecutive samples appear in the pattern in approximately the same configuration. Practical experiments show that this approach gives better filtration efficiency than an earlier method.
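The idea of sampling q-grams from the text and checking where they occur in the pattern can be illustrated with a minimal Python sketch. It is a simplified variant written for this summary, not the algorithm of the paper, and it makes no attempt at sublinear running time. The function names, the parameters `q` and `s`, the choice of sampling period `h`, and the position intervals are all assumptions of the sketch. It relies on the observation that if every occurrence contains at least k + s non-overlapping samples, then k differences can damage at most k of them, so at least s samples must occur in the pattern within k positions of where they are expected.

```python
from collections import defaultdict


def qgram_positions(pattern, q):
    """Map each q-gram of the pattern to the list of its start positions."""
    index = defaultdict(list)
    for p in range(len(pattern) - q + 1):
        index[pattern[p:p + q]].append(p)
    return index


def sellers_end_positions(pattern, text, k, offset=0):
    """Dynamic programming check (column by column, free start in the text).

    Returns the end positions (shifted by `offset`) at which some substring
    of `text` matches `pattern` with at most k differences.
    """
    m = len(pattern)
    col = list(range(m + 1))          # D[i][0] = i, D[0][j] = 0
    ends = set()
    for j, c in enumerate(text):
        prev_diag = col[0]            # D[i-1][j-1]
        for i in range(1, m + 1):
            cur = min(prev_diag + (pattern[i - 1] != c),  # match/substitute
                      col[i] + 1,                         # extra text char
                      col[i - 1] + 1)                     # missing text char
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            ends.add(offset + j)
    return ends


def filtered_search(pattern, text, k, q=2, s=2):
    """Sample q-grams every h positions; verify only promising text areas."""
    m, n = len(pattern), len(text)
    w = k + s                              # samples inspected per window
    h = (m - k - q + 1) // w               # sampling period (assumed choice)
    if h < q:
        raise ValueError("pattern too short for these k, q, s")
    index = qgram_positions(pattern, q)
    samples = range(0, n - q + 1, h)       # text positions of the samples
    ends = set()
    for first in range(len(samples) - w + 1):
        j1 = samples[first]
        hits = 0
        for i in range(w):
            j = j1 + i * h
            # Pattern positions compatible with the i-th sample of a window,
            # allowing a shift of at most k caused by insertions/deletions.
            lo, hi = i * h - k, (i + 1) * h - 1 + k
            if any(lo <= p <= hi for p in index.get(text[j:j + q], ())):
                hits += 1
        if hits >= s:
            # Candidate area: examine it with dynamic programming.
            start = max(0, j1 - h + 1)
            stop = min(n, j1 + m + k)
            ends |= sellers_end_positions(pattern, text[start:stop], k, start)
    return ends
```

For example, `filtered_search("approximate", "an aproximate match", k=1)` runs the filter with a sampling period of 3 and verifies only the areas where enough samples line up. Overlapping candidate areas are verified repeatedly here, which a practical implementation would avoid.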
The work was supported by the Academy of Finland.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sutinen, E., Tarhio, J. (1995). On using q-gram locations in approximate string matching. In: Spirakis, P. (eds) Algorithms — ESA '95. ESA 1995. Lecture Notes in Computer Science, vol 979. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60313-1_153
DOI: https://doi.org/10.1007/3-540-60313-1_153
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60313-9
Online ISBN: 978-3-540-44913-3
eBook Packages: Springer Book Archive