Abstract
We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first [Myers, J. of the ACM, 1999] searches for a pattern of length m in a text of length n, permitting k differences, in O(mn/w) time, where w is the width of the computer word. The second [Navarro and Raffinot, ACM JEA, 2000] extends a sublinear-time exact algorithm to approximate searching. Internally, the latter technique relies on an O(kmn/w) time algorithm [Wu and Manber, Comm. ACM, 1992], which is slow but flexible enough to support all the required operations. In this paper we show that the faster algorithm of Myers can be adapted to support all those operations. This involves extending it to compute edit distance, to search for any pattern suffix, and to detect in advance the impossibility of a later match. The result is an algorithm that performs better than the original version of Navarro and Raffinot and that is the fastest for several combinations of m, k and alphabet size that are useful, for example, in natural language searching and computational biology.
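To make the starting point concrete, the following is a minimal Python sketch of a Myers-style bit-vector column update, written in the commonly used formulation with a diagonal vector D0 and vertical/horizontal delta vectors. It is an illustration under stated assumptions, not the algorithm of this paper: the function names (myers_scan, approx_search, edit_distance) are ours, the pattern is held in a single unbounded Python integer rather than in w-bit machine words, and the suffix search and early impossibility detection mentioned above are not included. The only difference between the search variant and the edit-distance variant in this sketch is whether a 1 bit is shifted into the horizontal delta, which corresponds to the top row of the dynamic programming matrix being all zeros (search) or 0, 1, 2, ... (edit distance).

```python
def myers_scan(pattern, text, global_distance=False):
    """Yield (j, score) after each text character text[j].

    score is the bottom cell of the current dynamic programming column.
    With global_distance=False the top row is all zeros (approximate search);
    with global_distance=True the top row is 0, 1, 2, ... (edit distance).
    Names and structure are illustrative assumptions, not the paper's code.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1          # keep only the m low bits
    high = 1 << (m - 1)          # bit of the last pattern row
    peq = {}                     # per-character match bit masks
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    vp, vn = mask, 0             # vertical deltas: +1 everywhere in column 0
    score = m                    # D[m][0] = m
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & mask   # zero diagonal deltas
        hp = (vn | ~(d0 | vp)) & mask                      # horizontal +1
        hn = vp & d0                                       # horizontal -1
        if hp & high:
            score += 1
        elif hn & high:
            score -= 1
        # Shift in the horizontal delta of the top row: 0 for search, +1 for distance.
        hp = ((hp << 1) | (1 if global_distance else 0)) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
        yield j, score


def approx_search(pattern, text, k):
    """Return (end_index, distance) for every text position where the pattern
    matches a substring ending there with at most k differences."""
    return [(j, s) for j, s in myers_scan(pattern, text) if s <= k]


def edit_distance(a, b):
    """Return the edit distance between a (non-empty) and b."""
    score = len(a)
    for _, score in myers_scan(a, b, global_distance=True):
        pass
    return score
```

For example, approx_search("abc", "xxabcxx", 0) reports a single occurrence ending at text index 4 with distance 0. Because Python integers are unbounded, this sketch does not reflect the O(mn/w) word-level cost analysis; a word-based implementation would split long patterns across several machine words.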
Supported by the Academy of Finland and Tampere Graduate School in Information Science and Engineering.
Partially supported by Fondecyt Project 1-020831.
References
R. Baeza-Yates. Text retrieval: Theory and practice. In 12th IFIP World Computer Congress, volume I, pages 465–476. Elsevier Science, 1992.
R. Baeza-Yates. A unified view of string matching algorithms. In Proc. Theory and Practice of Informatics (SOFSEM’96), LNCS 1175, pages 1–15, 1996.
R. Baeza-Yates and G. Navarro. Faster approximate string matching. Algorithmica, 23(2):127–158, 1999.
W. Chang and J. Lampe. Theoretical and empirical comparisons of approximate string matching algorithms. In Proc. 3rd Combinatorial Pattern Matching (CPM’92), LNCS 644, pages 172–181, 1992.
W. Chang and T. Marr. Approximate string matching and local similarity. In Proc. 5th Combinatorial Pattern Matching (CPM’94), LNCS 807, pages 259–273, 1994.
M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, Oxford, UK, 1994.
Z. Galil and K. Park. An improved algorithm for approximate string matching. SIAM Journal on Computing, 19(6):989–999, 1990.
H. Hyyrö. Explaining and extending the bit-parallel algorithm of Myers. Technical Report A-2001-10, University of Tampere, Finland, 2001.
G. Landau and U. Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10:157–169, 1989.
G. Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM, 46(3):395–415, 1999.
G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88, 2001.
G. Navarro and R. Baeza-Yates. Very fast and simple approximate string matching. Information Processing Letters, 72:65–70, 1999.
G. Navarro and R. Baeza-Yates. Improving an algorithm for approximate string matching. Algorithmica, 30(4):473–502, 2001.
G. Navarro and M. Raffinot. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics (JEA), 5(4), 2000.
G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings: Practical on-line search algorithms for texts and biological sequences. Cambridge University Press, 2002. To appear.
P. Sellers. The theory and computation of evolutionary distances: pattern recognition. Journal of Algorithms, 1:359–373, 1980.
E. Sutinen and J. Tarhio. On using q-gram locations in approximate string matching. In Proc. European Symposium on Algorithms (ESA’95), LNCS 979, pages 327–340, 1995.
J. Tarhio and E. Ukkonen. Approximate Boyer-Moore string matching. SIAM Journal on Computing, 22(2):243–260, 1993.
E. Ukkonen. Algorithms for approximate string matching. Information and Control, 64:100–118, 1985.
E. Ukkonen. Finding approximate patterns in strings. Journal of Algorithms, 6:132–137, 1985.
S. Wu and U. Manber. Fast text searching allowing errors. Comm. of the ACM, 35(10):83–91, 1992.
S. Wu, U. Manber, and G. Myers. A sub-quadratic algorithm for approximate limited expression matching. Algorithmica, 15(1):50–67, 1996.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hyyrö, H., Navarro, G. (2002). Faster Bit-Parallel Approximate String Matching. In: Apostolico, A., Takeda, M. (eds) Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science, vol 2373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45452-7_18
DOI: https://doi.org/10.1007/3-540-45452-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43862-5
Online ISBN: 978-3-540-45452-6
eBook Packages: Springer Book Archive