Faster Bit-Parallel Approximate String Matching

  • Conference paper
Combinatorial Pattern Matching (CPM 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2373))

Abstract

We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first [Myers, J. of the ACM, 1999] searches for a pattern of length m in a text of length n, permitting k differences, in O(mn/w) time, where w is the width of the computer word. The second [Navarro and Raffinot, ACM JEA, 2000] extends a sublinear-time exact algorithm to approximate searching. Internally, the latter relies on an O(kmn/w) time algorithm [Wu and Manber, Comm. ACM, 1992], which is slow but flexible enough to support all the required operations. In this paper we show that the faster algorithm of Myers can be adapted to support all those operations. This involves extending it to compute edit distance, to search for any pattern suffix, and to detect in advance the impossibility of a later match. The result is an algorithm that performs better than the original version of Navarro and Raffinot and that is the fastest for several combinations of m, k and alphabet size that are useful, for example, in natural language searching and computational biology.
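
As a rough illustration of the O(mn/w) bit-vector search that the abstract builds on, the following Python sketch implements a Myers-style column update in the D0/HP/HN/VP/VN form commonly presented following Hyyrö [8], next to a plain dynamic-programming reference in the style of Sellers [16]. It is not the algorithm proposed in this paper: the extensions to edit distance, pattern-suffix search and early detection of impossible matches are not shown. The function names myers_search and sellers_search are hypothetical, and a Python integer masked to m bits stands in for a machine word, so the m > w case is not modelled.

```python
def myers_search(pattern, text, k):
    """Return 0-based text positions where an occurrence of `pattern`
    with at most k differences ends (bit-vector sketch)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1          # keeps every vector to m bits
    top = 1 << (m - 1)           # bit for the last pattern row
    # peq[c] has bit i set iff pattern[i] == c
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    vp, vn = mask, 0             # vertical +1 / -1 deltas of the column
    score = m                    # value of the bottom cell D[m][j]
    ends = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        # d0: rows whose diagonal delta is zero
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask   # horizontal +1 deltas
        hn = vp & d0                    # horizontal -1 deltas
        if hp & top:
            score += 1
        elif hn & top:
            score -= 1
        # Shift in a 0 bit: the top row is all zeros when searching,
        # so an occurrence may start at any text position.
        hp = (hp << 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
        if score <= k:
            ends.append(j)
    return ends


def sellers_search(pattern, text, k):
    """Reference O(mn) column-wise dynamic programming for the same task."""
    m = len(pattern)
    col = list(range(m + 1))     # D[i][0] = i
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]       # D[0][j-1], always 0 when searching
        for i in range(1, m + 1):
            old = col[i]         # D[i][j-1]
            col[i] = min(prev_diag + (pattern[i - 1] != c),  # match/substitute
                         old + 1,                            # extra text char
                         col[i - 1] + 1)                     # missing text char
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

Both functions return the 0-based text positions at which an occurrence with at most k differences ends; for instance, myers_search("abc", "xxabcxx", 0) returns [4]. The bit-vector version processes a whole column in a constant number of word operations, which is where the factor of w in the time bound comes from.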

Supported by the Academy of Finland and Tampere Graduate School in Information Science and Engineering.

Partially supported by Fondecyt Project 1-020831.

References

  1. R. Baeza-Yates. Text retrieval: Theory and practice. In 12th IFIP World Computer Congress, volume I, pages 465–476. Elsevier Science, 1992.

  2. R. Baeza-Yates. A unified view of string matching algorithms. In Proc. Theory and Practice of Informatics (SOFSEM’96), LNCS 1175, pages 1–15, 1996.

  3. R. Baeza-Yates and G. Navarro. Faster approximate string matching. Algorithmica, 23(2):127–158, 1999.

  4. W. Chang and J. Lampe. Theoretical and empirical comparisons of approximate string matching algorithms. In Proc. 3rd Combinatorial Pattern Matching (CPM’92), LNCS 644, pages 172–181, 1992.

  5. W. Chang and T. Marr. Approximate string matching and local similarity. In Proc. 5th Combinatorial Pattern Matching (CPM’94), LNCS 807, pages 259–273, 1994.

  6. M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, Oxford, UK, 1994.

  7. Z. Galil and K. Park. An improved algorithm for approximate string matching. SIAM Journal on Computing, 19(6):989–999, 1990.

  8. H. Hyyrö. Explaining and extending the bit-parallel algorithm of Myers. Technical Report A-2001-10, University of Tampere, Finland, 2001.

  9. G. Landau and U. Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10:157–169, 1989.

  10. G. Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM, 46(3):395–415, 1999.

  11. G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88, 2001.

  12. G. Navarro and R. Baeza-Yates. Very fast and simple approximate string matching. Information Processing Letters, 72:65–70, 1999.

  13. G. Navarro and R. Baeza-Yates. Improving an algorithm for approximate string matching. Algorithmica, 30(4):473–502, 2001.

  14. G. Navarro and M. Raffinot. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics (JEA), 5(4), 2000.

  15. G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings: Practical on-line search algorithms for texts and biological sequences. Cambridge University Press, 2002. To appear.

  16. P. Sellers. The theory and computation of evolutionary distances: pattern recognition. Journal of Algorithms, 1:359–373, 1980.

  17. E. Sutinen and J. Tarhio. On using q-gram locations in approximate string matching. In Proc. European Symposium on Algorithms (ESA’95), LNCS 979, pages 327–340, 1995.

  18. J. Tarhio and E. Ukkonen. Approximate Boyer-Moore string matching. SIAM Journal on Computing, 22(2):243–260, 1993.

  19. E. Ukkonen. Algorithms for approximate string matching. Information and Control, 64:100–118, 1985.

  20. E. Ukkonen. Finding approximate patterns in strings. Journal of Algorithms, 6:132–137, 1985.

  21. S. Wu and U. Manber. Fast text searching allowing errors. Comm. of the ACM, 35(10):83–91, 1992.

  22. S. Wu, U. Manber, and G. Myers. A sub-quadratic algorithm for approximate limited expression matching. Algorithmica, 15(1):50–67, 1996.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hyyrö, H., Navarro, G. (2002). Faster Bit-Parallel Approximate String Matching. In: Apostolico, A., Takeda, M. (eds) Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science, vol 2373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45452-7_18

  • DOI: https://doi.org/10.1007/3-540-45452-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43862-5

  • Online ISBN: 978-3-540-45452-6

  • eBook Packages: Springer Book Archive
