A Sequential Recursive Implementation of Dead-Zone Single Keyword Pattern Matching

  • Bruce W. Watson
  • Derrick G. Kourie
  • Tinus Strauss
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7643)

Abstract

Earlier publications [18] provided an abstract specification of a family of single keyword pattern matching algorithms that search unexamined portions of the text in a divide-and-conquer fashion, generating dead zones in the text as they progress. These dead zones are areas of text that require no further examination. Here we describe the results of implementing in C++ a sequential recursive version of the algorithm family, in which all instances of a single keyword p in a text S are sought. This is the online keyword matching problem, where S may not be precomputed.
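The divide-and-conquer idea can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `DeadZoneMatcher`, its members, and the two shift tables are hypothetical names, and the tables use a Horspool-style rule for the right shift plus its mirror image for the left shift, which may differ from the shift functions the paper actually evaluates. Each call probes the middle of a live range of candidate start positions, marks the surrounding positions that cannot start a match as dead, and recurses on what remains to the left and right.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch only: the class, member names, and the choice of
// shift tables are assumptions, not the paper's implementation.
class DeadZoneMatcher {
public:
    explicit DeadZoneMatcher(std::string pattern) : p_(std::move(pattern)) {
        const auto m = static_cast<std::ptrdiff_t>(p_.size());
        shl_.fill(m);
        shr_.fill(m);
        // Left shift for character c: smallest k >= 1 with p[k] == c.
        for (std::ptrdiff_t k = m - 1; k >= 1; --k)
            shl_[static_cast<unsigned char>(p_[k])] = k;
        // Right shift (Horspool's rule): m - 1 - i for the last i < m - 1
        // with p[i] == c.
        for (std::ptrdiff_t i = 0; i + 1 < m; ++i)
            shr_[static_cast<unsigned char>(p_[i])] = m - 1 - i;
    }

    // Returns every start position of the pattern in s, in increasing order.
    std::vector<std::size_t> find_all(const std::string& s) const {
        std::vector<std::size_t> out;
        const auto m = static_cast<std::ptrdiff_t>(p_.size());
        const auto n = static_cast<std::ptrdiff_t>(s.size());
        if (m == 0 || n < m) return out;
        search(s, 0, n - m + 1, out);  // live range of candidate starts
        return out;
    }

private:
    // Examines the live range [lo, hi) of candidate start positions.
    void search(const std::string& s, std::ptrdiff_t lo, std::ptrdiff_t hi,
                std::vector<std::size_t>& out) const {
        if (lo >= hi) return;  // nothing left alive in this range
        const auto m = static_cast<std::ptrdiff_t>(p_.size());
        const std::ptrdiff_t j = lo + (hi - lo) / 2;  // probe the middle
        const std::ptrdiff_t left = shl_[static_cast<unsigned char>(s[j])];
        const std::ptrdiff_t right =
            shr_[static_cast<unsigned char>(s[j + m - 1])];
        // Positions j-left+1 .. j+right-1 are now dead; recurse on the rest.
        search(s, lo, j - left + 1, out);
        if (s.compare(static_cast<std::size_t>(j),
                      static_cast<std::size_t>(m), p_) == 0)
            out.push_back(static_cast<std::size_t>(j));
        search(s, j + right, hi, out);
    }

    std::string p_;
    std::array<std::ptrdiff_t, 256> shl_{};
    std::array<std::ptrdiff_t, 256> shr_{};
};
```

As a usage example, `DeadZoneMatcher("abab").find_all("abababab")` yields the positions 0, 2 and 4. Recursing on the left part before recording the probe position is a design choice that keeps the output sorted; the recursion depth is logarithmic in the text length because each live range is at most half the size of its parent.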

We show that each step may involve a window shift of up to \(2 \times |p| - 1\) characters, almost twice as much (and therefore potentially almost twice as fast) as the maximum of \(|p|\) characters possible with the Boyer-Moore family of algorithms. Our counterintuitive improvement over Boyer-Moore algorithms is achieved by simultaneously shifting left and right. Ongoing benchmarking [12] shows that such bidirectional shifts are highly efficient, and we make specific comparisons here to Horspool’s algorithm [9], regarded as one of the most efficient algorithms of the Boyer-Moore family.
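One way to see where this bound comes from, under the assumption (introduced here for illustration) that a match attempt at text position \(j\) yields a left shift \(\ell\) and a right shift \(r\) with \(1 \le \ell, r \le |p|\): the positions strictly between \(j - \ell\) and \(j + r\) need no further examination, so the dead zone created by one step covers

\[ (j + r - 1) - (j - \ell + 1) + 1 = \ell + r - 1 \le 2 \times |p| - 1 \]

positions. A purely left-to-right step advances by at most \(|p|\). For example, with \(|p| = 4\), a single bidirectional step can retire up to 7 candidate positions rather than 4.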

Keywords

Shift Function; Live High; Pattern Match Algorithm; Abstract Algorithm; Modern Processor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Berry, T., Ravindran, S.: A fast string matching algorithm and experimental results. In: Holub, J., Simánek, M. (eds.) Proceedings of the Prague Stringology Club Workshop 1999, pp. 16–26. No. Collaborative Report DC-99-05, Czech Technical University, Prague, Czech Republic (1999)
  2. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Communications of the ACM 20(10), 762–772 (1977)
  3. Charras, C., Lecroq, T.: Handbook of exact string matching algorithms. Kings College Publications (2004)
  4. Cleophas, L., Watson, B.W., Zwaan, G.: A new taxonomy of sublinear right-to-left scanning keyword pattern matching algorithms. Science of Computer Programming 75, 1095–1112 (2010)
  5. Cleophas, L.G., Watson, B.W.: Taxonomy-Based Software Construction of SPARE Time: A case study. IEE Proceedings — Software 152(1), 29–37 (2005)
  6. Crochemore, M.A., Rytter, W.: Text Algorithms. Oxford University Press (1994)
  7. Crochemore, M.A., Rytter, W.: Jewels of Stringology. World Scientific Publishing Company (2003)
  8. Faro, S., Lecroq, T.: 2001–2010: Ten years of exact string matching algorithms. In: Holub, J., Žďárek, J. (eds.) Proceedings of the Prague Stringology Conference 2011, pp. 1–2. Czech Technical University in Prague, Czech Republic (2011)
  9. Horspool, R.N.: Practical fast searching in strings. Software — Practice & Experience 10(6), 501–506 (1980)
  10. Knuth, D.E., Morris, J., Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323–350 (1977)
  11. Kourie, D.G., Watson, B.W.: The Correctness-by-Construction Approach to Programming. Springer (2012)
  12. Mauch, M., Watson, B.W., Kourie, D.G., Strauss, T.: Performance assessment of dead-zone single keyword pattern matching. In: Kroeze, J. (ed.) Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference, Pretoria, South Africa (October 2012)
  13. Meyer, B.: Object-Oriented Software Construction, 2nd edn. Addison-Wesley (1998)
  14. Smyth, W.F.: Computing Patterns in Strings. Addison-Wesley (2003)
  15. Watson, B.W.: Taxonomies and Toolkits of Regular Language Algorithms. Ph.D dissertation. Eindhoven University of Technology, Eindhoven, Netherlands (1995)
  16. Watson, B.W., Cleophas, L.: SPARE Parts: A C++ toolkit for String Pattern Recognition. Software — Practice & Experience 34(7), 697–710 (2004)
  17. Watson, B.W., Watson, R.E.: A new family of string pattern matching algorithms. In: Holub, J. (ed.) Proceedings of the Second Prague Stringologic Workshop, pp. 12–23. Czech Technical University, Prague, Czech Republic (July 1997)
  18. Watson, B.W., Watson, R.E.: A new family of string pattern matching algorithms. South African Computer Journal 30, 34–41 (2003). For rapid access, a reprint of this article appears on www.fastar.org; the journal remains the appropriate citation reference.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Bruce W. Watson (1)
  • Derrick G. Kourie (2)
  • Tinus Strauss (2)
  1. FASTAR, Dept. of Informatics, Stellenbosch University, South Africa
  2. FASTAR, Dept. of Computer Science, University of Pretoria, South Africa
