A new regular grammar pattern matching algorithm

  • Bruce W. Watson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1136)

Abstract

This paper presents a Boyer-Moore type algorithm for regular grammar pattern matching, answering a variant of an open problem posed by A.V. Aho in 1980 [2, p. 342]. The new algorithm handles patterns specified by regular (left linear) grammars — a generalization of the Boyer-Moore (single keyword) and Commentz-Walter (multiple keyword) algorithms, both considered extensively in [17] and [14, Chapter 4]. Like the Boyer-Moore and Commentz-Walter algorithms, the new algorithm makes use of shift functions which can be precomputed and tabulated. The precomputation functions are derived, and it is shown that they can be precomputed from Commentz-Walter's d1 and d2 shift functions.
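To illustrate what "shift functions which can be precomputed and tabulated" means, here is a minimal sketch of the simplest single-keyword case, using the Horspool simplification of Boyer-Moore. It is not the paper's regular grammar algorithm, nor Commentz-Walter's d1 and d2 functions; the names `build_shift_table` and `horspool_search` are hypothetical and chosen only for this example.

```python
def build_shift_table(keyword: str) -> dict[str, int]:
    """Precompute, for each character of the keyword except the last,
    the distance from its rightmost occurrence to the keyword's end."""
    m = len(keyword)
    # Later (more rightward) occurrences overwrite earlier ones,
    # so each character ends up with its smallest safe shift.
    return {ch: m - 1 - i for i, ch in enumerate(keyword[:-1])}


def horspool_search(text: str, keyword: str) -> list[int]:
    """Return the start positions of all occurrences of keyword in text."""
    m = len(keyword)
    if m == 0 or m > len(text):
        return []
    shift = build_shift_table(keyword)  # tabulated once, before scanning
    matches = []
    pos = 0
    while pos <= len(text) - m:
        if text[pos:pos + m] == keyword:
            matches.append(pos)
        # Shift according to the text character aligned with the keyword's
        # last position; characters absent from the table allow a full shift.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

For example, `horspool_search("abracadabra", "abra")` consults the table only three times while scanning eleven characters. The table depends only on the keyword, which is why it can be computed before any input is read.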

In most cases, the Boyer-Moore (respectively Commentz-Walter) algorithm has greatly outperformed the Knuth-Morris-Pratt (respectively Aho-Corasick) algorithm (as discussed in [14, Chapter 13]). In testing, an earlier version of the algorithm presented in this paper also frequently outperformed the regular grammar generalization of the Aho-Corasick algorithm.

Keywords

Pattern Match; Shift Function; Input String; Pattern Match Algorithm; Reverse Trie
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search, Comm. ACM 18(6) (1975) 333–340.
  2. Aho, A.V.: Pattern matching in strings, in: Book, R.V., ed., Formal Language Theory: Perspectives and Open Problems. (Academic Press, New York, 1980) 325–347.
  3. Aho, A.V.: Algorithms for finding patterns in strings, in: van Leeuwen, J., ed., Handbook of Theoretical Computer Science, Vol. A. (North-Holland, Amsterdam, 1990) 257–300.
  4. Baeza-Yates, R.: Efficient Text Searching. (Ph.D dissertation, University of Waterloo, Canada, May 1989).
  5. Boyer, R.S., Moore, J.S.: A fast string searching algorithm, Comm. ACM 20(10) (1977) 62–72.
  6. Commentz-Walter, B.: A string matching algorithm fast on the average, in: Maurer, H.A., ed., Proc. 6th Internat. Coll. on Automata, Languages and Programming (Springer-Verlag, Berlin, 1979) 118–132.
  7. Commentz-Walter, B.: A string matching algorithm fast on the average, Technical Report TR 79.09.007, IBM Germany, Heidelberg Scientific Center, 1979.
  8. Crochemore, M., Rytter, W.: Text Algorithms. (Oxford University Press, Oxford, England, 1994).
  9. Fredkin, E.: Trie memory, Comm. ACM 3(9) (1960) 490–499.
  10. Gonnet, G.H., Baeza-Yates, R.: Handbook of Algorithms and Data Structures (In Pascal and C). (Addison-Wesley, Reading, MA, 2nd edition, 1991).
  11. Hume, S.C., Sunday, D.: Fast string searching, Software—Practice and Experience 21(11) (1991) 1221–1248.
  12. Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings, SIAM J. Comput. 6(2) (1977) 323–350.
  13. Watson, B.W.: The performance of single-keyword and multiple-keyword pattern matching algorithms, Computing Science Report 94/19, Eindhoven University of Technology, The Netherlands, 1994. Available for ftp from ftp.win.tue.nl in directory /pub/techreports/pi/pattm/.
  14. Watson, B.W.: Taxonomies and Toolkits of Regular Language Algorithms. (Ph.D dissertation, Eindhoven University of Technology, The Netherlands, 1995). Contact watson@RibbitSoft.com.
  15. Watson, B.W.: A Boyer-Moore (or Watson-Watson) type algorithm for regular tree pattern matching, in: Aarts, E.H.L., ten Eikelder, H.M.M., Hemerik, C., Rem, M., eds., Simplex Sigillum Veri: Een Liber Amicorum voor prof.dr. F.E.J. Kruseman Aretz (Eindhoven University of Technology, ISBN 90-386-0197-2, 1995) 315–320.
  16. Watson, B.W., Watson, R.E.: A Boyer-Moore type algorithm for regular expression pattern matching, Computing Science Report 94/31, Eindhoven University of Technology, The Netherlands, 1994. Available by e-mail from watson@RibbitSoft.com.
  17. Watson, B.W., Zwaan, G.: A taxonomy of keyword pattern matching algorithms, Computing Science Report 92/27, Eindhoven University of Technology, The Netherlands, 1992. Available by e-mail from watson@RibbitSoft.com or wsinswan@win.tue.nl.
  18. Watson, B.W., Zwaan, G.: A taxonomy of sublinear multiple keyword pattern matching algorithms, Computing Science Report 95/13, Eindhoven University of Technology, The Netherlands, 1994. Available by e-mail from wsinswan@win.tue.nl.
  19. Watson, B.W., Zwaan, G.: A taxonomy of sublinear multiple keyword pattern matching algorithms, to appear in: Science of Computer Programming, (1996).
  20. Zwaan, G.: Sublinear pattern matching, in: Aarts, E.H.L., ten Eikelder, H.M.M., Hemerik, C., Rem, M., eds., Simplex Sigillum Veri: Een Liber Amicorum voor prof.dr. F.E.J. Kruseman Aretz (Eindhoven University of Technology, ISBN 90-386-0197-2, 1995) 335–350.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Bruce W. Watson (1, 2)
  1. Ribbit Software Systems Inc., Kelowna, Canada
  2. Faculty of Mathematics and Computing Science, Eindhoven University of Technology, Eindhoven, The Netherlands
