
Speeding up two string-matching algorithms

  • Maxime Crochemore
  • Thierry Lecroq
  • Artur Czumaj
  • Leszek Gasieniec
  • Stefan Jarominek
  • Wojciech Plandowski
  • Wojciech Rytter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 577)

Abstract

We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm) and a variant of it that we call the reversed-factor algorithm (RF algorithm). The RF algorithm is based on the factor graph of the reversed pattern. Both algorithms scan the text from right to left, starting at the presumed position of the right end of the pattern. The BM algorithm keeps scanning as long as the scanned segment is a suffix of the pattern, whereas the RF algorithm keeps scanning as long as the segment is a factor of the pattern. Each algorithm then shifts the pattern, forgets the history, and starts again. The RF algorithm usually makes larger shifts than BM, but it is quadratic in the worst case. We show that remembering only the last matched segment is enough to speed up the RF algorithm considerably (it then makes a linear number of comparisons with a small coefficient) and to speed up the BM algorithm with match-shifts (it then makes at most 2n comparisons). The search phase needs only constant additional memory. We give alternative versions of the accelerated RF algorithm: the first is based on combinatorial properties of primitive words, and the other two make extensive use of suffix trees.
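To make the RF scan concrete, here is a minimal Python sketch of the basic (unaccelerated) reversed-factor algorithm. It assumes the factor graph of the reversed pattern is realized as a suffix automaton (DAWG), and the names `build_suffix_automaton` and `rf_search` are hypothetical. Each window is read right to left while the scanned segment remains a factor of the pattern. Whenever the automaton reaches a terminal state, the segment is a prefix of the pattern, and the shift is taken from the longest such prefix. The sketch does not include the paper's speed-up (remembering the last matched segment), so it keeps the quadratic worst case.

```python
def build_suffix_automaton(s):
    """Suffix automaton of s: returns (transitions, terminal_states)."""
    trans, link, length = [{}], [-1], [0]
    last = 0
    for ch in s:
        cur = len(trans)
        trans.append({})
        length.append(length[last] + 1)
        link.append(-1)
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q by cloning it with a shorter length.
                clone = len(trans)
                trans.append(dict(trans[q]))
                length.append(length[p] + 1)
                link.append(link[q])
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Terminal states (suffixes of s) lie on the suffix-link path from last.
    terminal = set()
    p = last
    while p != -1:
        terminal.add(p)
        p = link[p]
    return trans, terminal


def rf_search(pattern, text):
    """Basic reversed-factor search; returns start positions of matches."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m, n = len(pattern), len(text)
    # Automaton of the reversed pattern: reading the window backwards
    # stays inside it exactly while the scanned segment is a factor of pattern.
    trans, terminal = build_suffix_automaton(pattern[::-1])
    occurrences = []
    i = 0  # left end of the current window text[i:i+m]
    while i <= n - m:
        state, j, shift = 0, m, m
        while j > 0:
            state = trans[state].get(text[i + j - 1])
            if state is None:
                break  # segment is no longer a factor of the pattern
            j -= 1
            if state in terminal:
                # text[i+j:i+m] is a prefix of the pattern.
                if j == 0:
                    occurrences.append(i)  # whole window matches
                else:
                    shift = j  # align the pattern with this prefix
        i += shift
    return occurrences
```

For example, `rf_search("aba", "abababa")` returns `[0, 2, 4]`. A dictionary per state keeps the transition table compact for arbitrary alphabets. A fixed-size array per state would trade memory for faster lookups.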

Keywords

Factor Graph, Suffix Tree, Search Phase, Constant Memory, Primitive Word



Copyright information

© Springer-Verlag Berlin Heidelberg 1992

Authors and Affiliations

  • Maxime Crochemore (1)
  • Thierry Lecroq (1)
  • Artur Czumaj (2)
  • Leszek Gasieniec (2)
  • Stefan Jarominek (2)
  • Wojciech Plandowski (2)
  • Wojciech Rytter (2)

  1. LITP, Institut Blaise Pascal, Université Paris 7, Paris Cedex 05, France
  2. Institute of Informatics, Warsaw University, Warsaw 59, Poland
