
String searching algorithms revisited

  • Ricardo A. Baeza-Yates
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 382)

Abstract

We present bounds for the average case of the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore-Horspool (BMH) algorithm for random text. Experimental results on both random and English text suggest that the bounds are tight. We also present a hybrid algorithm which combines the KMP and BMH algorithms, and which, in practice, is faster than the Boyer-Moore algorithm.
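For orientation, the BMH algorithm named above can be sketched as follows. This is a minimal illustration of the standard Horspool technique, not code from the paper; the function name `horspool_search` and the choice to return a character-comparison count are assumptions made here to show the quantity an average-case analysis typically measures. The hybrid KMP/BMH algorithm is not shown, since the abstract does not describe its details.

```python
def horspool_search(text, pattern):
    """Return (match_positions, character_comparisons) for pattern in text.

    Illustrative Boyer-Moore-Horspool sketch; names are hypothetical.
    An empty pattern, or one longer than the text, yields no matches.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0

    # Shift table: for each character in pattern[0..m-2], the distance from
    # its last occurrence to the end of the pattern. Characters not in the
    # table shift the window by the full pattern length m.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i

    positions = []
    comparisons = 0
    pos = 0
    while pos <= n - m:
        # Compare right to left inside the current window.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            positions.append(pos)
        # Horspool's rule: shift by the text character aligned with the
        # last pattern position, whether or not the window matched.
        pos += shift.get(text[pos + m - 1], m)
    return positions, comparisons


if __name__ == "__main__":
    print(horspool_search("abracadabra", "abra"))
```

Dividing the returned comparison count by the text length gives an empirical comparisons-per-character figure, the kind of measurement that can be set against an average-case bound.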

Keywords

Markov Chain, Average Case, Hybrid Algorithm, String Match, English Text
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1989

Authors and Affiliations

  • Ricardo A. Baeza-Yates (1)
  1. Data Structuring Group, Department of Computer Science, University of Waterloo, Waterloo, Canada
