
Automaton-Based Sublinear Keyword Pattern Matching

  • Conference paper
String Processing and Information Retrieval (SPIRE 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3246)


Abstract

We show how automaton-based sublinear keyword pattern matching (skpm) algorithms appearing in the literature can be seen as different instantiations of a general automaton-based skpm algorithm skeleton. Such algorithms use finite automata (FA) to efficiently determine whether a string belongs to a certain language. The algorithms were formally derived as part of a new skpm algorithm taxonomy, based on an earlier taxonomy of suffix-based skpm algorithms [1]. Such a taxonomy derives the algorithms from a common starting point by successively adding algorithm and problem details, which has a number of advantages: it provides correctness arguments, clarifies how the algorithms work and how they are interrelated, helps in implementing them, and may lead to the discovery of new algorithms by exposing gaps in the taxonomy. We show how to arrive at the general algorithm skeleton and derive some instantiations, leading to well-known factor- and factor-oracle-based algorithms. In doing so, we show that the shift functions used for these algorithms can be (strengthenings of) shift functions used for suffix-based algorithms. This also yields a number of previously undescribed factor-based skpm algorithm variants, whose performance remains to be investigated.
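The idea of one skeleton with interchangeable automata can be illustrated with a small sketch. Assuming a single keyword (the taxonomy itself also covers keyword sets), the hypothetical function `backward_factor_search` below reads each text window right to left through an automaton built for the reversed keyword and shifts past the first character at which recognition fails. Two recognizers are plugged in: `suffix_trie`, an exact but non-minimal factor recognizer standing in for the factor (suffix) automaton used by Backward DAWG Matching-style algorithms [5], and `factor_oracle`, an online construction of the factor oracle of Allauzen, Crochemore and Raffinot [6], which accepts every factor and possibly some non-factors. All names here are illustrative, and the shift used is only a simple safe baseline, not the stronger shift functions derived in the paper.

```python
from typing import Callable, Dict, List

# A recognizer is a list of transition maps; state 0 is the start state.
Recognizer = List[Dict[str, int]]


def suffix_trie(word: str) -> Recognizer:
    """Exact factor recognizer: a trie of all suffixes of `word`
    (a simple, non-minimal stand-in for a factor/suffix automaton)."""
    trans: Recognizer = [{}]
    for start in range(len(word)):
        state = 0
        for ch in word[start:]:
            if ch not in trans[state]:
                trans.append({})
                trans[state][ch] = len(trans) - 1
            state = trans[state][ch]
    return trans


def factor_oracle(word: str) -> Recognizer:
    """Factor oracle of `word`, built online: accepts every factor of
    `word` (and possibly some non-factors) using len(word) + 1 states."""
    m = len(word)
    trans: Recognizer = [{} for _ in range(m + 1)]
    supply = [-1] * (m + 1)  # supply links; -1 means undefined
    for i in range(1, m + 1):
        ch = word[i - 1]
        trans[i - 1][ch] = i  # spine transition
        k = supply[i - 1]
        # Add external transitions along the supply-link chain.
        while k > -1 and ch not in trans[k]:
            trans[k][ch] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else trans[k][ch]
    return trans


def backward_factor_search(pattern: str, text: str,
                           build: Callable[[str], Recognizer]) -> List[int]:
    """Generic skeleton: slide a window of len(pattern) over `text`, read it
    right to left through a recognizer for (a superset of) the factors of the
    reversed pattern, and shift past the character where recognition fails."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    trans = build(pattern[::-1])
    matches: List[int] = []
    pos = 0
    while pos <= n - m:
        state, j = 0, m - 1
        while j >= 0 and text[pos + j] in trans[state]:
            state = trans[state][text[pos + j]]
            j -= 1
        if j < 0:
            # Whole window recognized: verify explicitly, then shift by 1.
            if text[pos:pos + m] == pattern:
                matches.append(pos)
            pos += 1
        else:
            # text[pos+j : pos+m] is not a factor of pattern, so no
            # occurrence can start at or before pos + j.
            pos += j + 1
    return matches
```

For example, `backward_factor_search("aba", "abababa", factor_oracle)` returns `[0, 2, 4]`, and passing `suffix_trie` instead gives the same result. The explicit comparison after a full-window read is redundant for these two recognizers, but it keeps the skeleton correct for any recognizer that accepts a superset of the factors, which is exactly the property that makes the weaker but smaller factor oracle usable in place of an exact factor automaton.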


References

  1. Watson, B.W., Zwaan, G.: A taxonomy of sublinear multiple keyword pattern matching algorithms. Science of Computer Programming 27, 85–118 (1996)

  2. Watson, B.W.: Taxonomies and Toolkits of Regular Language Algorithms. PhD thesis, Faculty of Computing Science, Technische Universiteit Eindhoven (1995)

  3. Crochemore, M., Rytter, W.: Jewels of Stringology - Text Algorithms. World Scientific Publishing, Singapore (2003)

  4. Apostolico, A., Galil, Z.: Pattern Matching Algorithms. Oxford University Press, Oxford (1997)

  5. Crochemore, M., Czumaj, A., Gasieniec, L., Jarominek, S., Lecroq, T., Plandowski, W., Rytter, W.: Speeding up two string matching algorithms. Algorithmica 12, 247–267 (1994)

  6. Allauzen, C., Crochemore, M., Raffinot, M.: Efficient Experimental String Matching by Weak Factor Recognition. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 51–72. Springer, Heidelberg (2001)

  7. Allauzen, C., Raffinot, M.: Oracle des facteurs d’un ensemble de mots. Technical Report 99-11, Institut Gaspard-Monge, Université de Marne-la-Vallée (1999)

  8. Cleophas, L.G.: Towards SPARE Time: A New Taxonomy and Toolkit of Keyword Pattern Matching Algorithms. Master’s thesis, Department of Mathematics and Computer Science, Technische Universiteit Eindhoven (2003)

  9. Cleophas, L., Watson, B.W., Zwaan, G.: A new taxonomy of sublinear keyword pattern matching algorithms. Technical Report 04/07, Department of Mathematics and Computer Science, Technische Universiteit Eindhoven (2004)

  10. Watson, B.W., Cleophas, L.: SPARE Parts: A C++ toolkit for String PAttern REcognition. Software – Practice & Experience 34, 697–710 (2004)

  11. Jonkers, H.: Abstraction, specification and implementation techniques, with an application to garbage collection. Technical Report 166, Mathematisch Centrum, Amsterdam (1983)

  12. Watson, B.W.: Constructing minimal acyclic deterministic finite automata. PhD thesis, Department of Computer Science, University of Pretoria (2004)

  13. Barla-Szabo, G.: A taxonomy of graph representations. Master’s thesis, Department of Computer Science, University of Pretoria (2002)

  14. Dijkstra, E.W., Scholten, C.S.: Predicate Calculus and Program Semantics. Springer, New York (1990)

  15. Dijkstra, E.W.: A Discipline of Programming. Prentice Hall, Englewood Cliffs (1976)

  16. van den Eijnde, J.: Program derivation in acyclic graphs and related problems. Technical Report 92/04, Faculty of Computing Science, Technische Universiteit Eindhoven (1992)

  17. Watson, B.W.: A new family of Commentz-Walter-style multiple-keyword pattern matching algorithms. In: Proceedings of the Prague Stringology Club Workshop 2000, Department of Computer Science and Engineering, pp. 71–76. Czech Technical University, Prague (2000)

  18. Navarro, G., Raffinot, M.: Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences. Cambridge University Press, Cambridge (2002)

  19. Cleophas, L., Zwaan, G., Watson, B.W.: Constructing Factor Oracles. In: Proceedings of the Prague Stringology Conference 2003, Department of Computer Science and Engineering, Czech Technical University, Prague (2003)



Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cleophas, L., Watson, B.W., Zwaan, G. (2004). Automaton-Based Sublinear Keyword Pattern Matching. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_3


  • DOI: https://doi.org/10.1007/978-3-540-30213-1_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23210-0

  • Online ISBN: 978-3-540-30213-1

  • eBook Packages: Springer Book Archive
