Abstract
Given a dictionary D of regular expressions and a text T, the online regular-pattern-matching problem is to single out, for each text position T[c], those expressions in D that have a match ending at T[c], while processing T only once. This problem is considered in the context of regular patterns over bounded-length gaps and keywords, where the gaps are specified by wildcards and character classes and the keywords are strings over the input alphabet. Our algorithm is based on constructing the Aho–Corasick pattern-matching automaton for the set of keywords, and representing as a bit vector the set of keywords that can precede a given keyword in a regular-pattern instance. For a dictionary D with r patterns and with \(k_i\) keywords in pattern i, the preprocessing takes time \(O(|D| + \sum_{i=1}^r k_i^2 \log k_i / w)\), where w denotes the number of bits in a memory word. When only fixed-length wildcard gaps without character classes are allowed, the time spent by our matching algorithm for each text character T[c] is at most \(O((\log r + k/w)(K_c + 1))\), where \(k = \max\{k_1, \ldots, k_r\}\) and \(K_c\) is the number of keyword occurrences in D matched at text position T[c].
This work was supported by the Academy of Finland.
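The restricted case named in the abstract (keywords separated by fixed-length wildcard gaps, no character classes) can be illustrated with a small sketch. It is a simplified illustration written for this summary, not the authors' algorithm, and it does not achieve the stated time bounds. All names (`build_aho_corasick`, `match_patterns`, `pred_mask`, `prefix_ok`) and the pattern encoding (a list alternating keyword, gap length, keyword, ...) are assumptions. Python integers stand in for the bit vectors, and every keyword instance is assumed to have at most one predecessor, so each predecessor mask has a single bit set.

```python
from collections import deque

def build_aho_corasick(keywords):
    """Build goto/fail/output tables; out[s] lists ids of keywords ending at state s."""
    goto, fail, out = [{}], [0], [[]]
    for kid, word in enumerate(keywords):
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(kid)
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit keywords that are suffixes
            queue.append(t)
    return goto, fail, out

def match_patterns(patterns, text):
    """Report (end_position, pattern_index) for every pattern match, in one pass.

    Assumed encoding: each pattern is [kw0, gap1, kw1, gap2, kw2, ...], where
    gap_j is the exact number of wildcard characters before keyword j.
    """
    keywords, pred_mask, offset, final = [], [], [], {}
    for p, pat in enumerate(patterns):
        words, gaps = pat[0::2], pat[1::2]
        for j, word in enumerate(words):
            kid = len(keywords)
            keywords.append(word)
            # Bit vector of keyword instances allowed to precede this one.
            pred_mask.append(0 if j == 0 else 1 << (kid - 1))
            # Distance from this keyword's end back to its predecessor's end.
            offset.append(len(word) + (gaps[j - 1] if j else 0))
            if j == len(words) - 1:
                final[kid] = p
    goto, fail, out = build_aho_corasick(keywords)
    # prefix_ok[c]: bit kid is set if a pattern prefix ending with keyword kid
    # ends at position c. The whole history is kept here for simplicity; a
    # window as long as the largest offset would be enough.
    prefix_ok = [0] * len(text)
    hits, s = [], 0
    for c, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for kid in out[s]:
            before = c - offset[kid]
            if pred_mask[kid] == 0 or (before >= 0 and prefix_ok[before] & pred_mask[kid]):
                prefix_ok[c] |= 1 << kid
                if kid in final:
                    hits.append((c, final[kid]))
    return hits

print(match_patterns([["ab", 2, "cd"], ["b", 0, "x"]], "abxxcd abcd"))
# prints [(2, 1), (5, 0)]
```

In the sample call, the second occurrence of "cd" is not reported because no "ab" ends exactly two wildcard positions before it. Extending the sketch to variable-length gaps or character classes would need more machinery than is shown here.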
References
Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)
Bille, P.: New algorithms for regular expression matching. In: Proc. of the 33rd Internat. Colloq. Automata, Languages and Programming, pp. 643–654 (2006)
Bille, P., Farach-Colton, M.: Fast and compact regular expression matching. Theor. Comput. Sci. 409(3), 486–496 (2008)
Bille, P., Gørtz, I.L., Vildhøj, H.W., Wind, D.K.: String matching with variable length gaps. Theor. Comput. Sci. 443, 25–34 (2012)
Bille, P., Thorup, M.: Faster regular expression matching. In: Proc. of the 36th Internat. Colloq. Automata, Languages and Programming, pp. 171–182 (2009)
Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proc. of the 21st Annual ACM-SIAM Symp. on Discrete Algorithms, pp. 1297–1308 (2010)
Haapasalo, T., Silvasti, P., Sippu, S., Soisalon-Soininen, E.: Online Dictionary Matching with Variable-Length Gaps. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 76–87. Springer, Heidelberg (2011)
Myers, E.W.: A Four Russians algorithm for regular expression pattern matching. J. ACM 39(2), 430–448 (1992)
Schnitger, G.: Regular expressions and NFAs without ε-transitions. In: Proc. of the 23rd Annual Symp. on Theoretical Aspects of Computer Science, pp. 432–443 (2006)
Sen, S., Spatscheck, O., Wang, D.: Accurate, scalable in-network identification of p2p traffic using application signatures. In: Proc. of the 13th Internat. Conf. on World Wide Web, pp. 512–521 (2004)
Thompson, K.: Regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sippu, S., Soisalon-Soininen, E. (2013). Online Matching of Multiple Regular Patterns with Gaps and Character Classes. In: Dediu, AH., Martín-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2013. Lecture Notes in Computer Science, vol 7810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37064-9_46
DOI: https://doi.org/10.1007/978-3-642-37064-9_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37063-2
Online ISBN: 978-3-642-37064-9
eBook Packages: Computer Science, Computer Science (R0)