Episode matching

Das, Gautam; Fleischer, Rudolf; Gasieniec, Leszek; Gunopulos, Dimitris; Kärkkäinen, Juha

doi:10.1007/3-540-63220-4_46

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1264)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

Given two words, a text T of length n and an episode P of length m, the episode matching problem is to find all minimal-length substrings of T that contain P as a subsequence. The corresponding optimization problem is to find the smallest number w such that T has a subword of length w that contains P as a subsequence.
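To make the definition concrete, the following is a minimal brute-force sketch in Python. It is an illustration of the problem statement only, not one of the algorithms from the paper. The function names are hypothetical, and the sketch assumes that "minimal-length" means windows of the globally smallest length w.

```python
from typing import List, Optional, Tuple


def is_subsequence(episode: str, window: str) -> bool:
    """Return True if `episode` occurs in `window` as a subsequence."""
    it = iter(window)
    # `symbol in it` consumes the iterator up to the first match, so the
    # matched positions are strictly increasing.
    return all(symbol in it for symbol in episode)


def shortest_windows_bruteforce(
    text: str, episode: str
) -> Tuple[Optional[int], List[Tuple[int, int]]]:
    """Try window lengths w = m, m+1, ... and return the first w that works,
    together with all half-open windows (start, end) of that length."""
    n = len(text)
    for w in range(len(episode), n + 1):
        hits = [
            (i, i + w)
            for i in range(n - w + 1)
            if is_subsequence(episode, text[i:i + w])
        ]
        if hits:
            return w, hits
    return None, []


if __name__ == "__main__":
    # "ab" is contained in "axb" (length 3) and in the final "ab" (length 2).
    print(shortest_windows_bruteforce("axbxxab", "ab"))
```

This check is far slower than the bounds quoted in the abstract; it is only meant to pin down what a valid answer looks like.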

In this paper, we introduce a few efficient off-line as well as on-line algorithms for the entire problem, where by on-line algorithms we mean algorithms which scan consecutive text symbols from left to right only once. We present two alphabet-independent algorithms which work in time O(nm). The off-line algorithm operates in O(1) additional space, while the on-line algorithm pays for its property with O(m) additional space. Two other on-line algorithms have subquadratic time complexity. One of them works in time O(nm/log m) and O(m) additional space. The other one gives a time/space trade-off, i.e., it works in time O(n + s + nm log log s / log(s/m)) when additional space is limited to O(s). Finally, we present two approximation algorithms for the optimization problem. The off-line algorithm is alphabet independent; it has superlinear time complexity O(n/ε + n log log(n/m)) and uses only constant space. The on-line algorithm works in time O(n/ε + n) and uses space O(m). Both approximation algorithms achieve a 1+ε approximation ratio, for any ε > 0.
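The abstract does not describe the algorithms themselves. As a rough illustration of what an on-line method with O(nm) time and O(m) additional space can look like, here is a sketch of a standard dynamic program in Python. It is an assumption that this resembles the paper's on-line algorithm; the function name and the `start` array are hypothetical. The sketch reads each text symbol once, from left to right, and returns the smallest window length w together with all windows of that length.

```python
from typing import Iterable, List, Optional, Tuple


def shortest_episode_window_online(
    text: Iterable[str], episode: str
) -> Tuple[Optional[int], List[Tuple[int, int]]]:
    """One left-to-right pass; O(m) work per text symbol, O(m) extra space
    (not counting the list of reported windows)."""
    m = len(episode)
    if m == 0:
        raise ValueError("episode must be non-empty")
    # start[j]: latest start s such that episode[:j+1] is a subsequence of
    # text[s..i] for the current position i, or -1 if no such s exists yet.
    start = [-1] * m
    best: Optional[int] = None
    windows: List[Tuple[int, int]] = []
    for i, symbol in enumerate(text):
        # Go right to left so start[j-1] still holds its value from before
        # this symbol was read; one text symbol matches only one position.
        for j in range(m - 1, -1, -1):
            if symbol != episode[j]:
                continue
            start[j] = i if j == 0 else start[j - 1]
            if j == m - 1 and start[j] != -1:
                w = i - start[j] + 1  # window text[start[j]..i]
                if best is None or w < best:
                    best, windows = w, [(start[j], i + 1)]
                elif w == best:
                    windows.append((start[j], i + 1))
    return best, windows


if __name__ == "__main__":
    print(shortest_episode_window_online("axbxxab", "ab"))
```

Windows are reported as half-open index pairs (start, end). Because the function accepts any iterable of symbols, it can also consume a stream without holding the whole text in memory.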

Author information

Authors and Affiliations

Dept. of Mathematical Sciences, The University of Memphis, 38152, Memphis, TN, USA
Gautam Das
Max-Planck Institut für Informatik, Im Stadtwald, D-66123, Saarbrücken, Germany
Rudolf Fleischer & Leszek Gasieniec
IBM Almaden RC k55/B1, 650 Harry Rd, 95120, CA, USA
Dimitris Gunopulos
Dept. of Computer Science, University of Helsinki, P.O. Box 26, FIN-00014, Finland
Juha Kärkkäinen

Editor information

Alberto Apostolico, Jotun Hein

About this paper

Cite this paper

Das, G., Fleischer, R., Gasieniec, L., Gunopulos, D., Kärkkäinen, J. (1997). Episode matching. In: Apostolico, A., Hein, J. (eds) Combinatorial Pattern Matching. CPM 1997. Lecture Notes in Computer Science, vol 1264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63220-4_46

DOI: https://doi.org/10.1007/3-540-63220-4_46
Published: 08 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63220-7
Online ISBN: 978-3-540-69214-0
eBook Packages: Springer Book Archive
