Episode matching

  • Conference paper
Combinatorial Pattern Matching (CPM 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1264)

Abstract

Given two words, a text T of length n and an episode P of length m, the episode matching problem is to find all minimal-length substrings of T that contain P as a subsequence. The corresponding optimization problem is to find the smallest number w such that T has a subword of length w that contains P as a subsequence.
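To make the problem statement concrete, here is a minimal Python sketch that reads the text once from left to right and keeps O(m) extra state. It uses a textbook dynamic-programming scan and is not necessarily any of the algorithms from the paper; the function name `shortest_episode_windows` and the choice to report windows as inclusive index pairs are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def shortest_episode_windows(
    text: str, episode: str
) -> Tuple[Optional[int], List[Tuple[int, int]]]:
    """Return (w, windows): the smallest window length w and every
    inclusive (start, end) window of that length containing `episode`
    as a subsequence. Returns (None, []) if there is no such window."""
    m = len(episode)
    if m == 0:
        return None, []

    # start[i] is the largest index s such that episode[:i + 1] is a
    # subsequence of text[s : j + 1] for the current position j,
    # or -1 if no such s exists yet.
    start = [-1] * m
    best_w: Optional[int] = None
    windows: List[Tuple[int, int]] = []

    for j, c in enumerate(text):
        # Go from the longest prefix down so that start[i - 1] still
        # holds its value from before position j was read.
        for i in range(m - 1, 0, -1):
            if episode[i] == c and start[i - 1] != -1:
                start[i] = start[i - 1]
        if episode[0] == c:
            start[0] = j

        # A full occurrence ends at j only if the last symbol matched.
        if episode[m - 1] == c and start[m - 1] != -1:
            w = j - start[m - 1] + 1
            if best_w is None or w < best_w:
                best_w, windows = w, [(start[m - 1], j)]
            elif w == best_w:
                windows.append((start[m - 1], j))

    return best_w, windows
```

For example, `shortest_episode_windows("axbxcabc", "abc")` returns `(3, [(5, 7)])`: the window `"axbxc"` at positions 0 to 4 also contains the episode, but the window `"abc"` at positions 5 to 7 is shorter.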

In this paper, we introduce several efficient off-line and on-line algorithms for the full problem, where an on-line algorithm is one that scans the consecutive text symbols from left to right only once. We present two alphabet-independent algorithms that run in time O(nm). The off-line algorithm uses O(1) additional space, while the on-line algorithm pays for its single-pass property with O(m) additional space. Two other on-line algorithms have subquadratic time complexity. One of them runs in time O(nm/log m) with O(m) additional space. The other offers a time/space trade-off: it runs in time O(n + s + nm log log s / log(s/m)) when additional space is limited to O(s). Finally, we present two approximation algorithms for the optimization problem. The off-line algorithm is alphabet independent, has superlinear time complexity O(n/ε + n log log(n/m)), and uses only constant space. The on-line algorithm runs in time O(n/ε + n) and uses O(m) space. Both approximation algorithms achieve an approximation ratio of 1+ε, for any ε > 0.
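The off-line setting, where the text may be rescanned, can be illustrated with a well-known two-pointer technique that needs only a constant number of index variables. The sketch below is an assumption-laden illustration of that general idea, not the paper's off-line algorithm; the name `shortest_window_offline` is hypothetical, and it reports only one shortest window (the leftmost).

```python
from typing import Optional, Tuple


def shortest_window_offline(text: str, episode: str) -> Optional[Tuple[int, int]]:
    """Return the leftmost shortest inclusive (start, end) window of
    `text` containing `episode` as a subsequence, or None if none exists.
    Uses only a constant number of integer variables besides the inputs."""
    n, m = len(text), len(episode)
    if m == 0:
        return None

    best: Optional[Tuple[int, int]] = None
    i = 0  # position from which the next forward scan begins
    while i < n:
        # Forward scan: greedily match the episode starting at i to
        # find the earliest position where a full occurrence ends.
        k, j = 0, i
        while j < n:
            if text[j] == episode[k]:
                k += 1
                if k == m:
                    break
            j += 1
        if k < m:
            break  # no occurrence starts at or after i
        end = j

        # Backward scan: match the episode in reverse from `end` to
        # find the latest start of an occurrence ending there.
        k = m - 1
        while True:
            if text[j] == episode[k]:
                k -= 1
                if k < 0:
                    break
            j -= 1
        start = j

        if best is None or end - start < best[1] - best[0]:
            best = (start, end)
        i = start + 1  # later windows must start after this one

    return best
```

Because the backward scan revisits text symbols that the forward scan has already read, this approach does not qualify as on-line in the sense used above; that rescanning is what lets it avoid the O(m) table.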



Editor information

Alberto Apostolico, Jotun Hein

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Das, G., Fleischer, R., Gasieniec, L., Gunopulos, D., Kärkkäinen, J. (1997). Episode matching. In: Apostolico, A., Hein, J. (eds) Combinatorial Pattern Matching. CPM 1997. Lecture Notes in Computer Science, vol 1264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63220-4_46

  • DOI: https://doi.org/10.1007/3-540-63220-4_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63220-7

  • Online ISBN: 978-3-540-69214-0

  • eBook Packages: Springer Book Archive
