Episode matching

  • Gautam Das
  • Rudolf Fleischer
  • Leszek Gasieniec
  • Dimitris Gunopulos
  • Juha Kärkkäinen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1264)

Abstract

Given two words, text T of length n and episode P of length m, the episode matching problem is to find all minimal length substrings of text T that contain episode P as a subsequence. The respective optimization problem is to find the smallest number w, s.t. text T has a subword of length w which contains episode P.
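The problem statement can be illustrated with a small dynamic-programming sketch. It is not taken from the paper: the function name `min_episode_windows` and the table `start` are hypothetical, and "minimal length" is assumed to mean the globally smallest window length w. The sketch reads the text once from left to right and keeps one integer per episode symbol.

```python
def min_episode_windows(text, episode):
    """Return (w, windows) where w is the smallest window length and
    windows lists every (start, end) pair (inclusive indices) of that length.
    Returns (None, []) if the episode never occurs as a subsequence."""
    m = len(episode)
    if m == 0:
        return 0, []
    # start[j]: largest s such that episode[:j+1] is a subsequence of
    # text[s..i] for the current position i, or -1 if no such s exists.
    start = [-1] * m
    best = None
    windows = []
    for i, c in enumerate(text):
        # Go through j in decreasing order so start[j-1] still holds the
        # value from the previous text position when start[j] is updated.
        for j in range(m - 1, 0, -1):
            if episode[j] == c and start[j - 1] != -1:
                start[j] = start[j - 1]
        if episode[0] == c:
            start[0] = i
        # A candidate window can only end where the last episode symbol matches.
        if c == episode[-1] and start[m - 1] != -1:
            length = i - start[m - 1] + 1
            if best is None or length < best:
                best, windows = length, [(start[m - 1], i)]
            elif length == best:
                windows.append((start[m - 1], i))
    return best, windows


print(min_episode_windows("axbxxab", "ab"))
```

For text `axbxxab` and episode `ab`, the window `axb` (length 3) is found first and is then replaced by the shorter window `ab` at positions 5 to 6, so w = 2.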

In this paper, we introduce a few efficient off-line as well as on-line algorithms for the entire problem, where by on-line algorithms we mean algorithms which search from left to right consecutive text symbols only once. We present two alphabet-independent algorithms which work in time O(nm). The off-line algorithm operates in O(1) additional space while the on-line algorithm pays for its property with O(m) additional space. Two other on-line algorithms have subquadratic time complexity. One of them works in time O(nm/log m) and O(m) additional space. The other one gives a time/space trade-off, i.e., it works in time O(n+s+nm log log s/log(s/m)) when additional space is limited to O(s). Finally, we present two approximation algorithms for the optimization problem. The off-line algorithm is alphabet independent; it has superlinear time complexity O(n/ε + n log log(n/m)) and uses only constant space. The on-line algorithm works in time O(n/ε + n) and uses space O(m). Both approximation algorithms achieve a 1+ε approximation ratio, for any ε>0.
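To make the off-line, constant-space setting concrete, here is a minimal sketch of a well-known forward/backward scan for the optimization problem. It is an illustration only, not the paper's algorithm, and no claim is made that it meets the stated time bounds, since it may rescan parts of the text. The function name `smallest_window_offline` is hypothetical.

```python
def smallest_window_offline(text, episode):
    """Return the smallest w such that some length-w substring of text
    contains episode as a subsequence, or None if there is none.
    Uses only a constant number of index variables besides the input."""
    n, m = len(text), len(episode)
    if m == 0:
        return 0
    best = None
    i = 0
    while i < n:
        # Forward scan: find the earliest position where the whole
        # episode has been matched, starting the search at index i.
        j = 0
        while i < n and j < m:
            if text[i] == episode[j]:
                j += 1
            i += 1
        if j < m:
            break  # no further occurrence of the episode
        end = i - 1
        # Backward scan: shrink the window from the left by matching
        # the episode in reverse, ending at the same position.
        j, k = m - 1, end
        while j >= 0:
            if text[k] == episode[j]:
                j -= 1
            k -= 1
        begin = k + 1
        if best is None or end - begin + 1 < best:
            best = end - begin + 1
        # Resume just after the start of the window that was found.
        i = begin + 1
    return best


print(smallest_window_offline("axbxxab", "ab"))
```

Because the backward scan moves right to left and the forward scan restarts inside an already visited region, this sketch is off-line in the sense used above: it does not read the text symbols only once.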

Keywords

Regular Expression; Edit Operation; Text Character; Additional Space; Approximate String Match
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Gautam Das (1)
  • Rudolf Fleischer (2)
  • Leszek Gasieniec (2)
  • Dimitris Gunopulos (3)
  • Juha Kärkkäinen (4)

  1. Dept. of Mathematical Sciences, The University of Memphis, Memphis, USA
  2. Max-Planck Institut für Informatik, Im Stadtwald, Saarbrücken, Germany
  3. IBM Almaden RC k55/B1, USA
  4. Dept. of Computer Science, University of Helsinki, Finland