Advertisement

Journal of Combinatorial Optimization

, Volume 13, Issue 3, pp 243–262 | Cite as

An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence

  • Hiroki Arimura
  • Takeaki Uno
Article

Abstract

In this paper, we consider the problem of enumerating all maximal motifs in an input string for the class of repeated motifs with wild cards. A maximal motif is such a representative motif that is not properly contained in any larger motifs with the same location lists. Although the enumeration problem for maximal motifs with wild cards has been studied in Parida et al. (2001), Pisanti et al. (2003) and Pelfrêne et al. (2003), its output-polynomial time computability has been still open. The main result of this paper is a polynomial space polynomial delay algorithm for the maximal motif enumeration problem for the repeated motifs with wild cards. This algorithm enumerates all maximal motifs in an input string of length n in O(n 3) time per motif with O(n) space, in particular O(n 3) delay. The key of the algorithm is depth-first search on a tree-shaped search route over all maximal motifs based on a technique called prefix-preserving closure extension. We also show an exponential lower bound and a succinctness result on the number of maximal motifs, which indicate the limit of a straightforward approach. The results of the computational experiments show that our algorithm can be applicable to huge string data such as genome data in practice, and does not take large additional computational cost compared to usual frequent motif mining algorithms.

Keywords

Motif Maximal motif Data mining Sequence mining Algorithm Delay Enumeration Polynomial time Closed itemset Closed pattern Pattern discovery 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Apostolico A, Comin M 2, Parida L (2005) Conservative extraction of over-represented extensible motifs. ISMB (Supplement of Bioinformatics) 21:9–18CrossRefGoogle Scholar
  2. Apostolico A, Parida L (2003) Compression and the wheel of fortune. In: Proceedings of the 2003 data compression conference (DCC'03), IEEEGoogle Scholar
  3. Arimura H, Uno T (2005) An output-polynomial time algorithm for mining frequent closed attribute trees. In: Proceedings of the ILP'05, LNAI 3625, pp 1–19Google Scholar
  4. Arimura H, Shinohara T, Otsuki S (1994) Finding minimal generalizations for unions of pattern languages and its application to inductive inference from positive data. In: STACS'94, LNCS 775, Springer-Verlag, pp 649–660Google Scholar
  5. Boros E, Gurvich V, Khachiyan L, Makino K (2002) The complexity of generating maximal frequent and minimal infrequent sets. In: Proceedings of the STACS '02, LNCS, pp 133–141Google Scholar
  6. Crochemore M, Rytter W (2002) Jewels of stringology. World ScientificGoogle Scholar
  7. Goldberg LA (1993) Polynomial space polynomial delay algorithms for listing families of graphs. In: Proceedings of the 25th STOC, ACM, pp 218–225Google Scholar
  8. Gusfield D (1997) Algorithms on strings, trees, and sequences. CambridgeGoogle Scholar
  9. Parida L, Rigoutsos I, Floratos A, Platt D, Gao Y (2000) Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and effcient polynomial time algorithm. In: Proceedings of the 11th SIAM symposium on discrete algorithms (SODA'00), pp 297–308Google Scholar
  10. Parida L, Rigoutsos I, Platt DE (2001) An output-sensitive flexible pattern discovery algorithm. In: Proceedings of the CPM'01, LNCS 2089, pp 131–142Google Scholar
  11. Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the ICDT'99, pp 398–416Google Scholar
  12. Pelfrêne J, Abdeddaim S, Alexandre J (2003) Extending approximate patterns. In: Proceeding of the CPM'03, LNCS 2676, pp 328–347Google Scholar
  13. Pisanti N, Crochemore M, Grossi R, Sagot M-F (2003) A basis of tiling motifs for generating repeated patterns and its complexity for higher quorum, In: Proceedings of the MFCS'03, LNCS 2747, pp 622–631Google Scholar
  14. Pisanti N, Crochemore M, Grossi R, Sagot M-F (2004) A comparative study of bases for motif inference. String algorithmics. KCL publicationsGoogle Scholar
  15. Uno T (2003) Two general methods to reduce delay and change of enumeration algorithms, NII Technical Report, NII-2003-004E, April 2003Google Scholar
  16. Uno T, Asai T, Uchida Y, Arimura H (2004) An efficient algorithm for enumerating closed patterns in transaction databases. In: Proceedings of the DS'04, LNAI 3245, pp 16–30Google Scholar
  17. Valiant LG (1979) The complexity of computing the permanent. Theor Comput Sci 8:189–201MATHCrossRefMathSciNetGoogle Scholar
  18. Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: Proceedings of the SIGKDD'03Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  1. 1.Hokkaido UniversitySapporoJapan
  2. 2.National Institute of InformaticsTokyoJapan

Personalised recommendations