Exact and Approximate Minimal Pattern Mining

Chapter in: Advances in Knowledge Discovery and Management

Part of the book series: Studies in Computational Intelligence ((SCI,volume 665))

Abstract

Condensed representations have been studied extensively for 15 years. In particular, the maximal patterns of the equivalence classes have received much attention, with very general proposals. In contrast, the minimal patterns have remained in the shadows, in particular because they are too numerous and difficult to extract. In this paper, we present a generic framework for exact and approximate minimal pattern mining by introducing the concept of a minimizable set system. This framework, based on set systems, addresses various languages such as itemsets or strings and, at the same time, different metrics such as frequency. For instance, the free, \(\delta \)-free and essential patterns are naturally handled by our approach, as are the minimal strings. Then, for any minimizable set system, we introduce a fast minimality checking method that is easy to incorporate into a depth-first search algorithm for mining the \(\delta \)-minimal patterns. We demonstrate that the resulting algorithm is polynomial-delay and polynomial-space. Experiments on traditional benchmarks complete our study by showing that our approach is competitive with the best proposals.
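
To make the notion of \(\delta \)-free itemsets concrete, the following is a minimal sketch and not the minimality check introduced in the chapter. It assumes the usual frequency-based definition: an itemset is \(\delta \)-free when removing any single item raises its support by more than \(\delta \) (with \(\delta = 0\) giving the free itemsets). The function names, the toy database and the naive support counting are all illustrative assumptions. The depth-first enumeration relies only on the fact that every subset of a \(\delta \)-free itemset is itself \(\delta \)-free, so a branch can be cut as soon as a candidate fails the test.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_delta_free(itemset, transactions, delta=0):
    """Naive test (illustrative, not the chapter's method): dropping any
    single item must increase the support by more than `delta`."""
    s = support(itemset, transactions)
    return all(support(itemset - {e}, transactions) - s > delta
               for e in itemset)


def delta_free_itemsets(transactions, delta=0, min_support=1):
    """Depth-first enumeration of the non-empty frequent delta-free itemsets.

    Items are added in a fixed order; a candidate that is infrequent or
    not delta-free is skipped together with all of its extensions.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    found = []

    def extend(pattern, start):
        for i in range(start, len(items)):
            candidate = pattern | {items[i]}
            if support(candidate, transactions) < min_support:
                continue  # infrequent: no extension can be frequent
            if not is_delta_free(candidate, transactions, delta):
                continue  # not delta-free: no extension can be delta-free
            found.append(candidate)
            extend(candidate, i + 1)

    extend(frozenset(), 0)
    return found


if __name__ == "__main__":
    # Hypothetical toy database: item "a" occurs in every transaction,
    # so no itemset containing "a" is free.
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
    print(delta_free_itemsets(db, delta=0))
    print(delta_free_itemsets(db, delta=1))
```

On this toy database the free itemsets (\(\delta = 0\)) are {b}, {b, c} and {c}. With \(\delta = 1\) the itemset {b, c} is no longer retained, because removing either item raises the support by exactly 1. Each test here rescans the whole database, which is the kind of cost a fast minimality check is meant to avoid.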

Notes

  1. We use the notation \(Xe\) instead of \(X \cup \{e\}\).

  2. As this prototype mines non-derivable itemsets, it enables us to compute free patterns when the depth parameter is set to 1.

  3. http://fimi.ua.ac.be/data/ and http://lisp.vse.cz/challenge/ecmlpkdd2004/.

  4. This dataset is provided with \({{\textsc {maxMotif}}}\): http://research.nii.ac.jp/~uno/codes.htm.

Acknowledgments

This article has been partially funded by the Hybride project (ANR-11-BS02-0002).

Author information

Corresponding author

Correspondence to Arnaud Soulet.

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Soulet, A., Rioult, F. (2017). Exact and Approximate Minimal Pattern Mining. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_4

  • DOI: https://doi.org/10.1007/978-3-319-45763-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45762-8

  • Online ISBN: 978-3-319-45763-5

  • eBook Packages: Engineering, Engineering (R0)