On Maximal Frequent Itemsets Mining with Constraints

  • Conference paper
Principles and Practice of Constraint Programming (CP 2018)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11008)

Abstract

Recently, a new declarative mining framework based on constraint programming (CP) and propositional satisfiability (SAT) has been designed to deal with several pattern mining tasks. In this framework, the itemset mining problem is modeled as a set of constraints whose models correspond to the patterns to be mined. In this paper, we propose a new SAT-based approach for mining maximal frequent itemsets that extends the one proposed in [20]. We show that, instead of adding constraints to the initial SAT-based itemset mining encoding, the maximal itemsets can be obtained by performing clause learning during search. A major strength of our approach lies in the compactness of the proposed encoding and the efficiency of the SAT-based maximal itemset enumeration derived using blocked clauses. Experimental results on several datasets show the feasibility and efficiency of our approach.
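To make the mining task concrete: an itemset is frequent when its support reaches a threshold, and a frequent itemset is maximal when none of its proper supersets is frequent. The Python sketch below computes both sets by exhaustive enumeration on a toy database. It only illustrates these definitions; it is not the SAT encoding or the clause-learning enumeration described in the paper. The function names are hypothetical, and support is assumed to be an absolute count of covering transactions (relative thresholds are also common).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return every itemset whose support is >= min_support.

    Assumption: support is the absolute number of transactions that contain
    the itemset. Exhaustive enumeration is exponential in the number of
    items, so this is only meant for tiny illustrative inputs.
    """
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Count the transactions that include every item of the candidate.
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.append(itemset)
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [s for s in frequent if not any(s < other for other in frequent)]


if __name__ == "__main__":
    db = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
    for itemset in maximal_frequent_itemsets(db, 2):
        print(sorted(itemset))
```

For example, with transactions {a,b,c}, {a,b,d}, {a,b}, {c,d} and a threshold of 2, the frequent itemsets are {a}, {b}, {c}, {d} and {a,b}, of which {a,b}, {c} and {d} are maximal. Since every frequent itemset is a subset of some maximal one, the maximal itemsets form a compact summary of all frequent itemsets, which is why dedicated enumeration methods are worthwhile.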

Notes

  1.

    A pseudo-Boolean constraint over Boolean variables has the form \(\sum _{i} c_i \cdot l_i \triangleright k\), where the \(c_i\) are the coefficients, k is an integer constant, the \(l_i\) are literals and \(\triangleright \) is one of the operators \(\{=, <, \mathrel {\leqslant }, >, \mathrel {\geqslant }\}\).

  2.

    http://fimi.ua.ac.be/data/.

  3.

    http://dtai.cs.kuleuven.be/CP4IM/datasets/.
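The constraint form in Note 1 can be checked against a complete truth assignment in a few lines of Python. This is a minimal sketch under assumed conventions: literals are signed integers in the DIMACS style (+v for variable v, -v for its negation), a true literal contributes 1 and a false one 0, and the operator is passed as a string. The names `pb_satisfied` and `literal_value` are hypothetical and not taken from any pseudo-Boolean solver's API.

```python
import operator

# Assumed string names for the operators of Note 1, mapped to Python comparisons.
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def literal_value(lit, assignment):
    """Return 1 if the literal is true under `assignment` (dict var -> bool), else 0.

    Assumption: DIMACS-style literals, +v means x_v and -v means (not x_v).
    """
    value = assignment[abs(lit)]
    return int(value if lit > 0 else not value)


def pb_satisfied(coeffs, lits, op, k, assignment):
    """Check whether sum_i coeffs[i] * lits[i]  op  k holds under `assignment`."""
    if len(coeffs) != len(lits):
        raise ValueError("coeffs and lits must have the same length")
    lhs = sum(c * literal_value(l, assignment) for c, l in zip(coeffs, lits))
    return OPS[op](lhs, k)


if __name__ == "__main__":
    a = {1: True, 2: False, 3: False}
    # 2*x1 + 3*x2 + 1*(not x3) >= 4 has left-hand side 2 + 0 + 1 = 3 here.
    print(pb_satisfied([2, 3, 1], [1, 2, -3], ">=", 4, a))
```

Under x1 = true, x2 = false, x3 = false, the left-hand side of 2·x1 + 3·x2 + 1·¬x3 ≥ 4 evaluates to 3, so the constraint is violated. A cardinality constraint, such as a minimum-support threshold over transaction variables, is the special case in which every \(c_i\) equals 1.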

References

  1. Abío, I., Nieuwenhuis, R., Oliveras, A., Rodríguez-Carbonell, E., Mayer-Eichberger, V.: A new look at BDDs for pseudo-Boolean constraints. J. Artif. Intell. Res. (JAIR) 45, 443–480 (2012)

  2. Agarwal, R.C., Aggarwal, C.C., Prasad, V.V.V.: Depth first generation of long patterns. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2000, pp. 108–118 (2000)

  3. Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, SIGMOD 1993, pp. 207–216. ACM, New York (1993)

  4. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of 20th International Conference on Very Large Data Bases VLDB 1994, pp. 487–499 (1994)

  5. Borgelt, C.: Frequent item set mining. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 2(6), 437–456 (2012)

  6. Burdick, D., Calimlim, M., Gehrke, J.: MAFIA: a maximal frequent itemset algorithm for transactional databases. In: ICDE, pp. 443–452 (2001)

  7. Coquery, E., Jabbour, S., Saïs, L., Salhi, Y.: A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pp. 258–263 (2012)

  8. Davis, M., Logemann, G., Loveland, D.: A machine program for theorem proving. Commun. ACM 5, 394–397 (1962)

  9. Dlala, I.O., Jabbour, S., Raddaoui, B., Sais, L., Yaghlane, B.B.: A SAT-based approach for enumerating interesting patterns from uncertain data. In: Proceedings of 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, San Jose, CA, USA, pp. 255–262, 6–8 November 2016

  10. Dlala, I.O., Jabbour, S., Sais, L., Yaghlane, B.B.: A comparative study of SAT-based itemsets mining. In: Bramer, M., Petridis, M. (eds.) Research and Development in Intelligent Systems XXXIII, pp. 37–52. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47175-4_3

  11. Eén, N., Sörensson, N.: Translating pseudo-Boolean constraints into SAT. JSAT 2(1–4), 1–26 (2006)

  12. Gebser, M., Guyet, T., Quiniou, R., Romero, J., Schaub, T.: Knowledge-based sequence mining with ASP. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016

  13. Gouda, K., Zaki, M.J.: GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Min. Knowl. Discov. 11(3), 223–242 (2005)

  14. Guns, T., Nijssen, S., De Raedt, L.: Itemset mining: a constraint programming perspective. Artif. Intell. 175(12–13), 1951–1983 (2011)

  15. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. SIGMOD Rec. 29, 1–12 (2000)

  16. Henriques, R., Lynce, I., Manquinho, V.M.: On when and how to use SAT to mine frequent itemsets. CoRR, abs/1207.6253 (2012)

  17. Heule, M., Järvisalo, M., Biere, A.: Revisiting hyper binary resolution. In: International Conference on Integration of AI and OR Techniques in Constraint Programming, pp. 77–93 (2013)

  18. Jabbour, S., Sais, L., Salhi, Y.: Boolean satisfiability for sequence mining. In: Proceedings of 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013), pp. 649–658. ACM (2013)

  19. Jabbour, S., Sais, L., Salhi, Y.: A pigeon-hole based encoding of cardinality constraints. TPLP, 13(4-5-Online-Supplement) (2013)

  20. Jabbour, S., Sais, L., Salhi, Y.: The top-k frequent closed itemset mining using top-k SAT problem. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 2013), pp. 403–418 (2013)

  21. Jabbour, S., Sais, L., Salhi, Y.: Mining top-k motifs with a SAT-based framework. Artif. Intell. 244, 30–47 (2017)

  22. Bayardo Jr., R.J.: Efficiently mining long patterns from databases. In: Proceedings ACM SIGMOD International Conference on Management of Data SIGMOD 1998, Seattle, Washington, USA, pp. 85–93, 2–4 June 1998

  23. Lin, D.-I., Kedem, Z.M.: Pincer-search: a new algorithm for discovering the maximum frequent set. In: Schek, H.-J., Alonso, G., Saltor, F., Ramos, I. (eds.) EDBT 1998. LNCS, vol. 1377, pp. 103–119. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0100980

  24. Nijssen, S., Guns, T.: Integrating constraint programming and itemset mining. In: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Proceedings, Part II, Barcelona, Spain, pp. 467–482, 20–24 September 2010

  25. Pei, J., Han, J., Lu, H., Nishio, S., Tang, S., Yang, D.: H-mine: hyper-structure mining of frequent patterns in large databases. In: Proceedings IEEE International Conference on Data Mining ICDM 2001, pp. 441–448 (2001)

  26. Pei, J., Han, J., Mao, R.: CLOSET: an efficient algorithm for mining frequent closed itemsets. In: 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 21–30 (2000)

  27. De Raedt, L., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: ACM SIGKDD, pp. 204–212 (2008)

  28. De Raedt, L., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, pp. 204–212, 24–27 August 2008

  29. Tiwari, A., Gupta, R., Agrawal, D.: A survey on frequent pattern mining: current status and challenging issues. Inform. Technol. J. 9, 1278–1293 (2010)

  30. Uno, T., Kiyomi, M., Arimura, H.: LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations FIMI 2004, Brighton, UK, 1 November 2004

  31. Warners, J.P.: A linear-time transformation of linear inequalities into conjunctive normal form. Inf. Process. Lett. 68(2), 63–69 (1998)

  32. Zaki, M.J., Hsiao, C.: CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the Second SIAM International Conference on Data Mining, pp. 457–473 (2002)

  33. Zou, Q., Chu, W.W., Lu, B.: SmartMiner: a depth first algorithm guided by tail information for mining maximal frequent itemsets. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, pp. 570–577, 9–12 December 2002

Author information

Correspondence to Lakhdar Sais.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Jabbour, S., Mana, F.E., Dlala, I.O., Raddaoui, B., Sais, L. (2018). On Maximal Frequent Itemsets Mining with Constraints. In: Hooker, J. (ed.) Principles and Practice of Constraint Programming. CP 2018. Lecture Notes in Computer Science, vol. 11008. Springer, Cham. https://doi.org/10.1007/978-3-319-98334-9_36

  • DOI: https://doi.org/10.1007/978-3-319-98334-9_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98333-2

  • Online ISBN: 978-3-319-98334-9

  • eBook Packages: Computer Science, Computer Science (R0)