Abstract
Recently, a new declarative mining framework based on constraint programming (CP) and propositional satisfiability (SAT) has been designed to deal with several pattern mining tasks. The itemset mining problem is modeled using constraints whose models correspond to the patterns to be mined. In this paper, we propose a new SAT-based approach for mining maximal frequent itemsets that extends the one proposed in [20]. We show that, instead of adding constraints to the initial SAT-based itemset mining encoding, the maximal itemsets can be obtained by performing clause learning during search. A major strength of our approach lies in the compactness of the proposed encoding and the efficiency of the SAT-based maximal itemset enumeration derived using blocked clauses. Experimental results on several datasets show the feasibility and the efficiency of our approach.
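As a rough illustration of the enumeration idea described in the abstract, the following Python sketch finds maximal frequent itemsets by repeatedly searching for a frequent itemset, extending it to a maximal one, and then adding a blocking clause that rules out all of its subsets. It is only a sketch under stated assumptions: the brute-force `find_model` stands in for a SAT solver, and all function names, as well as the exact form of the blocking clause (at least one item outside the found itemset must be selected), are illustrative choices and not taken from the paper.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def find_model(items, transactions, min_support, blocking_clauses):
    """Return a frequent itemset that satisfies every blocking clause, or None.

    Assumption: exhaustive search replaces the SAT solver of a real system.
    """
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # A blocking clause is satisfied when the candidate contains
            # at least one of the items listed in the clause.
            if all(candidate & clause for clause in blocking_clauses) \
                    and support(candidate, transactions) >= min_support:
                return candidate
    return None


def extend_to_maximal(itemset, items, transactions, min_support):
    """Greedily add items while the itemset stays frequent.

    One pass is enough: support can only shrink as the itemset grows,
    so an item rejected once can never be added later.
    """
    current = set(itemset)
    for item in items:
        if item not in current \
                and support(current | {item}, transactions) >= min_support:
            current.add(item)
    return frozenset(current)


def maximal_frequent_itemsets(transactions, min_support):
    """Enumerate maximal frequent itemsets with a blocking-clause loop."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    blocking_clauses = []
    maximal = []
    while True:
        model = find_model(items, transactions, min_support, blocking_clauses)
        if model is None:
            return maximal
        found = extend_to_maximal(model, items, transactions, min_support)
        maximal.append(found)
        # Block `found` and all of its subsets: any later model must
        # contain at least one item that is not in `found`.
        blocking_clauses.append(frozenset(items) - found)


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = maximal_frequent_itemsets(transactions, min_support=3)
```

With a minimum support of 3, each pair of items is frequent but the full set `{a, b, c}` is not, so the loop stops after three blocking clauses have been added. Because blocking clauses contain only positive literals, extending a model with more items never violates a clause that was already satisfied.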
Notes
- 1. A pseudo-Boolean constraint over Boolean variables is defined by \(\sum _{i} c_i \cdot l_i \triangleright k\), where \(c_i\) are the coefficients, \(k\) is an integer constant, \(l_i\) are literals, and \(\triangleright \) is one of the operators \(\{=, <, \leqslant , >, \geqslant \}\).
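To make the definition of a pseudo-Boolean constraint concrete, here is a minimal Python sketch that evaluates such a constraint under a truth assignment. The representation is an assumption made for illustration only: a literal is a `(variable, polarity)` pair, a term is a `(coefficient, literal)` pair, and the helper names `literal_value` and `holds` are hypothetical.

```python
import operator

# Comparison operators allowed in a pseudo-Boolean constraint.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def literal_value(literal, assignment):
    """Truth value of a literal given as (variable, is_positive)."""
    variable, is_positive = literal
    value = assignment[variable]
    return value if is_positive else not value


def holds(terms, op, k, assignment):
    """Check sum_i c_i * l_i (op) k, counting a true literal as 1."""
    total = sum(c for c, lit in terms if literal_value(lit, assignment))
    return OPERATORS[op](total, k)


# Example: q1 + q2 + q3 >= 2, a cardinality-style constraint.
terms = [(1, ("q1", True)), (1, ("q2", True)), (1, ("q3", True))]
satisfied = holds(terms, ">=", 2, {"q1": True, "q2": False, "q3": True})
violated = holds(terms, ">=", 2, {"q1": True, "q2": False, "q3": False})
```

When all coefficients equal 1, as in the example, the constraint reduces to a cardinality constraint, which is the kind of constraint typically used to express a minimum support threshold.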
References
Abío, I., Nieuwenhuis, R., Oliveras, A., Rodríguez-Carbonell, E., Mayer-Eichberger, V.: A new look at BDDs for pseudo-Boolean constraints. J. Artif. Intell. Res. (JAIR) 45, 443–480 (2012)
Agarwal, R.C., Aggarwal, C.C., Prasad, V.V.V.: Depth first generation of long patterns. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2000, pp. 108–118 (2000)
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, SIGMOD 1993, pp. 207–216. ACM, New York (1993)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of 20th International Conference on Very Large Data Bases VLDB 1994, pp. 487–499 (1994)
Borgelt, C.: Frequent item set mining. Wiley Interdisc. Rev.: Data Min. Knowl. Disc. 2(6), 437–456 (2012)
Burdick, D., Calimlim, M., Gehrke, J.: MAFIA: a maximal frequent itemset algorithm for transactional databases. In: ICDE, pp. 443–452 (2001)
Coquery, E., Jabbour, S., Saïs, L., Salhi, Y.: A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pp. 258–263 (2012)
Davis, M., Logemann, G., Loveland, D.: A machine program for theorem proving. Commun. ACM 5, 394–397 (1962)
Dlala, I.O., Jabbour, S., Raddaoui, B., Sais, L., Yaghlane, B.B.: A SAT-based approach for enumerating interesting patterns from uncertain data. In: Proceedings of 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, San Jose, CA, USA, pp. 255–262, 6–8 November 2016
Dlala, I.O., Jabbour, S., Sais, L., Yaghlane, B.B.: A comparative study of SAT-based itemsets mining. In: Bramer, M., Petridis, M. (eds.) Research and Development in Intelligent Systems XXXIII, pp. 37–52. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47175-4_3
Eén, N., Sörensson, N.: Translating pseudo-Boolean constraints into SAT. JSAT 2(1–4), 1–26 (2006)
Gebser, M., Guyet, T., Quiniou, R., Romero, J., Schaub, T.: Knowledge-based sequence mining with ASP. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016
Gouda, K., Zaki, M.J.: GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Min. Knowl. Discov. 11(3), 223–242 (2005)
Guns, T., Nijssen, S., Raedt, L.D.: Itemset mining: a constraint programming perspective. Artif. Intell. 175(12–13), 1951–1983 (2011)
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. SIGMOD Rec. 29, 1–12 (2000)
Henriques, R., Lynce, I., Manquinho, V.M.: On when and how to use SAT to mine frequent itemsets. CoRR, abs/1207.6253 (2012)
Heule, M., Järvisalo, M., Biere, A.: Revisiting hyper binary resolution. In: International Conference on Integration of AI and OR Techniques in Constraint Programming, pp. 77–93 (2013)
Jabbour, S., Sais, L., Salhi, Y.: Boolean satisfiability for sequence mining. In: Proceedings of 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013), pp. 649–658. ACM (2013)
Jabbour, S., Sais, L., Salhi, Y.: A pigeon-hole based encoding of cardinality constraints. TPLP, 13(4-5-Online-Supplement) (2013)
Jabbour, S., Sais, L., Salhi, Y.: The top-k frequent closed itemset mining using top-k SAT problem. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 2013), pp. 403–418 (2013)
Jabbour, S., Sais, L., Salhi, Y.: Mining top-k motifs with a SAT-based framework. Artif. Intell. 244, 30–47 (2017)
Bayardo Jr., R.J.: Efficiently mining long patterns from databases. In: Proceedings ACM SIGMOD International Conference on Management of Data SIGMOD 1998, Seattle, Washington, USA, pp. 85–93, 2–4 June 1998
Lin, D.-I., Kedem, Z.M.: Pincer-search: a new algorithm for discovering the maximum frequent set. In: Schek, H.-J., Alonso, G., Saltor, F., Ramos, I. (eds.) EDBT 1998. LNCS, vol. 1377, pp. 103–119. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0100980
Nijssen, S., Guns, T.: Integrating constraint programming and itemset mining. In: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Proceedings, Part II, Barcelona, Spain, pp. 467–482, 20–24 September 2010
Pei, J., Han, J., Lu, H., Nishio, S., Tang, S., Yang, D.: H-mine: hyper-structure mining of frequent patterns in large databases. In: Proceedings IEEE International Conference on Data Mining ICDM 2001, pp. 441–448 (2001)
Pei, J., Han, J., Mao, R.: CLOSET: an efficient algorithm for mining frequent closed itemsets. In: 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 21–30 (2000)
Raedt, L.D., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: ACM SIGKDD, pp. 204–212 (2008)
Raedt, L.D., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, pp. 204–212, 24–27 August 2008
Tiwari, A., Gupta, R., Agrawal, D.: A survey on frequent pattern mining: current status and challenging issues. Inform. Technol. J. 9, 1278–1293 (2010)
Uno, T., Kiyomi, M., Arimura, H.: LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations FIMI 2004, Brighton, UK, 1 November 2004
Warners, J.P.: A linear-time transformation of linear inequalities into conjunctive normal form. Inf. Process. Lett. 68(2), 63–69 (1998)
Zaki, M.J., Hsiao, C.: CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the Second SIAM International Conference on Data Mining, pp. 457–473 (2002)
Zou, Q., Chu, W.W., Lu, B.: Smartminer: a depth first algorithm guided by tail information for mining maximal frequent itemsets. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, pp. 570–577, 9–12 December 2002
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Jabbour, S., Mana, F.E., Dlala, I.O., Raddaoui, B., Sais, L. (2018). On Maximal Frequent Itemsets Mining with Constraints. In: Hooker, J. (ed.) Principles and Practice of Constraint Programming. CP 2018. Lecture Notes in Computer Science, vol 11008. Springer, Cham. https://doi.org/10.1007/978-3-319-98334-9_36
DOI: https://doi.org/10.1007/978-3-319-98334-9_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98333-2
Online ISBN: 978-3-319-98334-9
eBook Packages: Computer Science (R0)