A Parallel SAT-Based Framework for Closed Frequent Itemsets Mining

  • Imen Ouled Dlala
  • Said Jabbour
  • Badran Raddaoui
  • Lakhdar Sais (Email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11008)

Abstract

Constraint programming (CP) and propositional satisfiability (SAT) based frameworks for modeling and solving pattern mining tasks have gained a considerable audience in recent years. However, these declarative and generic frameworks face a scaling problem: the huge size of the constraint networks or propositional formulas that encode large datasets is identified as the main bottleneck of most existing approaches. In this paper, we propose a parallel SAT-based framework for the itemset mining problem to improve solving efficiency. The proposed approach follows a divide-and-conquer paradigm in which the transaction database is partitioned using item-based guiding paths. This decomposition yields smaller, independent Boolean formulas that can be solved in parallel. The performance and scalability of the proposed algorithm are evaluated through extensive experiments on several datasets. We demonstrate that our partition-based parallel SAT approach outperforms other CP approaches even in the sequential case, while significantly reducing the performance gap with specialized approaches.
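
The item-based decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: a brute-force enumeration stands in for the SAT solver, and the names `closure`, `solve_subproblem` and `mine_closed`, the lexicographic item order, and the use of threads are all assumptions made for illustration. Subproblem i looks only for closed frequent itemsets that contain the i-th item and none of the earlier ones, so the subproblems are independent and each closed itemset is reported exactly once.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def closure(itemset, transactions):
    """Intersection of all transactions that contain `itemset` (None if none do)."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def solve_subproblem(i, items, transactions, min_support):
    """Closed frequent itemsets containing items[i] and none of items[:i]."""
    pivot = items[i]
    # Only transactions that contain the pivot can support such itemsets,
    # so each subproblem works on a smaller projected database.
    projected = [t for t in transactions if pivot in t]
    if len(projected) < min_support:
        return []
    later = items[i + 1:]
    found = []
    # Brute-force stand-in for the SAT solver (assumption, illustration only).
    for k in range(len(later) + 1):
        for rest in combinations(later, k):
            candidate = frozenset((pivot,) + rest)
            support = sum(1 for t in projected if candidate <= t)
            # The closure is taken over all items, so a candidate whose closure
            # pulls in an earlier item is rejected here and found elsewhere.
            if support >= min_support and closure(candidate, projected) == candidate:
                found.append((candidate, support))
    return found


def mine_closed(transactions, min_support, workers=4):
    """Solve one subproblem per item in parallel and merge the results."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda i: solve_subproblem(i, items, transactions, min_support),
            range(len(items)),
        )
        return {itemset: support for part in parts for itemset, support in part}


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
    for itemset, support in mine_closed(db, min_support=2).items():
        print(sorted(itemset), support)
```

On this toy database with a minimum support of 2, the itemset {b} is frequent but not closed, because every transaction containing b also contains a. The subproblem for b therefore rejects it, and the closed itemset {a, b} is found only in the subproblem for a.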

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Imen Ouled Dlala (1, 3)
  • Said Jabbour (1)
  • Badran Raddaoui (2)
  • Lakhdar Sais (1), Email author

  1. CRIL-CNRS, Université d’Artois, Lens Cedex, France
  2. SAMOVAR, Télécom SudParis, CNRS, Univ. Paris-Saclay, Evry, France
  3. LARODEC, University of Tunis, Tunis, Tunisia
