Anytime Subgroup Discovery in Numerical Domains with Guarantees

  • Aimene Belfodil (Email author)
  • Adnene Belfodil (Email author)
  • Mehdi Kaytoue
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)

Abstract

Subgroup discovery is the task of discovering patterns that accurately discriminate a class label from the others. Existing approaches can uncover such patterns either through an exhaustive or an approximate exploration of the pattern search space. However, an exhaustive exploration is generally infeasible, whereas approximate approaches provide no guarantees bounding the error on the best pattern quality or the progress of the exploration (“How far are we from an exhaustive search?”). We design here an algorithm for mining numerical data with three key properties w.r.t. the state of the art: (i) it progressively yields interval patterns whose quality improves over time; (ii) it can be interrupted at any time and always gives a guarantee bounding the error on the top pattern quality; and (iii) it always bounds a distance to the exhaustive exploration. After reporting experiments showing the effectiveness of our method, we discuss its generalization to other kinds of patterns. Code related to this paper is available at: https://github.com/Adnene93/RefineAndMine.
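
The anytime behaviour described in the abstract can be illustrated with a small sketch. This is not the authors' RefineAndMine implementation; it is a simplified illustration under several assumptions that do not come from the abstract: a single numerical attribute, binary labels, weighted relative accuracy (WRAcc) as the quality measure, and a refinement rule that halves every cell of the discretization at each step. The function name `anytime_interval_search` and all variable names are hypothetical. After each refinement the sketch yields the best quality found so far together with an upper bound on the quality of any interval, so the caller may stop at any step and read off the gap.

```python
def anytime_interval_search(xs, ys):
    """Yield (n_cells, best_quality, upper_bound) after each refinement.

    xs: numeric values of a single attribute; ys: 0/1 class labels.
    Illustrative sketch only, not the algorithm of the paper.
    """
    n = len(xs)
    p = sum(ys) / n                      # share of the positive class
    values = sorted(set(xs))
    rank = {v: k for k, v in enumerate(values)}

    # Prefix counts of positives / negatives over the sorted distinct values.
    pos = [0] * (len(values) + 1)
    neg = [0] * (len(values) + 1)
    for x, y in zip(xs, ys):
        if y:
            pos[rank[x] + 1] += 1
        else:
            neg[rank[x] + 1] += 1
    for k in range(len(values)):
        pos[k + 1] += pos[k]
        neg[k + 1] += neg[k]

    def quality(tp, fp):
        # WRAcc (assumed measure) written with true and false positives.
        return (tp * (1 - p) - fp * p) / n

    borders = [0, len(values)]           # start with one cell covering everything
    while True:
        best, bound = 0.0, 0.0           # the empty interval has quality 0
        cells = len(borders) - 1
        for i in range(cells):
            for j in range(i, cells):
                lo, hi = borders[i], borders[j + 1]
                tp = pos[hi] - pos[lo]
                fp = neg[hi] - neg[lo]
                best = max(best, quality(tp, fp))
                # Any interval starting in cell i and ending in cell j covers
                # at most tp positives and at least the negatives of the
                # cells strictly between i and j.
                in_lo, in_hi = borders[i + 1], borders[j]
                inner_fp = neg[in_hi] - neg[in_lo] if in_hi > in_lo else 0
                bound = max(bound, quality(tp, inner_fp))
        yield cells, best, bound

        # Nested refinement: split every cell holding more than one value.
        refined = [borders[0]]
        for lo, hi in zip(borders, borders[1:]):
            if hi - lo > 1:
                refined.append((lo + hi) // 2)
            refined.append(hi)
        if len(refined) == len(borders):
            return                       # every cell is a single value
        borders = refined


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 1, 1, 0, 0, 1]
for cells, best, bound in anytime_interval_search(xs, ys):
    print(f"{cells} cells: best={best:.4f} bound={bound:.4f} gap={bound - best:.4f}")
```

Because the cut points are nested, the best quality never decreases and the bound never increases from one step to the next in this sketch. The bound is valid at every step but is not claimed to be tight.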

Keywords

Subgroup discovery, Anytime algorithms, Discretization

Notes

Acknowledgement

This work has been partially supported by the project ContentCheck ANR-15-CE23-0025 funded by the French National Research Agency, the Association Nationale Recherche Technologie (ANRt) French program and the APRC Conf Pap - CNRS project. The authors would like to thank the reviewers for their valuable remarks. They also warmly thank Loïc Cerf, Marc Plantevit and Anes Bendimerad for interesting discussions.

Supplementary material

Supplementary material 1: 478890_1_En_30_MOESM1_ESM.pdf (PDF, 728 KB)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Univ Lyon, INSA Lyon, CNRS, LIRIS UMR 5205, Lyon, France
  2. Mobile Devices Ingénierie, Villejuif, France
  3. Infologic, Bourg-Lès-Valence, France
