Skip to main content
Log in

Rank correlated subgroup discovery

  • Published:
Journal of Intelligent Information Systems Aims and scope Submit manuscript

Abstract

Subgroup discovery (SD) and exceptional model mining (EMM), its generalization to handle more complex targets, are two mature fields at the frontier of data mining and machine learning. More precisely, EMM aims to find coherent subgroups of a dataset where multiple targets interact in an unusual way. Correlation model classes have already been defined to discover interesting subgroups when dealing with two numerical targets. However, in this supervised setting, the two numerical targets are fixed before the subgroup search. To make unsupervised exploration possible, we propose to search for arbitrary subsets of numerical targets whose correlation is exceptional for an automatically found subgroup. This involves solving two challenges: the definition of a model that evaluates the interest of a subgroup for a subset of numerical targets and the definition of a pattern language that enumerates both subgroups and targets and lends itself to effective research strategies. We propose an integrated solution to both challenges. We introduce the problem of rank-correlated subgroup discovery with an arbitrary subset of numerical targets. A rank-correlated subgroup is identified by both conditions on descriptive attributes, whether numeric or nominal, and a pattern on numeric attributes that captures (positive or negative) rank correlations based on a generalization of the Kendall’s τ. We define a new branch-and-bound algorithm that exploits some pruning properties based on two upper-bounds and a closure property. An empirical study on several datasets demonstrates the efficiency and the effectiveness of the algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Notes

  1. https://gitlab.liris.cnrs.fr/mahammal/rank-correlated-subgroup-discovery.git

  2. https://web.stanford.edu/~hastie/ElemStatLearn/datasets/

References

  • Atzmüller, M., & Puppe, F. (2006). Sd-map - a fast algorithm for exhaustive subgroup discovery. In ECMLPKDD (pp. 6–17).

  • Aumann, Y., & Lindell, Y. (1999). A statistical theory for quantitative association rules. In KDD. Citeseer, (Vol. 99 pp. 261–270).

  • Bay, S.D., & Pazzani, M.J. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.

    Article  MATH  Google Scholar 

  • Belfodil, A., Cazalens, S., Lamarre, P., Plantevit, M. (2017). Flash points: discovering exceptional pairwise behaviors in vote or rating data. In ECML PKDD (pp. 442–458).

    Chapter  Google Scholar 

  • Belfodil, A., Kuznetsov, S.O., Robardet, C., Kaytoue, M. (2017). Mining convex polygon patterns with formal concept analysis. In Proceedings of the 26th international joint conference on artificial intelligence, IJCAI. Melbourne, Australia August 19-25, 2017 (pp. 1425–1432).

  • Bendimerad, A.A., Plantevit, M., Robardet, C. (2016). Unsupervised exceptional attributed sub-graph mining in urban data. In ICDM (pp. 21–30).

  • Bendimerad, A.A., Cazabet, R., Plantevit, M., Robardet, C. (2017). Contextual subgraph discovery with mobility models. In Complex networks (pp. 477–489).

    Google Scholar 

  • Bie, T.D. (2011). Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3), 407–446.

    Article  MathSciNet  MATH  Google Scholar 

  • Błaszczyński, J., Słowiński, R., Szelkag, M. (2011). Sequential covering rule induction algorithm for variable consistency rough set approaches. Information Sciences, 181(5), 987–1002.

    Article  MathSciNet  Google Scholar 

  • Boley, M., Lucchese, C., Paurat, D., Gärtner, T. (2011). Direct local pattern sampling by efficient two-step random procedures. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, CA, USA, August 21-24, 2011 (pp. 582–590).

  • Boley, M., Moens, S., Gärtner, T. (2012). Linear space direct pattern sampling using coupling from the past. In The 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, Beijing, China, August 12-16, 2012 (pp. 69–77).

  • Bosc, G., Golebiowski, J., Bensafi, M., Robardet, C., Plantevit, M., Boulicaut, J., Kaytoue, M. (2016). Local subgroup discovery for eliciting and understanding new structure-odor relationships. In DS (pp. 19–34).

    Chapter  Google Scholar 

  • Calders, T., Goethals, B., Jaroszewicz, S. (2006). Mining rank-correlated sets of numerical attributes. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 96–105): ACM.

  • Cerf, L., Besson, J., Robardet, C., Boulicaut, J. (2009). Closed patterns meet n-ary relations. TKDD, 3(1), 3:1–3:36.

    Article  Google Scholar 

  • Chaoji, V., Hasan, M.A., Salem, S., Besson, J., Zaki, M.J. (2008). ORIGAMI: a novel and effective approach for mining representative orthogonal graph patterns. Statistical Analysis and Data Mining, 1(2), 67–84.

    Article  MathSciNet  Google Scholar 

  • de Sá, C.R., Duivesteijn, W., Soares, C., Knobbe, A.J. (2016). Exceptional preferences mining (pp. 3–18).

  • Do, T.D.T., Laurent, A., Termier, A. (2010). PGLCM: efficient parallel mining of closed frequent gradual itemsets (pp. 138–147).

  • Do, T.D.T., Termier, A., Laurent, A., Negrevergne, B., Omidvar-Tehrani, B., Amer-Yahia, S. (2015). Pglcm: efficient parallel mining of closed frequent gradual itemsets. Knowledge and Information Systems, 43(3), 497–527.

    Article  Google Scholar 

  • Dong, G., & Li, J. (1999). Efficient mining of emerging patterns Discovering trends and differences. In Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 43–52): ACM.

  • Downar, L., & Duivesteijn, W. (2017). Exceptionally monotone models—the rank correlation model class for exceptional model mining. Knowledge and Information Systems, 51(2), 369–394.

    Article  Google Scholar 

  • Dua, D., & Graff, C. (2017). UCI machine learning repository.

  • Duivesteijn, W., Feelders, A.J., Knobbe, A. (2016). Exceptional model mining. Data Mining and Knowledge Discovery, 30(1), 47–98.

    Article  MathSciNet  MATH  Google Scholar 

  • Duivesteijn, W., Knobbe, A.J., Feelders, A., van Leeuwen, M. (2010). Subgroup discovery meets bayesian networks – an exceptional model mining approach. In ICDM (pp. 158–167).

  • Fan, Y.-N., Tseng, T.-L.B., Chern, C.-C., Huang, C.-C. (2009). Rule induction based on an incremental rough set. Expert Systems with Applications, 36(9), 11439–11450.

    Article  Google Scholar 

  • Grosskreutz, H., & Rüping, S. (2009). On subgroup discovery in numerical domains. Data Mining and Knowledge Discovery, 19(2), 210–226.

    Article  MathSciNet  Google Scholar 

  • Grosskreutz, H., Lang, B., Trabold, D. (2013). A relevance criterion for sequential patterns. In ECMLPKDD (pp. 369–384).

    Chapter  Google Scholar 

  • Hüllermeier, E. (2002). Association rules for expressing gradual dependencies (pp. 200–211).

    Chapter  Google Scholar 

  • Kaytoue, M., Kuznetsov, S.O., Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Walsh, T. (Ed.) IJCAI proceedings of the 22nd international joint conference on artificial intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011 (p. 2011): IJCAI/AAAI.

  • Kaytoue, M., Plantevit, M., Zimmermann, A., Bendimerad, A., Robardet, C. (2017). Exceptional contextual subgraph mining. Machine Learning, 106(8), 1171–1211.

    Article  MathSciNet  MATH  Google Scholar 

  • Klosgen, W. (1996). Explora: a multipattern and multistrategy discovery assistant. Advances in knowledge discovery and data mining (pp. 249–271).

  • Lavrač, N., Flach, P., Zupan, B. (1999). Rule evaluation measures: a unifying view. In Džeroski, S., & Flach, P. (Eds.) Inductive logic programming (pp. 174–185). Berlin: Springer.

  • Lavrač, N., Kavšek, B., Flach, P., Todorovski, L. (2004). Subgroup discovery with cn2-sd. Journal of Machine Learning Research, 5, 153–188.

    MathSciNet  Google Scholar 

  • Leman, D., Feelders, A., Knobbe, A. (2008). Exceptional model mining. In Joint European conference on machine learning and knowledge discovery in databases (pp. 1–16): Springer.

  • Lemmerich, F., Atzmueller, M., Puppe, F. (2016). Fast exhaustive subgroup discovery with numerical target concepts. Data Mining and Knowledge Discovery, 30 (3), 711–762.

    Article  MathSciNet  MATH  Google Scholar 

  • Liu, B., Hsu, W., Ma, Y. (1998). Integrating classification and association rule mining. In KDD (pp. 80–86).

  • Martínez-Ballesteros, M., Troncoso, A., Martínez-Álvarez, F., Riquelme, J. (2016). Obtaining optimal quality measures for quantitative association rules. Neurocomputing, 176, 36–47. Recent advancements in hybrid artificial intelligence systems and its application to real-world problems.

    Article  Google Scholar 

  • Morishita, S., & Sese, J. (2000). Transversing itemset lattices with statistical metric pruning. In Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS’00 (pp. 226–236). New York: ACM.

  • Prado, A., Plantevit, M., Robardet, C., Boulicaut, J.-F., patterns. (2013). Mining graph topological Finding covariations among vertex descriptors. IEEE Transactions on Knowledge and Data Engineering, 25(9), 2090–2104.

    Article  Google Scholar 

  • Rückert, U., Richter, L., Kramer, S. (2004). Quantitative association rules based on half-spaces: an optimization approach. In Proceedings of the 4th IEEE international conference on data mining (ICDM 2004), 1-4 November 2004, Brighton, UK (pp. 507–510).

  • Salleb-Aouissi, A., Vrain, C., Nortet, C., Kong, X., Rathod, V., Cassard, D. (2013). Quantminer for mining quantitative association rules. Journal of Machine Learning Research, 14(1), 3153–3157.

    MATH  Google Scholar 

  • Sikora, M., & Wróbel, Ł. (2010). Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines. Archives of Mining Sciences, 55(1), 91–114.

    Google Scholar 

  • Terada, A., Okada-Hatakeyama, M., Tsuda, K., Sese, J. (2013). Statistical significance of combinatorial regulations. Proceedings of the National Academy of Sciences, 110(32), 12996–13001.

    Article  MathSciNet  MATH  Google Scholar 

  • Tukey, J.W. (1977). Exploratory data analysis. Addison-Wesley series in behavioral science : quantitative methods, Addison-Wesley.

  • van Leeuwen, M., & Knobbem, A.J. (2012). Diverse subgroup set discovery. Data Mining and Knowledge Discovery, 25(2), 208–242.

    Article  MathSciNet  Google Scholar 

  • Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In PKDD (pp. 78–87).

    Chapter  Google Scholar 

  • Xin, D., Cheng, H., Yan, X., Han, J. (2006). Extracting redundancy-aware top-k patterns. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 444–453): ACM.

Download references

Acknowledgments

This work was supported by the Labex IMU Université de Lyon (project RESALI). It was also partially supported by the CNRS project APRC Conf Pap Grant APQ-04224-16 (Multilateral Cooperation FAPEMIG-CNRS).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Céline Robardet.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hammal, M.A., Mathian, H., Merchez, L. et al. Rank correlated subgroup discovery. J Intell Inf Syst 53, 305–328 (2019). https://doi.org/10.1007/s10844-019-00555-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10844-019-00555-y

Keywords

Navigation