Rank correlated subgroup discovery

Hammal, Mohamed Ali; Mathian, Hélène; Merchez, Luc; Plantevit, Marc; Robardet, Céline

doi:10.1007/s10844-019-00555-y

Rank correlated subgroup discovery

Published: 06 April 2019

Volume 53, pages 305–328, (2019)
Cite this article

Journal of Intelligent Information Systems Aims and scope Submit manuscript

Mohamed Ali Hammal¹,
Hélène Mathian²,
Luc Merchez²,
Marc Plantevit³ &
…
Céline Robardet ORCID: orcid.org/0000-0002-8583-9408¹

336 Accesses
2 Citations
Explore all metrics

Abstract

Subgroup discovery (SD) and exceptional model mining (EMM), its generalization to handle more complex targets, are two mature fields at the frontier of data mining and machine learning. More precisely, EMM aims to find coherent subgroups of a dataset where multiple targets interact in an unusual way. Correlation model classes have already been defined to discover interesting subgroups when dealing with two numerical targets. However, in this supervised setting, the two numerical targets are fixed before the subgroup search. To make unsupervised exploration possible, we propose to search for arbitrary subsets of numerical targets whose correlation is exceptional for an automatically found subgroup. This involves solving two challenges: the definition of a model that evaluates the interest of a subgroup for a subset of numerical targets and the definition of a pattern language that enumerates both subgroups and targets and lends itself to effective research strategies. We propose an integrated solution to both challenges. We introduce the problem of rank-correlated subgroup discovery with an arbitrary subset of numerical targets. A rank-correlated subgroup is identified by both conditions on descriptive attributes, whether numeric or nominal, and a pattern on numeric attributes that captures (positive or negative) rank correlations based on a generalization of the Kendall’s τ. We define a new branch-and-bound algorithm that exploits some pruning properties based on two upper-bounds and a closure property. An empirical study on several datasets demonstrates the efficiency and the effectiveness of the algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Exceptionally monotone models—the rank correlation model class for Exceptional Model Mining

Article 18 August 2016

For real: a thorough look at numeric attributes in subgroup discovery

Article Open access 21 September 2020

Discovering Outstanding Subgroup Lists for Numeric Targets Using MDL

Notes

References

Atzmüller, M., & Puppe, F. (2006). Sd-map - a fast algorithm for exhaustive subgroup discovery. In ECMLPKDD (pp. 6–17).
Aumann, Y., & Lindell, Y. (1999). A statistical theory for quantitative association rules. In KDD. Citeseer, (Vol. 99 pp. 261–270).
Bay, S.D., & Pazzani, M.J. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.
Article MATH Google Scholar
Belfodil, A., Cazalens, S., Lamarre, P., Plantevit, M. (2017). Flash points: discovering exceptional pairwise behaviors in vote or rating data. In ECML PKDD (pp. 442–458).
Chapter Google Scholar
Belfodil, A., Kuznetsov, S.O., Robardet, C., Kaytoue, M. (2017). Mining convex polygon patterns with formal concept analysis. In Proceedings of the 26th international joint conference on artificial intelligence, IJCAI. Melbourne, Australia August 19-25, 2017 (pp. 1425–1432).
Bendimerad, A.A., Plantevit, M., Robardet, C. (2016). Unsupervised exceptional attributed sub-graph mining in urban data. In ICDM (pp. 21–30).
Bendimerad, A.A., Cazabet, R., Plantevit, M., Robardet, C. (2017). Contextual subgraph discovery with mobility models. In Complex networks (pp. 477–489).
Google Scholar
Bie, T.D. (2011). Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3), 407–446.
Article MathSciNet MATH Google Scholar
Błaszczyński, J., Słowiński, R., Szelkag, M. (2011). Sequential covering rule induction algorithm for variable consistency rough set approaches. Information Sciences, 181(5), 987–1002.
Article MathSciNet Google Scholar
Boley, M., Lucchese, C., Paurat, D., Gärtner, T. (2011). Direct local pattern sampling by efficient two-step random procedures. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, CA, USA, August 21-24, 2011 (pp. 582–590).
Boley, M., Moens, S., Gärtner, T. (2012). Linear space direct pattern sampling using coupling from the past. In The 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, Beijing, China, August 12-16, 2012 (pp. 69–77).
Bosc, G., Golebiowski, J., Bensafi, M., Robardet, C., Plantevit, M., Boulicaut, J., Kaytoue, M. (2016). Local subgroup discovery for eliciting and understanding new structure-odor relationships. In DS (pp. 19–34).
Chapter Google Scholar
Calders, T., Goethals, B., Jaroszewicz, S. (2006). Mining rank-correlated sets of numerical attributes. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 96–105): ACM.
Cerf, L., Besson, J., Robardet, C., Boulicaut, J. (2009). Closed patterns meet n-ary relations. TKDD, 3(1), 3:1–3:36.
Article Google Scholar
Chaoji, V., Hasan, M.A., Salem, S., Besson, J., Zaki, M.J. (2008). ORIGAMI: a novel and effective approach for mining representative orthogonal graph patterns. Statistical Analysis and Data Mining, 1(2), 67–84.
Article MathSciNet Google Scholar
de Sá, C.R., Duivesteijn, W., Soares, C., Knobbe, A.J. (2016). Exceptional preferences mining (pp. 3–18).
Do, T.D.T., Laurent, A., Termier, A. (2010). PGLCM: efficient parallel mining of closed frequent gradual itemsets (pp. 138–147).
Do, T.D.T., Termier, A., Laurent, A., Negrevergne, B., Omidvar-Tehrani, B., Amer-Yahia, S. (2015). Pglcm: efficient parallel mining of closed frequent gradual itemsets. Knowledge and Information Systems, 43(3), 497–527.
Article Google Scholar
Dong, G., & Li, J. (1999). Efficient mining of emerging patterns Discovering trends and differences. In Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 43–52): ACM.
Downar, L., & Duivesteijn, W. (2017). Exceptionally monotone models—the rank correlation model class for exceptional model mining. Knowledge and Information Systems, 51(2), 369–394.
Article Google Scholar
Dua, D., & Graff, C. (2017). UCI machine learning repository.
Duivesteijn, W., Feelders, A.J., Knobbe, A. (2016). Exceptional model mining. Data Mining and Knowledge Discovery, 30(1), 47–98.
Article MathSciNet MATH Google Scholar
Duivesteijn, W., Knobbe, A.J., Feelders, A., van Leeuwen, M. (2010). Subgroup discovery meets bayesian networks – an exceptional model mining approach. In ICDM (pp. 158–167).
Fan, Y.-N., Tseng, T.-L.B., Chern, C.-C., Huang, C.-C. (2009). Rule induction based on an incremental rough set. Expert Systems with Applications, 36(9), 11439–11450.
Article Google Scholar
Grosskreutz, H., & Rüping, S. (2009). On subgroup discovery in numerical domains. Data Mining and Knowledge Discovery, 19(2), 210–226.
Article MathSciNet Google Scholar
Grosskreutz, H., Lang, B., Trabold, D. (2013). A relevance criterion for sequential patterns. In ECMLPKDD (pp. 369–384).
Chapter Google Scholar
Hüllermeier, E. (2002). Association rules for expressing gradual dependencies (pp. 200–211).
Chapter Google Scholar
Kaytoue, M., Kuznetsov, S.O., Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In Walsh, T. (Ed.) IJCAI proceedings of the 22nd international joint conference on artificial intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011 (p. 2011): IJCAI/AAAI.
Kaytoue, M., Plantevit, M., Zimmermann, A., Bendimerad, A., Robardet, C. (2017). Exceptional contextual subgraph mining. Machine Learning, 106(8), 1171–1211.
Article MathSciNet MATH Google Scholar
Klosgen, W. (1996). Explora: a multipattern and multistrategy discovery assistant. Advances in knowledge discovery and data mining (pp. 249–271).
Lavrač, N., Flach, P., Zupan, B. (1999). Rule evaluation measures: a unifying view. In Džeroski, S., & Flach, P. (Eds.) Inductive logic programming (pp. 174–185). Berlin: Springer.
Lavrač, N., Kavšek, B., Flach, P., Todorovski, L. (2004). Subgroup discovery with cn2-sd. Journal of Machine Learning Research, 5, 153–188.
MathSciNet Google Scholar
Leman, D., Feelders, A., Knobbe, A. (2008). Exceptional model mining. In Joint European conference on machine learning and knowledge discovery in databases (pp. 1–16): Springer.
Lemmerich, F., Atzmueller, M., Puppe, F. (2016). Fast exhaustive subgroup discovery with numerical target concepts. Data Mining and Knowledge Discovery, 30 (3), 711–762.
Article MathSciNet MATH Google Scholar
Liu, B., Hsu, W., Ma, Y. (1998). Integrating classification and association rule mining. In KDD (pp. 80–86).
Martínez-Ballesteros, M., Troncoso, A., Martínez-Álvarez, F., Riquelme, J. (2016). Obtaining optimal quality measures for quantitative association rules. Neurocomputing, 176, 36–47. Recent advancements in hybrid artificial intelligence systems and its application to real-world problems.
Article Google Scholar
Morishita, S., & Sese, J. (2000). Transversing itemset lattices with statistical metric pruning. In Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS’00 (pp. 226–236). New York: ACM.
Prado, A., Plantevit, M., Robardet, C., Boulicaut, J.-F., patterns. (2013). Mining graph topological Finding covariations among vertex descriptors. IEEE Transactions on Knowledge and Data Engineering, 25(9), 2090–2104.
Article Google Scholar
Rückert, U., Richter, L., Kramer, S. (2004). Quantitative association rules based on half-spaces: an optimization approach. In Proceedings of the 4th IEEE international conference on data mining (ICDM 2004), 1-4 November 2004, Brighton, UK (pp. 507–510).
Salleb-Aouissi, A., Vrain, C., Nortet, C., Kong, X., Rathod, V., Cassard, D. (2013). Quantminer for mining quantitative association rules. Journal of Machine Learning Research, 14(1), 3153–3157.
MATH Google Scholar
Sikora, M., & Wróbel, Ł. (2010). Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines. Archives of Mining Sciences, 55(1), 91–114.
Google Scholar
Terada, A., Okada-Hatakeyama, M., Tsuda, K., Sese, J. (2013). Statistical significance of combinatorial regulations. Proceedings of the National Academy of Sciences, 110(32), 12996–13001.
Article MathSciNet MATH Google Scholar
Tukey, J.W. (1977). Exploratory data analysis. Addison-Wesley series in behavioral science : quantitative methods, Addison-Wesley.
van Leeuwen, M., & Knobbem, A.J. (2012). Diverse subgroup set discovery. Data Mining and Knowledge Discovery, 25(2), 208–242.
Article MathSciNet Google Scholar
Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In PKDD (pp. 78–87).
Chapter Google Scholar
Xin, D., Cheng, H., Yan, X., Han, J. (2006). Extracting redundancy-aware top-k patterns. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 444–453): ACM.

Download references

Acknowledgments

This work was supported by the Labex IMU Université de Lyon (project RESALI). It was also partially supported by the CNRS project APRC Conf Pap Grant APQ-04224-16 (Multilateral Cooperation FAPEMIG-CNRS).

Author information

Authors and Affiliations

Université de Lyon, INSA Lyon, LIRIS CNRS UMR 5205, Lyon, France
Mohamed Ali Hammal & Céline Robardet
École Normale Supérieure de Lyon, UMR 5600 Environnement Ville Société, Lyon, France
Hélène Mathian & Luc Merchez
Université de Lyon, Université Lyon 1, LIRIS CNRS UMR 5205, Lyon, France
Marc Plantevit

Authors

Mohamed Ali Hammal
View author publications
You can also search for this author in PubMed Google Scholar
Hélène Mathian
View author publications
You can also search for this author in PubMed Google Scholar
Luc Merchez
View author publications
You can also search for this author in PubMed Google Scholar
Marc Plantevit
View author publications
You can also search for this author in PubMed Google Scholar
Céline Robardet
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Céline Robardet.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hammal, M.A., Mathian, H., Merchez, L. et al. Rank correlated subgroup discovery. J Intell Inf Syst 53, 305–328 (2019). https://doi.org/10.1007/s10844-019-00555-y

Download citation

Received: 24 April 2018
Revised: 07 March 2019
Accepted: 18 March 2019
Published: 06 April 2019
Issue Date: October 2019
DOI: https://doi.org/10.1007/s10844-019-00555-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Rank correlated subgroup discovery

Abstract

Access this article

Similar content being viewed by others

Exceptionally monotone models—the rank correlation model class for Exceptional Model Mining

For real: a thorough look at numeric attributes in subgroup discovery

Discovering Outstanding Subgroup Lists for Numeric Targets Using MDL

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Rank correlated subgroup discovery

Abstract

Access this article

Similar content being viewed by others

Exceptionally monotone models—the rank correlation model class for Exceptional Model Mining

For real: a thorough look at numeric attributes in subgroup discovery

Discovering Outstanding Subgroup Lists for Numeric Targets Using MDL

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation