Abstract
Subgroup discovery is the best-known task in supervised descriptive pattern mining. It aims to discover patterns, expressed as rules, that are induced from labeled data. This chapter introduces the subgroup discovery problem and describes how it differs from classification and clustering. It also examines the similarities and differences with other well-known supervised descriptive pattern mining tasks, such as contrast set mining and emerging pattern mining. Finally, it analyses the most widely used quality metrics in the field and the main approaches for performing the task.
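As a minimal illustration of what a rule induced from labeled data and a quality metric can look like, the sketch below scores one hypothetical rule with weighted relative accuracy (WRAcc), a measure commonly used in subgroup discovery: WRAcc(Cond → Target) = p(Cond) · (p(Target | Cond) − p(Target)). The toy records, the attribute names (`age`, `buys`), and the helper `wracc` are assumptions made for this example and are not taken from the chapter.

```python
from typing import Callable, Dict, Sequence

Record = Dict[str, str]


def wracc(
    records: Sequence[Record],
    condition: Callable[[Record], bool],
    target: Callable[[Record], bool],
) -> float:
    """Weighted relative accuracy of the rule `condition -> target`.

    WRAcc = p(Cond) * (p(Target | Cond) - p(Target)).
    Returns 0.0 when the condition covers no record.
    """
    n = len(records)
    if n == 0:
        raise ValueError("records must not be empty")

    covered = [r for r in records if condition(r)]
    if not covered:
        return 0.0

    coverage = len(covered) / n                      # p(Cond)
    p_target = sum(target(r) for r in records) / n   # p(Target)
    p_target_given_cond = sum(target(r) for r in covered) / len(covered)
    return coverage * (p_target_given_cond - p_target)


# Hypothetical labeled data: `buys` plays the role of the target variable.
data = [
    {"age": "young", "buys": "yes"},
    {"age": "young", "buys": "yes"},
    {"age": "young", "buys": "yes"},
    {"age": "young", "buys": "no"},
    {"age": "old", "buys": "yes"},
    {"age": "old", "buys": "no"},
    {"age": "old", "buys": "no"},
    {"age": "old", "buys": "no"},
]


def is_young(record: Record) -> bool:
    return record["age"] == "young"


def buys(record: Record) -> bool:
    return record["buys"] == "yes"


# Candidate subgroup description: age = young -> buys = yes
score = wracc(data, is_young, buys)
print(f"WRAcc(age=young -> buys=yes) = {score:.3f}")
```

A positive score indicates that the target is more frequent inside the subgroup than in the whole dataset, weighted by how many records the subgroup covers. A score near zero means the rule describes nothing unusual.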
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ventura, S., Luna, J.M. (2018). Subgroup Discovery. In: Supervised Descriptive Pattern Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-98140-6_4
DOI: https://doi.org/10.1007/978-3-319-98140-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98139-0
Online ISBN: 978-3-319-98140-6
eBook Packages: Computer Science, Computer Science (R0)