Subgroup Discovery

Supervised Descriptive Pattern Mining

Abstract

Subgroup discovery is the best-known task in supervised descriptive pattern mining. It aims to discover patterns, in the form of rules, induced from labeled data. This chapter introduces the subgroup discovery problem and describes how it differs from classification and clustering. It also describes the similarities and differences with respect to other well-known tasks in the field, such as contrast set mining and emerging pattern mining. Finally, it analyses the most widely used metrics in this field and the main approaches to performing the task.


Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ventura, S., Luna, J.M. (2018). Subgroup Discovery. In: Supervised Descriptive Pattern Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-98140-6_4

  • DOI: https://doi.org/10.1007/978-3-319-98140-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98139-0

  • Online ISBN: 978-3-319-98140-6

  • eBook Packages: Computer Science, Computer Science (R0)