Journal of Intelligent Information Systems, Volume 45, Issue 3, pp 337–355

Cost-based quality measures in subgroup discovery

  • Rob M. Konijn
  • Wouter Duivesteijn
  • Marvin Meeng
  • Arno Knobbe


We consider data where examples are not only labeled in the classical sense (positive or negative), but also have costs associated with them. In this sense, each example has two target attributes, and we aim to find clearly defined subsets of the data where the values of these two targets have an unusual distribution. In other words, we focus on a Subgroup Discovery task with a somewhat unusual target concept, and investigate quality measures that take into account both the binary and the cost target. In defining such quality measures, we aim to produce an interpretable valuation of a subgroup, such that data analysts can directly appreciate the findings and relate them to monetary gains or losses. Our work is particularly relevant in the domain of health care fraud detection. In this domain, the binary target identifies the patients of a specific medical practitioner under investigation, and the cost target specifies the money spent on each patient. When looking for differences in claim behavior, we need to take into account both the ‘positive’ examples (patients of the practitioner) and the ‘negative’ examples (other patients), as well as information about the costs of all patients. A typical subgroup will list a number of treatments, together with the behavioral difference of the target practitioner’s patients in both treatment prevalence and associated costs. An additional angle is the Local Subgroup Discovery task, where subgroups are judged according to their difference from a local reference group instead of the entire dataset. We show how the cost-based analysis of data specifically fits this local focus.
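To make the two-target setting concrete, the following is a minimal sketch of how a subgroup could be scored when every example carries both a binary label and a cost. It is not one of the quality measures defined in the paper: the `Example` record, the `treatment` attribute, the function `cost_difference_quality`, and the toy data are all hypothetical, chosen only to show a score that can be read as a monetary amount.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    treatment: str   # descriptive attribute (hypothetical)
    positive: bool   # binary target: patient of the practitioner under investigation
    cost: float      # cost target: money spent on this patient


def cost_difference_quality(data: List[Example],
                            covers: Callable[[Example], bool]) -> float:
    """Illustrative score (an assumption, not the paper's measure):
    number of positives in the subgroup times the difference between the
    mean cost of positives and the mean cost of negatives in the subgroup."""
    subgroup = [e for e in data if covers(e)]
    pos_costs = [e.cost for e in subgroup if e.positive]
    neg_costs = [e.cost for e in subgroup if not e.positive]
    if not pos_costs or not neg_costs:
        return 0.0  # no comparison possible without both classes
    mean_pos = sum(pos_costs) / len(pos_costs)
    mean_neg = sum(neg_costs) / len(neg_costs)
    return len(pos_costs) * (mean_pos - mean_neg)


# Toy data, invented for illustration only.
data = [
    Example("physio", True, 300.0),
    Example("physio", True, 500.0),
    Example("physio", False, 200.0),
    Example("physio", False, 200.0),
    Example("dental", True, 100.0),
    Example("dental", False, 100.0),
]

physio_score = cost_difference_quality(data, lambda e: e.treatment == "physio")
dental_score = cost_difference_quality(data, lambda e: e.treatment == "dental")
print(physio_score, dental_score)
```

Under these assumptions, the score for a subgroup reads as the total extra money spent on the practitioner's patients in that subgroup relative to comparable other patients, which is the kind of directly interpretable valuation the abstract argues for.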


Keywords: Subgroup discovery; Quality measures



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Rob M. Konijn (1, 2)
  • Wouter Duivesteijn (1)
  • Marvin Meeng (1)
  • Arno Knobbe (1)

  1. LIACS, Leiden University, Leiden, The Netherlands
  2. Achmea Health Insurance, Zeist, The Netherlands
