Exceptional Model Mining

  • Arno Knobbe
  • Ad Feelders
  • Dennis Leman
Part of the Intelligent Systems Reference Library book series (ISRL, volume 24)


In most databases, it is possible to identify small partitions of the data where the observed distribution is notably different from that of the database as a whole. In classical subgroup discovery, one considers the distribution of a single nominal attribute, and exceptional subgroups show a surprising increase in the occurrence of one of its values. In this paper, we describe Exceptional Model Mining (EMM), a framework that allows for more complicated target concepts. Rather than finding subgroups based on the distribution of a single target attribute, EMM finds subgroups where a model fitted to that subgroup is somehow exceptional. We discuss regression as well as classification models, and define quality measures that determine how exceptional a given model on a subgroup is. Our framework is general enough to be applied to many types of models, even from other paradigms such as association analysis and graphical modeling.
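
As a rough illustration of the idea, the sketch below scores one candidate subgroup by comparing a very simple model, the Pearson correlation between two attributes, fitted inside the subgroup and on its complement. Everything in it is an assumption made for illustration: the helper names (pearson, correlation_quality), the attribute names and the toy data are hypothetical, and the absolute difference in correlation is only one plausible quality measure, not necessarily one defined in the chapter.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_quality(rows, condition, x_key, y_key):
    """Hypothetical quality measure: absolute difference between the
    correlation inside the subgroup and the correlation in its complement."""
    inside = [r for r in rows if condition(r)]
    outside = [r for r in rows if not condition(r)]
    r_in = pearson([r[x_key] for r in inside], [r[y_key] for r in inside])
    r_out = pearson([r[x_key] for r in outside], [r[y_key] for r in outside])
    return abs(r_in - r_out)


# Toy data, invented for illustration only.
rows = [
    {"region": "A", "size": 1, "price": 10},
    {"region": "A", "size": 2, "price": 20},
    {"region": "A", "size": 3, "price": 30},
    {"region": "B", "size": 1, "price": 30},
    {"region": "B", "size": 2, "price": 20},
    {"region": "B", "size": 3, "price": 10},
]

# Candidate subgroup: region == "B". Its size/price relationship is the
# reverse of the one in the complement, so the score is high.
quality = correlation_quality(rows, lambda r: r["region"] == "B", "size", "price")
print(quality)
```

A full search would evaluate many candidate subgroup descriptions and keep the highest-scoring ones; this sketch only shows how a single candidate might be scored.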


Keywords (added by machine, not by the authors): Quality Measure, Sales Price, Decision Table, Output Attribute, Hellinger Distance





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Arno Knobbe (1)
  • Ad Feelders (2)
  • Dennis Leman (2)
  1. LIACS, Leiden University, Leiden, The Netherlands
  2. Utrecht University, Utrecht, The Netherlands
