
Evaluation Measures for Multi-class Subgroup Discovery

  • Tarek Abudawood
  • Peter Flach
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

Subgroup discovery aims at finding subsets of a population whose class distribution is significantly different from the overall distribution. It has previously been investigated predominantly in a two-class context. This paper investigates multi-class subgroup discovery methods. We consider six evaluation measures for multi-class subgroups, four of them new, and study their theoretical properties. We extend the two-class subgroup discovery algorithm CN2-SD to incorporate the new evaluation measures and a new weighting scheme inspired by AdaBoost. We demonstrate the usefulness of multi-class subgroup discovery experimentally, using discovered subgroups as features for a decision tree learner. Not only is the number of leaves of the decision tree reduced by a factor of 8 to 16 on average, but significant improvements in accuracy and AUC are achieved with particular evaluation measures and settings. Similar performance improvements can be observed when using naive Bayes.
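To make the role of a multi-class evaluation measure concrete, the sketch below scores a candidate subgroup from its class-count vector and the overall class counts using two standard formulations, a Gini-index split gain and a Chi-square statistic. These correspond to measures named among the keywords, but they are not necessarily the exact definitions analysed in the paper; the function names, the NumPy dependency and the toy counts are illustrative assumptions.

    import numpy as np

    def gini_impurity(p):
        # Gini impurity of a class-probability vector p (entries sum to 1).
        p = np.asarray(p, dtype=float)
        return 1.0 - np.sum(p ** 2)

    def gini_split_gain(subgroup_counts, overall_counts):
        # Impurity reduction obtained by splitting the population into the
        # subgroup and its complement; a multi-class quality score in the
        # spirit of the paper, not necessarily its exact definition.
        sub = np.asarray(subgroup_counts, dtype=float)
        total = np.asarray(overall_counts, dtype=float)
        comp = total - sub
        n, n_sub, n_comp = total.sum(), sub.sum(), comp.sum()
        return (gini_impurity(total / n)
                - (n_sub / n) * gini_impurity(sub / n_sub)
                - (n_comp / n) * gini_impurity(comp / n_comp))

    def chi_square(subgroup_counts, overall_counts):
        # Chi-square statistic comparing the subgroup's class counts with the
        # counts expected if the subgroup were independent of the class.
        sub = np.asarray(subgroup_counts, dtype=float)
        total = np.asarray(overall_counts, dtype=float)
        expected = total * sub.sum() / total.sum()
        return np.sum((sub - expected) ** 2 / expected)

    # Toy data (assumed): 3 classes with 100 examples each; the candidate
    # subgroup covers 60 examples and is heavily skewed towards class 0.
    overall = [100, 100, 100]
    subgroup = [45, 10, 5]
    print(gini_split_gain(subgroup, overall))   # larger = more interesting
    print(chi_square(subgroup, overall))        # larger = more significant

In a CN2-SD-style weighted covering search, such a score would act as the heuristic for selecting the next rule, with example weights (for instance the AdaBoost-inspired reweighting mentioned above) scaling the counts of already-covered examples; that weighting is not reproduced in this sketch.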

Keywords

Mutual Information · Evaluation Measure · Gini Index · Target Concept · Heuristic Function
(These keywords were added by machine, not by the authors; the process is experimental and the keywords may be updated as the learning algorithm improves.)

References

  1. Lavrač, N., Kavšek, B., Flach, P., Todorovski, L.: Subgroup Discovery with CN2-SD. Journal of Machine Learning Research 5, 153–188 (2004)
  2. Klösgen, W.: Explora: A multipattern and multistrategy discovery assistant. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 249–271. MIT Press, Cambridge (1996)
  3. Klösgen, W.: Subgroup discovery. In: Klösgen, W., Zytkow, J.M. (eds.) Handbook of Data Mining and Knowledge Discovery, pp. 354–361. Oxford University Press, Oxford (2002)
  4. Fürnkranz, J., Flach, P.: ROC ’n’ rule learning: Towards a better understanding of covering algorithms. Machine Learning 58, 39–77 (2005)
  5. Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–283 (1989)
  6. Clark, P., Boswell, R.: Rule induction with CN2: Some recent improvements. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS (LNAI), vol. 482, pp. 151–163. Springer, Heidelberg (1991)
  7. Schapire, R.E.: The boosting approach to machine learning: An overview. In: Nonlinear Estimation and Classification. Lecture Notes in Statistics. Springer, Heidelberg (2003)
  8. Lavrač, N., Flach, P., Zupan, B.: Rule evaluation measures: A unifying view. In: Džeroski, S., Flach, P.A. (eds.) ILP 1999. LNCS (LNAI), vol. 1634, pp. 174–185. Springer, Heidelberg (1999)
  9. Friedman, J.H.: Another approach to polychotomous classification. Technical report, Stanford University, Department of Statistics (1996)
  10. Kijsirikul, B., Ussivakul, N., Meknavin, S.: Adaptive directed acyclic graphs for multiclass classification. In: Ishizuka, M., Sattar, A. (eds.) PRICAI 2002. LNCS (LNAI), vol. 2417, pp. 158–168. Springer, Heidelberg (2002)
  11. Platt, J.C., Cristianini, N.: Large margin DAGs for multiclass classification. In: Advances in Neural Information Processing Systems, vol. 12. MIT Press, Cambridge (2000)
  12. Hsu, C.W., Lin, C.J.: A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13(2), 415–425 (2002)
  13. Jin, X., Xu, A., Bie, R., Guo, P.: Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles. In: Li, J., Yang, Q., Tan, A.-H. (eds.) BioDM 2006. LNCS (LNBI), vol. 3916, pp. 106–115. Springer, Heidelberg (2006)
  14. Newman, D., Hettich, S., Blake, C., Merz, C.: UCI repository of machine learning databases (1998)
  15. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
  16. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1–30 (2006)
  17. Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks 5(4) (1994)
  18. Fleuret, F.: Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research 5, 1531–1555 (2004)
  19. Boström, H.: Covering vs. divide-and-conquer for top-down induction of logic programs. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pp. 1194–1200. Morgan Kaufmann, San Francisco (1995)
  20. Friedman, J.H., Fisher, N.I.: Bump hunting in high-dimensional data. Statistics and Computing 9, 123–143 (1999)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Tarek Abudawood (1)
  • Peter Flach (1)
  1. Department of Computer Science, University of Bristol, United Kingdom
