Abstract
A tacit assumption in classifier induction is that the class distribution of the training set must match that of the test set. A direct implementation of this assumption is to retrain the model on a data set with the matching class distribution every time the operating condition changes (the matching-model approach). The alternative is to adapt the decision rule of a previously trained model to the new operating condition; this single-model approach is commonly used and recommended by many researchers. In this paper, we argue, with empirical support from decision trees, that learning from the matching class distribution is desirable. We also make explicit the differences and limitations of the two methods used in the single-model approach: rescaling and thresholding.
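The two single-model adjustments named in the abstract can be illustrated with a short sketch. The function names and the Bayes-rule prior correction below are illustrative assumptions, not taken from the paper; they show how rescaling a model's probability estimates and moving the decision threshold are two views of the same shift in class prior.

```python
def corrected_score(p_train, train_pos_prior, test_pos_prior):
    """Rescaling view (illustrative, not the paper's notation):
    reweight a model's estimated P(positive | x), learned under
    train_pos_prior, to the new operating prior test_pos_prior
    via Bayes' rule."""
    num = p_train * (test_pos_prior / train_pos_prior)
    den = num + (1 - p_train) * ((1 - test_pos_prior) / (1 - train_pos_prior))
    return num / den


def thresholded_label(p_train, train_pos_prior, test_pos_prior):
    """Thresholding view: leave the raw score untouched and move the
    0.5 cutoff instead of rescaling every probability."""
    threshold = train_pos_prior * (1 - test_pos_prior) / (
        train_pos_prior * (1 - test_pos_prior)
        + (1 - train_pos_prior) * test_pos_prior
    )
    return int(p_train >= threshold)
```

Under these assumptions the two views assign the same label: `corrected_score(p, pi, pi_new) >= 0.5` exactly when `p` clears the moved threshold. Neither adjustment, however, changes the model structure itself, which is the limitation the paper's matching-model argument targets.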
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Ting, K.M. (2004). Matching Model Versus Single Model: A Study of the Requirement to Match Class Distribution Using Decision Trees. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Machine Learning: ECML 2004. ECML 2004. Lecture Notes in Computer Science(), vol 3201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30115-8_40
Print ISBN: 978-3-540-23105-9
Online ISBN: 978-3-540-30115-8