Abstract
In this work we investigate several issues in order to improve the performance of probabilistic estimation trees (PETs). First, we derive a new probability-smoothing method that takes into account the class distributions of all the nodes on the path from the root to each leaf. Secondly, we introduce or adapt several splitting criteria aimed at improving probability estimates rather than classification accuracy, and compare them with accuracy-oriented splitting criteria. Thirdly, we analyse the effect of pruning methods and choose a cardinality-based pruning, which significantly reduces the size of the trees without degrading the quality of the estimates. The effect of these three techniques on the quality of the probability estimates is evaluated with the 1-vs-1 multi-class extension of the Area Under the ROC Curve (AUC), a measure that is becoming widespread for evaluating probability estimators, in particular for ranking predictions.
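The 1-vs-1 multi-class extension of the AUC mentioned above is the M measure of Hand and Till (2001): for each unordered pair of classes, the binary AUC is computed in both directions on the examples of those two classes only, and the results are averaged over all pairs. The following is a minimal illustrative sketch (function names are ours, not from the paper):

```python
from itertools import combinations

def pairwise_auc(scores, labels, pos):
    """AUC of `scores` at separating class `pos` from the other class,
    counting ties as 1/2 (the Mann-Whitney form of the AUC)."""
    pos_s = [s for s, y in zip(scores, labels) if y == pos]
    neg_s = [s for s, y in zip(scores, labels) if y != pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_s for n in neg_s)
    return wins / (len(pos_s) * len(neg_s))

def hand_till_auc(y_true, probs, classes):
    """1-vs-1 multi-class AUC: for each unordered class pair (i, j),
    restrict to examples of those two classes, average the AUC of the
    class-i probability at separating i from j with the AUC of the
    class-j probability at separating j from i, then average over pairs."""
    total = 0.0
    for i, j in combinations(range(len(classes)), 2):
        idx = [k for k, y in enumerate(y_true)
               if y in (classes[i], classes[j])]
        ys = [y_true[k] for k in idx]
        a_ij = pairwise_auc([probs[k][i] for k in idx], ys, classes[i])
        a_ji = pairwise_auc([probs[k][j] for k in idx], ys, classes[j])
        total += (a_ij + a_ji) / 2.0
    n_pairs = len(classes) * (len(classes) - 1) / 2
    return total / n_pairs
```

A PET whose leaf probabilities rank every pair of classes perfectly scores 1.0 under this measure, while uninformative (uniform) probability estimates score 0.5.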
References
Blake, C., Merz, C.: UCI repository of machine learning databases, University of California (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30(7), 1145–1159 (1997)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont, CA (1984)
Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly Weather Rev. 78, 1–3 (1950)
Cestnik, B., Bratko, I.: On estimating probabilities in tree pruning. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS (LNAI), vol. 482, pp. 138–150. Springer, Heidelberg (1991)
Esposito, F., Malerba, D., Semeraro, G.: A Comparative Analysis of Methods for Pruning Decision Trees. IEEE Trans. on Pattern Analysis and Machine Intelligence 19(5), 476–491 (1997)
Ferri, C., Flach, P., Hernández-Orallo, J.: Learning Decision Trees using the Area Under the ROC Curve. In: Sammut, C., Hoffman, A. (eds.) Proc. Int. Conf. on Machine Learning (ICML 2002), pp. 139–146. Morgan Kaufmann, San Francisco (2002)
Ferri, C., Flach, P., Hernández-Orallo, J.: Decision Trees for Ranking: Effect of new smoothing methods, new splitting criteria and simple pruning methods. Tech. Rep. Dep. de Sistemes Informàtics i Computació, Univ. Politècnica de València (2003)
Hand, D.J., Till, R.J.: A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning 45, 171–186 (2001)
Kearns, M., Mansour, Y.: On the boosting ability of top-down decision tree learning algorithms. Journal of Computer and Systems Sciences 58(1), 109–128 (1999)
Ling, C.X., Yan, R.J.: Decision Tree with Better Ranking. In: Proc. Int. Conf. on Machine Learning (ICML 2003), AAAI Press, Menlo Park (2003)
Provost, F., Domingos, P.: Tree Induction for Probability-based Ranking. Machine Learning 52(3) (2003)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
Smyth, P., Gray, A., Fayyad, U.: Retrofitting decision tree classifiers using kernel density estimation. In: Proc. Int. Conf. on Machine Learning (ICML 1995), pp. 506–514. Morgan Kaufmann, San Francisco (1995)
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Ferri, C., Flach, P.A., Hernández-Orallo, J. (2003). Improving the AUC of Probabilistic Estimation Trees. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds) Machine Learning: ECML 2003. ECML 2003. Lecture Notes in Computer Science(), vol 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8