Abstract
In this work we investigate several issues in order to improve the performance of probabilistic estimation trees (PETs). First, we derive a new probability-smoothing method that takes into account the class distributions of all the nodes on the path from the root to each leaf. Secondly, we introduce or adapt several splitting criteria aimed at improving probability estimates rather than classification accuracy, and compare them with accuracy-oriented splitting criteria. Thirdly, we analyse the effect of pruning methods and choose a cardinality-based pruning, which significantly reduces the size of the trees without degrading the quality of the estimates. The effect of these three techniques on the quality of the probability estimates is evaluated with the 1-vs-1 multi-class extension of the Area Under the ROC Curve (AUC), a measure that is becoming widespread for evaluating probability estimators, in particular for ranking predictions.
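The 1-vs-1 multi-class extension of the AUC mentioned above is the M measure of Hand and Till (2001): for each unordered pair of classes, the binary AUC is computed in both directions on the examples of those two classes only, and the results are averaged over all pairs. The following is a minimal illustrative sketch (function names are ours, not from the paper):

```python
from itertools import combinations

def pairwise_auc(scores, labels, pos):
    """AUC of `scores` at separating class `pos` from the other class,
    counting ties as 1/2 (the Mann-Whitney form of the AUC)."""
    pos_s = [s for s, y in zip(scores, labels) if y == pos]
    neg_s = [s for s, y in zip(scores, labels) if y != pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_s for n in neg_s)
    return wins / (len(pos_s) * len(neg_s))

def hand_till_auc(y_true, probs, classes):
    """1-vs-1 multi-class AUC: for each unordered class pair (i, j),
    restrict to examples of those two classes, average the AUC of the
    class-i probability at separating i from j with the AUC of the
    class-j probability at separating j from i, then average over pairs."""
    total = 0.0
    for i, j in combinations(range(len(classes)), 2):
        idx = [k for k, y in enumerate(y_true)
               if y in (classes[i], classes[j])]
        ys = [y_true[k] for k in idx]
        a_ij = pairwise_auc([probs[k][i] for k in idx], ys, classes[i])
        a_ji = pairwise_auc([probs[k][j] for k in idx], ys, classes[j])
        total += (a_ij + a_ji) / 2.0
    n_pairs = len(classes) * (len(classes) - 1) / 2
    return total / n_pairs
```

A PET whose leaf probabilities rank every pair of classes perfectly scores 1.0 under this measure, while uninformative (uniform) probability estimates score 0.5.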
References
Blake, C., Merz, C.: UCI repository of machine learning databases, University of California (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30(7), 1145–1159 (1997)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont, CA (1984)
Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly Weather Rev. 78, 1–3 (1950)
Cestnik, B., Bratko, I.: On estimating probabilities in tree pruning. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS (LNAI), vol. 482, pp. 138–150. Springer, Heidelberg (1991)
Esposito, F., Malerba, D., Semeraro, G.: A Comparative Analysis of Methods for Pruning Decision Trees. IEEE Trans. on Pattern Analysis and Machine Intelligence 19(5), 476–491 (1997)
Ferri, C., Flach, P., Hernández-Orallo, J.: Learning Decision Trees using the Area Under the ROC Curve. In: Sammut, C., Hoffman, A. (eds.) Proc. Int. Conf. on Machine Learning (ICML 2002), pp. 139–146. Morgan Kaufmann, San Francisco (2002)
Ferri, C., Flach, P., Hernández-Orallo, J.: Decision Trees for Ranking: Effect of new smoothing methods, new splitting criteria and simple pruning methods. Tech. Rep. Dep. de Sistemes Informàtics i Computació, Univ. Politècnica de València (2003)
Hand, D.J., Till, R.J.: A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning 45, 171–186 (2001)
Kearns, M., Mansour, Y.: On the boosting ability of top-down decision tree learning algorithms. Journal of Computer and Systems Sciences 58(1), 109–128 (1999)
Ling, C.X., Yan, R.J.: Decision Tree with Better Ranking. In: Proc. Int. Conf. on Machine Learning (ICML 2003), AAAI Press, Menlo Park (2003)
Provost, F., Domingos, P.: Tree Induction for Probability-based Ranking. Machine Learning 52(3) (2003)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
Smyth, P., Gray, A., Fayyad, U.: Retrofitting decision tree classifiers using kernel density estimation. In: Proc. Int. Conf. on Machine Learning (ICML 1995), pp. 506–514. Morgan Kaufmann, San Francisco (1995)
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Ferri, C., Flach, P.A., Hernández-Orallo, J. (2003). Improving the AUC of Probabilistic Estimation Trees. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds) Machine Learning: ECML 2003. ECML 2003. Lecture Notes in Computer Science(), vol 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8