
Improving the AUC of Probabilistic Estimation Trees

  • César Ferri
  • Peter A. Flach
  • José Hernández-Orallo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)

Abstract

In this work we investigate several ways to improve the performance of probabilistic estimation trees (PETs). First, we derive a new probability smoothing method that takes into account the class distributions of all the nodes on the path from the root to each leaf. Secondly, we introduce or adapt some new splitting criteria aimed at improving probability estimates rather than classification accuracy, and compare them with accuracy-aimed splitting criteria. Thirdly, we analyse the effect of pruning methods and choose a cardinality-based pruning, which significantly reduces the size of the trees without degrading the quality of the estimates. For each of these three issues, the quality of the probability estimates is evaluated with the 1-vs-1 multi-class extension of the Area Under the ROC Curve (AUC) measure, which is becoming widespread for evaluating probability estimators, in particular for ranking predictions.
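The 1-vs-1 multi-class AUC mentioned above can be illustrated with a minimal sketch. It assumes the pairwise averaging usually attributed to Hand and Till (2001): for every unordered pair of classes, compute the two-class AUC in both directions and average the results. The function names `binary_auc` and `one_vs_one_auc` and the data layout (one dict of class probabilities per example) are hypothetical choices made for this illustration, not taken from the paper, and the sketch assumes every class has at least one example.

```python
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def one_vs_one_auc(y_true, probs, classes):
    """Average pairwise AUC over all unordered class pairs.

    y_true  -- list of true class labels
    probs   -- list of dicts mapping class label -> estimated probability
    classes -- list of all class labels (each must occur in y_true)
    """
    pair_aucs = []
    for i, j in combinations(classes, 2):
        # A(i|j): rank the examples of classes i and j by the estimate p(i|x).
        a_ij = binary_auc(
            [p[i] for y, p in zip(y_true, probs) if y == i],
            [p[i] for y, p in zip(y_true, probs) if y == j],
        )
        # A(j|i): the same pair of classes, ranked by p(j|x) instead.
        a_ji = binary_auc(
            [p[j] for y, p in zip(y_true, probs) if y == j],
            [p[j] for y, p in zip(y_true, probs) if y == i],
        )
        pair_aucs.append((a_ij + a_ji) / 2.0)
    return sum(pair_aucs) / len(pair_aucs)


# Toy usage: three classes, estimates that rank every example correctly.
labels = ["a", "a", "b", "b", "c", "c"]
estimates = [
    {"a": 0.8, "b": 0.1, "c": 0.1},
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.1, "b": 0.8, "c": 0.1},
    {"a": 0.2, "b": 0.6, "c": 0.2},
    {"a": 0.1, "b": 0.1, "c": 0.8},
    {"a": 0.1, "b": 0.3, "c": 0.6},
]
print(one_vs_one_auc(labels, estimates, ["a", "b", "c"]))
```

Because the measure depends only on how examples are ranked within each pair of classes, it rewards good probability ordering even when the estimates themselves are poorly calibrated. This is why it suits the ranking-oriented evaluation described in the abstract.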

Keywords

Probability Estimator; Minority Class; Smoothing Method; Decision Tree Classifier; Pruning Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • César Ferri (1)
  • Peter A. Flach (2)
  • José Hernández-Orallo (1)

  1. Dep. Sistemes Informàtics i Computació, Univ. Politècnica de València, Spain
  2. Department of Computer Science, University of Bristol, United Kingdom
