Rademacher Penalization over Decision Tree Prunings

  • Matti Kääriäinen
  • Tapio Elomaa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)


Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. Because of its computational cost, it would appear to be limited to relatively simple hypothesis classes. In this paper we nevertheless apply Rademacher penalization to the practically important hypothesis class of unrestricted decision trees, by considering the prunings of a given decision tree rather than the tree-growing phase. Moreover, we generalize the error-bounding approach from binary classification to multi-class settings. Our experiments indicate that the proposed bounds clearly outperform earlier bounds for decision tree prunings and provide non-trivial error estimates on real-world data sets.
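
As a rough illustration of the quantity involved, the sketch below computes a Rademacher penalty by brute force for a tiny finite hypothesis class. It is not the paper's algorithm. The function names, the toy data, and the representation of each hypothesis as a list of predictions are assumptions made for this example. The penalty is taken in its common form: the supremum over hypotheses h of |(1/n) * sum_i sigma_i * 1[h(x_i) != y_i]|, where the sigma_i are random signs in {-1, +1}.

```python
import random


def draw_signs(n, rng):
    """Draw n independent Rademacher signs, each -1 or +1 with equal probability."""
    return [rng.choice((-1, 1)) for _ in range(n)]


def rademacher_penalty(predictions, labels, signs):
    """Brute-force Rademacher penalty of a finite hypothesis class.

    predictions: one list of predicted labels per hypothesis (hypothetical
                 representation chosen for this sketch)
    labels:      the true labels of the sample
    signs:       Rademacher signs, one per example
    """
    n = len(labels)
    best = 0.0
    for preds in predictions:
        # Signed sum of the 0/1 losses of this hypothesis.
        total = sum(s for p, y, s in zip(preds, labels, signs) if p != y)
        best = max(best, abs(total) / n)
    return best


# Toy class standing in for a small set of prunings (assumed data).
labels = [0, 0, 1, 1]
predictions = [
    [0, 0, 0, 0],  # a single leaf predicting class 0
    [1, 1, 1, 1],  # a single leaf predicting class 1
    [0, 0, 1, 1],  # a tree with one split that fits the sample
]

signs = [1, -1, 1, 1]  # fixed here for reproducibility
penalty = rademacher_penalty(predictions, labels, signs)

# In practice the signs are random, e.g.:
random_signs = draw_signs(len(labels), random.Random(0))
```

With the fixed signs above, the first hypothesis errs on the last two examples, whose signs sum to 2, so the penalty is 2/4 = 0.5. Enumerating hypotheses like this is feasible only for very small classes, which is the computational difficulty that makes richer classes such as decision trees hard to handle directly.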


Keywords: Generalization Error, Pruning Algorithm, Hypothesis Class, Empirical Risk Minimization, Decision Tree Learning



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Matti Kääriäinen, Department of Computer Science, University of Helsinki, Finland
  • Tapio Elomaa, Department of Computer Science, University of Helsinki, Finland
