End-to-End Learning of Deterministic Decision Trees

  • Thomas M. Hehn
  • Fred A. Hamprecht
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11269)


Conventional decision trees have a number of favorable properties, including interpretability, a small computational footprint and the ability to learn from little training data. However, they lack a key quality that has helped fuel the deep learning revolution: that of being end-to-end trainable. Kontschieder et al. (2015) addressed this deficit, but at the cost of losing a main attractive trait of decision trees: the fact that each sample is routed along only a small subset of tree nodes. Here we propose a model and an Expectation-Maximization training scheme for decision trees that are fully probabilistic at train time, but become deterministic at test time after an annealing process. We analyze the learned oblique split parameters on image datasets and show that neural networks can be trained at each split. In summary, we present an end-to-end learning scheme for deterministic decision trees and report results on par with or superior to those of published standard oblique decision tree algorithms.
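
To illustrate the idea of a split that is probabilistic during training and deterministic at test time, the following is a minimal sketch, not the authors' implementation. It assumes a single oblique split whose soft routing is a logistic function of a linear score, sharpened by a steepness parameter; the names `soft_route`, `hard_route` and `gamma`, and the example values of `w` and `b`, are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def oblique_score(x, w, b):
    # Oblique split: a linear combination of all features plus a bias.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def soft_route(x, w, b, gamma):
    # Train time (assumed form): probability of routing x to the right child.
    # Larger gamma makes the decision steeper, i.e. closer to a hard split.
    return sigmoid(gamma * oblique_score(x, w, b))


def hard_route(x, w, b):
    # Test time: deterministic routing, go right iff the score is positive.
    return oblique_score(x, w, b) > 0


x = [0.5, -1.0]          # one sample with two features
w, b = [2.0, 1.0], 0.1   # hypothetical split parameters

# Annealing sketch: increasing gamma pushes the soft decision towards 0 or 1.
probs = [soft_route(x, w, b, g) for g in (1.0, 10.0, 100.0)]
print(probs, hard_route(x, w, b))
```

As `gamma` grows, the routing probability approaches the hard decision, so only one child needs to be evaluated for each sample at test time.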



The authors gratefully acknowledge financial support by DFG grant HA 4364/10-1.

Supplementary material

Supplementary material 1: 480455_1_En_42_MOESM1_ESM.pdf (PDF, 375 KB)


  1. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
  2. Breiman, L., Friedman, J., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman & Hall/CRC, London (1984)
  3. Cardona, A., et al.: An integrated micro- and macroarchitectural analysis of the Drosophila brain by computer-assisted serial section electron microscopy. PLOS Biol. 8(10), 1–17 (2010)
  4. Criminisi, A., Shotton, J.: Decision Forests for Computer Vision and Medical Image Analysis. Springer, Berlin (2013)
  5. Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 89–121 (1996)
  6. Fan, R.E., Lin, C.J.: LIBSVM data: classification, regression and multi-label (2011)
  7. Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181 (2014)
  8. Gall, J., Lempitsky, V.: Class-specific Hough forests for object detection. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1022–1029, June 2009
  9. Ioannou, Y., et al.: Decision forests, convolutional networks and the models in-between. arXiv:1603.01250 (March 2016)
  10. Jordan, M.I.: A statistical approach to decision tree modeling. In: Proceedings of the Seventh Annual Conference on Computational Learning Theory, COLT 1994, New York, NY, USA, pp. 13–20 (1994)
  11. Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6(2), 181–214 (1994)
  12. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
  13. Kontschieder, P., Fiterau, M., Criminisi, A., Rota Bulò, S.: Deep neural decision forests. In: ICCV (2015)
  14. Kontschieder, P., Kohli, P., Shotton, J., Criminisi, A.: GeoF: geodesic forests for learning coupled predictors. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013
  15. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
  16. Lepetit, V., Lagger, P., Fua, P.: Randomized trees for real-time keypoint recognition. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 775–781, June 2005
  17. McGill, M., Perona, P.: Deciding how to decide: dynamic routing in artificial neural networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, PMLR, International Convention Centre, Sydney, Australia, 06–11 August 2017, vol. 70, pp. 2363–2372
  18. Menze, B.H., Kelm, B.M., Splitthoff, D.N., Koethe, U., Hamprecht, F.A.: On oblique random forests. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011. LNCS (LNAI), vol. 6912, pp. 453–469. Springer, Heidelberg (2011)
  19. Montillo, A., et al.: Entanglement and differentiable information gain maximization. In: Criminisi, A., Shotton, J. (eds.) Decision Forests for Computer Vision and Medical Image Analysis. ACVPR, pp. 273–293. Springer, London (2013)
  20. Murthy, K.V.S.: On growing better decision trees from data. Ph.D. thesis, The Johns Hopkins University (1996)
  21. Norouzi, M., Collins, M.D., Fleet, D.J., Kohli, P.: CO2 forest: improved random forest by continuous optimization of oblique splits. arXiv:1506.06155 (2015)
  22. Norouzi, M., Collins, M.D., Johnson, M., Fleet, D.J., Kohli, P.: Efficient non-greedy optimization of decision trees. In: NIPS, December 2015
  23.
  24. Quinlan, J.R.: Induction of decision trees. In: Shavlik, J.W., Dietterich, T.G. (eds.) Readings in Machine Learning. Morgan Kaufmann, Los Altos (1990). Originally published in Mach. Learn. 1, 81–106 (1986)
  25. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
  26. Richmond, D., Kainmueller, D., Yang, M., Myers, E., Rother, C.: Mapping auto-context decision forests to deep convnets for semantic segmentation. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference (BMVC), pp. 144.1–144.12. BMVA Press, September 2016
  27. Rose, K., Gurewitz, E., Fox, G.C.: Statistical mechanics and phase transitions in clustering. Phys. Rev. Lett. 65, 945–948 (1990)
  28. Rota Bulò, S., Kontschieder, P.: Neural decision forests for semantic image labelling. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014
  29. Sethi, I.K.: Entropy nets: from decision trees to neural networks. Proc. IEEE 78(10), 1605–1613 (1990)
  30. Suárez, A., Lutsko, J.F.: Globally optimal fuzzy decision trees for classification and regression. IEEE Trans. Pattern Anal. Mach. Intell. 21(12), 1297–1311 (1999)
  31. Welbl, J.: Casting random forests as artificial neural networks (and profiting from it). In: Jiang, X., Hornegger, J., Koch, R. (eds.) GCPR 2014. LNCS, vol. 8753, pp. 765–771. Springer, Cham (2014)
  32. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747 (2017)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Heidelberg Collaboratory for Image Processing, Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg, Germany
