End-to-End Learning of Deterministic Decision Trees

  • Conference paper
Pattern Recognition (GCPR 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11269)


Abstract

Conventional decision trees have a number of favorable properties, including interpretability, a small computational footprint, and the ability to learn from little training data. However, they lack a key quality that has helped fuel the deep learning revolution: that of being end-to-end trainable. Kontschieder et al. [13] have addressed this deficit, but at the cost of losing a main attractive trait of decision trees: the fact that each sample is routed along only a small subset of tree nodes. We propose a model and an Expectation-Maximization training scheme for decision trees that are fully probabilistic at train time but, after an annealing process, become deterministic at test time. We analyze the learned oblique split parameters on image datasets and show that neural networks can be trained at each split. In summary, we present an end-to-end learning scheme for deterministic decision trees and report results on par with, or superior to, published standard oblique decision tree algorithms.
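
To make the annealing idea concrete, the sketch below shows a single probabilistic oblique split whose routing hardens as a steepness parameter grows. This is a minimal illustration under our own assumptions, not the authors' implementation; the names soft_split, w, b, and gamma are illustrative.

    import numpy as np

    def soft_split(x, w, b, gamma):
        # Probability of routing samples x to the left child.
        # An oblique split thresholds the linear projection x @ w + b;
        # the steepness gamma controls how soft the routing is.
        # 0.5 * (1 + tanh(z / 2)) is a numerically stable sigmoid.
        return 0.5 * (1.0 + np.tanh(0.5 * gamma * (x @ w + b)))

    # Toy 2-D samples and a hypothetical oblique split (w, b).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 2))
    w, b = np.array([0.8, -0.6]), 0.1

    for gamma in (1.0, 10.0, 100.0):
        print(gamma, np.round(soft_split(X, w, b, gamma), 3))

    # As gamma grows, the routing probabilities saturate to 0 or 1:
    # the probabilistic tree anneals into a deterministic one, so at
    # test time each sample follows a single root-to-leaf path.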

T. M. Hehn (corresponding author) is now at TU Delft.


References

  1. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  2. Breiman, L., Friedman, J., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman & Hall/CRC, London (1984)

  3. Cardona, A., et al.: An integrated micro- and macroarchitectural analysis of the Drosophila brain by computer-assisted serial section electron microscopy. PLOS Biol. 8(10), 1–17 (2010). https://doi.org/10.1371/journal.pbio.1000502

  4. Criminisi, A., Shotton, J.: Decision Forests for Computer Vision and Medical Image Analysis. Springer, Berlin (2013). https://doi.org/10.1007/978-1-4471-4929-3

  5. Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 89–121 (1996)

  6. Fan, R.E., Lin, C.J.: LIBSVM data: classification, regression and multi-label (2011). http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

  7. Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181 (2014)

  8. Gall, J., Lempitsky, V.: Class-specific hough forests for object detection. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1022–1029, June 2009. https://doi.org/10.1109/CVPR.2009.5206740

  9. Ioannou, Y., et al.: Decision forests, convolutional networks and the models in-between. arXiv:1603.01250 (March 2016)

  10. Jordan, M.I.: A statistical approach to decision tree modeling. In: Proceedings of the Seventh Annual Conference on Computational Learning Theory, COLT 1994, New York, NY, USA, pp. 13–20 (1994)

  11. Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6(2), 181–214 (1994). https://doi.org/10.1162/neco.1994.6.2.181

  12. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)

  13. Kontschieder, P., Fiterau, M., Criminisi, A., Rota Bulò, S.: Deep neural decision forests. In: ICCV (2015)

  14. Kontschieder, P., Kohli, P., Shotton, J., Criminisi, A.: GeoF: geodesic forests for learning coupled predictors. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013

  15. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  16. Lepetit, V., Lagger, P., Fua, P.: Randomized trees for real-time keypoint recognition. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 775–781, June 2005. https://doi.org/10.1109/CVPR.2005.288

  17. McGill, M., Perona, P.: Deciding how to decide: dynamic routing in artificial neural networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, PMLR, International Convention Centre, Sydney, Australia, 06–11 August 2017, vol. 70, pp. 2363–2372. http://proceedings.mlr.press/v70/mcgill17a.html

  18. Menze, B.H., Kelm, B.M., Splitthoff, D.N., Koethe, U., Hamprecht, F.A.: On oblique random forests. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011. LNCS (LNAI), vol. 6912, pp. 453–469. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23783-6_29

  19. Montillo, A., et al.: Entanglement and differentiable information gain maximization. In: Criminisi, A., Shotton, J. (eds.) Decision Forests for Computer Vision and Medical Image Analysis. ACVPR, pp. 273–293. Springer, London (2013). https://doi.org/10.1007/978-1-4471-4929-3_19

  20. Murthy, K.V.S.: On growing better decision trees from data. Ph.D. thesis, The Johns Hopkins University (1996)

  21. Norouzi, M., Collins, M.D., Fleet, D.J., Kohli, P.: CO2 forest: improved random forest by continuous optimization of oblique splits. arXiv:1506.06155 (2015)

  22. Norouzi, M., Collins, M.D., Johnson, M., Fleet, D.J., Kohli, P.: Efficient non-greedy optimization of decision trees. In: NIPS, December 2015

  23. PyTorch: http://www.pytorch.org/

  24. Quinlan, J.R.: Induction of decision trees. In: Shavlik, J.W., Dietterich, T.G. (eds.) Readings in Machine Learning. Morgan Kaufmann, Los Altos (1990). Originally published in Mach. Learn. 1, 81–106 (1986)

  25. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)

  26. Richmond, D., Kainmueller, D., Yang, M., Myers, E., Rother, C.: Mapping auto-context decision forests to deep convnets for semantic segmentation. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference (BMVC), pp. 144.1–144.12. BMVA Press, September 2016. https://doi.org/10.5244/C.30.144

  27. Rose, K., Gurewitz, E., Fox, G.C.: Statistical mechanics and phase transitions in clustering. Phys. Rev. Lett. 65, 945–948 (1990). https://doi.org/10.1103/PhysRevLett.65.945

  28. Rota Bulo, S., Kontschieder, P.: Neural decision forests for semantic image labelling. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014

  29. Sethi, I.K.: Entropy nets: from decision trees to neural networks. Proc. IEEE 78(10), 1605–1613 (1990)

  30. Suárez, A., Lutsko, J.F.: Globally optimal fuzzy decision trees for classification and regression. IEEE Trans. Pattern Anal. Mach. Intell. 21(12), 1297–1311 (1999)

  31. Welbl, J.: Casting random forests as artificial neural networks (and profiting from it). In: Jiang, X., Hornegger, J., Koch, R. (eds.) GCPR 2014. LNCS, vol. 8753, pp. 765–771. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11752-2_66

  32. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747 (2017)

Acknowledgments

The authors gratefully acknowledge financial support by DFG grant HA 4364/10-1.

Author information

Correspondence to Thomas M. Hehn.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 375 KB)

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hehn, T.M., Hamprecht, F.A. (2019). End-to-End Learning of Deterministic Decision Trees. In: Brox, T., Bruhn, A., Fritz, M. (eds) Pattern Recognition. GCPR 2018. Lecture Notes in Computer Science, vol 11269. Springer, Cham. https://doi.org/10.1007/978-3-030-12939-2_42

  • DOI: https://doi.org/10.1007/978-3-030-12939-2_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-12938-5

  • Online ISBN: 978-3-030-12939-2

  • eBook Packages: Computer Science, Computer Science (R0)
