Abstract
Conventional decision trees have a number of favorable properties, including interpretability, a small computational footprint, and the ability to learn from little training data. However, they lack a key quality that has helped fuel the deep learning revolution: that of being end-to-end trainable. Kontschieder et al. (2015) have addressed this deficit, but at the cost of losing a main attractive trait of decision trees: the fact that each sample is routed along only a small subset of tree nodes. We propose a model and an Expectation-Maximization training scheme for decision trees that are fully probabilistic at train time but, after an annealing process, become deterministic at test time. We analyze the learned oblique split parameters on image datasets and show that neural networks can be trained at each split. In summary, we present an end-to-end learning scheme for deterministic decision trees and report results on par with or superior to published standard oblique decision tree algorithms.
T. M. Hehn (corresponding author) is now at TU Delft.
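To make the core idea of the abstract concrete, the following is a minimal sketch, not the authors' published code, of a probabilistic oblique split that is annealed towards a deterministic one. It assumes PyTorch (cited in the references) and a hypothetical `steepness` parameter standing in for the paper's annealing schedule: a sample x is routed to the left child with probability sigmoid(steepness · (wᵀx + b)), which approaches a hard 0/1 decision as the steepness grows.

```python
import torch

class ObliqueSplit(torch.nn.Module):
    """Soft oblique split node (illustrative sketch, not the paper's code)."""

    def __init__(self, n_features: int):
        super().__init__()
        # Oblique split: a linear combination w^T x + b over all input features.
        self.linear = torch.nn.Linear(n_features, 1)

    def forward(self, x: torch.Tensor, steepness: float) -> torch.Tensor:
        # Probability of routing each sample to the left child.
        return torch.sigmoid(steepness * self.linear(x)).squeeze(-1)

# Toy usage: increasing the (hypothetical) steepness anneals the soft split
# towards a hard, deterministic routing, mirroring the paper's test-time mode.
split = ObliqueSplit(n_features=4)
x = torch.randn(8, 4)
for steepness in (1.0, 10.0, 100.0):
    p_left = split(x, steepness)
    print(f"steepness={steepness}: {p_left.detach().round().tolist()}")
```

In the deterministic limit each sample follows a single root-to-leaf path, which is the property the abstract highlights as lost in fully soft neural decision forests.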
References
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Breiman, L., Friedman, J., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman & Hall/CRC, London (1984)
Cardona, A., et al.: An integrated micro- and macroarchitectural analysis of the Drosophila brain by computer-assisted serial section electron microscopy. PLOS Biol. 8(10), 1–17 (2010). https://doi.org/10.1371/journal.pbio.1000502
Criminisi, A., Shotton, J.: Decision Forests for Computer Vision and Medical Image Analysis. Springer, London (2013). https://doi.org/10.1007/978-1-4471-4929-3
Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 89–121 (1996)
Fan, R.E., Lin, C.J.: LIBSVM data: classification, regression and multi-label (2011). http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181 (2014)
Gall, J., Lempitsky, V.: Class-specific Hough forests for object detection. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1022–1029, June 2009. https://doi.org/10.1109/CVPR.2009.5206740
Ioannou, Y., et al.: Decision forests, convolutional networks and the models in-between. arXiv:1603.01250 (2016)
Jordan, M.I.: A statistical approach to decision tree modeling. In: Proceedings of the Seventh Annual Conference on Computational Learning Theory, COLT 1994, New York, NY, USA, pp. 13–20 (1994)
Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6(2), 181–214 (1994). https://doi.org/10.1162/neco.1994.6.2.181
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Kontschieder, P., Fiterau, M., Criminisi, A., Rota Bulò, S.: Deep neural decision forests. In: ICCV (2015)
Kontschieder, P., Kohli, P., Shotton, J., Criminisi, A.: GeoF: geodesic forests for learning coupled predictors. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Lepetit, V., Lagger, P., Fua, P.: Randomized trees for real-time keypoint recognition. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 775–781, June 2005. https://doi.org/10.1109/CVPR.2005.288
McGill, M., Perona, P.: Deciding how to decide: dynamic routing in artificial neural networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, PMLR, International Convention Centre, Sydney, Australia, 06–11 August 2017, vol. 70, pp. 2363–2372. http://proceedings.mlr.press/v70/mcgill17a.html
Menze, B.H., Kelm, B.M., Splitthoff, D.N., Koethe, U., Hamprecht, F.A.: On oblique random forests. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011. LNCS (LNAI), vol. 6912, pp. 453–469. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23783-6_29
Montillo, A., et al.: Entanglement and differentiable information gain maximization. In: Criminisi, A., Shotton, J. (eds.) Decision Forests for Computer Vision and Medical Image Analysis. ACVPR, pp. 273–293. Springer, London (2013). https://doi.org/10.1007/978-1-4471-4929-3_19
Murthy, K.V.S.: On growing better decision trees from data. Ph.D. thesis, The Johns Hopkins University (1996)
Norouzi, M., Collins, M.D., Fleet, D.J., Kohli, P.: CO2 forest: improved random forest by continuous optimization of oblique splits. arXiv:1506.06155 (2015)
Norouzi, M., Collins, M.D., Johnson, M., Fleet, D.J., Kohli, P.: Efficient non-greedy optimization of decision trees. In: NIPS, December 2015
PyTorch: http://www.pytorch.org/
Quinlan, J.R.: Induction of decision trees. In: Shavlik, J.W., Dietterich, T.G. (eds.) Readings in Machine Learning. Morgan Kaufmann, Los Altos (1990). Originally published in Mach. Learn. 1, 81–106 (1986)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
Richmond, D., Kainmueller, D., Yang, M., Myers, E., Rother, C.: Mapping auto-context decision forests to deep convnets for semantic segmentation. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference (BMVC), pp. 144.1–144.12. BMVA Press, September 2016. https://doi.org/10.5244/C.30.144
Rose, K., Gurewitz, E., Fox, G.C.: Statistical mechanics and phase transitions in clustering. Phys. Rev. Lett. 65, 945–948 (1990). https://doi.org/10.1103/PhysRevLett.65.945
Rota Bulò, S., Kontschieder, P.: Neural decision forests for semantic image labelling. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014
Sethi, I.K.: Entropy nets: from decision trees to neural networks. Proc. IEEE 78(10), 1605–1613 (1990)
Suárez, A., Lutsko, J.F.: Globally optimal fuzzy decision trees for classification and regression. IEEE Trans. Pattern Anal. Mach. Intell. 21(12), 1297–1311 (1999)
Welbl, J.: Casting random forests as artificial neural networks (and profiting from it). In: Jiang, X., Hornegger, J., Koch, R. (eds.) GCPR 2014. LNCS, vol. 8753, pp. 765–771. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11752-2_66
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747 (2017)
Acknowledgments
The authors gratefully acknowledge financial support by DFG grant HA 4364/10-1.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hehn, T.M., Hamprecht, F.A. (2019). End-to-End Learning of Deterministic Decision Trees. In: Brox, T., Bruhn, A., Fritz, M. (eds.) Pattern Recognition. GCPR 2018. Lecture Notes in Computer Science, vol. 11269. Springer, Cham. https://doi.org/10.1007/978-3-030-12939-2_42
DOI: https://doi.org/10.1007/978-3-030-12939-2_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12938-5
Online ISBN: 978-3-030-12939-2
eBook Packages: Computer Science (R0)