Abstract
We present a new regression algorithm called Groves of trees and show empirically that it is superior in performance to a number of other established regression methods. A Grove is an additive model usually containing a small number of large trees. Trees added to the Grove are trained on the residual error of other trees already in the Grove. We begin the training process with a single small tree in the Grove and gradually increase both the number of trees in the Grove and their size. This procedure ensures that the resulting model captures the additive structure of the response. A single Grove may still overfit to the training set, so we further decrease the variance of the final predictions with bagging. We show that in addition to exhibiting superior performance on a suite of regression test problems, bagged Groves of trees are very resistant to overfitting.
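The training procedure described in the abstract — each tree fit to the residual error of the other trees in the Grove, with bagging layered on top to reduce variance — can be illustrated with a minimal sketch. This is not the authors' implementation: the paper grows both tree size and tree count gradually, whereas this sketch fixes each "tree" to a single-split stump on 1-D inputs for self-containment, and helper names such as `fit_stump` and `train_grove` are illustrative.

```python
# Minimal sketch of the residual-fitting loop behind Additive Groves.
# Assumptions (not from the paper): trees are 1-split stumps on 1-D data,
# and the Grove is refit for a fixed number of cycles rather than grown
# gradually in size as the authors describe.
import numpy as np

def fit_stump(x, y):
    """Fit a one-split regression tree (stump) by brute-force threshold search."""
    best = (np.inf, None)
    for t in np.unique(x)[:-1]:          # every candidate split point
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (t, left.mean(), right.mean()))
    t, lv, rv = best[1]
    return lambda q: np.where(q <= t, lv, rv)

def train_grove(x, y, n_trees=3, n_cycles=5):
    """Additive model: each tree is repeatedly refit to the residual
    error of all the other trees currently in the Grove."""
    trees = [lambda q: np.zeros_like(q, dtype=float)] * n_trees
    for _ in range(n_cycles):
        for i in range(n_trees):
            residual = y - sum(t(x) for j, t in enumerate(trees) if j != i)
            trees[i] = fit_stump(x, residual)
    return lambda q: sum(t(q) for t in trees)

def bagged_groves(x, y, n_bags=10, seed=0):
    """Average Groves trained on bootstrap samples to reduce the
    variance of a single (possibly overfit) Grove."""
    rng = np.random.default_rng(seed)
    groves = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(x), len(x))
        groves.append(train_grove(x[idx], y[idx]))
    return lambda q: np.mean([g(q) for g in groves], axis=0)
```

Because each stump captures one additive step, cycling over the residuals lets a small number of trees jointly recover an additive target that no single stump could fit — the property the abstract credits to the layered training procedure.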
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sorokina, D., Caruana, R., Riedewald, M. (2007). Additive Groves of Regression Trees. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_31
DOI: https://doi.org/10.1007/978-3-540-74958-5_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer Science (R0)