Accelerated gradient boosting
Gradient tree boosting is a prediction algorithm that sequentially produces a model in the form of a linear combination of decision trees by solving an infinite-dimensional optimization problem. We combine gradient boosting and Nesterov’s accelerated descent to design a new algorithm, which we call AGB (for Accelerated Gradient Boosting). Substantial numerical evidence on both synthetic and real-life data sets demonstrates the excellent performance of the method in a large variety of prediction problems. We show empirically that AGB is less sensitive to the shrinkage parameter and outputs predictors that are considerably sparser in the number of trees, while retaining the exceptional performance of gradient boosting.
Keywords: Gradient boosting · Nesterov’s acceleration · Trees
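The abstract describes AGB as gradient boosting in which each new tree is fitted to pseudo-residuals evaluated at a Nesterov "lookahead" combination of the two most recent models, rather than at the current model itself. The sketch below illustrates that idea for squared-error regression with shallow trees as base learners. It is a minimal illustration under stated assumptions, not the paper's reference implementation: the function names, the FISTA-style initialization of the momentum sequence, and the defaults n_rounds, shrinkage, and max_depth are all illustrative choices.

```python
# Minimal sketch of accelerated gradient boosting (AGB) for squared-error
# regression, assuming numpy arrays and scikit-learn regression trees.
# Names and defaults (agb_fit, n_rounds, shrinkage, max_depth, lam = 1.0)
# are illustrative assumptions, not the paper's notation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def agb_fit(X, y, n_rounds=100, shrinkage=0.1, max_depth=3):
    f0 = float(np.mean(y))
    f = np.full(len(y), f0)      # main sequence of fitted values F_t
    g = f.copy()                 # Nesterov "lookahead" sequence G_t
    lam = 1.0                    # FISTA-style momentum scalar
    trees, gammas = [], []
    for _ in range(n_rounds):
        # Fit a tree to the negative gradient of the squared-error loss,
        # evaluated at the lookahead sequence g rather than at f.
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, y - g)
        f_new = g + shrinkage * tree.predict(X)
        # Nesterov momentum update of the lookahead sequence.
        lam_new = (1.0 + np.sqrt(1.0 + 4.0 * lam ** 2)) / 2.0
        gamma = (lam - 1.0) / lam_new
        g = f_new + gamma * (f_new - f)
        f, lam = f_new, lam_new
        trees.append(tree)
        gammas.append(gamma)
    return f0, shrinkage, trees, gammas

def agb_predict(model, X):
    # Replay the same two-sequence recursion on new data.
    f0, shrinkage, trees, gammas = model
    f = np.full(X.shape[0], f0)
    g = f.copy()
    for tree, gamma in zip(trees, gammas):
        f_new = g + shrinkage * tree.predict(X)
        g = f_new + gamma * (f_new - f)
        f = f_new
    return f
```

A call such as agb_predict(agb_fit(X_train, y_train), X_test) replays the momentum recursion at prediction time, which is why the momentum weights are stored alongside the trees; truncating both lists at an earlier round gives the smaller ensembles that the abstract's sparsity claim refers to.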
We are grateful to two referees for their valuable comments and insightful suggestions, which led to a substantial improvement of the paper.