The random forest is a popular and effective classification method. It combines bootstrap resampling with subspace sampling to construct an ensemble of decision trees whose predictions are averaged into a final prediction. In this paper, we propose a potential improvement on the random forest that can be viewed as applying a weight to each tree before averaging. The new method is motivated by the potential instability of averaging the predictions of trees of highly variable quality; for this reason, we replace the ordinary average with a Cesàro average. We provide both a theoretical analysis that gives exact conditions under which the new approach outperforms the traditional random forest, and a numerical analysis showing that the new approach is competitive when training classification models on numerous realistic data sets.
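To make the averaging scheme concrete, the following is a minimal sketch, not the authors' implementation: it assumes trees are ordered from highest to lowest estimated quality (e.g. by out-of-bag accuracy) and that each tree outputs a class-probability vector; the function names and interface are illustrative. The Cesàro mean of the sequence of tree predictions is the average of its running partial means, which is equivalent to a weighted average that places harmonically decaying weight on lower-ranked trees.

```python
import numpy as np

def cesaro_average(tree_preds):
    """Cesàro mean of per-tree predictions.

    tree_preds: array-like of shape (n_trees, n_classes), with rows
    assumed sorted from the best tree to the worst. The Cesàro mean
    averages the running partial means of the sequence, so earlier
    (better) trees receive larger effective weight.
    """
    preds = np.asarray(tree_preds, dtype=float)
    n = preds.shape[0]
    # Row k is the ordinary average of the first k+1 tree predictions.
    partial_means = np.cumsum(preds, axis=0) / np.arange(1, n + 1)[:, None]
    # The Cesàro mean is the average of those running means.
    return partial_means.mean(axis=0)

def cesaro_weights(n):
    """Equivalent explicit per-tree weights.

    Tree i (0-indexed) gets w_i = (1/n) * sum_{k=i+1}^{n} 1/k, a
    harmonic-tail weight; the weights sum to 1.
    """
    inv = 1.0 / np.arange(1, n + 1)
    return np.cumsum(inv[::-1])[::-1] / n
```

For two trees the implied weights are (0.75, 0.25), so the higher-ranked tree contributes three times as much to the final prediction as the lower-ranked one.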
Cite this article
Pham, H., Olafsson, S. On Cesáro Averages for Weighted Trees in the Random Forest. J Classif 37, 223–236 (2020). https://doi.org/10.1007/s00357-019-09322-8
Keywords
- Machine learning
- Random forest