On Cesáro Averages for Weighted Trees in the Random Forest
- 9 Downloads
The random forest is a popular and effective classification method. It uses a combination of bootstrap resampling and subspace sampling to construct an ensemble of decision trees that are then averaged for a final prediction. In this paper, we propose a potential improvement on the random forest that can be thought of as applying a weight to each tree before averaging. The new method is motivated by the potential instability of averaging predictions of trees that may be of highly variable quality, and because of this, we replace the regular average with a Cesáro average. We provide both a theoretical analysis that gives exact conditions under which the new approach outperforms the traditional random forest, and numerical analysis that shows the new approach is competitive when training a classification model on numerous realistic data sets.
KeywordsClassification Machine learning Random forest
- Apostol, T. (1976). Introduction to analytic number theory, Berlin Germany. New York: Springer.Google Scholar
- Bache, K. , & Lichman, M. UCI machine learning repository. http://archive.ics.uci.edu/ml.
- Daho, M.E.H., Settouti, N., Lazouni, M.E., Chikh, M.E.A. (2014). Weighted vote for trees aggregation in random forest. In Intl Conference on Multimedia Computing Systems (ICMCS) (pp. 438–443).Google Scholar
- Hendricks, P. (2015). titanic: Titanic passenger survival data set. R package version 0.1.0. https://CRAN.R-project.org/package=titanic.
- Li, H.B., Wang, W., Ding, H.W., Dong, J. (2010). Trees weighting random forest method for classifying high-dimensional noisy data. In Proc. IEEE 7th Int. Conf. e-Business Eng. (ICEBE) (pp. 160–163).Google Scholar
- Weisstein, E.W. (2004). Harmonic series. http://mathworld.wolfram.com/HarmonicSeries.html.