Two-level quantile regression forests for bias correction in range prediction
Quantile regression forests (QRF), a tree-based ensemble method for estimating conditional quantiles, has been shown to perform well in terms of prediction accuracy, especially for range prediction. However, the model can be biased, and its performance can suffer on high dimensional data (thousands of features). In this paper, we propose a new bias correction method, called bcQRF, that applies bias correction to QRF for range prediction. In bcQRF, a new feature weighting subspace sampling method is used to build the first-level QRF model. The residuals of the first-level QRF model are then used as the response feature to train the second-level QRF model for bias correction. The two models together are used to compute bias-corrected predictions. Extensive experiments on both synthetic and real-world data sets demonstrate that the bcQRF method significantly reduces prediction errors and outperforms most existing regression random forests. The new method performs especially well on high dimensional data.
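The two-level idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single numeric feature, uses bagged depth-1 trees as a stand-in for a full QRF, pools leaf responses without per-tree weighting, computes residuals in-sample against the first-level median, and omits the feature weighting subspace sampling step entirely. Adding the second-level median residual to the first-level quantile is also an assumption about how the two levels are combined. All function names (`fit_stump`, `fit_forest`, `predict_quantile`, `fit_two_level`, `predict_corrected`) are hypothetical.

```python
import random
import statistics


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Depth-1 regression tree on one feature; each leaf keeps its raw responses."""
    best = None
    for t in sorted(set(xs))[:-1]:  # each threshold leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = _sse(left) + _sse(right)
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:  # only one distinct x in the sample: a single leaf
        return float("inf"), list(ys), []
    return best[1], best[2], best[3]


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Bagged stumps (simplified stand-in for a quantile regression forest)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_quantile(trees, x, q=0.5):
    """Empirical q-quantile of the responses pooled from the leaves x falls into."""
    pooled = []
    for threshold, left, right in trees:
        pooled.extend(left if x <= threshold else right)
    pooled.sort()
    k = min(len(pooled) - 1, int(q * len(pooled)))
    return pooled[k]


def fit_two_level(xs, ys, n_trees=50, seed=0):
    """Level 1 models y; level 2 models the residuals of level 1 (assumed in-sample)."""
    level1 = fit_forest(xs, ys, n_trees, seed)
    residuals = [y - predict_quantile(level1, x) for x, y in zip(xs, ys)]
    level2 = fit_forest(xs, residuals, n_trees, seed + 1)
    return level1, level2


def predict_corrected(model, x, q=0.5):
    """First-level quantile shifted by the second-level median residual (assumed rule)."""
    level1, level2 = model
    return predict_quantile(level1, x, q) + predict_quantile(level2, x, 0.5)
```

With a fitted model, a prediction range for a point `x` would be obtained as `(predict_corrected(model, x, 0.05), predict_corrected(model, x, 0.95))`; the quantile levels 0.05 and 0.95 are chosen here only for illustration.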
Keywords: Bias correction, Random forests, Quantile regression forests, High dimensional data, Data mining
This work is supported by the Shenzhen New Industry Development Fund under Grant No. JC201005270342A and the project “Some Advanced Statistical Learning Techniques for Computer Vision” funded by the National Foundation of Science and Technology Development, Vietnam under grant number 102.01-2011.17.