Constructive Aggregation and Its Application to Forecasting with Dynamic Ensembles
Abstract
While the predictive advantage of ensemble methods is nowadays widely accepted, the most appropriate way of estimating the weights of the individual models remains an open research question. Meanwhile, several studies report that combining different ensemble approaches leads to performance improvements, owing to a better trade-off between the diversity and the error of the individual models in the ensemble. We contribute to this line of research by proposing an aggregation framework for a set of independently created forecasting models, i.e. heterogeneous ensembles. The general idea is, instead of aggregating these models directly, to first rearrange them into different subsets, creating a new set of combined models that is then aggregated into a final decision. We present this idea as constructive aggregation and apply it to time series forecasting problems. Results from empirical experiments show that applying constructive aggregation to state-of-the-art dynamic aggregation methods provides a consistent advantage. Constructive aggregation is publicly available in a software package. Data related to this paper are available at: https://github.com/vcerqueira/timeseriesdata. Code related to this paper is available at: https://github.com/vcerqueira/tsensembler.
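As a rough illustration of the idea described above (a minimal sketch, not the tsensembler implementation), the following Python code rearranges the predictions of a set of base models into subsets, turns each subset into a combined model via a simple average, and then aggregates the combined models with weights inversely proportional to their recent error. The subset-generation and weighting schemes here are simplifying assumptions standing in for the dynamic aggregation methods studied in the paper.

```python
# Sketch of constructive aggregation (illustrative only; enumerating all
# subsets and inverse-error weighting are simplifying assumptions, not the
# authors' exact method).
import itertools
import numpy as np

def constructive_aggregation(preds, y_recent, preds_recent, min_size=2):
    """preds: (M,) current predictions of M base models.
    y_recent: (T,) recently observed target values.
    preds_recent: (T, M) the base models' predictions for those T points.
    Returns a single combined forecast."""
    M = preds.shape[0]
    # 1) Rearrange the M base models into subsets (all subsets of size >= min_size).
    subsets = [s for r in range(min_size, M + 1)
               for s in itertools.combinations(range(M), r)]
    # 2) Each subset becomes a new "combined model": here, a plain average.
    combined_now = np.array([preds[list(s)].mean() for s in subsets])
    combined_recent = np.stack(
        [preds_recent[:, list(s)].mean(axis=1) for s in subsets], axis=1)
    # 3) Aggregate the combined models into a final decision, weighting each
    #    one by the inverse of its recent mean squared error.
    err = ((combined_recent - y_recent[:, None]) ** 2).mean(axis=0)
    w = 1.0 / (err + 1e-8)
    w /= w.sum()
    return float(combined_now @ w)
```

Note that the number of subsets grows exponentially with M, so any practical implementation would restrict or sample the subsets rather than enumerate them all, and the inverse-error rule above is only one of many possible dynamic weighting schemes.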
Keywords
Ensemble learning, Forecasting, Constructive induction, Regression, Dynamic expert aggregation
Acknowledgements
This work is financed by Project “Coral - Sustainable Ocean Exploitation: Tools and Sensors/NORTE-01-0145-FEDER-000036”, which is financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF).