Abstract
In this work, we address the task of feature ranking for multi-target regression (MTR). MTR concerns problems with multiple continuous dependent variables, where the goal is to learn a model that predicts all of the targets simultaneously. This task has been receiving increasing attention from the research community; however, feature ranking in the context of MTR has not yet been studied. Here, we propose three feature ranking methods for MTR: Symbolic, Genie3, and Random Forest. These methods are coupled with three ensemble methods: Bagging, Random Forest, and Extremely Randomized Trees. All of the ensemble methods use predictive clustering trees (PCTs), a generalization of decision trees capable of MTR, as base predictive models. In total, we consider eight ensemble-ranking pairs, which we extensively evaluate on 26 benchmark MTR datasets. The results show that all of the methods produce relevant feature rankings and that the best performing method is the Genie3 ranking coupled with Random Forests of PCTs.
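The general idea behind the Random Forest and Genie3 rankings can be illustrated with a short sketch. This is not the authors' implementation (which builds multi-target predictive clustering trees): it substitutes scikit-learn's multi-output `RandomForestRegressor`, whose impurity-based `feature_importances_` play the role of the ensemble-derived feature scores described above; the data and variable names are invented for illustration.

```python
# Sketch of a Random-Forest-style feature ranking for multi-target
# regression: fit one forest on all targets jointly, then rank features
# by their impurity-based importance averaged over the trees.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))  # 6 candidate features, synthetic data
# Two continuous targets; only features 0 and 1 are truly relevant.
Y = np.column_stack([
    3.0 * X[:, 0] + rng.normal(scale=0.1, size=500),
    2.0 * X[:, 1] - X[:, 0] + rng.normal(scale=0.1, size=500),
])

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, Y)  # a single model predicts both targets simultaneously

# Higher importance = earlier in the ranking.
ranking = np.argsort(forest.feature_importances_)[::-1]
print(ranking)  # the relevant features 0 and 1 should come first
```

A multi-target ranking like this differs from ranking per target and averaging: the forest's splits are chosen to reduce error on all targets at once, so a feature that matters for several targets is rewarded in a single model.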
Acknowledgements
We would like to acknowledge the support of the European Commission through the project MAESTRA: Learning from Massive, Incompletely Annotated, and Structured Data (grant number ICT-2013-612944), and of the Slovenian Research Agency through a young researcher grant.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Petković, M., Džeroski, S., Kocev, D. (2017). Feature Ranking for Multi-target Regression with Tree Ensemble Methods. In: Yamamoto, A., Kida, T., Uno, T., Kuboyama, T. (eds) Discovery Science. DS 2017. Lecture Notes in Computer Science(), vol 10558. Springer, Cham. https://doi.org/10.1007/978-3-319-67786-6_13
Print ISBN: 978-3-319-67785-9
Online ISBN: 978-3-319-67786-6