
Feature Ranking for Multi-target Regression with Tree Ensemble Methods

  • Conference paper

Discovery Science (DS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10558)

Abstract

In this work, we address the task of feature ranking for multi-target regression (MTR). The task of MTR concerns problems where there are multiple continuous dependent variables and the goal is to learn a model for predicting all of the targets simultaneously. This task is receiving increasing attention from the research community. However, performing feature ranking in the context of MTR has not been studied. Here, we propose three feature ranking methods for MTR: Symbolic, Genie3, and Random Forest. These methods are then coupled with three types of ensemble methods: Bagging, Random Forest, and Extremely Randomized Trees. All of the ensemble methods use predictive clustering trees (PCTs) as base predictive models. PCTs are a generalization of decision trees capable of MTR. In total, we consider eight different ensemble-ranking pairs. We extensively evaluate these pairs on 26 benchmark MTR datasets. The results reveal that all of the methods produce relevant feature rankings and that the best-performing method is the Genie3 ranking used with Random Forests of PCTs.
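
The following sketch is illustrative only and is not the authors' implementation: the paper computes rankings from ensembles of predictive clustering trees (PCTs), whereas the sketch uses scikit-learn's multi-output RandomForestRegressor as a stand-in. Under that assumption, it mirrors the general idea of two of the three scores on a synthetic MTR problem: a Genie3-style score (total variance reduction credited to each feature over all splits, i.e. mean decrease in impurity) and a Random-Forest-style score (drop in predictive performance when a feature is permuted; computed here on the training data rather than on out-of-bag samples, as the paper does).

```python
# Illustrative sketch only -- not the authors' PCT-based implementation.
# A multi-output random forest stands in for an ensemble of PCTs.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic MTR data: 300 samples, 10 features, 3 continuous targets.
X, Y = make_regression(n_samples=300, n_features=10, n_targets=3,
                       n_informative=5, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, Y)  # multi-output: impurity reduction is averaged over the targets

# Genie3-style ranking: total variance reduction credited to each feature
# across all splits in the ensemble (mean decrease in impurity).
genie3_like = forest.feature_importances_

# Random-Forest-style ranking: drop in R^2 (averaged over targets) when a
# feature's values are randomly permuted.
perm = permutation_importance(forest, X, Y, n_repeats=10, random_state=0)
rf_like = perm.importances_mean

for name, scores in [("Genie3-like", genie3_like), ("RF permutation", rf_like)]:
    order = np.argsort(scores)[::-1]
    print(name, "ranking (best feature first):", order)
```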

Acknowledgements

We would like to acknowledge the support of the European Commission through the project MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data (Grant number ICT-2013-612944), and of the Slovenian Research Agency through a young researcher grant.

Author information

Corresponding author

Correspondence to Matej Petković.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Petković, M., Džeroski, S., Kocev, D. (2017). Feature Ranking for Multi-target Regression with Tree Ensemble Methods. In: Yamamoto, A., Kida, T., Uno, T., Kuboyama, T. (eds.) Discovery Science. DS 2017. Lecture Notes in Computer Science (LNAI), vol. 10558. Springer, Cham. https://doi.org/10.1007/978-3-319-67786-6_13

  • DOI: https://doi.org/10.1007/978-3-319-67786-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67785-9

  • Online ISBN: 978-3-319-67786-6

  • eBook Packages: Computer Science (R0)
