
Feature Ranking for Multi-target Regression with Tree Ensemble Methods

  • Conference paper

Discovery Science (DS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10558)

Abstract

In this work, we address the task of feature ranking for multi-target regression (MTR). The task of MTR concerns problems where there are multiple continuous dependent variables and the goal is to learn a model for predicting all of the targets simultaneously. This task is receiving increasing attention from the research community. However, performing feature ranking in the context of MTR has not been studied. Here, we propose three feature ranking methods for MTR: Symbolic, Genie3, and Random Forest. These methods are then coupled with three types of ensemble methods: Bagging, Random Forest, and Extremely Randomized Trees. All of the ensemble methods use predictive clustering trees (PCTs) as base predictive models. PCTs are a generalization of decision trees capable of MTR. In total, we consider eight different ensemble-ranking pairs. We extensively evaluate these pairs on 26 benchmark MTR datasets. The results reveal that all of the methods produce relevant feature rankings and that the best-performing method is the Genie3 ranking used with Random Forests of PCTs.
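
The following sketch is illustrative only and is not the authors' implementation: the paper computes rankings from ensembles of predictive clustering trees (PCTs), whereas the sketch uses scikit-learn's multi-output RandomForestRegressor as a stand-in. Under that assumption, it mirrors the general idea of two of the three scores on a synthetic MTR problem: a Genie3-style score (total variance reduction credited to each feature over all splits, i.e. mean decrease in impurity) and a Random-Forest-style score (drop in predictive performance when a feature is permuted; computed here on the training data rather than on out-of-bag samples, as the paper does).

```python
# Illustrative sketch only -- not the authors' PCT-based implementation.
# A multi-output random forest stands in for an ensemble of PCTs.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic MTR data: 300 samples, 10 features, 3 continuous targets.
X, Y = make_regression(n_samples=300, n_features=10, n_targets=3,
                       n_informative=5, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, Y)  # multi-output: impurity reduction is averaged over the targets

# Genie3-style ranking: total variance reduction credited to each feature
# across all splits in the ensemble (mean decrease in impurity).
genie3_like = forest.feature_importances_

# Random-Forest-style ranking: drop in R^2 (averaged over targets) when a
# feature's values are randomly permuted.
perm = permutation_importance(forest, X, Y, n_repeats=10, random_state=0)
rf_like = perm.importances_mean

for name, scores in [("Genie3-like", genie3_like), ("RF permutation", rf_like)]:
    order = np.argsort(scores)[::-1]
    print(name, "ranking (best feature first):", order)
```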

Acknowledgements

We would like to acknowledge the support of the European Commission through the project MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data (Grant number ICT-2013-612944), and of the Slovenian Research Agency through a young researcher grant.

Author information

Corresponding author

Correspondence to Matej Petković.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Petković, M., Džeroski, S., Kocev, D. (2017). Feature Ranking for Multi-target Regression with Tree Ensemble Methods. In: Yamamoto, A., Kida, T., Uno, T., Kuboyama, T. (eds.) Discovery Science. DS 2017. Lecture Notes in Computer Science (LNAI), vol. 10558. Springer, Cham. https://doi.org/10.1007/978-3-319-67786-6_13

  • DOI: https://doi.org/10.1007/978-3-319-67786-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67785-9

  • Online ISBN: 978-3-319-67786-6

  • eBook Packages: Computer Science (R0)
