Machine Learning, Volume 107, Issue 11, pp 1673–1709

Ensembles for multi-target regression with random output selections

  • Martin Breskvar
  • Dragi Kocev
  • Sašo Džeroski
Part of the following topical collections:
  1. Special Issue of the Discovery Science 2016


We address the task of multi-target regression, where we generate global models that simultaneously predict multiple continuous variables. We use ensembles of generalized decision trees, called predictive clustering trees (PCTs), in particular bagging and random forests (RF) of PCTs and extremely randomized PCTs (extra PCTs). We add another dimension of randomization to these ensemble methods by learning individual base models that consider random subsets of the target variables, while leaving the input space randomizations (in RF PCTs and extra PCTs) intact. Moreover, we propose a new ensemble prediction aggregation function, where the final ensemble prediction for a given target is influenced only by those base models that considered it during learning. We conducted an extensive experimental evaluation on a range of benchmark datasets, comparing the extended ensemble methods to the original ensemble methods, individual multi-target regression trees, and ensembles of single-target regression trees in terms of predictive performance, running times, and model sizes. The results show that the proposed ensemble extension can yield better predictive performance, reduce learning time, or both, without a considerable change in model size. The newly proposed aggregation function gives the best results when used with extremely randomized PCTs. We also include a comparison with three competing methods, namely random linear target combinations and two variants of random projections.
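The aggregation idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `draw_target_subsets`, `aggregate_subset_only`, and `aggregate_all` are hypothetical, the base-model predictions are taken as given rather than produced by PCTs, and the sketch assumes that every base model outputs a value for every target even though it considered only a random subset of targets during learning.

```python
import random


def draw_target_subsets(n_targets, n_models, subset_size, seed=0):
    """Draw one random subset of target indices per base model (hypothetical helper)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_targets), subset_size))
            for _ in range(n_models)]


def aggregate_all(predictions, n_targets):
    """Baseline: average every target over all base models."""
    n_models = len(predictions)
    return [sum(pred[t] for pred in predictions) / n_models
            for t in range(n_targets)]


def aggregate_subset_only(predictions, subsets, n_targets):
    """Average each target only over the models that considered it.

    predictions[m] is model m's prediction vector (length n_targets);
    subsets[m] lists the target indices model m considered during learning.
    """
    sums = [0.0] * n_targets
    counts = [0] * n_targets
    for pred, subset in zip(predictions, subsets):
        for t in subset:
            sums[t] += pred[t]
            counts[t] += 1
    # Assumption of this sketch: a target that no model considered
    # has no defined prediction, so it is reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy example: three base models, three targets, two targets per model.
predictions = [
    [1.0, 10.0, 100.0],
    [3.0, 20.0, 200.0],
    [5.0, 30.0, 300.0],
]
subsets = [[0, 1], [1, 2], [0, 2]]

subset_avg = aggregate_subset_only(predictions, subsets, n_targets=3)
full_avg = aggregate_all(predictions, n_targets=3)
```

In this toy example, target 0 is averaged over the first and third model only, so the two aggregation schemes can disagree on a target whenever a model that did not consider it predicts a different value.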


Predictive clustering trees; Multi-target regression; Output space decomposition; Structured outputs; Ensemble methods



We acknowledge the financial support of the Slovenian Research Agency via the grant P2-0103 and a young researcher grant to MB, as well as of the European Commission, through the grants MAESTRA (Learning from Massive, Incompletely annotated, and Structured Data) and HBP (The Human Brain Project), SGA1 and SGA2. SD also acknowledges support from the Slovenian Research Agency (via grants J4-7362, L2-7509, and N2-0056), the European Commission (project LANDMARK), and ARVALIS (project BIODIV). The computational experiments presented here were executed on a computing infrastructure from the Slovenian Grid (SLING) initiative.



Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
  2. Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
