
Semi-supervised Learning for Multi-target Regression

  • Conference paper
  • In: New Frontiers in Mining Complex Patterns (NFMCP 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8983)


Abstract

The most common machine learning approach is supervised learning, which uses labeled data to build predictive models. However, in many practical problems, the availability of annotated data is limited because the annotation procedure is expensive, tedious and time-consuming. At the same time, unlabeled data is often easily available in large amounts. This is especially pronounced for predictive modelling problems with a structured output space and complex labels.

Semi-supervised learning (SSL) aims to use unlabeled data as an additional source of information in order to build better predictive models than can be learned from labeled data alone. The majority of work in SSL considers the simple tasks of classification and regression where the output space consists of a single variable. Much less work has been done on SSL for structured output prediction.

In this study, we address the task of multi-target regression (MTR), a type of structured output prediction, where the output space consists of multiple numerical values. Our main objective is to investigate whether we can improve over supervised methods for MTR by using unlabeled data. We use ensembles of predictive clustering trees in a self-training fashion: the most reliable predictions (passing a reliability threshold) on unlabeled data are iteratively used to re-train the model. We use the variance of the ensemble models’ predictions as an indicator of the reliability of predictions. Our results provide a proof-of-concept: The use of unlabeled data improves the predictive performance of ensembles for multi-target regression, but further efforts are needed to automatically select the optimal threshold for the reliability of predictions.
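The self-training procedure described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it substitutes scikit-learn's `RandomForestRegressor` for the paper's ensembles of predictive clustering trees, and the reliability threshold, data, and iteration count are all hypothetical choices made for the example.

```python
# Self-training sketch for multi-target regression (MTR).
# Assumption: a random forest stands in for the paper's ensembles of
# predictive clustering trees; the threshold value is hand-picked.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)

# Synthetic MTR data: two numeric targets derived from five features.
X = rng.rand(200, 5)
Y = np.column_stack([X[:, 0] + X[:, 1],
                     X[:, 2] - X[:, 3]]) + 0.01 * rng.randn(200, 2)

X_lab, Y_lab = X[:30], Y[:30]   # small labeled set
X_unl = X[30:]                  # unlabeled pool (targets hidden)

def ensemble_variance(model, X):
    """Reliability indicator: variance of per-tree predictions,
    averaged over the targets (lower variance = more reliable)."""
    per_tree = np.stack([t.predict(X) for t in model.estimators_])
    return per_tree.var(axis=0).mean(axis=1)

# Reliability threshold; the paper notes that selecting this value
# automatically remains an open problem, so it is fixed here by hand.
threshold = 0.05

for _ in range(3):  # a few self-training iterations
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_lab, Y_lab)
    if len(X_unl) == 0:
        break
    var = ensemble_variance(model, X_unl)
    reliable = var < threshold
    if not reliable.any():
        break
    # Pseudo-label the reliable examples and move them to the labeled set.
    X_lab = np.vstack([X_lab, X_unl[reliable]])
    Y_lab = np.vstack([Y_lab, model.predict(X_unl[reliable])])
    X_unl = X_unl[~reliable]
```

Each iteration re-trains the ensemble on the enlarged labeled set, so confidently pseudo-labeled examples influence later predictions, which is the core mechanism the abstract describes.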



Acknowledgments

We acknowledge the financial support of the Slovenian Research Agency, via the grant P2-0103 and a young researcher grant to the first author, and the European Commission, via the grants ICT-2013-612944 MAESTRA and ICT-2013-604102 HBP.

Author information

Corresponding author: Michelangelo Ceci.


Copyright information

© 2015 Springer International Publishing Switzerland

Cite this paper

Levatić, J., Ceci, M., Kocev, D., Džeroski, S. (2015). Semi-supervised Learning for Multi-target Regression. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2014. Lecture Notes in Computer Science (LNAI), vol. 8983. Springer, Cham. https://doi.org/10.1007/978-3-319-17876-9_1


  • DOI: https://doi.org/10.1007/978-3-319-17876-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17875-2

  • Online ISBN: 978-3-319-17876-9

  • eBook Packages: Computer Science, Computer Science (R0)
