
Semi-supervised Learning for Multi-target Regression

  • Conference paper
  • In: New Frontiers in Mining Complex Patterns (NFMCP 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8983)


Abstract

The most common machine learning approach is supervised learning, which uses labeled data to build predictive models. However, in many practical problems, the availability of annotated data is limited because the annotation procedure is expensive, tedious and time-consuming. At the same time, unlabeled data is often easily available in large amounts. This is especially pronounced for predictive modelling problems with a structured output space and complex labels.

Semi-supervised learning (SSL) aims to use unlabeled data as an additional source of information in order to build better predictive models than can be learned from labeled data alone. The majority of work in SSL considers the simple tasks of classification and regression where the output space consists of a single variable. Much less work has been done on SSL for structured output prediction.

In this study, we address the task of multi-target regression (MTR), a type of structured output prediction, where the output space consists of multiple numerical values. Our main objective is to investigate whether we can improve over supervised methods for MTR by using unlabeled data. We use ensembles of predictive clustering trees in a self-training fashion: the most reliable predictions (passing a reliability threshold) on unlabeled data are iteratively used to re-train the model. We use the variance of the ensemble models’ predictions as an indicator of the reliability of predictions. Our results provide a proof-of-concept: The use of unlabeled data improves the predictive performance of ensembles for multi-target regression, but further efforts are needed to automatically select the optimal threshold for the reliability of predictions.
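The self-training procedure described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it substitutes scikit-learn's `RandomForestRegressor` for the paper's ensembles of predictive clustering trees, and the reliability threshold, data, and iteration count are all hypothetical choices made for the example.

```python
# Self-training sketch for multi-target regression (MTR).
# Assumption: a random forest stands in for the paper's ensembles of
# predictive clustering trees; the threshold value is hand-picked.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)

# Synthetic MTR data: two numeric targets derived from five features.
X = rng.rand(200, 5)
Y = np.column_stack([X[:, 0] + X[:, 1],
                     X[:, 2] - X[:, 3]]) + 0.01 * rng.randn(200, 2)

X_lab, Y_lab = X[:30], Y[:30]   # small labeled set
X_unl = X[30:]                  # unlabeled pool (targets hidden)

def ensemble_variance(model, X):
    """Reliability indicator: variance of per-tree predictions,
    averaged over the targets (lower variance = more reliable)."""
    per_tree = np.stack([t.predict(X) for t in model.estimators_])
    return per_tree.var(axis=0).mean(axis=1)

# Reliability threshold; the paper notes that selecting this value
# automatically remains an open problem, so it is fixed here by hand.
threshold = 0.05

for _ in range(3):  # a few self-training iterations
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_lab, Y_lab)
    if len(X_unl) == 0:
        break
    var = ensemble_variance(model, X_unl)
    reliable = var < threshold
    if not reliable.any():
        break
    # Pseudo-label the reliable examples and move them to the labeled set.
    X_lab = np.vstack([X_lab, X_unl[reliable]])
    Y_lab = np.vstack([Y_lab, model.predict(X_unl[reliable])])
    X_unl = X_unl[~reliable]
```

Each iteration re-trains the ensemble on the enlarged labeled set, so confidently pseudo-labeled examples influence later predictions, which is the core mechanism the abstract describes.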



Acknowledgments

We acknowledge the financial support of the Slovenian Research Agency, via the grant P2-0103 and a young researcher grant to the first author, and the European Commission, via the grants ICT-2013-612944 MAESTRA and ICT-2013-604102 HBP.

Author information

Corresponding author: Michelangelo Ceci.


Copyright information

© 2015 Springer International Publishing Switzerland

Cite this paper

Levatić, J., Ceci, M., Kocev, D., Džeroski, S. (2015). Semi-supervised Learning for Multi-target Regression. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2014. Lecture Notes in Computer Science (LNAI), vol. 8983. Springer, Cham. https://doi.org/10.1007/978-3-319-17876-9_1


  • DOI: https://doi.org/10.1007/978-3-319-17876-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17875-2

  • Online ISBN: 978-3-319-17876-9

  • eBook Packages: Computer Science, Computer Science (R0)
