Abstract
Co-training is a well-known semi-supervised learning algorithm in which two classifiers are trained on two different views (feature sets): the initially small training set is iteratively updated with unlabelled samples classified with high confidence by one of the two classifiers. In this paper we address an issue that has so far been overlooked in the literature, namely, how co-training performance is affected by the size of the initial training set as it decreases to the minimum value below which a given learning algorithm can no longer be applied. We address this issue empirically, testing the algorithm on 24 real datasets artificially split into two views, using two different base classifiers. Our results show that a very small training set, even one made up of only one labelled sample per class, does not adversely affect co-training performance.
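The co-training loop summarised in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid base classifier, the distance-margin confidence score, and the names `centroid_fit`, `centroid_predict`, `co_train`, `rounds` and `per_round` are all hypothetical choices made here. The abstract does not name the two base classifiers used in the experiments. Nearest-centroid was picked only because it can still be fitted with a single labelled sample per class.

```python
import math

def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: return {label: mean feature vector}."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def centroid_predict(model, x):
    """Return (label, confidence). Confidence is an assumed score: the gap
    between the distances to the two nearest class centroids."""
    dists = sorted((math.dist(x, c), label) for label, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def co_train(view1, view2, seed_labels, rounds=10, per_round=1):
    """Co-training sketch.

    view1, view2 : lists of feature vectors, one entry per sample
    seed_labels  : {sample index: label} for the initial labelled set
    Returns the two final models and all labels (seed plus pseudo-labels).
    """
    known = dict(seed_labels)
    unlabelled = set(range(len(view1))) - set(known)
    views = (view1, view2)

    def fit_all():
        idx = sorted(known)
        y = [known[i] for i in idx]
        return [centroid_fit([v[i] for i in idx], y) for v in views]

    for _ in range(rounds):
        if not unlabelled:
            break
        models = fit_all()
        # Each classifier labels the samples it is most confident about,
        # and those samples join the shared training set.
        for model, view in zip(models, views):
            scored = sorted(
                ((centroid_predict(model, view[i])[1], i) for i in unlabelled),
                reverse=True,
            )
            for _, i in scored[:per_round]:
                known[i] = centroid_predict(model, view[i])[0]
                unlabelled.discard(i)
    return fit_all(), known
```

Calling `co_train(view1, view2, {0: 0, 3: 1})` starts from one labelled sample per class, which is the extreme case studied in the paper. Note that this sketch adds the `per_round` most confident samples per classifier regardless of class, which is a simplification of variants that add a fixed number of samples per class.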
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Didaci, L., Fumera, G., Roli, F. (2012). Analysis of Co-training Algorithm with Very Small Training Sets. In: Gimel’farb, G., et al. Structural, Syntactic, and Statistical Pattern Recognition. SSPR /SPR 2012. Lecture Notes in Computer Science, vol 7626. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34166-3_79
DOI: https://doi.org/10.1007/978-3-642-34166-3_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34165-6
Online ISBN: 978-3-642-34166-3
eBook Packages: Computer Science, Computer Science (R0)