Combining Committee-Based Semi-Supervised Learning and Active Learning

  • Regular Paper
  • Published in the Journal of Computer Science and Technology

Abstract

Many data mining applications have large amounts of data, but labeling that data is usually difficult, expensive, or time-consuming, as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm that assumes each example is represented by multiple sets of features (views) and that these views are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC), is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members on each example's neighborhood. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied to both C4.5 decision trees and 1-nearest-neighbor classifiers to construct the diverse ensembles used for semi-supervised learning and active learning. Experiments show that these two combinations can outperform other non-committee-based ones.
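The full algorithmic details are in the paper itself, but the abstract's two core ingredients — a committee built with the random subspace method, and unlabeled examples self-labeled (or queried) according to committee agreement — can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the 1-NN base learner, the simple agreement-based confidence score, and all names (`cobc`, `qbc_query`, `make_subspaces`) are simplifying assumptions; the paper's actual confidence measure estimates local accuracy on each example's neighborhood.

```python
import random

def nn1_predict(train, x):
    """1-nearest-neighbor: return the label of the closest training point."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def project(x, idx):
    """Restrict a feature vector to a subset of feature indices (one 'subspace')."""
    return tuple(x[i] for i in idx)

def make_subspaces(n_features, n_members=3, size=2, seed=0):
    """Random subspace method: each committee member gets a random feature subset."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), size)) for _ in range(n_members)]

def committee_votes(labeled, x, views):
    """One 1-NN vote per member, each member seeing only its own subspace."""
    return [nn1_predict([(project(p, v), y) for p, y in labeled], project(x, v))
            for v in views]

def qbc_query(labeled, unlabeled, views):
    """Query-by-Committee step: pick the unlabeled example the committee
    disagrees on most; an oracle (human expert) would label it."""
    def agreement(x):
        votes = committee_votes(labeled, x, views)
        return max(votes.count(v) for v in set(votes)) / len(votes)
    return min(unlabeled, key=agreement)

def cobc(labeled, unlabeled, n_members=3, size=2, per_round=1, rounds=3, seed=0):
    """CoBC-style self-training: each round, the committee labels the unlabeled
    examples it agrees on most confidently and adds them to the training set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    views = make_subspaces(len(labeled[0][0]), n_members, size, seed)
    for _ in range(rounds):
        if not unlabeled:
            break
        scored = []
        for x in unlabeled:
            votes = committee_votes(labeled, x, views)
            label = max(set(votes), key=votes.count)
            scored.append((votes.count(label) / len(votes), x, label))
        scored.sort(reverse=True)                  # most-agreed-upon first
        for _conf, x, label in scored[:per_round]:
            labeled.append((x, label))
            unlabeled.remove(x)
    return labeled
```

Under this sketch, QBC-then-CoBC would spend a labeling budget on `qbc_query` calls (with an oracle supplying the labels) before running `cobc` on the remaining pool, while QBC-with-CoBC would interleave the two inside the loop.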



Author information

Corresponding author

Correspondence to Mohamed Farouk Abdel Hady.

Additional information

This work was partially supported by the Transregional Collaborative Research Centre SFB/TRR 62 Companion-Technology for Cognitive Technical Systems funded by the German Research Foundation (DFG). The first author was supported by a scholarship of the German Academic Exchange Service (DAAD).

About this article

Cite this article

Farouk Abdel Hady, M., Schwenker, F. Combining Committee-Based Semi-Supervised Learning and Active Learning. J. Comput. Sci. Technol. 25, 681–698 (2010). https://doi.org/10.1007/s11390-010-9357-6
