Augmenting Supervised Neural Classifier Training Using a Corpus of Unlabeled Data

  • Andrew Skabar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2479)


In recent years, there has been growing interest in applying techniques that incorporate knowledge from unlabeled data into systems performing supervised learning. However, disparate results have been presented in the literature, and there is no general consensus that the use of unlabeled examples should always improve classifier performance. This paper proposes a method for incorporating a corpus of unlabeled examples into the supervised training of a neural network classifier and presents results from applying the technique to several datasets from the UCI repository. While the results do not provide support for the claim that unlabeled data can improve overall classification accuracy, a bias-variance decomposition shows that classifiers trained with unlabeled data display lower bias and higher variance than classifiers trained using labeled data alone.
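The abstract's central finding rests on a bias-variance decomposition of zero-one loss over classifiers trained on resampled data. The paper itself presents no code; the sketch below is only an illustration of how such a decomposition is typically computed (in the style of Kohavi and Wolpert's majority-vote "main prediction"). The synthetic data and the nearest-centroid classifier are assumptions made for brevity, not the authors' setup, which used neural network classifiers on UCI datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class synthetic data; the paper uses UCI datasets instead.
def make_data(n):
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y

# A nearest-centroid classifier stands in for the paper's neural networks.
def fit_centroids(X, y):
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Train many classifiers on independently drawn training sets and
# record their predictions on one fixed test set.
X_test, y_test = make_data(500)
preds = np.asarray([
    predict(fit_centroids(*make_data(200)), X_test)
    for _ in range(50)
])

# "Main prediction" = majority vote across the trained classifiers.
main = np.array([np.bincount(p).argmax() for p in preds.T])

# Bias: systematic error of the main prediction against the truth.
# Variance: how often individual classifiers disagree with the main prediction.
bias = float((main != y_test).mean())
variance = float((preds != main).mean())
print(f"bias={bias:.3f} variance={variance:.3f}")
```

Under this decomposition, the paper's result reads as: adding unlabeled data shifted error mass from the bias term to the variance term without lowering their sum.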


Keywords: Classification Performance · Supervised Learning · Unlabeled Data · Target Class Label · Training
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Andrew Skabar
  1. School of Information Technology, International University in Germany, Bruchsal, Germany
