Aspects of Semi-supervised and Active Learning in Conditional Random Fields

  • Nataliya Sokolovska
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)


Conditional random fields are among the state-of-the-art approaches to structured output prediction, and the model has been adopted for various real-world problems. Supervised training is costly, since labeled data are usually expensive to produce. Unlabeled data are relatively cheap, but it is not obvious how to use them. Unlabeled data can be used to estimate the marginal probability of observations, and we exploit this idea in our work.

Introducing unlabeled data, and with them the probability of observations, into a purely discriminative model is a challenging task.

We consider an extension of a recently proposed semi-supervised criterion to conditional random fields and show its drawbacks. We discuss alternative uses of the marginal probability and propose a pool-based active learning approach based on quota sampling. We carry out experiments on synthetic as well as standard natural language data sets, and we show that the proposed quota sampling active learning method is efficient.
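The abstract does not spell out the quota sampling procedure, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes each pool item is a single discrete observation (for conditional random fields the items would be whole sequences), estimates the marginal probability of observations by empirical frequency in the unlabeled pool, and then fills a labeling budget with per-observation quotas proportional to that estimate. The names `estimate_marginal` and `quota_sample` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def estimate_marginal(pool):
    """Empirical marginal probability of each observation in the unlabeled pool."""
    counts = Counter(pool)
    total = len(pool)
    return {x: c / total for x, c in counts.items()}


def quota_sample(pool, budget, seed=0):
    """Select `budget` pool indices to send for labeling.

    Assumption (not from the paper): each observation value receives a quota
    proportional to its empirical frequency in the pool, rounded with the
    largest-remainder rule so that the quotas sum exactly to `budget`.
    """
    if not 0 <= budget <= len(pool):
        raise ValueError("budget must be between 0 and the pool size")
    rng = random.Random(seed)
    total = len(pool)

    # Group pool indices by observation value.
    by_value = defaultdict(list)
    for i, x in enumerate(pool):
        by_value[x].append(i)

    # Integer arithmetic avoids floating-point rounding surprises.
    quotas = {x: budget * len(idx) // total for x, idx in by_value.items()}
    remainders = {x: budget * len(idx) % total for x, idx in by_value.items()}
    leftover = budget - sum(quotas.values())
    for x in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[x] += 1

    # Draw each quota uniformly at random within its group.
    chosen = []
    for x, q in quotas.items():
        chosen.extend(rng.sample(by_value[x], q))
    return sorted(chosen)


pool = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
selected = quota_sample(pool, budget=5)
```

In a full pool-based loop, the selected items would be labeled by an annotator, added to the training set, and the model retrained before the next round; that loop is omitted here because the abstract gives no details about it.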


Keywords: conditional random fields; probability of observations; active learning; semi-supervised learning



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Nataliya Sokolovska (1)
  1. LRI, CNRS UMR 8623 & INRIA Saclay, University Paris Sud, Orsay, France
