Exploiting Inference for Approximate Parameter Learning in Discriminative Fields: An Empirical Study

  • Sanjiv Kumar
  • Jonas August
  • Martial Hebert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3757)


Estimation of the parameters of random field models from labeled training data is crucial for their good performance in many image analysis applications. In this paper, we present an approach to approximate maximum likelihood parameter learning in discriminative field models, based on approximating the true expectations with simple piecewise constant functions constructed using inference techniques. Gradient ascent with these approximate updates exhibits compelling limit-cycle behavior that is closely tied to the number of errors made during inference. We evaluate several approximations combined with different inference techniques, showing that the learned parameters lead to good classification performance as long as the method used to approximate the gradient is consistent with the inference mechanism. The proposed approach is general enough to be used, for example, to learn the smoothing parameters of conventional Markov Random Fields (MRFs).
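To make the scheme in the abstract concrete, here is a minimal illustrative sketch, not the authors' exact derivation: the standard maximum likelihood gradient for a discriminative field with sufficient statistics \(\phi\) over training pairs \((\mathbf{x}^m, \mathbf{y}^m)\), and the kind of inference-based surrogate the abstract describes. The symbols \(\phi\), \(\eta\), and the point-estimate form of the approximation are illustrative assumptions.

\[
\nabla_\theta \ell(\theta) \;=\; \sum_m \Big( \phi(\mathbf{y}^m, \mathbf{x}^m) \;-\; \mathbb{E}_{p(\mathbf{y} \mid \mathbf{x}^m; \theta)}\big[ \phi(\mathbf{y}, \mathbf{x}^m) \big] \Big)
\]

The intractable model expectation is replaced by a piecewise constant function of the inferred labeling \(\hat{\mathbf{y}}^m\), in the simplest case a point estimate, which yields a perceptron-like update:

\[
\mathbb{E}_{p(\mathbf{y} \mid \mathbf{x}^m; \theta)}\big[ \phi(\mathbf{y}, \mathbf{x}^m) \big] \;\approx\; \phi(\hat{\mathbf{y}}^m, \mathbf{x}^m),
\qquad
\theta \;\leftarrow\; \theta + \eta \sum_m \Big( \phi(\mathbf{y}^m, \mathbf{x}^m) - \phi(\hat{\mathbf{y}}^m, \mathbf{x}^m) \Big).
\]

Under this surrogate the update vanishes exactly when inference reproduces the training labels, which suggests why the limit cycles of gradient ascent are tied to the number of errors made during inference.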


Keywords: Markov Chain Monte Carlo · Neural Information Processing System · Saddle Point Approximation · Inference Technique · Gradient Ascent





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Sanjiv Kumar (1)
  • Jonas August (1)
  • Martial Hebert (1)

  1. The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
