ICANN 98 pp 13-22

Variational Learning in Graphical Models and Neural Networks

  • Christopher M. Bishop
Part of the Perspectives in Neural Computing book series (PERSPECT.NEURAL)


Variational methods are becoming increasingly popular for inference and learning in probabilistic models. By providing bounds on quantities of interest, they offer a more controlled approximation framework than techniques such as Laplace’s method, while avoiding the mixing and convergence issues of Markov chain Monte Carlo methods, or the possible computational intractability of exact algorithms. In this paper we review the underlying framework of variational methods and discuss example applications involving sigmoid belief networks, Boltzmann machines and feed-forward neural networks.
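The claim that variational methods provide bounds can be illustrated with a minimal sketch. It assumes a toy model that does not come from the paper: a single binary hidden variable z and one fixed observation x, with made-up joint probabilities p(x, z). The names `joint`, `log_evidence`, `lower_bound` and `exact_posterior` are hypothetical. For any distribution q(z), the quantity L(q) = Σ_z q(z) [ln p(x, z) − ln q(z)] is a lower bound on ln p(x), and the bound becomes tight when q equals the exact posterior.

```python
import math

# Hypothetical toy model (not from the paper): one binary hidden variable z
# and a single fixed observation x. joint[z] holds p(x, z) for that x;
# the numbers are illustrative only.
joint = [0.3, 0.1]


def log_evidence(joint):
    """Exact ln p(x), obtained by summing the joint over z."""
    return math.log(sum(joint))


def lower_bound(q, joint):
    """Variational bound L(q) = sum_z q(z) * (ln p(x, z) - ln q(z))."""
    return sum(
        qz * (math.log(pz) - math.log(qz))
        for qz, pz in zip(q, joint)
        if qz > 0.0  # terms with q(z) = 0 contribute nothing
    )


def exact_posterior(joint):
    """p(z | x), the choice of q for which the bound is tight."""
    total = sum(joint)
    return [pz / total for pz in joint]


if __name__ == "__main__":
    print("ln p(x)            :", log_evidence(joint))
    print("bound, uniform q   :", lower_bound([0.5, 0.5], joint))
    print("bound, posterior q :", lower_bound(exact_posterior(joint), joint))
```

In a model of practical size the sum over z is intractable, which is why q is restricted to a tractable family and the bound is maximised within that family rather than evaluated exactly as it is here.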


Keywords: Posterior distribution; Graphical model; Hidden variable; Expectation maximization algorithm; Markov chain Monte Carlo method





Copyright information

© Springer-Verlag London 1998

Authors and Affiliations

  • Christopher M. Bishop, Microsoft Research, Cambridge, UK
