Variational Learning in Graphical Models and Neural Networks
Variational methods are becoming increasingly popular for inference and learning in probabilistic models. By providing bounds on quantities of interest, they offer a more controlled approximation framework than techniques such as Laplace’s method, while avoiding the mixing and convergence issues of Markov chain Monte Carlo methods, or the possible computational intractability of exact algorithms. In this paper we review the underlying framework of variational methods and discuss example applications involving sigmoid belief networks, Boltzmann machines and feed-forward neural networks.
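As a brief sketch of the bound the abstract refers to (the notation here is not from the original: V denotes visible variables, H hidden variables, and Q an approximating distribution over H), the variational framework lower-bounds the log likelihood via Jensen's inequality:

\[
\ln P(V) \;=\; \ln \sum_{H} Q(H)\,\frac{P(H,V)}{Q(H)} \;\ge\; \sum_{H} Q(H)\,\ln \frac{P(H,V)}{Q(H)} \;\equiv\; \mathcal{L}(Q),
\]

and the slack in the bound is exactly the Kullback-Leibler divergence, \(\ln P(V) - \mathcal{L}(Q) = \mathrm{KL}\big(Q(H)\,\|\,P(H \mid V)\big)\), so maximizing \(\mathcal{L}(Q)\) over a tractable family of distributions Q gives the tightest achievable approximation to the posterior.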
Keywords: Posterior distribution; Graphical model; Hidden variable; Expectation-maximization algorithm; Markov chain Monte Carlo method