A Tutorial on the EM Algorithm and Its Applications to Neural Network Learning

Chapter in Dealing with Complexity (Perspectives in Neural Computing)

Abstract

In the past few years, neural networks (also denoted NNs in the sequel) have attracted considerable attention as data analysis tools. NNs can be viewed as universal approximators of non-linear functions that learn from examples. This chapter focuses on an iterative algorithm for training NNs, inspired by the strong correspondences between NNs and some statistical methods [1][2]. This algorithm, the expectation-maximization (EM) algorithm, is widely used to solve complex statistical problems involving hidden data, and we show that it is also well suited to some NN learning problems.
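To make the hidden-data setting concrete before the full chapter, the sketch below applies EM to the textbook example of a two-component Gaussian mixture, alternating an expectation (E) step over the unobserved component labels with a maximization (M) step over the mixture parameters. It is a minimal illustration added for this preview rather than code from the chapter; the function name em_gaussian_mixture, the toy data, and the fixed iteration count are assumptions made for the example.

```python
# Illustrative sketch only: EM for a two-component 1-D Gaussian mixture,
# the canonical "hidden data" problem the EM algorithm addresses.
import numpy as np

def em_gaussian_mixture(x, n_iter=50):
    """Fit weights, means, and variances of a 2-component Gaussian mixture
    to the 1-D sample x by expectation-maximization."""
    # Rough initial guesses for the mixture weights, means, and variances.
    pi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point,
        # i.e. the conditional expectation of the hidden component labels.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters from the responsibility-weighted data.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

    return pi, mu, var

# Toy usage: recover the two modes of a synthetic bimodal sample.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
print(em_gaussian_mixture(x))
```

Each E/M cycle can only increase (or leave unchanged) the observed-data log-likelihood, the monotonicity property established in [3] and analysed further in [17] and [18].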


References

  1. S. Amari, “Information geometry of the EM and em algorithms for neural networks”, Neural Networks, vol. 8, pp. 1379–1408, 1995.

  2. C. Couvreur and P. Couvreur, “Neural networks and statistics: a naive comparison”, to appear in JORBEL: Belgian Journal of Operations Research, Statistics and Computer Sciences, 1997.

  3. A.P. Dempster, N.M. Laird and D.B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.

  4. M.I. Jordan and R.A. Jacobs, “Hierarchical mixtures of experts and the EM algorithm”, Neural Computation, vol. 6, pp. 181–214, 1994.

  5. R.A. Redner and H.F. Walker, “Mixture densities, maximum likelihood and the EM algorithm”, SIAM Review, vol. 26, pp. 192–239, 1984.

  6. L.R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition”, Proceedings of the IEEE, vol. 77, pp. 257–286, 1989.

  7. D.M. Titterington, “Some recent research in the analysis of mixture distributions”, Statistics, vol. 21, pp. 619–641, 1990.

  8. J.A. Fessler and A.O. Hero, “Space-alternating generalized expectation-maximization algorithm”, IEEE Transactions on Signal Processing, vol. 42, pp. 2664–2677, 1994.

  9. M. Segal and E. Weinstein, “The cascade EM algorithm”, Proceedings of the IEEE, vol. 76, pp. 1388–1390, 1988.

  10. M. Segal and E. Weinstein, “A new method for evaluating the log-likelihood gradient, the Hessian, and the Fisher information matrix for linear dynamic systems”, IEEE Transactions on Information Theory, vol. 35, pp. 682–687, 1989.

  11. E.M. Johansson, F.U. Dowla and D.M. Goodman, “Back-propagation learning for multi-layer feed forward neural networks using the conjugate gradient method”, Report UCRL-JC-104850, Lawrence Livermore National Lab, Livermore, CA, 1990.

  12. D.M. Titterington, “Recursive parameter estimation using incomplete data”, Journal of the Royal Statistical Society B, vol. 46, pp. 257–267, 1984.

  13. M.I. Jordan and R.A. Jacobs, “Hierarchies of adaptive experts”, in Advances in Neural Information Processing Systems, vol. 4, J.E. Moody, S.J. Hanson, and R.P. Lippmann, Eds., pp. 985–992, Morgan Kaufmann, San Mateo, CA, 1992.

  14. M.I. Jordan and L. Xu, “Convergence results for the EM approach to mixtures of experts architectures”, Neural Networks, vol. 8, pp. 1409–1431, 1995.

  15. L. Xu, M.I. Jordan and G.E. Hinton, “New gating net for mixture of experts, EM algorithm and piecewise function approximations”, preprint, 1994.

  16. J. Zhuang and S. Amari, “Piecewise-linear division of signal space by a multilayer neural network with the maximum detector”, Transactions of the Institute of Electronics, Information and Communication Engineers, vol. J76-D, pp. 1435–1440, 1993 (in Japanese).

  17. C.F.J. Wu, “On the convergence properties of the EM algorithm”, Annals of Statistics, vol. 11, pp. 95–103, 1983.

  18. R.A. Boyles, “On the convergence of the EM algorithm”, Journal of the Royal Statistical Society B, vol. 45, pp. 47–50, 1983.

  19. T.A. Louis, “Finding the observed information matrix when using the EM algorithm”, Journal of the Royal Statistical Society B, vol. 44, pp. 226–233, 1982.

  20. X.-L. Meng and D.B. Rubin, “On the global and componentwise rates of convergence of the EM algorithm”, Linear Algebra and its Applications, vol. 199, pp. 413–425, 1994.

  21. M. Jamshidian and R.I. Jennrich, “Conjugate gradient acceleration for the EM algorithm”, Journal of the American Statistical Association, vol. 88, pp. 221–228, 1993.

  22. I. Meilijson, “A fast improvement of the EM algorithm on its own terms”, Journal of the Royal Statistical Society B, vol. 51, pp. 127–138, 1989.

  23. F. Girosi, M. Jones and T. Poggio, “Regularization theory and neural network architectures”, Neural Computation, vol. 7, pp. 219–269, 1995.

  24. P.J. Green, “On the use of the EM algorithm for penalized likelihood estimation”, Journal of the Royal Statistical Society B, vol. 52, pp. 443–452, 1990.

  25. M.R. Segal, P. Bacchetti and N.P. Jewell, “Variances for maximum penalized likelihood estimates obtained via the EM algorithm”, Journal of the Royal Statistical Society B, vol. 56, pp. 345–352, 1994.

  26. X.-L. Meng and D.B. Rubin, “Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm”, Journal of the American Statistical Association, vol. 86, pp. 899–909, 1991.


Copyright information

© 1998 Springer-Verlag London Limited

About this chapter

Cite this chapter

Couvreur, C. (1998). A Tutorial on the EM Algorithm and Its Applications to Neural Network Learning. In: Kárný, M., Warwick, K., Kůrková, V. (eds) Dealing with Complexity. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-1523-6_4

  • DOI: https://doi.org/10.1007/978-1-4471-1523-6_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-76160-0

  • Online ISBN: 978-1-4471-1523-6
