Abstract
In the past few years, neural networks (denoted NNs in the sequel) have attracted considerable attention as data analysis tools. NNs can be viewed as universal approximators of non-linear functions that can learn from examples. This chapter focuses on the Expectation-Maximization (EM) algorithm, an iterative algorithm for training neural networks that is inspired by the strong correspondences between NNs and certain statistical methods [1][2]. The EM algorithm is widely used for complex statistical estimation problems with hidden data, and we show that it is also well suited to a number of NN learning problems.
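To make the hidden-data setting concrete, the sketch below (an illustration added here, not code from the chapter) runs the EM iteration on a two-component Gaussian mixture, the canonical incomplete-data problem surveyed by Redner and Walker in the references. The hidden data are the unobserved component labels: the E-step computes their posterior responsibilities under the current parameters, the M-step re-estimates the parameters in closed form, and the observed-data log-likelihood never decreases. All function and variable names (`em_gmm2`, `gauss_pdf`, etc.) are ours.

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Univariate normal density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm2(x, n_iter=100, tol=1e-8, seed=0):
    """EM for a two-component 1-D Gaussian mixture (hidden data = labels)."""
    rng = np.random.default_rng(seed)
    # Crude initialisation: two sample points as means, pooled variance,
    # equal mixing weights.
    mu = rng.choice(x, size=2, replace=False)
    var = np.full(2, x.var())
    w = np.array([0.5, 0.5])
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(label k | x_i, current params).
        dens = np.stack([w[k] * gauss_pdf(x, mu[k], var[k]) for k in range(2)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form weighted maximum-likelihood updates.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        # Monitor the observed-data log-likelihood, which EM never decreases.
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - ll_old < tol:
            break
        ll_old = ll
    return w, mu, var

# Synthetic hidden-data problem: samples from a mixture of two Gaussians.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
print(em_gmm2(x))
```

The same two-step structure carries over to the NN settings discussed in the chapter (mixtures of experts, hidden Markov models), where the hidden data are expert assignments or state sequences rather than mixture labels.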
References
S. Amari, “Information geometry of the EM and em algorithms for neural networks”, Neural Networks, vol. 8, pp. 1379–1408, 1995.
C. Couvreur and P. Couvreur, “Neural networks and statistics: a naive comparison”, to appear in JORBEL: Belgian Journal of Operations Research, Statistics and Computer Sciences, 1997.
A.P. Dempster, N.M. Laird and D.B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.
M.I. Jordan and R.A. Jacobs, “Hierarchical mixtures of experts and the EM algorithm”, Neural Computation, vol. 6, pp. 181–214, 1994.
R.A. Redner and H.F. Walker, “Mixture densities, maximum likelihood and the EM algorithm”, SIAM Review, vol. 26, pp. 192–239, 1984.
L.R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition”, Proceedings of the IEEE, vol. 77, pp. 257–286, 1989.
D.M. Titterington, “Some recent research in the analysis of mixture distributions”, Statistics, vol. 21, pp. 619–641, 1990.
J.A. Fessler and A.O. Hero, “Space-alternating generalized expectation-maximization algorithm”, IEEE Transactions on Signal Processing, vol. 42, pp. 2664–2677, 1994.
M. Segal and E. Weinstein, “The cascade EM algorithm”, Proceedings of the IEEE, vol. 76, pp. 1388–1390, 1988.
M. Segal and E. Weinstein, “A new method for evaluating the log-likelihood gradient, the Hessian, and the Fisher information matrix for linear dynamic systems”, IEEE Transactions on Information Theory, vol. 35, pp. 682–687, 1989.
E.M. Johansson, F.U. Dowla and D.M. Goodman, “Back-propagation learning for multi-layer feed-forward neural networks using the conjugate gradient method”, report UCRL-JC-104850, Lawrence Livermore National Lab, Livermore, CA, 1990.
D.M. Titterington, “Recursive parameter estimation using incomplete data”, Journal of the Royal Statistical Society B, vol. 46, pp. 257–267, 1984.
M.I. Jordan and R.A. Jacobs, “Hierarchies of adaptive experts”, in Advances in Neural Information Processing Systems, vol. 4, J.E. Moody, S.J. Hanson and R.P. Lippmann, Eds., pp. 985–992, Morgan Kaufmann, San Mateo, CA, 1992.
M.I. Jordan and L. Xu, “Convergence results for the EM approach to mixtures of experts architectures”, Neural Networks, vol. 8, pp. 1409–1431, 1995.
L. Xu, M.I. Jordan and G.E. Hinton, “New gating net for mixture of experts, EM algorithm and piecewise function approximations”, preprint, 1994.
J. Zhuang and S. Amari, “Piecewise-linear division of signal space by a multilayer neural network with the maximum detector”, Transactions of the Institute of Electronics, Information and Communication Engineers, vol. J76-D, pp. 1435–1440, 1993 (in Japanese).
C.F.J. Wu, “On the convergence properties of the EM algorithm”, Annals of Statistics, vol. 11, pp. 95–103, 1983.
R.A. Boyles, “On the convergence of the EM algorithm”, Journal of the Royal Statistical Society B, vol. 45, pp. 47–50, 1983.
T.A. Louis, “Finding the observed information matrix when using the EM algorithm”, Journal of the Royal Statistical Society B, vol. 44, pp. 226–233, 1982.
X.-L. Meng and D.B. Rubin, “On the global and componentwise rates of convergence of the EM algorithm”, Linear Algebra and its Applications, vol. 199, pp. 413–425, 1994.
M. Jamshidian and R.I. Jennrich, “Conjugate gradient acceleration for the EM algorithm”, Journal of the American Statistical Association, vol. 88, pp. 221–228, 1993.
I. Meilijson, “A fast improvement of the EM algorithm on its own terms”, Journal of the Royal Statistical Society B, vol. 51, pp. 127–138, 1989.
F. Girosi, M. Jones and T. Poggio, “Regularization theory and neural network architectures”, Neural Computation, vol. 7, pp. 219–269, 1995.
P.J. Green, “On the use of the EM algorithm for penalized likelihood estimation”, Journal of the Royal Statistical Society B, vol. 52, pp. 443–452, 1990.
M.R. Segal, P. Bacchetti and N.P. Jewell, “Variances for maximum penalized likelihood estimates obtained via the EM algorithm”, Journal of the Royal Statistical Society B, vol. 56, pp. 345–352, 1994.
X.-L. Meng and D.B. Rubin, “Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm”, Journal of the American Statistical Association, vol. 86, pp. 899–909, 1991.
Copyright information
© 1998 Springer-Verlag London Limited
Cite this chapter
Kárný, M., Warwick, K., Kůrková, V. (1998). A Tutorial on the EM Algorithm and Its Applications to Neural Network Learning. In: Kárný, M., Warwick, K., Kůrková, V. (eds) Dealing with Complexity. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-1523-6_4
Publisher Name: Springer, London
Print ISBN: 978-3-540-76160-0
Online ISBN: 978-1-4471-1523-6