Abstract
In the past few years, neural networks (denoted NNs in the sequel) have attracted considerable attention as data analysis tools. NNs can be viewed as universal approximators of non-linear functions that can learn from examples. This chapter focuses on the Expectation-Maximization (EM) algorithm, an iterative algorithm for training neural networks that is inspired by the strong correspondences between NNs and certain statistical methods [1][2]. The EM algorithm is widely used for complex statistical estimation problems with hidden data, and we show that it is also well suited to a number of NN learning problems.
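To make the hidden-data setting concrete, the sketch below (an illustration added here, not code from the chapter) runs the EM iteration on a two-component Gaussian mixture, the canonical incomplete-data problem surveyed by Redner and Walker in the references. The hidden data are the unobserved component labels: the E-step computes their posterior responsibilities under the current parameters, the M-step re-estimates the parameters in closed form, and the observed-data log-likelihood never decreases. All function and variable names (`em_gmm2`, `gauss_pdf`, etc.) are ours.

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Univariate normal density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm2(x, n_iter=100, tol=1e-8, seed=0):
    """EM for a two-component 1-D Gaussian mixture (hidden data = labels)."""
    rng = np.random.default_rng(seed)
    # Crude initialisation: two sample points as means, pooled variance,
    # equal mixing weights.
    mu = rng.choice(x, size=2, replace=False)
    var = np.full(2, x.var())
    w = np.array([0.5, 0.5])
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(label k | x_i, current params).
        dens = np.stack([w[k] * gauss_pdf(x, mu[k], var[k]) for k in range(2)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form weighted maximum-likelihood updates.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        # Monitor the observed-data log-likelihood, which EM never decreases.
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - ll_old < tol:
            break
        ll_old = ll
    return w, mu, var

# Synthetic hidden-data problem: samples from a mixture of two Gaussians.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
print(em_gmm2(x))
```

The same two-step structure carries over to the NN settings discussed in the chapter (mixtures of experts, hidden Markov models), where the hidden data are expert assignments or state sequences rather than mixture labels.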
References
S. Amari, “Information geometry of the EM and em algorithms for neural networks”, Neural Networks, vol. 8, pp. 1379–1408, 1995.
C. Couvreur and P. Couvreur, “Neural networks and statistics: a naive comparison”, to appear in JORBEL: Belgian Journal of Operations Research, Statistics and Computer Sciences, 1997.
A.P. Dempster, N.M. Laird and D.B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, Journal of the Royal Statistical Society B, vol. 39, pp. 1–38, 1977.
M.I. Jordan and R.A. Jacobs, “Hierarchical mixtures of experts and the EM algorithm”, Neural Computation, vol. 6, pp. 181–214, 1994.
R.A. Redner and H.F. Walker, “Mixture densities, maximum likelihood and the EM algorithm”, SIAM Review, vol. 26, pp. 192–239, 1984.
L.R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition”, Proceedings of the IEEE, vol. 77, pp. 257–286, 1989.
D.M. Titterington, “Some recent research in the analysis of mixture distributions”, Statistics, vol. 21, pp. 619–641, 1990.
J.A. Fessler and A.O. Hero, “Space-alternating generalized expectation-maximization algorithm”, IEEE Transactions on Signal Processing, vol. 42, pp. 2664–2677, 1994.
M. Segal and E. Weinstein, “The cascade EM algorithm”, Proceedings of the IEEE, vol. 76, pp. 1388–1390, 1988.
M. Segal and E. Weinstein, “A new method for evaluating the log-likelihood gradient, the Hessian, and the Fisher information matrix for linear dynamic systems”, IEEE Transactions on Information Theory, vol. 35, pp. 682–687, 1989.
E.M. Johansson, F.U. Dowla and D.M. Goodman, “Back-propagation learning for multi-layer feed-forward neural networks using the conjugate gradient method”, report UCRL-JC-104850, Lawrence Livermore National Lab, Livermore, CA, 1990.
D.M. Titterington, “Recursive parameter estimation using incomplete data”, Journal of the Royal Statistical Society B, vol. 46, pp. 257–267, 1984.
M.I. Jordan and R.A. Jacobs, “Hierarchies of adaptive experts”, in Advances in Neural Information Processing Systems, vol. 4, J.E. Moody, S.J. Hanson and R.P. Lippmann, Eds., pp. 985–992, Morgan Kaufmann, San Mateo, CA, 1992.
M.I. Jordan and L. Xu, “Convergence results for the EM approach to mixtures of experts architectures”, Neural Networks, vol. 8, pp. 1409–1431, 1995.
L. Xu, M.I. Jordan and G.E. Hinton, “New gating net for mixture of experts, EM algorithm and piecewise function approximations”, preprint, 1994.
J. Zhuang and S. Amari, “Piecewise-linear division of signal space by a multilayer neural network with the maximum detector”, Transactions of the Institute of Electronics, Information and Communication Engineers, vol. J76-D, pp. 1435–1440, 1993 (in Japanese).
C.F.J. Wu, “On the convergence properties of the EM algorithm”, Annals of Statistics, vol. 11, pp. 95–103, 1983.
R.A. Boyles, “On the convergence of the EM algorithm”, Journal of the Royal Statistical Society B, vol. 45, pp. 47–50, 1983.
T.A. Louis, “Finding the observed information matrix when using the EM algorithm”, Journal of the Royal Statistical Society B, vol. 44, pp. 226–233, 1982.
X.-L. Meng and D.B. Rubin, “On the global and componentwise rates of convergence of the EM algorithm”, Linear Algebra and its Applications, vol. 199, pp. 413–425, 1994.
M. Jamshidian and R.I. Jennrich, “Conjugate gradient acceleration for the EM algorithm”, Journal of the American Statistical Association, vol. 88, pp. 221–228, 1993.
I. Meilijson, “A fast improvement of the EM algorithm on its own terms”, Journal of the Royal Statistical Society B, vol. 51, pp. 127–138, 1989.
F. Girosi, M. Jones and T. Poggio, “Regularization theory and neural network architectures”, Neural Computation, vol. 7, pp. 219–269, 1995.
P.J. Green, “On the use of the EM algorithm for penalized likelihood estimation”, Journal of the Royal Statistical Society B, vol. 52, pp. 443–452, 1990.
M.R. Segal, P. Bacchetti and N.P. Jewell, “Variances for maximum penalized likelihood estimates obtained via the EM algorithm”, Journal of the Royal Statistical Society B, vol. 56, pp. 345–352, 1994.
X.-L. Meng and D.B. Rubin, “Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm”, Journal of the American Statistical Association, vol. 86, pp. 899–909, 1991.
Copyright information
© 1998 Springer-Verlag London Limited
Cite this chapter
Kárný, M., Warwick, K., Kůrková, V. (1998). A Tutorial on the EM Algorithm and Its Applications to Neural Network Learning. In: Kárný, M., Warwick, K., Kůrková, V. (eds) Dealing with Complexity. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-1523-6_4
Publisher Name: Springer, London
Print ISBN: 978-3-540-76160-0
Online ISBN: 978-1-4471-1523-6