Scaled CGEM: A Fast Accelerated EM

  • Jörg Fischer
  • Kristian Kersting
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)


The EM algorithm is a popular method for maximum likelihood estimation of Bayesian networks in the presence of missing data. Its simplicity and general convergence properties make it very attractive. However, it sometimes converges slowly. Several accelerated EM methods based on gradient-based optimization techniques have been proposed. In principle, they all employ a line search involving several likelihood evaluations, each of which is NP-hard in general. We propose a novel acceleration called SCGEM, based on scaled conjugate gradients (SCGs), which are well known from training neural networks. SCGEM avoids the line search by applying the scaling mechanism of SCGs to the expected information matrix. This guarantees a single likelihood evaluation per iteration. We empirically compare SCGEM with EM and with conventional conjugate-gradient-accelerated EM. The experiments show that SCGEM can significantly accelerate both and yields solutions of equal quality.
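The central idea of the abstract is replacing the line search with the SCG scaling mechanism, so that each iteration needs only one objective evaluation. A minimal sketch of a generic Møller-style scaled conjugate gradient minimizer (following the variant used in the NETLAB toolbox) can illustrate this. It is not SCGEM itself: it estimates curvature with a finite difference of gradients rather than the expected information matrix. The function `scg`, its parameters (`sigma0`, `tol`, `gtol`) and the toy quadratic objective are illustrative assumptions.

```python
import numpy as np


def scg(f, grad, x0, max_iter=200, sigma0=1e-4, tol=1e-8, gtol=1e-10):
    """Sketch of a NETLAB-style scaled conjugate gradient (SCG) minimizer.

    There is no line search: the step length comes from a finite-difference
    curvature estimate along the search direction, regularized by the scale
    parameter `lam`. Each iteration evaluates f exactly once.
    Returns (x, f(x), number of f evaluations, number of iterations).
    """
    x = np.asarray(x0, dtype=float)
    n = x.size
    lam, lam_min, lam_max = 1.0, 1e-15, 1e100
    f_old = f(x)
    n_fevals, n_iter = 1, 0
    g_new = grad(x)
    d = -g_new
    success, n_success = True, 0
    for _ in range(max_iter):
        n_iter += 1
        if success:
            mu = d @ g_new                   # directional derivative, should be < 0
            if mu >= 0:                      # not a descent direction: restart
                d = -g_new
                mu = d @ g_new
            kappa = d @ d
            if kappa < np.finfo(float).eps:
                break
            sigma = sigma0 / np.sqrt(kappa)
            # Curvature d^T H d estimated from a finite difference of gradients.
            theta = d @ (grad(x + sigma * d) - g_new) / sigma
        # Scale the curvature; force it positive if H is not positive definite.
        delta = theta + lam * kappa
        if delta <= 0:
            delta = lam * kappa
            lam = lam - theta / kappa
        alpha = -mu / delta                  # step length without a line search
        x_new = x + alpha * d
        f_new = f(x_new)                     # the single f evaluation this iteration
        n_fevals += 1
        # Comparison ratio: actual decrease vs. decrease predicted by the model.
        ratio = 2.0 * (f_new - f_old) / (alpha * mu)
        if ratio >= 0:                       # accept the step
            success = True
            n_success += 1
            x = x_new
            converged = np.max(np.abs(alpha * d)) < tol and abs(f_new - f_old) < tol
            f_old = f_new
            if converged:
                break
            g_old, g_new = g_new, grad(x)
            if np.linalg.norm(g_new) < gtol:
                break
        else:                                # reject the step, keep x
            success = False
        # Adapt the scale parameter from the comparison ratio.
        if ratio < 0.25:
            lam = min(4.0 * lam, lam_max)
        if ratio > 0.75:
            lam = max(0.5 * lam, lam_min)
        # Polak-Ribiere direction update, restarting after n successful steps.
        if n_success == n:
            d = -g_new
            n_success = 0
        elif success:
            beta = (g_old - g_new) @ g_new / mu
            d = beta * d - g_new
    return x, f_old, n_fevals, n_iter
```

In a maximum-likelihood setting one would pass the negative log-likelihood and its gradient as `f` and `grad`. Handling the simplex constraints on Bayesian network parameters (for example by a reparameterization) is deliberately left out of this sketch.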





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Jörg Fischer (1)
  • Kristian Kersting (1)
  1. Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University, Freiburg i. Brg., Germany
