Second-Order Optimization over the Multivariate Gaussian Distribution

  • Luigi Malagò
  • Giovanni Pistone
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9389)

Abstract

We discuss the optimization of the stochastic relaxation of a real-valued function, i.e., we introduce a new search space given by a statistical model and optimize the expected value of the original function with respect to a distribution in the model. From the point of view of Information Geometry, statistical models are Riemannian manifolds of distributions endowed with the Fisher information metric; the stochastic relaxation can thus be seen as a continuous optimization problem defined over a differentiable manifold. In this paper we explore the second-order geometry of the exponential family, with applications to the multivariate Gaussian distribution, in order to generalize second-order optimization methods. Besides the Riemannian Hessian, we introduce the exponential and the mixture Hessians, which come from the dually flat structure of an exponential family. This allows us to obtain different Taylor formulæ according to the choice of the Hessian and of the geodesic used, and thus different approaches to the design of second-order methods, such as the Newton method.
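As a concrete illustration of the first-order starting point of this construction (the names below, such as natural_gradient_step, are ours for illustration, not from the paper), the following minimal Python sketch performs natural-gradient descent on the relaxation F(mu) = E_{x ~ N(mu, Sigma)}[f(x)] with the covariance held fixed. Since the Fisher information for the mean of a Gaussian is Sigma^{-1}, the natural gradient of F simplifies to E[f(x)(x - mu)]: the Sigma^{-1} in the score function cancels against the inverse Fisher matrix. The second-order methods discussed in the paper would further precondition this direction with one of the Hessians (Riemannian, exponential, or mixture).

    import numpy as np

    def natural_gradient_step(f, mu, sigma, n_samples=500, lr=0.05, rng=None):
        # One natural-gradient step on F(mu) = E_{x ~ N(mu, sigma)}[f(x)],
        # with sigma held fixed. The Fisher information for the mean is
        # sigma^{-1}, so the natural gradient reduces to E[f(x) (x - mu)].
        rng = np.random.default_rng() if rng is None else rng
        x = rng.multivariate_normal(mu, sigma, size=n_samples)
        fx = f(x)                         # shape (n_samples,)
        fx = fx - fx.mean()               # baseline: unbiased, lower variance
        nat_grad = (fx[:, None] * (x - mu)).mean(axis=0)
        return mu - lr * nat_grad         # descent step (minimization)

    # Usage: minimize f through its Gaussian relaxation.
    f = lambda x: np.sum((x - 3.0) ** 2, axis=-1)
    mu, sigma = np.zeros(2), np.eye(2)
    for _ in range(300):
        mu = natural_gradient_step(f, mu, sigma)
    print(mu)                             # close to the minimizer [3., 3.]

For this quadratic objective the mean converges geometrically to the minimizer; updating the covariance as well, or replacing the gradient direction with a Newton direction built from one of the Hessians above, stays within the same sampling-based scheme.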

Keywords

Tangent Space · Newton Method · Fisher Information · Exponential Family · Fisher Information Matrix

Acknowledgements

Giovanni Pistone is supported by de Castro Statistics, Collegio Carlo Alberto, Moncalieri, and he is a member of GNAMPA-INDAM.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Shinshu University & Inria Saclay – Île-de-France, Nagano, Japan
  2. Collegio Carlo Alberto, Moncalieri, Italy
