Abstract
The Wasserstein distance on multivariate non-degenerate Gaussian densities is a Riemannian distance. After reviewing the properties of the distance and the metric geodesic, we present an explicit form of the Riemannian metric on positive-definite matrices and compute its tensor form with respect to the trace inner product. The tensor is a matrix that is the solution of a Lyapunov equation. We compute explicit formulas for the Riemannian exponential, the normal coordinate charts, and the Riemannian gradient. Finally, the Levi-Civita covariant derivative is computed in matrix form, together with the differential equation for the parallel transport. Although all computations are given in matrix form, we also discuss the use of a special moving frame.
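As an illustration of the objects named in the abstract, the following is a minimal numerical sketch, not the authors' implementation. It assumes the standard closed form of the squared distance between centered Gaussians, W²(A, B) = tr A + tr B − 2 tr (A^{1/2} B A^{1/2})^{1/2}, and it assumes the metric at Σ can be written as tr(L_Σ[U] Σ L_Σ[V]), where L_Σ[U] is the symmetric solution of the Lyapunov equation L Σ + Σ L = U. The abstract itself does not state these formulas, and all function names below are hypothetical.

```python
import numpy as np


def sym_sqrt(S):
    """Symmetric square root of a symmetric positive-semidefinite matrix."""
    w, Q = np.linalg.eigh(S)
    # Clip tiny negative eigenvalues caused by round-off before taking roots.
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T


def wasserstein2_sq(A, B):
    """Squared L2-Wasserstein distance between N(0, A) and N(0, B) (assumed closed form)."""
    rA = sym_sqrt(A)
    cross = sym_sqrt(rA @ B @ rA)
    return np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)


def lyapunov_solve(S, U):
    """Solve L S + S L = U for L, with S symmetric positive definite and U symmetric."""
    w, Q = np.linalg.eigh(S)
    # In the eigenbasis of S the equation decouples entrywise:
    # (w_i + w_j) * Lt_ij = Ut_ij.
    Ut = Q.T @ U @ Q
    Lt = Ut / (w[:, None] + w[None, :])
    return Q @ Lt @ Q.T


def metric_tensor(S, U, V):
    """Assumed form of the metric at S: tr(L_S[U] S L_S[V])."""
    LU = lyapunov_solve(S, U)
    LV = lyapunov_solve(S, V)
    return np.trace(LU @ S @ LV)
```

Under these assumptions, `wasserstein2_sq(S, S + t * U) / t**2` should approach `metric_tensor(S, U, U)` as `t` tends to zero, which gives a quick consistency check between the distance and the metric. The eigendecomposition-based solver is chosen only to keep the sketch dependent on NumPy alone; for large matrices a dedicated Lyapunov solver would be the more usual choice.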
Acknowledgements
The authors wish to thank two anonymous referees for helpful comments. G. Pistone acknowledges the support of de Castro Statistics and Collegio Carlo Alberto. He is a member of GNAMPA-INdAM.
Cite this article
Malagò, L., Montrucchio, L. & Pistone, G. Wasserstein Riemannian geometry of Gaussian densities. Info. Geo. 1, 137–179 (2018). https://doi.org/10.1007/s41884-018-0014-4
DOI: https://doi.org/10.1007/s41884-018-0014-4
Keywords
- Information geometry
- Gaussian distribution
- Wasserstein distance
- Riemannian metrics
- Natural gradient
- Riemannian exponential
- Normal coordinates
- Levi-Civita covariant derivative
- Optimization on positive-definite symmetric matrices