Abstract
In this paper we investigate the dually flat structure of the space of positive definite matrices induced by a class of convex functions called V-potentials, from the viewpoint of information geometry. It is proved that the geometry is invariant under special linear group actions and naturally introduces a foliated structure. Each leaf is proved to be a homogeneous statistical manifold with negative constant curvature and to enjoy a special decomposition property of the canonically defined divergence. As an application to statistics, we finally give the correspondence between the obtained geometry on the space and the one on elliptical distributions induced from a certain Bregman divergence.
Acknowledgments
We thank the anonymous referees for their constructive comments and careful checks of the original manuscript.
Appendices
2.1.1 A Proof of Theorem 1
It is observed that \(-\nu _1(\det P) \not = 0\) on \(PD(n,\mathbf{R})\) is necessary, because the second term is not positive definite. Hence, the Hessian can be represented as
\[ g^{(V)}_P(X,Y)=-\nu _1(\det P)\, \mathrm{vec}^T(\tilde{X}) \left( I_{n^2}-\beta ^{(V)}(\det P)\, \mathrm{vec}(I_n)\mathrm{vec}^T(I_n)\right) \mathrm{vec}(\tilde{Y}). \]
Here \(\tilde{X}=P^{-1/2}XP^{-1/2}, \tilde{Y}=P^{-1/2}YP^{-1/2}\), \(\mathrm{vec}(\bullet )\) is the operator that maps \(A=(a_{ij}) \in \mathbf{R}^{n \times n}\) to \([a_{11},\cdots ,a_{n1},a_{12},\cdots ,a_{n2},\cdots ,a_{1n},\cdots ,a_{nn} ]^T \in \mathbf{R}^{n^2}\), and \(I_n\) and \(I_{n^2}\) denote the identity matrices of order \(n\) and \(n^2\), respectively. By congruently transforming the matrix \(I_{n^2}-\beta ^{(V)}(\det P) \mathrm{vec}(I_n)\mathrm{vec}^T(I_n)\) with a suitable permutation matrix, we see that the positive definiteness of \(g^{(V)}\) is equivalent to \(-\nu _1(\det P) > 0\) and
\[ I_n-\beta ^{(V)}(\det P)\, \mathbf{1}\mathbf{1}^T \succ 0, \qquad \mathbf{1}=[1,\cdots ,1]^T \in \mathbf{R}^n. \]
Let \(W\) be an orthogonal matrix that has \(\mathbf{1}/\sqrt{n}\) as its first column vector. Since the eigen-decomposition
\[ W^T\left( I_n-\beta ^{(V)}(\det P)\, \mathbf{1}\mathbf{1}^T\right) W=\mathrm{diag}\left( 1-n\beta ^{(V)}(\det P),1,\cdots ,1\right) \]
holds, the conditions (2.5) are necessary and sufficient for the positive definiteness of \(g^{(V)}\). Thus, the statement follows.\(\square \)
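A quick numerical check of this argument may help. The sketch below assumes \(\varphi ^{(V)}(P)=V(\det P)\), \(\nu _1(s)=sV'(s)\), \(\nu _2(s)=s\nu _1'(s)\) and \(\beta ^{(V)}=\nu _2/\nu _1\), so that differentiating twice gives \(g^{(V)}_P(X,Y)=-\nu _1(s)\,\mathrm{tr}(P^{-1}XP^{-1}Y)+\nu _2(s)\,\mathrm{tr}(P^{-1}X)\,\mathrm{tr}(P^{-1}Y)\) with \(s=\det P\). It compares the vec form of this Hessian with a finite-difference Hessian and prints the spectrum of \(I_n-\beta \mathbf{1}\mathbf{1}^T\). The potential \(V(s)=(1-s^\gamma )/\gamma \) and all function names are hypothetical choices for illustration, not taken from the chapter.

```python
import numpy as np


def power_nu(gamma):
    # Hypothetical potential V(s) = (1 - s**gamma) / gamma (gamma != 0);
    # nu1(s) = s V'(s) = -s**gamma and nu2(s) = s nu1'(s) = -gamma * s**gamma,
    # so beta(s) = nu2(s) / nu1(s) = gamma for every s.
    V = lambda s: (1.0 - s**gamma) / gamma
    nu1 = lambda s: -(s**gamma)
    nu2 = lambda s: -gamma * s**gamma
    return V, nu1, nu2


def vec(A):
    # Column-stacking vec: [a11, ..., an1, a12, ..., an2, ..., a1n, ..., ann]^T
    return A.reshape(-1, order="F")


def inv_sqrt(P):
    # Symmetric inverse square root P^{-1/2} via an eigen-decomposition
    w, U = np.linalg.eigh(P)
    return (U / np.sqrt(w)) @ U.T


def hessian_vec_form(P, X, Y, nu1, nu2):
    # -nu1(s) vec(X~)^T (I_{n^2} - beta vec(I_n) vec(I_n)^T) vec(Y~), s = det P
    n = P.shape[0]
    s = np.linalg.det(P)
    beta = nu2(s) / nu1(s)
    R = inv_sqrt(P)
    e = vec(np.eye(n))
    M = np.eye(n * n) - beta * np.outer(e, e)
    return -nu1(s) * vec(R @ X @ R) @ M @ vec(R @ Y @ R)


def hessian_fd(P, X, Y, V, h=1e-4):
    # Mixed central difference of phi(P) = V(det P) in the directions X and Y
    phi = lambda A: V(np.linalg.det(A))
    return (phi(P + h * X + h * Y) - phi(P + h * X - h * Y)
            - phi(P - h * X + h * Y) + phi(P - h * X - h * Y)) / (4 * h * h)


def pd_condition(s, n, nu1, nu2):
    # -nu1(s) > 0 and beta(s) < 1/n
    return bool(-nu1(s) > 0 and nu2(s) / nu1(s) < 1.0 / n)


rng = np.random.default_rng(0)
n, gamma = 3, -0.5
V, nu1, nu2 = power_nu(gamma)
A = rng.standard_normal((n, n))
P = A @ A.T / n + np.eye(n)
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
X = (B + B.T) / 2
Y = (C + C.T) / 2

print(hessian_vec_form(P, X, Y, nu1, nu2), hessian_fd(P, X, Y, V))
# Spectrum of I_n - beta 1 1^T: 1 - n*beta once, 1 with multiplicity n - 1
print(np.linalg.eigvalsh(np.eye(n) - gamma * np.ones((n, n))))
```

For this potential \(\beta ^{(V)}\equiv \gamma \), so with \(n=3\) the conditions hold for \(\gamma =-0.5\) but fail for \(\gamma =0.5\), where \(1-n\gamma <0\).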
2.1.2 B Proof of Theorem 2
Since the components of \(P^*\) are affine coordinates for the connection \(^* \nabla ^{(V)}\), the parallel shift \(\pi _t(Y)\) along the curve \(\gamma \) satisfies
for any \(t\).
From Lemma 1, this implies
for any \(t \; (-\epsilon <t<\epsilon )\).
By calculating the left-hand side, we get
where \(s_t=\det P_t\). If \(t=0\), then this equation implies that
where \(s=\det P\). Hence we observe that
Taking the trace for both sides of (2.25), we get
From (2.25) and (2.26) it follows that
This completes the proof.\(\square \)
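The key step of this proof, that a \({^*\nabla }^{(V)}\)-parallel vector field keeps its \(P^*\)-components fixed, can be turned into a small computation. The sketch below assumes that the dual coordinate is the gradient \(P^*=\nu _1(\det P)P^{-1}\) of \(\varphi ^{(V)}(P)=V(\det P)\) (an assumption, since Lemma 1 is not reproduced here) and uses the hypothetical potential \(V(s)=(1-s^\gamma )/\gamma \). It transports a tangent vector by solving the linear system \(dP^*_{P_t}(\pi _t(Y))=dP^*_P(Y)\).

```python
import numpy as np


def dual_coordinate(P, nu1):
    # Assumed dual (Legendre) coordinate P* = grad phi(P) = nu1(det P) P^{-1}
    return nu1(np.linalg.det(P)) * np.linalg.inv(P)


def dual_differential(P, nu1, nu2):
    # n^2 x n^2 matrix of Y -> nu2(s) tr(P^{-1} Y) P^{-1} - nu1(s) P^{-1} Y P^{-1},
    # the differential of P -> P*, acting on the column-stacked vec(Y)
    s = np.linalg.det(P)
    Pinv = np.linalg.inv(P)
    w = Pinv.reshape(-1, order="F")
    return -nu1(s) * np.kron(Pinv, Pinv) + nu2(s) * np.outer(w, w)


def dual_parallel_shift(P, Pt, Y, nu1, nu2):
    # Keep the P*-components fixed: solve dP*_{Pt}(Yt) = dP*_{P}(Y) for Yt.
    # The dual connection is flat, so the result depends only on the endpoints.
    n = P.shape[0]
    target = dual_differential(P, nu1, nu2) @ Y.reshape(-1, order="F")
    Yt = np.linalg.solve(dual_differential(Pt, nu1, nu2), target)
    return Yt.reshape(n, n, order="F")


def random_pd(n, rng):
    A = rng.standard_normal((n, n))
    return A @ A.T / n + np.eye(n)


gamma = -0.5  # hypothetical potential V(s) = (1 - s**gamma) / gamma
nu1 = lambda s: -(s**gamma)
nu2 = lambda s: -gamma * s**gamma

rng = np.random.default_rng(1)
n = 3
P, Pt = random_pd(n, rng), random_pd(n, rng)
B = rng.standard_normal((n, n))
Y = (B + B.T) / 2
Yt = dual_parallel_shift(P, Pt, Y, nu1, nu2)
print(Yt)
```

When \(\nu _1\equiv -1\) and \(\nu _2\equiv 0\) (the case \(V(s)=-\log s\)), the shift reduces to \(\pi (Y)=P_tP^{-1}YP^{-1}P_t\), which gives a convenient sanity check.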
2.1.3 C Proof of Proposition 4
Since the geometric structure \((\mathcal {L}_s,g^{(V)})\) is also invariant under the transformation \(\tau _G\) for \(G \in SL(n,\mathbf{R})\), it suffices to consider the point \(\lambda I \in \mathcal {L}_s\), where \(\lambda = s^{1/n}\).
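The reduction to \(\lambda I\) rests on two facts that are easy to check numerically: the metric is preserved by \(SL(n,\mathbf{R})\), and every \(P\in \mathcal {L}_s\) is the image of \(\lambda I\) under some \(G\) with \(\det G=1\), for instance \(G=P^{1/2}/\sqrt{\lambda }\). The sketch assumes that \(\tau _G\) is the congruence \(P\mapsto GPG^T\) and uses the Hessian \(g^{(V)}_P(X,Y)=-\nu _1(s)\,\mathrm{tr}(P^{-1}XP^{-1}Y)+\nu _2(s)\,\mathrm{tr}(P^{-1}X)\,\mathrm{tr}(P^{-1}Y)\) obtained by differentiating \(V(\det P)\) twice, with \(\nu _1(s)=sV'(s)\) and \(\nu _2(s)=s\nu _1'(s)\). The potential \(V(s)=(1-s^\gamma )/\gamma \) is a hypothetical example.

```python
import numpy as np


def hessian(P, X, Y, nu1, nu2):
    # g_P(X, Y) = -nu1(s) tr(P^{-1}X P^{-1}Y) + nu2(s) tr(P^{-1}X) tr(P^{-1}Y), s = det P
    s = np.linalg.det(P)
    PiX, PiY = np.linalg.solve(P, X), np.linalg.solve(P, Y)
    return -nu1(s) * np.trace(PiX @ PiY) + nu2(s) * np.trace(PiX) * np.trace(PiY)


def random_sl(n, rng):
    # Random G with det G = 1: make det G positive, then rescale
    G = rng.standard_normal((n, n))
    if np.linalg.det(G) < 0:
        G[:, 0] = -G[:, 0]
    return G / np.linalg.det(G) ** (1.0 / n)


def sym_sqrt(P):
    # Symmetric square root via an eigen-decomposition
    w, U = np.linalg.eigh(P)
    return (U * np.sqrt(w)) @ U.T


def tau(G, A):
    # Assumed action tau_G: congruence A -> G A G^T
    return G @ A @ G.T


gamma = -0.5  # hypothetical potential V(s) = (1 - s**gamma) / gamma
nu1 = lambda s: -(s**gamma)
nu2 = lambda s: -gamma * s**gamma

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
P = A @ A.T / n + np.eye(n)
B, C = rng.standard_normal((n, n)), rng.standard_normal((n, n))
X, Y = (B + B.T) / 2, (C + C.T) / 2
G = random_sl(n, rng)

# The two values agree: g is invariant under tau_G for det G = 1
print(hessian(tau(G, P), tau(G, X), tau(G, Y), nu1, nu2), hessian(P, X, Y, nu1, nu2))

lam = np.linalg.det(P) ** (1.0 / n)
G0 = sym_sqrt(P) / np.sqrt(lam)  # det G0 = 1 and tau(G0, lam * I) = P
```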
Let \(\tilde{X} \in \mathcal {X}(\mathcal {L}_s)\) be a vector field defined at each \(P \in \mathcal {L}_s\) by
where \(\tilde{X}^i\) are certain smooth functions on \(\mathcal {L}_s\). Consider the curve \(P_t= \lambda \exp X t \in \mathcal {L}_s\) starting at \(t=0\) and a vector field \(\tilde{Y}\) along \(P_t\) defined by
where \(Y_t\) is an arbitrary smooth curve in \(T_I \mathcal {L}_1\) with \(Y_0=Y\) and \(\tilde{Y}^i\) are smooth functions along \(P_t\). We show that the \((T_{\lambda I} \mathcal {L}_s)^\perp \)-component of \(\left( \hat{\nabla }^{(V)}_{\tilde{X}} \tilde{Y} \right) _{\lambda I}\), i.e., the component of the covariant derivative at \(\lambda I\) orthogonal to \(T_{\lambda I} \mathcal {L}_s\), vanishes for all \(X\) and \(Y \in T_I \mathcal {L}_1\) if and only if \(\nu _2(s)=0\).
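Since \(\det (\lambda \exp (tX))=\lambda ^n e^{t\,\mathrm{tr}X}\), the curve \(P_t\) stays in \(\mathcal {L}_s\) exactly when \(\mathrm{tr}X=0\), and the identity \(d\det (P)(Z)=\det P\,\mathrm{tr}(P^{-1}Z)\) identifies \(T_P\mathcal {L}_s\) with \(\{Z:\mathrm{tr}(P^{-1}Z)=0\}\). A minimal NumPy check of both facts, with arbitrary sample values \(n=3\) and \(s=2\):

```python
import numpy as np


def sym_expm(X):
    # Matrix exponential of a symmetric matrix via its eigen-decomposition
    w, U = np.linalg.eigh(X)
    return (U * np.exp(w)) @ U.T


n, s = 3, 2.0
lam = s ** (1.0 / n)
rng = np.random.default_rng(2)
B = rng.standard_normal((n, n))
X = (B + B.T) / 2
X -= np.trace(X) / n * np.eye(n)  # traceless: X lies in T_I L_1

# det(lam * exp(tX)) stays equal to s along the whole curve
ts = np.linspace(-1.0, 1.0, 5)
dets = [np.linalg.det(lam * sym_expm(t * X)) for t in ts]
print(dets)

# The velocity lam * X at P = lam * I satisfies tr(P^{-1} lam X) = 0, so det is stationary
P = lam * np.eye(n)
h = 1e-6
ddet = (np.linalg.det(P + h * lam * X) - np.linalg.det(P - h * lam * X)) / (2 * h)
print(ddet)
```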
We see
hold. Note that
Then, using (2.13) and Corollary 1, we obtain
For the third equality we have used that \(\varPhi (\lambda X,\lambda Y,\lambda I)=0\) for any \(X\) and \(Y \in T_I \mathcal {L}_1\).
Since it holds that
and \(-\nu _1(s)+\nu _2(s)n \not =0\) by (2.5), the \((T_{\lambda I} \mathcal {L}_s)^\perp \)-component of \(\left( \hat{\nabla }^{(V)}_{\tilde{X}} \tilde{Y} \right) _{\lambda I}\) vanishes for any \(X\) and \(Y \in T_I \mathcal {L}_1\) if and only if
Here, we have used \(\text {tr}((dY_t/dt)_{t=0})=0\). The above equality is equivalent to \(\nu _2(s)=0\). Hence, we conclude that the statement holds.\(\square \)
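Under the Hessian formula \(g^{(V)}_P(X,Y)=-\nu _1(s)\,\mathrm{tr}(P^{-1}XP^{-1}Y)+\nu _2(s)\,\mathrm{tr}(P^{-1}X)\,\mathrm{tr}(P^{-1}Y)\) (assumed here, obtained by differentiating \(V(\det P)\) twice), one finds \(g^{(V)}_P(Z,P)=(-\nu _1(s)+n\nu _2(s))\,\mathrm{tr}(P^{-1}Z)\). So \(P\) (in particular \(\lambda I\)) spans the \(g^{(V)}\)-normal direction of \(\mathcal {L}_s\), which makes the role of the factor \(-\nu _1(s)+\nu _2(s)n\) visible: it equals \(g^{(V)}_P(P,P)/n\). The check below uses the hypothetical potential \(V(s)=(1-s^\gamma )/\gamma \).

```python
import numpy as np


def hessian(P, X, Y, nu1, nu2):
    # g_P(X, Y) = -nu1(s) tr(P^{-1}X P^{-1}Y) + nu2(s) tr(P^{-1}X) tr(P^{-1}Y), s = det P
    s = np.linalg.det(P)
    PiX, PiY = np.linalg.solve(P, X), np.linalg.solve(P, Y)
    return -nu1(s) * np.trace(PiX @ PiY) + nu2(s) * np.trace(PiX) * np.trace(PiY)


def random_tangent(P, rng):
    # Symmetric Z with tr(P^{-1} Z) = 0, i.e. Z in T_P L_s where s = det P
    n = P.shape[0]
    B = rng.standard_normal((n, n))
    Z = (B + B.T) / 2
    return Z - np.trace(np.linalg.solve(P, Z)) / n * P


gamma = -0.5  # hypothetical potential V(s) = (1 - s**gamma) / gamma
nu1 = lambda s: -(s**gamma)
nu2 = lambda s: -gamma * s**gamma

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
P = A @ A.T / n + np.eye(n)
s = np.linalg.det(P)
Z = random_tangent(P, rng)

# g_P(Z, P) vanishes for a tangent vector Z of the leaf
print(hessian(P, Z, P, nu1, nu2))
# g_P(P, P) = n (-nu1(s) + n nu2(s)), which is nonzero under the conditions (2.5)
print(hessian(P, P, P, nu1, nu2), n * (-nu1(s) + n * nu2(s)))
```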
2.1.4 D Proof of Proposition 6
The statements (i) and (ii) follow from direct calculations. Since \(\left( ^* \tilde{\nabla }^{(V)}_{\tilde{X}} \tilde{Y} \right) _P\) is the orthogonal projection of \(\left( ^*\nabla ^{(V)} _{\tilde{X}} \tilde{Y} \right) _P\) onto \(T_P \mathcal {L}_s\) with respect to \(g_P^{(V)}\), it can be represented as
where \(\delta \) is determined from the orthogonality condition
As in the proof of Proposition 4, with \(\lambda =s^{1/n}\), we see that
Since \(\varPhi ^{\perp }(\lambda X,\lambda Y, \lambda I) \in (T_{\lambda I} \mathcal {L}_s)^\perp \) and \((dY_t/dt)_{t=0} \in T_{\lambda I} \mathcal {L}_s\), the orthogonal projection of \(\left( {^*\nabla }^{(V)}_{\tilde{X}} \tilde{Y} \right) _{\lambda I}\) onto \(T_{\lambda I} \mathcal {L}_s\) coincides with that of \(\lambda (dY_t/dt)_{t=0}-\lambda (YX+XY)/2\). Thus, from the orthogonality condition we have
which is independent of \(V(s)\).\(\square \)
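Because \(P\) is \(g^{(V)}\)-normal to \(\mathcal {L}_s\) for every admissible \(V\) (under the Hessian formula \(g^{(V)}_P(X,Y)=-\nu _1(s)\,\mathrm{tr}(P^{-1}XP^{-1}Y)+\nu _2(s)\,\mathrm{tr}(P^{-1}X)\,\mathrm{tr}(P^{-1}Y)\), assumed here), one concrete form of the orthogonal projection onto \(T_P\mathcal {L}_s\) is \(Z\mapsto Z-\delta P\) with \(\delta =\mathrm{tr}(P^{-1}Z)/n\), which involves no \(V\)-dependent quantity. This is an illustrative reading of the orthogonality condition, not the chapter's own display. The sketch checks tangency and orthogonality for three hypothetical power potentials \(V(s)=(1-s^\gamma )/\gamma \) with \(\gamma <1/n\).

```python
import numpy as np


def hessian(P, X, Y, nu1, nu2):
    # g_P(X, Y) = -nu1(s) tr(P^{-1}X P^{-1}Y) + nu2(s) tr(P^{-1}X) tr(P^{-1}Y), s = det P
    s = np.linalg.det(P)
    PiX, PiY = np.linalg.solve(P, X), np.linalg.solve(P, Y)
    return -nu1(s) * np.trace(PiX @ PiY) + nu2(s) * np.trace(PiX) * np.trace(PiY)


def project_to_leaf_tangent(P, Z):
    # Z - delta P with delta = tr(P^{-1} Z) / n, so that tr(P^{-1}(Z - delta P)) = 0.
    # P is g-orthogonal to T_P L_s for every V, so no V-dependent quantity appears.
    n = P.shape[0]
    delta = np.trace(np.linalg.solve(P, Z)) / n
    return Z - delta * P


def power_nu(gamma):
    # nu1, nu2 for the hypothetical potential V(s) = (1 - s**gamma) / gamma
    return (lambda s: -(s**gamma)), (lambda s: -gamma * s**gamma)


rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n))
P = A @ A.T / n + np.eye(n)
B = rng.standard_normal((n, n))
Z = (B + B.T) / 2
PZ = project_to_leaf_tangent(P, Z)

C = rng.standard_normal((n, n))
T = project_to_leaf_tangent(P, (C + C.T) / 2)  # another tangent vector of the leaf

# The residual Z - PZ is g-orthogonal to T for each choice of potential
residuals = [hessian(P, Z - PZ, T, *power_nu(g)) for g in (-1.0, -0.5, 0.2)]
print(residuals)
```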
2.1.5 E Proof of Theorem 4
For \(P\) and \(Q\) in \(PD(n,\mathbf{R})\), we abbreviate the two density functions in a \(U\)-model as \(f_P(x)=f(x,P)\) and \( f_Q(x)=f(x,Q)\).
It suffices to show that the dual canonical divergence \({^*D^{(V)}}(P,Q)=D^{(V)}(Q,P)\) of \((PD(n,\mathbf{R}), \nabla , g^{(V)})\) given by (2.10) coincides with \(D_U(f_P,f_Q)\). Note that exchanging the order of the two arguments in a divergence only interchanges the definitions of the primal and dual affine connections in (2.3); it does not affect the dualistic structure of the induced geometry as a whole.
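For orientation, a Bregman divergence of \(\varphi ^{(V)}(P)=V(\det P)\) can be evaluated directly as \(\varphi ^{(V)}(P)-\varphi ^{(V)}(Q)-\mathrm{tr}(\nu _1(\det Q)Q^{-1}(P-Q))\), assuming that the gradient of \(\varphi ^{(V)}\) is \(\nu _1(\det P)P^{-1}\) with \(\nu _1(s)=sV'(s)\). Which argument order (2.10) adopts cannot be read off this appendix, so the sketch fixes one order as an assumption; by the remark above, swapping it yields the dual divergence. With \(V(s)=-\log s\) it reduces to the familiar \(\mathrm{tr}(Q^{-1}P)-\log \det (Q^{-1}P)-n\). The power potential is a hypothetical example.

```python
import numpy as np


def bregman_V(P, Q, V, nu1):
    # phi(P) - phi(Q) - tr(grad phi(Q) (P - Q)) with phi(P) = V(det P) and
    # grad phi(Q) = nu1(det Q) Q^{-1}; note tr(Q^{-1}(P - Q)) = tr(Q^{-1} P) - n
    n = P.shape[0]
    sP, sQ = np.linalg.det(P), np.linalg.det(Q)
    return V(sP) - V(sQ) - nu1(sQ) * (np.trace(np.linalg.solve(Q, P)) - n)


def random_pd(n, rng):
    A = rng.standard_normal((n, n))
    return A @ A.T / n + np.eye(n)


gamma = -0.5  # hypothetical V(s) = (1 - s**gamma) / gamma, convex case for n = 3
V = lambda s: (1.0 - s**gamma) / gamma
nu1 = lambda s: -(s**gamma)

rng = np.random.default_rng(6)
n = 3
pairs = [(random_pd(n, rng), random_pd(n, rng)) for _ in range(100)]
divs = [bregman_V(P, Q, V, nu1) for P, Q in pairs]
print(min(divs))  # nonnegative, since phi is convex when (2.5) holds
```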
Recalling (2.6), we have
where \(V'\) denotes the derivative of \(V\) with respect to \(s\). On the other hand, we can directly differentiate \(\varphi ^{(V)}(P)\), defined via (2.24), to obtain
where \(\mathrm{E}_P\) is the expectation operator with respect to \(f_P(x)\). Thus, we have
Note that
because \(\xi (u)\) is the identity. By definition, the \(U\)-divergence is
Using (2.27), the third term is expressed as
Hence, \(D_U(f_P,f_Q)={^*D^{(V)}}(P,Q)=D^{(V)}(Q,P)\).\(\square \)
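The \(U\)-divergence used in this proof can also be evaluated numerically. The sketch assumes the standard form \(D_U(f,g)=\int \{U(\xi (g))-U(\xi (f))-f\,(\xi (g)-\xi (f))\}\,dx\) with \(\xi \) the inverse of \(U'\), and works with one-dimensional Gaussian densities on a grid rather than with a \(U\)-model. With \(U=\exp \) (so \(\xi =\log \)) it reproduces the Kullback-Leibler divergence, which the check compares against the closed form; the power generator \(U(t)=(1+bt)^{(b+1)/b}/(b+1)\) with \(\xi (u)=(u^b-1)/b\) gives a \(\beta \)-type divergence. The grid and the value \(b=0.5\) are arbitrary illustrative choices.

```python
import numpy as np


def trapezoid(y, x):
    # Trapezoidal rule (written out to avoid NumPy version differences)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def u_divergence(f, g, x, U, xi):
    # D_U(f, g) = integral of U(xi(g)) - U(xi(f)) - f (xi(g) - xi(f))
    integrand = U(xi(g)) - U(xi(f)) - f * (xi(g) - xi(f))
    return trapezoid(integrand, x)


def gauss(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)


x = np.linspace(-30.0, 30.0, 60001)
f, g = gauss(x, 1.0), gauss(x, 2.25)

# U = exp, xi = log: the U-divergence reduces to KL(f || g)
d_exp = u_divergence(f, g, x, np.exp, np.log)
kl = 0.5 * (np.log(2.25 / 1.0) + 1.0 / 2.25 - 1.0)

# Power generator: U'(xi(u)) = u, giving a beta-type divergence
b = 0.5
U_beta = lambda t: (1 + b * t) ** ((b + 1) / b) / (b + 1)
xi_beta = lambda u: (u**b - 1) / b
d_beta = u_divergence(f, g, x, U_beta, xi_beta)
print(d_exp, kl, d_beta)
```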
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ohara, A., Eguchi, S. (2014). Geometry on Positive Definite Matrices Deformed by V-Potentials and Its Submanifold Structure. In: Nielsen, F. (eds) Geometric Theory of Information. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-05317-2_2
DOI: https://doi.org/10.1007/978-3-319-05317-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05316-5
Online ISBN: 978-3-319-05317-2
eBook Packages: Engineering (R0)