Abstract
Clustering categorical distributions in the probability simplex is a fundamental task encountered in many applications dealing with normalized histograms. Traditionally, differential-geometric structures of the probability simplex have been used either by (i) setting the Riemannian metric tensor to the Fisher information matrix of the categorical distributions, or (ii) defining the dualistic information-geometric structure induced by a smooth dissimilarity measure, the Kullback–Leibler divergence. In this work, we introduce for this clustering task a novel computationally-friendly framework for modeling the probability simplex termed Hilbert simplex geometry. In Hilbert simplex geometry, the distance function is Hilbert's projective cross-ratio metric, whose balls have polytopal shapes. We discuss the pros and cons of these different statistical modelings, and experimentally benchmark these geometries for center-based k-means and k-center clustering. Furthermore, since a canonical Hilbert metric distance can be defined on any bounded convex subset of Euclidean space, we also consider Hilbert's projective geometry of the elliptope of correlation matrices and study its clustering performance.
Notes
1. To contrast with this result, let us mention that infinitesimally small balls in Riemannian geometry have Euclidean ellipsoidal shapes (visualized as Tissot's indicatrix in cartography).
2. For positive values a and b, the arithmetic-geometric mean inequality states that \(\sqrt{ab}\le \frac{a+b}{2}\).
3. Consider \(A = (1/3,1/3,1/3)\), \(B = (1/6,1/2,1/3)\), \(C = (1/6,2/3,1/6)\) and \(D = (1/3,1/2,1/6)\), so that \(D=A-B+C\) and ABCD forms a parallelogram. Then \(2AB^2 +2BC^2 = 4.34\) but \(AC^2 + BD^2 = 3.84362411135\).
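The counterexample of footnote 3 can be verified numerically. Below is a minimal sketch, assuming the closed-form Hilbert simplex distance \(\rho _\mathrm {HG}(p,q)=\log \frac{\max _i (p_i/q_i)}{\min _i (p_i/q_i)}\) (the convention under which the quoted values hold):

```python
import math

def hilbert_simplex_distance(p, q):
    # Hilbert cross-ratio metric on the open probability simplex:
    # rho_HG(p, q) = log(max_i p_i/q_i) - log(min_i p_i/q_i)
    ratios = [pi / qi for pi, qi in zip(p, q)]
    return math.log(max(ratios) / min(ratios))

A, B = (1/3, 1/3, 1/3), (1/6, 1/2, 1/3)
C, D = (1/6, 2/3, 1/6), (1/3, 1/2, 1/6)

AB = hilbert_simplex_distance(A, B)
BC = hilbert_simplex_distance(B, C)
AC = hilbert_simplex_distance(A, C)
BD = hilbert_simplex_distance(B, D)

lhs = 2 * AB**2 + 2 * BC**2  # ~ 4.34 (rounded)
rhs = AC**2 + BD**2          # ~ 3.84362411135
print(lhs, rhs)              # unequal: the parallelogram law fails
```

Since the two sides differ, no inner product can induce this distance.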
References
Agresti, A.: Categorical Data Analysis, vol. 482. Wiley, New Jersey (2003)
Aggarwal, C.C., Zhai, C.X.: Mining Text Data. Springer Publishing Company, Berlin (2012)
Messing, R., Pal, C., Kautz, H.: Activity recognition using the velocity histories of tracked keypoints. In: International Conference on Computer Vision, pp. 104–111. IEEE (2009)
Murphy, K.P.: Machine Learning: A Probabilistic Perspective. The MIT Press, Cambridge (2012)
Chaudhuri, K., McGregor, A.: Finding metric structure in information theoretic clustering. In: Conference on Learning Theory (COLT), pp. 391–402 (2008)
Lebanon, G.: Learning Riemannian metrics. In: Conference on Uncertainty in Artificial Intelligence (UAI), pp. 362–369 (2002)
Rigouste, L., Cappé, O., Yvon, F.: Inference and evaluation of the multinomial mixture model for text clustering. Inf. Process. Manag. 43(5), 1260–1280 (2007)
Huang, Z.: Extensions to the \(k\)-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Discov. 2(3), 283–304 (1998)
Arthur, D., Vassilvitskii, S.: \(k\)-means++: the advantages of careful seeding. In: ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035 (2007)
Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 38, 293–306 (1985)
Tropp, J.A.: Simplicial faces of the set of correlation matrices. Discret. Comput. Geom. 60(2), 512–529 (2018)
Kass, R.E., Vos, P.W.: Geometrical Foundations of Asymptotic Inference. Wiley Series in Probability and Statistics. Wiley-Interscience, New Jersey (1997)
Hotelling, H.: Spaces of statistical parameters. Bull. Amer. Math. Soc. 36, 191 (1930)
Rao, C.R.: Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37(3), 81–91 (1945)
Rao, C.R.: Information and the accuracy attainable in the estimation of statistical parameters. Breakthroughs in Statistics, pp. 235–247. Springer, New York (1992)
Stigler, S.M.: The epic story of maximum likelihood. Stat. Sci. 22(4), 598–620 (2007)
Amari, S.: Information Geometry and Its Applications. Applied Mathematical Sciences, vol. 194. Springer, Japan (2016)
Calin, O., Udriste, C.: Geometric Modeling in Probability and Statistics. Mathematics and Statistics. Springer International Publishing, New York (2014)
Amari, S., Cichocki, A.: Information geometry of divergence functions. Bull. Pol. Acad. Sci.: Tech. Sci. 58(1), 183–195 (2010)
Shima, H.: The Geometry of Hessian Structures. World Scientific, Singapore (2007)
Liang, X.: A note on divergences. Neural Comput. 28(10), 2045–2062 (2016)
Jenssen, R., Principe, J.C., Erdogmus, D., Eltoft, T.: The Cauchy–Schwarz divergence and Parzen windowing: connections to graph theory and mercer kernels. J. Frankl. Inst. 343(6), 614–629 (2006)
Hilbert, D.: Über die gerade linie als kürzeste verbindung zweier punkte. Mathematische Annalen 46(1), 91–96 (1895)
Busemann, H.: The Geometry of Geodesics. Pure and Applied Mathematics, vol. 6. Elsevier Science, Amsterdam (1955)
de la Harpe, P.: On Hilbert’s metric for simplices. Geometric Group Theory, vol. 1, pp. 97–118. Cambridge University Press, Cambridge (1991)
Lemmens, B., Nussbaum, R.: Birkhoff’s version of Hilbert’s metric and its applications in analysis. Handbook of Hilbert Geometry, pp. 275–303 (2014)
Richter-Gebert, J.: Perspectives on Projective Geometry: A Guided Tour Through Real and Complex Geometry. Springer, Berlin (2011)
Bi, Y., Fan, B., Wu, F.: Beyond Mahalanobis metric: Cayley–Klein metric learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2339–2347 (2015)
Nielsen, F., Muzellec, B., Nock, R.: Classification with mixtures of curved Mahalanobis metrics. In: IEEE International Conference on Image Processing (ICIP), pp. 241–245 (2016)
Nielsen, F., Muzellec, B., Nock, R.: Large margin nearest neighbor classification using curved Mahalanobis distances (2016). arXiv:1609.07082 [cs.LG]
Stillwell, J.: Ideal elements in Hilbert’s geometry. Perspect. Sci. 22(1), 35–55 (2014)
Bernig, A.: Hilbert geometry of polytopes. Archiv der Mathematik 92(4), 314–324 (2009)
Nielsen, F., Sun, K.: Clustering in Hilbert simplex geometry (2017). arXiv:1704.00454
Nielsen, F., Shao, L.: On balls in a polygonal Hilbert geometry. In: 33rd International Symposium on Computational Geometry (SoCG 2017). Schloss Dagstuhl–Leibniz-Zentrum für Informatik (2017)
Laidlaw, D.H., Weickert, J.: Visualization and Processing of Tensor Fields: Advances and Perspectives. Mathematics and Visualization. Springer, Berlin (2009)
Lemmens, B., Walsh, C.: Isometries of polyhedral Hilbert geometries. J. Topol. Anal. 3(02), 213–241 (2011)
Condat, L.: Fast projection onto the simplex and the \(\ell _1\) ball. Math. Program. 158(1–2), 575–585 (2016)
Park, P.S.: Regular polytopic distances. Forum Geom. 16, 227–232 (2016)
Boissonnat, J.D., Sharir, M., Tagansky, B., Yvinec, M.: Voronoi diagrams in higher dimensions under certain polyhedral distance functions. Discret. Comput. Geom. 19(4), 485–519 (1998)
Bengtsson, I., Zyczkowski, K.: Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press, Cambridge (2017)
Nielsen, F.: Cramér–Rao lower bound and information geometry. Connected at Infinity II, pp. 18–37. Springer, Berlin (2013)
Chapman, D.G.: Minimum variance estimation without regularity assumptions. Ann. Math. Stat. 22(4), 581–586 (1951)
Hammersley, J.M.: On estimating restricted parameters. J. R. Stat. Soc. Ser. B (Methodol.) 12(2), 192–240 (1950)
Nielsen, F., Sun, K.: On Hölder projective divergences. Entropy 19(3), 122 (2017)
Nielsen, F., Nock, R.: Further heuristics for \(k\)-means: the merge-and-split heuristic and the \((k,l)\)-means. arXiv:1406.6314 (2014)
Bachem, O., Lucic, M., Hassani, S.H., Krause, A.: Approximate \(k\)-means++ in sublinear time. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 1459–1467 (2016)
Nielsen, F., Nock, R.: Total Jensen divergences: definition, properties and \(k\)-means++ clustering (2013). arXiv:1309.7109 [cs.IT]
Ackermann, M.R., Blömer, J.: Bregman clustering for separable instances. Scandinavian Workshop on Algorithm Theory, pp. 212–223. Springer, Berlin (2010)
Manthey, B., Röglin, H.: Worst-case and smoothed analysis of \(k\)-means clustering with Bregman divergences. J. Comput. Geom. 4(1), 94–132 (2013)
Endo, Y., Miyamoto, S.: Spherical \(k\)-means++ clustering. Modeling Decisions for Artificial Intelligence, pp. 103–114. Springer, Berlin (2015)
Nielsen, F., Nock, R., Amari, S.: On clustering histograms with \(k\)-means by using mixed \(\alpha \)-divergences. Entropy 16(6), 3273–3301 (2014)
Brandenberg, R., König, S.: No dimension-independent core-sets for containment under homothetics. Discret. Comput. Geom. 49(1), 3–21 (2013)
Panigrahy, R.: Minimum enclosing polytope in high dimensions (2004). arXiv:cs/0407020 [cs.CG]
Saha, A., Vishwanathan, S., Zhang, X.: New approximation algorithms for minimum enclosing convex shapes. In: ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1146–1160 (2011)
Nielsen, F., Nock, R.: On the smallest enclosing information disk. Inf. Process. Lett. 105(3), 93–97 (2008)
Sharir, M., Welzl, E.: A combinatorial bound for linear programming and related problems. STACS 92, 567–579 (1992)
Welzl, E.: Smallest enclosing disks (balls and ellipsoids). New Results and New trends in Computer Science, pp. 359–370. Springer, Berlin (1991)
Nielsen, F., Nock, R.: Approximating smallest enclosing balls with applications to machine learning. Int. J. Comput. Geom. Appl. 19(05), 389–414 (2009)
Arnaudon, M., Nielsen, F.: On approximating the Riemannian \(1\)-center. Comput. Geom. 46(1), 93–104 (2013)
Bâdoiu, M., Clarkson, K.L.: Smaller core-sets for balls. In: ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 801–802 (2003)
Nielsen, F., Hadjeres, G.: Approximating covering and minimum enclosing balls in hyperbolic geometry. International Conference on Networked Geometric Science of Information, pp. 586–594. Springer, Cham (2015)
Bădoiu, M., Clarkson, K.L.: Optimal core-sets for balls. Comput. Geom. 40(1), 14–22 (2008)
Bachem, O., Lucic, M., Krause, A.: Scalable and distributed clustering via lightweight coresets (2017). arXiv:1702.08248 [stat.ML]
Nielsen, F., Nock, R.: On approximating the smallest enclosing Bregman balls. In: Proceedings of the Twenty-Second Annual Symposium on Computational Geometry, pp. 485–486. ACM (2006)
Nock, R., Nielsen, F.: Fitting the smallest enclosing Bregman ball. ECML, pp. 649–656. Springer, Berlin (2005)
Deza, M., Sikirić, M.D.: Voronoi polytopes for polyhedral norms on lattices. Discret. Appl. Math. 197, 42–52 (2015)
Körner, M.C.: Minisum hyperspheres, Springer Optimization and Its Applications, vol. 51. Springer, New York (2011)
Reem, D.: The geometric stability of Voronoi diagrams in normed spaces which are not uniformly convex (2012). arXiv:1212.1094 [cs.CG]
Foertsch, T., Karlsson, A.: Hilbert metrics and Minkowski norms. J. Geom. 83(1–2), 22–31 (2005)
Cencov, N.N.: Statistical Decision Rules and Optimal Inference. Translations of Mathematical Monographs, vol. 53. American Mathematical Society, Providence (2000)
Cena, A.: Geometric structures on the non-parametric statistical manifold. Ph.D. thesis, University of Milano (2002)
Shen, Z.: Riemann-Finsler geometry with applications to information geometry. Chin. Ann. Math. Ser. B 27(1), 73–94 (2006)
Khosravifard, M., Fooladivanda, D., Gulliver, T.A.: Confliction of the convexity and metric properties in \(f\)-divergences. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 90(9), 1848–1853 (2007)
Dowty, J.G.: Chentsov’s theorem for exponential families (2017). arXiv:1701.08895 [math.ST]
Doup, T.M.: Simplicial Algorithms on the Simplotope, vol. 318. Springer Science & Business Media, Berlin (2012)
Vernicos, C.: Introduction aux géométries de Hilbert. Séminaire de théorie spectrale et géométrie 23, 145–168 (2004)
Arnaudon, M., Nielsen, F.: Medians and means in Finsler geometry. LMS J. Comput. Math. 15, 23–37 (2012)
Papadopoulos, A., Troyanov, M.: From Funk to Hilbert geometry (2014). arXiv:1406.6983 [math.MG]
Appendices
8 Isometry of Hilbert Simplex Geometry to a Normed Vector Space
Consider the Hilbert simplex metric space \((\varDelta ^d,\rho _\mathrm {HG})\) where \(\varDelta ^d\) denotes the d-dimensional open probability simplex and \(\rho _\mathrm {HG}\) the Hilbert cross-ratio metric. Let us recall the isometry ([25], 1991) of the open standard simplex to a normed vector space \((V^d,\Vert \cdot \Vert _\mathrm {NH})\). Let \(V^d=\{v\in \mathbb {R}^{d+1} \ :\ \sum _i v^i=0\}\) denote the d-dimensional vector space sitting in \(\mathbb {R}^{d+1}\). Map a point \(p=(\lambda ^0,\ldots ,\lambda ^{d})\in \varDelta ^d\) to a point \(v(p)=(v^0,\ldots , v^{d})\in V^d\) as follows:

\(v^i = \log \lambda ^i - \frac{1}{d+1}\sum _{j=0}^{d} \log \lambda ^j, \quad i=0,\ldots ,d.\)
We define the corresponding norm \(\Vert \cdot \Vert _\mathrm {NH}\) in \(V^d\) by considering the shape of its unit ball \(B_V=\{v\in V^d \ :\ |v^i-v^j|\le 1, \forall i\not =j\}\). The unit ball \(B_V\) is a symmetric convex set containing the origin in its interior, and thus yields a polytope norm \(\Vert \cdot \Vert _\mathrm {NH}\) (Hilbert norm) with \(2\left( {\begin{array}{c}d+1\\ 2\end{array}}\right) =d(d+1)\) facets. Reciprocally, let us notice that a norm induces a unit ball centered at the origin that is convex and symmetric around the origin.
The distance in the normed vector space between \(v\in V^d\) and \(v'\in V^d\) is defined by:

\(\rho _V(v,v') = \Vert v'-v\Vert _\mathrm {NH} = \min \{\lambda \ge 0 \ :\ v' \in v\oplus \lambda B_V\},\)

where \(A\oplus B=\{a+b \ :\ a\in A,b\in B\}\) is the Minkowski sum.
The reverse map from the normed space \(V^d\) to the probability simplex \(\varDelta ^d\) is given by the softmax:

\(\lambda ^i = \frac{\exp (v^i)}{\sum _{j=0}^{d} \exp (v^j)}, \quad i=0,\ldots ,d.\)
Thus we have \((\varDelta ^d,\rho _\mathrm {HG})\cong (V^d,\Vert \cdot \Vert _\mathrm {NH})\). In 1D, \((V^1,\Vert \cdot \Vert _\mathrm {NH})\) is isometric to the Euclidean line.
Note that computing the distance in the normed vector space naively requires \(O(d^2)\) time (one term per pair of coordinates); since the unit ball constraints are \(|v^i-v^j|\le 1\), the norm evaluates to \(\Vert v\Vert _\mathrm {NH} = \max _i v^i - \min _i v^i\), which can be computed in O(d) time.
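The isometry can be checked numerically. The following is a minimal sketch, assuming the log-centering map \(v^i=\log \lambda ^i - \frac{1}{d+1}\sum _j \log \lambda ^j\), its softmax inverse, and the variation form \(\max _i v^i - \min _i v^i\) of the polytope norm:

```python
import math

def to_normed_space(p):
    # Log-centering map p in the open simplex -> v in V^d (zero-sum coordinates).
    logs = [math.log(x) for x in p]
    mean = sum(logs) / len(logs)
    return [l - mean for l in logs]

def from_normed_space(v):
    # Softmax: the inverse map V^d -> open simplex.
    exps = [math.exp(x) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def norm_nh(v):
    # Polytope norm with unit ball {|v^i - v^j| <= 1}: max minus min, O(d) time.
    return max(v) - min(v)

def hilbert_simplex_distance(p, q):
    # Hilbert cross-ratio metric on the open simplex.
    ratios = [pi / qi for pi, qi in zip(p, q)]
    return math.log(max(ratios) / min(ratios))

p, q = (0.2, 0.3, 0.5), (0.1, 0.6, 0.3)
vp, vq = to_normed_space(p), to_normed_space(q)
d_norm = norm_nh([a - b for a, b in zip(vp, vq)])
# The isometry: normed distance equals the Hilbert simplex distance,
# and the maps are mutually inverse.
print(abs(d_norm - hilbert_simplex_distance(p, q)) < 1e-12)   # True
print(max(abs(a - b) for a, b in zip(from_normed_space(vp), p)) < 1e-12)  # True
```

The centering constant cancels in differences, which is why the variation norm of \(v(p)-v(q)\) recovers \(\log \max _i (p_i/q_i) - \log \min _i (p_i/q_i)\).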
Unfortunately, the norm \(\Vert \cdot \Vert _\mathrm {NH}\) does not satisfy the parallelogram law.Footnote 3 Notice that a norm satisfying the parallelogram law can be associated with an inner product via the polarization identity. Thus the normed vector space isometric to the Hilbert simplex geometry is not equipped with an inner product. However, all norms in a finite-dimensional space are equivalent. This implies that in finite dimension, \((\varDelta ^d,\rho _\mathrm {HG})\) is quasi-isometric to the Euclidean space \(\mathbb {R}^d\). An example of Hilbert geometry in infinite dimension is reported in [25]. Hilbert geometries are not CAT(0) spaces except when \(\mathscr {C}\) is an ellipsoid [76].
9 Hilbert Geometry with Finslerian/Riemannian Structures
In a Riemannian geometry, each tangent plane \(T_pM\) of a d-dimensional manifold M is identified with \(\mathbb {R}^d\): \(T_pM\simeq \mathbb {R}^d\). The inner product at each tangent plane \(T_pM\) can be visualized as an ellipsoid, a convex symmetric object centered at point p. In a Finslerian geometry, a norm \(\Vert \cdot \Vert _p\) is defined in each tangent plane \(T_pM\), and this norm is visualized as a symmetric convex object with non-empty interior. Finslerian geometry thus generalizes Riemannian geometry by allowing generic symmetric convex objects instead of ellipsoids for inducing norms at each tangent plane. Any Hilbert geometry induced by a compact convex domain \(\mathscr {C}\) can be expressed as an equivalent Finslerian geometry by defining the norm in \(T_p\) at p as follows [76]:

\(F_\mathscr {C}(p,v) = \Vert v\Vert \left( \frac{1}{\Vert p-p^+\Vert } + \frac{1}{\Vert p-p^-\Vert }\right) ,\)

where \(F_\mathscr {C}\) is the Finsler metric, \(\Vert \cdot \Vert \) is an arbitrary norm on \(\mathbb {R}^d\), and \(p^+\) and \(p^-\) are the intersection points of the boundary \(\partial \mathscr {C}\) with the line passing through p with direction v.
A geodesic \(\gamma \) in a Finslerian geometry satisfies:

\(\rho (\gamma (t_1),\gamma (t_3)) = \rho (\gamma (t_1),\gamma (t_2)) + \rho (\gamma (t_2),\gamma (t_3)), \quad \forall t_1\le t_2\le t_3.\)
In \(T_pM\), a ball of center c and radius r is defined by:

\(B(c,r) = \{ v\in T_pM \ :\ F_\mathscr {C}(p, v-c)\le r\} .\)
Thus any Hilbert geometry induces an equivalent Finslerian geometry, and since Finslerian geometries include Riemannian geometries, one may wonder which Hilbert geometries induce Riemannian structures. The only Riemannian geometries induced by Hilbert geometries are the hyperbolic Cayley–Klein geometries [27, 29, 30], obtained when the domain \(\mathscr {C}\) is an ellipsoid. The Finslerian modeling of information geometry has been studied in [71, 72].
There is no canonical way of defining measures in a Hilbert geometry since Hilbert geometries are Finslerian but not necessarily Riemannian geometries [76]. The Busemann measure is defined with respect to the Lebesgue measure \(\lambda \) of \(\mathbb {R}^d\): let \(B_p\) denote the unit ball with respect to the Finsler norm at point \(p\in \mathscr {C}\), and \(B_e\) the Euclidean unit ball. Then the Busemann measure of a Borel set \(\mathscr {B}\) is defined by [76]:

\(\mu _\mathscr {C}(\mathscr {B}) = \int _{\mathscr {B}} \frac{\lambda (B_e)}{\lambda (B_p)} \mathrm {d}\lambda (p).\)
The existence and uniqueness of center points of a probability measure in Finsler geometry have been investigated in [77].
10 Bounding Hilbert Norm with Other Norms
Let us first show that \(\Vert v\Vert _\mathrm {NH}\le \beta _{d} \Vert v\Vert _2\). Let \(v=\sum _{i=0}^{d} e_i x_i\), where \(\{e_i\}\) is the natural basis of \(\mathbb {R}^{d+1}\). We have:

\(\Vert v\Vert _\mathrm {NH} = \left\Vert \sum _{i=0}^{d} x_i e_i\right\Vert _\mathrm {NH} \le \sum _{i=0}^{d} |x_i|\, \Vert e_i\Vert _\mathrm {NH} \le \sqrt{\sum _{i=0}^{d} x_i^2}\ \sqrt{\sum _{i=0}^{d} \Vert e_i\Vert _\mathrm {NH}^2},\)

where the first inequality comes from the triangle inequality, and the second inequality is the Cauchy–Schwarz inequality. Thus we have:

\(\Vert v\Vert _\mathrm {NH}\le \beta _d \Vert v\Vert _2,\)

with \(\beta _d=\sqrt{d+1}\) since \(\Vert e_i\Vert _\mathrm {NH}\le 1\).
Let \(\alpha _{d,c}=\min _{\{v \ :\ \Vert v\Vert _c = 1\}} \Vert v\Vert _\mathrm {NH}\). Consider \(u=\frac{v}{\Vert v\Vert _c}\). Then \(\Vert u\Vert _c=1\) so that \(\Vert v\Vert _\mathrm {NH}\ge \alpha _{d,c} \Vert v\Vert _c\). To find \(\alpha _d\), we consider the unit \(\ell _2\) ball in \(V^d\), and find the smallest \(\lambda >0\) so that \(\lambda B_V\) fully contains the Euclidean ball (Fig. 14).
Therefore, we have overall:

\(\alpha _d \Vert v\Vert _2 \le \Vert v\Vert _\mathrm {NH} \le \beta _d \Vert v\Vert _2.\)
In general, note that we may consider two arbitrary norms \(\Vert \cdot \Vert _l\) and \(\Vert \cdot \Vert _u\) so that:

\(\alpha _{d,l} \Vert v\Vert _l \le \Vert v\Vert _\mathrm {NH} \le \beta _{d,u} \Vert v\Vert _u.\)
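A quick randomized sanity check of the \(\ell _2\) upper bound; this sketch assumes the variation form \(\max _i v^i - \min _i v^i\) of \(\Vert \cdot \Vert _\mathrm {NH}\) and uses the constant \(\beta _d=\sqrt{d+1}\) derived above:

```python
import math
import random

def norm_nh(v):
    # Polytope norm of V^d (unit ball {|v^i - v^j| <= 1}): max minus min coordinate.
    return max(v) - min(v)

random.seed(0)
d = 9  # vectors live in V^d, embedded in R^{d+1}
beta = math.sqrt(d + 1)
for _ in range(10_000):
    v = [random.gauss(0.0, 1.0) for _ in range(d + 1)]
    mean = sum(v) / (d + 1)
    v = [x - mean for x in v]  # project onto the zero-sum subspace V^d
    l2 = math.sqrt(sum(x * x for x in v))
    assert norm_nh(v) <= beta * l2 + 1e-12
print("upper bound verified on 10000 random samples")
```

Sampling of course only falsifies, never proves, the bound; here it simply illustrates the slack of \(\beta _d\) (the variation norm never exceeds \(\sqrt{2}\,\Vert v\Vert _2\) in practice, well below \(\sqrt{d+1}\,\Vert v\Vert _2\)).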
11 Funk Directed Metrics and Funk Balls
The Funk metric [78] with respect to a convex domain \(\mathscr {C}\) is defined by

\(F_\mathscr {C}(x,y) = \log \frac{\Vert x-a\Vert }{\Vert y-a\Vert },\)

where a is the intersection of the domain boundary and the affine ray R(x, y) starting from x and passing through y. Correspondingly, the reverse Funk metric is

\(F_\mathscr {C}(y,x) = \log \frac{\Vert y-b\Vert }{\Vert x-b\Vert },\)

where b is the intersection of R(y, x) with the boundary. The Funk metric is not a metric distance since it is asymmetric: \(F_\mathscr {C}(x,y)\not =F_\mathscr {C}(y,x)\) in general.
The Hilbert metric is simply the symmetrization of the Funk metric and its reverse:

\(\rho _\mathrm {HG}(x,y) = F_\mathscr {C}(x,y) + F_\mathscr {C}(y,x) = \log \frac{\Vert x-a\Vert \,\Vert y-b\Vert }{\Vert y-a\Vert \,\Vert x-b\Vert }.\)
It is interesting to explore clustering based on the Funk geometry, which we leave as future work.
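On the open probability simplex, the Funk directed distance admits the closed form \(F(x,y)=\log \max _i (x_i/y_i)\) (a known formula, stated here under the chapter's log cross-ratio convention for \(\rho _\mathrm {HG}\)). A minimal sketch of the directed distances and their symmetrization:

```python
import math

def funk(x, y):
    # Funk directed distance on the open probability simplex:
    # F(x, y) = log max_i (x_i / y_i); asymmetric in general.
    return math.log(max(xi / yi for xi, yi in zip(x, y)))

def hilbert(x, y):
    # Symmetrization: Funk + reverse Funk yields the Hilbert simplex distance.
    return funk(x, y) + funk(y, x)

x, y = (0.2, 0.3, 0.5), (0.4, 0.4, 0.2)
print(funk(x, y), funk(y, x))  # directed distances differ
# Consistency with the log cross-ratio form of the Hilbert distance:
ratios = [xi / yi for xi, yi in zip(x, y)]
assert abs(hilbert(x, y) - math.log(max(ratios) / min(ratios))) < 1e-12
```

Since \(\max _i (y_i/x_i) = 1/\min _i (x_i/y_i)\), the sum of the two directed distances collapses to \(\log \frac{\max _i (x_i/y_i)}{\min _i (x_i/y_i)}\).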
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this chapter
Nielsen, F., Sun, K. (2019). Clustering in Hilbert’s Projective Geometry: The Case Studies of the Probability Simplex and the Elliptope of Correlation Matrices. In: Nielsen, F. (eds) Geometric Structures of Information. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02520-5_11
Print ISBN: 978-3-030-02519-9
Online ISBN: 978-3-030-02520-5