Abstract
In topological data analysis, persistent homology characterizes robust topological features in data, and it has a summary representation called a persistence diagram. Statistical research on persistence diagrams has been actively developed, and the persistence weighted kernel shows several advantages over other statistical methods for persistence diagrams. If data are drawn from some probability distribution, the corresponding persistence diagram is random. Then, the expectation of the persistence diagram by the persistence weighted kernel is well-defined. In this paper, we study relationships between a probability distribution and the persistence weighted kernel from the viewpoint of (1) the strong law of large numbers and the central limit theorem, (2) a confidence interval to estimate the expectation of the persistence weighted kernel numerically, and (3) the stability theorem to ensure the continuity of the map from a probability distribution to the expectation. In numerical experiments, we demonstrate that our method gives an interesting counterexample to a common view in topological data analysis.
Notes
This was originally called the persistence weighted Gaussian kernel in [26, 27] because we mainly focused on the Gaussian kernel \(k(x,y)=\exp (- \left\| x-y\right\| ^{2}/2\sigma ^{2}) ~ (\sigma >0)\) as the positive definite kernel, but the framework can be generalized to other positive definite kernels. Hence, we drop the word “Gaussian” here.
A subset \(\mathbf {J}\subset \mathbb {R}\) is said to be an interval if, for any \(a,c \in \mathbf {J}\), every \(b \in \mathbb {R}\) satisfying \(a< b < c\) is also in \(\mathbf {J}\).
A multiset is a set in which each point carries a multiplicity. Note that the collection of birth-death pairs should be a multiset because an interval decomposition of \(\mathbb {U}\) can contain several intervals with the same birth-death pair.
Precisely speaking, the definition of the birth and death time can contain \(-\infty \) and \(\infty \). However, in practice, we can assume that all birth and death times take neither \(\infty \) nor \(-\infty \) (for more details, please see Section 2.1.2 in [27]).
By considering infinite multiplicity of the diagonal set \({\varDelta }\), there always exists a multi-bijection from \(D \cup {\varDelta }\) to \(E \cup {\varDelta }\). The bottleneck distance is also called the \(\infty \)-Wasserstein distance.
This is also shown in Corollary 4 of [2].
\(B^{*}\) is the set of all continuous linear real-valued functions \(f:B \rightarrow \mathbb {R}\).
We call \(V:\varOmega \rightarrow B\) Radon if, for any \({\varepsilon }>0\), there exists a compact set K in the Borel \(\sigma \)-algebra of B such that \({\mathrm {Pr}}(V \in K) \ge 1-{\varepsilon }\). For the proof of \(\left\| \mathbb {E}[V]\right\| _{B} \le \mathbb {E}[\Vert V \Vert _{B}]\) and other details, please see Section 2.1 in [29].
We do not define the concept of type 2 in this paper because every Hilbert space is of type 2 and the Banach space used in this paper is a reproducing kernel Hilbert space. For more details, please see Section 9.2 in [29].
\(\mathcal {N}(\mu ,\sigma ^{2})\) denotes the normal distribution with mean \(\mu \in \mathbb {R}\) and variance \(\sigma ^{2}>0\).
Without loss of generality, we can make \(\log (T \mathop {\mathrm {Lip}}(k)\mathop {\mathrm {Bdd}}(k)^{-1/2})>0\) by taking a larger T or \(\mathop {\mathrm {Lip}}(k)\) if necessary. Note that \(- \log {\varepsilon }\ge 0\) for \({\varepsilon }\in (0,1]\) and \( \int _{0}^{1} \sqrt{- \log {\varepsilon }}d{\varepsilon }< \infty \).
A probability measure \(\pi \) on \(M \times M\) is called a coupling between \(\mu \) and \(\nu \) if, for the natural projections \(p_{i}(x_{1},x_{2})=x_{i} ~ (i=1,2, ~ x_{i} \in M)\), the induced measures satisfy \((p_{1})_{*}\pi =\mu \) and \((p_{2})_{*}\pi =\nu \).
In experiments, the quantile is estimated by random resampling from the union of \(\varvec{D}_{n}\) and \(\varvec{E}_{n}\), called the aggregated data, because the current hypothesis is \(P=Q\). For more details, please see Appendix A.
For comparison in machine learning tasks among the PWK vector, a persistence landscape, and the persistence scale-space kernel, please see [27].
For the reason of fixing the parameters in this way, please also see [27].
\(\alpha \) is the significance level. q is the dimension of persistence diagrams. m is the number of sampling to compute the type I error empirically. N is the number of trials to compute the type I error.
Note that \(\phi _{*}\mathcal {P}f^{k,w}_{z}\) is the value at z of the expectation of the PWK vector, that is, \(\phi _{*}\mathcal {P}f^{k,w}_{z}=\mathbb {E}_{X \sim \mathcal {P}}[V^{k,w}(D_{q}(\mathbb {B}(X)))](z)\).
References
Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., Ziegelmeier, L.: Persistence images: a stable vector representation of persistent homology. J. Mach. Learn. Res. 18(8), 1–35 (2017)
Berlinet, A., Thomas-Agnan, C.: Reproducing kernel Hilbert spaces in probability and statistics. Springer, Berlin (2011)
Bubenik, P.: Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16(1), 77–102 (2015)
Bubenik, P., Scott, J.A.: Categorification of persistent homology. Discret. Comput. Geom. 51(3), 600–627 (2014). https://doi.org/10.1007/s00454-014-9573-x
Cang, Z., Mu, L., Wu, K., Opron, K., Xia, K., Wei, G.W.: A topological approach for protein classification. Comput. Math. Biophys. 3(1), 140–162 (2015)
Carlsson, G., de Silva, V.: Zigzag persistence. Found. Comput. Math. 10(4), 367–405 (2010)
Carrière, M., Cuturi, M., Oudot, S.: Sliced Wasserstein kernel for persistence diagrams. In: D. Precup, Y.W. Teh (eds.) Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 70, pp. 664–673. PMLR, International Convention Centre, Sydney, Australia (2017). http://proceedings.mlr.press/v70/carriere17a.html
Chazal, F., Fasy, B., Lecci, F., Michel, B., Rinaldo, A., Wasserman, L.: Subsampling methods for persistent homology. In: F. Bach, D. Blei (eds.) Proceedings of the 32nd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 37, pp. 2143–2151. PMLR, Lille, France (2015). http://proceedings.mlr.press/v37/chazal15.html
Chazal, F., Fasy, B.T., Lecci, F., Rinaldo, A., Singh, A., Wasserman, L.: On the bootstrap for persistence diagrams and landscapes. Model. Anal. Inf. Syst. 20(6), 111–120 (2013)
Chazal, F., Fasy, B.T., Lecci, F., Rinaldo, A., Wasserman, L.: Stochastic convergence of persistence landscapes and silhouettes. In: Proceedings of the Thirtieth Annual Symposium on Computational Geometry, SOCG’14, pp. 474–483. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2582112.2582128
Chazal, F., de Silva, V., Oudot, S.: Persistence stability for geometric complexes. Geom. Dedic. 173(1), 193–214 (2014). https://doi.org/10.1007/s10711-013-9937-z
Cohen-Steiner, D., Edelsbrunner, H., Harer, J.: Stability of persistence diagrams. Discret. Comput. Geom. 37(1), 103–120 (2007). https://doi.org/10.1007/s00454-006-1276-5
Crawley-Boevey, W.: Decomposition of pointwise finite-dimensional persistence modules. J. Algebra Appl. 14(05), 1550066 (2015)
Donatini, P., Frosini, P., Lovato, A.: Size functions for signature recognition. Proceedings of SPIE - The International Society for Optical Engineering 3454 (1998)
Durrett, R.: Probability: Theory and Examples. Cambridge University Press, Cambridge (2010)
Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. Discret. Comput. Geom. 28(4), 511–533 (2002). https://doi.org/10.1007/s00454-002-2885-2
Gameiro, M., Hiraoka, Y., Izumi, S., Kramar, M., Mischaikow, K., Nanda, V.: A topological measurement of protein compressibility. Jpn J. Ind. Appl. Math. 32(1), 1–17 (2015)
Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.J.: A kernel method for the two-sample-problem. In: B. Schölkopf, J.C. Platt, T. Hoffman (eds.) Advances in Neural Information Processing Systems 19, pp. 513–520. MIT Press (2007). http://papers.nips.cc/paper/3110-a-kernel-method-for-the-two-sample-problem.pdf
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(Mar), 723–773 (2012)
Gretton, A., Fukumizu, K., Harchaoui, Z., Sriperumbudur, B.K.: A fast, consistent kernel two-sample test. In: Y. Bengio, D. Schuurmans, J.D. Lafferty, C.K.I. Williams, A. Culotta (eds.) Advances in Neural Information Processing Systems 22, pp. 673–681. Curran Associates, Inc. (2009). http://papers.nips.cc/paper/3738-a-fast-consistent-kernel-two-sample-test.pdf
Hatcher, A.: Algebraic Topology. Cambridge University Press, Cambridge (2002)
Hiraoka, Y., Kusano, G.: Relative interleavings and applications to sensor networks. Jpn. J. Ind. Appl. Math. 33, 1–22 (2016)
Hiraoka, Y., Nakamura, T., Hirata, A., Escolar, E.G., Matsue, K., Nishiura, Y.: Hierarchical structures of amorphous solids characterized by persistent homology. Proc. Natl. Acad. Sci. 113(26), 7035–7040 (2016). https://doi.org/10.1073/pnas.1520877113
Kosorok, M.R.: Introduction to Empirical Processes and Semiparametric Inference. Springer, New York (2008). https://doi.org/10.1007/978-0-387-74978-5
Kramár, M., Levanger, R., Tithof, J., Suri, B., Xu, M., Paul, M., Schatz, M.F., Mischaikow, K.: Analysis of Kolmogorov flow and Rayleigh–Bénard convection using persistent homology. Phys. D Nonlinear Phenom. 334, 82–98 (2016)
Kusano, G., Hiraoka, Y., Fukumizu, K.: Persistence weighted gaussian kernel for topological data analysis. In: M.F. Balcan, K.Q. Weinberger (eds.) Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 2004–2013. PMLR, New York, New York, USA (2016). http://proceedings.mlr.press/v48/kusano16.html
Kusano, G., Fukumizu, K., Hiraoka, Y.: Kernel method for persistence diagrams via kernel embedding and weight factor. J. Mach. Learn. Res. 18(189), 1–41 (2018)
Kwitt, R., Huber, S., Niethammer, M., Lin, W., Bauer, U.: Statistical topological data analysis - a kernel perspective. In: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama, R. Garnett (eds.) Advances in Neural Information Processing Systems 28, pp. 3070–3078. Curran Associates, Inc. (2015). http://papers.nips.cc/paper/5887-statistical-topological-data-analysis-a-kernel-perspective.pdf
Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes, vol. 23. Springer, New York (2013). https://doi.org/10.1007/978-3-642-20212-4
Matérn, B.: Spatial variation: stochastic models and their application to some problems in forest surveys and other sampling investigations. Meddelanden Fran Statens Skogsforskningsinstitut 49(5), 1–144 (1960)
Nakamura, T., Hiraoka, Y., Hirata, A., Escolar, E.G., Nishiura, Y.: Persistent homology and many-body atomic structure for medium-range order in the glass. Nanotechnology 26, 304001 (2015)
Paulsen, V.I., Raghupathi, M.: An Introduction to the Theory of Reproducing Kernel Hilbert Spaces, vol. 152. Cambridge University Press, Cambridge (2016)
Reininghaus, J., Huber, S., Bauer, U., Kwitt, R.: A stable multi-scale kernel for topological machine learning. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4741–4748 (2015). https://doi.org/10.1109/CVPR.2015.7299106
Robins, V., Turner, K.: Principal component analysis of persistent homology rank functions with case studies of spatial point patterns, sphere packing and colloids. Phys. D Nonlinear Phenom. 334, 99–117 (2016)
Saadatfar, M., Takeuchi, H., Robins, V., Francois, N., Hiraoka, Y.: Pore configuration landscape of granular crystallization. Nat. Commun. 8, 15082 (2017). https://doi.org/10.1038/ncomms15082
de Silva, V., Ghrist, R.: Coverage in sensor networks via persistent homology. Algebraic Geom. Topol. 7(1), 339–358 (2007)
Skraba, P., Ovsjanikov, M., Chazal, F., Guibas, L.: Persistence-based segmentation of deformable shapes. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, pp. 45–52 (2010). https://doi.org/10.1109/CVPRW.2010.5543285
Van der Vaart, A.: Asymptotic Statistics, vol. 3. Cambridge University Press, Cambridge (1998). https://doi.org/10.1017/CBO9780511802256
Zomorodian, A., Carlsson, G.: Computing persistent homology. Discret. Comput. Geom. 33(2), 249–274 (2005). https://doi.org/10.1007/s00454-004-1146-y
Acknowledgements
The author wishes to express sincere gratitude to Yasuaki Hiraoka, Tomoyuki Shirai, and Emerson Escolar for valuable discussions and comments on this paper. This work is supported by JSPS Research Fellow (17J02401).
Appendix A: Kernel two sample test
In this section, we briefly review the kernel two sample test, following [18, 19].
1.1 General framework of the two sample test
Let \((\mathcal {X}, \mathcal {B}_{\mathcal {X}})\) be a topological space, P and Q be probability distributions on \(\mathcal {X}\), \(X_{1},\ldots ,X_{m}, {\mathrm {i.i.d.}}\sim P\), \(Y_{1},\ldots , Y_{n}, {\mathrm {i.i.d.}}\sim Q\), and \(\theta (\varvec{X}_{m}, \varvec{Y}_{n})\) be a statistic. The two sample test examines the null hypothesis \(H_{0}: P=Q\). If \(H_{0}\) is true, the distribution of the statistic \(\theta (\varvec{X}_{m}, \varvec{Y}_{n})\) depends only on P because \(Y_{1},\ldots , Y_{n}, {\mathrm {i.i.d.}}\sim Q=P\). Here, we assume that the upper \(\alpha \)-quantile \({\hat{\xi }}_{m,n,\alpha }\), which satisfies \({\mathrm {Pr}}(\theta (\varvec{X}_{m}, \varvec{Y}_{n}) \le {\hat{\xi }}_{m,n,\alpha })=1-\alpha \), is computable when \(H_{0}\) is true. When \(\alpha \) is set to a small value, if a pair of realizations \((\varvec{x}_{m}, \varvec{y}_{n})\) of \((\varvec{X}_{m},\varvec{Y}_{n})\) satisfies \({\hat{\xi }}_{m,n, 1-\alpha /2} \le \theta (\varvec{x}_{m}, \varvec{y}_{n}) \le {\hat{\xi }}_{m,n,\alpha /2}\), we accept \(H_{0}\). Otherwise, we reject \(H_{0}\). The threshold \(\alpha \) is called the significance level, and \(\alpha =0.01\) or 0.05 is often used.
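The decision rule above can be illustrated with a minimal Python sketch. It assumes that draws of the statistic under \(H_{0}\) are already available as an array; the function name `accept_null` and the use of NumPy empirical quantiles are illustrative choices, not part of the original text.

```python
import numpy as np

def accept_null(theta_obs, null_samples, alpha=0.05):
    """Two-sided rule: accept H0 if xi_{1-alpha/2} <= theta_obs <= xi_{alpha/2}.

    null_samples: draws of the statistic under H0 (assumed to be given).
    The upper a-quantile xi_a satisfies Pr(theta <= xi_a) = 1 - a, so it is
    the ordinary (1 - a)-quantile of the null distribution.
    """
    lower = np.quantile(null_samples, alpha / 2)      # xi_{1 - alpha/2}
    upper = np.quantile(null_samples, 1 - alpha / 2)  # xi_{alpha/2}
    return bool(lower <= theta_obs <= upper)

if __name__ == "__main__":
    # Hypothetical null distribution, used only for illustration.
    rng = np.random.default_rng(0)
    null_draws = rng.normal(size=10_000)
    print(accept_null(0.1, null_draws))  # a central value
    print(accept_null(5.0, null_draws))  # a value far in the upper tail
```

In practice the null draws come from an approximation of the null distribution, such as the spectral approximation reviewed in the next subsection.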
1.2 Kernel method for the two sample problem
Let k be a measurable positive definite kernel on \(\mathcal {X}\) satisfying \(\int _{\mathcal {X}}\int _{\mathcal {X}} k(x,y)^{2}dP(x)dQ(y) < \infty \). In [18, 19], the statistic \(\mathrm {MMD}_{u}(\varvec{X}_{n},\varvec{Y}_{n};k)^{2}\) is used for the two sample test, and its asymptotic distribution is given as follows:
Theorem 12
(Theorem 8 in [18], Theorem 12 in [19]) Under the null hypothesis \(H_{0}\), \(n\mathrm {MMD}_{u}(\varvec{X}_{n},\varvec{Y}_{n};k)^{2} \rightarrow _{\mathrm {d}}\sum _{i=1}^{\infty }\lambda _{i}(z_{i}^{2}-2)\) where \(z_{1}, z_{2}, \ldots , {\mathrm {i.i.d.}}\sim \mathcal {N}(0,2)\), \(\{\lambda _{i}\}_{i=1}^{\infty }\) are the solutions to the eigenvalue equation
\[ \int _{\mathcal {X}}{\tilde{k}}(x,x')\psi _{i}(x)dP(x)=\lambda _{i}\psi _{i}(x'), \qquad (21) \]
with eigenfunctions \(\psi _{i}\), and \({\tilde{k}}(x_{i},x_{j})=k(x_{i},x_{j})-\int _{\mathcal {X}}k(x_{i},x)dP(x)-\int _{\mathcal {X}}k(x,x_{j})dP(x)+\int _{\mathcal {X}}\int _{\mathcal {X}}k(x,x')dP(x)dP(x')\).
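For reference, the statistic in Theorem 12 can be written out explicitly. The form below is the unbiased estimator for two samples of equal size n as it is commonly stated in [19]. The auxiliary function h and the paired notation \(z_{i}=(x_{i},y_{i})\) are borrowed from that reference and are an assumption about notation here, not a quotation from this paper.

```latex
\[
\mathrm{MMD}_{u}(\varvec{X}_{n},\varvec{Y}_{n};k)^{2}
  = \frac{1}{n(n-1)} \sum_{i \neq j} h(z_{i},z_{j}),
\qquad
h(z_{i},z_{j}) = k(x_{i},x_{j}) + k(y_{i},y_{j}) - k(x_{i},y_{j}) - k(x_{j},y_{i}).
\]
```

Under \(H_{0}\) each term \(h(z_{i},z_{j})\) with \(i \ne j\) has expectation zero, which is why the limit in Theorem 12 is a centered sum.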
In order to obtain the distribution of \(\sum _{i=1}^{\infty }\lambda _{i}(z_{i}^{2}-2)\) numerically, we approximate the eigenvalues \(\{\lambda _{i}\}_{i=1}^{\infty }\) in Eq. (21). Let \(\tilde{\varvec{k}}\) denote the centered Gram matrix of \(\{x_{1},\ldots ,x_{n}\}\) whose (i, j) component is given by \((\tilde{\varvec{k}})_{i,j}=k(x_{i},x_{j})- n^{-1} \sum _{b=1}^{n}k(x_{i},x_{b})-n^{-1} \sum _{a=1}^{n}k(x_{a},x_{j}) + n^{-2} \sum _{a,b=1}^{n}k(x_{a},x_{b})\) and \(\{{\hat{\mu }}_{i}\}_{i=1}^{n}\) be the set of the eigenvalues of \(\tilde{\varvec{k}}\). Then, it follows from Theorem 1 in [20] that \(\sum _{i=1}^{n}{\hat{\lambda }}_{i}(z_{i}^{2}-2) \rightarrow _{\mathrm {d}}\sum _{i=1}^{\infty }\lambda _{i}(z_{i}^{2}-2)\) where \({\hat{\lambda }}_{i}=n^{-1}{\hat{\mu }}_{i}\). Therefore, the upper \(\alpha \)-quantile of \(n\mathrm {MMD}_{u}(\varvec{X}_{n}, \varvec{Y}_{n})^{2}\) is numerically obtained from the histogram of \(\sum _{i=1}^{n}{\hat{\lambda }}_{i}(z_{i}^{2}-2)\). Since the current null hypothesis is \(P=Q\), we have \(Y_{i} \sim P\) and we can approximate the eigenvalues on the aggregated data, that is, the eigenvalues are approximated by those of the centered Gram matrix of \(\{x_{1},\ldots ,x_{n},y_{1},\ldots ,y_{n}\}\). We estimate the quantile of \(\sum _{i=1}^{2n}{\hat{\lambda }}_{i}(z_{i}^{2}-2)\) by the (standard) bootstrap method. To sum up, the procedure for the kernel two sample test is summarized in Algorithm 3.
Since the output \({\hat{p}}\) of Algorithm 3 is the acceptance ratio of \(H_{0}\), \(1-{\hat{p}}\) is the type I error when \(H_{0}\) is true.
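The procedure described in this appendix can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 3. It assumes a Gaussian kernel on Euclidean samples, the equal-size unbiased estimator in the form stated in [19], eigenvalues of the aggregated centered Gram matrix scaled by the aggregated sample size 2n, direct Monte Carlo draws of the weighted sum in place of the bootstrap, and the two-sided acceptance rule of Section 1.1. All function names are hypothetical.

```python
import numpy as np

def gaussian_gram(a, b, sigma=1.0):
    """Gram matrix of the Gaussian kernel exp(-|x - y|^2 / (2 sigma^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd_u_squared(x, y, sigma=1.0):
    """Unbiased MMD^2 estimate for two samples of equal size n (form assumed from [19])."""
    n = x.shape[0]
    kxy = gaussian_gram(x, y, sigma)
    h = gaussian_gram(x, x, sigma) + gaussian_gram(y, y, sigma) - kxy - kxy.T
    np.fill_diagonal(h, 0.0)  # the estimator sums over i != j only
    return h.sum() / (n * (n - 1))

def null_samples(x, y, sigma=1.0, n_draws=2000, rng=None):
    """Draws of sum_i lambda_hat_i (z_i^2 - 2), with eigenvalues from the aggregated data."""
    rng = np.random.default_rng(rng)
    z = np.vstack([x, y])                       # aggregated data, size 2n
    m = z.shape[0]
    center = np.eye(m) - np.ones((m, m)) / m
    kc = center @ gaussian_gram(z, z, sigma) @ center  # centered Gram matrix
    mu = np.linalg.eigvalsh((kc + kc.T) / 2.0)
    lam = np.clip(mu, 0.0, None) / m            # assumption: scale by aggregated size
    normals = rng.normal(0.0, np.sqrt(2.0), size=(n_draws, m))  # z_i ~ N(0, 2)
    return (normals**2 - 2.0) @ lam

def kernel_two_sample_test(x, y, alpha=0.05, sigma=1.0, n_draws=2000, rng=None):
    """Return True if H0: P = Q is accepted by the two-sided rule."""
    stat = x.shape[0] * mmd_u_squared(x, y, sigma)
    draws = null_samples(x, y, sigma, n_draws, rng)
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return bool(lower <= stat <= upper)

def acceptance_ratio(draw_p, draw_q, n, trials=20, alpha=0.05, sigma=1.0, seed=0):
    """Fraction of trials in which H0 is accepted (the role played by p_hat)."""
    rng = np.random.default_rng(seed)
    accepted = sum(
        kernel_two_sample_test(draw_p(n, rng), draw_q(n, rng), alpha, sigma, rng=rng)
        for _ in range(trials)
    )
    return accepted / trials

if __name__ == "__main__":
    same = lambda n, rng: rng.normal(size=(n, 2))
    print("acceptance ratio under P = Q:", acceptance_ratio(same, same, n=50))
```

When both samplers draw from the same distribution, one minus the returned ratio is an empirical estimate of the type I error, matching the interpretation of \(1-{\hat{p}}\) above.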
About this article
Cite this article
Kusano, G. On the expectation of a persistence diagram by the persistence weighted kernel. Japan J. Indust. Appl. Math. 36, 861–892 (2019). https://doi.org/10.1007/s13160-019-00374-2
DOI: https://doi.org/10.1007/s13160-019-00374-2