Learning a Gaussian Process Model on the Riemannian Manifold of Non-decreasing Distribution Functions

  • Conference paper
  • In: PRICAI 2019: Trends in Artificial Intelligence (PRICAI 2019)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11671)

Abstract

In this work, we consider the problem of learning regression models from a finite set of functional objects. In particular, we introduce a novel framework for learning a Gaussian process model on the space of Strictly Non-decreasing Distribution Functions (SNDF). Gaussian processes (GPs) are well known to provide powerful tools for non-parametric regression and uncertainty estimation on vector spaces. Building on this, we define a Riemannian structure on the SNDF space and learn a GP model indexed by SNDFs. This formulation enables us to define an appropriate covariance function, extending the Matérn family of covariance functions. We also show how the full Gaussian process methodology, namely covariance parameter estimation and prediction, can be carried out on the SNDF space. The proposed method is tested in multiple simulations and validated on real-world data.
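
To make the construction concrete, the following minimal Python sketch (an illustration, not the authors' code) shows one way the covariance described above can be computed: each SNDF F is represented by its square-root density \(\phi = \sqrt{F'}\), which lies on the unit sphere of \(L^2([0,1])\); the log map at the constant function 1 sends \(\phi\) to a tangent vector; and a Matérn-type kernel is evaluated on the \(L^2\) distance between tangent vectors. Proposition 1 (proved in the appendix below) is what guarantees that the resulting matrix is a valid covariance matrix. The grid, the Matérn-3/2 form, and all parameter values are illustrative assumptions.

```python
import numpy as np

def sphere_log(phi, dt):
    """Log map at the constant function 1 on the unit sphere of L^2([0,1]).

    Standard sphere geometry: log_1(phi) = (theta/sin(theta)) (phi - cos(theta)),
    with theta = arccos(<1, phi>).
    """
    theta = np.arccos(np.clip(np.sum(phi) * dt, -1.0, 1.0))
    if theta < 1e-12:                       # phi is (numerically) the base point 1
        return np.zeros_like(phi)
    return (theta / np.sin(theta)) * (phi - np.cos(theta))

def matern32(d, sigma2=1.0, ell=0.5):
    """Matern-3/2 covariance as a function of a distance d (illustrative choice)."""
    a = np.sqrt(3.0) * d / ell
    return sigma2 * (1.0 + a) * np.exp(-a)

def cov_matrix(Fs, t):
    """K(||log_1(phi_i) - log_1(phi_j)||) for SNDFs F_i sampled on the grid t."""
    dt = t[1] - t[0]
    logs = [sphere_log(np.sqrt(np.maximum(np.gradient(F, t), 0.0)), dt) for F in Fs]
    n = len(logs)
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d = np.sqrt(np.sum((logs[i] - logs[j]) ** 2) * dt)   # L^2 distance
            C[i, j] = matern32(d)
    return C

t = np.linspace(0.0, 1.0, 500)
Fs = [t, t ** 2, np.sin(np.pi * t / 2)]     # three toy SNDFs on [0, 1]
print(cov_matrix(Fs, t))
```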

The authors thank the ANITI program (Artificial and Natural Intelligence Toulouse Institute) and the ANR project RISCOPE (Risk-based system for coastal flooding early warning). J.M. Loubes acknowledges funding from DEEL-IRT, and C. Samir acknowledges funding from CNRS Prime.



Author information

Correspondence to François Bachoc.

A Proofs

Proof (Proof of Proposition 1)

Let \(F_1,\dots ,F_n \in {{\mathcal {F}}}\). For \(i=1,\ldots ,n\), let \(g_i = \log _{1}( \phi _i )\). Consider the matrix \(\tilde{C}=(\langle g_i,g_j\rangle )_{i,j}\). This matrix is a Gram matrix in \({\mathbb {R}}^{n \times n}\); hence there exist a nonnegative diagonal matrix D and an orthogonal matrix P such that

$$\begin{aligned} \tilde{C}=PDP^{t} = PD^{1/2} D^{1/2}P^{t}. \end{aligned}$$

Let \(e_1,\dots ,e_n\) be the canonical basis of \({\mathbb {R}}^n\). Then \( e_i^t \tilde{C} e_j= u_i^t u_j \) where \(u_i^t=e_i^t P D^{1/2}\). Note that the \(u_i\)'s are vectors in \({\mathbb {R}}^n\) that depend on \(F_1,\dots ,F_n\). We get that

$$\begin{aligned} \langle g_i,g_j\rangle = u_i^t u_j, \end{aligned}$$

and for any \(F_1,\dots ,F_n\) in \( {{\mathcal {F}}}\) there are \(u_1,\dots ,u_n\) in \({\mathbb {R}}^n\) such that

$$\begin{aligned} \Vert \log _{1}( \phi _i) - \log _{1}( \phi _j)\Vert = \Vert u_i-u_j\Vert . \end{aligned}$$

So any covariance matrix of the form \([ K(\Vert \log _{1}( \phi _i) - \log _{1}( \phi _j) \Vert ) ]_{i,j}\) coincides with a covariance matrix \([ K(\Vert u_i-u_j\Vert ) ]_{i,j}\) on \({\mathbb {R}}^n\) and inherits its properties: the invertibility and non-negativity of the latter entail the invertibility and non-negativity of the former, which proves the result.
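
The key step of this proof, trading the functional objects \(g_i\) for Euclidean vectors \(u_i\) with the same pairwise distances, is easy to verify numerically. Below is a small sketch (an illustration, not part of the paper) in which random discretised vectors stand in for the \(\log_1(\phi_i)\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200
dt = 1.0 / m
g = rng.standard_normal((n, m))      # discretised stand-ins for g_i = log_1(phi_i)

# Gram matrix of L^2 inner products, as in the proof
C_tilde = g @ g.T * dt

# Spectral factorisation C_tilde = P D P^t (D >= 0 since C_tilde is a Gram matrix)
D, P = np.linalg.eigh(C_tilde)
U = P * np.sqrt(np.maximum(D, 0.0))  # row i of U is u_i^t = e_i^t P D^{1/2}

# The Euclidean vectors u_i reproduce the pairwise functional distances, so
# [K(||g_i - g_j||)]_{i,j} coincides with a matrix [K(||u_i - u_j||)]_{i,j} on R^n
for i in range(n):
    for j in range(n):
        d_fun = np.sqrt(np.sum((g[i] - g[j]) ** 2) * dt)   # ||g_i - g_j|| in L^2
        d_euc = np.linalg.norm(U[i] - U[j])                 # ||u_i - u_j|| in R^n
        assert abs(d_fun - d_euc) < 1e-8
print("pairwise distances match")
```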

Proof (Proof of Theorem 1)

Let \(\theta _1 , \theta _2 \in \varTheta \), with \(\theta _1 \ne \theta _2\). Then, there exists \(t^* \in [0, \pi /4] \) so that \(K_{\theta _1}(0) - K_{\theta _1}(2 t^*) \ne K_{ \theta _2 }(0) - K_{ \theta _2 }( 2t^*)\).

For \(i \in {\mathbb {N}}\), let \(c_i:[0,1] \rightarrow {\mathbb {R}}\) be defined by \(c_i(t) = t^* \cos (2 \pi i t)\). Then, \(c_i \in T_1({\mathcal {H}})\). Let \(\tilde{e}_i = \exp _{1}( c_i )\). Then, for \(t \in [0,1]\)

$$\begin{aligned} \tilde{e}_i(t) = \cos ( t^* ) + \frac{\sin ( t^* )}{t^*} t^* \cos (2 \pi i t) \ge \cos ( t^* ) - \sin (t^*) \ge 0. \end{aligned}$$

It follows that \(\tilde{e}_i \in {\mathcal {H}}\) and we can let \(\tilde{F}_i(t) = \int _{0}^t \tilde{e}_i(s)^2 ds\). Letting \(\bar{e}_i = \exp _{1}( -c_i )\), we obtain similarly that \(\bar{e}_i \in {\mathcal {H}}\) and we let \(\bar{F}_i(t) = \int _{0}^t \bar{e}_i(s)^2 ds\).
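
As a quick numerical sanity check on this construction (an illustration, not part of the paper), one can evaluate \(\tilde{e}_i(t) = \cos(t^*) + \sin(t^*)\cos(2\pi i t)\) on a grid and confirm the two properties used above: \(\tilde{e}_i\) is nonnegative for \(t^* \in [0, \pi/4]\), so each \(\tilde{F}_i\) is a non-decreasing function.

```python
import numpy as np

t_star = np.pi / 8                      # any value in [0, pi/4]
t = np.linspace(0.0, 1.0, 1000)

for i in range(1, 6):
    # e~_i = exp_1(c_i), as displayed above
    e = np.cos(t_star) + np.sin(t_star) * np.cos(2.0 * np.pi * i * t)
    assert e.min() >= np.cos(t_star) - np.sin(t_star) >= 0.0   # bound from the proof
    # F~_i(t) = int_0^t e~_i(s)^2 ds via the trapezoidal rule
    F = np.concatenate([[0.0], np.cumsum(0.5 * (e[1:] ** 2 + e[:-1] ** 2) * np.diff(t))])
    assert np.all(np.diff(F) >= 0.0)    # F~_i is non-decreasing
print("all e~_i are nonnegative and all F~_i are non-decreasing")
```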

Consider the 2n elements \((F_1,...,F_{2n})\) composed of the pairs \(( \tilde{F}_i,\bar{F}_i)\) for \(i=1,\dots ,n\). Consider a Gaussian process Z on \( {{\mathcal {F}}}\) with mean function zero and covariance function \(K_{ \theta _1 }\). Then, the Gaussian vector \(W = (Z(F_i))_{i=1,...,2n}\) has covariance matrix C given by

$$\begin{aligned} C_{i,j} = {\left\{ \begin{array}{ll} K_{ \theta _1 }(0) &{} \text{ if } i=j \\ K_{ \theta _1 }(2 t^*) &{} \text{ if } i \text{ odd } \text{ and } j=i+1 \\ K_{ \theta _1 }(2 t^*) &{} \text{ if } i \text{ even } \text{ and } j=i-1 \\ K_{ \theta _1 }( \sqrt{2} t^*) &{} \text{ else. } \end{array}\right. } \end{aligned}$$

Hence, we have \(C = D + M\) where M is the matrix with all components equal to \(K_{ \theta _1 }( \sqrt{2} t^*)\) and where D is block diagonal, composed of n blocks of size \(2 \times 2\), with each block \(B_{2,2}\) equal to

$$\begin{aligned} \begin{pmatrix} K_{ \theta _1 }(0) - K_{ \theta _1 }( \sqrt{2} t^*) &{} K_{ \theta _1 }(2 t^*) - K_{ \theta _1 }( \sqrt{2} t^*) \\ K_{ \theta _1 }(2 t^*) - K_{ \theta _1 }( \sqrt{2} t^*) &{} K_{ \theta _1 }(0) - K_{ \theta _1 }( \sqrt{2} t^*) \end{pmatrix}. \end{aligned}$$

Hence, in distribution, \(W = V + E\), with V and E independent, where \(V=(z,\dots ,z)^t\) with \(z \sim {\mathcal {N}}(0,K_{ \theta _1 }( \sqrt{2} t^*))\) and where the n pairs \((E_{2k+1},E_{2k+2})\), \(k=0,...,n-1\), are independent with distribution \({\mathcal {N}}(0,B_{2,2})\). Hence, with \(\bar{W}_1 = (1/n) \sum _{k=0}^{n-1} W_{2k+1}\), \(\bar{W}_2 = (1/n) \sum _{k=0}^{n-1} W_{2k+2}\) and \(\bar{E} = (1/n) \sum _{k=0}^{n-1} (E_{2k+1},E_{2k+2})^t\), we have

$$\begin{aligned} \hat{B}&:= \frac{1}{n} \sum _{i=0}^{n-1} \begin{pmatrix} W_{2i+1} - \bar{W}_1 \\ W_{2i+2} - \bar{W}_2 \end{pmatrix} \begin{pmatrix} W_{2i+1} - \bar{W}_1 \\ W_{2i+2} - \bar{W}_2 \end{pmatrix}^t \\&= \frac{1}{n} \sum _{i=0}^{n-1} \begin{pmatrix} E_{2i+1} \\ E_{2i+2} \end{pmatrix} \begin{pmatrix} E_{2i+1} \\ E_{2i+2} \end{pmatrix}^t - \bar{E} \bar{E}^t \\&\rightarrow _{n \rightarrow \infty }^{p} B_{2,2}. \end{aligned}$$

Hence, there exists a subsequence \(n' \rightarrow \infty \) such that, almost surely, \(\hat{B} \rightarrow B_{2,2}\) as \(n' \rightarrow \infty \). Hence, almost surely, \(\hat{B}_{1,1} - \hat{B}_{1,2} \rightarrow K_{ \theta _1 }(0) - K_{ \theta _1 }(2 t^*)\) as \(n' \rightarrow \infty \). Hence, the event \(\{ \hat{B}_{1,1} - \hat{B}_{1,2} \rightarrow _{n' \rightarrow \infty } K_{ \theta _1 }(0) - K_{ \theta _1 }(2 t^*) \}\) has probability one under \({\mathbb {P}}_{\theta _1}\). With the same arguments, we can show that the event \(\{ \hat{B}_{1,1} - \hat{B}_{1,2} \rightarrow _{n'' \rightarrow \infty } K_{ \theta _2 }(0) - K_{ \theta _2 }(2 t^*) \}\) has probability one under \({\mathbb {P}}_{\theta _2}\), where \(n''\) is a subsequence extracted from \(n'\). Since \(K_{ \theta _1 }(0) - K_{ \theta _1 }(2 t^*) \ne K_{ \theta _2 }(0) - K_{ \theta _2 }(2 t^*)\) by the choice of \(t^*\), these two events are disjoint, so \({\mathbb {P}}_{\theta _1}\) and \({\mathbb {P}}_{\theta _2}\) are orthogonal. Hence, \(\theta \) is microergodic.
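
The convergence driving this argument can be illustrated by simulation. The sketch below (not from the paper) draws \(W\) exactly as in the proof, as a common term plus independent pairs, and checks that \(\hat{B}_{1,1} - \hat{B}_{1,2}\) approaches \(K_{\theta_1}(0) - K_{\theta_1}(2t^*)\). The Matérn-3/2 form of \(K_\theta\), the value of \(t^*\), and the parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def matern32(d, sigma2, ell):
    # Matern-3/2 covariance, a stand-in for K_theta (illustrative assumption)
    a = np.sqrt(3.0) * d / ell
    return sigma2 * (1.0 + a) * np.exp(-a)

t_star = np.pi / 8                 # some t* in [0, pi/4]
theta1 = (1.0, 0.4)                # (sigma^2, lengthscale), illustrative
n = 20000                          # number of pairs

K0 = matern32(0.0, *theta1)
K2t = matern32(2.0 * t_star, *theta1)
Ks = matern32(np.sqrt(2.0) * t_star, *theta1)
B = np.array([[K0 - Ks, K2t - Ks],
              [K2t - Ks, K0 - Ks]])          # the block B_{2,2} from the proof

# W = (z, ..., z) + E: one common z, and n independent pairs E ~ N(0, B_{2,2})
z = np.sqrt(Ks) * rng.standard_normal()
E = rng.multivariate_normal(np.zeros(2), B, size=n)
W = z + E                                    # shape (n, 2): (W_{2i+1}, W_{2i+2})

B_hat = np.cov(W.T, bias=True)               # the empirical matrix \hat{B}
print("B_hat[0,0] - B_hat[0,1] =", B_hat[0, 0] - B_hat[0, 1])
print("K(0) - K(2 t*)          =", K0 - K2t)
```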


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Samir, C., Loubes, JM., Yao, AF., Bachoc, F. (2019). Learning a Gaussian Process Model on the Riemannian Manifold of Non-decreasing Distribution Functions. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science, vol 11671. Springer, Cham. https://doi.org/10.1007/978-3-030-29911-8_9

  • DOI: https://doi.org/10.1007/978-3-030-29911-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29910-1

  • Online ISBN: 978-3-030-29911-8

  • eBook Packages: Computer Science (R0)
