Probability Theory and Related Fields, Volume 171, Issue 3–4, pp 1045–1091

Convergence of the reach for a sequence of Gaussian-embedded manifolds

  • Robert J. Adler
  • Sunder Ram Krishnan
  • Jonathan E. Taylor
  • Shmuel Weinberger


Motivated by questions of manifold learning, we study a sequence of random manifolds, generated by embedding a fixed, compact manifold M into Euclidean spheres of increasing dimension via a sequence of Gaussian mappings. One of the fundamental smoothness parameters of manifold learning theorems is the reach, or critical radius, of M. Roughly speaking, the reach is a measure of a manifold’s departure from convexity, which incorporates both local curvature and global topology. This paper develops limit theory for the reach of a family of random, Gaussian-embedded, manifolds, establishing both almost sure convergence for the global reach, and a fluctuation theory for both it and its local version. The global reach converges to a constant well known both in the reproducing kernel Hilbert space theory of Gaussian processes, as well as in their extremal theory.


Keywords

Gaussian process · Manifold · Random embedding · Critical radius · Reach · Curvature · Asymptotics · Fluctuation theory

Mathematics Subject Classification

Primary 60G15, 57N35; Secondary 60D05, 60G60

1 Introduction

This paper has two themes to it. One lies in the general area of the geometry of Gaussian processes, or random fields, over general spaces, and is about random embeddings. The second is more topological, and can be seen as putting probability measures on spaces of manifolds, and then studying the behavior of their reach. Both are motivated from recent results in manifold learning.

1.1 Gaussian embeddings

We start with parameter spaces which will always be m-dimensional, compact, smooth manifolds, without boundary, and which will be denoted by M. On M, we define a centered, unit variance, smooth, Gaussian process \(f:\,M\rightarrow {\mathbb R}\), the distribution of which is characterized by its covariance function \(\mathbb {C}:\,M\times M\rightarrow {\mathbb R}\). Taking \(k\ge 1\), we also define a \({\mathbb R}^k\)-valued process
$$\begin{aligned} f^k(x)= \left( f_1(x),f_2(x),\ldots ,f_k(x)\right) , \end{aligned}$$
made up of the first k processes in an infinite sequence of i.i.d. copies of f. It is not hard to check (under the mild side requirements that will be made formal later) that (1.1) defines, with probability one, an embedding (i.e. an injective homeomorphism) \(f^k(M)\) of M into \({\mathbb R}^k\) for all \(k\ge 2m+1\), akin to what one would expect from the Whitney embedding theorem. We call this a Gaussian embedding of M.
It is easy to check that the diameter of \(f^k(M)\) is \(O(\sqrt{k})\). Thus, to keep the embedding under control, we need to normalise it either by \(\sqrt{k}\), or self-normalise by defining
$$\begin{aligned} h^k(x)\ \mathop {=}\limits ^{\Delta }\ \frac{f^k(x)}{\Vert f^k(x)\Vert },\qquad x\in M, \end{aligned}$$
where \(\Vert \cdot \Vert \) is the standard Euclidean norm, and consider the embedding \(h^k(M)\), which now lies in the unit sphere \(S^{k-1}\) in \({\mathbb R}^k\). For reasons of notational convenience, this is the embedding that we shall consider in the current paper. We could just as well have adopted a \(\sqrt{k}\) normalisation without any qualitative changes in our results, although some of the details would be different. We call \(h^k(M)\) a self-normalised Gaussian embedding of M.
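To make the construction concrete, here is a minimal numerical sketch of a self-normalised Gaussian embedding. It is purely illustrative and rests on assumptions that are not in the text: M is taken to be the circle, f is a random trigonometric polynomial with hypothetical weights `weights` (normalised so that the variance is one), and the name `gaussian_embedding` is ours.

```python
import numpy as np

def gaussian_embedding(x, k, weights, rng):
    """Sample f^k at the points x on the circle and return (f^k, h^k).

    Each coordinate is an independent copy of the random trigonometric
    polynomial f(x) = sum_j a_j (xi_j cos(j x) + eta_j sin(j x)),
    which is centered with unit variance when sum_j a_j^2 = 1.
    This choice of f is an assumption made for illustration only.
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(weights, dtype=float)
    a = a / np.linalg.norm(a)               # enforce unit variance
    j = np.arange(1, a.size + 1)             # frequencies 1, 2, ...
    cos_part = np.cos(np.outer(x, j)) * a    # shape (n_points, n_freq)
    sin_part = np.sin(np.outer(x, j)) * a
    xi = rng.standard_normal((a.size, k))    # independent N(0, 1) coefficients
    eta = rng.standard_normal((a.size, k))
    f_k = cos_part @ xi + sin_part @ eta     # shape (n_points, k)
    h_k = f_k / np.linalg.norm(f_k, axis=1, keepdims=True)  # project onto S^{k-1}
    return f_k, h_k

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
f_k, h_k = gaussian_embedding(x, k=500, weights=[1.0, 1.0, 1.0], rng=rng)
```

Every row of `h_k` has unit Euclidean norm by construction, while the squared norms of the rows of `f_k` are of order k, in line with the \(O(\sqrt{k})\) diameter noted above.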

However, although all of M, f and the ambient spheres are smooth, it is not so clear how smooth these embeddings are going to be as \(k\rightarrow \infty \). On the one hand, the self-normalisation in (1.2) ensures that \(h^k(M)\) lies in a fixed radius sphere. On the other hand, high-dimensional spheres are strange objects, with surface areas tending to zero as the dimension grows. Thus, given the increasing independence added into the mapping with each new f component, it is not at all a priori clear whether the embeddings eventually become rough, and perhaps fractal, or whether there is some sort of strong law behavior that leads to deterministic behavior in the limit. If the latter case is correct (which it is) then an associated fluctuation theory is called for.

The main results of this paper resolve these issues, at least in the framework of the reach of the self-normalised Gaussian embeddings \(h^k(M)\), as \(k\rightarrow \infty \).

1.2 Reach

The modern notion of reach seems to have appeared first in the classic paper [9] of Federer, in which he introduced the notion of sets with positive reach and their associated curvatures and curvature measures. Working in a Euclidean setting, Federer was able to include, in a single framework, Steiner’s tube formula for convex sets and Weyl’s tube formula for \(C^2\) smooth submanifolds of \({\mathbb R}^n\). The importance of this framework extended, however, far beyond tube formulae, as it became clear that much of the theory surrounding convex sets could be extended to sets that were, in some sense, locally convex, and that the reach of a set was precisely the way to quantify this property.

To be just a little more precise, but going beyond the Euclidean setting, we start with a smooth manifold N embedded in an ambient manifold \(\widehat{N}\). Then the local reach at a point \(x\in N\) is the furthest distance one can travel, along any vector based at x but normal to N in \(\widehat{N}\), without meeting a similar vector originating at another point in N. The (global) reach of N is then the infimum of all local reaches. The reach is related to local properties of N through its second fundamental form, but also to global structure, since points on N that are far apart in a geodesic sense might be quite close in the metric of the ambient space \(\widehat{N}\). The reach of a manifold is also known as its ‘critical radius’ for a good geometrical reason described below. (See the paragraph following (2.3).) However, to avoid any possible confusion, we shall use only the term ‘reach’ throughout this paper.

We shall give precise definitions in the following section, noting for now that beyond its importance in tube formulae and other classical areas of Differential Geometry and Topology, the notion of positive reach has recently begun to play an important role in the literature of Topological Data Analysis (TDA) in general, and manifold learning via algebraic techniques in particular. We shall discuss this briefly at the end of Sect. 2.

1.3 Main results and structure of the paper

With the terminology we have so far alluded to (but in most cases have yet to define rigorously) let \(\theta (N,x)\) denote the local reach of a manifold N at the point \(x\in N\), while
$$\begin{aligned} \tau \equiv \tau (N) \mathop {=}\limits ^{\Delta }\inf _{x\in N} \theta (N,x), \end{aligned}$$
denotes the global reach of N. We, however, are interested in the reach of \(h^k(M)\), and the main result of this paper is Theorem 4.3, which states that there is a deterministic function \(\sigma ^2_c(f,x), x\in M\), such that, with probability one, and uniformly in \(x\in M\),
$$\begin{aligned} \cot ^2 \left( \theta \left( h^k(M),h^k(x)\right) \right) \rightarrow \sigma _c^2(f,x), \end{aligned}$$
as \(k\rightarrow \infty \). An immediate consequence of this is the existence of a constant, denoted by \(\sigma ^2_c(f)\), such that the sequence of global reaches satisfies
$$\begin{aligned} \cot ^2 \left( \tau \left( h^k(M)\right) \right) \mathop {\rightarrow }\limits ^{a.s. }\ \sigma ^2_c(f) \mathop {=}\limits ^{\Delta }\sup _{x\in M} \sigma _c^2(f,x). \end{aligned}$$
While the notation regarding the various versions of \(\sigma ^2\) is a little heavy, it is time-honored, since the constant \(\sigma ^2_c(f)\) has appeared previously in the extremal theory of Gaussian processes. In fact, one of the most interesting aspects of the convergence in (1.4) is the, a priori surprising, fact that \(\sigma ^2_c(f)\) is the limit. This constant had arisen earlier in a completely different context in [1, 24]. That context, described briefly in Sect. 4.2, related to rigorously proving the so called ‘Euler characteristic heuristic’, which approximates a wide class of Gaussian extremal probabilities via the expected Euler characteristic of their excursion sets. The role of the constant there is in quantifying the super-exponentially small error rate involved in the approximation. We shall discuss the importance of this constant in more detail in Sect. 4.
Given the convergence in (1.3), it is natural to ask if an associated fluctuation result also holds. Indeed, this is the case, and Theorem 4.3 also gives us that
$$\begin{aligned} \sqrt{k}\left( \cot ^2 \left( \theta \left( h^k(M),h^k(\cdot )\right) \right) - \sigma _c^2(f,\cdot )\right) \end{aligned}$$
converges, in distribution, as \(k\rightarrow \infty \), to a limit which can be bounded by the supremum of a certain Gaussian process, the precise distribution of which is given much later in Theorem 11.1.

The remainder of the paper is organised as follows: In the following section we have collected some general results about positive reach that were a large part of the motivation for our study. The reader uninterested in motivation can skip all but the definition of reach in Sect. 2.1. The reader interested in knowing more about the history and applications of positive reach is referred to the excellent survey by Thäle [25], or Chapter 7 of [6], which discusses reach in the context of TDA.

After the brief Sect. 3 devoted to notations, Sect. 4 defines Gaussian processes on manifolds and associated notions such as the induced metric. It also introduces the constant \(\sigma ^2_c(f)\). Much of this section is a quick summary of the material in [1] needed for this paper, and once this is done we have everything defined well enough to state the main result of the paper.

The real work starts in Sect. 5, in which we develop specific representations for the reach of an \(S^{k-1}\) embedded manifold which form the basis of all that follows. Some of the results here already exist in the literature, and the proofs of these are relegated to an “Appendix”. Some are new and full proofs are given in situ. Sect. 6 lists four lemmas, from which, together with the representation of Sect. 5, the proof of the a.s. convergence in the main theorem follows easily. Following this, Sects. 7–10, which is where the hardest work lies, then prove these lemmas, one at a time. In Sect. 11 we turn to the fluctuation result of (1.5), both proving it and describing the limit process. Two technical appendices complete the paper.

2 Critical radius and positive reach

2.1 The definition

Throughout the paper our underlying manifold M will satisfy the following assumptions:

Assumption 2.1

M is an m-dimensional manifold, compact, boundaryless, oriented, \(C^3\), and connected.

Sometimes we shall assume that M is associated with a Riemannian metric g, and sometimes that it is embedded in a smooth Riemannian manifold \((\widehat{M},{{\widehat{g}}})\). The main example that we shall need for this paper for an embedding space is the unit sphere \({{\widehat{M}}}=S^{k-1}\), but we shall also meet the simple Euclidean case \({{\widehat{M}}}={\mathbb R}^k\) when discussing tube formulae below. In the first example, geodesics are along great circles, and the associated Riemannian distance is measured via angular distance. In the second, the geometry is the standard Euclidean one.

As an aside, we note that all our results could be extended to the case of manifolds with boundary, and even stratified manifolds satisfying the kind of side conditions endemic to [1]. However, then we would also have to suffer through all the heavy notation endemic to [1], which seemed unnecessary, given that our primary motivation was to establish a general principle rather than the most general result possible.

For the main result of the paper, all of the conditions in Assumption 2.1 are required. This is not true for some of the lemmas along the way, but for ease of exposition we shall generally adopt all the conditions throughout the paper. For the fluctuation result, we shall even need that M is \(C^6\), and we will add that assumption when needed. Of course, if the majority of the authors were topologists rather than probabilists, we would probably just have assumed that M is ‘smooth’ (i.e. \(C^\infty \)) and then not have been concerned with optimal levels of differentiability.

We need the standard exponential map (cf. [17]) that maps tangent vectors to points on the manifold. This, for \(x\in M\) and \(X\in T_x\widehat{M}\), the tangent space to \(\widehat{M}\) at x, is given by the local diffeomorphism
$$\begin{aligned} \exp ^{\widehat{M}}_{x}(X) = \gamma _{x,\eta _X}(\Vert X\Vert ), \end{aligned}$$
where \(\gamma _{x,\eta _X}\) is the unit speed geodesic in \(\widehat{M}\) starting at x in the direction \(\eta _X\mathop {=}\limits ^{\Delta }X/\Vert X\Vert \in \ S(T_x\widehat{M})\), the (sphere of) unit tangent vectors at x. The notion of reach is closely related to the radius of the largest ball around the origins in \(T_x\widehat{M}\), \(x\in M\), for which all the exponential maps are, in fact, diffeomorphisms.
To give a more formal definition, let \(d_M(x,y)\) (\(d_{{\widehat{M}}}(x,y)\)) denote geodesic distance between points \(x,y\in M\) (\(\in {\widehat{M}}\)), and for \(x\in M\) and \(A\subset M\) set
$$\begin{aligned} d_M(x,A) \ \mathop {=}\limits ^{\Delta }\inf _{y\in A} d_M(x,y), \end{aligned}$$
with a similar definition for \(x\in {\widehat{M}}\) and \(A\subset {\widehat{M}}\).
Then the local reach of M in \(\widehat{M}\) at x, in a direction \(\eta \in S(T_x\widehat{M})\), is defined by
$$\begin{aligned} \theta _{\ell }(x,\eta )\ \mathop {=}\limits ^{\Delta }\ \sup \left\{ p:\,d_{\widehat{M}}\left( \exp ^{\widehat{M}}_x(p\,\eta ),M\right) =p\right\} . \end{aligned}$$
Thus, if \(p>\theta _{\ell }(x,\eta )\), there is a point \(y\ne x\) in M which is closer to \(\exp ^{\widehat{M}}_{x}(p\,\eta )\) than x is. The local reach of M in \(\widehat{M}\) at the point x is defined as
$$\begin{aligned} \theta (M,x)\ \equiv \ \theta (x)\mathop {=}\limits ^{\Delta }\inf _{\eta \in T_x^\perp M \cap S(T_x\widehat{M})}\theta _{\ell }(x,\eta ), \end{aligned}$$
where \(T_x^\perp M\) is the normal space at x, of M in \(\widehat{M}\). Taking an infimum over the entire manifold finally gives the global reach of M in \(\widehat{M}\):
$$\begin{aligned} \tau (M) \equiv \tau \ \equiv \ \theta (M,{{\widehat{M}}})\ \mathop {=}\limits ^{\Delta } \inf _{x\in M}\theta (x) \ \equiv \ \inf _{x\in M}\theta (M,x). \end{aligned}$$
A more picturesque definition of reach, in the Euclidean setting for which \({\widehat{M}}={\mathbb R}^n\), which also explains the terminology ‘critical radius’ is as follows: Imagine rolling a ball of radius r and dimension n over the manifold M, but in such a way that the ball only touches M at a single point. The largest choice of radius that allows this is the critical radius/reach.

The simplest Euclidean example is provided when M is a convex set, in which case its reach will be infinite. In fact, infinite reach characterizes these convex sets. If M is a sphere, then its reach is equal to its radius. If M is the disjoint union of two spheres, the reach is the minimum of the two radii and half of the closest distance between the spheres.
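These Euclidean examples can be checked numerically. The sketch below does not use the definition given above. It relies instead on a classical characterisation going back to Federer, which is brought in here as an assumption: the reach equals the infimum, over pairs \(a\ne b\) in M, of \(\Vert b-a\Vert ^2/(2\,\text{dist}(b-a,T_aM))\). The helper names `reach_estimate` and `circle` are hypothetical, and the infimum is taken only over a finite sample, so the output is an estimate.

```python
import numpy as np

def reach_estimate(points, tangents):
    """Estimate the reach of a curve in R^2 from samples and unit tangents.

    Uses the infimum over sampled pairs a != b of
    |b - a|^2 / (2 * dist(b - a, T_a M)).
    """
    diff = points[None, :, :] - points[:, None, :]       # diff[i, j] = p_j - p_i
    sq_dist = np.sum(diff**2, axis=2)
    along = np.sum(diff * tangents[:, None, :], axis=2)   # component in T_{p_i} M
    normal = np.sqrt(np.maximum(sq_dist - along**2, 0.0)) # distance to T_{p_i} M
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = sq_dist / (2.0 * normal)
    np.fill_diagonal(ratio, np.inf)                       # exclude a == b
    return float(np.nanmin(ratio))

def circle(radius, center, n=200):
    """Sample n points and unit tangent vectors on a circle in R^2."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    pts = np.column_stack([center[0] + radius * np.cos(t),
                           center[1] + radius * np.sin(t)])
    tan = np.column_stack([-np.sin(t), np.cos(t)])
    return pts, tan

# A single circle of radius 2: the reach should equal the radius.
single = reach_estimate(*circle(2.0, (0.0, 0.0)))

# Two unit circles whose closest points are at distance 1: the reach should be
# min(1, 1, 1/2) = 1/2.
p1, t1 = circle(1.0, (0.0, 0.0))
p2, t2 = circle(1.0, (3.0, 0.0))
pair = reach_estimate(np.vstack([p1, p2]), np.vstack([t1, t2]))
```

In the second configuration the minimising pair consists of the two closest points on different circles, which illustrates how the reach captures global structure and not only curvature.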

On the other hand, if \({{\widehat{M}}}\) is itself a sphere, and M a great circle, then the reach of M (in angular coordinates, and as a subset of the ambient sphere) will be \(\pi /2\). In general, the reach of a closed submanifold of a sphere will be no more than \(\pi /2\).

This is all you need to know about reach to skip to Sect. 4 and read the rest of the paper. The rest of this section is motivational.

2.2 Medial axis

An alternative way to think of reach is via the notions of the medial axis of M and its local feature size, notions which have been developed in the Computational Geometry community [3]. Given M embedded in \({{\widehat{M}}}\), define the set
$$\begin{aligned} G= \left\{ y\in {{\widehat{M}}}:\,\exists \ x_1\ne x_2\in M \text { such that } d_{{\widehat{M}}}(y,M) = d_{{\widehat{M}}}(y,x_1) = d_{{\widehat{M}}}(y,x_2)\right\} . \end{aligned}$$
The closure of G is called the medial axis, and for any \(x\in M\) the local feature size s(x) is the distance of x to the medial axis. It is easy to check that
$$\begin{aligned} \theta (M,{{\widehat{M}}}) = \inf _{x\in M} s (x). \end{aligned}$$
In fact, Federer’s original definition of reach in [9] for the purely Euclidean case was, implicitly, in terms of medial axes.

2.3 On tube formulae

As mentioned earlier, the birthplace of the notion of reach is Weyl’s volume of tubes formula, a classical result in Differential Geometry, and an extension of the much earlier Steiner’s tube formula for convex sets in \({\mathbb R}^n\). Interestingly, Weyl’s original paper [27] was motivated by a question raised by Hotelling [11] related to the derivation of confidence regions around regression curves. Both of these papers still make for fascinating (but not easy) reading today, and both generated enormous literatures, one mathematical (e.g. [10]) and one statistical (e.g. [12] and the literature referenced there). For its importance to Probability see, for example, [1] and the references therein.

Restricting ourselves to the Euclidean setting for the moment, define the tube of radius \(\rho >0\) around M in \({\widehat{M}} ={\mathbb R}^k\) to be
$$\begin{aligned} \text {Tube}(M,\rho ) = \left\{ x\in {\mathbb R}^k:\,\inf _{y\in M} \Vert y-x\Vert \le \rho \right\} . \end{aligned}$$
Then Weyl’s tube formula states that, for \(\rho <\theta (M,{\mathbb R}^k)\),
$$\begin{aligned} \text {Vol} \left( \text {Tube}(M,\rho )\right) =\sum _{j=0}^m\mathcal {L}_j(M)\omega _{k-j}\rho ^{k-j}, \end{aligned}$$
where Vol is k-dimensional Lebesgue volume, \(\omega _{n}\) denotes the volume of a unit n-dimensional ball, and the \({\mathcal {L}}_j{(M)}\) are the Lipschitz–Killing curvatures of M. These are also known as quermassintegrales, Minkowski, Dehn and Steiner functionals, and intrinsic volumes, although in many of these cases the indexing and normalisations differ. It is worth noting, as Weyl established in what he considered the part of [27] that required more than “what could have been accomplished by any student in a course of calculus”, that these functionals are intrinsic. That is, they are independent of the embedding of M into \({\mathbb R}^k\). (See for example, Lemma 10.5.1 in [1], where this fact is given a probabilistic proof in the notation we use here.)
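As a sanity check on the tube formula, the following sketch evaluates its right-hand side for two simple manifolds and compares it with the tube volume computed directly. The Lipschitz–Killing curvatures used are assumptions about the normalisation in force here: \({\mathcal {L}}_0\) is taken to be the Euler characteristic, \({\mathcal {L}}_m\) the m-dimensional volume, and \({\mathcal {L}}_1\) of the 2-sphere is taken to be zero. The function names are hypothetical.

```python
import math

def unit_ball_volume(n):
    """Volume omega_n of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def weyl_tube_volume(lk_curvatures, k, rho):
    """Right-hand side of the tube formula: sum_j L_j(M) omega_{k-j} rho^{k-j}."""
    return sum(L * unit_ball_volume(k - j) * rho ** (k - j)
               for j, L in enumerate(lk_curvatures))

r, rho = 2.0, 0.5   # rho is below the reach r of the circle and of the sphere

# Circle of radius r in R^2: L_0 = 0 (Euler characteristic), L_1 = length.
circle_formula = weyl_tube_volume([0.0, 2 * math.pi * r], k=2, rho=rho)
circle_direct = math.pi * ((r + rho) ** 2 - (r - rho) ** 2)          # annulus area

# Sphere of radius r in R^3: L_0 = 2, L_1 = 0, L_2 = surface area.
sphere_formula = weyl_tube_volume([2.0, 0.0, 4 * math.pi * r ** 2], k=3, rho=rho)
sphere_direct = 4 * math.pi / 3 * ((r + rho) ** 3 - (r - rho) ** 3)  # shell volume
```

In both cases the two computations agree as long as \(\rho \) stays below r, which is the reach of each example.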

It is hard to overstate the importance of (2.4), along with its variants for more general ambient spaces. The fact that the formula ceases to hold for \(\rho \) larger than the reach means that all the applications of tube formulae also fail at some point, and it is knowing where this point is that makes the reach such an important parameter of a manifold.

2.4 Condition numbers, manifold learning and learning homology

Standard manifold learning scenarios usually start with a ‘cloud’ of points \({\mathcal {X}}=\{x_1,\ldots ,x_n\}\) in some high dimensional space, which are believed to be sampled from an underlying manifold M of much lower dimension m, with or without additional noise. (Additional noise will mean that the points need not lie on M itself, but rather are sampled from some region near M.) A classical problem is to construct a set which approximates M in a useful fashion. This is a well known problem with a vast literature, and ‘useful’ here is usually taken to mean physical closeness in some norm.

More recently, a new literature has appeared, motivated by ideas from Algebraic Topology, in which the aim of physical closeness is replaced with the aim of correctly recovering the topology of M. Two of the earliest papers in this area are [18, 19] (but see also [8]) and it is these papers that were in fact the original motivators of the current one.

In [19] the setup is that of a random sample from M, and the recovery method—or at least the theorems describing its properties—relies on knowing the reach \(\tau \) of M. In this case, choosing an \(\varepsilon \in (0, \tau /2)\), the simple union of \(\varepsilon \)-balls centered at the points of \({\mathcal {X}}\) is chosen as the estimator of M. That is
$$\begin{aligned} M_{estimate} \ \mathop {=}\limits ^{\Delta }\bigcup _{x\in {\mathcal {X}}} B_\varepsilon (x). \end{aligned}$$
The arguments in [19] follow a general paradigm towards solving a wide class of optimisation and other algorithmic problems, suggested by Steven Smale in [21]. Smale proposed a two-part scheme, the first part of which involves studying deterministic algorithms and determining their efficiency (in terms of running time, or some other cost function) as a function of input parameters such as a condition number of the input. (In the manifold setting, the condition number of the problem is essentially the inverse of the reach.) The second step involves averaging over random inputs, or random versions of the basic problem, to obtain either probabilistic tail estimates or average values for the algorithmic cost. [2] gives an insight into this approach in the setting of convex optimisation, but our interest is in the TDA setting.
Thus, the proofs in [19] start by showing that, if one has a dense enough subset of points in M, then M is a deformation retract of \(M_{estimate}\), and so both sets have the same homology. For the second stage, it shows that if a large enough sample is taken then one can bound, from below, the probability of the sample being dense enough. The final result is that, for all small \(\delta \), if
$$\begin{aligned} n\ >\ \beta _1\left( \log (\beta _2)+\log \left( \frac{1}{\delta }\right) \right) , \end{aligned}$$
where
$$\begin{aligned} \beta _1=\frac{4^m\, \text {vol}(M)}{\omega _m(\varepsilon \cos \gamma _1)^m},\qquad \beta _2=\frac{8^m\, \text {vol}(M)}{\omega _m(\varepsilon \cos \gamma _2)^m }, \end{aligned}$$
and \(\gamma _1=\arcsin (\varepsilon /8\tau )\) and \(\gamma _2=\arcsin (\varepsilon /16\tau )\), then the homology of \(M_{estimate}\) equals the homology of M with probability at least \(1-\delta \). A corresponding result for the case of sampling with noise is given in [18].
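The dependence of this bound on the reach is easy to explore numerically. The sketch below simply evaluates the displayed expressions. The function name `homology_sample_bound` is hypothetical, and \(\varepsilon /8\tau \) and \(\varepsilon /16\tau \) are read as \(\varepsilon /(8\tau )\) and \(\varepsilon /(16\tau )\), which is an interpretation on our part.

```python
import math

def homology_sample_bound(m, vol, eps, tau, delta):
    """Evaluate beta_1 * (log(beta_2) + log(1/delta)) for the given parameters.

    m: dimension of M, vol: volume of M, eps: ball radius in (0, tau/2),
    tau: reach of M, delta: allowed failure probability.
    """
    if not 0.0 < eps < tau / 2.0:
        raise ValueError("eps must lie in (0, tau/2)")
    omega_m = math.pi ** (m / 2) / math.gamma(m / 2 + 1)   # unit m-ball volume
    gamma1 = math.asin(eps / (8.0 * tau))
    gamma2 = math.asin(eps / (16.0 * tau))
    beta1 = 4 ** m * vol / (omega_m * (eps * math.cos(gamma1)) ** m)
    beta2 = 8 ** m * vol / (omega_m * (eps * math.cos(gamma2)) ** m)
    return beta1 * (math.log(beta2) + math.log(1.0 / delta))

# Unit circle: m = 1, vol(M) = 2*pi, reach tau = 1.
n_coarse = homology_sample_bound(m=1, vol=2 * math.pi, eps=0.25, tau=1.0, delta=0.05)
n_fine = homology_sample_bound(m=1, vol=2 * math.pi, eps=0.10, tau=1.0, delta=0.05)
```

Since \(\varepsilon \) must stay below \(\tau /2\), a smaller reach forces a smaller \(\varepsilon \), and the bound grows accordingly, as the comparison of `n_coarse` and `n_fine` suggests.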

We have included the above equations to show, explicitly, how the reach appears in the complexity of this estimation problem. The smaller the reach is, the smaller one is forced to take \(\varepsilon \), and the larger the sample size n needs to be for a given estimation accuracy.

Of course, for a given problem, one does not know what M is, and so, a fortiori, little is known about its reach. Consequently, in the spirit of Smale’s two step procedure, we need to enrich the second stage by also averaging over possible M. The current paper is a step in this direction, by formulating a class of random Gaussian manifolds and beginning a study of their reach.

Moreover, the main result of the paper has an immediate application in the manifold learning situation. Although Theorem 4.3 relates only to a very specific random embedding of M into a high dimensional sphere, a liberal interpretation of the theorem implies that the part of the complexity of the estimation problem depending on the reach is more or less independent of any embedding of M into a higher dimensional space. The import of this is that there is no ‘curse of dimensionality’, related to reach, that involves the dimension of the ambient space.

Of course, we can only make these claims for the Gaussian-embedding that we study, but the fact that they are proven in the Gaussian case will alleviate concerns among practitioners, in general, that ambient dimension has an effect on reach. This was not known until now, even for a special case.

A second practical implication of this paper is the introduction, albeit implicit, of a new class of smooth random manifolds that are both reasonable and mathematically tractable. Recalling the two stage paradigm of Smale above, it would be interesting, and probably useful, to introduce into the TDA setting the notion of Bayesian optimization. In terms of the above homology learning example, by this we mean minimizing not the probability of correctly identifying the homology for a fixed (but unknown) M, but rather minimizing the expectation of some cost function of this probability, averaged over a (random) family of possible M. The calculations of the current paper, along with those of [15] which address issues of the asymptotic isometry of Gaussian-embedded Riemannian manifolds, show that the model introduced here allows for tractable mathematical manipulation.

3 Some (standard) notation

Many of our proofs, and even some of our definitions, freely use standard notation from Differential Geometry. Since we expect that not all readers will be familiar with this, we include here a brief notational guide. There are many standard texts to which one can turn for details. Lee’s book [17] is our favourite, but the quick and dirty treatment in Chapter 7 of [1] also suffices.

We are working with a Riemannian manifold (M, g), for which the Riemannian metric is, for each \(x\in M\), an inner product \(g_x\) on the tangent space \(T_xM\) to M at x. If \(\{(U_\alpha ,\phi _\alpha )\}_\alpha \) is an atlas for M, then for each chart \((U_\alpha ,\phi _\alpha )\) we shall often need a (local) orthonormal frame field \(X^{\alpha }=\{X_1^\alpha ,\dots ,X^\alpha _m\}\) for the tangent bundle \(T^\alpha M\mathop {=}\limits ^{\Delta }\{T_xM,\ x\in U_\alpha \}\), where orthonormality is in the metric g. We shall refer to this later as “choosing an orthonormal frame field”, without specific reference to charts or the index \(\alpha \). Since all our later definitions and calculations are local (i.e. can be carried out in terms of local charts) this is not a problem (and global issues such as parallelizability do not arise.)

If \(F:\,M\rightarrow {\mathbb R}\) is a smooth function, and \(X=X_x\in T_xM\), then by
$$\begin{aligned} XF(x),\ \ (XF)(x),\ \ (X_xF)(x),\ \ X_xF(x), \ \ X_xF_x,\ \ \text {etc}, \end{aligned}$$
we mean the derivative of F in direction \(X_x\) at x. At various times we will make use of all of these possible notations, so as to make individual formulae either clear and/or compact.
As opposed to the above derivatives, the gradient, \(\nabla F\), of F is the unique continuous vector field on M such that
$$\begin{aligned} g_x(\nabla F_x, X_x) = X_xF, \end{aligned}$$
for every vector field X. If F is a function of more than one parameter, say F(x, y), then we will denote the gradient with respect to x as \(\nabla _xF(x,y)\), etc.
The (covariant) Hessian \(\nabla ^2F\) is the bilinear symmetric map from \(C^1(T(M)) \times C^1(T(M))\) to \( C^0(M)\) (i.e. it is a double differential form) defined by
$$\begin{aligned} (\nabla ^2F)(X,Y) \equiv \nabla ^2F(X,Y) \mathop {=}\limits ^{\Delta }XYF - \nabla _XYF = g(\nabla _X \nabla F, Y), \end{aligned}$$
where, while \(\nabla \) with no subscript denotes the gradient, when it is subscripted with a vector field, as in \(\nabla _X\), it indicates the Levi-Civita connection of (M, g).

It is standard that \(\nabla ^2F\) could also have been defined to be \(\nabla (\nabla F)\), which is where the notation comes from. Recall that in the simple Euclidean case the Hessian is typically considered to be the \(N\times N\) matrix \(H_F=(\partial ^2 F/\partial x_i\partial x_j)_{i,j=1}^N\). In the more general setting above, \(H_F\) defines the two-form by setting \(\nabla ^2F(X,Y)= XH_FY'\). In this case (3.2) follows from simple calculus.

We shall need the obvious, but important, fact that if x is a critical point of F (i.e. \(\nabla F(x)=0\)) then \(XF(x)=0\) for all \(X\in T_xM\) and so by (3.2) it follows that \( \nabla ^2F(X,Y)(x)= XYF(x)\). Consequently, at these points, the Hessian is independent of the metric g.

This concludes our brief excursion into notation. We can now turn to some details on Gaussian fields on manifolds before stating our main results.

4 Gaussian processes on manifolds, and the main theorem

As mentioned earlier, our basic reference for Gaussian processes is [1]. Here we shall only give the very minimum in definitions and notation needed for this paper.

4.1 Gaussian processes on Riemannian manifolds

We start, as usual, with a \(C^3\) compact manifold M, with or without an associated Riemannian metric g. (For the novice, Sect. 3 explains these terms and some of the following notation.)

The distribution of a real valued Gaussian process, or random field, \(f:\,M\rightarrow {\mathbb R}\), with zero mean (assumed henceforth) is then determined by its covariance function \(\mathbb {C}:\,M\times M\rightarrow {\mathbb R}\) given by
$$\begin{aligned} \mathbb {C}(x,y)= \mathbb {E}\{f(x)f(y)\}. \end{aligned}$$
If \(\mathbb {C}\) is smooth enough, the process also induces a Riemannian metric on the tangent bundle T(M) of M defined by
$$\begin{aligned} g_x(X,Y)\ \mathop {=}\limits ^{\Delta }\ \mathbb {E}\{(Xf)(x)\times (Yf)(x)\} = Y_y X_x \mathbb {C}(x,y)\big |_{y=x}, \end{aligned}$$
where X, Y are vector fields with values \(X_x,Y_x\) in the tangent space \(T_x M\). We shall assume throughout that \(\mathbb {C}\) is positive definite on \(M\times M\), from which it follows that g is a well defined Riemannian metric, which we call the metric induced by f.
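For a concrete feel for the induced metric, consider the following one-dimensional sketch. It assumes, purely for illustration, that M is the circle and that the covariance is \(\mathbb {C}(x,y)=\sum _j a_j^2\cos (j(x-y))\) with hypothetical squared weights `a2` summing to one. In the coordinate x the induced metric is then the mixed derivative \(\partial _x\partial _y\mathbb {C}(x,y)\) at \(y=x\), which for this covariance equals \(\sum _j j^2a_j^2\).

```python
import numpy as np

a2 = np.array([1.0, 1.0, 1.0]) / 3.0       # squared weights a_j^2, summing to one
freqs = np.arange(1, a2.size + 1)          # frequencies j = 1, 2, 3

def cov(x, y):
    """Illustrative covariance C(x, y) = sum_j a_j^2 cos(j (x - y))."""
    return float(np.sum(a2 * np.cos(freqs * (x - y))))

def induced_metric_fd(x, h=1e-3):
    """Central finite-difference estimate of d/dy d/dx C(x, y) at y = x."""
    return (cov(x + h, x + h) - cov(x + h, x - h)
            - cov(x - h, x + h) + cov(x - h, x - h)) / (4.0 * h * h)

g_numeric = induced_metric_fd(0.7)
g_exact = float(np.sum(a2 * freqs**2))     # closed form: sum_j j^2 a_j^2
```

The numerical value does not depend on the base point, reflecting the stationarity of this particular example. Nothing in the text requires stationarity in general.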

From now on, we shall make one of two—essentially complementary—assumptions:

Assumption 4.1

If, in the above setting, we are given a manifold M as in Assumption 2.1 and a Gaussian process \(f:\,M\rightarrow {\mathbb R}\), but no metric on M, we shall assume that M is endowed with the metric induced by f.

If, on the other hand, we start with a Riemannian manifold (M, g), then we shall choose a Gaussian process in such a way that the metric induced by (4.1) is precisely g.

The fact that, given a metric g, there exists a Gaussian process inducing this metric, is a consequence of the Nash embedding theorem (cf. proof of Theorem 12.6.1 in [1]). This assumption is crucial to all that follows, and there is no known general topological or geometric theory for Gaussian processes when the metric on M is not the one induced by the process.

The only additional assumptions that we require relate to smoothness and non-degeneracy for f, but for this we need some notation. Thus we write, from now on, \(\nabla \) for the Levi-Civita connection of (M, g), and \(\nabla ^2\) for the corresponding covariant Hessian. Fix an orthonormal (with respect to g) frame field \(E=(E_1,\dots ,E_m)\) in T(M). The specific choice of E is not important.

Assumption 4.2

We assume that the zero mean Gaussian process \(f:\,M\rightarrow {\mathbb R}\) has, with probability one, continuous first, second, and third order derivatives over M, and, for each \(x\in M\), the joint distributions of the \((1+m+m(m+1)/2)\)-dimensional random vector
$$\begin{aligned} \left( f(x),\, ((\nabla f)(E_i))(x), \, ((\nabla ^2 f)(E_i,E_j))(x), \ 1\le i,j \le m \right) \end{aligned}$$
are nondegenerate.

We shall also assume that \(\mathbb {E}\{f^2(x)\}\), the variance of f, is constant, and for convenience, we take the constant to be one. No other homogeneity assumptions are required.

Conditions on the covariance function of f guaranteeing the differentiability requirements of the assumption are implicit in Corollary 11.3.2 of [1]. If M is a Euclidean domain with smooth boundary, then these conditions require that the various sixth-order partial derivatives of the covariance function satisfy
$$\begin{aligned} \left| C^{(6)}(x,x) + C^{(6)}(y,y) -2C^{(6)}(x,y)\right| \ \le \ K\,|\ln |x-y||^{-(1+\alpha )}, \end{aligned}$$
for all \(x,y\in M\), some finite K and \(\alpha >0\), and where \(C^{(6)}\) is a generic sixth-order partial derivative of the covariance function. For a general manifold corresponding conditions in terms of charts and atlases suffice. See Chapter 12 of [1] for details.

The non-degeneracy conditions (4.2) are close to trivial, but important. Together with smoothness, they ensure that the sample paths of f are a.s. Morse over M.

As an aside regarding Assumption 4.2, we note that the requirement that \(f\in C^3(M)\) can probably be done away with. It arises as a side issue in a tightness argument in Sect. 9.3, which requires a uniform bound on increments of fourth order derivatives of \(\mathbb {C}\). A (much) more complicated argument would probably require only that \(f\in C^{2+\epsilon }\) for some \(\epsilon >0\), but rather than lose sight of the forest for the trees we are happy to live with the extra smoothness. In fact, in order to prove the fluctuation result (1.5), we shall even have to assume that \(f\in C^6(M)\). We shall explain how the need for these high levels of smoothness arises below, when we have the requisite notation.

4.2 The parameter \(\sigma ^2_c(f)\)

Given the above setting, we now define a new Gaussian process on
$$\begin{aligned} M^{*}\ \mathop {=}\limits ^{\Delta } \ (M\times M)\setminus \text {diag}(M\times M) \end{aligned}$$
by setting
$$\begin{aligned} f^x(y)=\frac{f(y)-\mathbb {E}\left\{ f(y)\,\big | \, f(x),\nabla f(x)\right\} }{1-\mathbb {C}(x,y)}. \end{aligned}$$
The fact that this process is well defined is not obvious, since as \(y\rightarrow x\) in (4.4) both numerator and denominator approach zero. Nevertheless, as we shall show in Sect. 8.1, if we have enough smoothness for f, then the limit behaves well. For example, just to be certain that \(\lim _{y\rightarrow x} f^x(y)\) is well defined requires that \(f\in C^2\).

(In fact, ratios of the 0/0 nature appear throughout the proofs, with denominators such as \(1-\mathbb {C}(x,y)\) (as above), \(1-\mathbb {C}^2 (x,y)\), and even \((1-\mathbb {C}^2 (x,y))^2\), all of which are problematic as \(y\rightarrow x\). For the a.s. convergence of (1.4), this leads to the requirement that \(f\in C^3(M)\). For the fluctuation result (1.5) we will even need to assume that \(f\in C^6(M)\). While these conditions appear, at first, rather severe, they seem to be necessary and not just a consequence of our method of proof. For example, as far as the \(C^3\) requirement is concerned, it comes from the fact that local reach involves curvature, and so second order derivatives. However, the definition of \(f^x\) itself involves \(\nabla f\), leading to the requirement for three derivatives for f (or at least slightly more than \(C^2\)). For details, see Sect. 9.4 and, in particular, the proof of Lemma 12.2.)

In any case, \(f\in C^3(M)\) is more than enough to ensure that it makes sense to define the function \(\sigma _c(f,\cdot )\), and the constant \(\sigma _c(f)\), as follows:
$$\begin{aligned} \sigma ^2_c(f,x)&\mathop {=}\limits ^{\Delta }&\sup _{y\in M\setminus x}\text {Var}\left( f^x(y)\right) , \end{aligned}$$
$$\begin{aligned} \sigma ^2_c(f)&\mathop {=}\limits ^{\Delta }&\sup _{x\in M}\sigma ^2_c(f,x). \end{aligned}$$
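To make the two definitions above concrete, the following sketch evaluates \(\text {Var}(f^x(y))\) in closed form for a stationary process on the circle. The choice \(M=S^1\) and the covariance \(a^2\cos h+b^2\cos 2h\) are assumptions made purely for illustration and do not appear in the paper. Since the angle is not a unit-speed coordinate for the induced metric, the derivative is normalised by \(\lambda =-\mathbb {C}''(0)\).

```python
import numpy as np

# Hypothetical example (not from the paper): a stationary process on the circle,
#   f(t) = a*(xi1*cos t + xi2*sin t) + b*(xi3*cos 2t + xi4*sin 2t),  a^2 + b^2 = 1,
# with i.i.d. standard normal xi's, so that C(s, t) = cov(t - s).
A2, B2 = 0.5, 0.5                      # a^2 and b^2 (assumed values)
LAM = A2 + 4.0 * B2                    # lambda = -C''(0) = Var(f'(t))

def cov(h):
    return A2 * np.cos(h) + B2 * np.cos(2.0 * h)

def dcov(h):
    return -A2 * np.sin(h) - 2.0 * B2 * np.sin(2.0 * h)

def var_fx(h):
    """Var(f^x(y)) for y = x + h, with h not a multiple of 2*pi.

    The residual of f(y) after regressing on (f(x), f'(x)) has variance
    1 - C(h)^2 - C'(h)^2 / lambda; dividing by (1 - C(h))^2 gives Var(f^x(y)).
    """
    c = cov(h)
    return (1.0 - c**2 - dcov(h)**2 / LAM) / (1.0 - c)**2

# By stationarity sigma_c^2(f, x) does not depend on x, so a grid over h suffices.
h_grid = np.linspace(0.05, 2.0 * np.pi - 0.05, 2001)
sigma2_c = var_fx(h_grid).max()

# Value of Var(f^x(y)) as y -> x, from a fourth-order Taylor expansion of C.
diag_limit = (A2 + 16.0 * B2) / LAM**2 - 1.0

print(sigma2_c, var_fx(0.01), diag_limit)
```

For these assumed weights the grid maximum is attained at the antipodal separation \(h=\pi \), while the value near the diagonal is strictly smaller, which illustrates that the supremum can be governed by global rather than local geometry.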
We now have everything we need to state the main result of the paper, but, first, we explain why the above two definitions are already ‘well known’.

Associated with the Gaussian process f are a reproducing kernel Hilbert space, H, and an \(L_2\) space, \(\mathcal H\), which is the completion of the span of f over M. Writing S(H) and \(S(\mathcal H)\) for the unit spheres of these spaces, there is an isometry, \(\Psi \), between M, when given the metric g induced by f, and the embedded submanifold \(\Psi (M)\subset S(\mathcal H)\), determined by \(\Psi (x)=f(x)\), for all \(x\in M\). There are also isometries between \(S(\mathcal H)\) and S(H), and so between M and a corresponding subset of S(H), the details of which can be found, for example, in Chapter 3 of [1], but which date back to the earliest history of Gaussian processes.

It turns out that \(\sigma ^2_c(f,x)\) is precisely the cotangent squared of the local reach of \(\Psi (M)\) at the point f(x), when \(S(\mathcal H)\) is considered as a submanifold of \(\mathcal H\). It follows immediately that \(\sigma ^2_c(f)\) is the cotangent squared of the corresponding global reach. Similar statements can be made about the isometric embedding of M into S(H), but would take longer to explain. The bottom line, however, is that both \(\sigma ^2_c(f,\cdot )\) and \(\sigma ^2_c(f)\) are basic quantities inherently connected to \((M,g)\) when it is viewed via isometric embeddings into larger spaces, and that there is a lot of Hilbert sphere geometry lying behind the asymptotics of this paper.

These observations are relatively recent. In our current notation, they can be found in Section 14.4.3 of [1], but see also [22] and the references therein.

The reason that \(\sigma _c(f,\cdot )\) and \(\sigma _c(f)\) have been of more recent interest is that they arise in the rigorous justification of the so-called ‘Euler characteristic heuristic’ for approximating the distribution of the supremum of smooth Gaussian processes. In this setting, let \(\chi \left( A_u(f,M)\right) \) denote the Euler characteristic of the excursion set \(A_u\) of the Gaussian field f, defined by
$$\begin{aligned} A_u\ \equiv \ A_u(f,M) = \{x\in M: f(x)\ge u\}. \end{aligned}$$
It has been ‘well known’ for some decades that, at least for high levels \(u \in {\mathbb R}\), the mean Euler characteristic provides a good estimate of the exceedance probability, \(\mathbb {P}\left\{ \sup _{x\in M}f(x)\ge u\right\} \). That is, for large u, the difference
$$\begin{aligned} \text {Diff}_{f,M}(u)\ \mathop {=}\limits ^{\Delta }\ \mathbb {E}\left\{ \chi \left( A_u(f,M)\right) \right\} -\mathbb {P}\left\{ \sup _{x\in M}f(x)\ge u\right\} \end{aligned}$$
is small.
Relatively recently (cf. [1, 22, 24]) this statement has been made precise. (These sources actually treat the more general setting of stratified manifolds, which requires an additional condition of local convexity for M, as well as some minor side conditions on both M and f. The definition of \(\sigma ^2_c(f)\) is also correspondingly changed. See, for example, [23] for a discussion of why local convexity is required. In fact, what is required is close to positive reach, and the reason that (4.7) fails for zero reach is much the same reason that tube formulae fail. But that is another story.) In our setting, it is now known that
$$\begin{aligned} \liminf _{u\rightarrow \infty }-u^{-2}\log \left| \text {Diff}_{f,M}(u)\right| \ \ge \ \frac{1}{2}\left( 1+\frac{1}{\sigma ^2_c(f)}\right) . \end{aligned}$$
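To unpack the liminf in this bound, write \(\alpha =\frac{1}{2}\left( 1+1/\sigma ^2_c(f)\right) \). The symbols \(\alpha \), \(\varepsilon \) and \(u_0\) are auxiliary and introduced only for this remark. The bound is then equivalent to the statement that, for every \(\varepsilon >0\), there is a \(u_0(\varepsilon )\) such that
$$\begin{aligned} \left| \text {Diff}_{f,M}(u)\right| \ \le \ \exp \left( -(\alpha -\varepsilon )u^2\right) , \qquad u\ge u_0(\varepsilon ). \end{aligned}$$
Since \(\alpha >1/2\) whenever \(\sigma ^2_c(f)\) is finite, the error decays at a strictly faster exponential rate than \(e^{-u^2/2}\), which is the rate of the Gaussian tail of f(x) at a single point. For instance, \(\sigma ^2_c(f)=1\) gives \(\alpha =1\).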

4.3 Main result

With the introduction, motivation and almost all of the notation behind us, we are almost ready to state the main result of the paper. However, two more items of notation are required. The first gives the local radius of \(h^k(M)\), as a submanifold of \(S^{k-1}\), at the image, under \(h^k=f^k/\Vert f^k\Vert \), of the point \(x\in M\) [cf. (1.2)]. This is given, for \(x\in M\), by
$$\begin{aligned} \theta _k(x) \ \mathop {=}\limits ^{\Delta }\ \inf _{\eta \in T_{h^k(x)}^\perp (h^k(M))\cap S(T_{h^k(x)}S^{k-1})} \theta _\ell (h^k(x),\eta ), \end{aligned}$$
where \(\eta \) is a unit vector in the tangent space \(T_{h^k(x)}S^{k-1}\) pointing in a normal direction to \(h^k(M)\) at \(h^k(x)\in h^k(M)\). The second gives the global reach, as
$$\begin{aligned} \theta _k \ \mathop {=}\limits ^{\Delta }\inf _{x\in M} \theta _k(x). \end{aligned}$$

Theorem 4.3

Let M be a manifold satisfying Assumption 2.1, and let \(f:\,M\rightarrow {\mathbb R}\) be a Gaussian process satisfying Assumptions 4.1 and 4.2. Assume that \(\sigma ^2_c(f)\), as defined by (4.6), is finite. Consider the embedding (1.2) of M into the unit sphere in \(\mathbb {R}^k\), and let \(\theta _k\) be the global reach of the random manifold \(h^k(M)\). Then, with probability one,
$$\begin{aligned} \cot ^2\theta _k\ \rightarrow \ \sigma _c^2(f), \quad \text {as } k\rightarrow \infty . \end{aligned}$$
If, in addition, M is \(C^6\), and the sample paths of f are a.s. \(C^6\) on M, then there exists a sequence \(\bar{\gamma }_k\) of random processes from \(M\rightarrow {\mathbb R}\), such that, for all \(x\in M\),
$$\begin{aligned} \sqrt{k}\left| \cot ^2\theta _k(x)-\sigma ^2_c(f,x)\right| \ \le \ \left| \bar{\gamma }_k(x)\right| , \end{aligned}$$
and a limit process \(\bar{\gamma }:\,M\rightarrow {\mathbb R}\), such that,
$$\begin{aligned} \bar{\gamma }_k(\cdot ) \ \Rightarrow \ \ \bar{\gamma }(\cdot ), \end{aligned}$$
where the convergence here is weak convergence, in the Banach space of continuous functions on M with supremum norm, and
$$\begin{aligned} \bar{\gamma }(x) = \sup _{y\in M\setminus x}\gamma (x,y), \end{aligned}$$
where \(\gamma \) is the Gaussian process over \(M^{*}\) defined by (11.14).

We defer all further discussion of the fluctuation result of (4.11) and (4.12) until Sect. 11, where it will be restated as Theorem 11.1, and the (rather involved) definition of the process \(\gamma \) will appear. Until then we shall concentrate on the a.s. convergence of (4.10).

As an aside, note that a variation of some of the easier arguments in the following sections shows that the sequence of mappings \(h^k\) tends, with probability one, to an isometry, in the sense that the associated pullbacks to M of the usual metric on \(S^{k-1}\) tend to the induced metric (4.1) on M. We provide a rigorous treatment of this result in [15], albeit with the self-normalisation of (1.2) replaced by a \(\sqrt{k}\) normalisation. We also prove there that this gives rise to the a.s. convergence of a class of intrinsic functionals of \(h^k(M)\) to the corresponding functionals on \((M,g)\). We refer you to [15] for details.

5 Computation of the reach

This section contains two purely geometric lemmas from which follow the probabilistic computations that make up most of the paper. The first gives a characterisation of the reach of general submanifolds of spheres, and the second does the same for the specific submanifolds \(h^k(M)\) in terms of the functions \(f^k\). To start, recall that geodesic distance on the sphere is measured in terms of angles, \(r\in [0,\pi )\). Let M be a submanifold of \(S^{k-1}\), and \(\eta _x\) a unit normal vector at \(x\in M\).

We can now state the following characterisation, which implicitly assumes, as we shall from now on, that M has dimension at least one. As stated it is identical to Lemma 2.1 of [22], restricted to our setting. ([22] treats the more general setting of stratified manifolds.) Furthermore, as pointed out there, the proof is essentially the same as the proof given in [12] for the one-dimensional case. Nevertheless, because of its importance to this paper, and (only) for the sake of completeness, we give the proof in “Appendix 1”.

An important point to note for the statement of this lemma is that since the manifold M is embedded in \(\mathbb {R}^k\), we can, and do, treat all tangent spaces \(T_xM\) as affine subspaces of \(\mathbb {R}^k\), with origin at x.

Lemma 5.1

Let M be a submanifold of \(S^{k-1}\), satisfying the conditions of Assumption 2.1. Let \(T^{\perp }_x M \subset T_x S^{k-1} \subset T_x\mathbb {R}^k\) be the normal space of M at x as it sits in \(S^{k-1}\), viz. the affine subspace of \(\mathbb {R}^k\) which is the orthogonal complement of \(\text {span}(T_x M\oplus \{x\})\subset T_x\mathbb {R}^k\) in \(T_x\mathbb {R}^k\). Then the reach, \(\theta (x)\), at x is given by
$$\begin{aligned} \cot ^2(\theta (x))= \sup _{y{\in M\setminus \{x\}}}\frac{\Vert P_{T^{\perp }_x M}y\Vert ^2}{(1-\langle x,y\rangle )^2}, \end{aligned}$$
where \(P_{T_x^{\perp }M}\) is orthogonal projection onto \(T_x^{\perp }M\).
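As a sanity check on the formula of Lemma 5.1, the following sketch evaluates its right-hand side on a finite sample of a circle of colatitude \(\phi \) on \(S^2\). The function name `cot2_local_reach` and the test case are our own illustrative choices, not part of the paper. For this circle the ratio is constant in y and equals \(\cot ^2\phi \), so that the local reach is \(\phi \) when \(\phi \le \pi /2\).

```python
import numpy as np

def cot2_local_reach(x, tangents, others):
    """Right-hand side of the formula in Lemma 5.1 for a finite sample of M.

    x        : point of M on the unit sphere, shape (k,)
    tangents : basis of T_x M as rows, shape (m, k)
    others   : sample points y != x of M as rows, shape (n, k)
    """
    # Orthonormal basis of span(T_x M + {x}); its complement is the normal space.
    q, _ = np.linalg.qr(np.vstack([x, tangents]).T)
    normal_part = others - (others @ q) @ q.T
    num = np.sum(normal_part**2, axis=1)
    den = (1.0 - others @ x)**2
    return np.max(num / den)

# Hypothetical test case: the circle of colatitude phi on S^2.
phi = np.pi / 3
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)[1:]   # exclude y = x
circle = np.column_stack([np.sin(phi) * np.cos(t),
                          np.sin(phi) * np.sin(t),
                          np.full_like(t, np.cos(phi))])
x = np.array([np.sin(phi), 0.0, np.cos(phi)])
tangent = np.array([[0.0, 1.0, 0.0]])

cot2 = cot2_local_reach(x, tangent, circle)
theta = np.arctan(1.0 / np.sqrt(cot2))
print(cot2, theta)
```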

We are now in a position to derive the global reach of our random manifold \(h^k(M)\). The result is given in the next lemma. However, before stating and proving the lemma, we need some preparatory definitions.

Recalling the embedding maps \(h^k\) and the components \(f^k\) of (1.2), let \((X_1,\dots ,X_m)\) be a frame bundle of full rank over M, and define the \(k\times (m+1)\) matrix
$$\begin{aligned}L_x = \begin{bmatrix} f_1(x)&\quad X_1f_1(x)&\quad \cdots&\quad X_mf_1(x) \\ \vdots&\quad \vdots&\quad \vdots&\quad \vdots \\ f_k(x)&\quad X_1f_k(x)&\quad \cdots&\quad X_mf_k(x) \end{bmatrix},\end{aligned}$$
and the projection matrix
$$\begin{aligned} P_x=L_x\left( L_x^ {T}L_x\right) ^{-1}L_x^ {T}, \end{aligned}$$
assuming that k is large enough (\(k\ge 2m+1\)). By the independence of the \(f_j\) and the non-degeneracy of Assumption 4.2, the columns of \(L_x\) are a.s. linearly independent, and so \(L_x^ {T}L_x\) is a.s. invertible. The matrix \(P_x\) orthogonally projects vectors in \({\mathbb R}^k\) onto
$$\begin{aligned} \text {span}\left( f^k(x),\,f^k_{*}(X_i),\,1\le i\le m\right) , \end{aligned}$$
where \(f^k_{*}\) is the usual push forward operator.
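The defining properties of \(P_x\) are easy to check numerically. In the sketch below the matrix \(L_x\) is replaced by a generic Gaussian \(k\times (m+1)\) matrix; this stand-in, and the sizes chosen, are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 7, 2                         # illustrative sizes with k >= 2m + 1

# Stand-in for L_x: in the paper its columns are f^k(x), X_1 f^k(x), ..., X_m f^k(x);
# here they are replaced by i.i.d. Gaussian columns purely for illustration.
L = rng.standard_normal((k, m + 1))

# P_x = L (L^T L)^{-1} L^T, computed with a linear solve rather than an explicit inverse.
P = L @ np.linalg.solve(L.T @ L, L.T)

v = rng.standard_normal(k)
residual = v - P @ v                # component of v orthogonal to the column span of L

print(np.allclose(P @ P, P), np.allclose(P, P.T), np.allclose(L.T @ residual, 0.0))
```

The complementary projection \(I-P_x\) used later is then simply `np.eye(k) - P`.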
Consider now the following expression, well known from the Statistics literature as the maximum likelihood estimate based on k samples of the correlation coefficient between f(x) and f(y); viz.
$$\begin{aligned} \widehat{\mathbb {C}}_k(x,y)=\frac{\sum _{j=1}^kf_j(x)f_j(y)}{\sqrt{\sum _{j=1}^k(f_j(x))^2\sum _{j=1}^k(f_j(y))^2}}. \end{aligned}$$
Consider the conditional process \(f^x(y)\) defined on \(M^{*}\) by (4.4) and denote k i.i.d. realizations of it at y by
$$\begin{aligned} f^{x,k}(y)=\left( f^x_1(y),\ldots ,f^x_k(y)\right) . \end{aligned}$$
Define an ‘error process’
$$\begin{aligned} E^{x,k}(y)= & {} \frac{k}{\Vert f^k(y)\Vert ^2}\frac{(1-\mathbb {C}(x,y))^2}{(1-\widehat{\mathbb {C}}_k(x,y))^2} \left( \frac{1}{k}\Vert P_x f^{x,k}(y)\Vert ^2\right) . \end{aligned}$$
The key lemma to be proven before starting probabilistic calculations is the following.

Lemma 5.2

Let M be a manifold satisfying the conditions of Assumption 2.1, embedded into \(S^{k-1}\) via the embedding map defined in (1.2). Assume that f satisfies Assumptions 4.1 and 4.2. Then, with probability one, the reach of \(h^k(M)\) is given by
$$\begin{aligned} \cot ^2 \theta _k =\sup _{x\in M}\sup _{y\in M\setminus \{x\}}\frac{k}{\Vert f^k(y)\Vert ^2}\frac{(1-\mathbb {C}(x,y))^2}{(1-\widehat{\mathbb {C}}_k(x,y))^2}\left( \frac{1}{k}\Vert f^{x,k}(y)\Vert ^2\right) -E^{x,k}(y). \end{aligned}$$


Proof

The global reach is obtained by taking infima of local reaches as given in (2.3). However, since the cotangent is decreasing in the first quadrant, we have
$$\begin{aligned} \cot ^2\theta _k = \sup _{x\in M}\cot ^2\theta _k(x). \end{aligned}$$
Using the result from Lemma 5.1, the above is equal to
$$\begin{aligned} \sup _{x\in M}\sup _{y\in M\setminus \{x\}}\frac{\Vert (I-P_x)h^k(y)\Vert ^2}{\left( 1-\langle h^k(x),h^k(y)\rangle \right) ^2}. \end{aligned}$$
Since f is centered Gaussian, its derivatives are also centered Gaussians. Furthermore, setting
$$\begin{aligned}v_k^{xy}\ \mathop {=}\limits ^{\Delta }\ \left( \mathbb {E}\left\{ f_1(y)\,\big | \, f_1(x),\nabla f_1(x)\right\} ,\ldots ,\mathbb {E}\left\{ f_k(y)\,\big | \, f_k(x),\nabla f_k(x)\right\} \right) ^T, \end{aligned}$$
we have \((I-P_x)v_k^{xy}=0\). This fact, along with (4.4) and (5.4), shows that
$$\begin{aligned} \cot ^2\theta _k= \sup _{x\in M}\sup _{y\in M\setminus \{x\}}\frac{k}{\Vert f^k(y)\Vert ^2}\frac{(1-\mathbb {C}(x,y))^2}{(1-\widehat{\mathbb {C}}_k(x,y))^2}\left( \frac{1}{k}\Vert (I-P_x)f^{x,k}(y)\Vert ^2\right) . \end{aligned}$$
From the fact that we have orthogonal projections, this is
$$\begin{aligned} \sup _{x\in M}\sup _{y\in M\setminus \{x\}}\frac{k}{\Vert f^k(y)\Vert ^2}\frac{(1-\mathbb {C}(x,y))^2}{(1-\widehat{\mathbb {C}}_k(x,y))^2}\left( \frac{1}{k}\Vert f^{x,k}(y)\Vert ^2\right) -E^{x,k}(y), \end{aligned}$$
and the lemma is proven. \(\square \)
We shall see later that the error term \(E^{x,k}(y)\) in (5.5) goes to zero, and so we shall be primarily concerned with the a.s. convergence of
$$\begin{aligned} \sup _{x\in M}\sup _{y\in M\setminus \{x\}}\frac{k}{\Vert f^k(y)\Vert ^2}\frac{(1-\mathbb {C}(x,y))^2}{(1-\widehat{\mathbb {C}}_k(x,y))^2}\left( \frac{1}{k}\Vert f^{x,k}(y)\Vert ^2\right) . \end{aligned}$$
For this, we need to establish convergence results for the three terms here. The results we need are stated as four lemmas in the next section.

6 Four key lemmas and the proof of the main theorem

The proof of Theorem 4.3 follows from the four lemmas stated below and is given at the end of this section. Throughout this section we shall assume, without further comment, that M satisfies the conditions of Assumption 2.1. The conditions on f vary, since not all the lemmas require the same level of smoothness. All the conditions, however, are implied by Assumptions 4.1 and 4.2.

We start by showing that the first two terms in (5.5) converge uniformly, with probability one, to 1.

Lemma 6.1

Let \(f^k\) be a \({\mathbb R}^k\)-valued random process on M, with i.i.d. components, each a centered, unit variance Gaussian process over M, with a.s. continuous sample paths. Then, with probability one,
$$\begin{aligned} \lim _{k\rightarrow \infty }\sup _{y\in M} \left| \frac{k}{\Vert f^k(y)\Vert ^2}-1\right| = 0. \end{aligned}$$

Lemma 6.2

Let \(f^k\) be as in Lemma 6.1, but also \(C^3\). Denote the covariance function of its components by \(\mathbb {C}(x,y)\), and let \(\widehat{\mathbb {C}}_k(x,y)\) be as defined in (5.1). Then, with probability one,
$$\begin{aligned} \lim _{k\rightarrow \infty }\sup _{(x,y)\in M^{*}}\left| \left( \frac{1-\mathbb {C}(x,y)}{1-\widehat{\mathbb {C}}_k(x,y)}\right) ^2-1\right| = 0. \end{aligned}$$
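The uniform convergence of this ratio, including near the diagonal, can be observed in simulation. The sketch below uses a hypothetical unit-variance process on the circle with covariance \(\frac{1}{2}\cos h+\frac{1}{2}\cos 2h\), which is our own choice and not taken from the paper, and evaluates the supremum over a grid version of \(M^{*}\). Because this process has only four random coefficients, all inner products are computed from a \(4\times 4\) Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
# Hypothetical process: f(t) = sqrt(1/2) * (xi1 cos t + xi2 sin t + xi3 cos 2t + xi4 sin 2t)
basis = np.sqrt(0.5) * np.vstack([np.cos(t), np.sin(t), np.cos(2 * t), np.sin(2 * t)])
C = basis.T @ basis                          # true covariance C(t_i, t_j) on the grid
off_diag = ~np.eye(n, dtype=bool)            # grid version of M* (pairs with x != y)

def sup_ratio_deviation(k):
    """sup over grid pairs of |((1 - C) / (1 - C_hat_k))^2 - 1| for one draw of f^k."""
    xi = rng.standard_normal((k, 4))
    S = basis.T @ (xi.T @ xi) @ basis        # S[i, j] = sum_l f_l(t_i) f_l(t_j)
    d = np.sqrt(np.diag(S))
    C_hat = S / np.outer(d, d)
    ratio = (1.0 - C[off_diag]) / (1.0 - C_hat[off_diag])
    return np.max(np.abs(ratio**2 - 1.0))

dev_small, dev_large = sup_ratio_deviation(1_000), sup_ratio_deviation(1_000_000)
print(dev_small, dev_large)
```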

The third lemma (after some trivial calculations) will—see below—give us that the remaining term in (5.5) converges to the parameter \(\sigma ^2_c(f)\).

Lemma 6.3

Under the same assumptions as in Lemma 6.2, and with probability one,
$$\begin{aligned} \lim _{k\rightarrow \infty }\sup _{(x,y)\in M^{*}}\left| \frac{\Vert f^{x,k}(y)\Vert ^2}{k}-\text {Var}\left( f^x(y)\right) \right| = 0. \end{aligned}$$

It will follow from the proof of this lemma that \(f^x(y)\) is bounded even when we are arbitrarily close to diag(\(M\times M\)). This is needed to ensure that all the terms defined in (5.5) are, a priori, well defined.

The final step we need is the following.

Lemma 6.4

Under the same assumptions as in Lemma 6.2, and with \(E^{x,k}(y)\) as defined in (5.2), we have, with probability one,
$$\begin{aligned}\lim _{k\rightarrow \infty }\sup _{(x,y)\in M^{*}}E^{x,k}(y)= 0.\end{aligned}$$

We now show how to prove the main result as a straightforward consequence of the previous four lemmas.

Proof of Theorem 4.3

It is immediate from the results of Lemmas 5.2, 6.1, 6.2 and 6.4 that, with probability one,
$$\begin{aligned} \lim _{k\rightarrow \infty }\cot ^2\theta _k= \lim _{k\rightarrow \infty }\sup _{(x,y)\in M^{*}}\frac{\Vert f^{x,k}(y)\Vert ^2}{k}, \end{aligned}$$
and we shall be done once we show that the right hand limit is \(\sigma ^2_c(f)\).
However, this is immediate from the much stronger result in Lemma 6.3 that
$$\begin{aligned} \lim _{k\rightarrow \infty }\sup _{(x,y)\in M^{*}}\left| \frac{\Vert f^{x,k}(y)\Vert ^2}{k}-\text {Var}\left( f^x(y)\right) \right| = 0 \end{aligned}$$
and that, by definition,
$$\begin{aligned} \sigma ^2_c(f) = \sup _{(x,y)\in M^{*}} \text {Var}\left( f^x(y)\right) . \end{aligned}$$
This completes the proof of Theorem 4.3, modulo proving the four lemmas. \(\square \)
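The statement of Theorem 4.3 can also be probed by direct simulation, using only the geometric formula of Lemma 5.1 applied to \(h^k(M)\). The sketch below is a rough illustration under our own assumptions: M is the circle, f is the hypothetical process with covariance \(\frac{1}{2}\cos h+\frac{1}{2}\cos 2h\), and suprema are taken over a finite grid. For this covariance \(\text {Var}(f^x(y))=1\) at antipodal points, and a numerical evaluation suggests that this is the supremum, so the expected limit is \(\sigma ^2_c(f)=1\).

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = np.sqrt(0.5), np.sqrt(0.5)        # hypothetical weights, A^2 + B^2 = 1
k, n = 4000, 90                          # embedding dimension, grid size on the circle
t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)

# k i.i.d. copies of f(t) = A(xi1 cos t + xi2 sin t) + B(xi3 cos 2t + xi4 sin 2t)
xi = rng.standard_normal((k, 4))
basis = np.vstack([A * np.cos(t), A * np.sin(t), B * np.cos(2 * t), B * np.sin(2 * t)])
dbasis = np.vstack([-A * np.sin(t), A * np.cos(t),
                    -2 * B * np.sin(2 * t), 2 * B * np.cos(2 * t)])
F = xi @ basis                           # F[:, i] = f^k(t_i)
dF = xi @ dbasis                         # dF[:, i] = derivative of f^k at t_i
H = F / np.linalg.norm(F, axis=0)        # H[:, i] = h^k(t_i), a point of S^{k-1}

cot2 = 0.0
for i in range(n):
    # Orthonormal basis of span(f^k(x), d f^k(x)), the range of P_x
    q, _ = np.linalg.qr(np.column_stack([F[:, i], dF[:, i]]))
    Y = np.delete(H, i, axis=1)          # all grid points y != x
    normal = Y - q @ (q.T @ Y)           # (I - P_x) h^k(y)
    ratio = np.sum(normal**2, axis=0) / (1.0 - H[:, i] @ Y)**2
    cot2 = max(cot2, ratio.max())

print(cot2)                              # should be close to 1 for large k
```

The grid supremum is a lower bound for the true supremum, so this is a heuristic check rather than a computation of \(\theta _k\).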

7 Proof of Lemma 6.1

We need to prove that
$$\begin{aligned} \lim _{k\rightarrow \infty }\sup _{y \in M}\left| \frac{k}{\Vert f^k(y)\Vert ^2}-1\right| = 0,\qquad \text {a.s}. \end{aligned}$$
However, this follows almost trivially from the following standard strong law for Banach space valued random variables, which, since we use it often, we state in full.

Theorem 7.1

([16], Corollary 7.10) Let X be a Borel random variable with values in a separable Banach space B with norm \(\Vert \cdot \Vert _{\text {B}}\). Let \(S_n\) be the partial sum of n i.i.d. realizations of X. Then,
$$\begin{aligned} \frac{S_n}{n}\mathop {\longrightarrow }\limits ^{\text {a.s.}} 0\end{aligned}$$
if, and only if, \(\mathbb {E}\{\Vert X\Vert _{\text {B}}\}<\infty \) and \(\mathbb {E}\{X\}=0\).
To prove (7.1), we set \(X=(f(y))^2-1\) in the above theorem. The Banach space B is C(M) (continuous functions over M), equipped with the sup norm. The mean zero condition is trivial. To check the moment condition on the norm of \((f)^2-1\), note that
$$\begin{aligned} \mathbb {E}\left\{ \sup _{y\in M}|(f(y))^2-1|\right\}\le & {} 1+\mathbb {E}\left\{ \sup _{y\in M}|(f(y))^2|\right\} \\\le & {} 1+\mathbb {E}\left\{ \left( \sup _{y\in M}|f(y)|\right) ^2\right\} <\infty . \end{aligned}$$
Finiteness of the expectation here follows from the Borell–Tsirelson–Ibragimov–Sudakov inequality (e.g. Theorem 2.1.2 in [1]). This is all that is needed to prove (7.1).
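A quick simulation illustrates the uniform law of large numbers at work here. As an assumption for illustration only, f is taken to be a four-coefficient, unit-variance process on the circle, so that \(\Vert f^k(y)\Vert ^2\) can be computed from a \(4\times 4\) Gram matrix without storing k sample paths.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
# Hypothetical unit-variance process on the circle (not from the paper)
basis = np.sqrt(0.5) * np.vstack([np.cos(t), np.sin(t), np.cos(2 * t), np.sin(2 * t)])

def sup_deviation(k):
    """sup over the grid of |k / ||f^k(y)||^2 - 1| for one draw of f^k."""
    xi = rng.standard_normal((k, 4))
    gram = xi.T @ xi
    sq_norms = np.einsum('iy,ij,jy->y', basis, gram, basis)   # ||f^k(y)||^2 on the grid
    return np.max(np.abs(k / sq_norms - 1.0))

deviations = {k: sup_deviation(k) for k in (100, 10_000, 1_000_000)}
print(deviations)
```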

8 Proof of Lemma 6.3

Before starting this proof in earnest, we need to check that all the terms that are implicitly assumed to exist in the statement of the lemma are well defined. In particular, we need to consider the limits
$$\begin{aligned} \lim _{y\rightarrow x} f^x(y) = \lim _{y\rightarrow x} \frac{f(y)-\mathbb {E}\left\{ f(y)\,\big | \, f(x),\nabla f(x)\right\} }{1-\mathbb {C}(x,y)}, \end{aligned}$$
the problem being that both numerator and denominator tend to zero in the limit. If (8.1) is not well defined, then the supremum in the lemma makes no sense. (Note that away from the diagonal in \(M\times M\) there is no problem with either boundedness or continuity, due to the assumed smoothness of f.)

8.1 The limit (8.1) is well defined

The proof is basically an application of L’Hôpital’s rule. To start, we take an orthonormal frame field \(X=\{X_1,\dots ,X_m\}\) for the tangent bundle of M, where orthonormality is in the induced metric g of (4.1) and with the conventions described in Sect. 3.

Then standard computations for this situation (cf. Section 12.2.2 of [1] for precisely this case) give that the vector \((f(y),f(x),\nabla f(x))\) has a mean zero, multivariate Gaussian distribution with covariance matrix
$$\begin{aligned} \left[ \begin{matrix} 1 &{} \mathbb {C}(x,y) &{} \nabla _x\mathbb {C}(x,y) \\ \mathbb {C}(x,y) &{} 1 &{} 0\\ \nabla _x \mathbb {C}(x,y) &{} 0 &{} 1 \end{matrix} \right] . \end{aligned}$$
From this and the definition of Gaussian conditional expectations, it immediately follows that
$$\begin{aligned} f^x(y) = f(x)+\frac{f(y)-f(x)}{1-\mathbb {C}(x,y)}-\sum _{i=1}^m\frac{ {X}_i f(x)\,{X}_i \mathbb {C}(x,y)}{1-\mathbb {C}(x,y)}. \end{aligned}$$
Now take any \(X=\sum _{i=1}^m d_i X_i \in T_xM\), and let c be a \(C^2\) curve in M such that
$$\begin{aligned} c:\,(-\delta ,\delta )\rightarrow M, \quad c(0)=x,\quad \dot{c}(0)=X. \end{aligned}$$
As \(y\rightarrow x\) along this curve, we have
$$\begin{aligned} \qquad \lim _{y\rightarrow x}f^x(y)=\lim _{u\rightarrow 0}\left[ f(x)-\frac{\left( \frac{ {d}f(c(u))}{{d}u}-\sum _{i=1}^m X_i f(x)\frac{ {d}X_i \mathbb {C}(x,c(u))}{ {d}u}\right) }{\frac{ {d}\mathbb {C}(x,c(u))}{ {d}u}}\right] .\qquad \qquad \end{aligned}$$
Consider the limit of the ratio in the above expression, this being the only problematic term. This is
$$\begin{aligned} \frac{(\sum _i d_iX_i)f(x)-\sum _i X_i f(x)\left( \sum _j d_j(X_jX_i)\mathbb {C}(x,x)\right) }{(\sum _i d_iX_i)\mathbb {C}(x,x)}. \end{aligned}$$
Note that because of our choice of Riemannian metric, and the fact that the \(X_i\) were chosen to be orthonormal, we have
$$\begin{aligned} X_jX_i \mathbb {C}(x,x)= g(X_i,X_j) = \delta _{ij}, \end{aligned}$$
the Kronecker delta. Therefore the numerator in (8.4) is zero. The denominator is also zero because of the assumption of constant variance on f. Thus, to find the true limit, another application of L’Hôpital’s rule is necessary, and so we have
$$\begin{aligned} \lim _{y\rightarrow x} f^x(y)=f(x)-\lim _{u\rightarrow 0}\left( \frac{\frac{ {d}^2f(c(u))}{ {d}u^2}-\sum _iX_if(x)\frac{ {d}^2X_i\mathbb {C}(x,c(u))}{ {d}u^2}}{\frac{ {d}^2\mathbb {C}(x,c(u))}{ {d}u^2}}\right) . \end{aligned}$$
It is easy to see that
$$\begin{aligned} \lim _{u\rightarrow 0}\frac{ {d}^2f(c(u))}{ {d}u^2}=\nabla ^2f(x)(X,X)+\nabla _{X}Xf(x), \end{aligned}$$
$$\begin{aligned} \lim _{u\rightarrow 0}\frac{ {d}^2X_i \mathbb {C}(x,c(u))}{ {d}u^2}= & {} XXX_i\mathbb {C}(x,x)\\= & {} \mathbb {E}\{XXf\,X_if\} \\= & {} \mathbb {E}\{\nabla _XXf\,X_if\} \\= & {} g(\nabla _XX,X_i), \end{aligned}$$
where in the second-last equality, we have used calculations from [1] [cf. Eqn. (12.2.14)]. Consequently,
$$\begin{aligned} \lim _{u\rightarrow 0}\sum _iX_if(x)\frac{ {d}^2X_i \mathbb {C}(x,c(u))}{ {d}u^2}= & {} \sum _ig(\nabla _XX,X_i)X_if(x)\\= & {} \nabla _XXf(x), \end{aligned}$$
and so, moving to the notation of 2-forms, the limit in (8.3) is given by the well defined expression
$$\begin{aligned} f(x)-\frac{\nabla ^2f(x)(X,X)}{\nabla ^2 \mathbb {C}(x,x)(X,X)}, \end{aligned}$$
and the limit in (8.1), albeit dependent on the path of approach of y to x, is also well defined. As a consequence, we also have that, for each finite k,
$$\begin{aligned} \sup _{(x,y)\in M^{*}}\frac{\Vert f^{x,k}(y)\Vert ^2}{k} \end{aligned}$$
is a.s. finite.
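As a short check on the size of this limit, suppose that X has unit length, an assumption made only for this remark. Differentiating the constant variance condition twice gives \(\nabla ^2\mathbb {C}(x,x)(X,X)=\mathbb {E}\{f(x)\nabla ^2f(x)(X,X)\}=-g(X,X)=-1\), so the limit along the curve is \(f(x)+\nabla ^2f(x)(X,X)\), and its variance is
$$\begin{aligned} \text {Var}\left( f(x)+\nabla ^2f(x)(X,X)\right)&= 1+2\,\mathbb {E}\left\{ f(x)\nabla ^2f(x)(X,X)\right\} +\mathbb {E}\left\{ \left( \nabla ^2f(x)(X,X)\right) ^2\right\} \\&= \mathbb {E}\left\{ \left( \nabla ^2f(x)(X,X)\right) ^2\right\} -1. \end{aligned}$$
In particular, the contribution of the diagonal to \(\sigma ^2_c(f,x)\) is controlled by the second moments of the covariant Hessian, in line with the role of local curvature in the reach.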

8.2 Completing the proof

We now turn to the proof of the lemma, establishing (6.1).

This, however, follows exactly along the lines of the proof of Lemma 6.1, again applying Theorem 7.1. We need only take as our Banach space \(C_b(M^{*})\), the bounded, continuous functions on \(M^{*}\) with supremum norm, and as our basic random variable \(X=(f^{x}(y))^2-\text {Var}\left( f^x(y)\right) \).

The previous subsection establishes the a.s. boundedness of X needed to make the argument work.

9 Proof of Lemma 6.2

Lemma 6.2 involves showing that the ratio
$$\begin{aligned} \frac{1-\mathbb {C}(x,y)}{1-\widehat{\mathbb {C}}_k(x,y)} \end{aligned}$$
converges, uniformly, to one, as \(k\rightarrow \infty \). For given \(x\ne y\), this is straightforward, following from a strong law of large numbers, much as in the previous two proofs. However, as \(x\rightarrow y\), even for fixed k, there is no easy way to find a uniform bound on the ratio, since both numerator and denominator tend to zero.

9.1 Outline of the proof

We start by writing \(\widehat{\mathbb {C}}\) as a sum of three terms:
$$\begin{aligned} \widehat{\mathbb {C}}_k(x,y)= \mathbb {C}(x,y)+\text {Bias}(\widehat{\mathbb {C}}_k (x,y))+\xi _k(x,y), \end{aligned}$$
where \(\xi _k(x,y)\) is mean zero, random error with variance \(\text {Var}(\widehat{\mathbb {C}}_k)\), and the deterministic bias term is \(\mathbb {E}\{\widehat{\mathbb {C}}_k -\mathbb {C}\}\).
We shall show in “Appendix 2” that
$$\begin{aligned} \text {Bias}(\widehat{\mathbb {C}}_k(x,y))= & {} \frac{-\mathbb {C}(x,y)(1-\mathbb {C}^2(x,y))}{2{(k+1)}}+O\left( \frac{1}{k^2}\right) , \end{aligned}$$
$$\begin{aligned} \text {Var}(\widehat{\mathbb {C}}_k(x,y))= & {} \frac{(1-\mathbb {C}^2(x,y))^2}{k}+O\left( \frac{1}{k^2}\right) , \end{aligned}$$
and that the remainder terms in both expressions are uniformly bounded over \( M\times M\). (In fact, this is almost classical, in that expressions for the bias and variance of the correlation coefficient estimator centered around the sample means (as opposed to \(\widehat{\mathbb {C}}\), which is centered at zero) are well known in the Statistics literature, dating back, at least, to [14] [Chapter 16, see (16.73) and (16.74)]. “Appendix 2” treats the zero-centered \(\widehat{\mathbb {C}}\) case.)
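The leading terms of these two expansions can be compared with a Monte Carlo experiment for a single pair of points. The values of `rho`, `k` and `reps` below are illustrative assumptions; `rho` plays the role of \(\mathbb {C}(x,y)\).

```python
import numpy as np

rng = np.random.default_rng(3)
rho, k, reps = 0.5, 50, 40_000          # illustrative values

# reps independent samples, each of k i.i.d. centred unit-variance pairs with correlation rho
x = rng.standard_normal((reps, k))
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal((reps, k))

# Zero-centred correlation estimator: no sample means are subtracted
c_hat = np.sum(x * y, axis=1) / np.sqrt(np.sum(x**2, axis=1) * np.sum(y**2, axis=1))

bias_mc, var_mc = c_hat.mean() - rho, c_hat.var()
bias_lead = -rho * (1.0 - rho**2) / (2.0 * (k + 1))
var_lead = (1.0 - rho**2)**2 / k
print(bias_mc, bias_lead, var_mc, var_lead)
```

Agreement is only expected up to the \(O(k^{-2})\) remainders and Monte Carlo error, so the comparison is indicative rather than exact.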
For notational convenience, set
$$\begin{aligned} {\widetilde{\zeta }}_k(x,y)\ \mathop {=}\limits ^{\Delta }\ \sqrt{k}\left( \widehat{\mathbb {C}}_k(x,y)-\mathbb {C}(x,y)\right) . \end{aligned}$$
Since the notation is getting long, from now on, we interchangeably use \(a_k(x,y)\) and \(a_k^{xy}\) for a function \(a_k\) of x and y, refrain from writing out explicitly the summation indices and their range in some situations where they are obvious, and also introduce
$$\begin{aligned} \beta _k^{xy} \ \mathop {=}\limits ^{\Delta }\ \text {Bias} \left( \widehat{\mathbb {C}}_k(x,y)\right) . \end{aligned}$$
Then, in view of (9.1), and up to a term of \(O(k^{-2})\) in the denominator, we have
$$\begin{aligned} \left( \frac{1-\mathbb {C}(x,y)}{1-\widehat{\mathbb {C}}_k(x,y)}\right) = \frac{1-\mathbb {C}^{xy}}{1-\mathbb {C}^{xy}+\frac{\mathbb {C}^{xy}(1-(\mathbb {C}^{xy})^2)}{2(k+1)}-\sqrt{k}\frac{\left( \widehat{\mathbb {C}}_k^{xy}-\mathbb {C}^{xy}-\beta _k^{xy}\right) }{1-(\mathbb {C}^{xy})^2}\frac{1-(\mathbb {C}^{xy})^2}{\sqrt{k}}}. \end{aligned}$$
Cancelling \((1-\mathbb {C}^{xy})\) from numerator and denominator, this becomes
$$\begin{aligned} \left( 1+\frac{\mathbb {C}^{xy}(1+\mathbb {C}^{xy})}{2(k+1)}-\sqrt{k}\frac{\left( \widehat{\mathbb {C}}_k^{xy}-\mathbb {C}^{xy}-\beta _k^{xy}\right) }{1-(\mathbb {C}^{xy})^2}\frac{1+\mathbb {C}^{xy}}{\sqrt{k}}\right) ^{-1}. \end{aligned}$$
The only problematic term here is
$$\begin{aligned} \sqrt{k}\frac{\left( \widehat{\mathbb {C}}_k(x,y)-\mathbb {C}(x,y)-\beta _k(x,y)\right) }{1-\mathbb {C}^2(x,y)}, \end{aligned}$$
since the second term converges deterministically to zero, and the final multiplicative factor is bounded by \(2/\sqrt{k}\). We shall prove that the sequence of random processes defined by (9.6) converges weakly to a continuous Gaussian process on \(M^{*}\). This, the extra divisor of \(\sqrt{k}\) in (9.5), and some elementary probability arguments which we leave to the reader, will be enough to prove Lemma 6.2.
In fact, in view of (9.1), we can drop the bias term from (9.6), and suffice with the weak convergence, over \(M^{*}\), of the processes
$$\begin{aligned} \zeta _k(x,y) \ \mathop {=}\limits ^{\Delta }\ \frac{ {\widetilde{\zeta }}_k(x,y)}{1-\mathbb {C}^2(x,y)} = \sqrt{k} \frac{\widehat{\mathbb {C}}_k(x,y)-\mathbb {C}(x,y)}{1-\mathbb {C}^2(x,y)}. \end{aligned}$$
We shall prove this in a number of stages.
To start, we show the weak convergence of the numerator in (9.7), \({\widetilde{\zeta }}_k\), which is much less delicate than that of the ratio \(\zeta _k\), there being no 0/0 issues. The convergence of the finite dimensional distributions is shown in Sect. 9.2 and the tightness in Sect. 9.3. The final step is to apply the continuous mapping theorem (e.g. [5], Section 1.5), for which we need to know that the mapping between function spaces that takes
$$\begin{aligned} \phi (x,y)\ \rightarrow \ \frac{\phi (x,y)}{1-\mathbb {C}^2(x,y)} \end{aligned}$$
is continuous, with probability one, for \({\widetilde{\zeta }}\), the process on \(M\times M\) which is the limit of the \({\widetilde{\zeta }}_k\). We have already seen that ratios like that on the right hand side here are problematic, and computable, at the \(y\rightarrow x\) limit, only via L’Hôpital’s rule. Consequently, the weak convergence of the \({\widetilde{\zeta }}_k\) is going to have to be in a function space with a norm that takes into account convergence of derivatives as well as the function values. This is going to make the tightness argument rather intricate, which is why Sect. 9.3 is the longest in the paper. The continuous mapping argument will be given at the end, in the brief Sect. 9.4.

9.2 Fi-di convergence of \({\widetilde{\zeta }}_k\), and characterising the limit

The main result of this section is the following.

Lemma 9.1

The finite-dimensional distributions of \( {\widetilde{\zeta }}_k\), on \(M\times M\), converge to those of the zero mean, Gaussian process, \({\widetilde{\zeta }}\), with covariance function given by
$$\begin{aligned} \mathbb {E}\{ {\widetilde{\zeta }}(x_0,y_0) {\widetilde{\zeta }}(x,y)\}= & {} \frac{1}{2}{\mathbb {C}^{x_0 y_0}\mathbb {C}^{xy}} \left[ (\mathbb {C}^{y_0 x})^2+(\mathbb {C}^{y_0 y})^2+(\mathbb {C}^{x_0 x})^2+(\mathbb {C}^{x_0 y})^2\right] \nonumber \\&+\,\mathbb {C}^{xy_0}\left[ \mathbb {C}^{x_0 y}-\mathbb {C}^{x_0 x}\mathbb {C}^{xy}\right] +\mathbb {C}^{y_0 y}\left[ \mathbb {C}^{x_0 x}-\mathbb {C}^{x_0 y}\mathbb {C}^{xy}\right] \nonumber \\&-\,\mathbb {C}^{x_0 y_0}\left[ \mathbb {C}^{x_0 x}\mathbb {C}^{x_0 y}+\mathbb {C}^{y_0 y}\mathbb {C}^{xy_0}\right] . \end{aligned}$$
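As a consistency check on this covariance function, set \((x_0,y_0)=(x,y)\) and write \(c=\mathbb {C}^{xy}\), a shorthand used only here. Then \(\mathbb {C}^{x_0x}=\mathbb {C}^{y_0y}=1\) and \(\mathbb {C}^{x_0y}=\mathbb {C}^{xy_0}=\mathbb {C}^{x_0y_0}=c\), and the four groups of terms reduce to
$$\begin{aligned} \mathbb {E}\{ {\widetilde{\zeta }}(x,y)^2\}&= \frac{1}{2}c^2\left[ 2c^2+2\right] +c\left[ c-c\right] +\left[ 1-c^2\right] -c\left[ c+c\right] \\&= c^4-2c^2+1 \ =\ (1-c^2)^2, \end{aligned}$$
which agrees with the leading term \(k\,\text {Var}(\widehat{\mathbb {C}}_k(x,y))\approx (1-\mathbb {C}^2(x,y))^2\) of the variance expansion in Sect. 9.1.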

The proof will rely on the following result of Anderson.

Theorem 9.2

([4], Theorem 4.2.3) Let \(\{U(k)\}\) be a sequence of d-component random vectors and b a fixed vector such that \(\sqrt{k}(U(k)-b)\) has the limiting distribution \(\mathcal {N}(0,T)\) as \(k\rightarrow \infty \). Let g(u) be a vector-valued function of u such that each component \(g_j(u)\) has a nonzero differential at \(u=b\), and let \(\psi _b\) be the matrix with (i, j)-th component \(({\partial g_j(u)}/{\partial u_i})|_{u=b}\). Then \(\sqrt{k}(g(U(k))-g(b))\) has the limiting distribution \(\mathcal {N}(0,\psi _b^\prime T\psi _b)\).

Proof of Lemma 9.1. As one might guess from the complicated form of (9.9), the calculations involved are somewhat tedious, and so we shall concentrate on making the main steps clear.

Towards that end, we introduce the following notation just for this proof. For any \(i,j\in \mathbb {N}\), and points \((x_i,y_j)\in M\times M\), define
$$\begin{aligned} \bar{C}^{x_i ,y_j}_{11}(k)= & {} \Vert f^k(x_i)\Vert ^2,\\ \bar{C}^{x_i, y_j}_{22}(k)= & {} \Vert f^k(y_j)\Vert ^2,\\ \bar{C}^{x_i ,y_j}_{12}(k)= & {} \sum _{\ell =1}^k f_\ell (x_i)f_\ell (y_j). \end{aligned}$$
Now define
$$\begin{aligned} U(k)= & {} \frac{1}{k} \left( \bar{C}_{11}^{x_1,y_1}, \bar{C}_{22}^{x_1,y_1}, \bar{C}_{12}^{x_1,y_1}, \dots , \bar{C}_{11}^{x_n,y_n}, \bar{C}_{22}^{x_n,y_n}, \bar{C}_{12}^{x_n,y_n}\right) , \\ b= & {} \left( 1, 1, \mathbb {C}^{x_1,y_1}, \dots , 1, 1, \mathbb {C}^{x_n,y_n} \right) . \end{aligned}$$
Thus, the elements of U are the maximum likelihood estimators of the corresponding elements of b. It then follows from standard estimation theory (e.g. [4], Theorem 3.4.4) that \(\sqrt{k}(U(k)-b)\) has a limiting normal distribution with mean 0 and some covariance matrix T, the specific structure of which does not concern us at the moment.
In order to prove the lemma, we require the asymptotic distribution of
$$\begin{aligned} \left\{ \sqrt{k}(\widehat{\mathbb {C}}_k(x_i,y_i)-\mathbb {C}(x_i,y_i))\right\} _{i=1}^n. \end{aligned}$$
However, using the vector U above it is easy to relate the \(\widehat{\mathbb {C}}\)s to the \(\bar{C}\)s, and if we now define a function \(g:\,\mathbb {R}^{3n}\rightarrow \mathbb {R}^n\) by
$$\begin{aligned} g(u_1,u_2,\ldots ,u_{3n})=\left( \frac{u_3}{\sqrt{u_1 u_2}},\frac{u_6}{\sqrt{u_4 u_5}},\ldots , \frac{u_{3n}}{\sqrt{u_{3n-1} u_{3n-2}}}\right) , \end{aligned}$$
then Theorem 9.2 establishes the claimed convergence of finite dimensional distributions, and so proves the lemma, modulo two issues.

The first is the condition on the differential required by Theorem 9.2, but this is trivial. The second is to derive the exact form (9.9) of the limiting covariances, which, while not intrinsically hard, is a long and tedious calculation. The calculation starts by writing out the covariance function for \(\widehat{\mathbb {C}}\) and computing moments, all of which involve Gaussian variables. Fortunately, most of the detailed calculations that we need were carried out long ago in the statistical literature and can be found, for example, in [13] [Chapter 41, Example 41.6]. What remains is to send \(k\rightarrow \infty \) in these expressions, and deduce (9.9). We shall not go through the tedious details here. \(\square \)
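As a sanity check on the shape of (9.9), the case \(n=1\) of the argument above can be worked through numerically. The following Python sketch does this under stated assumptions: the entries of T for the triple \((f(x)^2, f(y)^2, f(x)f(y))\) are computed from Wick's formula for a unit variance Gaussian pair with correlation c (this explicit T is not given in the text), and all function names are hypothetical. It checks the differential of g at b by finite differences and compares \(\psi _b^\prime T\psi _b\) with the right hand side of (9.9) at \((x_0,y_0)=(x,y)\). Both reduce to \((1-c^2)^2\), the classical asymptotic variance of a sample correlation coefficient.

```python
import math


def g(u):
    """The map g of the proof for n = 1: u3 / sqrt(u1 * u2)."""
    return u[2] / math.sqrt(u[0] * u[1])


def numerical_gradient(func, u, h=1e-6):
    """Central finite-difference gradient of func at the point u."""
    grad = []
    for i in range(len(u)):
        up = list(u)
        down = list(u)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2.0 * h))
    return grad


def delta_method_variance(c):
    """psi' T psi for n = 1, with T obtained from Wick's formula (an assumption
    of this sketch) and psi the gradient of g at b = (1, 1, c)."""
    T = [
        [2.0, 2.0 * c * c, 2.0 * c],
        [2.0 * c * c, 2.0, 2.0 * c],
        [2.0 * c, 2.0 * c, 1.0 + c * c],
    ]
    psi = [-c / 2.0, -c / 2.0, 1.0]
    return sum(psi[i] * T[i][j] * psi[j] for i in range(3) for j in range(3))


def limiting_covariance(c_x0y0, c_xy, c_y0x, c_y0y, c_x0x, c_x0y):
    """Right hand side of (9.9); each argument is one value of the covariance C."""
    return (
        0.5 * c_x0y0 * c_xy * (c_y0x**2 + c_y0y**2 + c_x0x**2 + c_x0y**2)
        + c_y0x * (c_x0y - c_x0x * c_xy)
        + c_y0y * (c_x0x - c_x0y * c_xy)
        - c_x0y0 * (c_x0x * c_x0y + c_y0y * c_y0x)
    )


for c in (-0.5, 0.0, 0.3, 0.9):
    diagonal = limiting_covariance(c, c, c, 1.0, 1.0, c)
    print(c, delta_method_variance(c), diagonal, (1.0 - c * c) ** 2)
```

Setting \(c=1\) in the same formula gives zero variance, which is consistent with the later use of (9.9) to show that \({\widetilde{\zeta }}\) vanishes on the diagonal.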

9.3 Tightness of \( {\widetilde{\zeta }}_k\)

For the reasons alluded to above and exploited below, we shall prove tightness in the Banach space of twice continuously differentiable functions on \(M\times M\), which we denote by \({B^{(2)}}\), equipped with the norm
$$\begin{aligned} \Vert f\Vert _{B^{(2)}}\mathop {=}\limits ^{\Delta }\max \left\{ \Vert f\Vert _{\infty },\, \Vert \nabla f\Vert _{\infty },\, \Vert \nabla ^2 f\Vert _{\infty }\right\} , \end{aligned}$$
where the norms of the first and second order derivatives are obtained by taking the maximum over the norms of the 2m and \(4m^2\) components of the Riemannian differential and Hessian, respectively.
To break the rather long proof of tightness into bite sized pieces, we write
$$\begin{aligned} {\widetilde{\zeta }}_k(x,y) = \alpha _k(x,y)\, +\, \Delta _k(x,y), \end{aligned}$$
where
$$\begin{aligned} \alpha _k(x,y)\ \mathop {=}\limits ^{\Delta }\ \sqrt{k}\left( \frac{\frac{1}{k}\sum _{j=1}^k (f_j(x) f_j(y)\, - \, \mathbb {C}(x,y))}{\sqrt{\frac{\sum (f_j(x))^2}{k}}\sqrt{\frac{\sum (f_j(y))^2}{k}}}\right) \end{aligned}$$
and
$$\begin{aligned} \Delta _k(x,y) \ \mathop {=}\limits ^{\Delta }\ \sqrt{k}\left( 1-\sqrt{\frac{\sum (f_j(x))^2}{k}}\sqrt{\frac{\sum (f_j(y))^2}{k}}\right) \frac{ \mathbb {C}(x,y)}{\sqrt{\frac{\sum (f_j(x))^2}{k}}\sqrt{\frac{\sum (f_j(y))^2}{k}}}. \end{aligned}$$
In the following two subsections we shall prove that the sequences \(\alpha _k\) and \(\Delta _k\) converge weakly in \({B^{(2)}}\), from which the convergence of \({\widetilde{\zeta }}_k\) immediately follows.
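The decomposition into \(\alpha _k\) and \(\Delta _k\) is a purely algebraic identity, which can be confirmed on simulated data. The Python sketch below assumes that \({\widetilde{\zeta }}_k(x,y)=\sqrt{k}(\widehat{\mathbb {C}}_k(x,y)-\mathbb {C}(x,y))\), with \(\widehat{\mathbb {C}}_k\) the normalised inner product suggested by the function g in the proof of Lemma 9.1; the definition (9.7) itself lies outside this section, so this is an assumption. The function name and the way the correlated samples are generated are hypothetical.

```python
import math
import random


def zeta_alpha_delta(fx, fy, c):
    """Return (zeta_tilde_k, alpha_k, Delta_k) at one pair (x, y).

    fx, fy hold the k values f_j(x), f_j(y); c is the covariance C(x, y).
    """
    k = len(fx)
    cross = sum(a * b for a, b in zip(fx, fy)) / k
    sx = math.sqrt(sum(a * a for a in fx) / k)
    sy = math.sqrt(sum(b * b for b in fy) / k)
    c_hat = cross / (sx * sy)  # assumed form of the empirical covariance
    zeta = math.sqrt(k) * (c_hat - c)
    alpha = math.sqrt(k) * (cross - c) / (sx * sy)
    delta = math.sqrt(k) * (1.0 - sx * sy) * c / (sx * sy)
    return zeta, alpha, delta


rng = random.Random(0)
k, c = 1000, 0.6
fx = [rng.gauss(0.0, 1.0) for _ in range(k)]
# f_j(y) built to have unit variance and correlation c with f_j(x)
fy = [c * a + math.sqrt(1.0 - c * c) * rng.gauss(0.0, 1.0) for a in fx]
zeta, alpha, delta = zeta_alpha_delta(fx, fy, c)
print(zeta, alpha + delta)
```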

9.3.1 \(\alpha _k\) converges weakly in \({B^{(2)}}\)

We start with something even simpler than \(\alpha _k\), viz. the sequence of random functions \(\eta _k\) defined by
$$\begin{aligned} \eta _k(x,y)= \sqrt{k}\left( \frac{1}{k}\sum f_j(x) f_j(y)\, -\, \mathbb {C}(x,y)\right) . \end{aligned}$$
To prove the weak convergence of this sequence, we use the theorem stated as part of Example 1.5.10 in [26] [p. 41] (also see the discussion after the statement of the theorem).

To this end, note that the summands in (9.13) are i.i.d. copies of the random function \(f\otimes f:\,M\times M\rightarrow {\mathbb R}\) defined by \((f\otimes f) (x,y) = f(x)f(y)\). If we endow \(M\times M\) with the topology induced by the Riemannian distance \(d_{M\times M}\) (this is the metric we use in place of the semi-metric in the theorem from [26]), then \(M\times M\) is compact in this topology. Since the convergence of the finite dimensional distributions of (9.13) follows from Theorem 9.2, all that is left to check for weak convergence of (9.13) is tightness.

In order to show tightness, we first need to set up some notation, in particular for Taylor expansions on \(M\times M\), in terms of Riemannian normal coordinates.

Consider normal neighbourhoods \(U_1,U_2\) (local coordinates \((x^i),(y^i),\) respectively) around \(x_0\) and \(y_0\) in M, and take \(U_1\times U_2\) around \((x_0,y_0)\) in \(M\times M\). Then, \((x^i:y^i)\) give us the following definition of coordinates in the product space \(U_1\times U_2\):
$$\begin{aligned} (x^1,\dots , x^m:y^1,\dots , y^m)(x_0,y_0)= \left[ (x^1, \dots , x^m)(x_0):(y^1,\dots , y^m)(y_0)\right] . \end{aligned}$$
Since
$$\begin{aligned} T_{(x_0,y_0)}(U_1\times U_2)=T_{x_0}U_1\oplus T_{y_0}U_2, \end{aligned}$$
any tangent vector v in the product tangent space splits uniquely as the sum of \(v_1 \in T_{x_0}U_1\) and \(v_2 \in T_{y_0}U_2\). This further gives us the following definition for the exponential map in \(M\times M\):
$$\begin{aligned} \exp _{(x_0,y_0)}^{M\times M}(v)=\left( \exp _{x_0}^M(v_1),\exp _{y_0}^M(v_2)\right) . \end{aligned}$$
Let the coordinate basis vectors for the tangent spaces be denoted by \(\left( \frac{\partial }{\partial x^i}\right) \) and \(\left( \frac{\partial }{\partial y^i}\right) \), considered as row vectors. The concatenation of the two serves as a basis for the product tangent space, and so any vector v in this space can be written as
$$\begin{aligned} v= v_1\oplus v_2 = \sum _{i=1}^mv^i\frac{\partial }{\partial x^i}+\sum _{i=1}^{m}v^{i+m}\frac{\partial }{\partial y^i}. \end{aligned}$$
This allows us to write the following Taylor expansion for \(\mathbb {C}(x,y)\) about \((x_0,y_0)\):
$$\begin{aligned} \mathbb {C}(x,y)= & {} \mathbb {C}(x_0,y_0)\, +\, v\left[ \left( \frac{\partial \mathbb {C}(x,y)}{\partial x^1},\dots ,\frac{\partial \mathbb {C}(x,y)}{\partial y^m}\right) |_{(x_0,y_0)}\right] ^T \\&+\, \frac{1}{2}v\left[ \frac{\partial ^2\mathbb {C}(x,y)}{\partial ^k x_i \partial ^l y_j}|_{(x_0,y_0)}\right] v^T \ +\ O(\Vert v\Vert ^3), \end{aligned}$$
where \(k+l=2\). Finally, we recall a few important facts about normal coordinates and geodesics (cf. [17]). The geodesic starting from \((x_0,y_0)\) in the direction v is given in Riemannian normal coordinates by \(t(v^1,\dots ,v^{2m})\). Geodesics are locally minimizing, and so, together with the previous fact, we have \(\Vert v\Vert ^2=d^2_{M\times M}((x,y),(x_0,y_0))\). Also, importantly, since the Christoffel symbols vanish at the centers of the normal charts, covariant derivatives at the centers reduce to usual partial derivatives. Therefore, working with normal coordinates is useful in local calculations.
Returning now to the issue of tightness, we need to establish moment bounds on the second derivatives of the processes \(\eta _k\) of (9.13). Clearly, the variance and correlation function of \(\eta _k\) are, respectively, the variance of \(f(x)f(y)-\mathbb {C}(x,y)\) and the correlation
$$\begin{aligned} E\left\{ (f(x)f(y)-\mathbb {C}(x,y))(f(x_0)f(y_0)-\mathbb {C}(x_0,y_0))\right\} . \end{aligned}$$
To investigate second derivatives, it is useful to move to the notation of 2-forms. Doing so, it follows from simple algebra that the diagonal elements of the Hessian matrix of this process are given by
$$\begin{aligned}&\bigg \{\nabla ^2 f(x)\left( \frac{\partial }{\partial x^i},\frac{\partial }{\partial x^i}\right) f(y)-\nabla ^2 \mathbb {C}^{xy}\left( \frac{\partial }{\partial x^i},\frac{\partial }{\partial x^i}\right) , \\&\quad \nabla ^2 f(y)\bigg (\frac{\partial }{\partial y^j},\frac{\partial }{\partial y^j}\bigg ) f(x)-\nabla ^2 \mathbb {C}^{xy}\left( \frac{\partial }{\partial y^j},\frac{\partial }{\partial y^j}\right) \bigg \}, \quad {1\le i,j\le m}, \end{aligned}$$
with other elements in the upper triangular portion falling into one of the three groups
$$\begin{aligned} \nabla ^2 f(x)\left( \frac{\partial }{\partial x^i},\frac{\partial }{\partial x^j}\right) f(y)-\nabla ^2 \mathbb {C}^{xy}\left( \frac{\partial }{\partial x^i},\frac{\partial }{\partial x^j}\right) ,\qquad 1\le i<j\le m,\\ \nabla ^2 f(y)\left( \frac{\partial }{\partial y^i},\frac{\partial }{\partial y^j}\right) f(x)-\nabla ^2 \mathbb {C}^{xy}\left( \frac{\partial }{\partial y^i},\frac{\partial }{\partial y^j}\right) ,\qquad 1\le i<j\le m,\end{aligned}$$
$$\begin{aligned} \frac{\partial f(x)}{\partial x^i}\frac{\partial f(y)}{\partial y^j}-\nabla ^2 \mathbb {C}^{xy}\left( \frac{\partial }{\partial x^i},\frac{\partial }{\partial y^j}\right) ,\qquad 1\le i\le j\le m. \end{aligned}$$
For the sake of illustration, we focus on only one ‘type’ of term. Computations for the other terms are basically the same. The term we shall consider is
$$\begin{aligned} \nabla ^2 f(x)\left( \frac{\partial }{\partial x^1},\frac{\partial }{\partial x^2}\right) f(y)-\nabla ^2 \mathbb {C}^{xy}\left( \frac{\partial }{\partial x^1},\frac{\partial }{\partial x^2}\right) , \end{aligned}$$
and we now also note that the term involving the derivatives of \(\mathbb {C}\) does not present any problem for the upper bound on the increments since \(\mathbb {C}\) is deterministic and the fact that \(f\in C^3\) implies that \(\mathbb {C}\) is at least \(C^6\). Thus, it is enough to prove the following bound
$$\begin{aligned}&E\left\{ \left( \nabla ^2 f(x)\left( \frac{\partial }{\partial x^1},\frac{\partial }{\partial x^2}\right) f(y)-\nabla ^2 f(x)\left( \frac{\partial }{\partial x^1},\frac{\partial }{\partial x^2}\right) |_{x=x_0} f(y_0)\right) ^2\right\} \\&\le K d^2_{M\times M}((x,y),(x_0,y_0)) \end{aligned}$$
for any two pairs \((x,y), (x_0,y_0)\in M\times M\), and constant K. The expectation here is bounded above by
$$\begin{aligned}&\ \ 2E\left\{ \left( \frac{\partial ^2 f(x)}{\partial x^2\partial x^1}f(y)-\frac{\partial ^2 f(x)}{\partial x^2\partial x^1}|_{x=x_0}f(y_0)\right) ^2\right\} \nonumber \\&\quad +\,2E\left\{ \left( \left( \nabla _{\frac{\partial }{\partial x^1}}\frac{\partial }{\partial x^2}f(x)\right) f(y)-\left( \nabla _{\frac{\partial }{\partial x^1}}\frac{\partial }{\partial x^2}f(x)\right) |_{x=x_0} f(y_0)\right) ^2\right\} . \qquad \qquad \quad \end{aligned}$$
As far as the first expectation here is concerned, using Wick’s formula, the fact that f has unit variance, and, for a differential operator D of any order, writing \(D\mathbb {C}^{x_0y_0}\) for \(D\mathbb {C}^{xy}|_{(x_0,y_0)}\), we have that it is equal to
$$\begin{aligned}&\left[ \frac{\partial ^4 \mathbb {C}^{xx}}{\partial (x^2)^2\partial (x^1)^2}+\frac{\partial ^4 \mathbb {C}^{x_0x_0}}{\partial (x^2)^2\partial (x^1)^2}-2\frac{\partial ^4 \mathbb {C}^{xx_0}}{\partial (x^2)^2\partial (x^1)^2}\mathbb {C}^{yy_0}\right] \nonumber \\&\qquad +\ \left[ \left( \frac{\partial ^2 \mathbb {C}^{xy}}{\partial x^2\partial x^1}\right) ^2+\left( \frac{\partial ^2 \mathbb {C}^{x_0y_0}}{\partial x^2\partial x^1}\right) ^2-2\frac{\partial ^2 \mathbb {C}^{xy_0}}{\partial x^2\partial x^1}\frac{\partial ^2 \mathbb {C}^{x_0y}}{\partial x^2\partial x^1}\right] . \end{aligned}$$
The important point to be checked is that terms which are O(1) and \(O(\Vert v\Vert )\) cancel. We check this thoroughly below. The second order terms can be trivially bounded using the facts that \(f\in C^3\) and \(|v_i|\le \Vert v\Vert .\) This technique of bounding gives the required constant K independent of the points, but does not offer too much insight. Consequently, we illustrate how this can be done for one case only.
Consider the case
$$\begin{aligned}&\left[ \frac{\partial ^3 \mathbb {C}^{xx_0}}{\partial (x^1)^3}|_{x=x_0}v^1 +\cdots +\frac{\partial ^3 \mathbb {C}^{xx_0}}{\partial x^m\partial (x^1)^2}|_{x=x_0}v^m\right] \\&\quad \times \left[ \frac{\partial ^3 \mathbb {C}^{yy_0}}{\partial (y^1)^3}|_{y=y_0}v^{m+1}+\cdots +\frac{\partial ^3 \mathbb {C}^{yy_0}}{\partial y^m\partial (y^1)^2}|_{y=y_0}v^{2m}\right] . \end{aligned}$$
The above is obviously smaller than
$$\begin{aligned}&\left[ \ \left| \frac{\partial ^3 \mathbb {C}^{xx_0}}{\partial (x^1)^3}|_{x=x_0}v^1\right| + \cdots +\left| \frac{\partial ^3 \mathbb {C}^{xx_0}}{\partial x^m\partial (x^1)^2}|_{x=x_0}v^m\right| \ \right] \\&\quad \times \left[ \ \left| \frac{\partial ^3 \mathbb {C}^{yy_0}}{\partial (y^1)^3}|_{y=y_0}v^{m+1}\right| +\cdots +\left| \frac{\partial ^3 \mathbb {C}^{yy_0}}{\partial y^m\partial (y^1)^2}|_{y=y_0}v^{2m}\right| \ \right] . \end{aligned}$$
This immediately yields the following upper bound, in which \(M_3\) is a bound on the third order derivatives of \(\mathbb {C}\):
$$\begin{aligned}&M_3^2[|v^1|+\cdots +|v^m|]\times [|v^{m+1}|+\cdots +|v^{2m}|]\\&\qquad \le \ M_3^2m^2d^2_{M\times M}((x,y),(x_0,y_0)) = M_3^2m^2\Vert v\Vert ^2 . \end{aligned}$$
With the second order terms out of the way, we return to our claim that the zeroth and first order terms cancel out in (9.15). Focus first on the second term in that expression. For any \((x,y)\in M\times M,\) introduce the function
$$\begin{aligned} g(x,y)\ \mathop {=}\limits ^{\Delta }\ \frac{\partial ^2 \mathbb {C}^{xy}}{\partial x^2\partial x^1}, \end{aligned}$$
which, by assumption, is at least \(C^4\). Expanding this in a Taylor series about \((x_0,y_0)\), we have
$$\begin{aligned} g(x,y)= & {} g(x_0,y_0)+\sum _{i=1}^m \frac{\partial g(x,y)}{\partial x^i}|_{(x_0,y_0)}v^i\\&+\,\sum _{i=1}^m \frac{\partial g(x,y)}{\partial y^i}|_{(x_0,y_0)}v^{m+i}+O(\Vert v\Vert ^2). \end{aligned}$$
In shorter notation, let us write the above as
$$\begin{aligned} g(x,y)= g(x_0,y_0)+\sum _{i=1}^{2m}e_i v^i+O(\Vert v\Vert ^2), \end{aligned}$$
where the \(e_i\) are the coefficients from the previous formula. Then,
$$\begin{aligned} \left( \frac{\partial ^2 \mathbb {C}^{xy}}{\partial x^2\partial x^1}\right) ^2+\left( \frac{\partial ^2 \mathbb {C}^{x_0y_0}}{\partial x^2\partial x^1}\right) ^2= & {} 2(g(x_0,y_0))^2+2g(x_0,y_0)\sum _{i=1}^{2m}e_i v^i \nonumber \\&+\,O(\Vert v\Vert ^2). \end{aligned}$$
Next, define the following smooth functions over M:
$$\begin{aligned} h(x)\ \mathop {=}\limits ^{\Delta }\ \frac{\partial ^2 \mathbb {C}^{xy_0}}{\partial x^2\partial x^1},\qquad t(y)\ \mathop {=}\limits ^{\Delta }\ \frac{\partial ^2 \mathbb {C}^{x_0y}}{\partial x^2\partial x^1}.\end{aligned}$$
It is immediate that
$$\begin{aligned} h(x)= & {} g(x_0,y_0)+\sum _{i=1}^m e_i v^i+O(\Vert v\Vert ^2),\\ t(y)= & {} g(x_0,y_0)+\sum _{i=m+1}^{2m}e_i v^i+O(\Vert v\Vert ^2).\end{aligned}$$
Thus,
$$\begin{aligned} -2\frac{\partial ^2 \mathbb {C}^{xy_0}}{\partial x^2\partial x^1}\frac{\partial ^2 \mathbb {C}^{x_0y}}{\partial x^2\partial x^1}=-2[g^2(x_0,y_0)+g(x_0,y_0)\sum _{i=1}^{2m}e_iv^i]+O(\Vert v\Vert ^2). \end{aligned}$$
It is now clear that, as claimed, at least for the second expression in (9.15), the zeroth and first order terms in the Taylor expansion cancel [cf. (9.16) and (9.17)].
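The cancellation just described does not depend on the particular form of \(\mathbb {C}\): for any smooth g, the combination \(g(x,y)^2+g(x_0,y_0)^2-2g(x,y_0)g(x_0,y)\) is \(O(\Vert v\Vert ^2)\), since \(h(x)=g(x,y_0)\) and \(t(y)=g(x_0,y)\). The following Python sketch illustrates this numerically for \(m=1\). The test function g, the base point and the direction of approach are hypothetical choices made only for illustration and are not taken from the paper.

```python
import math


def g(x, y):
    """Hypothetical smooth stand-in for the mixed derivative of the covariance."""
    return math.exp(x) * math.cos(y) + x * y


def second_bracket(t, x0=0.3, y0=0.7):
    """Second bracket of (9.15) at (x, y) = (x0 + t, y0 + t), with g as above."""
    x, y = x0 + t, y0 + t
    return g(x, y) ** 2 + g(x0, y0) ** 2 - 2.0 * g(x, y0) * g(x0, y)


def single_term(t, x0=0.3, y0=0.7):
    """An uncancelled difference, which is only first order in t."""
    return g(x0 + t, y0 + t) ** 2 - g(x0, y0) ** 2


for t in (1e-1, 1e-2, 1e-3):
    # The first ratio stabilises (second order behaviour); so does the second,
    # showing that single_term is genuinely first order.
    print(t, second_bracket(t) / t**2, single_term(t) / t)
```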
Turning now to the first term in (9.15), define a function \(w:\,\text {diag}(M\times M)\rightarrow {\mathbb R}\) by
$$\begin{aligned}w(x,x)\ \mathop {=}\limits ^{\Delta }\ \frac{\partial ^4 \mathbb {C}^{xx}}{\partial (x^2)^2\partial (x^1)^2},\end{aligned}$$
and a function \(a:\,M\rightarrow {\mathbb R}\) by
$$\begin{aligned} a(x)\ \mathop {=}\limits ^{\Delta } \ \frac{\partial ^4 \mathbb {C}^{xx_0}}{\partial (x^2)^2\partial (x^1)^2}.\end{aligned}$$
These admit the Taylor series expansions
$$\begin{aligned} w(x,x)= w(x_0,x_0)+2\sum _{i=1}^m\frac{\partial w(x,x)}{\partial x^i}|_{(x_0,x_0)}v^i+O(\Vert v\Vert ^2),\end{aligned}$$
$$\begin{aligned} a(x)= w(x_0,x_0)+\sum _{i=1}^m\frac{\partial w(x,x)}{\partial x^i}|_{(x_0,x_0)}v^i+O(\Vert v\Vert ^2).\end{aligned}$$
Noting that the Taylor series expansion of \(\mathbb {C}^{yy_0}\) about \(y_0\) is
$$\begin{aligned} 1+O(\Vert v\Vert ^2), \end{aligned}$$
it is easy to see that here also, only the terms starting from the second order remain. Again, following the same basic lines as in the previous argument shows that a similar upper bound holds for the second expectation in (9.14). From our earlier discussions, this completes the proof of tightness of (9.13).
In addition, since by Lemma 6.1, we know that \(\sqrt{\sum (f_j(x))^2/k}\) converges, uniformly over M, and with probability one, to 1, we have (e.g. [5] [Theorem 4.4]) the joint weak convergence of the pair
$$\begin{aligned} \left( \sqrt{k}\left( \frac{\sum f_j(x) f_j(y)}{k}-\mathbb {C}(x,y)\right) ,\ \sqrt{\frac{\sum (f_j(x))^2}{k}}\sqrt{\frac{\sum (f_j(y))^2}{k}}\right) . \end{aligned}$$
Given this, the continuous mapping theorem immediately yields the weak convergence of \(\alpha _k\), as required.

9.3.2 \(\Delta _k\) converges weakly in \({B^{(2)}}\)

Recall the expression for \(\Delta _k\):
$$\begin{aligned} \sqrt{k}\left( 1-\sqrt{\frac{\sum (f_j(x))^2}{k}}\sqrt{\frac{\sum (f_j(y))^2}{k}}\right) \times \frac{\mathbb {C}(x,y)}{\sqrt{\frac{\sum (f_j(x))^2}{k}}\sqrt{\frac{\sum (f_j(y))^2}{k}}}. \end{aligned}$$
We have already seen that the denominator in the rightmost ratio here converges a.s., and uniformly, to one, and so a simple application of Theorem 4.4 from [5] and the continuous mapping theorem imply that we need only concern ourselves with the weak convergence of the sequence of processes \(\Gamma _k\) defined by
$$\begin{aligned} \Gamma _k(x,{y}) \ \mathop {=}\limits ^{\Delta }\ {\sqrt{k}\left( \sqrt{\frac{\sum (f_j(x))^2}{k}}\sqrt{\frac{\sum (f_j(y))^2}{k}}-1\right) }. \end{aligned}$$
To prove this convergence, we shall, for large enough k, bound \(\Gamma _k\) from above and below by two sequences of processes, which converge to the same limit. These bounds [cf. (9.21) below] involve a common term, the weak convergence of which is known, and a smaller term, which converges a.s. and uniformly to zero.

The bound depends on the following algebraic inequality, due to Cartwright and Field [7].

Theorem 9.3

([7]) Let \(w_i,\,1\le i\le n\), be non-negative numbers summing to 1. Let \(x_i\) be positive numbers in [a, b] \((0<a<b)\), whose weighted arithmetic and geometric means are denoted by \(AM_w\) and \(GM_w\), respectively. Then,
$$\begin{aligned} \frac{1}{2b}\sum w_i(x_i-AM_w)^2\ \le \ AM_w-GM_w\ \le \ \frac{1}{2a}\sum w_i(x_i-AM_w)^2.\end{aligned}$$
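The inequality of Theorem 9.3 is easy to test numerically. The Python sketch below evaluates the three quantities in the theorem for randomly chosen points and weights. The weights are taken strictly positive, the interval \([a,b]=[0.5,2]\) is arbitrary, and the function name is hypothetical; none of these choices come from the paper.

```python
import math
import random


def am_gm_gap_bounds(xs, ws, a, b):
    """Return (lower bound, AM_w - GM_w, upper bound) from Theorem 9.3.

    xs are positive numbers in [a, b]; ws are positive weights summing to 1.
    """
    am = sum(w * x for w, x in zip(ws, xs))
    gm = math.exp(sum(w * math.log(x) for w, x in zip(ws, xs)))
    weighted_var = sum(w * (x - am) ** 2 for w, x in zip(ws, xs))
    return weighted_var / (2.0 * b), am - gm, weighted_var / (2.0 * a)


rng = random.Random(0)
a, b = 0.5, 2.0
for _ in range(5):
    n = rng.randint(2, 6)
    xs = [rng.uniform(a, b) for _ in range(n)]
    raw = [rng.random() + 0.1 for _ in range(n)]
    ws = [r / sum(raw) for r in raw]
    print(am_gm_gap_bounds(xs, ws, a, b))
```

In the case used below, \(n=2\) with \(w_1=w_2=1/2\), the weighted sum of squares reduces to \((x_1-x_2)^2/4\).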
To apply Theorem 9.3 we note first that we know that \(k^{-1} \sum (f_j(x))^2\) converges to 1 a.s. and uniformly. Thus, given any \(\varepsilon >0\), there exists a (random) \(k_0\) such that, for all \( x\in M\), and all \(k\ge {k_0}\),
$$\begin{aligned} 1-\varepsilon \le \frac{1}{k} \sum f^2_j(x) \ \le \ 1+ \varepsilon . \end{aligned}$$
Now apply the theorem, assuming that \(k\ge k_0\), taking \(n=2, [a,b]=[1-\varepsilon ,1+\varepsilon ]\), \(w_1=w_2=1/2\), and
$$\begin{aligned} x_1 = \sum f_j^2(x)/k, \quad x_2 = \sum f_j^2(y)/k. \end{aligned}$$
Setting
$$\begin{aligned} N_k(x) \ \mathop {=}\limits ^{\Delta }\ \sqrt{k}\left( \sum f_j^2(x)/k\, - \, 1\right) , \end{aligned}$$
a little algebra leads to
$$\begin{aligned}&\frac{1}{2}\left( N_k(x) + N_k (y)\right) - \frac{(N_k(x) -N_k(y))^2}{8(1-\varepsilon )\sqrt{k}} \nonumber \\&\quad \le \ \Gamma _k(x,y)\ \le \ \frac{1}{2}\left( N_k(x) + N_k (y)\right) - \frac{(N_k(x) -N_k(y))^2}{8(1+\varepsilon )\sqrt{k}} . \end{aligned}$$
But this is precisely the inequality that we described above. Although it would be straightforward to establish it independently, the weak convergence of \(N_k\) has already been proven in Sect. 9.3.1, since \(N_k\) is just the process (9.13) over \(\text {diag}(M\times M)\). From this immediately follows the weak convergence of the processes from \(M\times M\rightarrow {\mathbb R}\) defined by \((x,y)\rightarrow N_k(x)+N_k(y)\) and \((x,y)\rightarrow N_k(x)-N_k(y)\).

This completes the proof of the weak convergence of \(\Delta _k\).

9.4 The continuous mapping argument

To complete the proof of Lemma 6.2, we exploit the fact, proven in the previous two sections, that \({\widetilde{\zeta }}_k\) converges weakly in \({B^{(2)}}\) to the Gaussian process \({\widetilde{\zeta }}\) with covariance function given by (9.9), and use it to show that the ratio processes
$$\begin{aligned} \zeta _k(x,y) = \frac{{\widetilde{\zeta }}_k (x,y)}{1-\mathbb {C}^2(x,y)} \end{aligned}$$
converge weakly in \(C_b(M^{*})\).
As described at the beginning of this section, this follows immediately from an application of the continuous mapping theorem, once we show that the mapping \(H:\,{B^{(2)}}\rightarrow C_b(M^{*})\), defined by
$$\begin{aligned} (H(\phi ))(x,y)= \frac{\phi (x,y)}{1-\mathbb {C}^2(x,y)} \end{aligned}$$
is continuous, with probability one, for the probability measure supported on the paths of \({\widetilde{\zeta }}\).

Recall that \({\widetilde{\zeta }}\) is at least \(C^2\) over \(M\times M\) because of weak convergence in \({B^{(2)}}\). The same (and more) is true for the covariance function \(\mathbb {C}\), so the issue of continuity of H is trivial if we restrict attention to a region away from the diagonal of \(M\times M\).

So the only question remaining is what happens as \(y\rightarrow x\). What we shall now do is investigate the limits
$$\begin{aligned} \lim _{y\rightarrow x} \frac{ {\widetilde{\zeta }}(x,y)}{1-\mathbb {C}^2(x,y)}, \end{aligned}$$
and show that they depend only on ratios of well defined functions of the second derivatives of \({\widetilde{\zeta }}\) and \(\mathbb {C}\). This will immediately imply the continuity of H, and thus complete the proof of Lemma 6.2.
To this end, take \(X_x\in T_x M,\) and a \(C^2\) curve c in M such that
$$\begin{aligned} c:\,(-\delta ,\delta )\rightarrow M,\quad c(0)=x,\quad \dot{c}(0)=X_x. \end{aligned}$$
Then, using the symmetry of \({\widetilde{\zeta }}\) and \(\mathbb {C}\), when y approaches x along \(X_x\),
$$\begin{aligned} \lim _{y\rightarrow x}\frac{ {\widetilde{\zeta }}(x,y)}{1-\mathbb {C}^2(x,y)}=\lim _{u\rightarrow 0}\frac{ {\widetilde{\zeta }}(c(u),x)}{1-\mathbb {C}^2(c(u),x)}, \end{aligned}$$
when the limit on the left hand side exists. It follows from (9.9) that the limit of the numerator is zero, while the same is true of the denominator since \(\mathbb {C}(x,x)=1\).
By L’Hôpital’s rule, the limit above is equal to
$$\begin{aligned} \lim _{u\rightarrow 0}\frac{\frac{ {d} {\widetilde{\zeta }}(c(u),x)}{ {d}u}}{-2\mathbb {C}(c(u),x)\frac{ {d}\mathbb {C}(c(u),x)}{ {d}u}}. \end{aligned}$$
The denominator here is easily seen to be zero, since \(x=y\) is a critical point for \(\mathbb {C}(x,y)\) and \(\mathbb {C}\) is differentiable. To check that the same is true for the numerator, note that \({\widetilde{\zeta }}\) is differentiable, and that its derivative has zero mean and covariance function given by the mixed second derivative of the covariance function of \({\widetilde{\zeta }}\). That is,
$$\begin{aligned} \left[ \mathbb {E}\left\{ (X_x {\widetilde{\zeta }}(x,y))^2\right\} \right] _{y=x} = \left[ X_{y_1}X_{y_2}\mathbb {E}\left\{ {\widetilde{\zeta }}(x,y_1) {\widetilde{\zeta }}(x,y_2)\right\} \right] _{y_1=y_2=x}. \end{aligned}$$
Using the specific form (9.9) of this covariance, and denoting \(X_{ x_1}\mathbb {C}^{ x_1 x}\) for \(X_{ y_1}\mathbb {C}^{ y_1 x}|_{ y_1= x_1}\), we have that
$$\begin{aligned}&X_{ y_1}\mathbb {E}\{ {\widetilde{\zeta }}( y_1, x) {\widetilde{\zeta }}( y_2, x)\}|_{ y_1= x_1}\\&\quad =\, \frac{1}{2}{(\mathbb {C}^{ y_2 x})^3} X_{ x_1}\mathbb {C}^{ x_1 x}+\frac{1}{2}{\mathbb {C}^{ y_2 x}} X_{ x_1}\mathbb {C}^{ x_1 x}\\&\qquad +\, {{\frac{3}{2}}}{\mathbb {C}^{ y_2 x}}(\mathbb {C}^{ x_1 x})^2X_{ x_1}\mathbb {C}^{ x_1 x}\\&\qquad +\,\frac{1}{2}{\mathbb {C}^{ y_2 x}} \left[ (\mathbb {C}^{ x_1 y_2})^2X_{ x_1}\mathbb {C}^{ x_1 x}+2\mathbb {C}^{ x_1 x}\mathbb {C}^{ x_1 y_2}X_{ x_1}\mathbb {C}^{ x_1 y_2}\right] +\mathbb {C}^{ y_2 x}X_{ x_1}\mathbb {C}^{ x_1 x} \\&\qquad -\,(\mathbb {C}^{ y_2 x})^2X_{ x_1}\mathbb {C}^{ x_1 y_2} +X_{ x_1}\mathbb {C}^{ x_1 y_2}-\mathbb {C}^{ y_2 x}X_{ x_1}\mathbb {C}^{ x_1 x} \\&\qquad -\,2\mathbb {C}^{ x_1 y_2}\mathbb {C}^{ x_1 x}X_{ x_1}\mathbb {C}^{ x_1 x}-(\mathbb {C}^{ x_1 x})^2X_{ x_1}\mathbb {C}^{ x_1 y_2}-\mathbb {C}^{ y_2 x}X_{ x_1}\mathbb {C}^{ x_1 x}. \end{aligned}$$
Taking the additional derivative \(X_{ y_2}\), and then setting \( x_1= x_2= x\) gives
$$\begin{aligned} 2\left( \nabla ^2\mathbb {C}( x, x)(X_ x,X_ x)-\nabla ^2\mathbb {C}( x, x)(X_ x,X_ x)\right) =0. \end{aligned}$$
Thus, since the variance here is zero, \( y= x\) is indeed a critical point of \( {\widetilde{\zeta }}( y, x)\), and so to evaluate the limit in (9.24) we need yet another round of L’Hôpital’s rule. This gives us that the limit is equal to
$$\begin{aligned} \lim _{u\rightarrow 0}\frac{\frac{ {d}^2 {\widetilde{\zeta }}(c(u), x)}{ {d}u^2}}{-2\left[ \left( \frac{ {d}\mathbb {C}(c(u), x)}{ {d}u}\right) ^2+\mathbb {C}(c(u), x)\frac{ {d}^2\mathbb {C}(c(u), x)}{ {d}u^2}\right] } = \frac{-\nabla ^2 {\widetilde{\zeta }}( x, x)(X_ x,X_ x)}{2\nabla ^2\mathbb {C}( x, x)(X_ x,X_ x)} \end{aligned}$$
the equality here following from the fact that \(y= x\) is a critical point for both \( {\widetilde{\zeta }}( y,x)\) and \(\mathbb {C}( y, x)\).

However, all terms here are well defined, finite, and non-zero with probability one, so we are done.
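The double application of L'Hôpital's rule can be illustrated on a toy example. The Python sketch below assumes the covariance \(\mathbb {C}(x,y)=\cos (x-y)\) on the circle, so that \(\mathbb {C}(c(u),x)=\cos u\) along a unit speed curve, and uses a deterministic, hypothetical function phi in place of a sample path of \({\widetilde{\zeta }}(c(u),x)\). Like the sample paths, phi vanishes together with its first derivative at \(u=0\). Neither choice is taken from the paper; they only serve to compare the ratio near the diagonal with the second derivative formula.

```python
import math


def cov(u):
    """C(c(u), x) for the assumed covariance cos(x - y) on the circle."""
    return math.cos(u)


def phi(u):
    """Hypothetical stand-in for a path of zeta-tilde along the curve.

    phi(0) = 0, phi'(0) = 0 and phi''(0) = 3.
    """
    return 3.0 * (1.0 - math.cos(u)) + u**3


def ratio(u):
    """The quotient phi / (1 - C^2) away from the diagonal (u != 0)."""
    return phi(u) / (1.0 - cov(u) ** 2)


def predicted_limit():
    """-phi''(0) / (2 C''(0)), the value given by two rounds of L'Hopital."""
    phi_second = 3.0
    cov_second = -1.0
    return -phi_second / (2.0 * cov_second)


for u in (1e-1, 1e-2, 1e-3):
    print(u, ratio(u), predicted_limit())
```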

10 Proof of Lemma 6.4

To prove the lemma, we need to show that the sequence
$$\begin{aligned} \sup _{x,y\in M^{*}} E^{x,k}(y)= \sup _{x,y\in M^{*}}\frac{k}{\Vert f^k(y)\Vert ^2}\frac{(1-\mathbb {C}(x,y))^2}{(1-\widehat{\mathbb {C}}_k(x,y))^2}\left( \frac{1}{k}\Vert P_x f^{x,k}(y)\Vert ^2\right) \end{aligned}$$
converges to zero, with probability one.
By Lemmas 6.1 and 6.2 we know that each of the first two factors here converges a.s., uniformly over \(M^{*}\), to one. So it suffices to show the convergence of the final factor to zero, that is, that
$$\begin{aligned} \lim _{k\rightarrow \infty }\sup _{(x,y)\in M^{*}}\frac{1}{k}\Vert P_x f^{x,k}(y)\Vert ^2= 0, \qquad a.s. \end{aligned}$$
Since we have already shown that \(f^x(y)\) is a.s. bounded over \(M^{*}\), the absolute value of its supremum has (on a large deviations scale) Gaussian-like tails, and so standard Gaussian arguments show that the maximum of k i.i.d. copies of this process can, asymptotically, be a.s. bounded by \(C\sqrt{\log k}\) for some finite C.
Since \(P_x\) is orthogonal projection onto an \((m+1)\)-dimensional subspace of \(\mathbb {R}^k\), it now follows that, for large enough k,
$$\begin{aligned} \frac{1}{k}\Vert P_x f^{x,k}(y)\Vert ^2\ < \ \frac{C(m+1)\log k}{k}, \end{aligned}$$
from which (10.1) now follows, and we are done.
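The two ingredients of this argument, the \(\sqrt{\log k}\) growth of Gaussian maxima and the decay of the bound in (10.2), can be illustrated with a rough simulation. In the Python sketch below, independent standard normal variables stand in for the suprema of the k i.i.d. copies of the process, which is a simplifying assumption. The constant \(C=1\), the value \(m=2\) and the function names are likewise hypothetical.

```python
import math
import random


def max_abs_gaussian(k, rng):
    """Largest absolute value among k independent standard normal draws."""
    return max(abs(rng.gauss(0.0, 1.0)) for _ in range(k))


def projection_bound(k, m, C=1.0):
    """Right hand side of (10.2): C (m + 1) log(k) / k."""
    return C * (m + 1) * math.log(k) / k


rng = random.Random(0)
for k in (10**2, 10**3, 10**4):
    observed = max_abs_gaussian(k, rng)
    # Compare the observed maximum with sqrt(2 log k), and show the bound shrinking.
    print(k, observed, math.sqrt(2.0 * math.log(k)), projection_bound(k, m=2))
```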

11 Fluctuation theory for Local Reaches

We now return to the last part of Theorem 4.3, in which we described a fluctuation result involving the local reaches of the random manifolds \(h^k(M)\). In particular, we want to consider the \(k\rightarrow \infty \) distributional limit of the functions
$$\begin{aligned} \sqrt{k}\left( \cot ^2\theta _k(\cdot )-\sigma ^2_c(f,\cdot )\right) \end{aligned}$$
where \(\theta _k(x)\), defined by (4.8), is the local reach of \(h^k(M)\) at the point \(h^k(x)\), for \(x\in M\).

The main result of this section is Theorem 11.1, which contains what is needed to complete the statement of Theorem 4.3, in that the limit process for (11.1) is now described in (formidable) detail.

To make that detail appear a little more natural, we shall do a little algebra before stating the theorem.

11.1 Some algebra and rearrangements

From the proof of Lemma 5.2, we know that
$$\begin{aligned} \cot ^2 \theta _k(x) = \sup _{y\in M\setminus \{x\}} \left\{ R_k(x,y) \, -\, E^{x,k}(y)\right\} , \end{aligned}$$
where \(E^{x,k}(y)\) is the ‘error’ term defined at (5.2) and we set
$$\begin{aligned} R_k(x,y)\ \mathop {=}\limits ^{\Delta }\ \frac{k}{\Vert f^k(y)\Vert ^2}\frac{(1-\mathbb {C}(x,y))^2}{(1-\widehat{\mathbb {C}}_k(x,y))^2}\left( \frac{1}{k}\Vert f^{x,k}(y)\Vert ^2\right) . \end{aligned}$$
We already know from Lemma 6.4 that \(E^{x,k}(y)\rightarrow 0\) uniformly in x and y as \(k\rightarrow \infty \). However, looking back over the proof, in particular the final inequality (10.2), we see that the same is true for \(\sqrt{k}E^{x,k}(y)\), from which it follows that we can ignore the error term in (11.2). In addition, we also know from Theorem 4.3 that
$$\begin{aligned} \lim _{k\rightarrow \infty }\sup _{(x,y)\in M^{*}}\left| R_k(x,y)-\text {Var}(f^x(y))\right| =0. \end{aligned}$$
Thus it seems not unreasonable that the structure of the limit of (11.1) might come from a continuous mapping theorem and the weak convergence of the random processes \(\gamma _k\), where
$$\begin{aligned} \gamma _k(x,y) \ \mathop {=}\limits ^{\Delta }\ \sqrt{k}\left( R_k(x,y)-\text {Var}(f^x(y))\right) ,\qquad (x,y)\in M^{*}. \end{aligned}$$
If we now recall/introduce the notations,
$$\begin{aligned} \Sigma ^{(1)}_k(x)= \frac{\Vert f^k(x)\Vert ^2}{k},\qquad \Sigma ^{(2)}_k(x,y) = \frac{\Vert f^{x,k}(y)\Vert ^2}{k}, \end{aligned}$$
$$\begin{aligned} Z_k (x,y)= & {} \sqrt{k}\left[ \left( \frac{1-\mathbb {C}(x,y)}{1-\widehat{\mathbb {C}}_k(x,y)}\right) ^2-1\right] , \end{aligned}$$
$$\begin{aligned} B_k(x,y)= & {} \sqrt{k}\left( \Sigma ^{(2)}_k(x,y) -\text {Var}(f^x(y))\right) , \end{aligned}$$
$$\begin{aligned} N_k(y)= & {} \sqrt{k}\left( \Sigma ^{(1)}_k(y) - 1\right) , \end{aligned}$$
then it takes no more than a few lines of simple algebra to check that
$$\begin{aligned} \gamma _k(x,y)= & {} \ B_k(x,y) \, -\, N_k(y)Z_k(x,y) \frac{\Sigma ^{(2)}_k(x,y)}{\sqrt{k}\Sigma ^{(1)}_k(y)} \nonumber \\&-\ \frac{\Sigma ^{(2)}_k(x,y)}{\Sigma ^{(1)}_k(y)} N_k(y) \, +\, \Sigma ^{(2)}_k(x,y) Z_k(x,y) . \end{aligned}$$
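The identity (11.9) involves nothing more than the definitions (11.3) and (11.5) to (11.8), so it can be verified with arbitrary numbers in place of the random quantities. The Python sketch below does this at a single pair \((x,y)\). The numerical values and function names are hypothetical, and \(k/\Vert f^k(y)\Vert ^2\) is written as \(1/\Sigma ^{(1)}_k(y)\).

```python
import math


def gamma_direct(k, sigma1, sigma2, c, c_hat, v):
    """sqrt(k) (R_k - Var(f^x(y))), computed from the definition of R_k."""
    r_k = (1.0 / sigma1) * ((1.0 - c) / (1.0 - c_hat)) ** 2 * sigma2
    return math.sqrt(k) * (r_k - v)


def gamma_decomposed(k, sigma1, sigma2, c, c_hat, v):
    """The right hand side of (11.9), built from Z_k, B_k and N_k."""
    root_k = math.sqrt(k)
    z = root_k * (((1.0 - c) / (1.0 - c_hat)) ** 2 - 1.0)
    b = root_k * (sigma2 - v)
    n = root_k * (sigma1 - 1.0)
    return (
        b
        - n * z * sigma2 / (root_k * sigma1)
        - (sigma2 / sigma1) * n
        + sigma2 * z
    )


# Arbitrary illustrative values for Sigma^(1)_k(y), Sigma^(2)_k(x,y), C, C-hat and V
args = dict(k=500, sigma1=1.03, sigma2=0.8, c=0.4, c_hat=0.37, v=0.75)
print(gamma_direct(**args), gamma_decomposed(**args))
```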
Now we wave our hands a little to come to some vague conclusions, before stating Theorem 11.1 which will make these conclusions precise, and then giving proofs. Firstly however, to reduce the lengths of some of the formulae to come, we recall some of our notational shorthand,
$$\begin{aligned} \mathbb {C}^{xy} = \mathbb {C}(x,y),\quad \widehat{\mathbb {C}}_k^{xy} = \widehat{\mathbb {C}}_k(x,y), \end{aligned}$$
and introduce
$$\begin{aligned} V^{xy}= \text {Var}\left( f^x(y)\right) = \frac{1-(\mathbb {C}^{xy})^2-\sum _{i=1}^m (X_i\mathbb {C}^{xy})^2}{(1-\mathbb {C}^{xy})^2}. \end{aligned}$$
(To see why the right hand side here is indeed \(\text {Var}\left( f^x(y)\right) \), see (12.4) below).
Now consider the various terms in (11.9). Although we did not state it explicitly, we have actually already proven that \(N_k\) has a Gaussian limit. (See the discussion below in the proof of Lemma 12.1.) We also know, from Lemmas 6.1 and 6.3, that, as \(k\rightarrow \infty \), uniformly on M and \(M^{*}\), respectively,
$$\begin{aligned} \Sigma ^{(1)}_k(x)\mathop {\rightarrow }\limits ^{a.s. }1 \ \ \text {and}\ \ \Sigma ^{(2)}_k(x,y)\mathop {\rightarrow }\limits ^{a.s. }V^{xy}. \end{aligned}$$
In addition, Lemma 6.2 and the a.s. convergence of \(\Sigma ^{(1)}_k\) lead to the expectation (this is the handwaving step) that \(B_k\) and \(Z_k\) will both have Gaussian limits. Substituting this ‘information’ into (11.9), the implication is that the first term on the right hand side will have a Gaussian limit on \(M^{*}\), the second will converge to zero, the third will converge to \(V^{xy}\) times a Gaussian process on M, while the last term will converge to \(V^{xy}\) times a Gaussian process on \(M^{*}\). Unfortunately, all the limit processes will be correlated, which is what makes the precise description of this a little long-winded, as we now see.

11.2 The fluctuation result

Theorem 11.1

Let f and M satisfy the assumptions of Theorem 4.3, including the conditions that M is \(C^6\) and that, with probability one, \(f\in C^6(M)\). Then there exists a sequence \(\bar{\gamma }_k\) of random processes from \(M\rightarrow {\mathbb R}\), such that, for all \(x\in M\),
$$\begin{aligned} \sqrt{k}\left| \cot ^2\theta _k(x)-\sigma ^2_c(f,x)\right| \ \le \ \left| \bar{\gamma }_k(x)\right| \end{aligned}$$
and a limit process \(\bar{\gamma }:\,M\rightarrow {\mathbb R}\) such that
$$\begin{aligned} \bar{\gamma }_k(\cdot ) \Rightarrow \ \bar{\gamma }(\cdot ). \end{aligned}$$
The convergence here is weak convergence, in the Banach space \(C_b(M)\) of continuous functions on M with supremum norm, and
$$\begin{aligned} \bar{\gamma }(x) = \sup _{y\in M\setminus \{x\}}\gamma (x,y), \end{aligned}$$
where \(\gamma \) is the a.s. continuous Gaussian process over \(M^{*}\) representable in distribution as
$$\begin{aligned} \gamma (x,y)= \beta (x,y)+V^{xy}\eta (y)+ 2V^{xy}\zeta (x,y)(1+\mathbb {C}(x,y)). \end{aligned}$$
  1. \(\eta (y)\) is a centered Gaussian process over M with covariance function
    $$\begin{aligned} \mathbb {E}\{\eta (y_1)\eta (y_2)\}=2(\mathbb {C}(y_1,y_2))^2. \end{aligned}$$
  2. \(\beta (x,y)\) is a centered Gaussian process over \(M^{*}\) with covariance function
    $$\begin{aligned} \mathbb {E}\{\beta (x_1,y_1)\beta (x_2,y_2)\}=2\left( \mathbb {E}\{f^{x_1}(y_1)f^{x_2}(y_2)\}\right) ^2. \end{aligned}$$
  3. \(\zeta (x,y)\) is a centered Gaussian process over \(M^{*}\) with covariance function
    $$\begin{aligned}&\mathbb {E}\{ \zeta (x_1,y_1) \zeta (x_2,y_2)\} \\&\quad =\, \frac{1}{(1-(\mathbb {C}^{x_1y_1})^{2})(1-(\mathbb {C}^{x_2y_2})^{2})}\\&\qquad \times \bigg \{\frac{1}{2}\mathbb {C}^{x_1 y_1}\mathbb {C}^{x_2y_2} \left[ (\mathbb {C}^{y_1 x_2})^2+(\mathbb {C}^{y_1 y_2})^2+(\mathbb {C}^{x_1 x_2})^2+(\mathbb {C}^{x_1 y_2})^2\right] \\&\qquad +\,\mathbb {C}^{x_2y_1}\left[ \mathbb {C}^{x_1 y_2}-\mathbb {C}^{x_1 x_2}\mathbb {C}^{x_2y_2}\right] +\mathbb {C}^{y_1 y_2}\left[ \mathbb {C}^{x_1 x_2}-\mathbb {C}^{x_1 y_2}\mathbb {C}^{x_2y_2}\right] \\&\qquad -\,\mathbb {C}^{x_1 y_1}\left[ \mathbb {C}^{x_1 x_2}\mathbb {C}^{x_1 y_2}+\mathbb {C}^{y_1 y_2}\mathbb {C}^{x_2y_1}\right] \bigg \}. \end{aligned}$$
As for the corresponding cross-covariance functions, we write them in terms of \(X_1,\dots ,X_m\), an orthonormal (with respect to the induced metric) vector field on M. None of the cross-covariances are dependent on the particular choice of vector field.
$$\begin{aligned} \mathbb {E}\{\eta (y_1)\beta (x_2,y_2)\} =\,&\frac{ 2\left[ \mathbb {C}^{y_1y_2}-\mathbb {C}^{x_2y_2}\mathbb {C}^{x_2y_1}-\sum _i X_i\mathbb {C}^{xy_1}|_{x=x_2} X_i\mathbb {C}^{xy_2}|_{x=x_2} \right] ^2}{(1-\mathbb {C}^{x_2 y_2})^2},\\ \mathbb {E}\{\eta (y_1)\zeta (x_2,y_2)\} =\,&\frac{2\mathbb {C}^{x_2y_1}\mathbb {C}^{y_1y_2}-\mathbb {C}^{x_2y_2}\left\{ (\mathbb {C}^{x_2y_1})^2+(\mathbb {C}^{y_1y_2})^2\right\} }{1- (\mathbb {C}^{x_2 y_2})^2},\\ \mathbb {E}\{\zeta (x_1,y_1)\beta (x_2,y_2)\} =\,&\frac{1}{(1-(\mathbb {C}^{x_2 y_2})^2)}\\&\times \left[ \frac{2}{(1-\mathbb {C}^{x_1y_1})^2}\left\{ \left( \mathbb {C}^{y_1y_2}-\mathbb {C}^{x_1y_1}\mathbb {C}^{x_1y_2} -\sum X_i \mathbb {C}^{xy_1}|_{x=x_1} X_i \mathbb {C}^{xy_2}|_{x=x_1}\right) \right. \right. \\&\times \left. \left( \mathbb {C}^{y_1x_2}-\mathbb {C}^{x_1y_1}\mathbb {C}^{x_1x_2}-\sum X_i\mathbb {C}^{xy_1}|_{x=x_1}X_i\mathbb {C}^{xx_2}|_{x=x_1}\right) \right\} \\&-\frac{\mathbb {C}^{x_2y_2}}{(1-\mathbb {C}^{x_1y_1})^2}\left\{ \left( \mathbb {C}^{y_1y_2}-\mathbb {C}^{x_1y_1}\mathbb {C}^{x_1y_2}-\sum X_i\mathbb {C}^{xy_1}|_{x=x_1} X_i \mathbb {C}^{xy_2}|_{x=x_1}\right) ^2\right. \\&+\left. \left. \left( \mathbb {C}^{y_1x_2}-\mathbb {C}^{x_1y_1}\mathbb {C}^{x_1x_2}-\sum X_i \mathbb {C}^{xy_1}|_{x=x_1} X_i \mathbb {C}^{xx_2}|_{x=x_1}\right) ^2\right\} \right] . \end{aligned}$$

Although Theorem 11.1 takes a lot of space to state, its main implication is simple: The limiting fluctuations of the local reach of \(h^k(M)\) are bounded by a functional of a Gaussian process on \(M^{*}\). The detailed structure of this Gaussian process is complicated, and depends, in terms of properties such as differentiability, on the underlying covariance of f. For example, while the limit is a.s. continuous, it will not typically be differentiable, and fine sample path properties such as Hölder continuity will depend on the behaviour of the underlying covariance \(\mathbb {C}\) in ways that are not at all obvious.
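To make at least one ingredient of the theorem concrete, the covariance \(2(\mathbb {C}(y_1,y_2))^2\) of \(\eta \) can be checked by simulation at a fixed pair of points. The sketch below assumes that the components of \(f^k\) are i.i.d. copies of f, so that the pair \((f_\ell (y_1),f_\ell (y_2))\) is bivariate standard Gaussian with correlation \(\rho =\mathbb {C}(y_1,y_2)\). In that case \(\mathbb {E}\{N_k(y_1)N_k(y_2)\}=2\rho ^2\) holds for every k and not only in the limit. The function names, the value of \(\rho \) and the sample sizes are illustrative assumptions.

```python
import math
import random


def simulate_n_k_pair(rho, k, rng):
    """One draw of (N_k(y1), N_k(y2)), where rho plays the role of C(y1, y2)."""
    s1 = 0.0
    s2 = 0.0
    for _ in range(k):
        a = rng.gauss(0.0, 1.0)                                        # f_l(y1)
        b = rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)  # f_l(y2)
        s1 += a * a
        s2 += b * b
    root_k = math.sqrt(k)
    # N_k(y) = sqrt(k) * (||f^k(y)||^2 / k - 1)
    return root_k * (s1 / k - 1.0), root_k * (s2 / k - 1.0)


def empirical_covariance(rho, k=20, reps=5000, seed=1):
    """Monte Carlo estimate of E{N_k(y1) N_k(y2)}; both factors have mean zero."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        n1, n2 = simulate_n_k_pair(rho, k, rng)
        total += n1 * n2
    return total / reps
```

For example, `empirical_covariance(0.6)` should be close to \(2\times 0.6^2=0.72\), up to Monte Carlo error. The same experiment with \(f^{x,k}\) in place of \(f^k\) would illustrate the covariance of \(\beta \), but it requires a concrete covariance \(\mathbb {C}\) and its derivatives, and so is not attempted here.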

12 Proof of Theorem 11.1

We start with two lemmas, and then use these to complete the proof in Sect. 12.2.

12.1 Two lemmas

Lemma 12.1

Under the conditions of Theorem 11.1, and with the notation of the previous section, we have the joint weak convergence of the following vector-valued process in \(C_b(M^{*})\):
$$\begin{aligned} \left( \Sigma ^{(1)}_k,\,\Sigma ^{(2)}_k,\,B_k,\,N_k,\,Z_k\right) \ \Rightarrow \ \ \left( 1,\, V,\,\beta ,\,\eta ,\,2\zeta (1+\mathbb {C})\right) , \end{aligned}$$
where \(V:\,M\times M\rightarrow {\mathbb R}\) is defined by \(V(x,y)=V^{xy}\).


Most of the pieces that make up the proof of Theorem 11.1 are actually already in hand. For a start, by Lemmas 6.1 and 6.3 we know \(\Sigma ^{(2)}_k(x,y)\) and \(\Sigma ^{(1)}_k(y)\) converge to the deterministic limits \(V^{xy}\) and 1, respectively, where the convergence is almost surely uniform in \((x,y)\in M^{*}\) and \(y\in M\). The corresponding weak convergence is, obviously, implied by this. Secondly, in Sect. 9.3.1 we established the weak convergence of \(N_k\) in \(C_b(M)\) [cf. (9.20) and the last paragraph of Sect. 9.3.2].

To add \(Z_k\) to this convergence, note that the main term there—\((1-\mathbb {C}(x,y))/(1-\widehat{\mathbb {C}}_k(x,y))\)—already appeared in Sect. 9.1, and can be rewritten as in (9.4). Substituting there the \(\zeta _k(x,y)\) of (9.7) and expanding out the powers, simple algebra leads to the fact that
$$\begin{aligned} Z_k(x,y) = 2(1+\mathbb {C}(x,y))\zeta _k(x,y)\, + \, O(1/\sqrt{k}), \end{aligned}$$
and we have already shown the weak convergence of \(\zeta _k\) in \(C_b(M^{*})\) [cf. (9.22)].

Note that to this point we have relied on results that arose in earlier parts of the paper, and these required only that \(f\in C^3(M)\). The additional level of differentiability required by the lemma, and so also by Theorem 11.1, comes from the following lemma, which completes the collection by establishing the weak convergence of \(B_k\).

In view of the fact that all the limit processes are either deterministic or Gaussian, applications of Slutsky’s theorem and the Cramér–Wold device then complete the proof, modulo calculating all the limit variances and covariances, for which we do not intend to write out the details. \(\square \)

Lemma 12.2

Under the conditions of Lemma 12.1, and with the notation of the previous section, \(B_k\Rightarrow \beta \) in \(C_b(M^{*})\).


The proof follows along the same lines as the proof of the weak convergence of \(\alpha _k\) described in Sect. 9.3.1.

To start, we once again choose an orthonormal frame field \(\{X_i\}\) for M, with the conventions of Sect. 3. Write the corresponding Riemannian normal basis vectors as \(\{{\partial }/{\partial x^i}\}\), and \(\left\{ {\partial }/{\partial x^i}:{\partial }/{\partial y^i}\right\} \) as the corresponding basis for the tangent spaces on \(M\times M\). In this basis, we have
$$\begin{aligned} f^x(y) = \frac{f(y)-\mathbb {C}^{xy}f(x)}{1-\mathbb {C}^{xy}}-\sum _{i=1}^m\frac{ \frac{\partial f(x)}{\partial x^i} \,\frac{\partial \mathbb {C}^{xy}}{\partial x^i}}{1-\mathbb {C}^{xy}}, \end{aligned}$$
$$\begin{aligned} V^{xy} =\text {Var}\left( f^x(y)\right) \ = \ \frac{1-(\mathbb {C}^{xy})^2-\sum _{i} (\frac{\partial \mathbb {C}^{xy}}{\partial x^i})^2}{(1-\mathbb {C}^{xy})^2}. \end{aligned}$$
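For readers who want the intermediate step, the variance formula follows directly from the expression for \(f^x(y)\). The following is a sketch under three assumptions, which we take to be the conventions recalled from Sect. 3: derivatives of \(\mathbb {C}^{xy}\) are taken in x, so that \(\mathbb {E}\{f(y)\,\partial f(x)/\partial x^i\}=\partial \mathbb {C}^{xy}/\partial x^i\); unit variance of f gives \(\mathbb {E}\{f(x)\,\partial f(x)/\partial x^i\}=0\); and orthonormality with respect to the induced metric gives \(\mathbb {E}\{(\partial f(x)/\partial x^i)(\partial f(x)/\partial x^j)\}=\delta _{ij}\). Expanding the variance of the numerator then yields
$$\begin{aligned} (1-\mathbb {C}^{xy})^2\,\text {Var}\left( f^x(y)\right) &= \text {Var}\left( f(y)-\mathbb {C}^{xy}f(x)-\sum _{i=1}^m \frac{\partial f(x)}{\partial x^i}\,\frac{\partial \mathbb {C}^{xy}}{\partial x^i}\right) \\ &= 1+(\mathbb {C}^{xy})^2+\sum _{i}\left( \frac{\partial \mathbb {C}^{xy}}{\partial x^i}\right) ^2 -2(\mathbb {C}^{xy})^2-2\sum _{i}\left( \frac{\partial \mathbb {C}^{xy}}{\partial x^i}\right) ^2 \\ &= 1-(\mathbb {C}^{xy})^2-\sum _{i}\left( \frac{\partial \mathbb {C}^{xy}}{\partial x^i}\right) ^2 . \end{aligned}$$
The three positive terms in the second line are the variances of the three summands, and the two negative terms are twice the covariances of \(f(y)\) with the other two summands; the remaining cross term vanishes by the second assumption.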
If we now write
$$\begin{aligned} \Lambda _\ell (x,y)&\mathop {=}\limits ^{\Delta }&(1-\mathbb {C}^{xy})^2 \,\left( \left( f_\ell ^x(y)\right) ^2-V^{xy}\right) , \end{aligned}$$
then we can also write
$$\begin{aligned} B_k(x,y) = \frac{k^{-1/2}\sum _{\ell =1}^k \Lambda _\ell (x,y) }{(1-\mathbb {C}^{xy})^2}. \end{aligned}$$
Suppose we can show that the numerator here has a Gaussian limit, \(\Lambda \), say, as \(k\rightarrow \infty \). Since it is a sum of i.i.d. processes, this should not be too hard. To complete the proof of the weak convergence of the \(B_k\) we could then use a continuous mapping argument, as before, by defining a map, H say, between functions on \(M^{*}\) via
$$\begin{aligned} (H(\phi )) (x,y) = \frac{\phi (x,y)}{(1-\mathbb {C}^{xy})^2}, \end{aligned}$$
where the image function is in \(C_b(M^{*})\). For this to work, we need to know that H is continuous, with probability one, for the probability measure supported on the paths of \(\Lambda \). (This is not straightforward, since, as we shall soon see, we once again run into 0/0 issues for \((H(\Lambda ))(x,y)\) as \(x\rightarrow y\).) As a first step in checking this continuity, we need to know something about \(\Lambda \), and the function space on which the convergence of the numerator in (12.5) to \(\Lambda \) occurs.
We start with \(\Lambda \). Since, by assumption, it is mean zero Gaussian, all of its properties are determined by its covariance function. Given the expressions (12.3) and (12.4), it is not hard to check that this is given by
$$\begin{aligned} \mathbb {E}\left\{ \Lambda _\ell (x_1,y_1)\Lambda _\ell (x_2,y_2)\right\}= & {} \mathbb {E}\left\{ \Lambda (x_1,y_1)\Lambda (x_2,y_2)\right\} \nonumber \\= & {} \mathbb {C}^{y_1y_2}-\mathbb {C}^{x_2y_1}\mathbb {C}^{x_2y_2} -\ \sum _i \frac{\partial \mathbb {C}^{x_2y_1}}{\partial x^i} \frac{\partial \mathbb {C}^{x_2y_2}}{\partial x^i}\nonumber \\&-\,\mathbb {C}^{x_1y_1}\mathbb {C}^{x_1y_2}\nonumber \\&+\,\mathbb {C}^{x_1x_2}\mathbb {C}^{x_1y_1}\mathbb {C}^{x_2y_2}\nonumber \\&+\,\mathbb {C}^{x_1y_1}\sum _i \frac{\partial \mathbb {C}^{x_2x_1}}{\partial x^i} \frac{\partial \mathbb {C}^{x_2y_2}}{\partial x^i}\nonumber \\&-\,\sum _i \frac{\partial \mathbb {C}^{x_1y_2}}{\partial x^i}\frac{\partial \mathbb {C}^{x_1y_1}}{\partial x^i}\nonumber \\&+\,\mathbb {C}^{x_2y_2}\sum _i\frac{\partial \mathbb {C}^{x_1x_2}}{\partial x^i}\frac{\partial \mathbb {C}^{x_1y_1}}{\partial x^i}\nonumber \\&+\,\sum _{i,j}\frac{\partial \mathbb {C}^{x_1y_1}}{\partial x^i}\frac{\partial \mathbb {C}^{x_2y_2}}{\partial x^j}\frac{\partial ^2 \mathbb {C}^{x_1x_2}}{\partial x^i \partial x^j}. \end{aligned}$$
(Note that setting \(x=x_1=x_2\) and \(y=y_1=y_2\) here is what gives the numerator in the expression for \(V^{xy}\) in (11.11)).
We can now consider the behaviour of
$$\begin{aligned} \lim _{y\rightarrow x}\frac{\Lambda (x,y)}{(1-\mathbb {C}^{xy})^2}. \end{aligned}$$
To see how this works, we restrict the argument to the case in which M is one-dimensional. While notationally much simpler than the general case (although we shall see in a moment that it is hardly ‘simple’) it is indicative of the general situation. In the general case the limit in (12.8) will be taken along a specific path of y’s, for which the final direction of approach to x will be what plays the role of the single dimension in the following calculation.
Taking then \(x,y\in M\subset {\mathbb R}^1\), it is an immediate consequence of (12.7) that the variance of \(\Lambda (x,y)\) tends to zero as \(y\rightarrow x\), and thus so does \(\Lambda (x,y)\) itself. The denominator here clearly also converges to zero, and so to compute the ratio we need to resort to an application of L’Hôpital’s rule, which gives us that the limit in (12.8) is the same as
$$\begin{aligned} \lim _{y\rightarrow x} \frac{\frac{\partial \Lambda (x,x)}{\partial x}}{-2(1-\mathbb {C}^{xx})\frac{\partial \mathbb {C}^{xx}}{\partial x}}. \end{aligned}$$
Once again, it is obvious that the denominator vanishes in the limit.
As for the numerator, it follows from (12.7) and the fact that the g-norm of \(\frac{\partial }{\partial x}\) is one that
$$\begin{aligned} \mathbb {E}\left\{ \left( \frac{\partial \Lambda (x,y)}{\partial x}\right) ^2\right\} \bigg |_{y=x}= & {} \frac{\partial }{\partial y_1}\frac{\partial }{\partial y_2}\mathbb {E}\{\Lambda (x,y_1)\Lambda (x,y_2)\}|_{y_1=y_2=x}\\= & {} 4\left( \mathbb {C}^{y_1y_2}-\mathbb {C}^{xy_1}\mathbb {C}^{xy_2}-\frac{\partial \mathbb {C}^{xy_1}}{\partial x}\frac{\partial \mathbb {C}^{xy_2}}{\partial x}\right) \\&\times \left( \frac{\partial ^2 \mathbb {C}^{y_1y_2}}{\partial y_1 \partial y_2}-\frac{\partial \mathbb {C}^{xy_1}}{\partial y_1}\frac{\partial \mathbb {C}^{xy_2}}{\partial y_2}-\frac{\partial ^2 \mathbb {C}^{xy_1}}{\partial y_1\partial x}\frac{\partial ^2 \mathbb {C}^{xy_2}}{\partial y_2\partial x}\right) \\&+\ 4\left( \frac{\partial \mathbb {C}^{y_1y_2}}{\partial y_1}-\frac{\partial \mathbb {C}^{xy_1}}{\partial y_1}\mathbb {C}^{xy_2}-\frac{\partial ^2 \mathbb {C}^{xy_1}}{\partial y_1\partial x}\frac{\partial \mathbb {C}^{xy_2}}{\partial x}\right) \\&\times \left( \frac{\partial \mathbb {C}^{y_1y_2}}{\partial y_2}-\frac{\partial \mathbb {C}^{xy_2}}{\partial y_2}\mathbb {C}^{xy_1}-\frac{\partial ^2 \mathbb {C}^{xy_2}}{\partial y_2\partial x}\frac{\partial \mathbb {C}^{xy_1}}{\partial x}\right) . \end{aligned}$$
However, evaluated at \(y_1=y_2=x\), this also vanishes, so yet another application of L’Hôpital’s rule is required.
In fact, two more applications of L’Hôpital’s rule are required, and while the derivation follows the line of the previous applications, the formulae are rather long, and so we will skip the details. However, in the end, one finds that
$$\begin{aligned} \lim _{y\rightarrow x}\frac{\Lambda (x,y)}{(1-\mathbb {C}^{xy})^2}= \frac{\frac{\partial ^4 \Lambda (x,x)}{\partial x^4}}{6\left( \frac{\partial ^2 \mathbb {C}^{xx}}{\partial x^2}\right) ^2}, \end{aligned}$$
where the variance of the numerator is
$$\begin{aligned} 72\left( \frac{\partial ^4 \mathbb {C}^{xx}}{\partial x^4}-\left( \frac{\partial ^3 \mathbb {C}^{xx}}{\partial x^3}\right) ^2-1\right) ^2, \end{aligned}$$
which, like the denominator of (12.10), is non-zero. (This is a consequence of the non-degeneracy assumed in Assumption 4.2.)
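The count of four applications of L’Hôpital’s rule can be read off from the denominator alone. The following is a sketch in the one-dimensional setting, writing \(h(y)=1-\mathbb {C}^{xy}\) for fixed x (h is notation introduced only for this remark) and reading \(\partial ^2\mathbb {C}^{xx}/\partial x^2\) as the second derivative of \(\mathbb {C}\) in one argument, evaluated on the diagonal. Unit variance of f gives \(h(x)=0\) and \(h'(x)=0\), so that in the Leibniz expansion of the derivatives of \(h^2\) at \(y=x\) every term containing h or \(h'\) vanishes:
$$\begin{aligned} \frac{d^j}{dy^j}\,h(y)^2\,\bigg |_{y=x} &= 0,\qquad j=0,1,2,3,\\ \frac{d^4}{dy^4}\,h(y)^2\,\bigg |_{y=x} &= \sum _{j=0}^{4}\left( {\begin{array}{c}4\\ j\end{array}}\right) h^{(j)}(x)\,h^{(4-j)}(x) \ =\ 6\left( h''(x)\right) ^2 \ =\ 6\left( \frac{\partial ^2 \mathbb {C}^{xy}}{\partial y^2}\bigg |_{y=x}\right) ^2 . \end{aligned}$$
The denominator \((1-\mathbb {C}^{xy})^2\) therefore vanishes to exactly fourth order at \(y=x\), which accounts both for the four differentiations and for the factor 6 in the limit above.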
The punch line to all this is that in order to apply the continuous mapping theorem with the mapping H of (12.6), we need to have convergence not only of the sum \(k^{-1/2} \sum \Lambda _\ell \), but also at least four of its derivatives. That is, we need weak convergence in the Banach space \(B^{(4)}\) of four times continuously differentiable functions on \(M\times M\), equipped with the norm
$$\begin{aligned} \qquad \ \Vert f\Vert _{B^{(4)}} \mathop {=}\limits ^{\Delta }\max \left\{ \Vert f\Vert _{\infty },\, \Vert \nabla f\Vert _{\infty },\, \Vert \nabla ^2 f\Vert _{\infty }, \Vert \nabla ^3 f\Vert _{\infty },\, \Vert \nabla ^4 f\Vert _{\infty }\right\} , \end{aligned}$$
[cf. (9.11)].

Now that we know what to do, the rest is, at least in principle, straightforward, and the proof follows along the same lines as the proof of the weak convergence of \(\alpha _k\) that we treated in Sect. 9.3.1. Convergence of finite dimensional distributions follows from Theorem 9.2, while tightness requires the computation of moments of increments of the \(\Lambda _\ell \) and their first four derivatives. Note, however, that \(\Lambda _\ell (x,y)\) involves \(f_\ell ^x(y)\). Since we have seen that \(f_\ell ^x(\cdot )\), as a function on M, basically possesses one less level of differentiability than f itself, requiring four derivatives for \(\Lambda _\ell \) ultimately leads to requiring \(f\in C^5(M)\). In addition, since the arguments in Sect. 9.3.1 relied on a Taylor expansion, one further derivative is required, which is why the lemma, and so Theorem 11.1, require \(f\in C^6(M)\).

We leave the details to the reader. While they are long and involved, the fact that all random variables are either Gaussian or squares of Gaussians means that nothing more is involved than Wick’s formula and careful accounting. \(\square \)

12.2 Proof of Theorem 11.1

From (11.2), (11.4) and the definition (4.5) of \(\sigma _c^2(f,x)\) we have that
$$\begin{aligned} \sqrt{k}\left| \cot ^2\theta _k(x)-\sigma _c^2(f,x)\right|= & {} \sqrt{k}\,\Big |\sup _{y\in M\setminus \{x\}}\left( R^{xy}_k -E^{x,k}(y)\right) -\sup _{y\in M\setminus \{x\}}V^{xy}\Big |\\\le & {} \sqrt{k}\,\Big |\sup _{y\in M\setminus \{x\}}R^{xy}_k -\sup _{y\in M\setminus \{x\}}V^{xy}\Big | \\&+\, \sqrt{k}\sup _{y\in M\setminus \{x\}}\ \big | E^{x,k}(y)\big |\\\le & {} \sup _{y\in M\setminus \{x\}}\left| \gamma _k(x,y)\right| \ +\ \sqrt{k}\sup _{y\in M\setminus \{x\}}\ \big |E^{x,k}(y)\big |. \end{aligned}$$
From the discussion preceding (11.3) we know that we can ignore the second term here in the limit. The representation of \(\gamma _k\) in (11.9) in terms of the processes \(\Sigma _k^{(1)}\), \(\Sigma _k^{(2)}\), \(B_k\), \(N_k\) and \(Z_k\), the joint weak convergence of all of these in Lemma 12.1, and an application of the continuous mapping theorem, complete the proof of Theorem 11.1. \(\square \)
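Both inequalities in the display above rest on one elementary bound, which we record for completeness. For bounded real functions a and b on a common set A (the symbols a, b and A are introduced only for this remark), we have \(a(y)\le b(y)+|a(y)-b(y)|\le \sup _A b+\sup _A|a-b|\) for every \(y\in A\), and the same with the roles of a and b exchanged, so that
$$\begin{aligned} \Big |\sup _{y\in A}a(y)-\sup _{y\in A}b(y)\Big | \ \le \ \sup _{y\in A}\left| a(y)-b(y)\right| . \end{aligned}$$
Taking \(a=R^{xy}_k-E^{x,k}(y)\) and \(b=R^{xy}_k\), together with the triangle inequality, gives the first inequality; taking \(a=R^{xy}_k\) and \(b=V^{xy}\) and multiplying by \(\sqrt{k}\) gives the second. The same bound shows that the map taking a function \(\phi \) on \(M^{*}\) to \(x\mapsto \sup _{y\in M\setminus \{x\}}\phi (x,y)\) is 1-Lipschitz in the supremum norm, which is the continuity needed for the continuous mapping theorem.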



We would like to thank Takashi Owada for useful discussions, and two referees for helpful comments.



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Electrical Engineering, Technion, Haifa, Israel
  2. Department of Statistics, Stanford University, Stanford, USA
  3. Department of Mathematics, University of Chicago, Chicago, USA
