Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings
Abstract
This paper presents convergence analysis of kernel-based quadrature rules in misspecified settings, focusing on deterministic quadrature in Sobolev spaces. In particular, we deal with misspecified settings where a test integrand is less smooth than a Sobolev RKHS based on which a quadrature rule is constructed. We provide convergence guarantees based on two different assumptions on a quadrature rule: one on quadrature weights and the other on design points. More precisely, we show that convergence rates can be derived (i) if the sum of absolute weights remains constant (or does not increase quickly), or (ii) if the minimum distance between design points does not decrease very quickly. As a consequence of the latter result, we derive a rate of convergence for Bayesian quadrature in misspecified settings. We reveal a condition on design points to make Bayesian quadrature robust to misspecification, and show that, under this condition, it may adaptively achieve the optimal rate of convergence in the Sobolev space of a lesser order (i.e., of the unknown smoothness of a test integrand), under a slightly stronger regularity condition on the integrand.
Keywords
Kernel-based quadrature rules · Misspecified settings · Sobolev spaces · Reproducing kernel Hilbert spaces · Bayesian quadrature
Mathematics Subject Classification
Primary 65D30 · Secondary 65D32 · 65D05 · 46E35 · 46E22
1 Introduction
1.1 Kernel-Based Quadrature Rules
How can we obtain a quadrature rule whose convergence rate is faster than \(O(n^{-1/2})\)? In practice, one often has prior knowledge or belief about the integrand f, such as smoothness, periodicity and sparsity. Exploiting such knowledge or assumptions in constructing a quadrature rule \(\{ (w_i,X_i) \}_{i=1}^n\) may achieve faster rates of convergence, and such methods have been extensively studied in the literature for decades; see, e.g., [17] and [9] for reviews.
This paper deals with quadrature rules using reproducing kernel Hilbert spaces (RKHS) explicitly or implicitly to achieve fast convergence rates; we will refer to such methods as kernel-based quadrature rules or simply kernel quadrature. As discussed in Sect. 2.4, notable examples include quasi-Monte Carlo methods [17, 18, 26, 42], Bayesian quadrature [9, 48] and kernel herding [5, 10, 11]. These methods have been studied extensively in recent years [4, 8, 30, 45, 46, 55, 62] and have recently found applications in, for instance, machine learning and statistics [3, 9, 21, 31, 32, 43, 50].
1.2 Misspecified Settings
This paper focuses on situations where the assumption \(f \in {{\mathcal {H}}}_k\) is violated, that is, misspecified settings. As explained above, convergence guarantees for kernel quadrature rules often assume that \(f\in {{\mathcal {H}}}_k\). However, in practice one may lack full knowledge of the properties of the integrand, and therefore misspecification of the RKHS (via the choice of its reproducing kernel k) may occur, that is, \(f \notin {{\mathcal {H}}}_k\).
Such misspecification is likely to happen when the integrand is a black box function. An illustrative example can be found in applications to computer graphics, such as the problem of illumination integration (see, e.g., [9]), where the task is to compute the total amount of light arriving at a camera in a virtual environment. This problem is solved by quadrature, with the integrand f(x) being the intensity of light arriving at the camera from a direction x (angle). However, the value of f(x) is only given by a simulation of the environment for each x, so the integrand f is a black box function. Similar situations can be found in applications to statistics and machine learning. A representative example is the computation of the marginal likelihood of a probabilistic model, which is an important but challenging task required for model selection (see, e.g., [47]). In modern scientific applications where complex phenomena are dealt with (e.g., climate science), we often encounter situations where the evaluation of a likelihood function, which forms the integrand in marginal likelihood computation, involves an expensive simulation model, making the integrand complicated and even black box.
If the integrand is a black box function, there is a tradeoff between the risk of misspecification and the gain in the rate of convergence for kernel-based quadrature rules: for a faster convergence rate, one may want to use a quadrature rule for a narrower \({{\mathcal {H}}}_k\), such as one of higher-order differentiability, while such a choice may cause misspecification of the function class. Therefore, it is of great importance to elucidate the convergence properties of these rules in misspecified situations, in order to make use of such quadrature rules in a safe manner.
1.3 Contributions
This paper provides convergence rates of kernel-based quadrature rules in misspecified settings, focusing on deterministic rules (i.e., without randomization). The focus of misspecification is placed on the order of Sobolev spaces: The unknown order s of the integrand f is overestimated as r, that is, \(s \le r\).
Let \(\varOmega \subset {\mathbb {R}}^d\) be a bounded domain with a Lipschitz boundary (see Sect. 3 for the definition). For \(r>d/2\), consider a positive definite kernel \(k_r\) on \(\varOmega \) that satisfies the following assumption:
Assumption 1
The resulting RKHS \({{\mathcal {H}}}_{k_r}(\varOmega )\) is norm-equivalent to the standard Sobolev space \(H^r(\varOmega )\).
The Matérn and Wendland kernels satisfy Assumption 1 (see Sect. 2).
In Sect. 4.1, it is assumed that \(\sum _{i=1}^n |w_i| = O(n^{c})\) as \(n \rightarrow \infty \) for some constant \(c \ge 0\). Note that one can take \(c = 0\) if the weights satisfy \(\max _{i=1,\dots ,n} |w_i| = O(n^{-1})\), an example of which is the equal weights \(w_1 = \cdots =w_n = 1/n\). Under this assumption and other suitable conditions, Corollary 2 shows$$\begin{aligned} | P_n f - P f | = O( n^{ - bs/r + c (r-s)/r } ) \quad (n \rightarrow \infty ). \end{aligned}$$(5)The rate \(O(n^{-bs/r})\) in (5) holds if \(c = 0\). Therefore, this result provides convergence guarantees in particular for equal-weight quadrature rules, such as quasi-Monte Carlo methods and kernel herding, in the misspecified setting.
Section 4.2 uses an assumption on design points \(X^n := \{X_1,\dots ,X_n\}\) in terms of the separation radius \(q_{X^n}\), which is defined by$$\begin{aligned} q_{X^n} := \frac{1}{2} \min _{i \ne j} \Vert X_i - X_j \Vert . \end{aligned}$$(6)Corollary 3 shows that, if \(q_{X^n} = \varTheta (n^{-a})\) as \(n \rightarrow \infty \) for some \(a > 0\), then, under other regularity conditions,$$\begin{aligned} | P_n f - P f | = O(n^{ -\min ( b - a(r-s),\ as)} ) \quad (n \rightarrow \infty ). \end{aligned}$$(7)The best possible rate is \(O(n^{-bs/r})\), attained when \(a = b/r\). This result provides a convergence guarantee for quadrature rules that obtain the weights \(w_1,\dots ,w_n\) so as to achieve \(O(n^{-b})\) for the worst-case error, with \(X_1,\dots ,X_n\) fixed beforehand. We demonstrate this result by applying it to Bayesian quadrature, as explained below. Our result may also provide the following guideline for practitioners: in order to make a kernel quadrature rule robust to misspecification, one should specify the design points so that their spacing is not too small.
Section 5 discusses a convergence rate for Bayesian quadrature under the misspecified setting, demonstrating the results of Sect. 4.2. Given design points \(X^n=\{X_1,\dots ,X_n\}\), Bayesian quadrature defines weights \(w_1,\ldots ,w_n\) as the minimizer of worst-case error (3), which can be obtained by solving a linear equation (see Sect. 2.4 for more detail). For points \(X^n=\{X_1,\dots ,X_n\}\) in \(\varOmega \), the fill distance \(h_{X^n,\varOmega }\) is defined by$$\begin{aligned} h_{X^n, \varOmega } := \sup _{x \in \varOmega } \min _{i=1,\dots ,n} \Vert x - X_i \Vert . \end{aligned}$$(8)Assume that there exists a constant \(c_q > 0\) independent of \(X^n\) such that$$\begin{aligned} h_{X^n,\varOmega } \le c_q q_{X^n}, \end{aligned}$$(9)and that \(h_{X^n,\varOmega } = O(n^{-1/d})\) as \(n \rightarrow \infty \). Then, Corollary 4 shows that with Bayesian quadrature weights based on the kernel \(k_r\) we have$$\begin{aligned} \left| P_n f - Pf \right| = O(n^{ - s/d }) \quad (n \rightarrow \infty ). \end{aligned}$$Note that the rate \(O(n^{ - s/d })\) matches the minimax optimal rate for deterministic quadrature rules in the Sobolev space of order s [40], which implies that Bayesian quadrature can be adaptive to the unknown smoothness s of the integrand f. The adaptivity means that it can achieve the rate \(O(n^{-s/d})\) without the knowledge of s; it only requires knowledge of an upper bound r of the true smoothness, \(s \le r\).
Section 3 establishes a rate of convergence for Bayesian quadrature in the well-specified case, which serves as a basis for the results in the misspecified case (Sect. 5). Corollary 1 asserts that if the design points satisfy \(h_{X^n, \varOmega } = O(n^{-1/d})\) as \(n \rightarrow \infty \), then$$\begin{aligned} e_n(P; {{\mathcal {H}}}_{k_r}(\varOmega )) = O(n^{-r/d}) \quad (n\rightarrow \infty ). \end{aligned}$$This rate \(O(n^{-r/d})\) is minimax optimal for deterministic quadrature rules in Sobolev spaces. To the best of our knowledge, this optimality of Bayesian quadrature has not been established before, while recently there has been extensive theoretical analysis on Bayesian quadrature [4, 8, 9, 44].
Preliminary results This paper expands on preliminary results reported in a conference paper by the authors [29]. Specifically, this paper is a complete version of the results presented in Section 5 of [29]. The current paper contains significant new contributions, mainly in the following points: (i) We establish the rate of convergence for Bayesian quadrature with deterministic design points and show that it can achieve minimax optimal rates in Sobolev spaces (Sect. 3); (ii) we apply our general convergence guarantees in misspecified settings to the specific case of Bayesian quadrature and reveal the conditions required for Bayesian quadrature to be robust to misspecification (Sect. 5); to make contribution (ii) possible, we derive finite sample bounds on the quadrature error in misspecified settings (Sect. 4). These results are not included in the conference paper.
We also mention that this paper does not contain the results presented in Section 4 of the conference paper [29], which deal with randomized design points. For randomized design points, theoretical analysis can be done based on an approximation theory developed in the statistical learning theory literature [12]. On the other hand, the analysis in the deterministic case makes use of the approximation theory developed by [37], which is based on Calderón’s decomposition formula in harmonic analysis [19]. This paper focuses on the deterministic case, and we will report a complete version of the randomized case in a forthcoming paper.
Related work The setting of this paper is complementary to that of [45], in which the integrand is smoother than assumed. That paper proposes to apply the control functional method of [46] to quasi-Monte Carlo integration, in order to make it adaptable to the (unknown) greater smoothness of the integrand.
Another related line of research is the proposal of quadrature rules that are adaptive to less smooth integrands [14, 15, 16, 20, 23]. For instance, [20] proposed a kernel-based quadrature rule on a finite-dimensional sphere. Their method is essentially a Bayesian quadrature using a specific kernel designed for spheres. They derive convergence rates for this method in both well-specified and misspecified settings and obtain results similar to ours. The current work differs from [20] in mainly two aspects: (i) Quadrature problems are considered in standard Euclidean spaces, as opposed to spheres; (ii) a generic framework is presented, as opposed to the analysis of a specific quadrature rule. See also a recent work by [62], in which Bayesian quadrature for vector-valued numerical integration is proposed and its adaptability to less smooth integrands is discussed.
Quasi-Monte Carlo rules based on a certain digit interlacing algorithm [14, 15, 16, 23] are also shown to be adaptive to the (unknown) lower smoothness of an integrand. These papers assume that an integrand is in an anisotropic function class in which every function possesses (square-integrable) partial mixed derivatives of order \(\alpha \in {\mathbb {N}}\) in each variable. Examples of such spaces include Korobov spaces, Walsh spaces and Sobolev spaces of dominating mixed smoothness (see, e.g., [17, 42]). In their notation, an integer d, which is a parameter called an interlacing factor, can be regarded as an assumed smoothness. Then, if an integrand belongs to an anisotropic function class with smoothness \(\alpha \in {\mathbb {N}}\) such that \(\alpha \le d\), a rate of the form \(O(n^{-\alpha + \varepsilon })\) (or \(O(n^{-\alpha - 1/2 + \varepsilon })\) in a randomized setting) is guaranteed for the quadrature error for arbitrary \(\varepsilon > 0\). The present work differs from these works in that (i) isotropic Sobolev spaces are discussed, where the order of differentiability is identical in all directions of variables, and that (ii) theoretical guarantees are provided for generic quadrature rules, as opposed to analysis of specific quadrature methods.
2 Preliminaries
2.1 Basic Definitions and Notation
We will use the following notation throughout the paper. The set of positive integers is denoted by \({\mathbb {N}}\), and \({\mathbb {N}}_0 := {\mathbb {N}}\cup \{ 0 \}\). For \(\alpha := (\alpha _1,\dots ,\alpha _d)^T \in {\mathbb {N}}_0^d\), we write \(|\alpha | := \sum _{i=1}^d \alpha _i\). The d-dimensional Euclidean space is denoted by \({\mathbb {R}}^d\), and the closed ball of radius \(R>0\) centered at \(z\in {\mathbb {R}}^d\) by B(z, R). For \(a \in {\mathbb {R}}\), \(\lfloor a \rfloor \) is the greatest integer that is less than a. For a set \(\varOmega \subset {\mathbb {R}}^d\), \(\mathrm{diam} (\varOmega ) := \sup _{x, y \in \varOmega } \Vert x - y\Vert \) is the diameter of \(\varOmega \).
For \(s \in {\mathbb {N}}\) and an open set \(\varOmega \) in \({\mathbb {R}}^d\), \(C^s(\varOmega )\) denotes the vector space of all functions on \(\varOmega \) that are continuously differentiable up to order s, and \(C_B^s(\varOmega ) \subset C^s(\varOmega )\) the Banach space of all functions whose partial derivatives up to order s are bounded and uniformly continuous. The norm of \(C_B^s(\varOmega )\) is given by \(\Vert f \Vert _{C_B^s(\varOmega )} := \sum _{\alpha \in {\mathbb {N}}_0^d: |\alpha | \le s} \sup _{x \in \varOmega } |\partial ^\alpha f (x)|\), where \(\partial ^\alpha \) is the partial derivative with multi-index \(\alpha \in {\mathbb {N}}_0^d\). The Banach space of continuous functions that vanish at infinity is denoted by \(C_0 := C_0({\mathbb {R}}^d)\) and is equipped with the sup norm. Let \(C_0^s := C_0^s({\mathbb {R}}^d) := C_0({\mathbb {R}}^d) \cap C_B^s({\mathbb {R}}^d)\) be a Banach space with the norm \(\Vert f \Vert _{C_0^s({\mathbb {R}}^d)} := \Vert f \Vert _{C_B^s({\mathbb {R}}^d)}\).
For a function f and a measure \(\mu \) on \({\mathbb {R}}^d\), the supports of f and \(\mu \) are denoted by \(\mathrm{supp}(f)\) and \(\mathrm{supp} (\mu )\), respectively. The restriction of f to a subset \(\varOmega \subset {\mathbb {R}}^d\) is denoted by \(f|_\varOmega \).
Let F and \(F^*\) be normed vector spaces with norms \(\Vert \cdot \Vert _F\) and \(\Vert \cdot \Vert _{F^*}\), respectively. Then, F and \(F^*\) are said to be normequivalent, if \(F= F^*\) as a set, and there exist constants \(C_1, C_2 > 0\) such that \(C_1 \Vert f \Vert _{F^*} \le \Vert f \Vert _{F} \le C_2 \Vert f \Vert _{F^*}\) for all \(f \in F\). For a Hilbert space \({{\mathcal {H}}}\) with inner product \(\langle \cdot ,\cdot \rangle _{{\mathcal {H}}}\), the norm of \(f\in {{\mathcal {H}}}\) is denoted by \(\Vert f\Vert _{{\mathcal {H}}}\).
2.2 Sobolev Spaces and Reproducing Kernel Hilbert Spaces
Here we briefly review key facts regarding Sobolev spaces necessary for stating and proving our contributions; for details, we refer to [1, 6, 59]. We first introduce reproducing kernel Hilbert spaces. For details, see, e.g., [58, Section 4] and [61, Section 10].
Let \(\varOmega \) be a set. A Hilbert space \({{\mathcal {H}}}\) of real-valued functions on \(\varOmega \) is a reproducing kernel Hilbert space (RKHS) if the evaluation functional \(f\mapsto f(x)\) is continuous for any \(x\in \varOmega \). Let \(\langle \cdot ,\cdot \rangle _{{\mathcal {H}}}\) be the inner product of \({{\mathcal {H}}}\). Then, for each \(x \in \varOmega \), there is a unique function \(k_x\in {{\mathcal {H}}}\) such that \(f(x)=\langle f,k_x\rangle _{{\mathcal {H}}}\) for all \(f \in {{\mathcal {H}}}\). The kernel defined by \(k(x,y):=k_x(y)\) is positive definite and called the reproducing kernel of \({{\mathcal {H}}}\). It is known (Moore–Aronszajn theorem [2]) that for every positive definite kernel \(k: \varOmega \times \varOmega \rightarrow {\mathbb {R}}\) there exists a unique RKHS \({{\mathcal {H}}}\) with k as its reproducing kernel. Therefore, the notation \({{\mathcal {H}}}_k\) is used for the RKHS associated with k.
In the following, we will introduce two definitions of Sobolev spaces, i.e., (10) and (11), as both will be used throughout our analysis.
2.3 Kernel-Based Quadrature Rules
We briefly review basic facts regarding kernel-based quadrature rules necessary to describe our results. For details, we refer to [9, 17].
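As a minimal numerical illustration (ours, not part of the original review), the worst-case error of a quadrature rule \(\{(w_i,X_i)\}_{i=1}^n\) in an RKHS \({{\mathcal {H}}}_k\) admits the standard closed form \(e_n(P;{{\mathcal {H}}}_k)^2 = \sum _{i,j} w_i w_j k(X_i,X_j) - 2\sum _i w_i \mu _P(X_i) + \iint k(x,x')\,\mathrm{d}P(x)\mathrm{d}P(x')\), where \(\mu _P(x) := \int k(x,x')\,\mathrm{d}P(x')\) is the kernel mean. The sketch below illustrates this identity; the specific kernel, design points, and the Monte Carlo approximation of \(\mu _P\) are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def worst_case_error(w, X, kernel, P_sample):
    """Worst-case quadrature error in the RKHS of `kernel`.

    The kernel mean mu_P and the double integral of k are approximated
    by averaging over a large sample `P_sample` drawn from P.
    """
    K_XX = kernel(X[:, None], X[None, :])          # Gram matrix on design points
    K_XP = kernel(X[:, None], P_sample[None, :])   # cross-kernel to the P-sample
    K_PP = kernel(P_sample[:, None], P_sample[None, :])
    e2 = w @ K_XX @ w - 2.0 * w @ K_XP.mean(axis=1) + K_PP.mean()
    return np.sqrt(max(e2, 0.0))

# Example: Matern-1/2 kernel (RKHS norm-equivalent to H^1), P = Uniform[0,1].
kernel = lambda x, y: np.exp(-np.abs(x - y))
rng = np.random.default_rng(0)
n = 10
X = np.linspace(0.0, 1.0, n)          # design points
w = np.full(n, 1.0 / n)               # equal weights, so the sum of |w_i| is 1
print(worst_case_error(w, X, kernel, rng.uniform(size=2000)))
```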
2.4 Examples of Kernel-Based Quadrature Rules
Bayesian quadrature This is a class of kernel-based quadrature rules that has been studied extensively in the literature on statistics and machine learning [4, 7, 8, 9, 13, 22, 25, 27, 35, 46, 48, 49, 51]. In Bayesian quadrature, design points \(X_1,\dots ,X_n\) may be obtained jointly in a deterministic manner [9, 13, 35, 48, 51], sequentially (adaptively) [8, 25, 27, 49] or randomly [4, 7, 9, 22, 46]. For instance, [9] proposed to generate design points randomly as a Markov chain Monte Carlo sample, or deterministically by a quasi-Monte Carlo rule, specifically as a higher-order digital net [15].
This way of constructing the estimate \(P_n f\) is called Bayesian quadrature, since \(P_n f\) can be seen as a posterior estimate in a certain Bayesian inference problem, with f generated as a sample path of a Gaussian process (see, e.g., [27] and [9]).
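As a concrete illustration of the linear equation involved: given a kernel k, design points, and the kernel mean \(z_i = \int k(X_i, x)\,\mathrm{d}P(x)\), the Bayesian quadrature weights are the solution of \(Kw = z\) with \(K_{ij} = k(X_i, X_j)\), since this linear system characterizes the minimizer of the worst-case error for fixed points. The sketch below assumes \(P\) is the uniform distribution on [0, 1] and uses the Matérn-1/2 kernel \(k(x,y) = \exp (-|x-y|)\), whose kernel mean has the closed form \(2 - e^{-x} - e^{x-1}\); a small jitter is added for numerical stability. These choices are ours, for illustration only.

```python
import numpy as np

def bq_weights(X, jitter=1e-10):
    """Bayesian quadrature weights for P = Uniform[0,1] and the
    Matern-1/2 kernel k(x, y) = exp(-|x - y|)."""
    K = np.exp(-np.abs(X[:, None] - X[None, :]))   # Gram matrix
    z = 2.0 - np.exp(-X) - np.exp(X - 1.0)         # kernel mean: int_0^1 k(x,t) dt
    return np.linalg.solve(K + jitter * np.eye(len(X)), z)

X = np.linspace(0.0, 1.0, 20)
w = bq_weights(X)
f = lambda x: np.sin(2 * np.pi * x) + x**2              # a test integrand
print("estimate:", w @ f(X), " true value:", 1.0 / 3.0) # int_0^1 f = 1/3
```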
Quasi-Monte Carlo Quasi-Monte Carlo (QMC) methods are equal-weight quadrature rules designed for the uniform distribution on a hypercube \([0,1]^d\) [17]. Modern QMC methods make use of RKHSs and the associated kernels to define and calculate the worst-case error in order to obtain good design points (e.g., [14, 18, 26, 54]). Therefore, such QMC methods are instances of kernel-based quadrature rules; see [42] and [17] for a review.
Kernel herding In the machine learning literature, an equal-weight quadrature rule called kernel herding [11] has been studied extensively [5, 27, 28, 32]. It is an algorithm that greedily searches for design points so as to minimize the worst-case error in an RKHS. In contrast to QMC methods, kernel herding may be used with an arbitrary distribution P on a generic measurable space, provided that the integral \(\int k(\cdot ,x)\mathrm{d}P(x)\) admits a closed-form solution for the reproducing kernel k. It has been shown that a fast rate \(O(n^{-1})\) is achievable for the worst-case error when the RKHS is finite-dimensional [11]. While empirical studies indicate that the fast rate would also hold in the case of an infinite-dimensional RKHS, its theoretical proof remains an open problem [5].
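A minimal sketch of the greedy step of kernel herding, under the simplifying assumptions that the search is restricted to a finite candidate grid and that the kernel mean \(\mu _P(x) = \int k(x,x')\mathrm{d}P(x')\) is available (here the closed form for the Matérn-1/2 kernel and \(P = \) Uniform[0, 1] is used). Each new point maximizes \(\mu _P(x) - \frac{1}{n+1}\sum _{i=1}^{n} k(x, X_i)\), a standard form of the herding update that greedily decreases the worst-case error with equal weights; the grid and kernel are illustrative assumptions.

```python
import numpy as np

mu = lambda x: 2.0 - np.exp(-x) - np.exp(x - 1.0)   # kernel mean of exp(-|x-y|), P = U[0,1]
k = lambda x, y: np.exp(-np.abs(x - y))

def kernel_herding(n, grid):
    """Greedily select n points from `grid` by kernel herding."""
    X = []
    for m in range(n):
        # herding objective evaluated at every candidate point
        score = mu(grid)
        if X:
            score = score - k(grid[:, None], np.array(X)[None, :]).sum(axis=1) / (m + 1)
        X.append(grid[np.argmax(score)])
    return np.array(X)

grid = np.linspace(0.0, 1.0, 1001)
X = kernel_herding(10, grid)
print(np.sort(X))   # points spread out over [0,1]; equal weights 1/n are used
```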
3 Convergence Rates of Bayesian Quadrature
This section discusses the convergence rates of Bayesian quadrature in well-specified settings. It is shown that Bayesian quadrature can achieve the minimax optimal rates for deterministic quadrature rules in Sobolev spaces. The result also serves as a preliminary to Sect. 5, where misspecified cases are considered.
Let \(\varOmega \) be an open set in \({\mathbb {R}}^d\) and \(X^n := \{ X_1,\dots , X_n\}\subset \varOmega \). The main notion to express the convergence rate is fill distance \(h_{X^n,\varOmega }\) (8), which plays a central role in the literature on scattered data approximation [61] and has been used in the theoretical analysis of Bayesian quadrature in [9, 44].
Definition 1
(Interior cone condition) A set \(\varOmega \subset {\mathbb {R}}^d\) is said to satisfy an interior cone condition if there exist an angle \(\theta \in (0,2\pi )\) and a radius \(R > 0\) such that every \(x \in \varOmega \) is associated with a unit vector \(\xi (x)\) so that the cone \(C(x, \xi (x), \theta , R)\) is contained in \(\varOmega \).
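Here, \(C(x, \xi (x), \theta , R)\) denotes the cone with apex x, axis direction \(\xi (x)\), opening angle \(\theta \) and radius R; in the standard notation of the scattered data approximation literature (see [61]), it is given by
$$\begin{aligned} C(x, \xi , \theta , R) := \left\{ x + \lambda y :\ y \in {\mathbb {R}}^d,\ \Vert y \Vert = 1,\ \langle y, \xi \rangle \ge \cos \theta ,\ \lambda \in [0, R] \right\} . \end{aligned}$$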
The interior cone condition requires that there is no ‘pinch point’ (i.e., a \(\prec \)-shaped region) on the boundary of \(\varOmega \); see also [44].
Next, the notions of special Lipschitz domain [57, p.181] and Lipschitz boundary are defined as follows (see [57, p.189]; [6, Definition 1.4.4]).
Definition 2
(Special Lipschitz domain) An open set \({\tilde{\varOmega }} \subset {\mathbb {R}}^d\) is called a special Lipschitz domain if, possibly after a rotation, there exists a function \(\varphi : {\mathbb {R}}^{d-1} \rightarrow {\mathbb {R}}\) such that the following two conditions hold:
 1.
\({\tilde{\varOmega }} = \{ (x,y) \in {\mathbb {R}}^d: y > \varphi (x) \}\);
 2.
\(\varphi \) is a Lipschitz function such that \(|\varphi (x) - \varphi (x')| \le M \Vert x - x' \Vert \) for all \(x,x' \in {\mathbb {R}}^{d-1}\), where \(M > 0\).
Definition 3
(Lipschitz boundary) Let \(\varOmega \subset {\mathbb {R}}^d\) be an open set and \(\partial \varOmega \) its boundary. Then \(\partial \varOmega \) is called a Lipschitz boundary if there exist constants \(\varepsilon > 0\), \(N \in {\mathbb {N}}\), \(M > 0\) and open sets \(U_1, U_2, \dots \subset {\mathbb {R}}^d\) such that the following conditions hold:
 1.
For any \(x \in \partial \varOmega \), there exists an index i such that \(B(x,\varepsilon ) \subset U_i\), where \(B(x,\varepsilon )\) is the ball centered at x with radius \(\varepsilon \);
 2.
\(U_{i_1} \cap \cdots \cap U_{i_{N+1}} = \emptyset \) for any distinct indices \(\{ i_1,\dots ,i_{N+1} \}\);
 3.
For each index i, there exists a special Lipschitz domain \(\varOmega _i \subset {\mathbb {R}}^d\) with Lipschitz bound b such that \(U_i \cap \varOmega = U_i \cap \varOmega _i\) and \(b\le M\).
Examples of a set \(\varOmega \) having a Lipschitz boundary include: (i) \(\varOmega \) is an open bounded set whose boundary \(\partial \varOmega \) is \(C^1\) embedded in \({\mathbb {R}}^d\); (ii) \(\varOmega \) is an open bounded convex set [57, p.189].
Proposition 1
Proof
Remark 1

Typically, the fill distance \(h_{X^n,\varOmega }\) decreases to 0 as the number n of design points increases. Therefore, the upper bound \(C h_{X^n, \varOmega }^r\) provides a faster rate of convergence for \(e_n(P; W_2^r(\varOmega ))\) for a larger value of the smoothness degree r.

The condition \(h_{X^n,\varOmega } \le h_0\) requires that the design points \(X^n = \{ X_1,\dots ,X_n \}\) must cover the set \(\varOmega \) to a certain extent in order for the error bound to hold. This requirement arises since we have used a result from the scattered data approximation literature [61, Corollary 11.33] to derive inequality (19) in our proof. In the literature, such a condition is necessary, and we refer the interested reader to Section 11 of [61] and references therein.

The constant \(h_0 > 0\) depends only on the constants \(\theta \) and R in the interior cone condition (Definition 1). The explicit form is \(h_0 := Q(\lfloor r \rfloor , \theta ) R\), where \(Q(\lfloor r \rfloor ,\theta ) := \frac{ \sin \theta \sin \psi }{8 \lfloor r \rfloor ^2 (1 + \sin \theta ) (1 + \sin \psi ) }\) with \(\psi := 2 \arcsin \frac{\sin \theta }{4(1+\sin \theta )}\) [61, p.199].
The following is an immediate corollary to Proposition 1.
Corollary 1
Remark 2
Result (21) implies that the same rate is attainable for the Sobolev space \(H^r(\varOmega )\) (instead of \({{\mathcal {H}}}_{k_r}(\varOmega )\)):$$\begin{aligned} e_n(P;H^r(\varOmega )) = O(n^{ -\alpha r}) \quad (n \rightarrow \infty ) \end{aligned}$$(22)with (the sequence of) the same weighted points \(\{ (w_i,X_i) \}_{i=1}^\infty \). This follows from the norm equivalence between \({{\mathcal {H}}}_{k_r}(\varOmega )\) and \(H^r(\varOmega )\).

If the fill distance satisfies \(h_{X^n,\varOmega } = O(n^{-1/d})\) as \(n \rightarrow \infty \), then \(e_n(P; H^r(\varOmega )) = O(n^{ -r/d})\). This rate is minimax optimal for the deterministic quadrature rules for the Sobolev space \(H^r(\varOmega )\) on a hypercube [40, Proposition 1 in Section 1.3.12]. Corollary 1 thus shows that Bayesian quadrature achieves the minimax optimal rate in this setting.

The decay rate for the fill distance \(h_{X^n,\varOmega } = O(n^{-1/d})\) holds when, for example, the design points \(X^n = \{ X_1,\dots ,X_n \}\) are equally spaced grid points in \(\varOmega \). Note that this rate cannot be improved: If the fill distance decreased at a rate faster than \(O(n^{-1/d})\), then \(e_n(P; H^r(\varOmega ))\) would decrease more quickly than the minimax optimal rate, which is a contradiction.
4 Main Results
This section presents the main results on misspecified settings. Two results based on different assumptions are discussed: one on the quadrature weights in Sect. 4.1 and the other on the design points in Sect. 4.2. The approximation theory for Sobolev spaces developed by [37] is employed in the results.
4.1 Convergence Rates Under an Assumption on Quadrature Weights
Theorem 1
Proof
Remark 3

The integrand f is assumed to satisfy \(f \in H^s(\varOmega ) \cap C_B^s(\varOmega ) \cap L_1(\varOmega )\), which is slightly stronger than just assuming \(f \in H^s(\varOmega )\).

In upper bound (23), the constant \(\sigma > 0\) controls the tradeoff between the two terms: \(c_2 (1+\sigma ^2)^{\frac{r-s}{2}} e_n(P;{{\mathcal {H}}}_{k_r}(\varOmega )) \Vert f \Vert _{H^s(\varOmega )}\) and \(c_1 \left( \sum _{i=1}^n |w_i| + 1 \right) \cdot \sigma ^{-s} \Vert f \Vert _{C_B^s(\varOmega )}\). In the proof, the integrand f is approximated by a band-limited function \(g_\sigma \in H^r(\varOmega )\), where \(\sigma \) is the highest frequency in the spectrum of \(g_\sigma \). Thus, the tradeoff in the upper bound corresponds to the tradeoff between the accuracy of the approximation of f by \(g_\sigma \) and the penalty incurred on the regularity of \(g_\sigma \).
The following result, which is a corollary of Theorem 1, provides a rate of convergence for the quadrature error in a misspecified setting. It is derived by assuming certain rates for the quantity \(\sum _{i=1}^n |w_i|\) and the worst-case error \(e_n(P;{{\mathcal {H}}}_{k_r})\).
Corollary 2
Proof
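In brief (a schematic sketch, not a replacement for the proof): writing \(\sum _{i=1}^n |w_i| = O(n^c)\) and \(e_n(P;{{\mathcal {H}}}_{k_r}(\varOmega )) = O(n^{-b})\), bound (23) behaves, up to constants and norms of f, like \(n^{-b}\sigma ^{r-s} + n^{c}\sigma ^{-s}\) for large \(\sigma \). Balancing the two terms by choosing \(\sigma \propto n^{(b+c)/r}\) gives
$$\begin{aligned} n^{-b} \sigma ^{r-s} \asymp n^{c} \sigma ^{-s} \asymp n^{-bs/r + c(r-s)/r}, \end{aligned}$$
which is the rate in (31).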
Remark 4

The exponent of the rate in (31) consists of two terms: \(-bs/r\) and \(c(r-s)/r\). The first term \(-bs/r\) corresponds to a rate degraded from the original \(O(n^{-b})\) by the factor \(s/r\) of the smoothness ratio, while the second term \(c(r-s)/r\) makes the rate slower. The effect of the second term increases as the constant c or the misspecification gap \(r-s\) becomes larger.

The obtained rate recovers \(O(n^{-b})\) for \(r=s\) (well-specified case) regardless of the value of c.

Consider the misspecified case \(r>s\). If \(c>0\), the term \(c(r-s)/r\) always makes the rate slower. It is thus better to have \(c=0\), as in this case we have the rate \(O(n^{-bs/r})\) in the misspecified setting. The weights with \(\max _{i=1,\dots ,n} |w_i| = O(n^{-1})\), such as equal weights \(w_i=1/n\), realize \(c=0\).

As mentioned earlier, the minimax optimal rate for the worst-case error in the Sobolev space \(H^r(\varOmega )\), with \(\varOmega \) being a cube in \({\mathbb {R}}^d\) and P being the Lebesgue measure on \(\varOmega \), is \(O(n^{-r/d})\) [40, Proposition 1 in Section 1.3.12]. If design points satisfy \(b = r/d\) and \(c = 0\) in this setting, Corollary 2 provides the rate \(O(n^{-s/d})\) for \(f \in H^s(\varOmega ) \cap C_B^s(\varOmega ) \cap L_1(\varOmega )\). This rate is the same as the minimax optimal rate for \(H^s(\varOmega )\) and hence implies some adaptivity to the order of differentiability.
The assumption \(\sum _{i=1}^n |w_i| = O(n^c)\) can also be interpreted from a probabilistic viewpoint. Assume that the observations involve noise, \(Y_i := f(X_i) + \varepsilon _i\ (i=1,\dots ,n)\), where \(\varepsilon _i\) is independent noise with \({{\mathbb {E}}}[ \varepsilon _i^2] = \sigma _\mathrm{noise}^2\) (\(\sigma _{\mathrm{noise}} > 0\) is a constant) for \(i=1,\dots ,n\), and that the \(Y_i\) are used for numerical integration. The expected squared error is decomposed as$$\begin{aligned} {{\mathbb {E}}}_{\varepsilon _1,\dots ,\varepsilon _n} \left[ \left( \sum _{i=1}^n w_i Y_i - Pf \right) ^2 \right]= & {} {{\mathbb {E}}}_{\varepsilon _1,\dots ,\varepsilon _n} \left[ \left( P_n f - Pf + \sum _{i=1}^n w_i \varepsilon _i \right) ^2 \right] \nonumber \\= & {} \left| P_n f - Pf \right| ^2 + \sigma _{\mathrm{noise}}^2 \sum _{i=1}^n w_i^2. \end{aligned}$$In the last expression, the first term \(\left| P_n f - Pf \right| ^2\) is the squared error in the noiseless case, and the second term \(\sigma _{\mathrm{noise}}^2 \sum _{i=1}^n w_i^2 \) is the error due to noise. Since \(\sum _{i=1}^n w_i^2 \le ( \sum _{i=1}^n |w_i| )^2 = O(n^{2c})\), the error in the second term may become larger as c increases. Hence, quadrature weights having smaller c are preferable in terms of robustness to the existence of noise; this in turn makes the quadrature rule more robust to the misspecification of the degree of smoothness.
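The decomposition above can be checked numerically; the following self-contained sketch (with an arbitrary choice of weights, points and integrand, none of which come from the paper) compares a Monte Carlo estimate of the expected squared error with the sum of the squared noiseless error and \(\sigma _{\mathrm{noise}}^2 \sum _i w_i^2\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma_noise = 20, 0.1
X = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / n)                      # equal weights (c = 0)
f = lambda x: np.sin(2 * np.pi * x) + x**2
Pf = 1.0 / 3.0                               # exact integral of f over [0,1]

noiseless_sq_err = (w @ f(X) - Pf) ** 2
# Monte Carlo estimate of the expected squared error under noisy observations
trials = rng.normal(0.0, sigma_noise, size=(100000, n))
mc = np.mean((((f(X) + trials) @ w) - Pf) ** 2)
print("Monte Carlo:   ", mc)
print("decomposition: ", noiseless_sq_err + sigma_noise**2 * np.sum(w**2))
```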
Theorem 1 and Corollary 2 require a control on the absolute sum of the quadrature weights \(\sum _{i=1}^n |w_i|\). This is possible with, for instance, equal-weight quadrature rules that seek good design points. However, the control of \(\sum _{i=1}^n |w_i|\) could be difficult for quadrature rules that obtain the weights by optimization based on prefixed design points. This includes the case of Bayesian quadrature, which optimizes the weights without any constraint. To deal with such methods, in the next section we will develop theoretical guarantees that do not rely on an assumption on the quadrature weights, but on a certain assumption on the design points.
4.2 Convergence Rates Under an Assumption on Design Points
This subsection provides convergence guarantees in a misspecified setting under an assumption on the design points. The assumption is described in terms of the separation radius (6), which is (half of) the minimum distance between distinct design points. The separation radius of points \(X^n := \{ X_1,\dots , X_n \} \subset {\mathbb {R}}^d\) is denoted by \(q_{X^n}\). Note that if \(X^n\subset \varOmega \) for some \(\varOmega \), then the separation radius lower-bounds the fill distance, i.e., \(q_{X^n} \le h_{X^n,\varOmega }\).
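For concreteness, the two geometric quantities can be computed as follows; this is a plain sketch in which the fill distance is approximated by maximizing over a fine grid covering the illustrative domain \(\varOmega = [0,1]\) (for a general \(\varOmega \), a covering grid of the domain would be used).

```python
import numpy as np

def separation_radius(X):
    """q_X = (1/2) min_{i != j} ||X_i - X_j||, cf. (6)."""
    D = np.abs(X[:, None] - X[None, :])
    np.fill_diagonal(D, np.inf)
    return 0.5 * D.min()

def fill_distance(X, omega_grid):
    """h_{X,Omega} = sup_x min_i ||x - X_i||, cf. (8), approximated on a grid."""
    return np.abs(omega_grid[:, None] - X[None, :]).min(axis=1).max()

X = np.linspace(0.0, 1.0, 50)
grid = np.linspace(0.0, 1.0, 10001)
print(separation_radius(X), fill_distance(X, grid))
# For equally spaced points, the two quantities are of the same order,
# i.e., the points are quasi-uniform (see Sect. 5).
```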
Henceforth, we will consider a bounded domain \(\varOmega \), and without loss of generality, we assume that it satisfies \(\mathrm{diam}(\varOmega ) \le 1\).
Theorem 2
Proof
Remark 5

From \(q_{X^n} \le h_{X^n,\varOmega }\), the separation radius \(q_{X^n}\) typically converges to zero as \(n\rightarrow \infty \). For the upper bound in (32), the factor \(q_{X^n}^{-(r-s)}\) in the first term diverges to infinity as \(n\rightarrow \infty \), while the second term goes to zero. Thus, \(q_{X^n}\) should decay to zero at an appropriate speed, depending on the rate of \(e_n(P;{{\mathcal {H}}}_{k_r}(\varOmega ))\), in order to make the quadrature error small in the misspecified setting.

Note that as the gap between r and s becomes larger, the effect of the separation radius becomes more serious; this follows from the expression \(q_{X^n}^{-(r-s)}\).
Based on Theorem 2, we establish below a rate of convergence in a misspecified setting by assuming a certain rate of decay for the separation radius as the number of design points increases.
Corollary 3
Proof
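In brief (a schematic sketch): as Remark 5 indicates, bound (32) behaves, up to constants and norms of f, like \(q_{X^n}^{-(r-s)} e_n(P;{{\mathcal {H}}}_{k_r}(\varOmega )) + q_{X^n}^{s}\). Substituting \(q_{X^n} = \varTheta (n^{-a})\) and \(e_n(P;{{\mathcal {H}}}_{k_r}(\varOmega )) = O(n^{-b})\) gives
$$\begin{aligned} q_{X^n}^{-(r-s)}\, e_n(P;{{\mathcal {H}}}_{k_r}(\varOmega )) = O(n^{-(b - a(r-s))}), \qquad q_{X^n}^{s} = \varTheta (n^{-as}), \end{aligned}$$
so the slower term determines the overall rate \(O(n^{-\min (b - a(r-s),\ as)})\) in (7); the two exponents coincide when \(a = b/r\), yielding the best possible rate \(O(n^{-bs/r})\).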
5 Bayesian Quadrature in Misspecified Settings
To demonstrate the results of Sect. 4, a rate of convergence for Bayesian quadrature in misspecified settings is derived. To this end, an upper bound on the integration error of Bayesian quadrature is first provided, when the smoothness of an integrand is overestimated. It is obtained by combining Theorem 2 in Sect. 4 and Proposition 1 in Sect. 3.
Theorem 3
Proof
Remark 7
Condition (44) implies that$$\begin{aligned} c' h_{X^n,\varOmega }^{1/\delta } \le q_{X^n} \le h_{X^n,\varOmega }, \end{aligned}$$(49)where \(c' := c_q^{-1/\delta }\) is independent of \(X^n\). This condition is stronger for a larger value of \(\delta \), requiring that distinct design points should not be very close to each other. Note that the lower bound \(1-s/r<\delta \) is necessary for the upper bound of error (45) to have a positive exponent, while the upper bound \(\delta \le 1\) follows from \(q_{X^n} \le h_{X^n,\varOmega }\), which holds by definition. The constraint \(1-s/r<\delta \) and (49) thus imply that a stronger condition is required for \(X^n\) as the degree of misspecification becomes more serious (i.e., as the ratio \(s/r\) becomes smaller).
If condition (44) is satisfied for \(\delta = 1\), then the design points \(X^n\) are called quasi-uniform [53, Section 7.3]. In this case, the bound in (45) is$$\begin{aligned} | P_n f - Pf | \le C \max \left( \Vert f \Vert _{C_B^s(\varOmega )}, \Vert f \Vert _{H^s(\varOmega )} \right) h_{X^n,\varOmega }^s. \end{aligned}$$(50)This is the same order of approximation as that of Proposition 1 when \(r = s\). Proposition 1 provides an error bound for Bayesian quadrature in a well-specified case, where one knows the degree of smoothness s of the integrand. Therefore, (50) suggests that if the design points are quasi-uniform, then Bayesian quadrature can be adaptive to the (unknown) degree of smoothness s of the integrand f, even in a situation where one only knows its upper bound \(r \ge s\).
We obtain the following as a corollary of Theorem 3. The proof is obvious and omitted.
Corollary 4
Remark 8

The rate \(O(n^{-s/d})\) in (52) matches the minimax optimal rate of deterministic quadrature rules for the worst-case error in the Sobolev space \(H^s(\varOmega )\), with \(\varOmega \) being a cube [40, Proposition 1 in Section 1.3.12]. Therefore, it is shown that the optimal rate may be achieved by Bayesian quadrature, even in the misspecified setting (under the slightly stronger assumption that \(f \in H^s(\varOmega ) \cap C_B^s(\varOmega )\)). In other words, Bayesian quadrature may achieve the optimal rate adaptively, without knowing the degree s of smoothness of a test function: One just needs to know an upper bound \(r \ge s\).

The main assumptions required for the optimal rate (52) are that (i) \(h_{X^n,\varOmega } = O(n^{-1/d})\) and that (ii) \(h_{X^n,\varOmega } \le c_q q_{X^n}^\delta \) for \(\delta = 1\). Recall that (i) is the same assumption that is required for the optimal rate \(O(n^{-r/d})\) in the well-specified setting \(f \in H^r(\varOmega )\) (Corollary 1). On the other hand, (ii) is the one required for the finite sample bound in Theorem 3. Both of these assumptions are satisfied, for instance, if \(X_1,\dots ,X_n\) are grid points in \(\varOmega \).
6 Simulation Experiments
We conducted simulation experiments to empirically assess the obtained theoretical results. MATLAB code for reproducing the results is available at https://github.com/motonobuk/kernelquadrature. We focus on Bayesian quadrature in these experiments.
6.1 Problem Setting

Uniform \(X^n = \{X_1,\dots ,X_n\}\) are equally spaced grid points in [0, 1] with \(X_1 = 0\) and \(X_n = 1\), that is, \(X_i = (i-1) / (n-1)\) for \(i = 1,\dots ,n\).

Nonuniform \(X^n = \{X_1,\dots ,X_n \}\) are non-equally spaced points in [0, 1] such that \(X_i = (i-1)/(n-1)\) if i is odd, and \(X_i = X_{i-1} + (n-1)^{-2}\) if i is even; see the code sketch after this list.
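The two configurations can be generated as follows (a direct transcription of the displayed formulas; the function name is ours):

```python
import numpy as np

def design_points(n, uniform=True):
    """Design points of Sect. 6.1 on [0, 1] (n >= 2)."""
    X = np.arange(n) / (n - 1.0)                 # X_i = (i-1)/(n-1) for i = 1,...,n
    if not uniform:
        m = len(X[1::2])                         # even i (1-based) = odd 0-based index
        X[1::2] = X[0::2][:m] + (n - 1.0) ** -2  # X_i = X_{i-1} + (n-1)^{-2}
    return X

print(design_points(8, uniform=True))
print(design_points(8, uniform=False))
```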
Evaluation measure For each pair of \(r\ (= 1,2,3,4)\) and \(s\ (= 1,2,3,4)\), we first computed quadrature weights \(w_1,\dots ,w_n\) by minimizing the worst-case error in \(H^r([0,1])\) and then evaluated the quadrature rule \((w_i,X_i)_{i=1}^n\) by computing the worst-case error in \(H^s([0,1])\), that is, \(\sup _{\Vert f \Vert _{H^s([0,1])} \le 1} |P_n f - Pf|\). More concretely, we computed the weights \(w_1,\dots ,w_n\) by formula (17) for Bayesian quadrature using the kernel \(k_r\) and then evaluated worst-case error (12) by computing the square root of (16) using the kernel \(k_s\). In this way, one can evaluate the performance of kernel quadrature under various settings. For instance, the case \(s < r\) is a situation where the true smoothness s is smaller than the assumed one r, the misspecified setting we have dealt with in this paper.
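A hedged sketch of this evaluation pipeline: since kernels (17) and (16) are not reproduced in this section, the sketch substitutes Matérn kernels with smoothness \(\nu = r - 1/2\) (whose RKHSs are norm-equivalent to \(H^r([0,1])\), cf. Assumption 1) and approximates the kernel means for \(P = \) Uniform[0, 1] numerically on a fine grid. It illustrates the procedure, not the paper's exact implementation.

```python
import numpy as np
from math import factorial, sqrt

def matern(r, D):
    """Matern kernel with nu = r - 1/2, evaluated on a matrix of distances D."""
    p = r - 1
    a = sqrt(2 * r - 1) * D
    poly = sum(factorial(p + i) / (factorial(i) * factorial(p - i)) * (2 * a) ** (p - i)
               for i in range(p + 1))
    return np.exp(-a) * factorial(p) / factorial(2 * p) * poly

def worst_case_error(w, X, s, grid):
    """Worst-case error of (w, X) over the unit ball of the Matern RKHS of order s."""
    K = matern(s, np.abs(X[:, None] - X[None, :]))
    z = matern(s, np.abs(X[:, None] - grid[None, :])).mean(axis=1)  # kernel mean on grid
    c = matern(s, np.abs(grid[:, None] - grid[None, :])).mean()     # double integral
    return np.sqrt(max(w @ K @ w - 2 * w @ z + c, 0.0))

n, r, s = 32, 3, 1                       # assumed smoothness r, "true" smoothness s
X = np.linspace(0.0, 1.0, n)             # the Uniform design points
grid = np.linspace(0.0, 1.0, 2001)
K_r = matern(r, np.abs(X[:, None] - X[None, :]))
z_r = matern(r, np.abs(X[:, None] - grid[None, :])).mean(axis=1)
w = np.linalg.solve(K_r + 1e-10 * np.eye(n), z_r)   # Bayesian quadrature weights
print("worst-case error in H^s:", worst_case_error(w, X, s, grid))
```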
6.2 Results
The simulation results are shown in Fig. 1 (Uniform design points) and Fig. 2 (Nonuniform design points). In the figures, we also report the exponents in the empirical rates of the fill distance \(h_{X^n,\varOmega }\), the separation radius \(q_{X^n}\), and the absolute sum of weights \(\sum _{i=1}^n |w_i|\) at the top of each subfigure; see the captions of Figs. 1 and 2 for details. Based on these, we can draw the following observations.
Optimal rates in the well-specified case In both Figs. 1 and 2, the black solid lines are the worst-case errors in the well-specified case \(s = r\). The empirical convergence rates of these worst-case errors are very close to the optimal rates derived in Sect. 3 (see Corollary 1 and its remarks), confirming the theoretical results. Proposition 1 and Corollary 1 also show that the worst-case error in the well-specified case is determined by the fill distance and is independent of the separation radius. The simulation results are consistent with this, since for both Figs. 1 and 2 the fill distance decays essentially at the rate \(O(n^{-1})\), while the separation radius decays more quickly in Fig. 2 than in Fig. 1.
Adaptability to greater smoothness While the case \(s > r\) is not covered by our theoretical analysis, Figs. 1 and 2 show some adaptation to the greater smoothness. This phenomenon is also observed by Bach [4, Section 5], who showed (for quadrature weights obtained with regularized matrix inversion) that if \(2r \ge s > r\), then the optimal rate is still attainable in an adaptive way. Bach [4, Section 6] verified this finding in experiments with quadrature weights without regularization. In our experiments, this phenomenon is observed for all cases of \(2 r \ge s > r\) except for the case \(r = 2\) and \(s = 4\) in both Figs. 1 and 2. Note, however, that in [4], design points are assumed to be randomly generated from a specific proposal distribution, so the results there are not directly applicable to deterministic quadrature rules.
The effect of the separation radius In Fig. 1, the rate for \(s = 1\), that is, \(O(n^{-1.052})\), remains essentially the same for different values of \(r = 1,2,3,4\). This rate is essentially the optimal rate for \(s = 1\), thus showing the adaptability of Bayesian quadrature to the unknown lesser smoothness (for \(r = 2, 3, 4\)). On the other hand, in Fig. 2 on nonuniform design points, the rate for \(s = 1\) becomes slower as r increases. That is, the rates are \(O(n^{-1.035})\) for \(r = 1\) (the well-specified case), \(O(n^{-0.945})\) for \(r = 2\), \(O(n^{-0.919})\) for \(r = 3\) and \(O(n^{-0.748})\) for \(r = 4\). This phenomenon may be attributed to the fact that the separation radius of the design points for Fig. 2 decays faster than that for Fig. 1. Corollary 4 shows that the rates in the misspecified case \(s < r\) become slower as the separation radius decays more quickly and/or as the gap \(r-s\) (i.e., the degree of misspecification) increases, and this is consistent with the simulation results.
The effect of the weights While the sum of absolute weights \(\sum _{i=1}^n |w_i|\) remains constant in Fig. 1, this quantity increases in Fig. 2. In the notation of Corollary 2, \(\sum _{i=1}^n |w_i| = O(n^c)\) with \(c = 0\) for Fig. 1, while \(c \approx 0.5\) for Fig. 2 with \(r = 2, 3, 4\). Therefore, the observation given in the preceding paragraph is also consistent with Corollary 2, since it states that a larger c makes the rates slower in the misspecified case. Note that the separation radius and the quantity \(\sum _{i=1}^n |w_i|\) are intimately related in the case of Bayesian quadrature, since the weights are computed from the inverse of the kernel matrix as in (17) and are thus affected by the smallest eigenvalue of the kernel matrix, while this smallest eigenvalue strongly depends on the separation radius and the smoothness of the kernel; see, e.g., [52], [61, Section 12] and references therein.
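This connection can be observed in a small sketch (again with the Matérn-3/2 kernel as an illustrative stand-in for \(k_r\), not the paper's setup): shrinking the separation radius by pairing up points drives the smallest eigenvalue of the kernel matrix toward zero, which tends to inflate \(\sum _i |w_i|\) for the Bayesian quadrature weights.

```python
import numpy as np

k = lambda x, y: (1 + np.sqrt(3) * np.abs(x - y)) * np.exp(-np.sqrt(3) * np.abs(x - y))

n = 20
grid = np.linspace(0.0, 1.0, 2001)
for delta in [5e-2, 1e-3, 1e-5]:          # distance between paired points
    base = np.linspace(0.0, 1.0, n // 2, endpoint=False) + 0.025
    X = np.sort(np.concatenate([base, base + delta]))   # separation radius delta/2
    K = k(X[:, None], X[None, :])
    z = k(X[:, None], grid[None, :]).mean(axis=1)        # approximate kernel mean
    w = np.linalg.solve(K + 1e-12 * np.eye(n), z)        # Bayesian quadrature weights
    print(f"min eig: {np.linalg.eigvalsh(K).min():.3e}   sum|w|: {np.abs(w).sum():.3f}")
```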
7 Discussion
In this paper, we have discussed the convergence properties of kernel quadrature rules with deterministic design points in misspecified settings. In particular, we have focused on settings where the weighted points of a quadrature rule are constructed based on a misspecified assumption on the degree of smoothness, that is, the situation where the integrand is less smooth than assumed.
We have revealed conditions on quadrature rules under which adaptation to an unknown lesser degree of smoothness occurs. In particular, we have shown that a kernel quadrature rule is adaptive if the sum of absolute weights remains constant, or if the spacing between design points is not too small (as measured by the separation radius). Moreover, by focusing on Bayesian quadrature as a working example, we have shown that it can achieve minimax optimal rates for the unknown degree of smoothness if the design points are quasi-uniform. We expect that this result provides a practical guide for developing kernel quadrature rules that are robust to the misspecification of the degree of smoothness; such robustness is important in modern applications of quadrature methods, such as numerical integration in sophisticated Bayesian models, since they typically involve complicated or black box integrands, and thus misspecification is likely to happen.
There are several important topics to be investigated as part of future work.
Other RKHSs This paper has dealt with Sobolev spaces as RKHSs of kernel quadrature. However, there are many other important RKHSs of interest where similar investigation can be carried out. For instance, Gaussian RKHSs (i.e., the RKHSs of Gaussian kernels) have been widely used in the literature on Bayesian quadrature. Such an RKHS consists of functions with infinite degree of smoothness. This makes theoretical analysis challenging: Our analysis relies on the approximation theory developed by Narcowich and Ward [37], which only applies to the standard Sobolev spaces. Similarly, the theory of [37] is also not applicable to Sobolev spaces with dominating mixed smoothness, which have been popular in the QMC literature. In order to analyze quadrature rules in these RKHSs, we therefore need to extend the approximation theory of [37] to such spaces. Overall, this is an important but challenging theoretical problem. (We also mention that relevant results are available in followup papers [38, 39]. While these results do not directly provide the desired generalizations due to the same reasons mentioned above, these could still be potentially useful for our purpose.)
Sequential (adaptive) quadrature Another important direction is the analysis of kernel quadrature rules that sequentially select design points. Such methods are also called adaptive, since the selection of the next point \(X_{n+1}\) depends on the function values \(f(X_1), \dots , f(X_n)\) at the already selected points \(X_1,\dots ,X_n\). Note that the adaptivity here is different from that of the current paper, where we used the term in the context of adaptability of quadrature to an unknown degree of smoothness. For instance, the WSABI algorithm by [25] is an example of adaptive Bayesian quadrature that is considered state of the art for the application of Bayesian model evidence calculation. Adaptive methods are known to be able to outperform nonadaptive methods in the following case: the hypothesis space is imbalanced or nonconvex (see, e.g., Section 1 of [41]). In the worst-case error, the hypothesis space is the unit ball in the RKHS \({{\mathcal {H}}}\), which is balanced and convex, and so adaptation does not help. In fact, it is known that the optimal rate can be achieved without adaptation. However, if the hypothesis space is imbalanced (i.e., f being in the hypothesis space does not imply that \(-f\) is in the hypothesis space), then adaptive methods may perform better. For instance, the WSABI algorithm focuses on nonnegative integrands, which means that the hypothesis space is imbalanced, and thus adaptive selection helps. Our analysis in this paper has focused on the worst-case error defined by the unit ball in an RKHS, which is balanced and convex. A future direction is thus to consider the setting of imbalanced or nonconvex hypothesis spaces, such as the one consisting of nonnegative functions, which will enable us to analyze the convergence behavior of sequential or adaptive Bayesian quadrature in misspecified settings.
Random design points We have focused on deterministic quadrature rules in this paper. In the literature, however, the use of random design points has also been popular. For instance, the design points of Bayesian quadrature may be generated i.i.d. from a certain proposal distribution or as an MCMC sequence. Likewise, QMC methods usually apply randomization to deterministic design points. Our forthcoming paper will deal with such situations and provide more general results than the current paper.
Acknowledgements
The open access funding is provided by the Max Planck Society. We would like to express our gratitude to the editor and anonymous referees for their constructive feedback, which greatly improved the paper. Most of this work was done while MK was working at the Institute of Statistical Mathematics, Tokyo.
References
1. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces, 2nd edn. Academic Press, New York (2003)
2. Aronszajn, N.: Theory of reproducing kernels. Transactions of the American Mathematical Society 68(3), 337–404 (1950)
3. Avron, H., Sindhwani, V., Yang, J., Mahoney, M.W.: Quasi-Monte Carlo feature maps for shift-invariant kernels. Journal of Machine Learning Research 17(120), 1–38 (2016)
4. Bach, F.: On the equivalence between kernel quadrature rules and random feature expansions. Journal of Machine Learning Research 18(19), 1–38 (2017)
5. Bach, F., Lacoste-Julien, S., Obozinski, G.: On the equivalence between herding and conditional gradient algorithms. In: J. Langford, J. Pineau (eds.) Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pp. 1359–1366. Omnipress (2012)
6. Brenner, S.C., Scott, L.R.: The Mathematical Theory of Finite Element Methods, 3rd edn. Springer (2008)
7. Briol, F.-X., Oates, C.J., Cockayne, J., Chen, W.Y., Girolami, M.: On the sampling problem for kernel quadrature. In: D. Precup, Y.W. Teh (eds.) Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 70, pp. 586–595. PMLR (2017)
8. Briol, F.-X., Oates, C.J., Girolami, M., Osborne, M.A.: Frank-Wolfe Bayesian quadrature: Probabilistic integration with theoretical guarantees. In: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama, R. Garnett (eds.) Advances in Neural Information Processing Systems 28, pp. 1162–1170. Curran Associates, Inc. (2015)
9. Briol, F.-X., Oates, C.J., Girolami, M., Osborne, M.A., Sejdinovic, D.: Probabilistic integration: A role in statistical computation? Statistical Science (2018). To appear
10. Chen, W.Y., Mackey, L., Gorham, J., Briol, F.-X., Oates, C.: Stein points. In: J. Dy, A. Krause (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 844–853. PMLR (2018)
11. Chen, Y., Welling, M., Smola, A.: Super-samples from kernel herding. In: P. Grünwald, P. Spirtes (eds.) Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI 2010), pp. 109–116. AUAI Press (2010)
12. Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press (2007)
13. Diaconis, P.: Bayesian numerical analysis. Statistical Decision Theory and Related Topics IV 1, 163–175 (1988)
14. Dick, J.: Explicit constructions of quasi-Monte Carlo rules for the numerical integration of high-dimensional periodic functions. SIAM Journal on Numerical Analysis 45, 2141–2176 (2007)
15. Dick, J.: Walsh spaces containing smooth functions and quasi-Monte Carlo rules of arbitrary high order. SIAM Journal on Numerical Analysis 46(3), 1519–1553 (2008)
16. Dick, J.: Higher order scrambled digital nets achieve the optimal rate of the root mean square error for smooth integrands. The Annals of Statistics 39(3), 1372–1398 (2011)
17. Dick, J., Kuo, F.Y., Sloan, I.H.: High-dimensional integration: the quasi-Monte Carlo way. Acta Numerica 22, 133–288 (2013)
18. Dick, J., Nuyens, D., Pillichshammer, F.: Lattice rules for nonperiodic smooth integrands. Numerische Mathematik 126(2), 259–291 (2014)
19. Frazier, M., Jawerth, B., Weiss, G.L.: Littlewood-Paley Theory and the Study of Function Spaces. American Mathematical Society (1991)
20. Fuselier, E., Hangelbroek, T., Narcowich, F.J., Ward, J.D., Wright, G.B.: Kernel based quadrature on spheres and other homogeneous spaces. Numerische Mathematik 127(1), 57–92 (2014)
21. Gerber, M., Chopin, N.: Sequential quasi-Monte Carlo. Journal of the Royal Statistical Society, Series B 77(3), 509–579 (2015)
22. Ghahramani, Z., Rasmussen, C.E.: Bayesian Monte Carlo. In: S. Becker, S. Thrun, K. Obermayer (eds.) Advances in Neural Information Processing Systems 15, pp. 505–512. MIT Press (2003)
23. Goda, T., Dick, J.: Construction of interlaced scrambled polynomial lattice rules of arbitrary high order. Foundations of Computational Mathematics 15(5), 1245–1278 (2015)
24. Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.: A kernel two-sample test. Journal of Machine Learning Research 13, 723–773 (2012)
25. Gunter, T., Osborne, M.A., Garnett, R., Hennig, P., Roberts, S.J.: Sampling for inference in probabilistic models with fast Bayesian quadrature. In: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, K.Q. Weinberger (eds.) Advances in Neural Information Processing Systems 27, pp. 2789–2797. Curran Associates, Inc. (2014)
26. Hickernell, F.J.: A generalized discrepancy and quadrature error bound. Mathematics of Computation 67(221), 299–322 (1998)
27. Huszár, F., Duvenaud, D.: Optimally-weighted herding is Bayesian quadrature. In: N. de Freitas, K. Murphy (eds.) Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI 2012), pp. 377–385. AUAI Press (2012)
28. Kanagawa, M., Nishiyama, Y., Gretton, A., Fukumizu, K.: Filtering with state-observation examples via kernel Monte Carlo filter. Neural Computation 28(2), 382–444 (2016)
29. Kanagawa, M., Sriperumbudur, B.K., Fukumizu, K.: Convergence guarantees for kernel-based quadrature rules in misspecified settings. In: D.D. Lee, M. Sugiyama, U.V. Luxburg, I. Guyon, R. Garnett (eds.) Advances in Neural Information Processing Systems 29, pp. 3288–3296. Curran Associates, Inc. (2016)
30. Karvonen, T., Oates, C.J., Särkkä, S.: A Bayes-Sard cubature method. In: Advances in Neural Information Processing Systems 31. Curran Associates, Inc. (2018). To appear
31. Kersting, H., Hennig, P.: Active uncertainty calibration in Bayesian ODE solvers. In: Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016), pp. 309–318. AUAI Press (2016)
32. Lacoste-Julien, S., Lindsten, F., Bach, F.: Sequential kernel herding: Frank-Wolfe optimization for particle filtering. In: G. Lebanon, S.V.N. Vishwanathan (eds.) Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 38, pp. 544–552. PMLR (2015)
33. Matérn, B.: Spatial variation. Meddelanden från Statens Skogsforskningsinstitut 49(5) (1960)
34. Matérn, B.: Spatial Variation, 2nd edn. Springer-Verlag (1986)
35. Minka, T.: Deriving quadrature rules from Gaussian processes. Tech. rep., Statistics Department, Carnegie Mellon University (2000)
36. Muandet, K., Fukumizu, K., Sriperumbudur, B.K., Schölkopf, B.: Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning 10(1–2), 1–141 (2017)
37. Narcowich, F.J., Ward, J.D.: Scattered-data interpolation on \(\mathbb{R}^n\): Error estimates for radial basis and band-limited functions. SIAM Journal on Mathematical Analysis 36, 284–300 (2004)
38. Narcowich, F.J., Ward, J.D., Wendland, H.: Sobolev bounds on functions with scattered zeros, with applications to radial basis function surface fitting. Mathematics of Computation 74(250), 743–763 (2005)
39. Narcowich, F.J., Ward, J.D., Wendland, H.: Sobolev error estimates and a Bernstein inequality for scattered data interpolation via radial basis functions. Constructive Approximation 24(2), 175–186 (2006)
40. Novak, E.: Deterministic and Stochastic Error Bounds in Numerical Analysis. Springer-Verlag (1988)
41. Novak, E.: Some results on the complexity of numerical integration. In: R. Cools, D. Nuyens (eds.) Monte Carlo and Quasi-Monte Carlo Methods. Springer Proceedings in Mathematics & Statistics, vol. 163, pp. 161–183. Springer, Cham (2016)
42. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Vol. II: Standard Information for Functionals. EMS (2010)
43. Oates, C., Niederer, S., Lee, A., Briol, F.-X., Girolami, M.: Probabilistic models for integration error in the assessment of functional cardiac models. In: I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (eds.) Advances in Neural Information Processing Systems 30, pp. 110–118. Curran Associates, Inc. (2017)
44. Oates, C.J., Cockayne, J., Briol, F.-X., Girolami, M.: Convergence rates for a class of estimators based on Stein’s method. Bernoulli (2018). To appear
45. Oates, C.J., Girolami, M.: Control functionals for quasi-Monte Carlo integration. In: A. Gretton, C.C. Robert (eds.) Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 51, pp. 56–65. PMLR (2016)
46. Oates, C.J., Girolami, M., Chopin, N.: Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society, Series B 79(2), 323–380 (2017)
47. Oates, C.J., Papamarkou, T., Girolami, M.: The controlled thermodynamic integral for Bayesian model evidence evaluation. Journal of the American Statistical Association 111(514), 634–645 (2016)
48. O’Hagan, A.: Bayes–Hermite quadrature. Journal of Statistical Planning and Inference 29, 245–260 (1991)
49. Osborne, M.A., Duvenaud, D.K., Garnett, R., Rasmussen, C.E., Roberts, S.J., Ghahramani, Z.: Active learning of model evidence using Bayesian quadrature. In: F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (eds.) Advances in Neural Information Processing Systems 25, pp. 46–54. Curran Associates, Inc. (2012)
50. Paul, S., Chatzilygeroudis, K., Ciosek, K., Mouret, J.-B., Osborne, M.A., Whiteson, S.: Alternating optimisation and quadrature for robust control. In: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 3925–3933 (2018)
51. Särkkä, S., Hartikainen, J., Svensson, L., Sandblom, F.: On the relation between Gaussian process quadratures and sigma-point methods. Journal of Advances in Information Fusion 11(1), 31–46 (2016)
52. Schaback, R.: Error estimates and condition numbers for radial basis function interpolation. Advances in Computational Mathematics 3(3), 251–264 (1995)
53. Schaback, R., Wendland, H.: Kernel techniques: From machine learning to meshless methods. Acta Numerica 15, 543–639 (2006)
54. Sloan, I.H., Woźniakowski, H.: When are quasi-Monte Carlo algorithms efficient for high dimensional integrals? Journal of Complexity 14(1), 1–33 (1998)
55. Sommariva, A., Vianello, M.: Numerical cubature on scattered data by radial basis functions. Computing 76, 295–310 (2006)
56. Sriperumbudur, B.K., Gretton, A., Fukumizu, K., Schölkopf, B., Lanckriet, G.R.: Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research 11, 1517–1561 (2010)
57. Stein, E.M.: Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton, NJ (1970)
58. Steinwart, I., Christmann, A.: Support Vector Machines. Springer (2008)
59. Triebel, H.: Theory of Function Spaces III. Birkhäuser Verlag (2006)
60. Wendland, H.: Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Advances in Computational Mathematics 4(1), 389–396 (1995)
61. Wendland, H.: Scattered Data Approximation. Cambridge University Press, Cambridge, UK (2005)
62. Xi, X., Briol, F.-X., Girolami, M.: Bayesian quadrature for multiple related integrals. In: J. Dy, A. Krause (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 5373–5382. PMLR (2018)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.