1 Introduction

Filtering refers to the sequential estimation of the state u and/or parameters of a system through the incorporation of online data y. The most complete estimation of the state \(u_n\) at time n is given by its probability distribution conditional on the observations up to the given time, \(\mathbb {P}(du_n|y_1,\ldots , y_n)\) [2, 27]. For linear Gaussian systems, the analytical solution may be given in closed form via update formulae for the mean and covariance known as the Kalman filter [31]. More generally, however, closed-form solutions are typically not known. One must therefore resort either to algorithms which approximate the probabilistic solution by leveraging ideas from control theory in the data assimilation community [27, 32], or to Monte Carlo methods which approximate the filtering distribution itself [2, 11, 15]. The ensemble Kalman filter (EnKF) [9, 17, 35] combines elements of both approaches. In the linear Gaussian case it converges to the Kalman filter solution in the large-ensemble limit [41], and even in the nonlinear case, under suitable assumptions it converges [36, 37] to a limit which is optimal among those which incorporate the data linearly and use a single update iteration [36, 40, 44]. In the case of spatially extended models approximated on a numerical grid, the state space itself may become very high-dimensional and even the linear solves may become intractable, due to the cost of computing the covariance matrix. Therefore, one may be inclined to use EnKF even for linear Gaussian problems in which the Kalman filter solution, despite being given in closed form, is computationally intractable.

The Multilevel Monte Carlo method (MLMC) is a hierarchical and variance-reduction based approximation method initially developed for weak approximations of random fields and stochastic differential equations [18, 19, 21]. Recently, a number of works have emerged which extend the MLMC framework to the setting of Monte Carlo algorithms designed for Bayesian inference. Examples include Markov chain Monte Carlo [14, 22], sequential Monte Carlo samplers [6, 26, 42], particle filters [20, 25], and EnKF [23]. The filtering papers thus far [20, 23, 25] consider only finite-dimensional SDE forward models. In this work, we develop a new multilevel ensemble Kalman filtering method (MLEnKF) for the setting of infinite-dimensional state-space models with evolution in continuous-time. The method consists of a hierarchy of pairwise coupled EnKF-like ensembles on different finite-resolution levels of the underlying infinite-dimensional evolution model that all depend on the same Kalman gain in the update step. The method presented in this work may be viewed as an extension of the finite-dimensional-state-space MLEnKF method [23], which only considered a hierarchy of time-discretization resolution levels.

Under sufficient regularity, the large-ensemble limit of EnKF is equal in distribution to the so-called mean-field EnKF (MFEnKF), cf. [34, 36, 37]. In nonlinear settings, however, MFEnKF is not equal in distribution to the Bayes filter, which is the exact filter distribution. More precisely, the error of EnKF in approximating the Bayes filter may be decomposed into a statistical error, due to the finite ensemble size, and a Gaussian bias that is introduced by the Kalman-filter-like update step in EnKF. While the update-step bias in EnKF is difficult both to quantify and to control, the statistical error can, in theory, be reduced to arbitrary magnitude. However, the high computational cost of simulations in high-dimensional state space often imposes small ensemble size as a practical constraint. By making use of hierarchical variance-reduction techniques, the MLEnKF method developed in this work is capable of obtaining a much smaller statistical error than EnKF at the same fixed cost.

In addition to designing an MLEnKF method for spatio-temporal processes, we provide an asymptotic performance analysis of the method that is applicable under sufficient regularity of the filtering problem and \(L^p\)-strong convergence of the numerical method approximating the underlying model dynamics. Sections 5 and 6 are devoted to a detailed analysis and practical implementation of MLEnKF applied to linear and semilinear stochastic reaction–diffusion equations. In particular, we describe how the pairwise coupling of EnKF-like hierarchies should be implemented for one specific numerical solver (the exponential-Euler method), and provide numerical evidence for the efficiency gains of MLEnKF over EnKF.

Since particle filters are known to often perform better than EnKF, we also include a few remarks on how we believe such methods would compare to MLEnKF in filtering settings with spatial processes. Due to the poor scaling of particle ensemble size in high dimensions, which can even be exponential [7, 38], particle filters are typically not used for spatial processes, or even modestly high-dimensional processes. There has been some work in the past few years which overcomes this issue either for particular examples [5] or by allowing for some bias [4, 45, 48, 49]. But particle filters cannot yet be considered practically applicable for general spatial processes. If there is a well-defined limit of the model as the state-space dimension d grows such that the effective dimension of the target density with respect to the proposal remains finite or even small, then useful particle filters can be developed [33, 39]. As noted in [10], the key criterion which needs to be satisfied is that the proposal and the target are not mutually singular in the limit. MLMC has been applied recently to particle filters, in the context where the approximation arises due to time discretization of a finite-dimensional SDE [20, 25]. It is an interesting open problem to design multilevel particle filters for spatial processes: Both the range of applicability and the asymptotic performance of such a method versus MLEnKF when applied to spatial processes are topics that remain to be studied.

The rest of the paper is organized as follows. Section 2 introduces the filtering problem and notation. The design of the MLEnKF method is presented in Sect. 3. Section 4 studies the weak approximation of MFEnKF by MLEnKF, and shows that in this setting, MLEnKF inherits almost the same favorable asymptotic “cost-to-accuracy” performance as standard MLMC applied to weak approximations of stochastic spatio-temporal processes. Section 5 presents a detailed analysis and description of the implementation of MLEnKF for a family of stochastic reaction–diffusion models. Section 6 provides numerical studies of filtering problems with linear and semilinear stochastic reaction–diffusion models that corroborate our theoretical findings. Conclusions and future directions are presented in Sect. 7, and auxiliary theoretical results and technical proofs are provided in “Appendices A, B and C”.

2 Set-up and single level algorithm

2.1 General set-up

Let \((\Omega ,\mathcal {F}, \mathbb {P})\) be a complete probability space, where \(\mathbb {P}\) is a probability measure on the measurable space \((\Omega , \mathcal {F})\). Let \(\mathcal {V}\) be a separable Hilbert space with inner product \(\langle \cdot ,\cdot \rangle _{\mathcal {V}}\) and norm \(\Vert \cdot \Vert _{\mathcal {V}} = \sqrt{\langle \cdot ,\cdot \rangle _{\mathcal {V}}}\). Let V denote a subspace of \(\mathcal {V}\) which is closed in the topology induced by the norm \(\Vert \cdot \Vert _{V} = \sqrt{\langle \cdot ,\cdot \rangle _{V}}\), which is assumed to be a stronger norm than \(\Vert \cdot \Vert _{\mathcal {V}}\). For an arbitrary separable Hilbert space \((\mathcal {K}, \Vert \cdot \Vert _{\mathcal {K}})\), we denote the associated \(L^p\)-Bochner space by

$$\begin{aligned} L^p(\Omega ,\mathcal {K}) \mathrel {\mathop :}=\Big \{ u:\Omega \rightarrow \mathcal {K}\text { measurable} \,:\, \Vert u\Vert _{L^p(\Omega ,\mathcal {K})} < \infty \Big \}, \end{aligned}$$

where \(\Vert u \Vert _{L^p(\Omega ,\mathcal {K})} \mathrel {\mathop :}=\mathbb {E}\big [\Vert u\Vert _{\mathcal {K}}^p\big ]^{1/p}\), or the shorthand \(\Vert u \Vert _p\) whenever confusion is not possible. For an arbitrary pair of Hilbert spaces \(\mathcal {K}_1\) and \(\mathcal {K}_2\), the space of bounded linear mappings from the former space into the latter is denoted by

$$\begin{aligned} L(\mathcal {K}_1,\mathcal {K}_2) \mathrel {\mathop :}=\Big \{ H:\mathcal {K}_1 \rightarrow \mathcal {K}_2\text { linear} \,:\, \Vert H\Vert _{L(\mathcal {K}_1,\mathcal {K}_2)} < \infty \Big \}, \end{aligned}$$

where

$$\begin{aligned} \Vert H\Vert _{L(\mathcal {K}_1,\mathcal {K}_2)} \mathrel {\mathop :}=\sup _{x \in \mathcal {K}_1\setminus \{0\}} \frac{\Vert Hx\Vert _{\mathcal {K}_2}}{\Vert x\Vert _{\mathcal {K}_1}}. \end{aligned}$$

In finite dimensions, \((\mathbb {R}^m, \langle \cdot , \cdot \rangle )\) represents the m-dimensional Euclidean vector space with norm \(| \cdot | \mathrel {\mathop :}=\sqrt{\langle \cdot , \cdot \rangle }\), and for matrices \(A \in L(\mathbb {R}^{m_1},\mathbb {R}^{m_2})\), \(|A| := \Vert A\Vert _{L(\mathbb {R}^{m_1},\mathbb {R}^{m_2})}\).

2.1.1 The filtering problem

Given \(u_0 \in \cap _{p\ge 2} L^p(\Omega ,V)\) and the mapping \(\Psi : L^p(\Omega ,V) \times \Omega \rightarrow L^p(\Omega ,V)\), we consider the discrete-time dynamics

$$\begin{aligned} u_{n+1}(\omega ) = \Psi (u_n,\omega ), \quad \text {for} \quad n=0,1,\ldots ,N-1. \end{aligned}$$
(1)

and the sequence of observations

$$\begin{aligned} y_n(\omega ) = H u_n(\omega ) + \eta _n(\omega ), \quad n = 1,2, \ldots , N. \end{aligned}$$
(2)

Here, \(H \in L(\mathcal {V}, \mathbb {R}^{m})\) and the sequence \(\{\eta _n\}\) consists of independent and identically \(N(0,\Gamma )-\)distributed random variables with \(\Gamma \in \mathbb {R}^{m\times m}\) positive definite. In the sequel, the explicit dependence on \(\omega \) will be suppressed where confusion is not possible. A general filtering objective is to track the signal \(u_n\) given a fixed sequence of observations \(Y_n := (y_1,y_2, \ldots , y_n)\), i.e., to track the distribution of \(u_n|Y_n\) for \(n=1,\ldots \). In this work, however, we restrict ourselves to considering the more specific objective of approximating \(\mathbb {E}[\varphi (u_n)|Y_n]\) for a given quantity of interest (QoI) \(\varphi : \mathcal {V}\rightarrow \mathbb {R}\). The index n will be referred to as time, whether the actual time between observations is 1 or not (in the examples in Sect. 5 and beyond it will be called T), but this will not cause confusion since time is relative.
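For concreteness, the hidden Markov model (1)–(2) can be simulated as in the following minimal sketch, in a finite-dimensional truncation; the forward map `Psi` and all names here are hypothetical stand-ins rather than part of the framework above:

```python
import numpy as np

def simulate_hmm(u0, Psi, H, Gamma, N, rng):
    """Generate observations (2) of the hidden signal (1):
    u_{n+1} = Psi(u_n), y_n = H u_n + eta_n, eta_n ~ N(0, Gamma) i.i.d."""
    m = H.shape[0]
    u, ys = u0, []
    for _ in range(N):
        u = Psi(u)                                    # signal dynamics (1)
        eta = rng.multivariate_normal(np.zeros(m), Gamma)
        ys.append(H @ u + eta)                        # noisy observation (2)
    return np.array(ys)
```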

2.1.2 The dynamics

We consider problems in which \(\Psi \) is the finite-time evolution of an SPDE, e.g. (35), and we will assume that \(\Psi \) cannot be evaluated exactly, but that there exists a sequence \(\{\Psi ^{\ell }:L^p(\Omega ,\mathcal {V})\times \Omega \rightarrow L^p(\Omega ,\mathcal {V}) \}_{\ell =0}^\infty \) of approximations to the solution \(\Psi :=\Psi ^{\infty }\) satisfying the following uniform-in-\(\ell \) stability properties

Assumption 1

For every \(p \ge 2\), it holds that \(\Psi : L^p(\Omega ,V)\times \Omega \rightarrow L^p(\Omega ,V)\), and for all \(u,v \in L^p(\Omega , \mathcal {V})\), the solution operators \(\{\Psi ^{\ell }\}_{\ell =0}^\infty \) satisfy the following conditions: there exists a constant \(0<c_\Psi <\infty \) depending on p such that

  1. (i)

    \(\Vert \Psi ^{\ell }(u) -\Psi ^{\ell }(v) \Vert _{L^p(\Omega ,\mathcal {V})} {\le } c_{\Psi } \Vert u-v\Vert _{L^p(\Omega ,\mathcal {V})} \) , and

  2. (ii)

    \(\Vert \Psi ^{\ell }(u)\Vert _{L^p(\Omega ,\mathcal {V})} \le c_{\Psi } (1+\Vert u\Vert _{L^p(\Omega ,\mathcal {V})})\).

For notational simplicity, we restrict ourselves to settings in which the map \(\Psi (\cdot )\) does not depend on n, but the results in this work do of course extend easily to non-autonomous settings when the assumptions on \(\{\Psi _n\}_{n=1}^N\) are uniform with respect to n.

Remark 1

The two approximation spaces \(V\subset \mathcal {V}\) are introduced in order to obtain convergence rates for numerical simulation methods \(\Psi ^\ell \) that are discretized in physical or state space. See Assumption 2(i)–(ii) and inequality (41) for an example of how this may be obtained in practice.

2.1.3 The Bayes filter

The pair of discrete-time stochastic processes \((u_n, y_n)\) constitutes a hidden Markov model, and the exact (Bayes-filter) distribution of \(u_n|Y_n\) may in theory be determined iteratively through the system of prediction-update equations

$$\begin{aligned} \mathbb {P}(du_n | Y_n)&= \frac{1}{Z(Y_n)}{\mathcal {L}(u_n;y_n) \mathbb {P}(du_n |Y_{n-1})}, \\ \mathbb {P}(du_n |Y_{n-1})&= \int _{u_{n-1}\in \mathcal {V}} \mathbb {P}(du_n|u_{n-1})\mathbb {P}(du_{n-1}|Y_{n-1}), \\ \mathcal {L}(u_n;y_n)&= \exp \Big \{-\frac{1}{2} |\Gamma ^{-1/2}(y_n-Hu_n)|^2\Big \}, \\ Z(Y_n)&= \int _{u_{n}\in \mathcal {V}} \mathcal {L}(u_n;y_n) \mathbb {P}(du_n |Y_{n-1}). \end{aligned}$$
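For orientation, when the dynamics are linear with additive Gaussian noise, \(\Psi (u) = Au + \xi \) with \(\xi \sim N(0,Q)\), and the initial distribution is Gaussian, the prediction-update recursion above stays Gaussian and reduces to the Kalman filter [31] for the conditional mean and covariance. A minimal finite-dimensional sketch, where the model matrices A and Q are hypothetical stand-ins:

```python
import numpy as np

def kalman_step(mean, cov, y, A, Q, H, Gamma):
    """One Kalman filter prediction-update step for the linear Gaussian
    model u_{n+1} = A u_n + xi_n, y_n = H u_n + eta_n."""
    # Prediction: moments of the Gaussian P(du_n | Y_{n-1}).
    m_pred = A @ mean
    C_pred = A @ cov @ A.T + Q
    # Update: condition on y_n, cf. the likelihood L(u_n; y_n) above.
    S = H @ C_pred @ H.T + Gamma               # innovation covariance
    K = C_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    mean_new = m_pred + K @ (y - H @ m_pred)
    cov_new = C_pred - K @ H @ C_pred
    return mean_new, cov_new
```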

When the state space is infinite-dimensional and the dynamics cannot be evaluated exactly, however, this is an extremely challenging problem. Consequently, we will here restrict ourselves to constructing weak approximation methods of the mean-field EnKF, cf. Sect. 2.4.

2.2 Some details on Hilbert spaces, Hilbert–Schmidt operators, and Cameron–Martin spaces

For two arbitrary separable Hilbert spaces \(\mathcal {K}_1 \) and \(\mathcal {K}_2\), the tensor product \(\mathcal {K}_1 \otimes \mathcal {K}_2\) is also a Hilbert space. For rank-1 tensors, its inner product is defined by

$$\begin{aligned} \langle u \otimes v, u' \otimes v' \rangle _{\mathcal {K}_1 \otimes \mathcal {K}_2} = \langle u, u' \rangle _{\mathcal {K}_1}\langle v, v' \rangle _{\mathcal {K}_2} \qquad \forall u, u' \in \mathcal {K}_1, \quad \forall v,v' \in \mathcal {K}_2, \end{aligned}$$

which extends by linearity to any tensor of finite rank. The Hilbert space \(\mathcal {K}_1\otimes \mathcal {K}_2\) is the completion of this set with respect to the induced norm

$$\begin{aligned} \Vert u \otimes v\Vert _{\mathcal {K}_1 \otimes \mathcal {K}_2} = \Vert u\Vert _{\mathcal {K}_1}\Vert v\Vert _{\mathcal {K}_2}. \end{aligned}$$
(3)

Let \(\{e_k\}\) and \(\{{\hat{e}}_k\}\) be orthonormal bases for \(\mathcal {K}_1\) and \(\mathcal {K}_2\), respectively, and observe that finite sums of rank-1 tensors of the form \(X \mathrel {\mathop :}=\sum _{i,j} \alpha _{ij} e_i \otimes {\hat{e}}_j \in \mathcal {K}_1 \otimes \mathcal {K}_2\) can be identified with a bounded linear mapping

$$\begin{aligned} T_{X}: \mathcal {K}_2^* \rightarrow \mathcal {K}_1 \quad \text {with} \quad T_{X}(f) \mathrel {\mathop :}=\sum _{i,j} \alpha _{ij} f({\hat{e}}_j) e_i, \quad \text {for } f \in \mathcal {K}_2^*. \end{aligned}$$
(4)

For two bounded linear operators \(A,B: \mathcal {K}_2^* \rightarrow \mathcal {K}_1\) we recall the definition of the Hilbert-Schmidt inner product and norm

$$\begin{aligned} \langle A,B \rangle _{HS} = \sum _{k} \langle A {\hat{e}}^*_k, B {\hat{e}}^*_k \rangle _{\mathcal {K}_1}, \qquad |A|_{HS} = \langle A,A \rangle _{HS}^{1/2}, \end{aligned}$$

where \(\{{\hat{e}}^*_k\}\) is the orthonormal basis of \(\mathcal {K}_2^*\) satisfying \({\hat{e}}^*_k({\hat{e}}_j) = \delta _{jk}\) for all j, k in the considered index set. A bounded linear operator \(A:\mathcal {K}_2^* \rightarrow \mathcal {K}_1\) is called a Hilbert-Schmidt operator if \(|A|_{HS} < \infty \) and \(HS(\mathcal {K}_2^*, \mathcal {K}_1)\) is the space of all such operators. In view of (4),

$$\begin{aligned} \begin{aligned} |T_{X}|_{HS}^2&= \sum _{k} \left\langle \sum _{i,j} \alpha _{ij} {\hat{e}}^*_k({\hat{e}}_j) e_i, \sum _{i',j'} \alpha _{i'j'} {\hat{e}}^*_k({\hat{e}}_{j'}) e_{i'} \right\rangle _{\mathcal {K}_1} \\&= \sum _{i,j} |\alpha _{ij}|^2 \\&= \Vert X\Vert _{\mathcal {K}_1 \otimes \mathcal {K}_2}^2. \end{aligned} \end{aligned}$$

By completion, the space \(\mathcal {K}_1 \otimes \mathcal {K}_2\) is isometrically isomorphic to \(HS(\mathcal {K}_2^*,\mathcal {K}_1)\) (and also to \(HS(\mathcal {K}_2,\mathcal {K}_1)\) by the Riesz representation theorem). For an element \(A \in \mathcal {K}_1\otimes \mathcal {K}_2\) we identify the norms

$$\begin{aligned} \Vert A\Vert _{\mathcal {K}_1 \otimes \mathcal {K}_2} = |A|_{HS}, \end{aligned}$$

and such elements will interchangeably be considered either as members of \(\mathcal {K}_1\otimes \mathcal {K}_2\) or of \(HS(\mathcal {K}_2^*,\mathcal {K}_1)\). When viewed as \(A \in HS(\mathcal {K}_2^*,\mathcal {K}_1)\), the mapping \(A: \mathcal {K}_2^* \rightarrow \mathcal {K}_1\) is defined by

$$\begin{aligned} A f \mathrel {\mathop :}=\sum _{i,j} A_{ij} f( {\hat{e}}_j) e_i, \quad \text {for } f \in \mathcal {K}_2^*, \end{aligned}$$

where \(A_{ij} \mathrel {\mathop :}=\langle e_i, A {\hat{e}}_j^* \rangle _{\mathcal {K}_1}\), and when viewed as \(A \in \mathcal {K}_1\otimes \mathcal {K}_2\), we use tensor-basis representation

$$\begin{aligned} A = \sum _{i,j} A_{ij} e_i \otimes {\hat{e}}_j. \end{aligned}$$

The covariance operator for a pair of random variables \(Z, X \in L^2(\Omega , V)\) is denoted by

$$\begin{aligned} \mathrm {Cov}[Z,X] \mathrel {\mathop :}=\mathbb {E}\left[ (Z-\mathbb {E}[Z]) \otimes (X - \mathbb {E}[X])\right] , \end{aligned}$$

and whenever \(Z=X\), we employ the shorthand \({\mathrm {Cov}}[Z] \mathrel {\mathop :}={\mathrm {Cov}}[Z,Z]\). For completeness and later reference, let us prove that said covariance belongs to \(V\otimes V\).

Proposition 1

If \(u\in L^2(\Omega ,V)\), then \(C:={\mathrm {Cov}}[u] \in V\otimes V\).

Proof

By Jensen’s inequality, the rank-1 norm identity (3), and the assumption \(u \in L^2(\Omega ,V)\),

$$\begin{aligned} \Vert C\Vert _{V\otimes V} \le \mathbb {E}\left[ \Vert (u-\mathbb {E}[u]) \otimes (u-\mathbb {E}[u])\Vert _{V\otimes V}\right] = \mathbb {E}\left[ \Vert u-\mathbb {E}[u]\Vert _V^2\right] < \infty . \end{aligned}$$

\(\square \)

2.3 Ensemble Kalman filtering

EnKF is an ensemble-based extension of Kalman filtering to nonlinear settings. Let \(\{ \hat{v}_{0,i}\}_{i=1}^M\) denote an ensemble of M i.i.d. particles with \(\hat{v}_{0,i} {\mathop {=}\limits ^{D}} u_0\). The initial distribution \(\mathbb {P}_{u_0}\) can thus be approximated by the empirical measure of \(\{ \hat{v}_{0,i}\}_{i=1}^M\). By extension, let \(\{ \hat{v}_{n,i}\}_{i=1}^M\) denote the ensemble-based approximation of the updated distribution \(u_n|Y_n\) (at \(n=0\) we employ the convention \(Y_0=\emptyset \), so that \(u_0=u_0|Y_0\)). Given an updated ensemble \(\{ \hat{v}_{n,i}\}_{i=1}^M\), the ensemble-based approximation of the prediction distribution \(u_{n+1}|Y_n\) is obtained through simulating each particle one observation time ahead:

$$\begin{aligned} v_{n+1,i} = \Psi (\hat{v}_{n,i}), \quad i =1,2,\ldots , M. \end{aligned}$$
(5)

We will refer to \(\{v_{n+1,i}\}_{i=1}^M\) as the prediction ensemble at time \(n+1\), and we also note that in many settings, the exact dynamics \(\Psi \) in (5) have to be approximated by a numerical solver.

Next, given \(\{v_{n+1,i}\}_{i=1}^M\) and a new observation \(y_{n+1}\), the ensemble-based approximation of the updated distribution \(u_{n+1}|Y_{n+1}\) is obtained through updating each particle path

$$\begin{aligned} \begin{aligned} \tilde{y}_{n+1,i}&= y_{n+1} + \eta _{n+1,i},\\ \hat{v}_{n+1,i}&= (I - K^{\mathrm{MC}}_{n+1}H) v_{n+1,i} + K^{\mathrm{MC}}_{n+1} \tilde{y}_{n+1,i}, \end{aligned} \end{aligned}$$
(6)

where \(\{ \eta _{n+1,i} \}_{i=1}^M\) is an independent and identically \(N(0, \Gamma )-\hbox {distributed}\) sequence, the Kalman gain

$$\begin{aligned} K^{\mathrm{MC}}_{n+1} = \left( C_{n+1}^{\mathrm{MC}} H^* \right) (S^{\mathrm{MC}}_{n+1})^{-1} \end{aligned}$$

is a function of

$$\begin{aligned} S^{\mathrm{MC}}_{n+1} = HC_{n+1}^{\mathrm{MC}} H^* + \Gamma , \end{aligned}$$

the adjoint observation operator \(H^* \in L(\mathbb {R}^{m}, \mathcal {V}^*)\), defined by

$$\begin{aligned} (H^*a)(w) = \langle a, Hw \rangle \quad \text {for all} \quad a \in \mathbb {R}^m \quad \text {and} \quad w \in \mathcal {V}, \end{aligned}$$

and the prediction covariance

$$\begin{aligned} \begin{aligned} C_{n+1}^{\mathrm{MC}}&= \mathrm {Cov}_M[v_{n+1}] \ , \end{aligned} \end{aligned}$$

with

$$\begin{aligned}&\mathrm {Cov}_M[u_n,v_n] \mathrel {\mathop :}=\frac{M}{M-1}\left( E_M[u_n \otimes v_n] - E_M[u_n] \otimes E_M[v_n] \right) , \nonumber \\&E_M[v_{n}] \mathrel {\mathop :}=\frac{1}{M} \sum _{i=1}^M v_{n,i}, \end{aligned}$$
(7)

and the shorthand \(\mathrm {Cov}_M[u_n] \mathrel {\mathop :}=\mathrm {Cov}_M[u_n,u_n]\).
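A minimal NumPy sketch of the estimators (7), assuming each particle is stored as a row of coefficients in a finite-dimensional truncation (all names hypothetical):

```python
import numpy as np

def sample_cov(U, V):
    """Unbiased sample covariance (7): M/(M-1) (E_M[u x v] - E_M[u] x E_M[v]),
    for ensembles U, V of shape (M, dim) stored row-wise."""
    M = U.shape[0]
    outer_mean = U.T @ V / M                                  # E_M[u x v]
    return M / (M - 1) * (outer_mean - np.outer(U.mean(0), V.mean(0)))
```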

We introduce the following notation for the empirical measure of the updated ensemble \(\{\hat{v}_{n,i}\}_{i=1}^{M}\):

$$\begin{aligned} {\hat{\mu }}^{\mathrm{MC}}_n = \frac{1}{M} \sum _{i=1}^{M} \delta _{\hat{v}_{n,i}}, \end{aligned}$$
(8)

and for any QoI \(\varphi : \mathcal {V}\rightarrow \mathbb {R}\), let

$$\begin{aligned} {\hat{\mu }}_n^{\mathrm{MC}}[\varphi ] := \int \varphi d {\hat{\mu }}^{\mathrm{MC}}_n = \frac{1}{M}\sum _{i=1}^{M} \varphi (\hat{v}_{n,i}) . \end{aligned}$$

Due to the update formula (6), all ensemble particles are correlated with one another after the first update. Even in the linear Gaussian case, the ensemble will not remain Gaussian after the first update. Nonetheless, it has been shown that in the large-ensemble limit, EnKF converges in \(L^p(\Omega )\) to the correct (Bayes-filter) Gaussian in the linear and finite-dimensional case [37, 41], with the rate \(\mathcal {O}(M^{-1/2})\) for Lipschitz-continuous QoIs of polynomial growth at infinity. Furthermore, in the nonlinear cases admitted by Assumption 1, EnKF converges in the same sense and with the same rate to a mean-field limiting distribution described below.
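Combining (5)–(7), one EnKF predict-update step can be sketched as follows, reusing sample_cov from the sketch above; the forward map Psi is a hypothetical stand-in for the (numerically approximated) dynamics:

```python
import numpy as np

def enkf_step(V_hat, y, Psi, H, Gamma, rng):
    """One EnKF iteration: prediction (5) followed by the perturbed-
    observation update (6). V_hat has shape (M, dim)."""
    M = V_hat.shape[0]
    V = np.stack([Psi(v) for v in V_hat])        # prediction ensemble (5)
    HV = V @ H.T                                  # observed particles H v_i
    R = sample_cov(V, HV)                         # C^MC H^*; no full covariance needed
    S = H @ R + Gamma                             # S^MC = H C^MC H^* + Gamma
    K = R @ np.linalg.inv(S)                      # Kalman gain K^MC
    Y = y + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=M)
    return V + (Y - HV) @ K.T                     # update (6): (I - KH)v + K y~
```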

Remark 2

The perturbed observations \({\tilde{y}}_{n,i}\) were originally introduced in [9] to correct the variance-deflation error that arises in their absence in implementations of the original formulation of EnKF [16]. The resulting scheme has become known as the perturbed-observation implementation.

2.4 Mean-field ensemble Kalman filtering

In order to describe and study convergence properties of EnKF in the large-ensemble limit, we now introduce the mean-field EnKF (MFEnKF) [36]: Let \(\hat{\bar{ v}}_0 \sim \mathbb {P}_{u_0}\) and

$$\begin{aligned} \bar{v}_{n+1} = \Psi (\hat{\bar{ v}}_n), \qquad \bar{C}_{n+1} = \mathrm {Cov}[\bar{v}_{n+1}], \end{aligned}$$
(9)
$$\begin{aligned} \hat{\bar{ v}}_{n+1} = (I - \bar{K}_{n+1}H)\bar{v}_{n+1} + \bar{K}_{n+1}{\tilde{y}}_{n+1}, \qquad {\tilde{y}}_{n+1} = y_{n+1} + \eta _{n+1}, \end{aligned}$$
(10)

with \(\bar{K}_{n+1} \mathrel {\mathop :}=\bar{C}_{n+1}H^*{\bar{S}}_{n+1}^{-1}\) and \({\bar{S}}_{n+1} \mathrel {\mathop :}=H\bar{C}_{n+1}H^* + \Gamma \).

Here \(\eta _n\) are i.i.d. draws from \(N(0,\Gamma )\). In the finite-dimensional state-space setting, it was shown in [36, 37] that for nonlinear state-space models and nonlinear models with additive Gaussian noise, respectively, EnKF converges to MFEnKF with the \(L^p(\Omega )\) convergence rate \(\mathcal {O}(M^{-1/2})\), as long as the models satisfy a Lipschitz criterion, similar to (but stronger than) Assumption 1. And in [23], we showed that MLEnKF converges toward MFEnKF at a higher rate than EnKF does in said finite-dimensional setting. The work [34] extended convergence results to infinite-dimensional state space for square-root filters. In this work, the aim is to prove convergence of MLEnKF for infinite-dimensional state space, with the same favorable asymptotic cost-to-accuracy performance as in [23].

The following \(L^p\)-boundedness properties ensure the existence of the MFEnKF process and its mean-field Kalman gain, and they will be needed when studying the properties of MLEnKF:

Proposition 2

Assume the initial data of the hidden Markov model (1) and (2) satisfies \(u_0 \in \cap _{p\ge 2} L^p(\Omega ,V)\). Then the MFEnKF process (9) and (10) satisfies \(\bar{v}_n, \hat{\bar{v}}_n \in \cap _{p \ge 2} L^p(\Omega ,V)\) and \(\Vert \bar{K}_n\Vert _{L(\mathbb {R}^{m},V)} < \infty \) for all \(n \in \mathbb {N}\).

Proof

Since \(\hat{\bar{v}}_0 = u_0\), the property clearly holds for \(n=0\). Given \(\hat{\bar{v}}_n\in L^p(\Omega ,V)\), Assumption 1 guarantees \(\bar{v}_{n+1} \in L^p(\Omega ,V)\). By Proposition 1, \(\bar{C}_{n+1} \in V\otimes V\). Since \(H\bar{C}_{n+1}H^* \ge 0\) and \(\Gamma >0\), it follows that \(H^*{\bar{S}}_{n+1}^{-1} \in L(\mathbb {R}^{m}, \mathcal {V}^*)\) as

$$\begin{aligned} \begin{aligned} \Vert H^* {\bar{S}}_{n+1}^{-1}\Vert _{ L(\mathbb {R}^{m}, \mathcal {V}^*)}&\le \Vert H^*\Vert _{ L(\mathbb {R}^{m}, \mathcal {V}^*)} |{\bar{S}}_{n+1}^{-1}|\\&\le \Vert H\Vert _{L(\mathcal {V},\mathbb {R}^{m})} |\Gamma ^{-1}| \\&< \infty . \end{aligned} \end{aligned}$$

Furthermore, since \(\mathcal {V}^* \subset V^*\) it also holds that \(\Vert H^* {\bar{S}}_{n+1}^{-1}\Vert _{L(\mathbb {R}^{m},V^*)} < \infty \) and

$$\begin{aligned} \begin{aligned} \Vert \bar{K}_{n+1}\Vert _{L(\mathbb {R}^{m},V)}&\le \Vert \bar{C}_{n+1}\Vert _{L(V^*,V)} \Vert H^* {\bar{S}}_{n+1}^{-1}\Vert _{L(\mathbb {R}^{m},V^*)}\\&\le \Vert \bar{C}_{n+1}\Vert _{V\otimes V} \Vert H^* {\bar{S}}_{n+1}^{-1}\Vert _{L(\mathbb {R}^{m},V^*)}\\&< \infty . \end{aligned} \end{aligned}$$

The result follows by recalling that \(V\subset \mathcal {V}\) and by the triangle inequality:

$$\begin{aligned} \Vert \hat{\bar{v}}_{n+1}\Vert _{L^p(\Omega ,V)}&\le \Vert \bar{v}_{n+1}\Vert _{L^p(\Omega ,V)}\\&\quad + \Vert \bar{K}_{n+1}\Vert _{L(\mathbb {R}^{m},V)}\left( \Vert H \bar{v}_{n+1}\Vert _{L^{p}(\Omega ,\mathbb {R}^{m})} + \Vert {\tilde{y}}_{n+1}\Vert _{L^p(\Omega , \mathbb {R}^{m})}\right) \\&< \infty . \end{aligned}$$

\(\square \)

We conclude this section with some remarks on tensorized representations of the Kalman gain and related auxiliary operators that will be useful when developing MLEnKF algorithms in Sect. 3.3.

2.4.1 The Kalman gain and auxiliary operators

Introducing complete orthonormal bases \(\{e_i\}_{i=1}^m\) for \(\mathbb {R}^{m}\), and \(\{\phi _j\}\) for \(\mathcal {V}\), it follows that \(H \in L(\mathcal {V}, \mathbb {R}^{m})\) can be written

$$\begin{aligned} H = \sum _{i=1}^m\sum _{j=1}^\infty H_{ij} e_i \otimes \phi _j^* \end{aligned}$$

with \(H_{ij} \mathrel {\mathop :}=\langle e_i , H \phi _j\rangle \). And since \(\Vert H\Vert _{L(\mathcal {V},\mathbb {R}^{m})} < \infty \), it holds that

$$\begin{aligned} \sum _{j=1}^\infty H_{ij} \phi _j^* \in \mathcal {V}^*, \quad \text {for all } i \in \{1,2,\ldots ,m\}. \end{aligned}$$

For the covariance matrix, it holds almost surely that \({C}^{\mathrm {MC}}_{n+1} \in V\otimes V\subset \mathcal {V}\otimes \mathcal {V}\), so it may be represented by

$$\begin{aligned} {C}^{\mathrm {MC}}_{n+1}= \sum _{i,j=1}^\infty C_{n+1,ij}^{\mathrm{MC}} \phi _i \otimes \phi _j, \quad \text {where} \quad C_{n+1,ij}^{\mathrm{MC}} \mathrel {\mathop :}=\langle \phi _i,{C}^{\mathrm {MC}}_{n+1} \phi _j^*\rangle _{\mathcal {V}}. \end{aligned}$$

For the auxiliary operator, it holds almost surely that

$$\begin{aligned} R^{\mathrm{MC}}_{n+1} \mathrel {\mathop :}=C_{n+1}^{\mathrm{MC}} H^* \in L(\mathbb {R}^{m}, V), \end{aligned}$$

so it can be represented by

$$\begin{aligned} R^{\mathrm{MC}}_{n+1} = \sum _{i=1}^\infty \sum _{j=1}^m R^{\mathrm{MC}}_{n+1,ij} \phi _i \otimes e_j, \quad \text {where} \quad R^{\mathrm{MC}}_{n+1,ij}= \sum _{k=1}^\infty C_{n+1,ik}^{\mathrm{MC}} H_{jk}. \end{aligned}$$

Lastly, since \((S^{\mathrm{MC}})^{-1} \in L(\mathbb {R}^{m},\mathbb {R}^{m})\) and \(K^{\mathrm{MC}} \in L(\mathbb {R}^{m},V)\) almost surely, it holds that

$$\begin{aligned} S^{\mathrm{MC}}_{n+1,ij} = \left( \sum _{k=1}^\infty H_{ik} R^{\mathrm{MC}}_{n+1,kj}\right) + \Gamma _{ij} \text { and } K^{\mathrm{MC}}_{n+1,ij} = \sum _{k=1}^m R^{\mathrm{MC}}_{n+1,ik} \left( (S^{\mathrm{MC}}_{n+1})^{-1}\right) _{kj}. \end{aligned}$$

3 Multilevel EnKF

3.1 Notation and assumptions

Recall that \(\{\phi _k\}_{k=1}^{\infty }\) represents a complete orthonormal basis for \(\mathcal {V}\) and consider the hierarchy of subspaces \(\mathcal {V}_\ell = \mathrm{span} \{\phi _k\}_{k=1}^{N_\ell }\), where \(\{N_\ell \}\) is an exponentially increasing sequence of natural numbers further described below in Assumption 2. By construction, \(\mathcal {V}_0 \subset \mathcal {V}_1 \subset \dots \subset \mathcal {V}\). We define a sequence of orthogonal projection operators \(\{\mathcal {P}_\ell : \mathcal {V}\rightarrow \mathcal {V}_\ell \}\) by

$$\begin{aligned} \mathcal {P}_\ell v \mathrel {\mathop :}=\sum _{j=1}^{N_\ell } \langle \phi _j, v\rangle _{\mathcal {V}} \phi _j \in \mathcal {V}_\ell . \end{aligned}$$

It trivially follows that \(\mathcal {V}_\ell \) is isometrically isomorphic to \(\mathbb {R}^{N_\ell }\), so that any element \(v^\ell \in \mathcal {V}_\ell \) will, when convenient, be viewed as the unique corresponding element of \(\mathbb {R}^{N_\ell }\) whose kth component is given by \(\langle \phi _k, v^\ell \rangle _{\mathcal {V}}\) for \(k \in \{1,2,\ldots ,N_\ell \}\). For the practical construction of numerical methods, we also introduce a second sequence of projection operators \(\{\Pi _\ell : \mathcal {V}\rightarrow \mathcal {V}_\ell \}\), e.g., interpolant operators, which are assumed to be close to the corresponding orthogonal projectors and to satisfy the constraint \(\Pi _\ell \mathcal {V}= \mathcal {P}_\ell \mathcal {V}= \mathcal {V}_\ell \). This framework can accommodate spectral methods, for which typically \(\Pi _\ell = \mathcal {P}_\ell \), as well as finite element type approximations, for which \(\Pi _\ell \) more commonly will be taken as an interpolant operator. In the latter case, the basis \(\{\phi _j\}\) will be a hierarchical finite element basis, cf. [8, 47].

We now introduce two additional assumptions on the hierarchy of dynamics and two assumptions on the projection operators that will be needed in order to prove the convergence of MLEnKF and its superior efficiency compared to EnKF. For two non-negative sequences \(\{f_\ell \}\) and \(\{g_\ell \}\), the notation \(f_\ell \lesssim g_\ell \) means there exists a constant \(c>0\) such that \(f_\ell \le c g_\ell \) holds for all \(\ell \in {\mathbb {N}} \cup \{0\}\), and the notation \(f_\ell \eqsim g_\ell \) means that both \(f_\ell \lesssim g_\ell \) and \(g_\ell \lesssim f_\ell \) are true.

Assumption 2

Assume the initial data of the hidden Markov model (1) and (2) satisfies \(u_0 \in \cap _{p\ge 2}L^p(\Omega ,V)\). Consider a hierarchy of solution operators \(\{\Psi ^{\ell }: L^p(\Omega ,\mathcal {V})\times \Omega \rightarrow L^p(\Omega ,\mathcal {V}_\ell ) \}\) for which Assumption 1 holds that are associated to a hierarchy of subspaces \(\{\mathcal {V}_\ell \}\) with resolution dimension \(N_\ell \eqsim \kappa ^\ell \) for some \(\kappa >1\). Let \(h_\ell \eqsim N_\ell ^{-1/d}\) and \(\Delta t_\ell \eqsim h_\ell ^{\gamma _t}\), for some \(\gamma _t>0\), respectively denote the spatial and the temporal resolution parameter on level \(\ell \). For a given set of exponent rates \(\beta , \gamma _x, \gamma _t>0\), the following conditions are fulfilled:

  1. (i)

    \(\Vert \Psi ^{\ell }(u) - \Psi (u) \Vert _{L^p(\Omega , \mathcal {V})} \lesssim (1+\Vert u\Vert _{L^p(\Omega , V)}) h_\ell ^{\beta /2}\), for all \(p\ge 2\) and \(u\in \cap _{p\ge 2}L^p(\Omega , V)\),

  2. (ii)

    for all \(u \in V\),

    $$\begin{aligned} \Vert (I- \mathcal {P}_\ell ) u \Vert _{\mathcal {V}} \lesssim \Vert u\Vert _{V} h_\ell ^{\beta /2} \quad \text {and} \quad \Vert (\Pi _\ell - \mathcal {P}_\ell )u\Vert _{\mathcal {V}} \lesssim \Vert u\Vert _{V} h^{\beta /2}_\ell , \end{aligned}$$
  3. (iii)

    the computational cost of applying \(\Pi _\ell \) to any element of \(\mathcal {V}\) is \(\mathcal {O}(N_\ell )\) and that of applying \(\Psi ^{\ell }\) to any element of \(\mathcal {V}\) is

    $$\begin{aligned} \text {Cost}(\Psi ^{\ell }) \eqsim h_\ell ^{-(d\gamma _x + \gamma _t)}, \end{aligned}$$

    where d denotes the dimension of the spatial domain of elements in \(\mathcal {V}\), and \(d\gamma _x + \gamma _t \ge d\).

3.2 The MLEnKF method

MLEnKF computes particle paths on a hierarchy of finite-dimensional function spaces with accuracy levels determined by the solvers \(\{\Psi ^{\ell }: L^p(\Omega ,\mathcal {V})\times \Omega \rightarrow L^p(\Omega ,\mathcal {V}_\ell ) \}\). Let \(v^{\ell }_{n}\) and \(\hat{v}^\ell _{n}\) respectively represent prediction and updated ensemble state at time n of a particle on resolution level \(\ell \), i.e., with dynamics governed by \(\Psi ^\ell \). For an ensemble-size sequence \(\{M_\ell \}_{\ell =0}^L \subset \mathbb {N}\setminus \{1\}\) that is further described in (18), the initial setup for MLEnKF consists of a hierarchy of ensembles \(\{\hat{v}^{0}_{0,i}\}_{i=1}^{M_0}\) and \(\{(\hat{v}^{\ell -1}_{0,i}, \hat{v}^{\ell }_{0,i})_{i=1}^{M_\ell }\}_{\ell =1}^L\). For \(\ell =0\), \(\{\hat{v}^{0}_{0,i}\}_{i=1}^{M_0}\) is a sequence of i.i.d. random variables with \(\hat{v}^{0}_{0,i} \sim \mathbb {P}_{\Pi _0 u_0}\), and for \(\ell \ge 1\), \(\{(\hat{v}^{\ell }_{0,i}, \hat{v}^{\ell -1}_{0,i})\}_{i=1}^{M_\ell }\) is a sequence of i.i.d. random variable 2-tuples with \(\hat{v}^{\ell }_{0,i} \sim \mathbb {P}_{\Pi _\ell u_0}\) and pairwise coupling through \(\hat{v}^{\ell -1}_{0,i} = \Pi _{\ell -1} \hat{v}^{\ell }_{0,i}\). MLEnKF approximates the initial reference distribution \(\mathbb {P}_{u_0|Y_0}\) (recalling the convention \(Y_0 = \emptyset \), so that \(u_0|Y_0 = u_0\)) by the multilevel-Monte-Carlo-based and signed empirical measure

$$\begin{aligned} {\hat{\mu }}^{\mathrm{ML}}_0 = \frac{1}{M_0} \sum _{i=1}^{M_0} \delta _{ \hat{v}^{0}_{0,i}} + \sum _{\ell =1}^{L} \frac{1}{M_\ell } \sum _{i=1}^{M_\ell }( \delta _{\hat{v}^{\ell }_{0,i}} -\delta _{\hat{v}^{\ell -1}_{0,i}} ). \end{aligned}$$

Similar to EnKF, the mapping

$$\begin{aligned} \{ (\hat{v}^{0}_{n,i})_{i=1}^{M_0}, \{(\hat{v}^{\ell -1}_{n,i}, \hat{v}^{\ell }_{n,i})_{i=1}^{M_\ell }\}_{\ell =1}^L\} \mapsto \{ (\hat{v}^{0}_{n+1,i})_{i=1}^{M_0}, \{(\hat{v}^{\ell -1}_{n+1,i}, \hat{v}^{\ell }_{n+1,i})_{i=1}^{M_\ell }\}_{\ell =1}^L\} \end{aligned}$$

represents the transition of the MLEnKF hierarchy of ensembles over one prediction-update step and

$$\begin{aligned} {\hat{\mu }}^{\mathrm{ML}}_n = \frac{1}{M_0} \sum _{i=1}^{M_0} \delta _{ \hat{v}^{0}_{n,i}} + \sum _{\ell =1}^{L} \frac{1}{M_\ell } \sum _{i=1}^{M_\ell }( \delta _{\hat{v}^{\ell }_{n,i}} -\delta _{\hat{v}^{\ell -1}_{n,i}} ) \end{aligned}$$

represents the empirical distribution of the updated MLEnKF at time n. The MLEnKF prediction step consists of simulating all particle paths, on all resolution levels, one observation time forward:

$$\begin{aligned} v^{0}_{n+1,i} = \Psi ^{0}(\hat{v}^{0}_{n,i},\omega _{0,i}), \end{aligned}$$

for \(\ell =0\) and \(i=1,2,\ldots , M_0\), and the pairwise coupling

$$\begin{aligned} \begin{aligned} v^{\ell -1}_{n+1,i}&= \Psi ^{\ell -1}(\hat{v}^{\ell -1}_{n,i},\omega _{\ell ,i}),\\ v^{\ell }_{n+1,i}&= \Psi ^\ell (\hat{v}^{\ell }_{n,i}, \omega _{\ell ,i}), \end{aligned} \end{aligned}$$
(11)

for \(\ell =1,\ldots ,L\) and \(i = 1,2, \ldots , M_\ell \). Note here that the driving noise in the second argument of the dynamics \(\Psi ^{\ell -1}\) and \(\Psi ^{\ell }\) is pairwise coupled, and otherwise independent. For the update step, the MLEnKF prediction covariance matrix is given by the following multilevel sample-covariance estimator

$$\begin{aligned} \begin{aligned} C^{\mathrm {ML}}_{n+1}&= \sum _{\ell =0}^L \mathrm {Cov}_{M_\ell }[ v^{\ell }_{n+1}] - \mathrm {Cov}_{M_\ell }[ v^{\ell -1}_{n+1}], \end{aligned} \end{aligned}$$
(12)

and the multilevel Kalman gain is defined by

$$\begin{aligned} K^{\mathrm {ML}}_{n+1} = C^{\mathrm {ML}}_{n+1} H^* (S^{\mathrm{ML}}_{n+1})^{-1}, \text { where } S^{\mathrm{ML}}_{n+1} := (H C^{\mathrm {ML}}_{n+1} H^*)^+ + \Gamma , \end{aligned}$$
(13)

where

$$\begin{aligned} (H C^{\mathrm {ML}}_{n+1} H^*)^+ := \sum _{ \begin{array}{c} i=1\\ \lambda _i\ge 0 \end{array} }^m \lambda _i q_i q_i^{\mathsf {T}}, \end{aligned}$$
(14)

with \((\lambda _j,q_j)_{j=1}^m\) denoting the eigenpairs of \(H C^{\mathrm {ML}}_{n+1} H^* \in \mathbb {R}^{m\times m}\). The new observation \(y_{n+1}\) is assimilated into the hierarchy of ensembles by the following multilevel extension of EnKF at the zeroth level:

$$\begin{aligned} \begin{aligned} \tilde{y}^{0}_{n+1,i}&= y_{n+1} + \eta ^0_{n+1,i} \\ \hat{v}^{0}_{n+1,i}&= (I - \Pi _0 K^{\mathrm {ML}}_{n+1} H ) {v}^{0}_{n+1,i} + \Pi _{0} K^{\mathrm {ML}}_{n+1} \tilde{y}^{0}_{n+1,i}, \end{aligned} \end{aligned}$$

for \(i=1,2,\ldots , M_0\), with \(\{\eta ^0_{n+1,i}\}_{i=1}^{M_0}\) i.i.d. \(N(0, \Gamma )\)-distributed, and for each of the higher levels, \(\ell =1,\ldots ,L\), the pairwise coupling of perturbed observations

$$\begin{aligned} \begin{aligned} \tilde{y}^{\ell }_{n+1,i}&= y_{n+1} + \eta ^{\ell }_{n+1,i} \\ \hat{v}^{\ell -1}_{n+1,i}&= (I - \Pi _{\ell -1} K^{\mathrm {ML}}_{n+1} H ) {v}^{\ell -1}_{n+1,i} + \Pi _{\ell -1} K^{\mathrm {ML}}_{n+1} \tilde{y}^{\ell }_{n+1,i}, \\ \hat{v}^{\ell }_{n+1,i}&= (I - \Pi _\ell K^{\mathrm {ML}}_{n+1} H ) {v}^{\ell }_{n+1,i} + \Pi _{\ell } K^{\mathrm {ML}}_{n+1} \tilde{y}^{\ell }_{n+1,i}, \end{aligned} \end{aligned}$$
(15)

for \(i=1,\ldots ,M_\ell \), with the sequence \(\{\eta ^{\ell }_{n+1,i}\}_{i,\ell }\) being independent and identically \(N(0, \Gamma )-\)distributed. It is precisely the multiplication with the Kalman gain in the update step that correlates all the MLEnKF particles. In comparison to standard MLMC where all samples except the pairwise coupled ones are independent, this global correlation in MLEnKF substantially complicates the convergence analysis of the method.

Remark 3

Although unlikely, the multilevel sample prediction covariance \(C^{\mathrm {ML}}_{n+1}\) may have negative eigenvalues and, in the worst case, this could lead to \(S^{\mathrm{ML}}_{n+1}= H C^{\mathrm {ML}}_{n+1} H^* + \Gamma \) becoming a singular matrix. The impetus for replacing the matrix \((H C^{\mathrm {ML}}_{n+1} H^*)\) with its positive semidefinite “counterpart” \((H C^{\mathrm {ML}}_{n+1} H^*)^+\) in the Kalman gain formula (13) is to ensure that \(S^{\mathrm{ML}}_{n+1}\) is invertible and to obtain the bound \(|(S^{\mathrm{ML}}_{n+1})^{-1}| \le |\Gamma ^{-1}|\).
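A sketch of the positive-part operation (14), computed from a symmetric eigendecomposition of \(H C^{\mathrm {ML}}_{n+1} H^* \in \mathbb {R}^{m\times m}\):

```python
import numpy as np

def psd_part(B):
    """(B)^+ of Eq. (14): keep only the nonnegative eigenvalues of the
    symmetric matrix B, so that S^ML = (B)^+ + Gamma is invertible and
    |(S^ML)^{-1}| <= |Gamma^{-1}|, cf. Remark 3."""
    lam, Q = np.linalg.eigh(B)              # eigenpairs (lambda_i, q_i)
    return (Q * np.maximum(lam, 0.0)) @ Q.T

# Usage: S_ML = psd_part(H @ C_ML @ H.T) + Gamma
```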

The following notation denotes the (signed) empirical measure of the multilevel ensemble \(\{(\hat{v}^{\ell -1}_{n,i},\hat{v}^{\ell }_{n,i})_{i=1}^{M_\ell }\}_{\ell =0}^L\):

$$\begin{aligned} {\hat{\mu }}^{\mathrm{ML}}_n = \frac{1}{M_0} \sum _{i=1}^{M_0} \delta _{ \hat{v}^{0}_{n,i}} + \sum _{\ell =1}^{L} \frac{1}{M_\ell } \sum _{i=1}^{M_\ell }( \delta _{\hat{v}^{\ell }_{n,i}} -\delta _{\hat{v}^{\ell -1}_{n,i}} ), \end{aligned}$$
(16)

and for any QoI \(\varphi : \mathcal {V}\rightarrow \mathbb {R}\), let

$$\begin{aligned} {\hat{\mu }}_n^{\mathrm{ML}}[\varphi ] := \int \varphi d{\hat{\mu }}^{\mathrm{ML}}_n = \sum _{\ell =0}^{L} \frac{1}{M_\ell }\sum _{i=1}^{M_\ell } \left( \varphi (\hat{v}^{\ell }_{n,i}) - \varphi (\hat{v}^{\ell -1}_{n,i})\right) . \end{aligned}$$

We conclude this section with an estimate that relates to the computational cost of one MLEnKF update step.

Proposition 3

Given an MLEnKF hierarchy of prediction ensembles

$$\begin{aligned} \{ (v^{0}_{n+1,i})_{i=1}^{M_0}, \{(v^{\ell -1}_{n+1,i}, v^{\ell }_{n+1,i})_{i=1}^{M_\ell }\}_{\ell =1}^L\}, \end{aligned}$$

the cost of constructing the multilevel Kalman gain \(K^{\mathrm {ML}}_{n+1}\) is proportional to \(\sum _{\ell =0}^L m N_\ell M_\ell \). And if Assumption 2(iii) holds, then the cost of updating the \(\ell \)th level ensemble

$$\begin{aligned} (v^{\ell -1}_{n+1,i}, v^{\ell }_{n+1,i})_{i=1}^{M_\ell } \mapsto (v^{\ell -1}_{n+1,i},\hat{v}^{\ell }_{n+1,i})_{i=1}^{M_\ell } \end{aligned}$$

by (15) is proportional to \(m N_\ell M_\ell \).

Proof

Notice that it is not required to compute the full multilevel prediction covariance \(C^{\mathrm{ML}}_{n+1}\) in order to build the MLEnKF Kalman gain, but rather only

$$\begin{aligned} R^{\mathrm{ML}}_{n+1} := C^{\mathrm {ML}}_{n+1} H^* = \sum _{\ell =0}^L \Big ( \mathrm {Cov}_{M_\ell }[ v^{\ell }_{n+1}, H v^{\ell }_{n+1}] - \mathrm {Cov}_{M_\ell }[ v^{\ell -1}_{n+1}, H v^{\ell -1}_{n+1}] \Big ). \end{aligned}$$
(17)

(The advantage of storing \(R^{\mathrm{ML}}_{n+1} \in \mathbb {R}^{N_L \times m}\) rather than \(C^{\mathrm{ML}}_{n+1} \in \mathbb {R}^{N_L \times N_L}\) is the dimensional reduction obtained for large L, since then \(N_L \gg m\).)

For the Kalman gain, the cost of computing \(\mathrm {Cov}_{M_\ell }[ v^{\ell }_{n+1}, H v^{\ell }_{n+1}] \in \mathbb {R}^{N_\ell \times m}\) is proportional to \(m N_\ell M_\ell \). There are also the insignificant one-time costs of constructing and inverting \(S^{\mathrm{ML}}_{n+1}\), and the matrix multiplication \(R^{\mathrm{ML}}_{n+1}(S^{\mathrm{ML}}_{n+1})^{-1}\). In total, these costs are proportional to \(N_Lm^2\).

The cost of updating the \(\ell \)th level ensemble by (15) contains the one-time cost of the matrix multiplications \(\Pi _\ell K^{\mathrm {ML}}_{n+1}\), which by Assumption 2(iii) is proportional to \(mN_\ell \). For each particle, the cost of computing \(H {v}^{\ell }_{n+1,i} \) is proportional to \(N_\ell \), since \({v}^{\ell }_{n+1,i} \in \mathcal {V}_\ell \), and the costs of computing \((\Pi _\ell K^{\mathrm {ML}}_{n+1})(H {v}^{\ell }_{n+1,i} )\) and \((\Pi _\ell K^{\mathrm {ML}}_{n+1})\tilde{y}^{\ell }_{n+1,i}\) are both proportional to \(mN_\ell \). \(\square \)

3.3 MLEnKF algorithms

A subtlety with computing (17) efficiently is that the summands will be elements of different sized tensor spaces since

$$\begin{aligned} \mathrm {Cov}_{M_\ell }[ v^{\ell -1}_{n}, H v^{\ell -1}_{n} ]\in \mathbb {R}^{N_{\ell -1} \times m} \quad \text {while} \quad \mathrm {Cov}_{M_\ell }[ v^{\ell }_{n}, H v^{\ell }_{n}] \in \mathbb {R}^{N_{\ell } \times m} \end{aligned}$$

for \(\ell =1,2,\ldots ,L\). The algorithm presented below efficiently computes (17) through performing all arithmetic operations in the tensor space of lowest possible dimension available at the current stage of computations. When Proposition 3 applies, the computational cost of the algorithm is \(\mathcal {O}(m\sum _{\ell =0}^L M_\ell N_\ell )\). For ease of exposition, we will in the sequel employ the convention \(v^{-1}_{n,i} = \hat{v}^{-1}_{n,i} =0\) for all ni.

(Algorithm 1: efficient computation of \(R^{\mathrm{ML}}_{n+1}\) and the multilevel Kalman gain \(K^{\mathrm {ML}}_{n+1}\); pseudocode not reproduced here.)
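Since the pseudocode of Algorithm 1 is not reproduced here, the following self-contained sketch illustrates the same idea under the spectral-truncation convention above: accumulate \(R^{\mathrm{ML}}_{n+1}\) of (17) level by level, zero-padding coarse contributions into \(\mathbb {R}^{N_L\times m}\), and never form the \(N_L\times N_L\) covariance. All names are hypothetical:

```python
import numpy as np

def cross_cov(U, V):
    """Unbiased sample cross-covariance Cov_M[u, v] of row-wise ensembles."""
    M = U.shape[0]
    return M / (M - 1) * (U.T @ V / M - np.outer(U.mean(0), V.mean(0)))

def multilevel_gain(ensembles, H, Gamma, N_L):
    """Multilevel Kalman gain (13) via R^ML = C^ML H*, Eq. (17).
    ensembles[l] = (V_f, V_c): level-l prediction ensembles of shapes
    (M_l, N_l) and (M_l, N_{l-1}); V_c is None on level 0. H has shape
    (m, N_L); the level-l observation map uses its first N_l columns."""
    m = H.shape[0]
    R = np.zeros((N_L, m))
    for V_f, V_c in ensembles:
        N_f = V_f.shape[1]
        R[:N_f] += cross_cov(V_f, V_f @ H[:, :N_f].T)      # + Cov[v^l, H v^l]
        if V_c is not None:
            N_c = V_c.shape[1]
            R[:N_c] -= cross_cov(V_c, V_c @ H[:, :N_c].T)  # - Cov[v^{l-1}, H v^{l-1}]
    B = H @ R                                   # H C^ML H* in R^{m x m}
    lam, Q = np.linalg.eigh(0.5 * (B + B.T))    # symmetrize, then apply (14)
    S = (Q * np.maximum(lam, 0.0)) @ Q.T + Gamma
    return R @ np.linalg.inv(S)                 # K^ML in R^{N_L x m}
```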

In Algorithm 2, we summarize the main steps for one predict-update iteration of the MLEnKF method.

(Algorithm 2: one predict-update iteration of MLEnKF; pseudocode not reproduced here.)
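In the same spirit, here is a sketch of one MLEnKF predict-update iteration corresponding to Algorithm 2, reusing multilevel_gain from the sketch above. Psi_levels is a hypothetical list of level-\(\ell \) solvers whose second argument seeds the driving noise; passing the same seeds to \(\Psi ^{\ell }\) and \(\Psi ^{\ell -1}\) realizes the pairwise coupling in (11):

```python
import numpy as np

def mlenkf_step(ensembles, y, Psi_levels, H, Gamma, N_L, rng):
    """One MLEnKF iteration: coupled prediction (11), then update (15)
    with a single multilevel Kalman gain shared by all levels."""
    # Prediction (11): fine and coarse members share the driving noise.
    predicted = []
    for ell, (Vh_f, Vh_c) in enumerate(ensembles):
        seeds = rng.integers(0, 2**32, size=Vh_f.shape[0])
        V_f = Psi_levels[ell](Vh_f, seeds)
        V_c = Psi_levels[ell - 1](Vh_c, seeds) if Vh_c is not None else None
        predicted.append((V_f, V_c))
    K = multilevel_gain(predicted, H, Gamma, N_L)
    # Update (15): perturbed observations are coupled within each pair.
    updated = []
    for V_f, V_c in predicted:
        M_ell, N_f = V_f.shape
        Y = y + rng.multivariate_normal(np.zeros(len(y)), Gamma, size=M_ell)
        Vh_f = V_f + (Y - V_f @ H[:, :N_f].T) @ K[:N_f].T  # Pi_l K^ML = K[:N_l]
        Vh_c = None
        if V_c is not None:
            N_c = V_c.shape[1]
            Vh_c = V_c + (Y - V_c @ H[:, :N_c].T) @ K[:N_c].T
        updated.append((Vh_f, Vh_c))
    return updated
```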

4 Theoretical results

In this section we derive theoretical results on the approximation error and computational cost of weakly approximating the MFEnKF filtering distribution by MLEnKF. We begin by stating the main theorem of this paper. It gives an upper bound for the computational cost of achieving a sought accuracy in \(L^p(\Omega )\)-norm when using the MLEnKF method to approximate the expectation of a QoI. The theorem may be considered an extension to spatially extended models of the earlier work [23].

Theorem 1

(MLEnKF accuracy vs. cost) Consider a Lipschitz continuous QoI \(\varphi :\mathcal {V}\rightarrow \mathbb {R}\) and suppose Assumption 2 holds. For a given \(\varepsilon >0\), let L and \(\{M_\ell \}_{\ell =0}^L\) be defined under the constraints \(L = \lceil 2{d}\log _\kappa (\varepsilon ^{-1})/\beta \rceil \) and

$$\begin{aligned} {M_\ell \eqsim {\left\{ \begin{array}{ll} h_\ell ^{(\beta +d\gamma _x+\gamma _t)/2} h^{-\beta }_L, &{} \text {if} \quad \beta > d\gamma _x +\gamma _t, \\ h_\ell ^{(\beta +d\gamma _x +\gamma _t)/2} L^2 h^{-\beta }_L, &{} \text {if} \quad \beta = d\gamma _x +\gamma _t, \\ h_\ell ^{(\beta + d\gamma _x +\gamma _t)/2} h^{-(\beta + d\gamma _x +\gamma _t)/2}_L, &{}\text {if} \quad \beta < d\gamma _x +\gamma _t. \end{array}\right. }} \end{aligned}$$
(18)

Then, for any \(p\ge 2\) and \(n \in \mathbb {N}\),

$$\begin{aligned} \Vert {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ] \Vert _{L^p(\Omega )} \lesssim |\log (\varepsilon )|^n\varepsilon , \end{aligned}$$
(19)

where we recall that \({\hat{\mu }}^{\mathrm{ML}}_n\) denotes the MLEnKF empirical measure (16), and \({\hat{\bar{\mu }}}_n\) denotes the mean-field EnKF distribution at time n (meaning \(\hat{\bar{ v}}_{n} \sim {\hat{\bar{\mu }}}_n\)).

The computational cost of the MLEnKF estimator

$$\begin{aligned} \mathrm {Cost}\left( {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ]\right) := \sum _{\ell =0}^L M_\ell \mathrm {Cost}\left( \Psi ^\ell \right) \end{aligned}$$

satisfies

$$\begin{aligned} \mathrm {Cost}\left( {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ]\right) \eqsim {\left\{ \begin{array}{ll} \varepsilon ^{-2}, &{} \text {if} \quad \beta > d\gamma _x +\gamma _t,\\ \varepsilon ^{-2} \left| \log (\varepsilon ) \right| ^3, &{} \text {if} \quad \beta = d\gamma _x +\gamma _t,\\ \varepsilon ^{- 2(d\gamma _x +\gamma _t)/\beta }, &{} \text {if} \quad \beta < d\gamma _x+\gamma _t. \end{array}\right. } \end{aligned}$$

The proof of this theorem is presented at the end of this section, and it depends upon the intermediary results presented prior to the proof.
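For reference, the parameter choices of Theorem 1 can be instantiated as in the following sketch, with every constant hidden by \(\eqsim \) set to one and \(h_\ell \eqsim \kappa ^{-\ell /d}\):

```python
import numpy as np

def mlenkf_parameters(eps, beta, d, gamma_x, gamma_t, kappa):
    """Level count L and ensemble sizes {M_ell} from Theorem 1, Eq. (18)."""
    L = int(np.ceil(2 * d * np.log(1 / eps) / (beta * np.log(kappa))))
    h = kappa ** (-np.arange(L + 1) / d)       # resolution parameters h_ell
    rate = d * gamma_x + gamma_t
    if beta > rate:
        M = h ** ((beta + rate) / 2) * h[L] ** (-beta)
    elif beta == rate:
        M = h ** ((beta + rate) / 2) * L**2 * h[L] ** (-beta)
    else:
        M = h ** ((beta + rate) / 2) * h[L] ** (-(beta + rate) / 2)
    return L, np.maximum(2, np.ceil(M)).astype(int)   # M_ell in N \ {1}
```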

Remark 4

The constraint \(d\gamma _x +\gamma _t \ge d\) in Assumption 2(iii) was imposed to ensure that the computational cost of the forward simulation, Cost\((\Psi ^\ell ) \eqsim h_\ell ^{-(d\gamma _x +\gamma _t) }\), is either linear or superlinear in \(N_\ell \). In view of Proposition  3, the share of the total cost of a single predict and update step assigned to level \(\ell \) is proportional to \(M_\ell \text {Cost}(\Psi ^\ell )\). This cost estimate is used as input in the standard-MLMC-constrained-optimization approach to determining \(M_\ell \), cf. (18). However, it is important to observe that in settings with high dimensional observations, \(m \ge N_0\), the input in said optimization problem needs to be modified accordingly, as then the cost on the lower levels will be dominated by m rather than \(\hbox {Cost}(\Psi ^\ell )\).

The first result we present is a collection of direct consequences of Assumption 2:

Proposition 4

If Assumption 2 holds, then for all \(u,v \in \cap _{p \ge 2} L^p(\Omega , V)\), and globally Lipschitz QoI \(\varphi :\mathcal {V}\rightarrow {\mathbb {R}}\),

  1. (i)

    \(\Vert \Psi ^{\ell }(v) - \Psi ^{\ell -1}(v) \Vert _{L^p(\Omega , \mathcal {V})} \lesssim (1+\Vert v\Vert _{L^p(\Omega , V)}) h_\ell ^{\beta /2}\), for all \(p\ge 2\),

  2. (ii)

    \(\Vert \varphi (\Psi ^{\ell }(u)) - \varphi (\Psi (v))\Vert _{L^p(\Omega )} \lesssim \Vert u-v\Vert _{L^p(\Omega ,\mathcal {V})} + (1+\Vert v\Vert _{L^p(\Omega ,V)})\, h_\ell ^{\beta /2}\), for all \(p \ge 2\),

  3. (iii)

    and for all \(n\ge 1\), the MFEnKF prediction covariance (9) satisfies

    $$\begin{aligned} \Vert (I- \Pi _\ell ) {\bar{C}}_n \Vert _{\mathcal {V}\otimes \mathcal {V}} \lesssim \Vert \Psi (\hat{{\bar{v}}}_{n-1}) \Vert _{L^2(\Omega , V)} h_\ell ^{\beta /2}. \end{aligned}$$

Proof

Property (i) follows from Assumption 2(i) and the triangle inequality. Property (ii) follows from the Lipschitz continuity of \(\varphi \) followed by the triangle inequality, Assumption 1(i), and Assumption 2(i). For property (iii), Proposition 2, Jensen’s inequality, definition (3), and Hölder’s inequality imply that

$$\begin{aligned} \Vert (I-\Pi _\ell ) {\bar{C}}_n \Vert _{\mathcal {V}\otimes \mathcal {V}} \le \mathbb {E}\left[ \Vert (I-\Pi _\ell )(\bar{v}_n - \mathbb {E}[\bar{v}_n])\Vert _{\mathcal {V}}\, \Vert \bar{v}_n - \mathbb {E}[\bar{v}_n]\Vert _{\mathcal {V}}\right] \lesssim \Vert (I-\Pi _\ell )\bar{v}_n \Vert _{L^2(\Omega ,\mathcal {V})}\, \Vert \bar{v}_n\Vert _{L^2(\Omega ,\mathcal {V})}. \end{aligned}$$
Since \((I-\mathcal {P}_\ell ) \bar{v}_n =(I-\mathcal {P}_\ell )\Psi (\hat{\bar{v}}_{n-1})\), Assumption 2(ii) implies that

$$\begin{aligned} \begin{aligned} \Vert (I-\Pi _\ell )\Psi (\hat{\bar{v}}_{n-1}) \Vert _{L^2(\Omega ,\mathcal {V})}&\le \Vert (I-\mathcal {P}_\ell )\Psi (\hat{\bar{v}}_{n-1}) \Vert _{L^2(\Omega ,\mathcal {V})}\\&\quad + \Vert (\Pi _\ell -\mathcal {P}_\ell ) \Psi (\hat{\bar{v}}_{n-1}) \Vert _{L^2(\Omega ,\mathcal {V})}\\&\le 2 \Vert \Psi (\hat{\bar{v}}_{n-1}) \Vert _{L^2(\Omega , V)} h^{\beta /2}_\ell . \end{aligned} \end{aligned}$$

\(\square \)

Similar to the analysis in [23, 36, 37, 41], we next introduce an auxiliary mean-field multilevel ensemble \(\{(\bar{v}^{\ell -1}_{n,i}, \bar{v}^{\ell }_{n,i})_{i=1}^{M_\ell }\}_{\ell =0}^L\), where every particle pair \((\bar{v}^{\ell -1}_{n,i},\bar{v}^{\ell }_{n,i})\) evolves by the respective forward mappings \(\Psi ^{\ell -1}\) and \(\Psi ^{\ell }\) using the same driving noise realization as the corresponding MLEnKF particle pair \((v^{\ell -1}_{n,i},v^{\ell }_{n,i})\). Note, however, that in the update of the mean-field multilevel ensemble, the limiting MFEnKF covariance \(\bar{C}_n\) and Kalman gain \(\bar{K}_n\) from Eqs. (9) and (10) are used, rather than the corresponding quantities based on sample moments of the multilevel ensemble itself. That is, the initial condition for each coupled particle pair is identical to that of MLEnKF

$$\begin{aligned} (\hat{\bar{ v}}^{\ell -1}_{0,i},\hat{\bar{ v}}^{\ell }_{0,i}) = (\hat{v}^{\ell -1}_{0,i},\hat{v}^{\ell }_{0,i}) \end{aligned}$$
(20)

and one prediction-update iteration is given by

$$\begin{aligned}&\text{ Prediction }\;\left\{ \begin{array}{ll} \bar{v}^{\ell -1}_{n+1,i} &{} = \Psi ^{\ell -1}(\hat{\bar{ v}}^{\ell -1}_{n,i},\omega _{\ell ,i}),\\ \bar{v}^{\ell }_{n+1,i} &{} = \Psi ^\ell (\hat{\bar{ v}}^{\ell }_{n,i}, \omega _{\ell ,i}), \end{array}\right. \end{aligned}$$
(21)
$$\begin{aligned}&\text{ Update }\left\{ \begin{array}{lll} {{\tilde{y}}}_{n+1,i}^\ell &{} =y_{n+1}+\eta _{n+1,i}^\ell ,\\ \hat{\bar{ v}}_{n+1,i}^{\ell -1} &{}=(I- \Pi _{\ell -1}\bar{K}_{n+1}H){\bar{v}}_{n+1,i}^{\ell -1}+\Pi _{\ell -1}\bar{K}_{n+1}{{\tilde{y}}}_{n+1,i}^\ell ,\\ \hat{\bar{ v}}_{n+1,i}^{\ell } &{}=(I- \Pi _{\ell } \bar{K}_{n+1}H){\bar{v}}_{n+1,i}^\ell +\Pi _\ell \bar{K}_{n+1}{{\tilde{y}}}_{n+1,i}^\ell , \end{array}\right. \end{aligned}$$
(22)

for \(\ell =0,1,\ldots , L\) and \(i = 1,2,\ldots , M_\ell \) (similar to before, we employ the convention \(\bar{v}^{-1} = \hat{\bar{ v}}^{-1} \mathrel {\mathop :}=0\)). By similar reasoning as in Proposition 2, it can be shown that also \(\bar{v}^{\ell }_n\), \(\hat{\bar{ v}}^{\ell }_n \in \cap _{p \ge 2} L^p(\Omega , V)\) for any \(\ell ,n \in \mathbb {N}\cup \{0\}\). One may think of the auxiliary mean-field multilevel ensemble as “shadowing” the MLEnKF ensemble.

Before bounding the difference between the multilevel and mean-field Kalman gains by the two next lemmas, let us recall that they respectively are given by

$$\begin{aligned} K^{\mathrm{ML}}_n = C^{\mathrm{ML}}_n H^* ( (H C^{\mathrm{ML}}_n H^*)^+ + \Gamma )^{-1} \quad \text {and} \quad \bar{K}_{n} = {\bar{C}}_n H^* ( H {\bar{C}}_n H^* + \Gamma )^{-1}. \end{aligned}$$

Lemma 1

For the matrix \((H C^{\mathrm{ML}}_n H^*)^+ \in \mathbb {R}^{m\times m}\) defined by (14), with eigenpairs \((\lambda _j, q_j)_{j=1}^m\) of \(H C^{\mathrm{ML}}_n H^*\), it holds that

$$\begin{aligned} |(H C^{\mathrm{ML}}_n H^*)^+ - H C^{\mathrm{ML}}_n H^* | \le \left\| H\right\| ^2_{L(\mathcal {V},\mathbb {R}^{m})} \left\| C^{\mathrm{ML}}_n - \bar{C}_n \right\| _{\mathcal {V}\otimes \mathcal {V}}. \end{aligned}$$
(23)

Proof

Since \((H C^{\mathrm{ML}}_n H^*)^+ - H C^{\mathrm{ML}}_n H^*\) is self-adjoint and positive semi-definite,

$$\begin{aligned} \begin{aligned} |(H C^{\mathrm{ML}}_n H^*)^+ - H C^{\mathrm{ML}}_n H^*|&= \max _{\Vert q\Vert _{\mathbb {R}^{m}}=1} q^*\Big \{ (H C^{\mathrm{ML}}_n H^*)^+ - H C^{\mathrm{ML}}_n H^* \Big \} q\\&= \max \{-\min _{j; \lambda _j < 0} \lambda _j, 0\}. \end{aligned} \end{aligned}$$

It remains to verify the lemma when \(\min _{j} \lambda _j < 0\). Let the normalized eigenvector associated to the eigenvalue \( \min _{\{j; \lambda _j< 0\}} \lambda _j\) be denoted \(q_{\max }\). Then, since \((H C^{\mathrm{ML}}_n H^*)^+ q_{\max }= 0\) and the mean-field covariance \({\bar{C}}_n\) is self-adjoint and positive semi-definite,

$$\begin{aligned} \begin{aligned} |(H C^{\mathrm{ML}}_n H^*)^+ - H C^{\mathrm{ML}}_n H^*|&= -q_{\max }^* H C^{\mathrm{ML}}_n H^* q_{\max }\\&\le q_{\max }^* H{\bar{C}}_n H^* q_{\max } - q_{\max }^* H C_n^{\mathrm{ML}} H^*q_{\max }\\&\le |H({\bar{C}}_n - C_n^{\mathrm{ML}})H^*|\\&\le \left\| H\right\| ^2_{L(\mathcal {V},\mathbb {R}^{m})} \left\| {\bar{C}}_n - C_n^{\mathrm{ML}}\right\| _{L(\mathcal {V},\mathcal {V})}\\&\le \left\| H\right\| ^2_{L(\mathcal {V},\mathbb {R}^{m})} \left\| {\bar{C}}_n - C_n^{\mathrm{ML}} \right\| _{\mathcal {V}\otimes \mathcal {V}}. \end{aligned} \end{aligned}$$

\(\square \)

The next step is to bound the Kalman gain error in terms of the covariance error.

Lemma 2

There exists a positive constant \({\tilde{c}}_n<\infty \), depending on \(\left\| H\right\| _{L(\mathcal {V},\mathbb {R}^{m})}\), \(|\Gamma ^{-1}|\), and \(\left\| \bar{K}_n \right\| _{L(\mathbb {R}^{m},\mathcal {V})}\), such that

$$\begin{aligned} \left\| K^{\mathrm{ML}}_n - \bar{K}_n\right\| _{L(\mathbb {R}^{m}, \mathcal {V})} \le {\tilde{c}}_n \left\| {C}^{\mathrm{ML}}_{n}-\bar{C}_{n} \right\| _{\mathcal {V}\otimes \mathcal {V}}. \end{aligned}$$

Proof

The proof of this lemma is similar to that of [23, Lemma 3.4]. For completeness, we have included a proof in “Appendix C”. \(\square \)

The next lemma bounds the distance between the prediction covariance matrices of MLEnKF and MFEnKF. For that purpose, let us first recall that the dynamics for the mean-field multilevel ensemble \(\{(\bar{v}^{\ell -1}_{n,i}, \bar{v}^{\ell }_{n,i})_{i=1}^{M_\ell }\}_{\ell =0}^L\) is described in Eqs. (21) and (22), and introduce the auxiliary matrix

$$\begin{aligned} \bar{C}^{\mathrm{ML}}_n \mathrel {\mathop :}=\sum _{\ell =0}^L \mathrm {Cov}_{M_\ell }[\bar{v}^\ell _n] - \mathrm {Cov}_{M_\ell }[\bar{v}^{\ell -1}_n]. \end{aligned}$$
(24)

Lemma 3

For any \(\varepsilon >0\), let L and \(\{M_\ell \}_{\ell =0}^L\) be defined as in Theorem 1. If Assumption 2 holds, then for any \(p\ge 2\) and \(n \in \mathbb {N}\),

$$\begin{aligned} \Vert C^{\mathrm{ML}}_n - \bar{C}_n\Vert _{L^p(\Omega , \mathcal {V}\otimes \mathcal {V})} \lesssim \varepsilon + \Vert C^{\mathrm{ML}}_n - \bar{C}^{\mathrm{ML}}_n\Vert _{L^p(\Omega , \mathcal {V}\otimes \mathcal {V})}. \end{aligned}$$
(25)

Proof

Introducing the auxiliary covariance matrix

$$\begin{aligned} \bar{C}^L_n \mathrel {\mathop :}=\mathrm {Cov}[\bar{v}^L_n] \end{aligned}$$

and using the triangle inequality,

$$\begin{aligned} \Vert C^{\mathrm{ML}}_n - \bar{C}_n\Vert _{p} \le \Vert \bar{C}^L_n - \bar{C}_n\Vert _{p} + \Vert \bar{C}^{\mathrm{ML}}_n - \bar{C}^L_n\Vert _{p} + \Vert C^{\mathrm{ML}}_n - \bar{C}^{\mathrm{ML}}_n \Vert _{p}. \end{aligned}$$

The result follows by Lemmas 4 and 5. \(\square \)

Lemma 4

For any \(\varepsilon >0\), let L be defined as in Theorem 1. If Assumption 2 holds, then for any \(n\in \mathbb {N}\) and \(p\ge 2\),

$$\begin{aligned}&\max \left\{ \Vert \bar{v}_{n}^\ell - \bar{v}_{n}\Vert _{L^p(\Omega , \mathcal {V})}, \Vert \hat{\bar{ v}}_{n}^\ell - \hat{\bar{ v}}_{n}\Vert _{L^p(\Omega , \mathcal {V})} \right\} \lesssim h^{\beta /2}_\ell , \end{aligned}$$
(26)
$$\begin{aligned}&\max \left\{ \Vert \bar{v}_{n}^\ell - \bar{v}_{n}^{\ell -1}\Vert _{L^p(\Omega , \mathcal {V})}, \Vert \hat{\bar{ v}}_{n}^\ell - \hat{\bar{ v}}_{n}^{\ell -1}\Vert _{L^p(\Omega , \mathcal {V})} \right\} \lesssim h_\ell ^{\beta /2}, \quad \forall \ell \in \mathbb {N}, \end{aligned}$$
(27)
$$\begin{aligned}&\Vert \bar{C}^L_n- \bar{C}_n\Vert _{\mathcal {V}\otimes \mathcal {V}} \lesssim \varepsilon . \end{aligned}$$
(28)

Proof

Recall that the initial data of the limiting mean-field method is given by \(\hat{\bar{ v}}_0\sim {\hat{\mu }}_0\) and that \(\hat{\bar{ v}}^\ell _0 =\Pi _\ell \hat{\bar{ v}}_0\), so that by Assumption 2(ii),

$$\begin{aligned} \Vert \hat{\bar{ v}}_0 - \hat{\bar{ v}}^\ell _0 \Vert _{L^p(\Omega ,\mathcal {V})} \lesssim \Vert \hat{\bar{ v}}_0 \Vert _{L^p(\Omega ,V)}h_\ell ^{\beta /2}. \end{aligned}$$

By Assumptions 1(i) and 2(i),

$$\begin{aligned} \Vert \bar{v}_n - \bar{v}^\ell _n \Vert _{L^p(\Omega ,\mathcal {V})} \lesssim \Vert \hat{\bar{ v}}_{n-1} - \hat{\bar{ v}}^\ell _{n-1} \Vert _{L^p(\Omega ,\mathcal {V})} + (1+\Vert \hat{\bar{ v}}_{n-1}\Vert _{L^p(\Omega ,V)} )h_\ell ^{\beta /2}, \end{aligned}$$

and by Proposition 4(iii),

$$\begin{aligned} \begin{aligned} \Vert \hat{\bar{ v}}_n - \hat{\bar{ v}}^\ell _n \Vert _{{L^p(\Omega ,\mathcal {V})}}&\le \left\| I-\bar{K}_{n}H\right\| _{L(\mathcal {V},\mathcal {V})} \Vert \bar{v}_{n}^\ell - \bar{v}_{n}\Vert _{{L^p(\Omega ,\mathcal {V})} }\\&\quad + \Vert (I-\Pi _\ell )\bar{K}_{n}(H\bar{v}_{n}^\ell + {\tilde{y}}_n) \Vert _{L^p(\Omega ,\mathcal {V})} \\&\le c \left( \Vert \bar{v}_{n}^\ell - \bar{v}_{n}\Vert _{L^p(\Omega ,\mathcal {V})} + \Vert (I-\Pi _\ell ){\bar{C}}_{n}\Vert _{\mathcal {V}\otimes \mathcal {V}} \right) \\&\lesssim \Vert \bar{v}_{n}^\ell - \bar{v}_{n}\Vert _{L^p(\Omega ,\mathcal {V})} + \Vert \Psi (\hat{\bar{ v}}_{n-1})\Vert _{L^2(\Omega , V)} h_\ell ^{\beta /2}. \end{aligned} \end{aligned}$$

Inequality (26) consequently holds by induction, and thus also (27) by the triangle inequality. To prove inequality (28), note that

$$\begin{aligned} \Vert \bar{C}^L_n- \bar{C}_n\Vert _{\mathcal {V}\otimes \mathcal {V}} \lesssim \Vert \bar{v}^L_n - \bar{v}_n\Vert _{L^2(\Omega ,\mathcal {V})}\left( \Vert \bar{v}^L_n\Vert _{L^2(\Omega ,\mathcal {V})} + \Vert \bar{v}_n\Vert _{L^2(\Omega ,\mathcal {V})}\right) \lesssim h_L^{\beta /2} \lesssim \varepsilon , \end{aligned}$$

where the last inequality follows from \(h_L \eqsim \kappa ^{-L/d}\) and the choice \(L = \lceil 2{d}\log _\kappa (\varepsilon ^{-1})/\beta \rceil \).
\(\square \)

We complete the proof of Lemma 3 by deriving the following bound for \(\Vert \bar{C}^{\mathrm{ML}}_n - \bar{C}^L_n\Vert _p\):

Lemma 5

(Multilevel i.i.d. sample covariance error) For any \(\varepsilon >0\), let L and \(\{M_\ell \}_{\ell =0}^L\) be defined as in Theorem 1. If Assumption 2 holds, then for any \(p\ge 2\) and \(n \in \mathbb {N}\),

$$\begin{aligned} \Vert \bar{C}^{\mathrm{ML}}_n - \bar{C}^L_n\Vert _{L^p(\Omega , \mathcal {V}\otimes \mathcal {V})} \lesssim \varepsilon , \end{aligned}$$

where we recall that \(\bar{C}^L_n \mathrel {\mathop :}=\mathrm {Cov}[\bar{v}^L_n]\).

Proof

Since the sample covariances in (24) are unbiased, \(\mathbb {E}[\bar{C}^{\mathrm{ML}}_n] = \mathrm {Cov}[\bar{v}^L_n] = \bar{C}^L_n\) by telescoping, and therefore

$$\begin{aligned} \Vert \bar{C}^{\mathrm{ML}}_n - \bar{C}^L_n\Vert _{p} = \Vert \bar{C}^{\mathrm{ML}}_n - \mathbb {E}[\bar{C}^{\mathrm{ML}}_n]\Vert _{p}. \end{aligned}$$

Next, introduce the linear centering operator \(\Upsilon :L^1(\Omega ,\mathcal {V}\otimes \mathcal {V}) \rightarrow L^1(\Omega ,\mathcal {V}\otimes \mathcal {V})\), defined by \(\Upsilon (X) \mathrel {\mathop :}=X - \mathbb {E}[X]\). Then, by Eq. (24) and the bilinearity of the sample covariance,

$$\begin{aligned} \bar{C}^{\mathrm{ML}}_n - \mathbb {E}[\bar{C}^{\mathrm{ML}}_n] = \sum _{\ell =0}^L \Upsilon \left( \mathrm {Cov}_{M_\ell }[\bar{v}^\ell _n, \Delta _\ell \bar{v}_n]\right) + \Upsilon \left( \mathrm {Cov}_{M_\ell }[\Delta _\ell \bar{v}_n, \bar{v}^{\ell -1}_n]\right) , \end{aligned}$$

where \(\Delta _\ell \bar{v}_n := \bar{v}_n^\ell - \bar{v}_n^{\ell -1}\) with the convention \(\bar{v}_n^{-1}=0\), and

$$\begin{aligned} \begin{aligned} \Upsilon \left( \mathrm {Cov}_{M_\ell }[\bar{v}^\ell _n, \Delta _\ell \bar{v}_n]\right)&= \mathrm {Cov}_{M_\ell }[\bar{v}^\ell _n, \Delta _\ell \bar{v}_n] - \mathrm {Cov}[\bar{v}^\ell _n, \Delta _\ell \bar{v}_n],\\ \Upsilon \left( \mathrm {Cov}_{ M_\ell }[\Delta _\ell \bar{v}_n, \bar{v}^{\ell -1}_n]\right)&= \mathrm {Cov}_{M_\ell }[\Delta _\ell \bar{v}_n, \bar{v}^{\ell -1}_n] - \mathrm {Cov}[\Delta _\ell \bar{v}_n, \bar{v}^{\ell -1}_n].\\ \end{aligned} \end{aligned}$$

By Lemmas 4 and 10 (the latter lemma is located in “Appendix A”),

$$\begin{aligned} \Vert \bar{C}^{\mathrm{ML}}_n - \bar{C}^L_n\Vert _{L^p(\Omega ,\mathcal {V}\otimes \mathcal {V})} \lesssim \sum _{\ell =0}^L M_\ell ^{-1/2} h_\ell ^{\beta /2} \lesssim \varepsilon . \end{aligned}$$

\(\square \)

We now turn to bounding the last term of the right-hand side of inequality (25).

Lemma 6

For any \(\varepsilon >0\), let L and \(\{M_\ell \}_{\ell =0}^L\) be defined as in Theorem 1. If Assumption 2 holds, then for any \(p\ge 2\) and \(n \in \mathbb {N}\),

$$\begin{aligned} \begin{aligned} \Vert C^{\mathrm{ML}}_n - \bar{C}^{\mathrm{ML}}_n\Vert _{L^p(\Omega , \mathcal {V}\otimes \mathcal {V})} \le&8 \sum _{l=0}^L \Vert v_{n}^{\ell } - \bar{v}_{n}^\ell \Vert _{L^{2p}(\Omega , \mathcal {V})} (\Vert v_{n}^{\ell }\Vert _{L^{2p}(\Omega , \mathcal {V})} + \Vert \bar{v}_{n}^{\ell }\Vert _{L^{2p}(\Omega ,\mathcal {V})}). \end{aligned} \end{aligned}$$
(29)

Proof

From the definitions of the sample covariance (7) and multilevel sample covariance (12), one obtains the bounds

$$\begin{aligned} \begin{aligned} \Vert C^{\mathrm{ML}}_n - \bar{C}^{\mathrm{ML}}_n\Vert _{p}&\le \sum _{\ell =0}^L \Big ( \Vert \mathrm {Cov}_{M_\ell }[v_n^\ell ] - \mathrm {Cov}_{M_\ell }[\bar{v}_n^\ell ]\Vert _{p} \\&\quad + \Vert \mathrm {Cov}_{M_\ell }[v_n^{\ell -1}] - \mathrm {Cov}_{M_\ell }[\bar{v}_n^{\ell -1}]\Vert _{p}\Big ), \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \left\| \mathrm {Cov}_{M_\ell }[v_n^\ell ] - \mathrm {Cov}_{M_\ell }[\bar{v}_n^\ell ]\right\| _{p}&\le \frac{M_\ell }{M_\ell -1}\left\| E_{M_\ell }[v_n^\ell \otimes v_n^\ell ] - E_{M_\ell }[\bar{v}_n^\ell \otimes \bar{v}_n^\ell ]\right\| _{p} \\&\quad + \frac{M_\ell }{M_\ell -1} \left\| E_{M_\ell }[v_n^\ell ] \otimes E_{M_\ell }[ v_n^\ell ] - E_{M_\ell }[\bar{v}_n^\ell ] \otimes E_{M_\ell }[\bar{v}_n^\ell ]\right\| _{p}\\&=\mathrel {\mathop :}I_{1} + I_{2}. \end{aligned} \end{aligned}$$

The bilinearity of the sample covariance yields that

$$\begin{aligned} I_1 \le 2\left\| E_{M_\ell }[(v_n^\ell - \bar{v}_n^\ell ) \otimes v_n^\ell ] \right\| _{p} + 2\left\| E_{M_\ell }[\bar{v}_n^\ell \otimes (v_n^\ell - \bar{v}_n^\ell ) ]\right\| _{p} \end{aligned}$$
(30)

and

$$\begin{aligned} I_2 \le 2\left\| E_{M_\ell }[(v_n^\ell - \bar{v}_n^\ell )] \otimes E_{M_\ell }[v_n^\ell ] \right\| _{p} + 2\left\| E_{M_\ell }[\bar{v}_n^\ell {]} \otimes E_{M_\ell }[(v_n^\ell - \bar{v}_n^\ell ) ]\right\| _{p}. \end{aligned}$$

For bounding \(I_1\) we use Jensen’s and Hölder’s inequalities:

$$\begin{aligned} \left\| E_{M_\ell }[(v_n^\ell - \bar{v}_n^\ell ) \otimes v_n^\ell ] \right\| _{p} \le \left\| (v_n^\ell - \bar{v}_n^\ell ) \otimes v_n^\ell \right\| _{p} \le \left\| v_n^\ell - \bar{v}_n^\ell \right\| _{2p} \left\| v_n^\ell \right\| _{2p}. \end{aligned}$$

The second summand of inequality (30) is bounded similarly, and we obtain

$$\begin{aligned} I_1 \le 2\left\| v_n^\ell - \bar{v}_n^\ell \right\| _{2p} \left( \left\| v_n^\ell \right\| _{2p} + \left\| \bar{v}_n^\ell \right\| _{2p} \right) . \end{aligned}$$

The \(I_2\) term can also be bounded with similar steps as in the preceding argument so that also

$$\begin{aligned} I_2 \le 2\left\| v_n^\ell - \bar{v}_n^\ell \right\| _{2p} \left( \left\| v_n^\ell \right\| _{2p }+ \left\| \bar{v}_n^\ell \right\| _{2p} \right) . \end{aligned}$$

The proof is finished by summing the contributions of \(I_1\) and \(I_2\) over all levels. \(\square \)
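To make the estimator concrete, the following is a minimal numpy sketch (ours, not from the paper) of the multilevel sample covariance (12) in truncated spectral coordinates; the storage layout `pairs[l] = (coarse, fine)` and the zero-padding embedding of the levels into the finest space are illustrative assumptions.

```python
import numpy as np

def sample_cov(X):
    """Unbiased sample covariance Cov_M[X] of an (M, N) array of M particles."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def multilevel_sample_cov(pairs, N):
    """Multilevel sample covariance, cf. (12):
    C^ML = sum_l ( Cov_{M_l}[v^l] - Cov_{M_l}[v^{l-1}] ),
    where pairs[l] = (coarse, fine) are the coupled level-l ensembles of
    shapes (M_l, N_{l-1}) and (M_l, N_l); coarse is None on level 0.
    Each level is embedded into the finest space R^N by zero padding."""
    C = np.zeros((N, N))
    for coarse, fine in pairs:
        Cf = sample_cov(fine)
        C[: Cf.shape[0], : Cf.shape[1]] += Cf
        if coarse is not None:
            Cc = sample_cov(coarse)
            C[: Cc.shape[0], : Cc.shape[1]] -= Cc
    return C
```

Note that each level contributes its own independent ensemble, exactly as in the decomposition into \(\Upsilon \)-centered level differences used in the proof of Lemma 5.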

The propagation of error in update steps of MLEnKF is governed by the magnitude \(\Vert {\bar{C}}_n - C_n^{\mathrm{ML}}\Vert _p\), i.e., the distance between the MFEnKF prediction covariance and the MLEnKF prediction covariance. The next lemma makes use of Lemma 6 to bound the distance between the mean-field multilevel ensemble \(\{(\hat{\bar{ v}}^{\ell -1}_{n,i},\hat{\bar{ v}}^{\ell }_{n,i})_{i=1}^{M_\ell }\}_{\ell =0}^L\) and the MLEnKF ensemble \(\{(\hat{v}^{\ell -1}_{n,i}, \hat{v}^{\ell }_{n,i})_{i=1}^{M_\ell }\}_{\ell =0}^L\).

Lemma 7

(Distance between ensembles) For any \(\varepsilon >0\), let L and \(\{M_\ell \}_{\ell =0}^L\) be defined as in Theorem 1. If Assumption 2 holds, then for any \(p\ge 2\) and \(n \in \mathbb {N}\),

$$\begin{aligned} \sum _{\ell =0}^L \Vert \hat{v}_{n}^{\ell }- \hat{\bar{ v}}_{n}^{\ell }\Vert _{L^p(\Omega , \mathcal {V})} \lesssim |\log (\varepsilon )|^n\varepsilon . \end{aligned}$$
(31)

Proof

The proof is similar to that of [23, Lemma 3.10]. For completeness, a proof is given in “Appendix C”. \(\square \)

With this bound between MLEnKF and its multilevel MFEnKF shadow, which conveniently for the analysis consists of independent particles, we are finally ready to prove the main result.

Proof of Theorem 1

By the triangle inequality,

$$\begin{aligned} \Vert {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ]\Vert _{p}&\le \Vert {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ] - {\hat{\bar{\mu }}}^{\mathrm{ML}}_n [\varphi ]\Vert _{p} + \Vert {\hat{\bar{\mu }}}^{\mathrm{ML}}_n [\varphi ] - {\hat{\bar{\mu }}}^L_n[\varphi ]\Vert _{p} \nonumber \\&\quad + \Vert {\hat{\bar{\mu }}}^L_n[\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ]\Vert _{p}, \end{aligned}$$
(32)

where \({\hat{\bar{\mu }}}^{\mathrm{ML}}_n\) denotes the empirical measure associated to the mean-field multilevel ensemble \(\{(\hat{\bar{ v}}^{\ell -1}_{n,i}, \hat{\bar{ v}}^{\ell }_{n,i})_{i=1}^{M_\ell }\}_{\ell =0}^L\), and \({\hat{\bar{\mu }}}^L_n\) denotes the probability measure associated to \(\hat{\bar{ v}}^L_n\). The first two summands on the right-hand side above relate to the statistical error, whereas the last relates to the bias.

By the Lipschitz continuity of the QoI \(\varphi \), the triangle inequality, Lemma 7, and using the conventions \(\varphi (\hat{v}_n^{-1})=0\) and \(\varphi (\hat{\bar{ v}}_n^{-1})=0\), the first term satisfies the following bound

$$\begin{aligned} \begin{aligned} \left\| {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ] - {\hat{\bar{\mu }}}^{\mathrm{ML}}_n [\varphi ]\right\| _{p}&= \left\| \sum _{\ell =0}^L E_{M_\ell }\big [\varphi (\hat{v}_n^{\ell }) - \varphi (\hat{v}_n^{\ell -1}) - (\varphi (\hat{\bar{ v}}_n^\ell ) - \varphi (\hat{\bar{ v}}_n^{\ell -1}))\big ] \right\| _{p}\\&\le \sum _{\ell =0}^L \left( \left\| \varphi (\hat{v}_n^{\ell }) - \varphi (\hat{\bar{ v}}_n^\ell ) \right\| _{p} + \left\| \varphi (\hat{v}_n^{\ell -1}) - \varphi (\hat{\bar{ v}}_n^{\ell -1}) \right\| _{p} \right) \\&\le c_\varphi \sum _{\ell =0}^L \left( \left\| \hat{v}_n^{\ell } - \hat{\bar{ v}}_n^\ell \right\| _{p} + \left\| \hat{v}_n^{\ell -1} - \hat{\bar{ v}}_n^{\ell -1} \right\| _{p} \right) \\&\lesssim |\log (\varepsilon )|^n\varepsilon . \end{aligned} \end{aligned}$$

For the second summand of (32), we employ the telescoping property

$$\begin{aligned} {\hat{\bar{\mu }}}^L_n[\varphi ] = \mathbb {E}[\varphi (\hat{\bar{ v}}^L_n)] = \sum _{\ell =0}^L \mathbb {E}\big [\varphi (\hat{\bar{ v}}^\ell _n) - \varphi (\hat{\bar{ v}}^{\ell -1}_n)\big ], \end{aligned}$$

and Lemmas 4 and 9 to obtain

$$\begin{aligned} \Vert {\hat{\bar{\mu }}}^{\mathrm{ML}}_n [\varphi ] - {\hat{\bar{\mu }}}^L_n[\varphi ]\Vert _{p} \lesssim \sum _{\ell =0}^L M_\ell ^{-1/2} \Vert \varphi (\hat{\bar{ v}}^\ell _n) - \varphi (\hat{\bar{ v}}^{\ell -1}_n)\Vert _{p} \lesssim \sum _{\ell =0}^L M_\ell ^{-1/2} h_\ell ^{\beta /2} \lesssim \varepsilon . \end{aligned}$$
Finally, the bias term in (32) satisfies

$$\begin{aligned} \Vert {\hat{\bar{\mu }}}^L_n[\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ]\Vert _{p} = \big | \mathbb {E}[\varphi (\hat{\bar{ v}}^L_n)] - \mathbb {E}[\varphi (\hat{\bar{ v}}_n)] \big | \le c_\varphi \Vert \hat{\bar{ v}}^L_n - \hat{\bar{ v}}_n\Vert _{L^1(\Omega ,\mathcal {V})} \lesssim h_L^{\beta /2} \lesssim \varepsilon , \end{aligned}$$
(33)

where the last step follows from the Lipschitz continuity of the QoI and Lemma 4. \(\square \)

Remark 5

Theorem 1 shows the cost-to-accuracy performance of MLEnKF with a disconcerting logarithmic penalty factor in (19) that grows geometrically in n. The same penalty appears in the work [23], yet the numerical experiments there indicate a rate of convergence that is uniform in n. The discrepancy between theory and practice may be an artifact of the conservative bounds used in the proof of said theorem. By imposing further regularity constraints on the dynamics and the QoI, we were able to obtain an error bound without said logarithmic penalty factor for an alternative finite-dimensional-state-space MLEnKF method with local-level Kalman gains [24]. As an alternative to imposing further regularity constraints, we also suspect that ergodicity of the MFEnKF process may be used to avoid the geometrically growing logarithmic penalty factor. Recently, there has been much work on the stability of EnKF [12, 13, 46].

We conclude this section with a result on the cost-to-accuracy performance of EnKF. It shows that MLEnKF generally outperforms EnKF.

Theorem 2

(EnKF accuracy vs. cost) Consider a Lipschitz continuous QoI \(\varphi :\mathcal {V}\rightarrow \mathbb {R}\), and suppose Assumption 2 holds. For a given \(\varepsilon >0\), let L and M be defined under the respective constraints \(L = \lceil 2{d} \log _\kappa (\varepsilon ^{-1})/\beta \rceil \) and \( M \eqsim \varepsilon ^{-2}\). Then, for any \(n \in \mathbb {N}\) and \(p \ge 2\),

$$\begin{aligned} \Vert \hat{\mu }^{\mathrm{MC}}_n [\varphi ] - \hat{\bar{\mu }}_n [\varphi ] \Vert _{L^p(\Omega )} \lesssim \varepsilon , \end{aligned}$$

where \(\hat{\mu }^{\mathrm{MC}}_n\) denotes the EnKF empirical measure, cf. Eq. (8), with particle evolution given by the EnKF predict and update formulae at resolution level L (i.e., using the numerical approximation \(\Psi ^L\) in the prediction and the projection operator \(\Pi _L\) in the update).

The computational cost of the EnKF estimator

$$\begin{aligned} \mathrm {Cost}\left( \hat{\mu }^{\mathrm{MC}}_n [\varphi ]\right) := M \mathrm {Cost}\left( \Psi ^L\right) \end{aligned}$$

satisfies

$$\begin{aligned} \mathrm {Cost}\left( \hat{\mu }^{\mathrm{MC}}_n [\varphi ]\right) \eqsim \varepsilon ^{-2(1+(d\gamma _x{+\gamma _t})/\beta )}. \end{aligned}$$
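For the reader’s convenience, the cost rate can be read off as follows (our annotation, using only the definitions above): the choice of L equilibrates the bias so that \(h_L^{\beta /2} \eqsim \varepsilon \), i.e., \(h_L \eqsim \varepsilon ^{2/\beta }\), whence

$$\begin{aligned} \mathrm {Cost}\left( \Psi ^L\right) \eqsim h_L^{-(d\gamma _x+\gamma _t)} \eqsim \varepsilon ^{-2(d\gamma _x+\gamma _t)/\beta } \quad \text {and} \quad M \, \mathrm {Cost}\left( \Psi ^L\right) \eqsim \varepsilon ^{-2(1+(d\gamma _x+\gamma _t)/\beta )}. \end{aligned}$$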

Sketch of proof

By the triangle inequality,

$$\begin{aligned} \begin{aligned} \Vert \hat{\bar{\mu }}_n[\varphi ] -\hat{\mu }^{\mathrm{MC}}_n [\varphi ] \Vert _{L^p(\Omega )}&\le \left\| \hat{\bar{\mu }}_n[\varphi ] - \hat{\bar{\mu }}_n^L [\varphi ]\right\| _{L^p(\Omega )} +\left\| \hat{\bar{\mu }}^L_n[\varphi ] - \hat{\bar{\mu }}_n^{\mathrm{MC}} [\varphi ] \right\| _{L^p(\Omega )} \\&\quad + \left\| \hat{\bar{\mu }}_n^{\mathrm{MC}} [\varphi ] -\hat{\mu }^{\mathrm{MC}}_n [\varphi ]\right\| _{L^p(\Omega )} =\mathrel {\mathop :}I + II + III, \end{aligned} \end{aligned}$$

where \(\hat{\bar{\mu }}_n^{\mathrm{MC}}\) denotes the empirical measure associated to the EnKF ensemble \(\{\hat{\bar{ v}}^L_{n,i} \}_{i=1}^M\) and \(\hat{\bar{\mu }}^{L}_n\) denotes the empirical measure associated to \(\hat{\bar{ v}}^L_n\). It follows by inequality (33) that \(I \lesssim \varepsilon \).

For the second term, the Lipschitz continuity of the QoI \(\varphi \) implies there exists a positive scalar \(c_\varphi \) such that \(|\varphi (x)| \le c_\varphi (1+\left\| x \right\| _{\mathcal {V}})\). Since \(\hat{\bar{v}}_n^L \in L^p(\Omega , V)\) for any \(n \in \mathbb {N}\) and \(p\ge 2\), it follows by Lemma 9 (on the Hilbert space \(\mathbb {R}\)) that

$$\begin{aligned} II = \left\| \hat{\bar{\mu }}^L_n[\varphi ] - \hat{\bar{\mu }}_n^{\mathrm{MC}} [\varphi ] \right\| _{L^p(\Omega )} \lesssim M^{-1/2} \lesssim \varepsilon . \end{aligned}$$
For the last term, let us first assume that for any \(p \ge 2\) and \(n \in \mathbb {N}\),

$$\begin{aligned} \left\| \hat{v}_{n}^L - \hat{\bar{ v}}_{n}^L \right\| _{L^p(\Omega ,\mathcal {V})} \lesssim \varepsilon , \end{aligned}$$
(34)

for the single particle dynamics \(\hat{v}_{n,1}^L\) and \(\hat{\bar{ v}}_{n,1}^L\) respectively associated to the EnKF ensemble \(\{\hat{v}_{n,i}^L \}_{i=1}^M\) and the mean-field EnKF ensemble \(\{\hat{\bar{ v}}_{n,i}^L\}_{i=1}^M\). Then the Lipschitz continuity of \(\varphi \), the fact that (assuming (34)) \(\hat{v}_{n,1}^L, \hat{\bar{ v}}_{n,1}^L \in L^p(\Omega , V)\) for any \(n\in \mathbb {N}\) and \(p\ge 2\), and the triangle inequality yield that

$$\begin{aligned} \begin{aligned} III = \left\| E_{M}[ \varphi (\hat{v}_n^L) - \varphi (\hat{\bar{ v}}_n^L)] \right\| _{L^p(\Omega )} \le c_\varphi \left\| \hat{v}_n^L - \hat{\bar{ v}}_n^L \right\| _{L^p(\Omega ,\mathcal {V})} \lesssim \varepsilon . \end{aligned} \end{aligned}$$

All that remains is to verify (34), which we omit, as it follows by steps similar to those in the proof of inequality (31). \(\square \)

5 MLEnKF-adapted numerical methods for a class of stochastic partial differential equations

In this section we develop an MLEnKF-adapted version of the exponential Euler method for solving a family of stochastic reaction–diffusion equations. For a relatively large class of problems, we derive an \(L^p(\Omega , \mathcal {V})\)-convergence rate \(\beta \) for pairwise coupled numerical solutions, cf. Assumption 2, which will be needed when implementing MLEnKF.

5.1 The stochastic reaction–diffusion equation

We consider the following stochastic partial differential equation (SPDE)

$$\begin{aligned} \mathrm {d}u(t) = \big ( Au(t) + f(u(t)) \big ) \, \mathrm {d}t + B \, \mathrm {d}W(t), \quad t \in (0,T], \qquad u(0) = u_0, \end{aligned}$$
(35)

where \(T>0\), and the reaction f, the cylindrical Wiener process W and the linear smoothing operator B will be described below. Our base-space is \(\mathcal {K}= L^2(0,1)\); we denote by \(A: D(A) =H^2(0,1) \cap H^1_0(0,1)\rightarrow \mathcal {K}\) the Laplace operator \(\Delta \) with homogeneous Dirichlet boundary conditions, and \(H^k(0,1)\) denotes the Sobolev space of order \(k \in \mathbb {N}\). A spectral decomposition of \(-A\) yields the sequence of eigenpairs \(\{(\lambda _j, \phi _j)\}_{j \in \mathbb {N}}\), where \(-A \phi _j = \lambda _j \phi _j\) with \(\phi _j \mathrel {\mathop :}=\sqrt{2} \sin (j\pi x)\) and \(\lambda _j =\pi ^2 j^2\). Since \(\mathcal {K}= \overline{\text {span}\{\phi _j\}}\), it follows that

$$\begin{aligned} Av = \sum _{j \in \mathbb {N}} -\lambda _j \langle \phi _j, v \rangle _{\mathcal {K}} \phi _j, \quad \forall v \in D(A), \end{aligned}$$

and eigenpairs of the spectral decomposition give rise to the following family of Hilbert spaces parametrized over \(r \in \mathbb {R}\):

$$\begin{aligned} \begin{aligned} \mathcal {K}_r \mathrel {\mathop :}=D((-A)^{r}) = \Big \{v:[0,1] \rightarrow \mathbb {R}\mid&v \text { is } {\mathcal {B}}([0,1])/{\mathcal {B}}(\mathbb {R})\text {-measurable} \\&\text {and } \sum _{j \in \mathbb {N}} \lambda ^{2r}_j \left| \langle \phi _j, v \rangle _{\mathcal {K}} \right| ^2 < \infty \Big \}, \end{aligned} \end{aligned}$$

with norm \(\Vert \cdot \Vert _{\mathcal {K}_r} \mathrel {\mathop :}=\Vert (-A)^{r}(\cdot )\Vert _{\mathcal {K}}\). Associated with the probability space \((\Omega , \mathcal {F}, \mathbb {P})\) and normal filtration \(\{\mathcal {F}_t\}_{t \in [0,T]}\), the \(I_{\mathcal {K}}\)-cylindrical Wiener process is defined by

$$\begin{aligned} W(t, \cdot ) = \sum _{j \in \mathbb {N}} W_j(t)\phi _j, \end{aligned}$$

where \(\{W_j : [0,T] \times \Omega \rightarrow \mathbb {R}\}_{j \in \mathbb {N}}\) is a sequence of independent \(\mathcal {F}_t /\mathcal {B}(\mathbb {R})\)-adapted standard Wiener processes. The smoothing operator is defined by

$$\begin{aligned} B \mathrel {\mathop :}=\sum _{j \in \mathbb {N}} \lambda _j^{-b} \phi _j \otimes \phi _j, \end{aligned}$$
(36)

with the smoothing parameter \(b \ge 0\). It may be shown that \(B \mathcal {K}_r = \mathcal {K}_{r+b}\), which means that B becomes progressively more smoothing the larger the value of b.

In the remaining part of this section, we assume that the following conditions on the nested Hilbert spaces \(V\subset \mathcal {V}\), and regularity conditions on the initial data \(u_0\) and the reaction term f, hold:

Assumption 3

The Hilbert spaces \(V \subset \mathcal {V}\) are of the form \(\mathcal {V}= \mathcal {K}_{r_1}\) and \(V = \mathcal {K}_{r_2}\) for a pair of parameters \(r_1,r_2 \in \mathbb {R}\) satisfying

$$\begin{aligned} \max \{0,b-1/4\} \le r_1< r_2 < b +1/4, \end{aligned}$$

the initial data \(u_0\) is \(\mathcal {F}_0 /\mathcal {B}(V)\)-measurable and \( u_0\in \cap _{p\ge 2} L^p(\Omega , V)\), and the reaction satisfies

$$\begin{aligned} f \in {\mathrm {Lip}}(\mathcal {K}_{r_1}) \mathrel {\mathop :}=\left\{ g \in C(\mathcal {K}_{r_1}, \mathcal {K}_{r_1}) \, \Big | \, \sup _{u,v \in \mathcal {K}_{r_1}, u \ne v} \frac{ \Vert g(u) - g(v)\Vert _{\mathcal {K}_{r_1}}}{\Vert u-v\Vert _{\mathcal {K}_{r_1}}} < \infty \right\} . \end{aligned}$$

Under Assumption 3 there exists a unique (up to modifications) \((\Omega , \mathcal {F}, \mathbb {P}, \{\mathcal {F}_t\}_{t \in [0,T]})\)-mild solution of (35), which in this setting corresponds to a mapping \(u:[0,T]\times [0,1] \times \Omega \rightarrow \mathbb {R}\) that is an \(\mathcal {F}_t/\mathcal {B}(\mathcal {K}_{r_2})\)-predictable stochastic process satisfying

$$\begin{aligned} u(t) = e^{At}u(0) + \int _0^t e^{A(t-s)} f(u(s)) ds + \int _0^t e^{A(t-s)} B dW_s \end{aligned}$$
(37)

\(\mathbb {P}\)-almost surely for all \(t\in [0,T]\). Moreover, for any \(p \ge 2\) and \(r \in [r_1,r_2]\), it holds that

$$\begin{aligned} \Vert u(T,\cdot )\Vert _{L^p(\Omega , \mathcal {K}_r )} \le C(1+\Vert u_0\Vert _{L^p(\Omega , \mathcal {K}_r)}), \end{aligned}$$
(38)

where \(C>0\) depends on p, r, and T, cf. [28].

Remark 6

The homogeneous Dirichlet boundary conditions imposed in (35) only make pointwise sense provided \(u(t,\cdot ) \in \mathcal {K}_{1/2+\delta }\) for some \(\delta >0\) and all \(t \in (0,T]\), \(\mathbb {P}\)-almost surely. In lower-regularity settings, e.g., when \(u(t,\cdot ) \notin C(0,1)\), said boundary condition should be interpreted in the mild rather than the pointwise sense.

5.2 The filtering problem

We consider a discrete-time filtering problem of the form (1) and (2) with the above SPDE as the underlying model, where \(\Psi (u_n)\) denotes the mild solution of (35) at time \(T>0\) given the initial data \(u_n \in \cap _{p \ge 2} L^p(\Omega , V)\). The underlying dynamics at observation times \(0,T,2T, \ldots \) is thus given by

$$\begin{aligned} u_{n+1} = \Psi (u_{n}), \end{aligned}$$

and the finite-dimensional partial observation of \(u_n\) at time nT is given by

$$\begin{aligned} y_n = H u_n + \eta _n, \quad \eta _n \sim N(0,\Gamma ) ~~ {\mathrm{i.i.d.}} \perp u_n, \end{aligned}$$
(39)

where \(H u = [H_1(u), \dots , H_m(u)]^{\mathsf {T}}\in \mathbb {R}^m\) with \(H_i \in \mathcal {V}^*\) for \(i=1,2,\ldots , m\). Note further that Assumption 3 implies that \(V\subset \mathcal {K}_0=\mathcal {K}\), so we may represent the mild solution in the basis \(\{\phi _j\}\) at any observation time nT:

$$\begin{aligned} u_{n} = \sum _{j \in \mathbb {N}} u_{n}^{(j)} \phi _j, \quad \text {where} \quad u_{n}^{(j)} \mathrel {\mathop :}=\langle u_{n}, \phi _j \rangle _{\mathcal {K}}. \end{aligned}$$

5.3 Spatial truncation

Before introducing a fully discrete MLEnKF approximation method for the filtering problem, let us warm up by having a quick look at an exact-in-time-truncated-in-space approximation method. It consists of the hierarchy of subspaces \(\mathcal {V}_\ell = \mathcal {P}_\ell \mathcal {V}= \text {span}(\{\phi _{k}\}_{k=1}^{N_\ell })\), \(\Pi _\ell = \mathcal {P}_\ell \) and

$$\begin{aligned} \Psi ^\ell (u_n) \mathrel {\mathop :}=\mathcal {P}_\ell u_{n+1} = \sum _{j=1}^{N_\ell } u_{n+1}^{(j)} \phi _j, \quad \text {for any} \quad u_n \in \mathcal {V}. \end{aligned}$$

To verify that this approximation method can be used in the MLEnKF framework, it remains to verify Assumptions 1 and 2(i)–(ii), and to determine the rate parameter \(\beta >0\). Equation (37), the regularity \(f \in {\mathrm{Lip}}(\mathcal {K}_{r_1})\), the inequality

$$\begin{aligned} \sup _{v \in \mathcal {K}_{r}\setminus \{0\} } \frac{ \Vert e^{At} v \Vert _{\mathcal {K}_r}}{\Vert v\Vert _{\mathcal {K}_r}} \le 1, \quad \text {for all} \quad r \in \mathbb {R}\quad \text {and} \quad t \ge 0, \end{aligned}$$
(40)

and Jensen’s inequality imply that for any \(p\ge 2\), there exists a \(C>0\) such that

$$\begin{aligned} \begin{aligned}&\Vert \Psi ^\ell (u_0) - \Psi ^\ell (v_0)\Vert _{L^p(\Omega ,\mathcal {V})} \le \Vert \Psi (u_0)- \Psi (v_0)\Vert _{L^p(\Omega ,\mathcal {V})} \\&\le \Vert u_0 -v_0\Vert _{L^p(\Omega ,\mathcal {V})} + \left\| \int _{0}^T e^{A(T-s)} (f(u(s))-f(v(s))) ds \right\| _{L^p(\Omega ,\mathcal {V})} \\&\le \Vert u_0 -v_0\Vert _{L^p(\Omega ,\mathcal {V})} + \int _{0}^T \Vert f(u(s))-f(v(s))\Vert _{L^p(\Omega ,\mathcal {V})} ds \\&\le \Vert u_0 -v_0\Vert _{L^p(\Omega ,\mathcal {V})} + C \int _{0}^T \Vert u(s)-v(s)\Vert _{L^p(\Omega , \mathcal {V})} ds. \end{aligned} \end{aligned}$$

Hence by Gronwall’s inequality,

$$\begin{aligned} \Vert \Psi ^\ell (u_0) - \Psi ^\ell (v_0)\Vert _{L^p(\Omega ,\mathcal {V})}\le C \Vert u_0-v_0\Vert _{L^p(\Omega , \mathcal {V})}, \end{aligned}$$

which verifies Assumption 1(i). Assumption 1(ii) follows from (38). To verify that Assumption 2(i) holds with rate \(\beta = 4(r_2-r_1)\), observe that for any \(p\ge 2\),

$$\begin{aligned} \begin{aligned} \Vert \Psi ^\ell (u_0) - \Psi (u_0) \Vert _{L^p(\Omega , \mathcal {V})}&= \Vert \sum _{j>N_\ell } \lambda _j^{r_1} u_1^{(j)} \phi _j \Vert _{L^p(\Omega , \mathcal {K})} \\&\le \lambda _{N_\ell }^{-(r_2-r_1)} \Vert \sum _{j> N_\ell } \lambda _{j}^{r_2} u_1^{(j)} \phi _j \Vert _{L^p(\Omega , \mathcal {K})} \\&\lesssim N_\ell ^{-2(r_2-r_1)} \Vert u_1 \Vert _{L^p(\Omega , V)} \\&\lesssim (1 + \Vert u_0 \Vert _{L^p(\Omega , V)}) h_\ell ^{2(r_2-r_1)}, \end{aligned} \end{aligned}$$
(41)

where the last inequality follows from (38) and \(h_\ell \eqsim N_\ell ^{-1}\). Assumption 2(ii) follows by a similar shift-space argument. (Relating to Assumption 2(iii), we leave the question of the computational cost of this method open for the time being, but see Sect. 6.2 for treatment in one example.)

5.4 A fully-discrete approximation method

The fully-discrete approximation method consists of Galerkin approximation in space and numerical integration in time by the exponential Euler scheme, cf. [29, 30]. Given a timestep \(\Delta t_\ell = T/J_\ell \), let \(\{U_{\ell , k} \}_{k=0}^{J_\ell } \subset \mathcal {V}_\ell \) with \(U_{\ell ,0} = \mathcal {P}_\ell u_0\) denote the numerical approximation of the SPDE (35) on level \(\ell \). It is given by the scheme

$$\begin{aligned} \begin{aligned} U_{\ell ,k+1}&= e^{A_\ell \Delta t_\ell } U_{\ell ,k} + A_\ell ^{-1} \left( e^{A_\ell \Delta t_\ell } -I \right) f_\ell (U_{\ell , k}) \\&\quad + \underbrace{\mathcal {P}_\ell \int _{k\Delta t_\ell }^{(k+1)\Delta t_\ell } e^{A ((k+1)\Delta t_\ell -s)} B dW(s) }_{=\mathrel {\mathop :}R_{\ell ,k}}, \end{aligned} \end{aligned}$$

where \(A_\ell \mathrel {\mathop :}=\mathcal {P}_\ell A\) and \(f_\ell \mathrel {\mathop :}=\mathcal {P}_\ell f\). The jth mode of the scheme \(U^{(j)}_{\ell ,k} \mathrel {\mathop :}=\langle U_{\ell , k}, \phi _j\rangle _{\mathcal {K}}\) for \(j =1,2,\ldots , N_\ell \), is given by

$$\begin{aligned} U_{\ell , k+1}^{(j)} = e^{-\lambda _j \Delta t_\ell } U_{\ell ,k}^{(j)} + \frac{1- e^{-\lambda _j \Delta t_\ell }}{\lambda _j} \left( f_\ell (U_{\ell , k} ) \right) ^{(j)} + R_{\ell ,k}^{(j)}, \end{aligned}$$
(42)

for \(k=0,1,\ldots , J_\ell -1\) with i.i.d.

$$\begin{aligned} R_{\ell ,k}^{(j)} \sim N\left( 0, \frac{1-e^{-2\lambda _j \Delta t_\ell }}{2\lambda _j^{1+2b}} \right) , \end{aligned}$$
(43)

for \(j \in \{1,2,\ldots , N_\ell \}\), \(k \in \{0,1,\ldots , J_\ell -1\}\) and \(\ell = 0,1, \ldots \). In view of the mode-wise numerical solution, the \(\ell \)th level solution operator for the fully-discrete approximation method is defined by

$$\begin{aligned} {\widetilde{\Psi }}^\ell (u_0) \mathrel {\mathop :}=\sum _{j=1}^{N_\ell } U_{\ell , J_\ell }^{(j)} \phi _j. \end{aligned}$$
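For concreteness, here is a minimal numpy sketch (ours, not the authors’ implementation) of one step of the scheme (42)–(43) in spectral coordinates; the routine `f_spec`, the parameter values, and the linear-reaction usage below are illustrative assumptions.

```python
import numpy as np

def exp_euler_step(u, f_spec, lam, dt, b, rng):
    """One exponential Euler step, cf. (42): u holds the N_l spectral
    coefficients U_{l,k}^{(j)}, lam the eigenvalues lambda_j of -A, and
    f_spec maps spectral coefficients to the modes of the reaction f."""
    decay = np.exp(-lam * dt)
    # i.i.d. stochastic-convolution increments R_{l,k}^{(j)}, cf. (43)
    var = (1.0 - np.exp(-2.0 * lam * dt)) / (2.0 * lam ** (1 + 2 * b))
    R = rng.normal(0.0, np.sqrt(var))
    return decay * u + (1.0 - decay) / lam * f_spec(u) + R

# illustrative usage: Dirichlet eigenvalues lambda_j = (pi j)^2, f(u) = u
N, dt, b = 64, 1.0 / 64, 0.5
lam = (np.pi * np.arange(1, N + 1)) ** 2
rng = np.random.default_rng(0)
u = rng.normal(size=N)
for _ in range(int(1.0 / dt)):  # integrate up to time T = 1
    u = exp_euler_step(u, lambda v: v, lam, dt, b, rng)
```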

5.4.1 Coupling of levels

For a hierarchy of temporal resolutions \(\{\Delta t_\ell = T/J_\ell \}\) with \(J_\ell = 2^\ell J_0\), pairwise correlated solutions \(({\widetilde{\Psi }}^{\ell -1}(u_0), \widetilde{\Psi }^\ell (u_0))\) are obtained through first generating the fine-level driving noise \(\{R_{\ell ,k}\}_{k}\) and the solution \(\widetilde{\Psi }^\ell (u_0)\) by (42) and thereafter computing the coarse level solution conditioned on \(\{R_{\ell ,k}\}_{k}\). Since \(J_\ell = 2J_{\ell -1}\), it follows that

$$\begin{aligned} \begin{aligned}&\mathcal {P}_{\ell -1} \int _{k\Delta t_{\ell -1}}^{(k+1)\Delta t_{\ell -1}} e^{A ((k+1)\Delta t_{\ell -1} -s)} BdW(s) \\&\quad = e^{A_{\ell -1}\Delta t_\ell } \mathcal {P}_{\ell -1} \int _{2k\Delta t_{\ell }}^{(2k+1)\Delta t_{\ell }} e^{A ((2k+1)\Delta t_{\ell } -s)} B dW(s) \\&\qquad + \mathcal {P}_{\ell -1} \int _{(2k+1)\Delta t_{\ell }}^{2(k+1)\Delta t_{\ell }} e^{A ((k+1)\Delta t_{\ell -1} -s)} B dW(s). \end{aligned} \end{aligned}$$

Consequently

$$\begin{aligned} \begin{aligned} R_{\ell -1,k} \Big | (R_{\ell ,2k}, R_{\ell ,2k+1})&= e^{A_{\ell -1} \Delta t_\ell } \mathcal {P}_{\ell -1}R_{\ell ,2k} + \mathcal {P}_{\ell -1} R_{\ell ,2k+1}, \end{aligned} \end{aligned}$$

and the conditional coarse-level solution \({\widetilde{\Psi }}^{\ell -1}(u_0) |\{R_{\ell ,k}\}_{k}\) is of the form

$$\begin{aligned} \begin{aligned} U_{\ell -1, k+1}&= e^{A_{\ell -1} \Delta t_{\ell -1} } U_{\ell -1,k} + A_{\ell -1}^{-1} \left( e^{A_{\ell -1} \Delta t_{\ell -1}} - I \right) f_{\ell -1}( U_{\ell -1, k}) \\&\quad + e^{A_{\ell -1} \Delta t_\ell } \mathcal {P}_{\ell -1} R_{\ell ,2k} + \mathcal {P}_{\ell -1}R_{\ell ,2k+1} \end{aligned} \end{aligned}$$

for \(k= 0,1,\ldots ,J_{\ell -1}-1\), and with the initial condition \(U_{\ell -1,0} = \mathcal {P}_{\ell -1} u_0\). In other words, the scheme for the jth mode of the coarse level solution is given by

$$\begin{aligned} \begin{aligned} U_{\ell -1, k+1}^{(j)}&= e^{-\lambda _j \Delta t_{\ell -1} } U_{\ell -1,k}^{(j)} + \frac{1-e^{-\lambda _j \Delta t_{\ell -1}}}{\lambda _j} \left( f_{\ell -1}( U_{\ell -1, k}) \right) ^{(j)} \\&\quad + e^{-\lambda _j \Delta t_\ell } R_{\ell ,2k}^{(j)} + R_{\ell ,2k+1}^{(j)}, \end{aligned} \end{aligned}$$
(44)

for \(j =1,2,\ldots ,N_{\ell -1}\) and \(k= 0,1,\ldots ,J_{\ell -1}-1\), and the coarse level solution takes the form

$$\begin{aligned} {\widetilde{\Psi }}^{\ell -1}(u_0) = \sum _{j=1}^{N_{\ell -1}} U_{\ell -1, J_{\ell -1}}^{(j)} \phi _j. \end{aligned}$$
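A corresponding sketch of the pairwise coupling (again ours, under the same illustrative conventions as the step routine above): the fine level is advanced two steps of size \(\Delta t_\ell \) with noises \(R_{\ell ,2k}, R_{\ell ,2k+1}\), and the coarse level takes one step of size \(\Delta t_{\ell -1} = 2\Delta t_\ell \) driven by the combination of those noises, cf. (44).

```python
import numpy as np

def coupled_step(uf, uc, f_spec, lam, dt_f, b, rng):
    """One coarse step (= two fine steps) of the coupled scheme (42)/(44).
    uf: (N_l,) fine coefficients, uc: (N_{l-1},) coarse coefficients,
    lam: (N_l,) eigenvalues; P_{l-1} = truncation to the first N_{l-1} modes."""
    Nc = uc.size
    var = (1.0 - np.exp(-2.0 * lam * dt_f)) / (2.0 * lam ** (1 + 2 * b))
    R0 = rng.normal(0.0, np.sqrt(var))   # R_{l,2k}
    R1 = rng.normal(0.0, np.sqrt(var))   # R_{l,2k+1}
    decay_f = np.exp(-lam * dt_f)
    for R in (R0, R1):                   # two fine-level steps, cf. (42)
        uf = decay_f * uf + (1.0 - decay_f) / lam * f_spec(uf) + R
    lam_c, dt_c = lam[:Nc], 2.0 * dt_f
    decay_c = np.exp(-lam_c * dt_c)
    noise_c = np.exp(-lam_c * dt_f) * R0[:Nc] + R1[:Nc]  # combined noise, cf. (44)
    uc = decay_c * uc + (1.0 - decay_c) / lam_c * f_spec(uc) + noise_c
    return uf, uc
```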

This coupling approach may be viewed as an extension of the multilevel coupling of Ornstein–Uhlenbeck processes for stochastic differential equations [43].

5.4.2 Assumptions and convergence rates

To show that the fully-discrete numerical method may be used in MLEnKF, it remains to verify that Assumptions 1 and 2(i)–(ii) hold.

Assumption 1(i): Let \(U_{\ell , k}\) and \({\bar{U}}_{\ell , k}\) denote solutions at time \(t=k\Delta t_\ell \) of the scheme (42) with respective initial data \(U_{\ell , 0} = \mathcal {P}_\ell u_0\) and \({\bar{U}}_{\ell , 0} = \mathcal {P}_\ell v_0\) for some \(u_0,v_0 \in L^p(\Omega , \mathcal {V})\). Then, by (40), and the properties: (a) for all \(\ell \ge 0\) and \(v \in \mathcal {V}_\ell \)

$$\begin{aligned} \Vert A_\ell ^{-1}( e^{A_\ell \Delta t_\ell } - I) v \Vert _{\mathcal {V}} = \left\| \int _0^{\Delta t_\ell } e^{A_\ell s} v ds \right\| _{\mathcal {V}} \le \int _{0}^{\Delta t_\ell } \Vert e^{As} v\Vert _{\mathcal {V}} ds \le \Vert v\Vert _{\mathcal {V}} \Delta t_\ell , \end{aligned}$$

and (b) \(f \in {\mathrm{Lip}}(\mathcal {V},\mathcal {V})\); there exists a \(C>0\) such that

$$\begin{aligned} \begin{aligned} \Vert U_{\ell , J_\ell } - {\bar{U}}_{\ell , J_\ell } \Vert _{L^p(\Omega , \mathcal {V})}&\le (1+C\Delta t_\ell )\Vert U_{\ell , J_{\ell }-1} - {\bar{U}}_{\ell ,J_\ell -1}\Vert _{L^p(\Omega , \mathcal {V})} \\&\le (1+C\Delta t_\ell )^{T/\Delta t_\ell } \Vert \mathcal {P}_\ell (u_0- v_0)\Vert _{L^p(\Omega , \mathcal {V})}\\&\le e^{CT} \Vert u_0-v_0\Vert _{L^p(\Omega , \mathcal {V})}. \end{aligned} \end{aligned}$$

Consequently, for every \(p \ge 2\), there exists a \(c_{\Psi }>0\) such that

$$\begin{aligned} \left\| {\widetilde{\Psi }}^\ell (u_0) - {\widetilde{\Psi }}^\ell (v_0) \right\| _p \le c_{\Psi }\left\| u_0 - v_0\right\| _p \end{aligned}$$

holds for all \(\ell \ge 0\) and \(u_0,v_0 \in L^p(\Omega ,\mathcal {V})\).

Assumption 1(ii): Under the regularity constraints imposed by Assumption 3, it holds for all \(\ell \in \mathbb {N}\) and \(U_{\ell ,k} = \sum _{j=1}^{N_\ell } U^{(j)}_{\ell ,k} \phi _j\) that

$$\begin{aligned} \max _{k \in \{0,1,\ldots , J_\ell \}} \Vert U_{\ell ,k}\Vert _{L^p(\Omega , \mathcal {K}_{r} )} \le C(1+\Vert u_0\Vert _{L^p(\Omega , \mathcal {K}_{r})}), \quad \forall p \ge 2 \text { and } r \in [r_1, r_2), \end{aligned}$$

where \(C>0\) depends on r and p, but not on \(\ell \), cf. [28, Lemma 8.2.21].

Assumption 2(i): We begin by introducing the auxiliary \(\ell \)th level exact-in-time Galerkin approximation

$$\begin{aligned} u^\ell (t) \mathrel {\mathop :}=e^{A_\ell t}\mathcal {P}_\ell u_0 + \int _0^t e^{A_\ell (t-s)} f_\ell (u^\ell (s)) ds + \int _0^t e^{A_\ell (t-s)} \mathcal {P}_\ell B \, dW_s, \quad t \in [0,T], \end{aligned}$$

and the notation \({\widehat{\Psi }}^\ell (u_0) \mathrel {\mathop :}=u^\ell (T)\). Assumption 3 and [28, Corollary 8.1.12] imply that for any \(p \ge 2\) and \(r \in [r_1, r_2]\)

$$\begin{aligned} \sup _{\ell \ge 0} \left\| {\widehat{\Psi }}^\ell (u_0)\right\| _{L^p(\Omega , \mathcal {K}_r)} \lesssim (1+ \left\| u_0\right\| _{L^p(\Omega , \mathcal {K}_r)}). \end{aligned}$$

The triangle inequality and (41) yield that

$$\begin{aligned} \left\| \Psi (u_0) - {\widetilde{\Psi }}^\ell (u_0) \right\| _{L^p(\Omega , \mathcal {V})} \le \left\| \Psi (u_0) - {\widehat{\Psi }}^\ell (u_0) \right\| _{L^p(\Omega ,\mathcal {V})} + \left\| {\widehat{\Psi }}^\ell (u_0) - {\widetilde{\Psi }}^\ell (u_0)\right\| _{L^p(\Omega ,\mathcal {V})}, \end{aligned}$$

and by [28, Corollary 8.1.11-12 and Theorem 8.2.25], it respectively holds that for any \(p\ge 2\)

$$\begin{aligned} \left\| \Psi (u_0) -{\widehat{\Psi }}^\ell (u_0) \right\| _{L^p(\Omega ,\mathcal {V})} \lesssim (1+ \Vert u_0\Vert _{L^p(\Omega , V)}) N_\ell ^{2(r_1-r_2)}, \end{aligned}$$

and

$$\begin{aligned} \left\| {\widehat{\Psi }}^\ell (u_0) - {\widetilde{\Psi }}^\ell (u_0) \right\| _{L^p(\Omega ,\mathcal {V})} \lesssim (1+ \Vert u_0\Vert _{L^p(\Omega , \mathcal {V})})J_\ell ^{r_1-r_2}. \end{aligned}$$
(45)

This verifies Assumption 2(i) as it leads to the following bound: for any \(p\ge 2\),

$$\begin{aligned} \Vert \Psi (u_0) - {\widetilde{\Psi }}^\ell (u_0) \Vert _{L^p(\Omega , \mathcal {V})} \lesssim (1+ \Vert u_0\Vert _{L^p(\Omega , V)}) (N_\ell ^{2(r_1-r_2)} + J_\ell ^{r_1-r_2}). \end{aligned}$$

Assumption 2(ii) only depends on the projection operator, and thus follows from (41).

5.5 Linear forcing

For the remaining part of this section, we consider the linear case \(f(u)=u\) of the filtering problem in Sect. 5.2. We derive explicit values for the rate exponents \(\beta \), \(\gamma _x\) and \(\gamma _t\) when applying MLEnKF with either the exact-in-time-truncated-in-space approximation method in Sect. 5.3 or the fully-discrete approximation method in Sect. 5.4.

The exact solution of the jth mode for this linear case is

$$\begin{aligned} u_{n+1}^{(j)} = e^{(1-\lambda _j)T} u_n^{(j)} + \xi _n^{(j)}, \quad \xi _n^{(j)} \sim N\left[ 0,\frac{\lambda _j^{-2b}}{2 (\lambda _j-1)}(1-e^{2(1- \lambda _j )T})\right] \perp u_n^{(j)}. \end{aligned}$$

Although we see that the underlying dynamics can be solved exactly, the filtering problem is still non-trivial since correlations between the modes \(\{u_{n+1}^{(j)}\}_j\) will arise from the assimilation of observations (39), unless the observation operator is of the special form \(H(\cdot ) = [H_1(\cdot ), \ldots , H_m(\cdot )]^{\mathsf {T}}\) with all operator components of the form \(H_i=\phi _j^*\) for some \(j\in \mathbb {N}\).
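As an illustration, the exact mode update above admits a short numpy implementation (ours); `lam` is assumed to hold the eigenvalues \(\lambda _j = (\pi j)^2 > 1\).

```python
import numpy as np

def linear_exact_step(u, lam, T, b, rng):
    """Exact propagation of the spectral modes over one observation
    interval of length T for the linear reaction f(u) = u."""
    decay = np.exp((1.0 - lam) * T)
    var = lam ** (-2 * b) / (2.0 * (lam - 1.0)) * (1.0 - np.exp(2.0 * (1.0 - lam) * T))
    return decay * u + rng.normal(0.0, np.sqrt(var))
```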

Since the Galerkin and spatial-truncation approximation methods coincide in the linear setting, i.e., \(\Psi ^\ell = {\widehat{\Psi }}^\ell \), it holds by (41) that for any \(p \ge 2\),

$$\begin{aligned} \Vert \Psi (u_0) -{\widehat{\Psi }}^\ell (u_0) \Vert _{L^p(\Omega ,\mathcal {V})} \lesssim (1+ \Vert u_0\Vert _{L^p(\Omega , V)}) N_\ell ^{2(r_1-r_2)}. \end{aligned}$$
(46)

Let us next show that the time-discretization convergence rate (45) improves from \(r_2-r_1\) in the above nonlinear setting to 1 in the linear setting. We begin by studying the properties of the sequence \(\{\mathcal {P}_\ell {\widetilde{\Psi }}^{m}(u_0)\}_{m=\ell }^\infty \) for a fixed \(\ell \in \mathbb {N}\). The jth mode of the projected difference of coupled solutions for \(m>\ell \) is given by

$$\begin{aligned} \langle \mathcal {P}_\ell ({\widetilde{\Psi }}^{m}(u_0) - {\widetilde{\Psi }}^{m-1}(u_0)), \phi _j \rangle _{\mathcal {K}} = {\left\{ \begin{array}{ll} U^{(j)}_{m,J_{m}} - U^{(j)}_{m-1,J_{m-1}}, &{} \text {if } j\le N_\ell ,\\ 0&{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

and the difference can be bounded as follows:

Lemma 8

Consider the SPDE (35) with \(f(u)=u\), and other assumptions as stated in Sect. 5.1. Then for any \({\bar{u}}_0 \in L^2(\Omega , \mathcal {V})\) and \(m \in \mathbb {N}\), the sequence

$$\begin{aligned} I_{m,j} \mathrel {\mathop :}=\left\langle \mathcal {P}_{m-1} \left( {\widetilde{\Psi }}^{m}({\bar{u}}_0) - {\widetilde{\Psi }}^{m-1}({\bar{u}}_0)\right) , \phi _j \right\rangle _{\mathcal {K}}, \quad j =1,2,\ldots \end{aligned}$$

can be split into three parts

$$\begin{aligned} I_{m,j}= I_{m,j,1} +I_{m,j,2} + I_{m,j,3}, \end{aligned}$$

where, for every \(j=1,2,\ldots \), \(I_{m,j,1}, I_{m,j,2},\) and \(I_{m,j,3}\) form a triplet of mutually independent random variables, and \(I_{m,j,1} = I_{m,j,2} = I_{m,j,3}=0\) for all \(j> N_{m-1}\). Furthermore, there exists a constant \(c>0\) that depends on \(T>0\) and \(\lambda _1>1\) such that for any \(m \in \mathbb {N}\) and all \(j \le N_{m-1}\),

$$\begin{aligned} |I_{m,j,1}| \le c |{\bar{u}}_0^{(j)}| \Delta t_m, \end{aligned}$$

and \(I_{m,j,2}\) and \(I_{m,j,3}\) are mean zero Gaussians with variance bounded by

$$\begin{aligned} \max \big \{ \mathrm {Var}[I_{m,j,2}], \mathrm {Var}[I_{m,j,3}] \big \} \lesssim \frac{\Delta t_m^2}{\lambda _j^{1+2b}}. \end{aligned}$$
(47)

Proof

See “Appendix B”. \(\square \)

By Lemma 8 and Assumption 3, there exists a \(C>0\) depending on p, T, \(\lambda _1\) and \(b +1/4-r_1\) such that for any \(m > \ell \) and \(u_0 \in \cap _{p\ge 2} L^p(\Omega , V)\),

$$\begin{aligned}&\left\| \mathcal {P}_\ell \left( {\widetilde{\Psi }}^m(u_0) - {\widetilde{\Psi }}^{m-1}(u_0) \right) \right\| _{L^p(\Omega ,\mathcal {V})}^2 \le \left\| \mathcal {P}_{m-1}\left( {\widetilde{\Psi }}^m(u_0) - {\widetilde{\Psi }}^{m-1}(u_0) \right) \right\| _{L^p(\Omega ,\mathcal {V})}^2 \nonumber \\&\le \left\| \sum _{j=1}^{\infty } (I_{m,j,1} +I_{m,j,2}+I_{m,j,3}) \phi _j \right\| _{L^p(\Omega ,\mathcal {V})}^2\nonumber \\&\le 3 \left\| \sum _{j=1}^{\infty } I_{m,j,1} \phi _j \right\| _{L^{p}(\Omega ,\mathcal {V})}^2 + 3 \left\| \sum _{j=1}^{\infty } I_{m,j,2} \phi _j \right\| _{L^{p}(\Omega ,\mathcal {V})}^2 +3 \left\| \sum _{j=1}^{\infty } I_{m,j,3} \phi _j \right\| _{L^{p}(\Omega ,\mathcal {V})}^2\nonumber \\&\le 3c^2\Delta t_m^2 \left\| u_0 \right\| _{L^{p}(\Omega ,\mathcal {V})}^2 + 3\left\| \sum _{j=1}^{\infty } I_{m,j,2}^2 \langle \phi _j,\phi _j \rangle _\mathcal {V}\right\| _{L^{p/2}(\Omega )} + 3\left\| \sum _{j=1}^{\infty } I_{m,j,3}^2 \langle \phi _j,\phi _j \rangle _\mathcal {V}\right\| _{L^{p/2}(\Omega )} \nonumber \\&\le 3c^2\Delta t_m^2 \left\| u_0 \right\| _{L^{p}(\Omega ,\mathcal {V})}^2 + 3 \sum _{j=1}^{\infty } \left( \left\| I_{m,j,2}^2\right\| _{L^{p/2}(\Omega )} + \left\| I_{m,j,3}^2\right\| _{L^{p/2}(\Omega )}\right) \lambda _j^{2r_1}\nonumber \\&\le 3c^2 \Delta t_m^2\left( \left\| u_0 \right\| _{L^{p}(\Omega ,\mathcal {V})}^2 + 2\sum _{j=1}^{\infty } \lambda _j^{2(r_1-b)-1}\right) \nonumber \\&\le C\left( 1+\left\| u_0 \right\| _{L^{p}(\Omega ,\mathcal {V})}\right) ^2 \Delta t_{m}^2. \end{aligned}$$
(48)

Here, the sixth inequality follows from \(I_{m,j,2}\) and \(I_{m,j,3}\) being mean zero Gaussians with variance bounded by (47), which implies that for any \(p\ge 2\), there exists a constant \(C>0\) depending on p such that

$$\begin{aligned} \max _{r \in \{2,3\}} \left\| I_{m,j,r}^2\right\| _{L^{p/2}(\Omega )}= \max _{r \in \{2,3\}} \left\| I_{m,j,r}\right\| _{L^{p}(\Omega )}^2 \le C \frac{\Delta t_m^2}{\lambda _j^{1+2b}} \end{aligned}$$

holds for all \(j \in \mathbb {N}\). And the last inequality follows from the assumption \(r_1<b+1/4\), which implies that \(2(r_1-b)-1<-1/2\) and hence

$$\begin{aligned} \sum _{j=1}^\infty \lambda _j^{2(r_1-b)-1}< \sum _{j=1}^\infty (j^2)^{2(r_1-b)-1} <\infty . \end{aligned}$$

From inequality (48) we deduce that \(\{\mathcal {P}_\ell {\widetilde{\Psi }}^{m}(u_0)\}_{m=\ell }^\infty \) is \(L^p(\Omega , \mathcal {V})\)-Cauchy and that there exists a constant \(C>0\) depending on p, T, \(\lambda _1\) and \(b +1/4-r_1\) such that

$$\begin{aligned} \begin{aligned} \Vert \mathcal {P}_\ell \Psi (u_0) - {\widetilde{\Psi }}^\ell (u_0) \Vert _{L^p(\Omega , \mathcal {V})}&\le \sum _{m=\ell +1}^\infty \Vert \mathcal {P}_\ell ({\widetilde{\Psi }}^{m}(u_0) - {\widetilde{\Psi }}^{m-1}(u_0) ) \Vert _{L^p(\Omega , \mathcal {V})}\\&\le C (1+\Vert u_0\Vert _{L^p(\Omega , \mathcal {V})}) \sum _{m=\ell +1}^\infty \Delta t_{m} \\&\le C (1+\Vert u_0\Vert _{L^p(\Omega , \mathcal {V})}) \Delta t_{\ell }\sum _{k=1}^\infty 2^{-k} \\&= C(1+\Vert u_0\Vert _{L^p(\Omega , \mathcal {V})}) J_\ell ^{-1}. \end{aligned} \end{aligned}$$
(49)

In view of the preceding inequality and (46) we obtain the following \(L^p\)-strong convergence rate for the fully discrete scheme:

Theorem 3

Consider the SPDE (35) with \(f(u)=u\) and other assumptions as stated in Sect. 5.1. Then for all \(p\ge 2\), there exists a \(C>0\) such that

$$\begin{aligned} \Vert \Psi (u_0) - {\widetilde{\Psi }}^\ell (u_0) \Vert _{L^p(\Omega , \mathcal {V})} \le C (1+ \Vert u_0\Vert _{L^p(\Omega , V)})(N_\ell ^{2(r_1-r_2)} + J_\ell ^{-1}), \quad \forall \ell \ge 0, \end{aligned}$$
(50)

where C depends on \(r_1,r_2\) and p, but not on \(\ell \).

Remark 7

To the best of our knowledge, the \(L^p\)-strong time-discretization convergence rate (49) improves on the literature in two ways. First, for \(p=2\), it is slightly better than \(\mathcal {O}(\log (\Delta t^{-1}) \Delta t)\), which is the best rate previously obtained in the literature, cf. [29]. Second, this is the first proof of an order-1 \(L^p\)-strong time-discretization convergence rate for arbitrary \(p\ge 2\).

5.5.1 Error equilibration

The temporal and spatial discretization errors of (50) are equilibrated by determining the base \(\kappa >1\) that induces a sequence \(\{N_\ell = N_0 \kappa ^\ell \}\) such that \(N_\ell ^{2(r_1-r_2)} \eqsim J_\ell ^{-1} \eqsim 2^{-\ell }\). The solution is \(\kappa = 2^{1/(2(r_2-r_1))}\), which yields the following \(L^p\)-strong convergence rate in (50):

$$\begin{aligned} \Vert \Psi (u_0) - {\widetilde{\Psi }}^\ell (u_0) \Vert _{L^p(\Omega , \mathcal {V})} \lesssim (1+ \Vert u_0\Vert _{L^p(\Omega , V)}) h_\ell ^{2(r_2-r_1)}. \end{aligned}$$

In view of Assumption 2, MLEnKF with the fully-discrete approximation method yields the convergence rate \(\beta = 4(r_2-r_1)\) and the computational cost rates \(\gamma _x = 1\) and \(\gamma _t = 2(r_2-r_1)\) in the considered linear setting.
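As a worked instance (our annotation, anticipating the parameter choices of Sect. 6.2): for \(d=1\) and \(r_2 - r_1 \approx 1/2\), the equilibration gives

$$\begin{aligned} \kappa = 2^{1/(2(r_2-r_1))} \approx 2, \quad \beta = 4(r_2-r_1) \approx 2, \quad \gamma _x = 1, \quad \gamma _t = 2(r_2-r_1) \approx 1, \end{aligned}$$

so that \(\beta \approx d\gamma _x + \gamma _t\), i.e., the borderline regime of standard MLMC complexity.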

Remark 8

(MLEnKF time) Note that one could consider applying the SDE version of [23] to a fixed finite-dimensional approximation of the SPDE. However, in this case we would incur a fixed baseline cost associated with that discretization. In comparison to using a single-level method, there would be a gain in efficiency, as a result of using the multilevel identity with respect to the time discretization. But this would still be substantially less efficient than also accounting for the spatial approximation in the multilevel method, as we do in the method considered here.

6 Numerical examples

In this section we present numerical performance studies of EnKF and MLEnKF applied to two different filtering problems with underlying dynamics given by the SPDE (35). In the first example, the reaction term of the SPDE is linear, and in the second example we consider a nonlinear, and thus more challenging, reaction term.

6.1 Discretization parameters and the relationship between computational cost and accuracy

If we neglect the logarithmic factor in (19), as is motivated by Remark 5, then Theorems 1 and 2 respectively imply the following relations between mean squared error (MSE) and computational cost

$$\begin{aligned} \mathrm {Cost}\left( {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ] \right) ^{\min (1,\beta /(d \gamma _x + \gamma _t))} \Vert {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ] \Vert _2^2 \lesssim {\left\{ \begin{array}{ll} 1 &{} \text {if } \beta \ne d\gamma _x +\gamma _t, \\ L^3 &{} \text {if } \beta = d\gamma _x +\gamma _t, \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} \mathrm {Cost}\left( {\hat{\mu }}^{\mathrm{MC}}_n [\varphi ]\right) ^{\beta /(\beta +d \gamma _x +\gamma _t)} \Vert {\hat{\mu }}^{\mathrm{MC}}_n [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ] \Vert _2^2 \lesssim 1. \end{aligned}$$

In other words,

$$\begin{aligned} \Vert {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ] \Vert _2^2 \lesssim {\left\{ \begin{array}{ll} \mathrm {Cost}\left( {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ]\right) ^{-1} &{} \text {if } \beta > d\gamma _x + \gamma _t,\\ L^3 \mathrm {Cost}\left( {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ]\right) ^{-1} &{} \text {if } \beta = d\gamma _x + \gamma _t,\\ \mathrm {Cost}\left( {\hat{\mu }}^{\mathrm{ML}}_n [\varphi ]\right) ^{-\beta /(d\gamma _x+\gamma _t)} &{} \text {if } \beta < d\gamma _x + \gamma _t, \end{array}\right. } \end{aligned}$$
(51)

and

$$\begin{aligned} \Vert {\hat{\mu }}^{\mathrm{MC}}_n [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ] \Vert _2^2 \lesssim \mathrm {Cost}\left( {\hat{\mu }}^{\mathrm{MC}}_n[\varphi ]\right) ^{-\beta /(\beta + d\gamma _x + \gamma _t)}. \end{aligned}$$
(52)

For all test problems, we use the observation-time interval \(T=1/2\), \(N = 40\) observation times, \(N_\ell = 2^{\ell +2}\), and, when relevant (i.e., for the fully-discrete numerical method), \(J_\ell = 2^{\ell +2}\). The approximation error, which we refer to as the mean squared error (MSE), is defined as the sum of the squared QoI error over the observation times, averaged over 100 realizations of the respective filtering methods. That is,

$$\begin{aligned} \text {MSE(MLEnKF)} \mathrel {\mathop :}=\frac{1}{100}\sum _{i=1}^{100} \sum _{n=0}^{N} \left| {\hat{\mu }}^{\mathrm{ML}}_{n,i} [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ] \right| ^2 \approx \sum _{n=0}^{N} \Vert {\hat{\mu }}^{\mathrm{ML}}_{n} [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ]\Vert _2^2, \end{aligned}$$

where \(\{{\hat{\mu }}^{\mathrm{ML}}_{\cdot ,i} [\varphi ]\}_{i=1}^{100}\) is a sequence of i.i.d. QoI evaluations induced from i.i.d. realizations of the MLEnKF. And similarly,

$$\begin{aligned} \text {MSE(EnKF)} \mathrel {\mathop :}=\frac{1}{100}\sum _{i=1}^{100} \sum _{n=0}^{N} \left| {\hat{\mu }}^{\mathrm{MC}}_{n,i} [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ] \right| ^2 \approx \sum _{n=0}^{N} \Vert {\hat{\mu }}^{\mathrm{MC}}_{n} [\varphi ] - {\hat{\bar{\mu }}}_n [\varphi ]\Vert _2^2. \end{aligned}$$
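Schematically, these MSE estimates can be computed as follows (a minimal sketch of ours; `run_filter` and `qoi_ref` are hypothetical stand-ins for a filter realization and the reference QoI sequence):

```python
import numpy as np

def estimate_mse(run_filter, qoi_ref, n_runs=100, seed=0):
    """Average, over i.i.d. filter realizations, of the squared QoI error
    summed over the observation times, cf. the MSE definitions above.

    run_filter(rng) -> (N+1,) array of QoI estimates   mu_n[phi], n = 0,...,N
    qoi_ref         -> (N+1,) array of reference values bar-mu_n[phi]
    """
    rng = np.random.default_rng(seed)
    errors = [np.sum((run_filter(rng) - qoi_ref) ** 2) for _ in range(n_runs)]
    return float(np.mean(errors))
```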

In the examples below we numerically verify that the considered numerical methods respectively fulfill (51) and (52), when the above computational cost expressions are replaced/approximated by the wall-clock runtime of the computer implementations of the respective methods. More precisely, we numerically verify that the following approximate asymptotic inequalities hold:

$$\begin{aligned} \text {MSE(MLEnKF)} \lessapprox {\left\{ \begin{array}{ll} \text {Runtime(MLEnKF)}^{-1} &{} \text {if } \beta > d\gamma _x + \gamma _t,\\ L^3 \text {Runtime(MLEnKF)}^{-1} &{} \text {if } \beta = d\gamma _x + \gamma _t,\\ \text {Runtime(MLEnKF)}^{-\beta /(d\gamma _x+\gamma _t)} &{} \text {if } \beta < d\gamma _x + \gamma _t, \end{array}\right. } \end{aligned}$$
(53)

and

$$\begin{aligned} \text {MSE(EnKF)} \lessapprox \text {Runtime(EnKF)}^{-\beta /(\beta + d\gamma _x + \gamma _t)}. \end{aligned}$$
(54)

6.2 Linear filtering problems

We consider the filtering problem in Sect. 5.2 with the linear forcing \(f(u)=u\) in the underlying dynamics (35), smoothing parameter \(b=1/2\), approximation space parameters \(r_1=1/4+\upsilon \) and \(r_2 = 3/4-\upsilon \) with \(\upsilon = 10^{-4}\), observation functional

$$\begin{aligned} H =\delta _{0.5} = \sqrt{2} \sum _{j=1}^\infty \sin (j\pi /2) \phi _j^*, \end{aligned}$$

observation noise parameter \(\Gamma =0.5\), QoI

$$\begin{aligned} \varphi = 1^* = \sum _{j \in \mathbb {N}} \frac{\sqrt{2}(1-\cos (j\pi ))}{j\pi } \phi _{j}^*, \end{aligned}$$
(55)

and initial data

$$\begin{aligned} u_0(x) = 1-2\left| x-\frac{1}{2}\right| = \sum _{j\in \mathbb {N}} (-1)^{j-1} \frac{4\sqrt{2}}{((2j-1)\pi )^2} \phi _{2j-1}(x). \end{aligned}$$

We note that \(H, \varphi \in \mathcal {V}^*\) and \(u_0 \in V\). Figure 1 illustrates one exact-in-time simulation of the SPDE.
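The spectral coefficient vectors of H, \(\varphi \), and \(u_0\) truncate straightforwardly; a small sketch (ours, with an illustrative truncation level N):

```python
import numpy as np

N = 2 ** 9
j = np.arange(1, N + 1)
H = np.sqrt(2) * np.sin(j * np.pi / 2)                    # delta_{0.5} functional
phi = np.sqrt(2) * (1 - np.cos(j * np.pi)) / (j * np.pi)  # QoI 1^*, cf. (55)
u0 = np.zeros(N)                                          # hat-function initial data
odd = (j % 2 == 1)
u0[odd] = (-1.0) ** ((j[odd] - 1) // 2) * 4 * np.sqrt(2) / (j[odd] * np.pi) ** 2
```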

Fig. 1 Exact-in-time simulation of the SPDE in Sect. 6.2 over one observation-time interval with spatial resolution \(N_{10}=2^{12}\)

By the approximation \(\beta = 4(r_2-r_1) \approx 2\), application of the error equilibration in Sect. 5.5.1 yields (\(d\gamma _x = 1\), \(\gamma _t = 1\)) for the fully-discrete method and (\(d\gamma _x = 1\), \(\gamma _t = 0\)) for the spatially-discrete method. Figure 2 and the left subfigure of Fig. 3 display the runtime-to-MSE performance for the spatially-discrete and fully-discrete methods, respectively.

Fig. 2 Runtime-to-MSE comparison for the filtering problem in Sect. 6.2 using the spatially-discrete method

The right subfigure of Fig. 3 displays the graph of

$$\begin{aligned} (\text {Runtime(MLEnKF)}, \, \text {MSE(MLEnKF)}\times \text {Runtime(MLEnKF)}/L^3), \end{aligned}$$

(where \(\text {MSE(MLEnKF)}\) in the second argument denotes the “MSE” obtained for a given “Runtime”). The numerical observations are consistent with the approximate theoretical predictions (53) and (54).

Fig. 3 Left: Runtime-to-MSE comparison for the filtering problem in Sect. 6.2 using the fully-discrete numerical method. Right: Graph of \((\text {Runtime(MLEnKF)}, \, \text {MSE(MLEnKF)}\times \text {Runtime(MLEnKF)}/L^3)\) for the fully-discrete numerical method. Note the y-axis here, and in future such plots, has linear scaling

The reference-solution sequence \(\{{\hat{\bar{\mu }}}_n [\varphi ] \}_{n=1}^N\) that is needed to estimate the MSE in the above figures is approximated by Kalman filtering on the subspace \(\mathcal {V}_{12} \subset \mathcal {V}\), which is an \(N_{12} = 2^{14}\)-dimensional subspace. This yields an accurate approximation since, when the underlying dynamics (35) is linear with Gaussian additive noise, the full-space Kalman filter distribution equals the reference MFEnKF distribution \({\hat{\bar{\mu }}}\). Furthermore, in the cost-to-accuracy studies, EnKF and MLEnKF solutions are computed with ensemble particles at no higher spatial resolution than \(\mathcal {V}_{9}\).

6.3 A nonlinear filtering problem

We seek the mild solution to the following nonlinear SPDE with periodic boundary conditions

$$\begin{aligned} \mathrm {d}u(t) = \big ( Au(t) + \sin (\pi u(t)) \big ) \, \mathrm {d}t + B \, \mathrm {d}W(t), \quad t \in (0,T], \qquad u(0) = u_0, \end{aligned}$$
(56)

where W and B are described below. Here, the operator \(-A = I-\Delta \) is defined as a mapping \(A: H^2(0,1) \cap H^1_{\text {per}}(0,1)\rightarrow \mathcal {K}= L^2(0,1)\), where \(H^1_{\text {per}}(0,1) := \{f \in H^1(0,1) \mid (f-f(0)) \in H^1_0(0,1) \}\). The periodic boundary condition is different from the zero-valued boundary condition in (35), and, in order to spectrally decompose \(-A\), we now express the base-space \(\mathcal {K}=L^2(0,1)\) as the closure of the span of the Fourier basis

$$\begin{aligned} \phi _{k}(x) = {\left\{ \begin{array}{ll} 1, &{} k=1,\\ \sqrt{2} \cos (2j\pi x), &{} k=2j, \ j \in \mathbb {N},\\ \sqrt{2} \sin (2j\pi x), &{} k=2j+1, \ j \in \mathbb {N}. \end{array}\right. } \end{aligned}$$
(57)

The operator \(-A\) is spectrally decomposed by

$$\begin{aligned} -A\phi _k = \lambda _k \phi _k \end{aligned}$$

with

$$\begin{aligned} \lambda _k = {\left\{ \begin{array}{ll} 1, &{} k=1,\\ 1+(2j\pi )^2, &{} k=2j, \ j \in \mathbb {N},\\ 1 + (2j\pi )^2, &{} k=2j+1, \ j \in \mathbb {N}. \end{array}\right. } \end{aligned}$$

As in Sect. 5.1, we introduce the family of Hilbert spaces parametrized in \(r\in \mathbb {R}\)

$$\begin{aligned} \begin{aligned} \mathcal {K}_r \mathrel {\mathop :}=D((-A)^{r}) = \Big \{v:[0,1] \rightarrow \mathbb {R}\mid&v \text { is } {\mathcal {B}}([0,1])/{\mathcal {B}}(\mathbb {R})\text {-measurable} \\ {}&\text {and } \sum _{j \in \mathbb {N}} \lambda ^{2r}_j \left| \langle \phi _j, v \rangle _{\mathcal {K}} \right| ^2 < \infty \Big \}, \end{aligned} \end{aligned}$$

with norm \(\Vert \cdot \Vert _{\mathcal {K}_r} \mathrel {\mathop :}=\Vert (-A)^{r}(\cdot )\Vert _{\mathcal {K}}\). As smoothing operator B, we consider (36) with parameter \(b=1/4\), where W denotes an \(I_{\mathcal {K}}\)-cylindrical Wiener process [both B and W are of course expanded in the currently considered basis (57)]. We consider the approximation spaces

$$\begin{aligned} \mathcal {V}= \mathcal {K}_0 \quad \text {and} \quad V= \mathcal {K}_{(1-\nu )/2}, \end{aligned}$$

where \(\nu =10^{-4}\), together with the QoI (55) and the observation operator

$$\begin{aligned} H = 1_{x>0.5}^* = \frac{1}{2}\phi _1^* + \sum _{k=1}^\infty \frac{\sqrt{2}}{\pi k} \phi _{2k+1}^*. \end{aligned}$$

The spectral representation of the initial data

$$\begin{aligned} u(0,\cdot ) = \frac{1}{3} + \sum _{k=1}^\infty \frac{2\sqrt{2}}{(\pi k)^{2}} \phi _{2k}(\cdot ) \end{aligned}$$

implies that \(u(0,\cdot ) \in \mathcal {K}_{(1-\nu )/2}\). Figure 4 illustrates one simulation of the SPDE by the numerical scheme described below.

Fig. 4 Simulation of the SPDE (56) over one observation-time interval by the numerical scheme in Sect. 6.3.1 on resolutions \(N_{10}=J_{10}=2^{12}\)

By the Lipschitz continuity of the reaction term, it follows that Assumption 3 is fulfilled. Moreover, the well-posedness theory for the SPDE (35) with zero-valued boundary conditions extends to the current setting, and so does the theory for the fully-discrete exponential Euler method in Sect. 5.4, cf. [28].

6.3.1 Numerical scheme

We apply the coupled fully discrete approximation method described in Sect. 5.4, but, due to the nonlinearity of the reaction term \(f(u)=\sin (\pi u)\), the spectral representation in the coupled scheme (42) and (44) needs to be approximated. Namely, the approximation of \([(f_\ell (U_{\ell ,k}))^{(1)}, \ldots , (f_\ell (U_{\ell ,k}))^{(N_\ell )}]\) is obtained by application of the fast Fourier transform (FFT) as follows:

  1. Given the spectral representation \([U_{\ell , k}^{(1)}, \ldots , U_{\ell , k}^{(N_\ell )}]\), compute the physical-space representation on a uniform mesh by the inverse FFT:

     $$\begin{aligned}{}[U_{\ell , k}(0), U_{\ell , k}(1/N_\ell ), \ldots , U_{\ell , k}(1 - N_\ell ^{-1})] = \text {IFFT}[U_{\ell , k}^{(1)}, \ldots , U_{\ell , k}^{(N_\ell )}]. \end{aligned}$$
  2. Evaluate the nonlinear reaction term in physical space and approximate the spectral representation by the FFT (see the sketch after this list):

     $$\begin{aligned}&[(f_\ell (U_{\ell ,k}))^{(1)}, \ldots , (f_\ell (U_{\ell ,k}))^{(N_\ell )}]\\&\quad \approx \text {FFT}[f(U_{\ell , k}(0)), f(U_{\ell , k}(1/N_\ell )), \ldots , f(U_{\ell , k}(1 - N_\ell ^{-1}))]. \end{aligned}$$
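A minimal numpy sketch (ours) of steps 1–2; note that numpy’s complex-exponential FFT conventions are used here as a stand-in for the real sine/cosine basis (57), which amounts to an orthogonal change of coordinates that we do not spell out.

```python
import numpy as np

def f_spec_fft(u_spec, f=lambda u: np.sin(np.pi * u)):
    """Approximate the spectral modes of f(u): spectral -> uniform grid
    by the inverse FFT, pointwise evaluation of f, and back by the FFT."""
    u_grid = np.fft.ifft(u_spec).real   # step 1: physical-space values
    return np.fft.fft(f(u_grid))        # step 2: back to spectral space
```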

The spectral approximation of the coarse-level reaction term is obtained analogously. Due to the FFT approximation error in step 2 above, we cannot directly obtain the rate parameter \(\beta \) from the analysis in Sect. 5.4. To infer \(\beta \), we instead perform numerical studies of the \(L^p(\Omega ,\mathcal {V})\)-convergence of the coupled-level difference of the FFT-based fully-discrete method, \({\widetilde{\Psi }}^\ell (u_0) - {\widetilde{\Psi }}^{\ell -1}(u_0)\), towards 0, where the norm is estimated by the Monte Carlo method with \(M=10^5\) samples:

$$\begin{aligned} \left( \frac{1}{M} \sum _{i=1}^{M} \Vert {\widetilde{\Psi }}^\ell (u_0; \omega _i) - {\widetilde{\Psi }}^{\ell -1}(u_0; \omega _i)\Vert _{\mathcal {V}}^p\right) ^{1/p} \approx \Vert {\widetilde{\Psi }}^\ell (u_0) - {\widetilde{\Psi }}^{\ell -1}(u_0)\Vert _{L^p(\Omega ,\mathcal {V})} \, . \end{aligned}$$
(58)

Recalling that \(h_\ell ^{-1} \eqsim N_\ell \eqsim J_\ell = 2^{2+\ell }\) for the numerical solver \({\widetilde{\Psi }}^\ell \), we infer from the results of the numerical study (58), which is provided in Fig. 5, that

$$\begin{aligned} \Vert {\widetilde{\Psi }}^\ell (u_0) - \Psi (u_0)\Vert _p \lessapprox h_\ell ^{\beta /2} \, , \end{aligned}$$
(59)

with \(\beta =2\). Further numerical studies, which we do not include here, indicate that the right-hand side of (59) may be decomposed into \(\mathcal {O}(N_\ell ^{-1} + J_\ell ^{-1})\). On the basis of these observations, the configuration of discretization parameters for this problem, \(N_\ell \eqsim J_\ell \), aligns with the efficiency-optimized error equilibration strategy in Sect. 5.5.1.

Fig. 5 Numerical estimates of the error \(\Vert {\widetilde{\Psi }}^\ell (u_0) - {\widetilde{\Psi }}^{\ell -1}(u_0)\Vert _p\) by the Monte Carlo method (58) for \(p=2\) (dash-dot), \(p=4\) (solid-circle), and \(p=8\) (dash-diamond). The solid line represents the reference function \(f(\ell ) = 2^{-(\ell +1)}\)

The left subfigure in Fig. 6 displays the results of the runtime-to-MSE studies of EnKF and MLEnKF.

Fig. 6 Left: Runtime-to-MSE for the nonlinear filtering problem in Sect. 6.3.1 using the fully-discrete method. Right: Graph of \((\text {Runtime(MLEnKF)}, \, \text {MSE(MLEnKF)}\times \text {Runtime(MLEnKF)}/L^3)\) for the fully-discrete method

As pseudo-reference solution, we use the approximation

$$\begin{aligned} \frac{1}{200} \sum _{i=1}^{200} \mu ^{\mathrm{{ML}}}_{n,i}[\varphi ] \approx \hat{\bar{\mu }}_n[\varphi ], \quad n=1,2,\ldots , 40, \end{aligned}$$

with the MLEnKF estimator \(\mu ^{\mathrm{{ML}}}_{n,i}[\varphi ]\) here being computed on a finer resolution than all those considered in the runtime-to-MSE study. The right subfigure in Fig. 6 displays the graph of

$$\begin{aligned} (\text {Runtime(MLEnKF)}, \, \text {MSE(MLEnKF)}\times \text {Runtime(MLEnKF)}/L^3). \end{aligned}$$

Once again, the numerical observations are consistent with the theoretical asymptotic behavior predicted by (53) and (54).

Remark 9

(MLEnKF versus multilevel particle filters) To the best of our knowledge, no general multilevel particle filter for SPDE exists to date. When the effective dimension on level \(\ell \) is \(N_\ell \), the general requirement for particle filters is that the ensemble size on that level is bounded from below by \(ce^{N_\ell }\) particles, for some constant \(c>0\). Effective dimension refers to the dimension of the space over which importance sampling needs to be performed [1, 3, 10]. For example, in the case of full observations, the effective dimension can be equal to the state-space dimension. For MLEnKF, on the other hand, the level-\(\ell \) ensemble size is always bounded from above by \(\mathcal {O}(L^2 N_\ell ^{(\beta -d\gamma _x-\gamma _t)/(2d)} )\), even with full observations. The set of MLEnKF-tractable problems is therefore substantially larger than the set of problems tractable by particle filters.

7 Conclusion

We have presented the design and analysis of a multilevel EnKF method for infinite-dimensional spatio-temporal processes, based on a hierarchical decomposition of both the spatial and the temporal discretization parameters. We have proved theoretically, and provided numerical evidence, that under suitable assumptions MLEnKF achieves an asymptotic cost-to-accuracy relation similar to that of standard multilevel Monte Carlo methods. This result has potential for broad impact across application areas in which there has been a recent explosion of interest in EnKF, for example, weather prediction and subsurface exploration.