
What is the correct cost functional for variational data assimilation?


Abstract

Variational approaches to data assimilation, and weakly constrained four-dimensional variational assimilation (WC-4DVar) in particular, are important in the geosciences but also in other communities (often under different names). The cost functionals and the resulting optimal trajectories may have a probabilistic interpretation, for instance by linking data assimilation with maximum a posteriori (MAP) estimation. This is possible in particular if the unknown trajectory is modelled as the solution of a stochastic differential equation (SDE), as is increasingly the case in weather forecasting and climate modelling. In this situation, the MAP estimator (or “most probable path” of the SDE) is obtained by minimising the Onsager–Machlup functional. Although this fact is well known, there seems to be some confusion in the literature, with the energy (or “least squares”) functional sometimes being claimed to yield the most probable path. The first aim of this paper is to address this confusion and show that the energy functional does not, in general, provide the most probable path. The second aim is to discuss the implications in practice. Although the mentioned results pertain to stochastic models in continuous time, they do have consequences in practice, where SDEs are approximated by discrete time schemes. It turns out that using an approximation to the SDE and calculating its most probable path does not necessarily yield a good approximation to the most probable path of the SDE proper. This suggests that even in discrete time, a version of the Onsager–Machlup functional should be used, rather than the energy functional, at least if the solution is to be interpreted as a MAP estimator.


Notes

  1. We are grateful to referee Stéphane Vannitsem for stressing this point.

  2. The problem is the translation invariance of the standard volume. In an infinite-dimensional normed space, a ball of unit radius contains infinitely many disjoint balls of some sufficiently small but nonzero radius. By translation invariance, these balls must all have the same volume. But this means that either the volume of the unit ball is infinite or the volume of a sufficiently small ball is zero.

  3. The limit is in fact in the \(L_2\) sense.

  4. Strictly speaking, this “fact” is only correct in a much weaker sense, which is however still sufficient to derive the Onsager–Machlup functional. The correct statement is that

    $$\begin{aligned} {\mathbb {E}}\left[ \exp \left( \int _{0}^{T} f(z_{t} + W_{t}) {\mathrm {d}}W_{t} + \frac{1}{2} \int _{0}^{T} f'(z_{t}) {\mathrm {d}}t \right) \,\middle|\, \sup _t |W_t | \le \epsilon \right] \rightarrow 1 \end{aligned}$$

    for \(\epsilon \rightarrow 0\), see Ikeda and Watanabe (1989).


Author information


Corresponding author

Correspondence to Jochen Bröcker.

Additional information

The author was supported by the UK Engineering and Physical Sciences Research Council under grant agreement EP/L012669/1. Fruitful discussions with Tobias Kuna, Dan Crisan, Andrew Stuart, Colin Cotter, and Horatio Boedihardjo are gratefully acknowledged. Referee Stéphane Vannitsem and a second anonymous referee provided a number of important comments which helped to improve this manuscript.

Appendices

Appendix A: Derivation of the correct functional

We will attempt a more careful calculation of the \(\epsilon \)–weight of a path which will not only allow us to take the limits in the right order and obtain the correct expression for the density, but also to identify the reason why interchanging these limits gives a different result. We will later restrict our attention to linear dynamics. It should be said that for linear dynamics, the additional term in the Onsager–Machlup functional (11) does not depend on the reference trajectory and hence minimising \({\mathcal {A}}_{OM}\) or \({\mathcal {A}}_{E}\) gives the same results in this case. However, the functionals are still different and only the Onsager–Machlup functional provides the correct density.

First we note the following simple but important fact. Let \(Z^{(1)}, Z^{(2)}\) be random variables with values in \({\mathbb {R}}^N\) with densities \(p_1, p_2\) respectively, and \(p_2(z)> 0\) for all \(z \in {\mathbb {R}}^N\). Further, let \(\phi \) be a function on \({\mathbb {R}}^N\). Then the identity

$$\begin{aligned} {\mathbb {E}}(\phi (Z^{(1)})) = {\mathbb {E}}\left( \phi (Z^{(2)}) \frac{p_1(Z^{(2)})}{p_2(Z^{(2)})} \right), \end{aligned}$$

holds, since

$$\begin{aligned} {\mathbb {E}}(\phi (Z^{(1)})) = \int _{{\mathbb {R}}^N}\phi (z) p_1(z) {\mathrm {d}}z = \int _{{\mathbb {R}}^N} \phi (z) \frac{p_1(z)}{p_2(z)} p_2(z) {\mathrm {d}}z = {\mathbb {E}}\left( \phi (Z^{(2)}) \frac{p_1(Z^{(2)})}{p_2(Z^{(2)})} \right) . \end{aligned}$$
(23)
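For readers who want to see identity (23) in action, here is a minimal Python sketch (not part of the original argument) that checks it by Monte Carlo for two Gaussian densities; the test function phi, the parameters and the sample size are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # densities of Z1 ~ N(1, 1) and Z2 ~ N(0, 1); note p2 > 0 everywhere, as required
    p1 = lambda z: np.exp(-0.5 * (z - 1.0) ** 2) / np.sqrt(2.0 * np.pi)
    p2 = lambda z: np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    phi = lambda z: (np.abs(z) <= 0.5).astype(float)   # any bounded test function

    M = 10 ** 6
    z1 = rng.normal(1.0, 1.0, M)
    z2 = rng.normal(0.0, 1.0, M)

    lhs = phi(z1).mean()                        # E[phi(Z1)]
    rhs = (phi(z2) * p1(z2) / p2(z2)).mean()    # E[phi(Z2) p1(Z2)/p2(Z2)]
    print(lhs, rhs)                             # the two estimates agree up to Monte Carlo error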

On the other hand, note that

$$\begin{aligned} \mathbbm {P}(\max _k | X_{t_k} - z_{t_k}| \le \epsilon ) = {\mathbb {E}}\left( H\left( 1 - \frac{\max _k | X_{t_k} - z_{t_k}|}{\epsilon }\right) \right) , \end{aligned}$$
(24)

where H is the Heaviside function. We now use Eq. (23) in Eq. (24) with

$$\begin{aligned} \begin{aligned} \phi (z)&= H\left( 1 - \frac{\max _k |z_k|}{\epsilon }\right) , \\ Z^{(1)}&= (X^{({\varDelta })}_{t_1} - z_{t_1}, \ldots , X^{({\varDelta })}_{t_N} - z_{t_N}),\\ Z^{(2)}&= (W_{t_1}, \ldots , W_{t_N}), \end{aligned} \end{aligned}$$

where \((X^{({\varDelta })}_{t_1}, \ldots , X^{({\varDelta })}_{t_N})\) is a solution to the Euler approximation (13). Note that \((X^{({\varDelta })}_{t_1} - z_{t_1}, \ldots , X^{({\varDelta })}_{t_N} - z_{t_N})\) is then a solution of the system (19). We therefore obtain

$$\begin{aligned} \mathbbm {P}(\max _k|X_{t_k} - z_{t_k}| \le \epsilon ) = {\mathbb {E}}\left[ H\left( 1 - \frac{\max _k |W_{t_k}|}{\epsilon }\right) \exp ( A + B + C ) \right], \end{aligned}$$
(25)

with

$$\begin{aligned} \begin{aligned} A&= -\frac{{\varDelta }}{2\rho ^2} \sum _{n = 1}^{N} \left( \frac{z_{t_n} - z_{t_{n-1}}}{{\varDelta }} - f(z_{t_{n-1}} + \rho W_{t_{n-1}})\right) ^2\\ B&= - \frac{1}{\rho } \sum _{n = 1}^{N} \left( \frac{z_{t_n} - z_{t_{n-1}}}{{\varDelta }}\right) (W_{t_n} - W_{t_{n-1}})\\ C&= \frac{1}{\rho } \sum _{n = 1}^{N} (f(z_{t_{n-1}} + \rho W_{t_{n-1}})) (W_{t_n} - W_{t_{n-1}}). \end{aligned} \end{aligned}$$
(26)
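The three terms in Eq. (26) are easy to transcribe into code. The following Python sketch does so for a scalar drift f; the helper name terms_ABC and the example values of rho, the step size and the drift are our own illustrative choices, and z and W are arrays holding the reference trajectory and a Brownian path on the grid \(t_0, \ldots , t_N\).

    import numpy as np

    def terms_ABC(z, W, delta, rho, f):
        """Evaluate the discrete terms A, B, C of Eq. (26).

        z, W : arrays of length N + 1 with z[n] = z_{t_n} and W[n] = W_{t_n}, W[0] = 0.
        """
        dz = np.diff(z)                        # z_{t_n} - z_{t_{n-1}}
        dW = np.diff(W)                        # W_{t_n} - W_{t_{n-1}}
        drift = f(z[:-1] + rho * W[:-1])       # f(z_{t_{n-1}} + rho * W_{t_{n-1}})
        A = -delta / (2.0 * rho ** 2) * np.sum((dz / delta - drift) ** 2)
        B = -np.sum((dz / delta) * dW) / rho
        C = np.sum(drift * dW) / rho
        return A, B, C

    # illustrative call with f(x) = -x, z_t = 0 and one Brownian path on [0, T]
    T, N, rho = 1.0, 1000, 0.5
    delta = T / N
    rng = np.random.default_rng(1)
    W = np.concatenate(([0.0], np.cumsum(np.sqrt(delta) * rng.standard_normal(N))))
    z = np.zeros(N + 1)
    print(terms_ABC(z, W, delta, rho, lambda x: -x))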

When taking the limits \({\varDelta } \rightarrow 0\) and \(\epsilon \rightarrow 0\), the first two terms A and B converge to

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \lim _{{\varDelta } \rightarrow 0} A = -\frac{1}{2\rho ^2} \int _{0}^{T} (\dot{z_t} - f(z_t))^2 {\mathrm {d}}t, \end{aligned}$$
(27)

and zero, respectively, no matter in which order the limits are taken. The third term, however, shows different behaviour depending on whether \({\varDelta } \rightarrow 0\) or \(\epsilon \rightarrow 0\) is taken first. If we take \({\varDelta } \rightarrow 0\) first, one obtains a well defined random variable (see footnote 3) which can be written as an Ito integral

$$\begin{aligned} \lim _{{\varDelta } \rightarrow 0} C = \frac{1}{\rho } \int _{0}^{T} f(z_{t} + \rho W_{t}) {\mathrm {d}}W_{t}. \end{aligned}$$

We do not expect the reader to be familiar with the theory of Ito integrals; what is relevant here is that the limit of this expression for \(\epsilon \rightarrow 0\) will not be zero but

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \lim _{{\varDelta } \rightarrow 0} C = -\frac{1}{2} \int _{0}^{T} f'(z_{t}) {\mathrm {d}}t. \end{aligned}$$
(28)

A demonstration of this fact (see footnote 4) for the case where f is linear is given here for illustration. If \(f(x) = a x\) for some \(a \in {\mathbb {R}}\), then

$$\begin{aligned} \begin{aligned} C&= \frac{a}{\rho } \sum _{n = 1}^{N} (z_{t_{n-1}} + \rho W_{t_{n-1}}) (W_{t_n} - W_{t_{n-1}})\\&= \frac{a}{\rho } \sum _{n = 1}^{N} z_{t_{n-1}} (W_{t_n} - W_{t_{n-1}}) + a \sum _{n = 1}^{N} W_{t_{n-1}} (W_{t_n} - W_{t_{n-1}})\\&= \frac{a}{\rho } C_1 + a C_2. \end{aligned} \end{aligned}$$
(29)

It is easy to see that \(C_1 \rightarrow 0\) if \({\varDelta } \rightarrow 0\) and \(\epsilon \rightarrow 0\), no matter in which order these limits are taken. After some algebra, we can write \(C_2\) as

$$\begin{aligned} \begin{aligned} C_2&= \sum _{n = 1}^{N} W_{t_{n-1}} (W_{t_n} - W_{t_{n-1}})\\&= \frac{1}{2} W_{T}^2 - \frac{1}{2} \sum _{n = 1}^{N} (W_{t_n} - W_{t_{n-1}})^2. \end{aligned} \end{aligned}$$

Considering the mean and the variance of the second term, we obtain \(\frac{1}{2} T\) and \(\frac{1}{2} T{\varDelta }\), respectively, implying that (at least in a mean square sense) the second term converges to its mean \(\frac{1}{2} T\) if \({\varDelta } \rightarrow 0\). Hence

$$\begin{aligned} \lim _{{\varDelta } \rightarrow 0} C_2 = \frac{1}{2} W_{T}^2 - \frac{1}{2} T. \end{aligned}$$
(30)

Therefore, taking the limits \({\varDelta } \rightarrow 0\) and then \(\epsilon \rightarrow 0\) in Eq. (29) and using Eq. (30) we obtain

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \lim _{{\varDelta } \rightarrow 0}C = -\frac{a}{2} T \end{aligned}$$

which is the same as Eq. (28) for this special case.
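The behaviour of the quadratic term can also be checked numerically. The short Python sketch below (our own illustration, with arbitrary choices of T, grid sizes and sample size) confirms that \(\sum _{n} (W_{t_n} - W_{t_{n-1}})^2\) has mean T and variance \(2 T {\varDelta }\), so that it converges to T rather than to zero as \({\varDelta } \rightarrow 0\), exactly as used in Eq. (30).

    import numpy as np

    rng = np.random.default_rng(2)
    T, M = 1.0, 20000                 # time horizon and number of Monte Carlo replicates

    for N in (10, 100, 1000):         # increasingly fine grids, Delta = T / N
        delta = T / N
        dW = np.sqrt(delta) * rng.standard_normal((M, N))   # Brownian increments
        qv = np.sum(dW ** 2, axis=1)                         # sum of squared increments
        print(N, qv.mean(), qv.var(), 2.0 * T * delta)
        # the sample mean stays near T while the variance decays like 2*T*Delta,
        # so the quadratic variation tends to T, not to zero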

Using Eq. (28) and the expression in Eq. (27) in Eq. (25), we obtain that for small \(\epsilon \)

$$\begin{aligned} \begin{aligned} \mathbbm {P}(\sup _t |X_{t} - z_{t}| \le \epsilon )&\cong {\mathbb {E}}\biggl( H\left( 1 - \frac{\sup _t|W_{t}|}{\epsilon }\right) \\&\quad \cdot \; \exp \left[ -\frac{1}{2\rho ^2} \int _{0}^{T} (\dot{z_t} - f(z_t))^2 {\mathrm {d}}t -\frac{1}{2} \int _{0}^{T} f'(z_{t}) {\mathrm {d}}t \right] \biggr) \end{aligned} \end{aligned}$$

so that we can conclude

$$\begin{aligned} \begin{aligned}&\lim _{\epsilon \rightarrow 0}\frac{\mathbbm {P}(\sup _t |X_{t} - z_{t}| \le \epsilon )}{ \mathbbm {P}( \sup _t|W_{t}| \le \epsilon )}\\&\quad = \exp \left[ -\frac{1}{2\rho ^2} \int _{0}^{T} (\dot{z_t} - f(z_t))^2 {\mathrm {d}}t -\frac{1}{2} \int _{0}^{T} f'(z_{t}) {\mathrm {d}}t \right] \\&\quad = \exp (-{\mathcal {A}}_{OM}). \end{aligned} \end{aligned}$$

Note that if we used Eqs. (25, 26) as a starting point but subsequently took the limits in the wrong order, that is, first \(\epsilon \rightarrow 0\) and then \({\varDelta } \rightarrow 0\), we would have \(B, C \rightarrow 0\), and we would obtain the energy functional \({\mathcal {A}}_{E}\) instead.

As a final remark, looking back at the calculations the reader will see that the only term that does not permit an interchange of the limits is the second order or “quadratic” term \( \sum _{n = 1}^N (W_{t_n} - W_{t_{n-1}})^2\), which would vanish as \({\varDelta } \rightarrow 0\) if W were a differentiable function but converges to T in the case of the Wiener process. Roughly speaking, this is because \(W_{t_n} - W_{t_{n-1}}\) is of order \(\sqrt{{\varDelta }}\), which more generally gives rise to the extra terms in the Ito calculus.
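To illustrate the practical upshot in a discretised setting, the following Python sketch evaluates Riemann sum approximations of the two competing functionals in the notation of this appendix (no observation terms): the energy functional \({\mathcal {A}}_{E} = \frac{1}{2\rho ^2}\int (\dot{z_t} - f(z_t))^2 {\mathrm {d}}t\) and the Onsager–Machlup functional \({\mathcal {A}}_{OM} = {\mathcal {A}}_{E} + \frac{1}{2}\int f'(z_t) {\mathrm {d}}t\). The function names and the particular drift, trajectory and grid are our own illustrative choices; for a linear drift the two functionals differ only by a constant, as noted at the beginning of this appendix.

    import numpy as np

    def energy_functional(z, delta, rho, f):
        """Riemann sum approximation of (1 / (2 rho^2)) * int (zdot - f(z))^2 dt."""
        zdot = np.diff(z) / delta
        return delta * np.sum((zdot - f(z[:-1])) ** 2) / (2.0 * rho ** 2)

    def onsager_machlup_functional(z, delta, rho, f, fprime):
        """Energy functional plus the correction term (1/2) * int f'(z) dt."""
        return energy_functional(z, delta, rho, f) + 0.5 * delta * np.sum(fprime(z[:-1]))

    # illustrative example with the linear drift f(x) = -a*x on [0, T]
    a, rho, T, N = 2.0, 0.5, 1.0, 1000
    delta = T / N
    t = np.linspace(0.0, T, N + 1)
    z = np.exp(-a * t)                         # some candidate trajectory
    A_E = energy_functional(z, delta, rho, lambda x: -a * x)
    A_OM = onsager_machlup_functional(z, delta, rho, lambda x: -a * x,
                                      lambda x: -a + 0.0 * x)
    print(A_E, A_OM)   # here A_OM - A_E is the constant (1/2) * (-a) * T = -a*T/2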

Appendix B: Derivation of Eq. (18)

In this section we derive Eq. (18); that is, we follow the same steps as for the Euler scheme and take the limits as in Eq. (15), but starting with the implicit scheme (17) instead of the Euler scheme. If we set \(R_n = \rho (W_{t_n} - W_{t_{n-1}})\), then the implicit scheme (17) can be written in the form

$$\begin{aligned} X_{t_{n}} = X_{t_{n-1}} + F_1(X_{t_{n}}) + F_2(X_{t_{n-1}}) + R_n \end{aligned}$$

which can be expressed as \((R_1, \ldots , R_N) = {\varPsi }(X_{t_1}, \ldots , X_{t_N})\) with

$$\begin{aligned} {\varPsi }_n(x_1, \ldots , x_N) = x_{n} - x_{n-1} - F_1 (x_{n}) - F_2(x_{n-1}) \qquad \text {for }n = 1, \ldots , N. \end{aligned}$$

According to basic probability calculus, we have for the densities

$$\begin{aligned} p_{X_{t_1}, \ldots , X_{t_N}}(x_{1}, \ldots , x_{N}) = p_{R_1, \ldots , R_N}({\varPsi }(x_{1}, \ldots , x_{N})) \cdot \left| \frac{\partial {\varPsi }}{\partial x}(x_{1}, \ldots , x_{N}) \right| \end{aligned}$$
(31)

Since \(\frac{\partial {\varPsi }_k}{\partial x_l} = 0\) for \(k < l\), the Jacobian matrix of \({\varPsi }\) is lower triangular and hence

$$\begin{aligned} \begin{aligned} \left| \frac{\partial {\varPsi }}{\partial x} \right| (x_{1}, \ldots , x_{N})&= \prod _{n = 1}^N \frac{\partial {\varPsi }_n}{\partial x_n}(x_{1}, \ldots , x_{N}) \\&= \prod _{n = 1}^N \left( 1 - F_1'(x_n) \right) \\&= \prod _{n = 1}^N \left( 1 - (1 - \lambda ) {\varDelta } f'(x_n) \right) \\&= \exp \left( \sum _{n = 1}^N \log \left( 1 - (1 - \lambda ) {\varDelta } f'(x_n) \right) \right) . \end{aligned} \end{aligned}$$

We evaluate this expression with \(x_k = z_{t_k}\) for \(k = 1, \ldots , N\) where \(\{z_t\}\) is some trajectory on the interval \(I = [0, T]\) and \(N = T/{\varDelta }\). Since \(\log (1 + w) \cong w\) for small w, we can write the exponent approximately as

$$\begin{aligned} \sum _{n = 1}^N \log (1 - (1 - \lambda ) {\varDelta } f'(z_{t_n})) \cong - (1 - \lambda ) {\varDelta } \sum _{n = 1}^N f'(z_{t_n}) \end{aligned}$$

which is a Riemann sum converging to \(-(1 - \lambda )\int _I f'(z_t) \; {\mathrm {d}}t\). The first factor in Eq. (31), after normalisation and when evaluated along a trajectory, reads as

$$\begin{aligned} \begin{aligned}&\frac{p_{R_1, \ldots , R_N}({\varPsi }(z_{t_1}, \ldots , z_{t_N}))}{p_{R_1, \ldots , R_N}(0, \ldots , 0) }\\&= \exp \left( -\frac{{\varDelta }}{2 \rho ^2} \sum _{n = 1}^N \left( \frac{z_{t_n} - z_{t_{n-1}}}{{\varDelta }} - (1 - \lambda ) f(z_{t_n}) - \lambda f(z_{t_{n-1}}) \right) ^2 \right) . \end{aligned} \end{aligned}$$

Again, the exponent is a Riemann sum which converges to \(-\frac{1}{2 \rho ^2}\int _I (\dot{z}_t - f(z_t))^2 {\mathrm {d}}t\) for \({\varDelta } \rightarrow 0\). In summary, we get Eq. (18).
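As a computational companion to this appendix, the Python sketch below evaluates the two factors derived above for the implicit scheme along a trajectory: the Jacobian log-determinant \(\sum _{n} \log (1 - (1 - \lambda ) {\varDelta } f'(z_{t_n}))\) and the Gaussian exponent of Eq. (31). The forms \(F_1(x) = (1 - \lambda ) {\varDelta } f(x)\) and \(F_2(x) = \lambda {\varDelta } f(x)\) are read off from the expressions above; the function name, the drift and all parameter values are our own illustrative choices.

    import numpy as np

    def log_weight_theta_scheme(z, delta, rho, lam, f, fprime):
        """Discrete log-density contributions for the implicit scheme along z.

        Returns the Jacobian log-determinant sum_n log(1 - (1-lam)*Delta*f'(z_n)) and
        the Gaussian exponent
        -Delta/(2 rho^2) * sum_n ((z_n - z_{n-1})/Delta - (1-lam)*f(z_n) - lam*f(z_{n-1}))^2.
        """
        zn, znm1 = z[1:], z[:-1]
        logdet = np.sum(np.log(1.0 - (1.0 - lam) * delta * fprime(zn)))
        resid = (zn - znm1) / delta - (1.0 - lam) * f(zn) - lam * f(znm1)
        gauss = -delta * np.sum(resid ** 2) / (2.0 * rho ** 2)
        return logdet, gauss

    # illustrative check with f(x) = -a*x; the limits derived above are
    # -(1 - lam) * int f'(z) dt = (1 - lam)*a*T and -(1/(2 rho^2)) * int (zdot - f(z))^2 dt
    a, rho, lam, T, N = 2.0, 0.5, 0.5, 1.0, 2000
    delta = T / N
    t = np.linspace(0.0, T, N + 1)
    z = np.exp(-a * t)
    logdet, gauss = log_weight_theta_scheme(z, delta, rho, lam, lambda x: -a * x,
                                            lambda x: -a + 0.0 * x)
    print(logdet, (1.0 - lam) * a * T)   # Jacobian term versus its limiting integral
    print(gauss)                          # close to zero here since zdot = f(z) for this z
    # for lam = 1/2 the correction (1 - lam) * int f'(z) dt matches the
    # (1/2) * int f'(z) dt term of the Onsager-Machlup functional in Appendix A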


Cite this article

Bröcker, J. What is the correct cost functional for variational data assimilation?. Clim Dyn 52, 389–399 (2019). https://doi.org/10.1007/s00382-018-4146-y
