What is the correct cost functional for variational data assimilation?

Bröcker, Jochen

doi:10.1007/s00382-018-4146-y

Published: 15 March 2018

Climate Dynamics, volume 52, pages 389–399 (2019)

Jochen Bröcker¹

Abstract

Variational approaches to data assimilation, and weakly constrained four dimensional variation (WC-4DVar) in particular, are important in the geosciences but also in other communities (often under different names). The cost functions and the resulting optimal trajectories may have a probabilistic interpretation, for instance by linking data assimilation with maximum a posteriori (MAP) estimation. This is possible in particular if the unknown trajectory is modelled as the solution of a stochastic differential equation (SDE), as is increasingly the case in weather forecasting and climate modelling. In this situation, the MAP estimator (or “most probable path” of the SDE) is obtained by minimising the Onsager–Machlup functional. Although this fact is well known, there seems to be some confusion in the literature, with the energy (or “least squares”) functional sometimes being claimed to yield the most probable path. The first aim of this paper is to address this confusion and show that the energy functional does not, in general, provide the most probable path. The second aim is to discuss the implications in practice. Although the mentioned results pertain to stochastic models in continuous time, they do have consequences in practice, where SDEs are approximated by discrete time schemes. It turns out that using an approximation to the SDE and calculating its most probable path does not necessarily yield a good approximation to the most probable path of the SDE proper. This suggests that even in discrete time, a version of the Onsager–Machlup functional should be used, rather than the energy functional, at least if the solution is to be interpreted as a MAP estimator.

Notes

1. We are grateful to referee Stéphane Vannitsem for stressing this point.
2. The problem is the translation invariance of the standard volume. In an infinite dimensional normed space, a ball of unit radius may contain infinitely many disjoint balls of sufficiently small but nonzero radius. By translation invariance, these balls must have the same volume. But this means that either the volume of the unit ball is infinity or the volume of a sufficiently small ball is zero.
3. The limit is in fact in the $L_2$ sense.
4. Strictly speaking this “fact” is only correct in a much weaker sense, which is still sufficient to derive the Onsager–Machlup functional; the correct statement is that
$$\begin{aligned} {\mathbb {E}}\left[ \exp \left( \int _{0}^{T} f(z_{t} + W_{t}) {\mathrm {d}}W_{t} + \frac{1}{2} \int _{0}^{T} f'(z_{t}) {\mathrm {d}}t \right) \,\Big |\, \sup _t |W_t | \le \epsilon \right] \rightarrow 1 \end{aligned}$$
for $\epsilon \rightarrow 0$, see Ikeda and Watanabe (1989).

References

Apte A, Hairer M, Stuart AM, Voss J (2007) Sampling the posterior: an approach to non-Gaussian data assimilation. Phys D Nonlinear Phenom 230(1):50–64. https://doi.org/10.1016/j.physd.2006.06.009
Breiman L (1973) Probability. Addison-Wesley, Reading, Mass
Cotter SL, Dashti M, Robinson JC, Stuart AM (2009) Bayesian inverse problems for functions and applications to fluid mechanics. Inverse Probl 25(11):115008. https://doi.org/10.1088/0266-5611/25/11/115008
Derber JC (1989) A variational continuous assimilation technique. Monthly Weather Rev 117(11):2437–2446
Dutra DA, Teixeira BOS, Aguirre LA (2014) Maximum a posteriori state path estimation: discretization limits and their interpretation. Automatica 50(5):1360–1368. https://doi.org/10.1016/j.automatica.2014.03.003
Evensen G (2007) Data assimilation. The ensemble Kalman filter. Springer, New York
Franzke CLE, O’Kane TJ, Berner J, Williams PD, Lucarini V (2015) Stochastic climate theory and modeling. Wiley Interdiscip Rev Clim Change 6(1):63–78. https://doi.org/10.1002/wcc.318
Gallot S, Hulin D, Lafontaine J (2004) Riemannian geometry, 3rd edn. Universitext. Springer, Berlin. https://doi.org/10.1007/978-3-642-18855-8
Ide K, Courtier P, Ghil M, Lorenc AC (1997) Unified notation for data assimilation: operational, sequential and variational. J Meteorol Soc Jpn 75(1B):181–189
Ikeda N, Watanabe S (1989) Stochastic differential equations and diffusion processes, 2nd edn. Volume 24 of North-Holland Mathematical Library. North-Holland Publishing Co., Amsterdam
Imkeller P, von Storch J-S (eds) (2001) Stochastic climate models. Volume 49 of Progress in Probability. Birkhäuser Verlag, Basel. https://doi.org/10.1007/978-3-0348-8287-3
Jazwinski AH (1970) Stochastic processes and filtering theory. Volume 64 of Mathematics in Science and Engineering. Academic Press, New York
Kalnay E (2001) Atmospheric modeling, data assimilation and predictability, 1st edn. Cambridge University Press, Cambridge
Kloeden PE, Platen E (1992) Numerical solution of stochastic differential equations. Springer, Berlin
Milstein GN (1995) Numerical integration of stochastic differential equations. Volume 313 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht. https://doi.org/10.1007/978-94-015-8455-5 (Translated and revised from the 1988 Russian original)
Mortensen RE (1968) Maximum-likelihood recursive nonlinear filtering. J Optim Theory Appl 2:386–394
Mörters P, Peres Y (2010) Brownian motion. Volume 30 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511750489 (With an appendix by Oded Schramm and Wendelin Werner)
Øksendal B (1998) Stochastic differential equations: an introduction with applications, 5th edn. Universitext. Springer, Berlin. https://doi.org/10.1007/978-3-662-03620-4
Stuart AM (2010) Inverse problems: a Bayesian perspective. Acta Numer 19:451–559. https://doi.org/10.1017/S0962492910000061
Sugiura N (2017) The Onsager–Machlup functional for data assimilation. Nonlinear Process Geophys 24(4):701–712. https://doi.org/10.5194/npg-24-701-2017
Vanden-Eijnden E, Weare J (2013) Data assimilation in the low noise regime with application to the Kuroshio. Monthly Weather Rev 141(6):1822–1841. https://doi.org/10.1175/MWR-D-12-00060.1
Zeitouni O, Dembo A (1987) A maximum a posteriori estimator for trajectories of diffusion processes. Stochastics 20(3):221
Zeitouni O, Dembo A (1988) An existence theorem and some properties of maximum a posteriori estimators of trajectories of diffusions. Stochastics 23(2):197. https://doi.org/10.1080/17442508808833490

Author information

Authors and Affiliations

School of Mathematical and Physical Sciences, University of Reading, Reading, UK
Jochen Bröcker

Corresponding author

Correspondence to Jochen Bröcker.

Additional information

The author was supported by the UK Engineering and Physical Sciences Research Council under grant agreement EP/L012669/1. Fruitful discussions with Tobias Kuna, Dan Crisan, Andrew Stuart, Colin Cotter, and Horatio Boedihardjo are gratefully acknowledged. Referee Stéphane Vannitsem and a second anonymous referee provided a number of important comments which helped to improve this manuscript.

Appendices

Appendix A: Derivation of the correct functional

We will attempt a more careful calculation of the $\epsilon $–weight of a path which will not only allow us to take the limits in the right order and obtain the correct expression for the density, but also to identify the reason why interchanging these limits gives a different result. We will later restrict our attention to linear dynamics. It should be said that for linear dynamics, the additional term in the Onsager–Machlup functional (11) does not depend on the reference trajectory and hence minimising ${\mathcal {A}}_{OM}$ or ${\mathcal {A}}_{E}$ gives the same results in this case. However, the functionals are still different and only the Onsager–Machlup functional provides the correct density.

First we note the following simple but important fact. Let $Z^{(1)}, Z^{(2)}$ be random variables with values in ${\mathbb {R}}^N$ with densities $p_1, p_2$ respectively, and $p_2(z)> 0$ for all $z \in {\mathbb {R}}^N$. Further, let $\phi $ be a function on ${\mathbb {R}}^N$. Then the identity

$$\begin{aligned} {\mathbb {E}}(\phi (Z^{(1)})) = {\mathbb {E}}\left( \phi (Z^{(2)}) \frac{p_1(Z^{(2)})}{p_2(Z^{(2)})} \right), \end{aligned}$$

holds, since

$$\begin{aligned} {\mathbb {E}}(\phi (Z^{(1)})) = \int _{{\mathbb {R}}^N}\phi (z) p_1(z) {\mathrm {d}}z = \int _{{\mathbb {R}}^N} \phi (z) \frac{p_1(z)}{p_2(z)} p_2(z) {\mathrm {d}}z = {\mathbb {E}}\left( \phi (Z^{(2)}) \frac{p_1(Z^{(2)})}{p_2(Z^{(2)})} \right) . \end{aligned}$$

(23)
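
The identity (23) can be checked numerically. The following is a minimal sketch rather than part of the original derivation: it assumes a one dimensional example in which $Z^{(1)}$ and $Z^{(2)}$ are Gaussian with hypothetical parameters, and takes $\phi $ to be the indicator of $|z| \le 1$. The helper names `gauss_pdf` and `reweighted_expectation` are illustrative only.

```python
import numpy as np


def gauss_pdf(z, mean, std):
    """Density of N(mean, std^2), evaluated elementwise."""
    return np.exp(-0.5 * ((z - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def reweighted_expectation(phi, n_samples=200_000, seed=0):
    """Estimate E[phi(Z1)] directly and by reweighting samples of Z2.

    Assumed (hypothetical) laws: Z1 ~ N(0.5, 1) and Z2 ~ N(0, 1.5^2).
    """
    rng = np.random.default_rng(seed)
    z1 = rng.normal(0.5, 1.0, n_samples)
    z2 = rng.normal(0.0, 1.5, n_samples)
    direct = phi(z1).mean()
    # Density ratio p_1 / p_2 evaluated at the samples of Z2.
    weights = gauss_pdf(z2, 0.5, 1.0) / gauss_pdf(z2, 0.0, 1.5)
    reweighted = (phi(z2) * weights).mean()
    return direct, reweighted


# Indicator of the event |z| <= 1, a one dimensional analogue of a tube event.
direct, reweighted = reweighted_expectation(lambda z: (np.abs(z) <= 1.0).astype(float))
print(direct, reweighted)
```

Both estimates should agree up to Monte Carlo error, which is the content of the identity: expectations under $p_1$ can be computed from samples of $p_2$, provided $p_2$ is positive everywhere.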

On the other hand, note that

$$\begin{aligned} \mathbbm {P}(\max _k | X_{t_k} - z_{t_k}| \le \epsilon ) = {\mathbb {E}}\left( H\left( \frac{\max _k | X_{t_k} - z_{t_k}|}{\epsilon } - 1\right) \right) , \end{aligned}$$

(24)

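where H is the Heaviside function, taken here with the convention $H(x) = 1$ for $x \le 0$ and $H(x) = 0$ for $x > 0$, so that the right hand side of Eq. (24) is the expectation of the indicator of the event $\max _k | X_{t_k} - z_{t_k}| \le \epsilon $. We can use Eq. (23) in Eq. (24) with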

$$\begin{aligned} \begin{aligned} \phi (z)&= H\left( \frac{\max _k |z_k|}{\epsilon } - 1\right) , \\ Z^{(1)}&= (X^{({\varDelta })}_{t_1} - z_{t_1}, \ldots , X^{({\varDelta })}_{t_N} - z_{t_N}),\\ Z^{(2)}&= (W_{t_1}, \ldots , W_{t_N}), \end{aligned} \end{aligned}$$

where $(X^{({\varDelta })}_{t_1}, \ldots , X^{({\varDelta })}_{t_N})$ is a solution to the Euler approximation (13). Note that $(X^{({\varDelta })}_{t_1} - z_{t_1}, \ldots , X^{({\varDelta })}_{t_N} - z_{t_N})$ is then a solution of the system (19). We therefore obtain

$$\begin{aligned} \mathbbm {P}(\max _k|X_{t_k} - z_{t_k}| \le \epsilon ) = {\mathbb {E}}\left[ H\left( \frac{\max _k |W_{t_k}|}{\epsilon } - 1\right) \exp ( A + B + C ) \right], \end{aligned}$$

(25)

with

$$\begin{aligned} \begin{aligned} A&= -\frac{{\varDelta }}{2\rho ^2} \sum _{n = 1}^{N} \left( \frac{z_{t_n} - z_{t_{n-1}}}{{\varDelta }} - f(z_{t_{n-1}} + \rho W_{t_{n-1}})\right) ^2\\ B&= - \frac{1}{\rho } \sum _{n = 1}^{N} \left( \frac{z_{t_n} - z_{t_{n-1}}}{{\varDelta }}\right) (W_{t_n} - W_{t_{n-1}})\\ C&= \frac{1}{\rho } \sum _{n = 1}^{N} (f(z_{t_{n-1}} + \rho W_{t_{n-1}})) (W_{t_n} - W_{t_{n-1}}). \end{aligned} \end{aligned}$$

(26)
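
As an illustration of Eq. (26), the sketch below evaluates A, B and C for discrete paths and compares their sum with the logarithm of the ratio of Gaussian transition densities from which they arise. It assumes that the Euler approximation (13), which is not reproduced in this section, has the usual form $X_{t_n} = X_{t_{n-1}} + {\varDelta } f(X_{t_{n-1}}) + \rho (W_{t_n} - W_{t_{n-1}})$. The drift $f(x) = x - x^3$, the reference path and all parameter values are hypothetical choices.

```python
import numpy as np


def log_weight_terms(z, w, f, rho, delta):
    """Return the terms A, B, C of Eq. (26).

    z, w: arrays of length N + 1 holding z_{t_0..t_N} and W_{t_0..t_N}.
    """
    dz = np.diff(z) / delta            # (z_{t_n} - z_{t_{n-1}}) / Delta
    dw = np.diff(w)                    # W_{t_n} - W_{t_{n-1}}
    drift = f(z[:-1] + rho * w[:-1])   # f(z_{t_{n-1}} + rho W_{t_{n-1}})
    a = -delta / (2.0 * rho**2) * np.sum((dz - drift) ** 2)
    b = -np.sum(dz * dw) / rho
    c = np.sum(drift * dw) / rho
    return a, b, c


def log_density_ratio(z, w, f, rho, delta):
    """log p_1(y) - log p_2(y) at y = rho * w, from the Gaussian increments.

    p_1: density of the Euler path minus z (assumed form of the scheme),
    p_2: density of rho * W at the same time points.
    """
    y = rho * w
    dy = np.diff(y)
    mean = delta * f(z[:-1] + y[:-1]) - np.diff(z)  # conditional mean of the increment
    var = rho**2 * delta
    return np.sum(-(dy - mean) ** 2 + dy**2) / (2.0 * var)


def f(x):
    return x - x**3  # hypothetical drift


rho, T, N = 0.5, 1.0, 1000
delta = T / N
t = np.linspace(0.0, T, N + 1)
z = np.sin(t)  # hypothetical reference trajectory
rng = np.random.default_rng(0)
w = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(delta), N))))

a, b, c = log_weight_terms(z, w, f, rho, delta)
ratio = log_density_ratio(z, w, f, rho, delta)
print(a + b + c, ratio)
```

Under the stated assumption on the scheme, the two printed numbers coincide up to rounding, so $\exp (A + B + C)$ plays the role of the density ratio $p_1/p_2$ in Eq. (23).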

In terms of the limits ${\varDelta } \rightarrow 0$ and $\epsilon \rightarrow 0$, the first two terms A and B will converge to

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \lim _{{\varDelta } \rightarrow 0} A = -\frac{1}{2\rho ^2} \int _{0}^{T} (\dot{z_t} - f(z_t))^2 {\mathrm {d}}t, \end{aligned}$$

(27)

and zero, respectively, no matter in which order the limits are taken. The third term however shows different behaviour depending on whether ${\varDelta } \rightarrow 0$ or $\epsilon \rightarrow 0$ first. If we take ${\varDelta } \rightarrow 0$ first, it can be shown that one obtains a well defined random variable (footnote 3), which can be written as an Ito integral

$$\begin{aligned} \lim _{{\varDelta } \rightarrow 0} C = \frac{1}{\rho } \int _{0}^{T} f(z_{t} + \rho W_{t}) {\mathrm {d}}W_{t}. \end{aligned}$$

We do not expect the reader to be familiar with the theory of Ito integrals – relevant here is that the limit of this expression for $\epsilon \rightarrow 0$ will not be zero but

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \lim _{{\varDelta } \rightarrow 0} C = -\frac{1}{2} \int _{0}^{T} f'(z_{t}) {\mathrm {d}}t. \end{aligned}$$

(28)

A demonstration of this fact (footnote 4) for the case where f is linear is given here for illustration. If $f(x) = a x$ for some $a \in {\mathbb {R}}$, then

$$\begin{aligned} \begin{aligned} C&= \frac{a}{\rho } \sum _{n = 1}^{N} (z_{t_{n-1}} + \rho W_{t_{n-1}}) (W_{t_n} - W_{t_{n-1}})\\&= \frac{a}{\rho } \sum _{n = 1}^{N} z_{t_{n-1}} (W_{t_n} - W_{t_{n-1}}) + a \sum _{n = 1}^{N} W_{t_{n-1}} (W_{t_n} - W_{t_{n-1}})\\&= \frac{a}{\rho } C_1 + a C_2. \end{aligned} \end{aligned}$$

(29)

It is easy to see that $C_1 \rightarrow 0$ if ${\varDelta } \rightarrow 0$ and $\epsilon \rightarrow 0$, no matter in which order these limits are taken. After some algebra, we can write $C_2$ as

$$\begin{aligned} \begin{aligned} C_2&= \sum _{n = 1}^{N} W_{t_{n-1}} (W_{t_n} - W_{t_{n-1}})\\&= \frac{1}{2} W_{T}^2 - \frac{1}{2} \sum _{n = 1}^{N} (W_{t_n} - W_{t_{n-1}})^2. \end{aligned} \end{aligned}$$

Considering the mean and the variance of the second term, we obtain $\frac{1}{2} T$ and $\frac{1}{2} T{\varDelta }$, respectively, implying that (at least in a mean square sense) the second term converges to its mean $\frac{1}{2} T$ if ${\varDelta } \rightarrow 0$. Hence

$$\begin{aligned} \lim _{{\varDelta } \rightarrow 0} C_2 = \frac{1}{2} W_{T}^2 - \frac{1}{2} T. \end{aligned}$$

(30)

Therefore, taking the limits ${\varDelta } \rightarrow 0$ and then $\epsilon \rightarrow 0$ in Eq. (29) and using Eq. (30) we obtain

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \lim _{{\varDelta } \rightarrow 0}C = -\frac{a}{2} T \end{aligned}$$

which is the same as Eq. (28) for this special case.
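
The algebra for $C_2$ and the stated mean and variance of its second term can be checked by simulation. This is a rough sketch with hypothetical values for T, N and the number of sample paths M, not a substitute for the argument above.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, M = 1.0, 500, 2000   # horizon, time steps, number of sample paths (assumed values)
delta = T / N

# Brownian increments and paths, one path per row, with W_0 = 0.
dw = rng.normal(0.0, np.sqrt(delta), size=(M, N))
w = np.concatenate((np.zeros((M, 1)), np.cumsum(dw, axis=1)), axis=1)

# C_2 as the left-point sum, and via the closed form used in the text.
c2_sum = np.sum(w[:, :-1] * dw, axis=1)
second_term = 0.5 * np.sum(dw**2, axis=1)
c2_closed = 0.5 * w[:, -1] ** 2 - second_term

print(np.max(np.abs(c2_sum - c2_closed)))        # rounding error only
print(second_term.mean(), 0.5 * T)               # sample mean vs. T / 2
print(second_term.var(), 0.5 * T * delta)        # sample variance vs. T * Delta / 2
```

The first printed number confirms the rearrangement of $C_2$ path by path. The other two lines compare the sample mean and variance of the second term with $\frac{1}{2} T$ and $\frac{1}{2} T{\varDelta }$, which is why that term concentrates at $\frac{1}{2} T$ as ${\varDelta } \rightarrow 0$.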

Using Eqs. (27) and (28) in Eq. (25), we obtain for small $\epsilon $

$$\begin{aligned} \begin{aligned} \mathbbm {P}(\sup _t |X_{t} - z_{t}| \le \epsilon )&\cong {\mathbb {E}}\left[ H\left( \frac{\sup _t|W_{t}|}{\epsilon } - 1\right) \right] \\&\quad \cdot \; \exp \left[ -\frac{1}{2\rho ^2} \int _{0}^{T} (\dot{z_t} - f(z_t))^2 {\mathrm {d}}t -\frac{1}{2} \int _{0}^{T} f'(z_{t}) {\mathrm {d}}t \right] , \end{aligned} \end{aligned}$$

so that we can conclude

$$\begin{aligned} \begin{aligned}&\lim _{\epsilon \rightarrow 0}\frac{\mathbbm {P}(\sup _t |X_{t} - z_{t}| \le \epsilon )}{ \mathbbm {P}( \sup _t|W_{t}| \le \epsilon )}\\&\quad = \exp \left[ -\frac{1}{2\rho ^2} \int _{0}^{T} (\dot{z_t} - f(z_t))^2 {\mathrm {d}}t -\frac{1}{2} \int _{0}^{T} f'(z_{t}) {\mathrm {d}}t \right] \\&\quad = \exp (-{\mathcal {A}}_{OM}). \end{aligned} \end{aligned}$$

Note that if we used Eqs. (25) and (26) as a starting point but subsequently took the limits in the wrong order, that is, first $\epsilon \rightarrow 0$ and then ${\varDelta } \rightarrow 0$, we would have $B, C \rightarrow 0$, since for fixed ${\varDelta }$ each of the finitely many increments $W_{t_n} - W_{t_{n-1}}$ is bounded by $2\epsilon $. We would therefore obtain the energy functional ${\mathcal {A}}_{E}$.
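
To make the difference between the two functionals concrete, the sketch below discretises ${\mathcal {A}}_{E}$ and ${\mathcal {A}}_{OM}$ with left-point Riemann sums, reading ${\mathcal {A}}_{OM}$ off the exponent in the display above as ${\mathcal {A}}_{E} + \frac{1}{2}\int _0^T f'(z_t) {\mathrm {d}}t$. The drifts, the two test trajectories and the parameter values are hypothetical, and the function names are illustrative.

```python
import numpy as np


def energy_functional(z, f, rho, delta):
    """Left-point Riemann sum for (1 / (2 rho^2)) * int (dz/dt - f(z))^2 dt."""
    dz = np.diff(z) / delta
    return delta / (2.0 * rho**2) * np.sum((dz - f(z[:-1])) ** 2)


def onsager_machlup_functional(z, f, fprime, rho, delta):
    """Energy functional plus the correction (1/2) * int f'(z) dt."""
    return energy_functional(z, f, rho, delta) + 0.5 * delta * np.sum(fprime(z[:-1]))


rho, T, N = 0.5, 1.0, 2000
delta = T / N
t = np.linspace(0.0, T, N + 1)
paths = {"sine": np.sin(2.0 * np.pi * t), "parabola": t**2}  # hypothetical trajectories

a = -1.0  # slope of the hypothetical linear drift f(x) = a x


def linear(x):
    return a * x


def linear_prime(x):
    return np.full_like(x, a)


def cubic(x):
    return x - x**3


def cubic_prime(x):
    return 1.0 - 3.0 * x**2


gap_linear = {
    name: onsager_machlup_functional(z, linear, linear_prime, rho, delta)
    - energy_functional(z, linear, rho, delta)
    for name, z in paths.items()
}
gap_cubic = {
    name: onsager_machlup_functional(z, cubic, cubic_prime, rho, delta)
    - energy_functional(z, cubic, rho, delta)
    for name, z in paths.items()
}
print(gap_linear)  # same value a * T / 2 for every trajectory
print(gap_cubic)   # depends on the trajectory
```

For the linear drift the gap equals $aT/2$ for every trajectory, so both functionals have the same minimiser, in line with the remark at the start of this appendix. For the nonlinear drift the gap varies with the trajectory, so the minimisers can differ.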

As a final remark, by looking back at the calculations the reader will see that the only term that does not permit interchange of the limits is a second order or “quadratic” term $ \sum _{n = 1}^N (W_{t_n} - W_{t_{n-1}})^2$ which would vanish with ${\varDelta } \rightarrow 0$ if W were a differentiable function but converges to T in case of the Wiener process. Roughly speaking, this is because $W_{t_n} - W_{t_{n-1}}$ is of order $\sqrt{{\varDelta }}$, which more generally gives rise to the extra terms in the Ito calculus.
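
The contrast drawn in this remark can be seen in a short numerical experiment. The sketch below compares the sum of squared increments of a differentiable path with that of a simulated Wiener path as the step size shrinks; the sine path and the step counts are hypothetical choices.

```python
import numpy as np


def quadratic_sum(path):
    """Sum of squared increments of a sampled path."""
    return float(np.sum(np.diff(path) ** 2))


rng = np.random.default_rng(2)
T = 1.0
results = []
for n_steps in (100, 10_000, 1_000_000):
    delta = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    smooth = np.sin(2.0 * np.pi * t)  # differentiable comparison path
    wiener = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(delta), n_steps))))
    results.append((n_steps, quadratic_sum(smooth), quadratic_sum(wiener)))

for n_steps, qs_smooth, qs_wiener in results:
    print(f"N={n_steps:>9d}  smooth: {qs_smooth:.2e}  Wiener: {qs_wiener:.4f}")
```

The sum for the smooth path shrinks in proportion to ${\varDelta }$, while the sum for the Wiener path stays close to T.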

Appendix B: Derivation of Eq. (18)

In this section, we will derive Eq. (18); that is, we follow the same steps as for the Euler scheme and take the limits as in Eq. (15), but starting with the implicit scheme (17) instead of the Euler scheme. If we set $R_n = \rho (W_{t_n} - W_{t_{n-1}})$, then the implicit scheme (17) can be written in the form

$$\begin{aligned} X_{t_{n}} = X_{t_{n-1}} + F_1(X_{t_{n}}) + F_2(X_{t_{n-1}}) + R_n \end{aligned}$$

which can be expressed as $(R_1, \ldots , R_N) = {\varPsi }(X_{t_1}, \ldots , X_{t_N})$ with

$$\begin{aligned} {\varPsi }_n(x_1, \ldots , x_N) = x_{n} - x_{n-1} - F_1 (x_{n}) - F_2(x_{n-1}) \qquad \text {for }n = 1, \ldots , N. \end{aligned}$$

According to basic probability calculus, we have for the densities

$$\begin{aligned} p_{X_{t_1}, \ldots , X_{t_N}}(x_{1}, \ldots , x_{N}) = p_{R_1, \ldots , R_N}({\varPsi }(x_{1}, \ldots , x_{N})) \cdot \left| \frac{\partial {\varPsi }}{\partial x}(x_{1}, \ldots , x_{N}) \right| \end{aligned}$$

(31)

Since $\frac{\partial {\varPsi }_k}{\partial x_l} = 0$ for $k < l$, the Jacobi matrix of ${\varPsi }$ is lower left triangular and hence

$$\begin{aligned} \begin{aligned} \left| \frac{\partial {\varPsi }}{\partial x} \right| (x_{1}, \ldots , x_{N})&= \prod _{k = 1}^N \frac{\partial {\varPsi }_k}{\partial x_k}(x_{1}, \ldots , x_{N}) \\&= \prod _{k = 1}^N \left( 1 - F_1'(x_k)\right) \\&= \prod _{k = 1}^N \left( 1 - (1 - \lambda ) {\varDelta } f'(x_k)\right) \\&= \exp \left( \sum _{k = 1}^N \log (1 - (1 - \lambda ) {\varDelta } f'(x_k)) \right) . \end{aligned} \end{aligned}$$

We evaluate this expression with $x_k = z_{t_k}$ for $k = 1, \ldots , N$ where $\{z_t\}$ is some trajectory on the interval $I = [0, T]$ and $N = T/{\varDelta }$. Since $\log (1 + w) \cong w$ for small w, we can write the exponent approximately as

$$\begin{aligned} \sum _{n = 1}^N \log (1 - (1 - \lambda ) {\varDelta } f'(z_{t_n})) \cong - (1 - \lambda ) {\varDelta } \sum _{n = 1}^N f'(z_{t_n}) \end{aligned}$$

which is a Riemann sum converging to $-(1 - \lambda )\int _I f'(z_t) \; {\mathrm {d}}t$. The first factor in Eq. (31), after normalisation and when evaluated along a trajectory, reads as

$$\begin{aligned} \begin{aligned}&\frac{p_{R_1, \ldots , R_N}({\varPsi }(z_{t_1}, \ldots , z_{t_N}))}{p_{R_1, \ldots , R_N}(0, \ldots , 0) }\\&= \exp \left( -\frac{{\varDelta }}{2 \rho ^2} \sum _{n = 1}^N \left( \frac{z_{t_n} - z_{t_{n-1}}}{{\varDelta }} - (1 - \lambda ) f(z_{t_n}) - \lambda f(z_{t_{n-1}}) \right) ^2 \right) . \end{aligned} \end{aligned}$$

Again, the exponent is a Riemann sum which converges to $-\frac{1}{2 \rho ^2}\int _I (\dot{z}_t - f(z_t))^2 {\mathrm {d}}t$ for ${\varDelta } \rightarrow 0$. In summary, we get Eq. (18).
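
The Jacobian factor in Eq. (31) can also be checked numerically. The sketch below builds the Jacobi matrix of ${\varPsi }$ explicitly, assuming $F_1(x) = (1 - \lambda ) {\varDelta } f(x)$ and $F_2(x) = \lambda {\varDelta } f(x)$ as suggested by the formulas above, and compares its log-determinant with the product formula and with the first order Riemann sum. The drift $f(x) = x - x^3$, the trajectory and the values of $\lambda $, T and N are hypothetical.

```python
import numpy as np


def jacobian_of_psi(x, fprime, lam, delta):
    """Jacobi matrix of Psi, assuming F_1 = (1 - lam) * delta * f and F_2 = lam * delta * f."""
    n = len(x)
    jac = np.zeros((n, n))
    for k in range(n):
        # d Psi_k / d x_k = 1 - F_1'(x_k)
        jac[k, k] = 1.0 - (1.0 - lam) * delta * fprime(x[k])
        if k > 0:
            # d Psi_k / d x_{k-1} = -1 - F_2'(x_{k-1})
            jac[k, k - 1] = -1.0 - lam * delta * fprime(x[k - 1])
    return jac


def fprime(x):
    return 1.0 - 3.0 * x**2  # derivative of the hypothetical drift f(x) = x - x^3


lam, T, N = 0.5, 1.0, 200
delta = T / N
t = delta * np.arange(1, N + 1)          # t_1, ..., t_N
z = np.sin(2.0 * np.pi * t)              # hypothetical trajectory z_{t_k}

jac = jacobian_of_psi(z, fprime, lam, delta)
sign, log_det = np.linalg.slogdet(jac)
log_det_product = np.sum(np.log(1.0 - (1.0 - lam) * delta * fprime(z)))
riemann_sum = -(1.0 - lam) * delta * np.sum(fprime(z))
print(log_det, log_det_product, riemann_sum)
```

Because the matrix is lower triangular, its determinant is the product of the diagonal entries, and the first two numbers agree up to rounding. The third differs only by the second order terms neglected in $\log (1 + w) \cong w$, which vanish as ${\varDelta } \rightarrow 0$.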

About this article

Cite this article

Bröcker, J. What is the correct cost functional for variational data assimilation? Clim Dyn 52, 389–399 (2019). https://doi.org/10.1007/s00382-018-4146-y

Received: 01 June 2017
Accepted: 15 February 2018
Published: 15 March 2018
Issue Date: 24 January 2019
DOI: https://doi.org/10.1007/s00382-018-4146-y
