Abstract
Variational approaches to data assimilation, and weakly constrained four-dimensional variational assimilation (WC-4DVar) in particular, are important in the geosciences but also in other communities (often under different names). The cost functions and the resulting optimal trajectories may have a probabilistic interpretation, for instance by linking data assimilation with maximum a posteriori (MAP) estimation. This is possible in particular if the unknown trajectory is modelled as the solution of a stochastic differential equation (SDE), as is increasingly the case in weather forecasting and climate modelling. In this situation, the MAP estimator (or “most probable path” of the SDE) is obtained by minimising the Onsager–Machlup functional. Although this fact is well known, there seems to be some confusion in the literature, with the energy (or “least squares”) functional sometimes being claimed to yield the most probable path. The first aim of this paper is to address this confusion and show that the energy functional does not, in general, provide the most probable path. The second aim is to discuss the implications in practice. Although these results pertain to stochastic models in continuous time, they do have consequences in practice, where SDEs are approximated by discrete time schemes. It turns out that using an approximation to the SDE and calculating its most probable path does not necessarily yield a good approximation to the most probable path of the SDE proper. This suggests that even in discrete time, a version of the Onsager–Machlup functional should be used, rather than the energy functional, at least if the solution is to be interpreted as a MAP estimator.
Notes
We are grateful to referee Stéphàne Vannitsem for stressing this point.
The problem is the translation invariance of the standard volume. In an infinite dimensional normed space, a ball of unit radius may contain infinitely many disjoint balls of sufficiently small but nonzero radius. By translation invariance, these balls must have the same volume. But this means that either the volume of the unit ball is infinity or the volume of a sufficiently small ball is zero.
The limit is in fact in the \(L_2\) sense.
Strictly speaking, this “fact” is only correct in a much weaker sense, which is nevertheless sufficient to derive the Onsager–Machlup functional. The correct statement is that
$$\begin{aligned} {\mathbb {E}}\left[ \exp \left( \int _{0}^{T} f(z_{t} + W_{t}) \, {\mathrm {d}}W_{t} + \frac{1}{2} \int _{0}^{T} f'(z_{t}) \, {\mathrm {d}}t \right) \, \Big | \, \sup _t |W_t | \le \epsilon \right] \rightarrow 1 \end{aligned}$$
for \(\epsilon \rightarrow 0\); see Ikeda and Watanabe (1989).
References
Apte A, Hairer M, Stuart AM, Voss J (2007) Sampling the posterior: an approach to non-Gaussian data assimilation. Phys D Nonlinear Phenom 230(1):50–64. https://doi.org/10.1016/j.physd.2006.06.009. http://www.sciencedirect.com/science/article/pii/S016727890600217X. (Data Assimilation. ISSN 0167-2789)
Breiman L (1973) Probability. Addison-Wesley, Reading, Mass
Cotter SL, Dashti M, Robinson JC, Stuart AM (2009) Bayesian inverse problems for functions and applications to fluid mechanics. Inverse Probl 25(11):115008, 43. https://doi.org/10.1088/0266-5611/25/11/115008 (ISSN 0266-5611)
Derber JC (1989) A variational continuous assimilation technique. Monthly Weather Rev 117(11):2437–2446
Dutra DA, Teixeira BOS, Aguirre LA (2014) Maximum a posteriori state path estimation: Discretization limits and their interpretation. Automatica 50(5):1360–1368. ISSN 0005-1098. https://doi.org/10.1016/j.automatica.2014.03.003. http://www.sciencedirect.com/science/article/pii/S0005109814000958
Evensen G (2007) Data assimilation. The ensemble Kalman filter. Springer, New York
Franzke CLE, O’Kane TJ, Berner J, Williams PD, Lucarini V (2015) Stochastic climate theory and modeling. Wiley Interdiscip Rev Clim Change 6(1):63–78. https://doi.org/10.1002/wcc.318 (ISSN 1757-7799)
Gallot S, Hulin D, Lafontaine J (2004) Riemannian geometry. Universitext. Springer, Berlin, third edition. https://doi.org/10.1007/978-3-642-18855-8 (ISBN 3-540-20493-8)
Ide K, Courtier P, Ghil M, Lorenc AC (1997) Unified notation for data assimilation: operational, sequential and variational. J Meteorol Soc Jpn 75(1B):181–189
Ikeda N, Watanabe S (1989) Stochastic differential equations and diffusion processes, volume 24 of North-Holland Mathematical Library. North-Holland Publishing Co., Amsterdam, second edition
Imkeller P, von Storch J-S (eds) (2001) Stochastic climate models, volume 49 of Progress in Probability. Birkhäuser Verlag, Basel. https://doi.org/10.1007/978-3-0348-8287-3 (ISBN 3-7643-6520-X)
Jazwinski AH (1970) Stochastic processes and filtering theory, volume 64 of Mathematics in Science and Engineering. Academic Press, New York (ISBN 9780123815507)
Kalnay E (2001) Atmospheric modeling, data assimilation and predictability, 1st edn. Cambridge University Press, Cambridge
Kloeden PE, Platen E (1992) Numerical solution of stochastic differential equations. Springer, Berlin
Milstein GN (1995) Numerical integration of stochastic differential equations, volume 313 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht. https://doi.org/10.1007/978-94-015-8455-5. (Translated and revised from the 1988 Russian original. ISBN 0-7923-3213-X)
Mortensen RE (1968) Maximum-likelihood recursive nonlinear filtering. J Optim Theory Appl 2:386–394
Mörters P, Peres Y (2010) Brownian motion, volume 30 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511750489. (With an appendix by Oded Schramm and Wendelin Werner. ISBN 978-0-521-76018-8)
Øksendal B (1998) Stochastic differential equations. Universitext. Springer, Berlin, fifth edition. https://doi.org/10.1007/978-3-662-03620-4. (An introduction with applications. ISBN 3-540-63720-6)
Stuart AM (2010) Inverse problems: a Bayesian perspective. Acta Numer 19:451–559. https://doi.org/10.1017/S0962492910000061
Sugiura N (2017) The Onsager–Machlup functional for data assimilation. Nonlinear Process Geophys 24(4):701–712. https://doi.org/10.5194/npg-24-701-2017
Vanden-Eijnden E, Weare J (2013) Data assimilation in the low noise regime with application to the Kuroshio. Monthly Weather Rev 141(6):1822–1841. https://doi.org/10.1175/MWR-D-12-00060.1 (ISSN 0027-0644)
Zeitouni O, Dembo A (1987) A maximum a posteriori estimator for trajectories of diffusion processes. Stochastics 20(3):221
Zeitouni O, Dembo A (1988) An existence theorem and some properties of maximum a posteriori estimators of trajectories of diffusions. Stochastics 23(2):197. https://doi.org/10.1080/17442508808833490 (ISSN 0090-9491)
Additional information
The author was supported by the UK Engineering and Physical Sciences Research Council under grant agreement EP/L012669/1. Fruitful discussions with Tobias Kuna, Dan Crisan, Andrew Stuart, Colin Cotter, and Horatio Boedihardjo are gratefully acknowledged. Referee Stéphàne Vannitsem and a second anonymous referee provided a number of important comments which helped to improve this manuscript.
Appendices
Appendix A: Derivation of the correct functional
We will attempt a more careful calculation of the \(\epsilon \)–weight of a path which will not only allow us to take the limits in the right order and obtain the correct expression for the density, but also to identify the reason why interchanging these limits gives a different result. We will later restrict our attention to linear dynamics. It should be said that for linear dynamics, the additional term in the Onsager–Machlup functional (11) does not depend on the reference trajectory and hence minimising \({\mathcal {A}}_{OM}\) or \({\mathcal {A}}_{E}\) gives the same results in this case. However, the functionals are still different and only the Onsager–Machlup functional provides the correct density.
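First we note the following simple but important fact. Let \(Z^{(1)}, Z^{(2)}\) be random variables with values in \({\mathbb {R}}^N\) with densities \(p_1, p_2\) respectively, and \(p_2(z)> 0\) for all \(z \in {\mathbb {R}}^N\). Further, let \(\phi \) be a function on \({\mathbb {R}}^N\). Then the identity
$$\begin{aligned} {\mathbb {E}}\left[ \phi (Z^{(1)}) \right] = {\mathbb {E}}\left[ \phi (Z^{(2)}) \, \frac{p_1(Z^{(2)})}{p_2(Z^{(2)})} \right] \end{aligned}$$
holds, since
$$\begin{aligned} \int _{{\mathbb {R}}^N} \phi (z) \, p_1(z) \, {\mathrm {d}}z = \int _{{\mathbb {R}}^N} \phi (z) \, \frac{p_1(z)}{p_2(z)} \, p_2(z) \, {\mathrm {d}}z. \end{aligned}$$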
On the other hand, note that
where H is the Heaviside function. We can use Eq. (23) in Eq. (24) with
where \((X^{({\varDelta })}_{t_1}, \ldots , X^{({\varDelta })}_{t_N})\) is a solution to the Euler approximation (13). Note that \((X^{({\varDelta })}_{t_1} - z_{t_1}, \ldots , X^{({\varDelta })}_{t_N} - z_{t_N})\) is then a solution of the system (19). We therefore obtain
with
In terms of the limits \({\varDelta } \rightarrow 0\) and \(\epsilon \rightarrow 0\), the first two terms A and B will converge to
and zero, respectively, no matter in which order the limits are taken. The third term, however, behaves differently depending on whether \({\varDelta } \rightarrow 0\) or \(\epsilon \rightarrow 0\) is taken first. If we take \({\varDelta } \rightarrow 0\) first, it can be shown that a well-defined random variable is obtained (see footnote 3), which can be written as an Ito integral
We do not expect the reader to be familiar with the theory of Ito integrals – relevant here is that the limit of this expression for \(\epsilon \rightarrow 0\) will not be zero but
A demonstration of this fact (see footnote 4) for the case where f is linear is given here for illustration. If \(f(x) = a x\) for some \(a \in {\mathbb {R}}\), then
It is easy to see that \(C_1 \rightarrow 0\) if \({\varDelta } \rightarrow 0\) and \(\epsilon \rightarrow 0\), no matter in which order these limits are taken. After some algebra, we can write \(C_2\) as
Considering the mean and the variance of the second term, we obtain \(\frac{1}{2} T\) and \(\frac{1}{2} T{\varDelta }\), respectively, implying that (at least in a mean square sense) the second term converges to its mean \(\frac{1}{2} T\) if \({\varDelta } \rightarrow 0\). Hence
Therefore, taking the limits \({\varDelta } \rightarrow 0\) and then \(\epsilon \rightarrow 0\) in Eq. (29) and using Eq. (30) we obtain
which is the same as Eq. (28) for this special case.
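The mean and variance quoted above can be checked by a short worked calculation. The sketch below assumes that the second term in question is half the sum of squared Wiener increments (with the factor a pulled out), that \(W_{t_0} = 0\), and that \(N {\varDelta } = T\); the symbols \(S_{{\varDelta }}\) and \(\xi _n\) are introduced here for illustration only. The first line is the elementary identity obtained by summing \(W_{t_n}^2 - W_{t_{n-1}}^2 = 2 W_{t_{n-1}} \xi _n + \xi _n^2\), and the remaining lines use that the increments \(\xi _n = W_{t_n} - W_{t_{n-1}}\) are independent and Gaussian with mean zero and variance \({\varDelta }\), so that \({\mathbb {E}}[\xi _n^2] = {\varDelta }\) and \({\mathbb {E}}[\xi _n^4] = 3 {\varDelta }^2\).

$$\begin{aligned} \sum _{n = 1}^N W_{t_{n-1}} \xi _n&= \frac{1}{2} W_{T}^2 - S_{{\varDelta }}, \qquad S_{{\varDelta }} = \frac{1}{2} \sum _{n = 1}^N \xi _n^2, \\ {\mathbb {E}}\left[ S_{{\varDelta }} \right]&= \frac{1}{2} N {\varDelta } = \frac{1}{2} T, \\ {\mathrm {Var}}\left[ S_{{\varDelta }} \right]&= \frac{1}{4} N \left( 3 {\varDelta }^2 - {\varDelta }^2 \right) = \frac{1}{2} N {\varDelta }^2 = \frac{1}{2} T {\varDelta }. \end{aligned}$$

Since the variance vanishes for \({\varDelta } \rightarrow 0\), \(S_{{\varDelta }}\) converges to \(\frac{1}{2} T\) in mean square, while the term \(\frac{1}{2} W_T^2\) is bounded by \(\frac{1}{2} \epsilon ^2\) on the event \(\sup _t |W_t| \le \epsilon \).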
Using Eq. (28) and the expression in Eq. (27) in Eq. (25), we obtain that for small \(\epsilon \)
so that we can conclude
Note that if we used Eqs. (25) and (26) as a starting point but subsequently took the limits in the wrong order, that is, first \(\epsilon \rightarrow 0\) and then \({\varDelta } \rightarrow 0\), we would have \(B, C \rightarrow 0\), and would therefore obtain the energy functional \({\mathcal {A}}_{E}\).
As a final remark, by looking back at the calculations the reader will see that the only term that does not permit interchange of the limits is a second order or “quadratic” term \( \sum _{n = 1}^N (W_{t_n} - W_{t_{n-1}})^2\) which would vanish with \({\varDelta } \rightarrow 0\) if W were a differentiable function but converges to T in case of the Wiener process. Roughly speaking, this is because \(W_{t_n} - W_{t_{n-1}}\) is of order \(\sqrt{{\varDelta }}\), which more generally gives rise to the extra terms in the Ito calculus.
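The behaviour of the quadratic term can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: it samples a Wiener path on a grid of N steps and compares the sum of squared increments with that of a differentiable function. The helper names, the choice \(T = 2\), and the smooth comparison path \(\sin (t)\) are assumptions made for illustration.

```python
import numpy as np


def quadratic_variation(path):
    """Sum of squared increments of a path sampled on a grid."""
    return float(np.sum(np.diff(path) ** 2))


def wiener_path(T, N, rng):
    """Sample W at t_n = n * T / N for n = 0, ..., N, with W_0 = 0."""
    delta = T / N
    # Independent Gaussian increments with variance delta.
    increments = rng.normal(0.0, np.sqrt(delta), size=N)
    return np.concatenate(([0.0], np.cumsum(increments)))


T = 2.0  # illustrative choice of the final time
rng = np.random.default_rng(0)
for N in (100, 10_000, 1_000_000):
    t = np.linspace(0.0, T, N + 1)
    qv_wiener = quadratic_variation(wiener_path(T, N, rng))
    qv_smooth = quadratic_variation(np.sin(t))  # differentiable comparison path
    print(f"N={N:>9d}  Wiener: {qv_wiener:.4f}  smooth: {qv_smooth:.2e}")
```

As N grows, the value for the Wiener path should settle near T, whereas the value for the smooth path shrinks in proportion to \({\varDelta }\).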
Appendix B: Derivation of Eq. (18)
In this section, we derive Eq. (18); that is, we follow the same steps as for the Euler scheme and take the limits as in Eq. (15), but starting with the implicit scheme (17) instead of the Euler scheme. If we set \(R_n = \rho (W_{t_n} - W_{t_{n-1}})\), then the implicit scheme (17) can be written in the form
which can be expressed as \((R_1, \ldots , R_N) = {\varPsi }(X_{t_1}, \ldots , X_{t_N})\) with
According to basic probability calculus, we have for the densities
Since \(\frac{\partial {\varPsi }_k}{\partial x_l} = 0\) for \(k < l\), the Jacobian matrix of \({\varPsi }\) is lower triangular and hence
We evaluate this expression with \(x_k = z_{t_k}\) for \(k = 1, \ldots , N\) where \(\{z_t\}\) is some trajectory on the interval \(I = [0, T]\) and \(N = T/{\varDelta }\). Since \(\log (1 + w) \cong w\) for small w, we can write the exponent approximately as
which is a Riemann sum converging to \(-(1 - \lambda )\int _I f'(z_t) \; {\mathrm {d}}t\). The first factor in Eq. (31), after normalisation and when evaluated along a trajectory, reads as
Again, the exponent is a Riemann sum which converges to \(-\frac{1}{2 \rho ^2}\int _I (\dot{z}_t - f(z_t))^2 {\mathrm {d}}t\) for \({\varDelta } \rightarrow 0\). In summary, we get Eq. (18).
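The Jacobian argument can be checked numerically. The sketch below assumes that the implicit scheme (17) has the form \(x_k = x_{k-1} + {\varDelta } \left( \lambda f(x_{k-1}) + (1 - \lambda ) f(x_k) \right) + R_k\), which is consistent with the limit \(-(1 - \lambda )\int _I f'(z_t) \; {\mathrm {d}}t\) stated above but is an assumption about the exact form of the scheme. The drift \(f(x) = \sin (x)\), the reference trajectory \(z_t = t\), and all function names are hypothetical choices for illustration.

```python
import numpy as np


def f_prime(x):
    """Derivative of the hypothetical drift f(x) = sin(x)."""
    return np.cos(x)


def jacobian(x, delta, lam):
    """Jacobian of Psi for the ASSUMED scheme
    Psi_k(x) = x_k - x_{k-1} - delta * (lam * f(x_{k-1}) + (1 - lam) * f(x_k)).
    """
    N = len(x)
    J = np.zeros((N, N))
    # Diagonal entries: d Psi_k / d x_k.
    J[np.arange(N), np.arange(N)] = 1.0 - delta * (1.0 - lam) * f_prime(x)
    # Sub-diagonal entries: d Psi_k / d x_{k-1}; everything above is zero.
    J[np.arange(1, N), np.arange(N - 1)] = -1.0 - delta * lam * f_prime(x[:-1])
    return J


def log_det_terms(T, N, lam):
    """Return sign and log|det J|, the sum of log-diagonal entries,
    and the Riemann sum -(1 - lam) * sum_k f'(z_{t_k}) * delta."""
    delta = T / N
    z = delta * np.arange(1, N + 1)  # hypothetical trajectory z_t = t on the grid
    J = jacobian(z, delta, lam)
    sign, logdet = np.linalg.slogdet(J)
    diag_sum = float(np.sum(np.log(np.diag(J))))
    riemann = float(-(1.0 - lam) * delta * np.sum(f_prime(z)))
    return float(sign), float(logdet), diag_sum, riemann


T, lam = 1.0, 0.5
limit = -(1.0 - lam) * np.sin(T)  # -(1 - lam) times the integral of cos(t) on [0, T]
for N in (10, 100, 1000):
    _, logdet, diag_sum, riemann = log_det_terms(T, N, lam)
    print(f"N={N:5d}  logdet={logdet:.6f}  diag={diag_sum:.6f}  "
          f"riemann={riemann:.6f}  limit={limit:.6f}")
```

Because the matrix is triangular, the log-determinant agrees with the sum of the logarithms of the diagonal entries, and both approach the Riemann sum and its limit as N grows. Under the assumed form of the scheme, \(\lambda = 1\) corresponds to the explicit Euler case, where the diagonal is identically one and the extra term vanishes.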
Cite this article
Bröcker, J. What is the correct cost functional for variational data assimilation?. Clim Dyn 52, 389–399 (2019). https://doi.org/10.1007/s00382-018-4146-y
DOI: https://doi.org/10.1007/s00382-018-4146-y