A Regularity-Aware Algorithm for Variational Data Assimilation of an Idealized Coupled Atmosphere–Ocean Model

Korn, Peter

doi:10.1007/s10915-018-0871-y

Open access
Published: 14 November 2018

Volume 79, pages 748–786, (2019)

Journal of Scientific Computing

Peter Korn ORCID: orcid.org/0000-0002-7525-5732¹

Abstract

We study the problem of determining through a variational data assimilation approach the initial condition for a coupled set of nonlinear partial differential equations from which a model trajectory emerges in agreement with a given set of time-distributed observations. The partial differential equations describe an idealized coupled atmosphere–ocean model on a rotating torus. The model consists of the viscous shallow-water equations in geophysical scaling, which represent the large-scale atmospheric dynamics, coupled via a simplified but physically plausible coupling to a model that represents the large-scale ocean dynamics and consists of the incompressible two-dimensional Navier–Stokes equations and an advection–diffusion equation. We propose a variational algorithm (4D-Var) for the coupled data assimilation problem that is solvable and computable. This algorithm relies on the use of a variational cost functional that is tailored to the regularity of the coupled model as well as to the regularity of the observations by means of derivative-based norms. We support this proposal by developing regularity results for an idealized coupled atmosphere–ocean model using the concept of classical solutions. Based on these results we formulate a suitable cost functional. For this cost functional we prove the existence of optimal initial conditions in the sense of minimizers of the cost functional and characterize the minimizers by a first-order necessary condition involving the coupled adjoint equations. We prove local convergence of a gradient-based descent algorithm to optimal initial conditions using second-order adjoint information. Instrumental for our results is the use of suitable Sobolev norms instead of the standard Lebesgue norms in the cost functional. The index of the actual Sobolev space provides an additional scale-selective mechanism in the variational algorithm.

1 Introduction

Data assimilation aims at blending observational data with a dynamical model of a physical process. It constitutes a fundamental technique for modelling real-world phenomena. Numerical weather prediction and ocean state estimation are examples of scientific disciplines that rely on data assimilation. Data assimilation for coupled atmosphere–ocean models, also known as coupled data assimilation, has recently gained attention (see the report of the World Meteorological Organization (WMO) [29] and references therein). The main goal of coupled data assimilation is to extend the predictability horizon of weather and climate forecasts. An important example of such a coupled assimilation problem is the initialization of a coupled atmosphere–ocean model for the purpose of climate prediction on decadal time scales. The challenges in coupled data assimilation are manifold and comprise, for example, the formulation of error covariance matrices and the treatment of vastly different spatio-temporal scales of the respective model components. The current state of knowledge is summarized concisely in [29]. The majority of current approaches apply a strategy based on numerical experiments using a fully discrete configuration (see e.g. [15, 16, 30]). Our approach is complementary to this and focuses on the partial differential equations of an idealized coupled atmosphere–ocean model.

The purpose of this paper is to describe fundamental mathematical properties that need to be fulfilled by a variational data assimilation algorithm for a coupled atmosphere–ocean system such that the corresponding minimization problem is solvable and computable. We aim to formulate a sound mathematical basis that guarantees a controlled behaviour of a coupled variational data assimilation algorithm. For this purpose we focus on the formulation of the 4D-Var cost functional. This cost functional is minimized and its critical points determine the optimal initial conditions of variational data assimilation. The key idea for implementing our goal is to connect the norms in the cost functional to the regularity theory for the coupled equations and to the regularity of the observations. Not respecting this connection may lead to an unpredictable behaviour of the algorithm. If, for example, the function space used in the background term of the cost functional does not match the function space for which the coupled equations are well-posed, then the variational optimization searches for initial conditions that potentially create an ill-defined model trajectory. Similar reasoning applies to the observational term of the cost functional. If the gradient of the cost functional, obtained from a reverse-in-time integration of the adjoint equations forced by the model-observation difference, does not reside in the same space as the initial condition, this creates ill-defined gradients that may result in an erratic behaviour of the gradient-based optimization process. An ill-posed behaviour of the optimization procedure can in principle be compensated by regularization techniques. Therefore it is important to derive criteria for well-posed behaviour and to understand the possible sources of ill-posed optimization in order to design minimally invasive regularization techniques.
To substantiate our arguments we investigate a set of coupled nonlinear PDEs and carry out a mathematical analysis of an idealized atmosphere–ocean model that consists of an atmospheric component, described by the viscous shallow-water equations in a geophysical scaling, following [25], and of an ocean component that is given by the incompressible 2D Navier–Stokes equations, supplemented by an advection–diffusion equation. The coupling between the atmosphere and the ocean is simplified and done via a forcing term and not via boundary conditions as it is done in coupled (three-dimensional) general circulation models. For the idealized coupled atmosphere–ocean PDE considered here we develop a regularity theory (see Theorem 1). Based on this result we formulate a variational cost functional that is tailored to the regularity of the underlying equations. For this cost functional we prove the existence of minimizers, i.e. the existence of optimal initial conditions (Theorem 4). We characterize the optimal initial conditions by a necessary adjoint criterion (Theorem 5) and prove convergence (Theorem 7) of a gradient-based algorithm via an estimate of the Hessian of the cost functional utilizing second-order adjoint equations of the coupled model.

For the equations of the idealized atmosphere–ocean model we carry out an elementary mathematical analysis consisting of a well-posedness and regularity result in the Sobolev space $\mathcal {H}^\mathbf{s}$ with $\mathbf{s}:=(s_i)_{i=1}^4\in \mathbb {Z}^4, s_i\ge 3$ (see Theorem 1). With this analysis we stay within the framework of classical solutions and do not use the concept of weak and strong solutions. The use of classical solutions facilitates the analysis but excludes relevant phenomena such as shocks and turbulence; including them requires more sophisticated solution concepts (e.g. [5, 35]). Since the focus of this work is not well-posedness per se or optimal regularity but the relation between regularity/well-posedness of the coupled equations and solvability/computability of the variational data assimilation problem, we prefer to stay in the regularity regime of classical solutions before we relax this condition in later work. We study the regularity of the associated linearized and adjoint equations and show that the solution of the coupled equations is differentiable with respect to the initial condition. In accordance with the aforementioned regularity results we formulate a cost functional which uses Sobolev spaces $L^2(T,\mathcal {H}^{\mathbf{s}})$ with positive as well as negative Sobolev space indices $s_i\in \mathbb {Z},\, (i=1\ldots 4)$. The variational cost functional consists of an observational term and a background term. In the language of inverse problems the background term provides a Tikhonov regularization. For this term we use a Sobolev space $\mathcal {H}^{\mathbf{s}}$ with $s_i\ge 3$, consistent with the regularity of the underlying equations. For the observational term the range of admissible (positive or negative) Sobolev space orders is determined by the regularity of the model solution and of the observations.
The physical effect of including positive Sobolev space indices in the observational term of the cost functional is to fit the model to the observations on small scales, while negative indices exclude the small scales from the fitting process by means of a filtering process. This implements a regularity-aware filter capability within the data assimilation algorithm that regulates which spatial scales are seen by the data assimilation algorithm, while leaving the model dynamics unaffected. The filter can be adjusted individually for different components of the coupled state vector via the degree $s_i$ of the respective Sobolev space. The proposed formulation of the cost functional is the key to proving existence of minimizers for the variational data assimilation problem of the coupled equations (Theorem 4) and to showing in Theorem 5 that critical points of the cost functional satisfy an adjoint equation with an observation-based forcing term that also depends on the specific Sobolev space used. It also allows us to prove convergence of a steepest descent algorithm to an optimal initial condition, provided the initial guess of the iterative algorithm is good enough (Theorem 7). This result uses the regularity of the second-order adjoint equations. This numerical minimization algorithm also shows that our Sobolev-metric based approach can easily be incorporated into the standard $L^2$-based variational data assimilation algorithms.

Derivative-based metrics provide an additional spatial filter mechanism in the data assimilation algorithm with the objective of emphasizing or de-emphasizing specific scales of individual components of a multi-component state vector within the minimization procedure. Which choice of Sobolev space indices for the background and observational terms of the cost functional improves the data assimilation results has to be determined by numerical experiments. For ocean data assimilation based on the Primitive Equations, the results about the global well-posedness of the Primitive Equations with initial conditions in $H^1$ (see [9]) suggest applying at least the $H^1$-norm in the background term rather than the usually chosen $L^2$-norm (cf. [31]). The choice of the appropriate Sobolev norms for the observational data should depend on the regularity of the observations. We furthermore note that the Sobolev embedding theorem implies that for sufficiently high order (in our case of two spatial dimensions, $s\ge 2$) one additionally controls the $L^2(T,L^\infty )$-norm, while at the same time a Hilbert-space structure is retained that allows one to stay in the framework of a least-squares problem.

The approach described here has to be supplemented by numerical experiments, and our results provide the basis for such experiments. The purpose of such experiments is, first, to illustrate what can go wrong if the connection between the regularity of model equations and observations and the norms in the cost functional is lost, and what goes right if this connection is respected. Second, one has to explore the potential of the filter capacity that is provided by the derivative-based metrics. These experiments are beyond the scope of this paper.

The use of alternative metrics beyond $L^2(T,L^2)$ is not new in the literature. In [6] enstrophy-based metrics and Sobolev-type cost functionals were discussed in the context of turbulence control via boundary forcing for a channel flow with a focus on numerical experiments (see also [33]). We are not aware of a similar study for atmospheric or oceanic data assimilation. The data assimilation community in the atmosphere and ocean sciences focuses on another important element of the cost functional, namely the modelling of the observational and model error covariance matrices (see e.g. [39]), which are often modelled with the intention of implementing a kind of filtering or scale selection.

Structure of the paper. In Sect. 2 we introduce the coupled model and generic data assimilation cost functionals. Section 3 describes the functional setting. We proceed in Sect. 4 with the analysis of the model equations, their linearization and the adjoint equations. Based on this analysis we propose in Sect. 5 a specific formulation of the cost functional. We show that the use of higher-order cost functionals leads to solvable data assimilation problems (Theorem 4), demonstrate how the modification can be incorporated into an iterative gradient-based optimization algorithm (Theorem 5), and prove in Sect. 6.2 a convergence result for a descent algorithm (Theorem 7) by second-order adjoint methods. The paper ends with a conclusion in Sect. 7.

2 The Coupled Model and the Associated Data Assimilation Problem

We are studying an idealized coupled atmosphere–ocean model that is described by the following equations

$$\begin{aligned} \text {Atmosphere: }\quad&\frac{\partial \mathbf{u}^a}{\partial t} +(\mathbf{u}^a\cdot \nabla )\mathbf{u}^a +\frac{1}{Ro^a}\mathbf{u}^{a\bot } +\frac{1}{Ro^a}\nabla \theta ^a = \frac{1}{Re^a}\triangle \mathbf{u}^a, \end{aligned}$$

(1)

$$\begin{aligned}&\frac{\partial \tilde{Fr}^a\theta ^a}{\partial t} +\frac{1}{Ro^a}{div}(\mathbf{u}^a) = - \gamma \theta ^o+\frac{1}{Pe^a}\triangle \theta ^a. \end{aligned}$$

(2)

$$\begin{aligned} \text {Ocean: }\quad&\frac{\partial \mathbf{u}^o}{\partial t} +(\mathbf{u}^o\cdot \nabla )\mathbf{u}^o +\frac{1}{Ro^o}\mathbf{u}^{o\bot } +\frac{1}{Ro^o}\nabla p^o = \sigma \mathbf{u}^a +\frac{1}{Re^o}\triangle \mathbf{u}^o, \end{aligned}$$

(3)

$$\begin{aligned}&\frac{\partial \theta ^o}{\partial t} +(\mathbf{u}^o\cdot \nabla )\theta ^o =\frac{1}{Pe^o}\triangle \theta ^o,\end{aligned}$$

(4)

$$\begin{aligned}&{div}(\mathbf{u}^o)=0, \\ \text {with }&\text { initial conditions }\nonumber \\&\mathbf{u}^a(t_0)=\mathbf{u}^a_0,\ \theta ^a(t_0)=\theta ^a_0, \ \mathbf{u}^o(t_0)=\mathbf{u}^o_0,\ \theta ^o(t_0)=\theta ^o_0. \nonumber \end{aligned}$$

(5)

The state of the system is described by a state vector $\psi :=(\psi ^a,\psi ^o)$ with atmospheric component $\psi ^a:=(\mathbf{u}^a,\theta ^a)$ and oceanic component $\psi ^o:=(\mathbf{u}^o,\theta ^o)$, where each component consists of a (vectorial) velocity field $\mathbf{u}^a, \mathbf{u}^o$ and a scalar variable $\theta ^a,\theta ^o$, respectively. All functions depend on a two-dimensional space variable and time. For the Coriolis term we use the notation $\mathbf{u}^{a\bot }:=(u_1^a,u_2^a)^\bot =(-u_2^a,u_1^a)$. The coupling functions $\gamma ,\sigma $ regulate the strength of the interaction between the two components. Both coupling functions depend on space and time variables, and vary smoothly in space and time. We write the coupled model Eqs. (1)–(5) also in the following notation

$$\begin{aligned} \begin{aligned}&\frac{\partial \psi }{\partial t} +\mathcal {N}(\psi ,\psi ) +L\psi +D\psi =\mathcal {C}\psi ,\\&\text {with initial conditions } \psi (t_0)=\psi _0, \end{aligned} \end{aligned}$$

(6)

where $\mathcal {N}$ contains the advective terms of the equations, the operator L represents the linear terms, except the Laplace operator which is described by D.

The coupling is described by the coupling operator $\mathcal {C}$. The physical picture of the atmosphere–ocean coupling is as follows. The ocean model component consists of a two-dimensional velocity field and an equation that describes the sea-surface temperature of the ocean, while the atmospheric component contains a two-dimensional velocity field and an equation describing the heat content. Both systems are linked together via a simple coupling: changes in the oceanic sea-surface temperature heat or cool the atmosphere; this changes the atmospheric circulation and consequently the atmospheric winds that drive the ocean. The atmosphere is affected by the ocean through a thermodynamic effect, while the ocean is influenced by the atmosphere through a dynamic effect. The strength of this coupling is regulated by the space- and time-dependent coupling functions $\gamma ,\sigma $. This simple coupling procedure is motivated by practice in large-scale climate modelling (see e.g. [14]). This type of coupling has for example been used to study the El Niño–Southern Oscillation phenomenon, a climate mechanism in the tropical Pacific that crucially depends on the atmosphere–ocean interaction (see e.g. chpt. 7 in [11], or [28]).

The numbers Re, Pe with superindex for the atmospheric and oceanic component denote the Reynolds and Péclet numbers. The numbers Ro and $\tilde{Fr}$ in the atmospheric component denote the Rossby and the Froude number. The Froude number, which measures the degree of compressibility, occurs in the atmospheric component only. The atmospheric Eqs. (1)–(2) arise from the (2D) shallow water equations after non-dimensionalization as in chpt. 4 of [25]. If U and L denote a typical velocity and length scale, respectively, $H_0$ represents the mean height of the fluid and $N_0$ the typical size of the height perturbation, then one can introduce the following non-dimensional variables to the atmospheric equations $x':=\frac{x}{L},\ t':=\frac{t U}{L},\ \mathbf{u}':=\frac{\mathbf{u}}{U},\ \theta ':=\frac{\theta }{N_0}$. We introduce the non-dimensional parameters

$$\begin{aligned} Re:=\frac{LU}{\nu _u},\quad Pe:=\frac{LU}{\nu _\theta },\quad Ro:=\frac{U}{Lf},\quad Fr:=\frac{U}{\sqrt{gH_0}},\quad \varTheta :=\frac{N_0}{H_0}, \end{aligned}$$

(7)

where $\nu _u,\nu _\theta $ denote viscosity and diffusivity parameters, f is the rotation frequency, and g is the gravity constant. The Eqs. (1)–(5) are derived after applying the scaling above to the shallow water equations and using the assumption $Ro,Fr\ll 1$, with the notation $\tilde{Fr}:=\sqrt{Fr}Ro$. More details, in particular the convergence of the uncoupled shallow water equations to the quasi-geostrophic equations in the vanishing Rossby number limit, can be found in [7, 25]. A review of the mathematical theory of the shallow-water equations can be found in [8]; for applications in the context of geophysical fluid dynamics we refer to [36]. Through an appropriate choice of the non-dimensional parameters a coupled system (1)–(5) can be created that consists of a fast and a slow component.
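The definitions in (7) can be evaluated directly, which makes it easy to check whether a chosen set of scales falls into the regime of small Rossby and Froude numbers. The following is a minimal sketch only: the function name `nondimensional_parameters` and all numerical scale values are illustrative assumptions and are not taken from the paper.

```python
import math


def nondimensional_parameters(U, L, f, g, H0, N0, nu_u, nu_theta):
    """Evaluate the parameters of Eq. (7) from dimensional scales (SI units)."""
    return {
        "Re": L * U / nu_u,            # Reynolds number
        "Pe": L * U / nu_theta,        # Peclet number
        "Ro": U / (L * f),             # Rossby number
        "Fr": U / math.sqrt(g * H0),   # Froude number
        "Theta": N0 / H0,              # relative height perturbation
    }


# Assumed, purely illustrative large-scale atmospheric values.
params = nondimensional_parameters(
    U=10.0,          # typical velocity [m/s]
    L=1.0e6,         # typical length [m]
    f=1.0e-4,        # rotation frequency [1/s]
    g=9.81,          # gravity [m/s^2]
    H0=1.0e4,        # mean fluid height [m]
    N0=100.0,        # height perturbation [m]
    nu_u=1.0e5,      # viscosity [m^2/s]
    nu_theta=1.0e5,  # diffusivity [m^2/s]
)

for name, value in params.items():
    print(f"{name} = {value:.4g}")
```

With these assumed scales both the Rossby and the Froude number are well below one, which is the regime in which the scaling above is applied.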

The model Eqs. (1)–(5) do not describe general circulation models of atmosphere and ocean but comprise several idealizations and simplifications. First, the fact that the model is purely horizontal and the vertical axis is absent excludes vertical dynamics of the atmospheric boundary layer and the oceanic mixed layer that are important for atmosphere–ocean interaction. Representing the rich dynamics of the two fluids by a state vector that consists in each case of a velocity field and a tracer variable represents another severe simplification. Furthermore, we do not consider subgrid-scale closures and physical parametrizations, and work with constant coefficients. As a consequence of our model configuration we work with an idealized coupling that focuses on first-order feedbacks between the two components, namely the thermal effect of the ocean on the atmosphere and the mechanical energy input of the atmosphere to the ocean. The heat transfer from the atmosphere to the ocean is negligible on large scales due to the larger heat capacity of the ocean. The momentum transfer from the ocean to the atmosphere can also be neglected on the scales under consideration here (see Remark 1). The coupling takes place via forcing terms and not, as in general circulation models, via boundary conditions. The complex and highly nonlinear physical behaviour of the atmosphere–ocean interface is poorly understood (see e.g. [32]). The coupling used here is clearly not realistic for describing the atmosphere–ocean interface on small scales, but it has proven to be a useful approximation for large-scale dynamics. It is physically plausible within the range of validity of our idealized atmosphere–ocean model, which has approximation quality for large-scale dynamics but not for small scales. For a description of the basic physical processes of atmosphere–ocean interaction we refer to [38]. A concise mathematical description of coupled general circulation models can be found in [21, 22, 23].

We now proceed by describing the data assimilation problem for the system (2) in an abstract manner. Let $\mathcal {X}$ be a generic “state space” of solutions of the coupled equations with a set of initial conditions $\mathcal {X}_0$. Let time distributed observations $\psi _{obs}\in \mathcal {Y}$ of the atmosphere and the ocean be given. These formal definitions will be specified in Sect. 5. Our aim is to fit a trajectory $\psi \in \mathcal {X}$ of the atmosphere–ocean model to the observations $\psi _{obs}\in \mathcal {Y}$ by minimizing the distance between coupled model trajectory and observations using the initial conditions $\psi _0 =(\mathbf{u}^a_0,\theta ^a_0, \mathbf{u}^o_0,\theta ^o_0)\in \mathcal {X}_0$ as a control variable. The coupled data assimilation problem consists in determining, for given observations $\psi _{obs}\in \mathcal {Y}$, an initial condition $\bar{\psi }_0\in \mathcal {X}_0$ such that

$$\begin{aligned} \begin{aligned}&\mathcal {J}(\bar{\psi }_0,\psi _{obs}) = \min _{\psi _0\in \mathcal {X}_0}\mathcal {J}(\psi _0,\psi _{obs})\\ \text { and }\quad&\\ {}&\psi (\bar{\psi }_0)\in \mathcal {X} \text { satisfies coupled model equations}\ (1)\text {--}(5)\\&\text { with initial condition } \bar{\psi }_0. \end{aligned} \end{aligned}$$

(8)

The cost functional $\mathcal {J}$ in (8) is defined as a sum of a background and an observational term

$$\begin{aligned} \begin{aligned} \mathcal {J}(\psi _0,\psi _{obs}):= \mathcal {J}_b(\psi _0) +\mathcal {J}_{obs}(\psi _0,\psi _{obs}). \end{aligned} \end{aligned}$$

(9)

The observational part $\mathcal {J}_{obs}$ of the cost functional measures the distance between the coupled model state and the observations and is defined by

$$\begin{aligned} \begin{aligned} \mathcal {J}_{obs}(\psi _0,\psi _{obs}) :=&\int _T ||\mathcal {M}[\psi _0]-\psi _{obs}||^2_{\mathcal {X}(d\mu _\mathcal {R})} dt. \end{aligned} \end{aligned}$$

(10)

Here $\mathcal {M}$ denotes the model operator that advances an initial condition $\psi _0$ to the model state $\psi (t)$ at time t. The observational error covariance operator $\mathcal {R}$ provides a statistical weighting of the model-observation misfit, according to the quality of the observations. We subsume this weighting into the Lebesgue measure and denote the resulting measure by $d\mu _\mathcal {R}$ and the model space supplemented with this measure by $\mathcal {X}(d\mu _\mathcal {R})$. The norm in (10) acts on the spatial variable.

The background term $\mathcal {J}_b$ of the cost functional in (9) is defined by

$$\begin{aligned} \begin{aligned} \mathcal {J}_b(\psi _0):=&\mathcal {J}_b(\psi _0,\psi _{back}):= ||\psi _0-\psi _{back}||^2_{\mathcal {X}(d\mu _\mathcal {B})}\\ =&\big \langle \mathcal {B}(\psi _0-\psi _{back}),\psi _0-\psi _{back}\big \rangle _{\mathcal {X}(dx)}, \end{aligned} \end{aligned}$$

(11)

where $\psi _{back}$ is a given background state. The background state incorporates prior information about the system and can for example be provided by a previous forecast. The model error covariance operator $\mathcal {B}$ provides a weighting according to the estimated background error and is assumed to be linear, bounded and positive definite. We do not address the important problem of modelling error statistics but assume that these error covariance operators are given and that they preserve the functional space of model solutions and observations (see [26]).
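To make the structure of (9)–(11) concrete, the following sketch evaluates a discrete analogue of the cost functional for a scalar field stored as Fourier coefficients. It is an illustration under stated assumptions, not the algorithm of this paper: the model operator is replaced by a hypothetical `toy_model` (exact decay of heat-equation modes), the covariance operators are taken to be the identity, the time integral is replaced by a Riemann sum with step `dt`, and the spatial norm is passed in as a function so that the $L^2$ default can be swapped for another norm.

```python
import math


def l2_norm_sq(coeffs):
    """Squared L2 norm from Fourier coefficients (Parseval, up to a constant)."""
    return sum(abs(c) ** 2 for c in coeffs.values())


def toy_model(psi0, t, kappa=0.1):
    """Hypothetical stand-in for the model operator M: each mode decays
    like exp(-kappa * |n|^2 * t), as for a heat equation."""
    return {n: c * math.exp(-kappa * (n[0] ** 2 + n[1] ** 2) * t)
            for n, c in psi0.items()}


def difference(a, b):
    """Coefficient-wise difference a - b; missing modes count as zero."""
    return {n: a.get(n, 0.0) - b.get(n, 0.0) for n in set(a) | set(b)}


def cost(psi0, psi_back, observations, dt, norm_sq=l2_norm_sq, model=toy_model):
    """Discrete analogue of J = J_b + J_obs with identity weighting
    (B and R are assumed to be the identity in this sketch)."""
    j_b = norm_sq(difference(psi0, psi_back))
    j_obs = sum(dt * norm_sq(difference(model(psi0, t), obs))
                for t, obs in observations)
    return j_b + j_obs


# Synthetic "observations" generated from an assumed true initial state.
truth = {(1, 0): 1.0, (0, 2): 0.5}
obs_times = (0.5, 1.0, 1.5)
observations = [(t, toy_model(truth, t)) for t in obs_times]

# A background state that is wrong in the large-scale mode only.
background = {(1, 0): 0.8, (0, 2): 0.5}

print(cost(truth, truth, observations, dt=0.5))
print(cost(background, background, observations, dt=0.5))
```

Passing the norm as an argument mirrors the point made in the text: the choice of the space $\mathcal {X}$ is a design decision of the cost functional and is independent of the model dynamics.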

The data assimilation problem (8) and the cost functional (9)–(11) have been stated in a formal manner. In order to arrive at a sensible definition that provides the basis of a computational algorithm we have to give the abstract notions of “state space” $\mathcal {X}$, “space of initial conditions” $\mathcal {X}_0$ and “observation space” $\mathcal {Y}$ a precise meaning. This will be done in Sect. 5, relying on the mathematical analysis of the coupled model in Sect. 4, for which the next section provides the mathematical framework.

3 Functional Setting

Domain and Boundary Conditions. The spatial domain is a two-dimensional square $\varOmega :=(0,L)\times (0,L)$ with $L\in \mathbb {R}^+$. We assume periodic boundary conditions.

By $W^s(\varOmega )$ we denote the $L^2$-Sobolev space of order $s\in \mathbb {Z}_+\cup \{0\}$ that is defined as the set of functions $f\in L^2(\varOmega )$ such that its derivatives in the distributional sense $\mathcal {D}^\alpha f(x,y)=\partial ^{\alpha _1}_{x}\partial ^{\alpha _2}_{y} f(x,y)$ are in $L^2(\varOmega )$ for all $|\alpha |\le s$, with multi-index $\alpha =(\alpha _1,\alpha _2)\in \mathbb {Z}_+^2$, and degree $|\alpha |:=\alpha _1+\alpha _2$. The scalar product in $W^s(\varOmega )$ is defined by

$$\begin{aligned} \big \langle f,g\big \rangle _{W^s}:=\sum _{|\alpha |\le s}\int _\varOmega \mathcal {D}^\alpha f\cdot \mathcal {D}^\alpha g\, dx. \end{aligned}$$

(12)

The vectorial counterpart of the Sobolev space $W^s(\varOmega )$ is denoted by $\mathbf {W}^s(\varOmega )$. More information about Sobolev spaces can be found for example in [2, 12, 27]. We define

$$\begin{aligned} {V}:= & {} \{f:\mathbb {R}^2\rightarrow \mathbb {R}: f\text { is a trigonometric polynomial with period L,}\nonumber \\&\quad \text { and } \int _\varOmega f\, dx=0\}, \end{aligned}$$

(13)

and its vector-valued equivalent

$$\begin{aligned} \begin{aligned} \mathbf { V}:=\{u:\mathbb {R}^2\rightarrow \mathbb {R}^2:&u\text { is a vector-valued trigonometric polynomial with}\\&\text {period L, and} \int _\varOmega u\, dx=0\}. \end{aligned} \end{aligned}$$

(14)

We define now the following function spaces

$$\begin{aligned} \begin{aligned} H^s(\varOmega )&:= \text {the closure of }V\ \text {in }W^s(\varOmega ),\\ \mathbf {H}^s(\varOmega )&:= \text {the closure of }\mathbf { V}\ \text {in }\mathbf {W}^s(\varOmega ),\\ \mathbf {H}^s_{div}(\varOmega )&:=\{u\in \mathbf{H}^s(\varOmega ): {div}(u)=0\}. \end{aligned} \end{aligned}$$

(15)

The dual Sobolev space $H^{-s}(\varOmega )$ consists of the linear functionals on $H^{s}(\varOmega )$. An element $f\in H^{-s}(\varOmega )$ can be identified with a distribution (cf. [27], Sect. 1.1.15)

$$\begin{aligned} f=g+\sum _{|\alpha |=s}(-1)^s\mathcal {D}^\alpha g_{\alpha },\qquad \text {with }g,g_\alpha \in L^2(\varOmega ), \end{aligned}$$

(16)

and the dual pairing between $f\in {H}^{-s}(\varOmega ),h\in {H}^{s}(\varOmega )$ can be expressed by

$$\begin{aligned} \big \langle f,h\big \rangle _{{H}^{-s}}:=\int _\varOmega gh\, dx+\int _\varOmega \sum _{|\alpha |=s}(-1)^s g_{\alpha }\mathcal {D}^\alpha h\,dx. \end{aligned}$$

(17)

Functions $\theta \in H^s(\varOmega )$ and $\mathbf{u}\in \mathbf {H}^s(\varOmega )$ or $\mathbf{u}\in \mathbf {H}^s_{div}(\varOmega )$ can be decomposed into an orthonormal basis of eigenfunctions $w_n(\mathbf{x}):=e^{\frac{2\pi i \mathbf{n}\cdot \mathbf{x}}{L}}$ of the Laplace operator such that (see e.g. [10])

$$\begin{aligned} \mathbf{u}(\mathbf{x})=\sum _{n\in \mathbb {Z}^2{\setminus }\{0\}}^\infty \hat{\mathbf{u}}_n w_n(\mathbf{x}),\ \text { and }\ \theta (\mathbf{x})=\sum _{n\in \mathbb {Z}^2{\setminus }\{0\}}^\infty \hat{\theta }_n w_n(\mathbf{x}). \end{aligned}$$

(18)

An equivalent characterization for $\theta \in H^s(\varOmega )$ in terms of Fourier coefficients that is valid for integer exponents $s\in \mathbb {Z}$, i.e. including the dual space of $H^s(\varOmega )$, reads as follows

$$\begin{aligned} \theta \in H^s(\varOmega ) \quad \Leftrightarrow \quad \sum _{n\in \mathbb {Z}^2{\setminus }\{0\}} |n|^{2s} |\hat{\theta }_n|^2<\infty . \end{aligned}$$

(19)
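The characterization (19) also shows how the Sobolev index acts as a scale filter: mode $n$ enters with weight $|n|^{2s}$, so positive $s$ amplifies small scales and negative $s$ damps them. The sketch below evaluates the weighted sum in (19) for a field given by finitely many Fourier coefficients. The function name `sobolev_norm_sq` and the example coefficients are assumptions made for illustration.

```python
def sobolev_norm_sq(coeffs, s):
    """Return the sum over n != 0 of |n|^(2s) * |c_n|^2, cf. Eq. (19).

    coeffs maps integer wave vectors (n1, n2) to Fourier coefficients.
    The mean mode n = (0, 0) is skipped, matching the zero-mean spaces
    defined above. The index s may be any integer, including negative ones.
    """
    total = 0.0
    for (n1, n2), c in coeffs.items():
        if n1 == 0 and n2 == 0:
            continue
        wavenumber_sq = n1 * n1 + n2 * n2          # |n|^2
        total += wavenumber_sq ** s * abs(c) ** 2  # |n|^(2s) |c_n|^2
    return total


# One large-scale mode and one weaker small-scale mode (assumed values).
coeffs = {(1, 0): 1.0, (10, 0): 0.1}

for s in (-1, 0, 1):
    print(s, sobolev_norm_sq(coeffs, s))
```

In this example the small-scale mode contributes as much as the large-scale mode for $s=1$, about one percent of it for $s=0$, and almost nothing for $s=-1$.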

A scalar product in $H^s(\varOmega )$ in terms of the Laplacian that is equivalent to (12), is given by

$$\begin{aligned} \big \langle f,g\big \rangle _{{H}^{s}}=\int _\varOmega \sum _{|\alpha |=s/2} f\triangle ^{\alpha } g\,dx, \end{aligned}$$

(20)

and for the dual pairing between $f\in {H}^{-s}(\varOmega ),h\in {H}^{s}(\varOmega )$ in (17)

$$\begin{aligned} \big \langle f,h\big \rangle _{{H}^{-s}}=\int _\varOmega \sum _{|\alpha |=s/2} g_{\alpha }\triangle ^{\alpha } h\,dx. \end{aligned}$$

(21)

With these preparations we now define for $\mathbf{s}=(s_{u^a},s_{\theta ^a},s_{u^o},s_{\theta ^o})\in \mathbb {Z}^4$ the Sobolev space of state vectors by

$$\begin{aligned} \mathcal {H}^\mathbf{s}(\varOmega ):= \mathbf {H}^{s_{u^a}}(\varOmega )\times H^{s_{\theta ^a}}(\varOmega ) \times \mathbf {H}^{s_{u^o}}_{div}(\varOmega )\times H^{s_{\theta ^o}}(\varOmega ), \end{aligned}$$

(22)

with the norm of $\psi =(\mathbf{u}^a,\theta ^a,\mathbf{u}^o,\theta ^o)\in \mathcal {H}^{\mathbf{s}}$ given by

$$\begin{aligned} ||\psi ||_{\mathcal {H}^\mathbf{s}}:= (||\mathbf{u}^a||_{\mathbf {H}^{s_{u^a}}}^2 + ||\theta ^a||_{H^{s_{\theta ^a}}}^2+ ||\mathbf{u}^o||_{\mathbf {H}^{s_{u^o}}}^2 + ||\theta ^o||_{ H^{s_{\theta ^o}}}^2)^{1/2}. \end{aligned}$$

(23)

The notation $\mathbf{s}\le \mathbf{t}$ for two multi-indices $\mathbf{s}=(s_{u^a},s_{\theta ^a},s_{u^o},s_{\theta ^o}), \mathbf{t}=(t_{u^a},t_{\theta ^a},t_{u^o},t_{\theta ^o})$ is to be understood as a relation between the components, i.e. $s_i\le t_i$ for all i. Similarly, expressions such as $\mathbf{s}+1$ are defined as $s_i+1$ for all components i. We use an analogous notation for the Lebesgue spaces and denote by $L^2, \mathbf{L}^2, \mathcal {L}^2$ the sets of square-integrable scalar functions, vector fields and state vectors, respectively.

Lemma 1

(Calculus Inequality, [19]) Let $s\in \mathbb {Z}_+\cup \{0\}$. Assume $f,g\in H^s(\varOmega )\cap L^\infty (\varOmega )$. Then for any multi-index $\alpha \in \mathbb {Z}^2_+$, $|\alpha |\le s$, we have

$$\begin{aligned} \begin{aligned} \mathrm{i)}&\ ||\mathcal {D}^\alpha (fg)||_{L^2}\le C_s\big (||f||_{L^\infty }||\mathcal {D}^sg||_{L^2} +||g||_{L^\infty }||\mathcal {D}^sf||_{L^2} \big ),\\ \mathrm{ii)}&\ ||\mathcal {D}^\alpha (fg)-f\mathcal {D}^\alpha g||_{L^2}\le C_s\big (||\nabla f||_{L^\infty }||\mathcal {D}^{s-1}g||_{L^2} +||g||_{L^\infty }||\mathcal {D}^sf||_{L^2} \big ). \end{aligned} \end{aligned}$$

Lemma 2

(Sobolev Inequality, [26, 27]) Let $s\in \mathbb {Z}_+$ with $s\ge 2, k\in \mathbb {Z}_+\cup \{0\}$. Then $H^{s+k}(\varOmega )\subseteq C^k(\varOmega )$, and there exists a constant $K_s$ such that for all $f\in H^{s+k}(\varOmega )$

$$\begin{aligned} \max _{x\in \varOmega }\sum _{ |\alpha |\le k}|\mathcal {D}^\alpha f(x)|\le K_s||f||_{H^{s+k}}. \end{aligned}$$

Lemma 3

(Interpolation inequality, [2]) Let $s\in \mathbb {Z}_+$. There exists a constant $C_s$ such that for all $f\in H^{s}(\varOmega )$ and for $0<s'<s$

$$\begin{aligned} ||f||_{H^{s'}}\le C_s||f||_{L^2}^{1-s'/s}||f||_{H^s}^{s'/s} \end{aligned}$$
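A quick numerical check of Lemma 3 is sketched below. It assumes the Fourier-weight form of the Sobolev norms, for which Hölder's inequality gives the estimate with constant $C_s=1$; with other equivalent norms the constant may differ. The random coefficients and the helper names are hypothetical.

```python
import random

def hs_norm(coeffs, s):
    """H^s norm with the assumed Fourier weight (1 + |k|^2)^s; s = 0 gives L^2."""
    return sum((1.0 + k1 * k1 + k2 * k2) ** s * abs(c) ** 2
               for (k1, k2), c in coeffs.items()) ** 0.5

def interpolation_gap(coeffs, s, s_prime):
    """Right-hand side minus left-hand side of Lemma 3 with C_s = 1."""
    theta = s_prime / s
    rhs = hs_norm(coeffs, 0) ** (1.0 - theta) * hs_norm(coeffs, s) ** theta
    return rhs - hs_norm(coeffs, s_prime)

random.seed(0)
f = {(k1, k2): complex(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))
     for k1 in range(-4, 5) for k2 in range(-4, 5)}
gaps = [interpolation_gap(f, 3, s_prime) for s_prime in (1, 2)]
single_mode_gap = interpolation_gap({(2, 1): 1.5}, 3, 2)
```

For a single Fourier mode the two sides coincide up to rounding, which shows that the exponents $1-s'/s$ and $s'/s$ cannot be improved in this setting.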

Lemma 4

(Agmon’s inequality, [10]) There exists a constant $C_A>0$ such that for all $f\in H^2(\varOmega )\cap H^1_0(\varOmega )$

$$\begin{aligned} ||f||_{L^\infty (\varOmega )}\le C_A ||f||_{H^1(\varOmega )}^{1/2} ||f||_{H^2(\varOmega )}^{1/2}. \end{aligned}$$

(24)

Lemma 5

(Poincaré inequality, [10]) There exists a constant $C_P>0$ such that for all $f\in H^1(\varOmega )$

$$\begin{aligned} ||f||_{L^2}\le C_P ||\nabla f||_{L^2}. \end{aligned}$$

(25)

Lemma 6

(Gronwall inequality, [12]) Let f, g and h be non-negative functions in $L^1_{loc}(T,\mathbb {R})$. Assume that f is absolutely continuous on T and that for almost every $t\in T$

$$\begin{aligned} \frac{df}{dt}\le gf+h. \end{aligned}$$

Then $f\in L^\infty _{loc}(T,\mathbb {R})$ and

$$\begin{aligned} f(t)\le f(0)\exp \big (\int _0^tg(s)\, ds\big ) +\int _0^t h(s)\exp \big (\int _s^t g(y)\, dy\big )\, ds. \end{aligned}$$
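The right-hand side of the Gronwall estimate can be evaluated by simple quadrature. The sketch below uses the midpoint rule, with the function name `gronwall_bound` and the sample data chosen only for illustration, and compares the bound with two closed-form cases: $f'=gf+h$, where the bound is attained, and $f'=gf+h-1$, which satisfies the differential inequality strictly.

```python
import math

def gronwall_bound(f0, g, h, t, n=2000):
    """Approximate f0*exp(int_0^t g) + int_0^t h(s)*exp(int_s^t g) ds."""
    dt = t / n
    # G[i] approximates int_0^{i*dt} g(y) dy with the midpoint rule.
    G = [0.0]
    for i in range(n):
        G.append(G[-1] + g((i + 0.5) * dt) * dt)
    total = f0 * math.exp(G[n])
    for i in range(n):
        s = (i + 0.5) * dt
        G_s = 0.5 * (G[i] + G[i + 1])  # int_0^s g, interpolated at the midpoint
        total += h(s) * math.exp(G[n] - G_s) * dt
    return total

# Constant data g = 1, h = 2, f(0) = 1 on [0, 1].
bound = gronwall_bound(1.0, lambda y: 1.0, lambda y: 2.0, 1.0)
equality_case = 3.0 * math.e - 2.0  # solution of f' = f + 2, f(0) = 1, at t = 1
strict_case = 2.0 * math.e - 1.0    # solution of f' = f + 2 - 1, f(0) = 1, at t = 1
```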

Young inequality: $a b\le \frac{1}{2\epsilon }a^2 + \frac{\epsilon }{2}b^2$ for $a,b\in \mathbb {R}$ and $\epsilon >0$.

General Assumption on Covariance Operators: The error covariance operators $\mathcal {B}, \mathcal {R}$ preserve the space of their respective arguments, i.e.

$$\begin{aligned} \text { if } \xi \in \mathcal {H}^{\mathbf{s}} \text { then } \mathcal {B}\xi \in \mathcal {H}^{\mathbf{s}}, \text { and }\mathcal {R}\xi \in \mathcal {H}^{\mathbf{s}}. \end{aligned}$$

(26)

4 Mathematical Analysis of the Coupled Model

In this section we provide the regularity results of the coupled model Eqs. (1)–(5) and of the associated linearized and adjoint equations. This enables us to prove the differentiability of the model solution with respect to the initial conditions (Lemma 7). These results are instrumental for the formulation of the data assimilation cost functional in Sect. 5.

4.1 Well-Posedness of the Coupled Equations

Definition 1

Let $\mathbf{s}\in \mathbb {Z}^4$. A state vector $\psi :=(\mathbf{u}^a,\theta ^a,\mathbf{u}^o,\theta ^o)$ is said to be a regular solution of (1)–(5) (or equivalently of (6)) on the time interval $T:=[t_0,t_1]$ if it satisfies (1)–(5) on $\varOmega \times T$ with initial condition $\psi (t_0)=\psi _0$ and if

$$\begin{aligned} \begin{aligned} \psi \in C(T,\mathcal {H}^\mathbf{s}(\varOmega ))\cap C^1 (T,\mathcal {H}^{\mathbf{s}-2}(\varOmega )). \end{aligned} \end{aligned}$$

(27)

The following theorem is the main result of this section.

Theorem 1

(Well-posedness of coupled model) Let the time interval $T:=[t_0,t_1]$ be given and let the initial condition of the coupled Eqs. (1)–(5) satisfy $\psi _0\in \mathcal {H}^\mathbf{s}(\varOmega )$ with $\mathbf{s}=(s_{u^a},s_{\theta ^a},s_{u^o},s_{\theta ^o})\in \mathbb {Z}_+^4$ such that all components of $\mathbf{s}$ are greater than or equal to 3. Suppose the coupling functions satisfy $\sigma \in C(T,\mathcal {H}^{s_u^o}(\varOmega )),\, \gamma \in C(T,\mathcal {H}^{s_\theta ^a}(\varOmega ))$. Then there exists on the time interval T a regular solution $\psi $ of (1)–(5) in the sense of Definition 1. The solution is unique and depends continuously on the initial condition.

Proof

Step 1: Existence and Uniqueness of Galerkin Approximation.

The Galerkin approximation to the coupled Eqs. (1)–(5) reads as follows

$$\begin{aligned} \begin{aligned}&\frac{\partial \mathbf{u}_m^a}{\partial t} +P_m\left[ (\mathbf{u}^a_m\cdot \nabla )\mathbf{u}^a_m\right] +\frac{1}{Ro^a}\mathbf{u}^{a\bot }_m +\frac{1}{Ro^a}\nabla \theta ^a_m = \frac{1}{Re^a}\triangle \mathbf{u}^a_m,\\&\frac{\partial \tilde{Fr}^a\theta ^a_m}{\partial t} +\frac{1}{Ro^a}{div}(\mathbf{u}^a_m) = - \gamma \theta ^o_m +\frac{1}{Pe^a}\triangle \theta ^a_m,\\&\frac{\partial \mathbf{u}_m^o}{\partial t} +P_m[(\mathbf{u}_m^o\cdot \nabla )\mathbf{u}_m^o] +\frac{1}{Ro^o}\mathbf{u}_m^{o\bot } +\frac{1}{Ro^o}\nabla p_m^o = \sigma \mathbf{u}^a_m +\frac{1}{Re^o}\triangle \mathbf{u}^o_m,\\&\frac{\partial \theta _m^o}{\partial t} +P_m\left[ (\mathbf{u}_m^o\cdot \nabla )\theta _m^o\right] =\frac{1}{Pe^o}\triangle \theta _m^o, \end{aligned} \end{aligned}$$

(28)

where the Galerkin projections $\mathbf{u}_m^a,\theta _m^a$ for the atmospheric component are given as truncations of the expansions (18)

$$\begin{aligned} \begin{aligned}&\mathbf{u}_m^a(x,t):=P_m\mathbf{u}^a(x,t):=\sum _{n\in \{\mathbb {Z}^2{\setminus } \{ 0\}, |n|\le m\}} \hat{\mathbf{u}}^a_n(t) w_n(x)\\ \text { and }&\ \theta _m^a(x,t):=P_m\theta ^a(x,t):=\sum _{n\in \{\mathbb {Z}^2{\setminus } \{ 0\}, |n|\le m\}} \hat{\theta }_n(t) w_n(x), \end{aligned} \end{aligned}$$

(29)

with $w_n$ given by (4). The Galerkin system (28) has periodic boundary conditions and its initial condition is given by

$$\begin{aligned} \mathbf{u}^a_m(t_0)=\mathbf{u}^a_0,\quad \theta ^a_m(t_0)=\theta ^a_0,\quad \mathbf{u}^o_m(t_0)=\mathbf{u}^o_0,\quad \theta ^o_m(t_0)=\theta ^o_0. \end{aligned}$$

(30)

For the oceanic component an analogous expansion holds. The corresponding operator equation for the state vector $\psi _m:=(\mathbf{u}_m^a,\theta _m^a,\mathbf{u}_m^o,\theta _m^o)$ is obtained by replacing $\psi $ in (6) by $\psi _m$, with initial condition $\psi _m(t_0)=(\mathbf{u}^a_m(t_0), \theta ^a_m(t_0), \mathbf{u}^o_m(t_0), \theta ^o_m(t_0))$.

The system (28) is a system of ordinary differential equations with constant coefficients and a quadratic nonlinearity. By standard arguments one can show that its right-hand side is a locally Lipschitz continuous mapping from $\mathcal {H}^{\mathbf{s}}_m$ into itself, and it then follows from the Picard theorem that a unique solution $\psi _m\in C^1([t_0,t_1^m], \mathcal {H}^{\mathbf{s}})$ of (28) exists on time intervals $[t_0,t_1^m]$ that depend on m.
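The truncation $P_m$ in (29) acts on Fourier coefficients as a simple mode filter. The sketch below is a minimal illustration with hypothetical names and sample data; it assumes that $|n|$ denotes the Euclidean length of the wave vector and, following the index set in (29), it discards the mode $n=0$.

```python
def galerkin_project(coeffs, m):
    """P_m: keep the modes with 0 < |n| <= m (Euclidean length assumed)."""
    return {n: c for n, c in coeffs.items()
            if n != (0, 0) and n[0] ** 2 + n[1] ** 2 <= m * m}

def l2_norm_sq(coeffs):
    """Squared L^2 norm via Parseval, up to the domain-area factor."""
    return sum(abs(c) ** 2 for c in coeffs.values())

# Hypothetical coefficients, including a mean mode and two high modes.
f = {(0, 0): 3.0, (1, 0): 1.0, (0, -2): 0.5j, (3, 4): 0.25, (5, 5): 0.1}
f_2 = galerkin_project(f, 2)
f_5 = galerkin_project(f, 5)
```

Because the filter only deletes coefficients, it is idempotent and never increases a Fourier-weighted norm, which is why the a priori estimates for $\psi _m$ can be stated uniformly in m.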

Step 2: $H^s$-Estimate.

Applying the derivative operator $\mathcal {D}^\alpha $ to the Galerkin system (28) yields for the atmospheric component

$$\begin{aligned} \begin{aligned}&\frac{\partial \mathcal {D}^\alpha \mathbf{u}^a_m}{\partial t}+(\mathbf{u}^a_m\cdot \nabla )\mathcal {D}^\alpha \mathbf{u}^a_m +\frac{1}{Ro^a}\mathcal {D}^\alpha \mathbf{u}^{a\bot }_m +\frac{1}{Ro^a}\nabla \mathcal {D}^\alpha \theta ^a_m = \frac{1}{Re^a}\triangle \mathcal {D}^\alpha \mathbf{u}^a_m +\mathcal {G}^a_\mathbf{u}(\alpha ),\\&\tilde{Fr}^a\frac{\partial \mathcal {D}^\alpha \theta ^a_m}{\partial t} +\frac{1}{Ro^a}{div}(\mathcal {D}^\alpha \mathbf{u}^a_m) = -\mathcal {D}^\alpha (\gamma \theta ^o_m) +\mathcal {G}^a_\theta (\alpha ) +\frac{1}{Pe^a}\triangle \mathcal {D}^\alpha \theta ^a_m, \end{aligned} \end{aligned}$$

(31)

where the nonlinear terms on the right-hand side are defined by

$$\begin{aligned} \begin{aligned}&\mathcal {G}^a_\mathbf{u}(\alpha ):=\mathbf{u}^a_m\cdot \nabla \mathcal {D}^\alpha \mathbf{u}^a_m-\mathcal {D}^\alpha [\mathbf{u}^a_m\cdot \nabla \mathbf{u}^a_m],\\&\mathcal {G}^a_\theta (\alpha ):=\mathbf{u}^a_m\cdot \mathcal {D}^\alpha \nabla \theta ^a_m-\mathcal {D}^\alpha [\mathbf{u}^a_m\cdot \nabla \theta ^a_m]. \end{aligned} \end{aligned}$$

Taking the $L^2$-inner product of (31) with $(\mathcal {D}^\alpha \mathbf{u}^a_m,\mathcal {D}^\alpha \theta ^a_m)$, integrating by parts and adding the two equations yields

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\frac{d}{dt}\big \{||\mathcal {D}^\alpha \mathbf{u}^a_m||_{\mathbf{L}^2}^2+\tilde{Fr}^a||\mathcal {D}^\alpha \theta ^a_m||_{L^2}^2\big \} +\frac{1}{Re^a}||\mathcal {D}^\alpha \nabla \mathbf{u}^a_m||_{\mathbf{L}^2}^2+\frac{1}{Pe^a}||\mathcal {D}^\alpha \nabla \theta ^a_m||_{L^2}^2\\&\quad = -\int _{\varOmega }\mathbf{u}^a_m\cdot \nabla \frac{|\mathcal {D}^\alpha \mathbf{u}^a_m|^2}{2}\, dx -\int _{\varOmega }\frac{1}{Ro^a}\nabla \mathcal {D}^\alpha \theta ^a_m\cdot \mathcal {D}^\alpha \mathbf{u}^a_m\, dx\\&\qquad -\int _{\varOmega }\frac{1}{Ro^a}{div}(\mathcal {D}^\alpha \mathbf{u}^a_m)\cdot \mathcal {D}^\alpha \theta ^a_m\,dx\\&\qquad -\int _\varOmega \mathcal {D}^\alpha (\gamma \theta ^a_m)\cdot \mathcal {D}^\alpha \theta ^o_m\,dx +\int _\varOmega \mathcal {G}^a_\mathbf{u}(\alpha )\cdot \mathcal {D}^\alpha \mathbf{u}^a_m\, dx +\int _\varOmega \mathcal {G}^a_\theta (\alpha )\cdot \mathcal {D}^\alpha \theta ^a_m\, dx, \end{aligned} \end{aligned}$$

(32)

where the Coriolis term vanishes due to $\big \langle \mathcal {D}^\alpha \mathbf{u}^{a\bot }_m,\mathcal {D}^\alpha \mathbf{u}^a_m\big \rangle _{L^2}=0 $. For the coupling term we derive with the inequalities of Cauchy–Schwarz, Poincaré and with Lemma 2

$$\begin{aligned} \begin{aligned}&\left| \int _\varOmega \mathcal {D}^\alpha (\gamma \theta ^a_m)\cdot \mathcal {D}^\alpha \theta ^o_m\,dx\right| \le \int _\varOmega |\gamma \mathcal {D}^\alpha \theta ^a_m\cdot \mathcal {D}^\alpha \theta ^o_m|\, dx +\int _\varOmega |\theta ^a_m\mathcal {D}^\alpha \gamma \cdot \mathcal {D}^\alpha \theta ^o_m|\,dx\\&\quad \le K_sC_P||\gamma ||_{H^{s_\theta ^a}}||\mathcal {D}^\alpha \theta ^o_m||_{L^2}||\mathcal {D}^\alpha \theta ^a_m||_{L^2}. \end{aligned} \end{aligned}$$

(33)

After integrating the second term on the right hand side of (32) by parts and invoking the periodic boundary condition, this term cancels with the third term on the right hand side and we obtain with (33) and the Young inequality the following estimate

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\frac{d}{dt}\big \{||\mathcal {D}^\alpha \mathbf{u}^a_m||_{\mathbf{L}^2}^2+\tilde{Fr}^a||\mathcal {D}^\alpha \theta ^a_m||_{L^2}^2\big \} +\frac{1}{Re^a}||\mathcal {D}^\alpha \nabla \mathbf{u}^a_m||_{\mathbf{L}^2}^2+\frac{1}{Pe^a}||\mathcal {D}^\alpha \nabla \theta ^a_m||_{L^2}^2\\&\quad \le |\int _{\varOmega }div(\mathbf{u}^a_m)\frac{|\mathcal {D}^\alpha \mathbf{u}^a_m|^2}{2}\, dx| +\frac{1}{2Pe^a}||\mathcal {D}^\alpha \theta ^a_m||_{L^2}^2 +\frac{K_s^2C_P^2Pe^a||\gamma ||_{H^{s_\theta ^a}}^2}{2}||\mathcal {D}^\alpha \theta ^o_m||_{L^2}^2\\&\qquad +||\mathcal {G}^a_\mathbf{u}(\alpha )||_{\mathbf{L}^2}||\mathcal {D}^\alpha \mathbf{u}^a_m||_{\mathbf{L}^2} +||\mathcal {G}^a_\theta (\alpha )||_{L^2}||\mathcal {D}^\alpha \theta ^a_m||_{L^2}. \end{aligned} \end{aligned}$$

(34)
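The cancellation of the pressure-gradient term against the divergence term used in passing from (32) to (34) is integration by parts on a periodic domain. In Fourier variables it becomes an exact algebraic identity, as the following sketch indicates. The Parseval normalization (domain-area factor dropped), the function names and the random coefficients are assumptions made for illustration.

```python
import random

def grad_dot(theta, u):
    """<grad theta, u>_{L^2} computed from Fourier coefficients."""
    total = 0j
    for (k1, k2), th in theta.items():
        u1 = u[0].get((k1, k2), 0j)
        u2 = u[1].get((k1, k2), 0j)
        total += (1j * k1 * th) * u1.conjugate() + (1j * k2 * th) * u2.conjugate()
    return total

def theta_div(theta, u):
    """<theta, div u>_{L^2} computed from Fourier coefficients."""
    total = 0j
    for (k1, k2), th in theta.items():
        div_k = 1j * (k1 * u[0].get((k1, k2), 0j) + k2 * u[1].get((k1, k2), 0j))
        total += th * div_k.conjugate()
    return total

def random_field(rng, kmax=3):
    """Random complex coefficients on the modes |k1|, |k2| <= kmax."""
    return {(k1, k2): complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            for k1 in range(-kmax, kmax + 1) for k2 in range(-kmax, kmax + 1)}

rng = random.Random(1)
theta = random_field(rng)
u = (random_field(rng), random_field(rng))
residual = grad_dot(theta, u) + theta_div(theta, u)  # vanishes up to rounding
```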

For the nonlinear forcing terms we obtain with Lemma 1 for all $\alpha \in \mathbb {Z}^2_+$ with $|\alpha |\le s_{u^a}$

$$\begin{aligned} \begin{aligned} ||\mathcal {G}^a_\mathbf{u}(\alpha )||_{\mathbf{L}^2}||\mathcal {D}^\alpha \mathbf{u}^a_m||_{\mathbf{L}^2} \le C_s||\nabla \mathbf{u}^a_m||_{\mathbf{L}^\infty }||\mathbf{u}^a_m||_{\mathbf{H}^{s_{u^a}}}||\mathcal {D}^\alpha \mathbf{u}^a_m||_{\mathbf{L}^2} , \end{aligned} \end{aligned}$$

(35)

and

$$\begin{aligned} \begin{aligned}&||\mathcal {G}^a_\theta (\alpha )||_{L^2}||\mathcal {D}^\alpha \theta ^a_m||_{L^2}\\&\quad \le C_s(||\nabla \mathbf{u}^a_m||_{\mathbf{L}^\infty }||\mathcal {D}^{s-1}\nabla \theta ^a_m||_{L^2} +||\nabla \theta ^a_m||_{L^\infty }||\mathcal {D}^s\mathbf{u}^a_m||_{\mathbf{L}^2})||\mathcal {D}^\alpha \theta ^a_m||_{L^2}. \end{aligned} \end{aligned}$$

(36)

From (35), (36) we obtain for (34)

$$\begin{aligned} \begin{aligned}&\frac{d}{dt}\big \{||\mathcal {D}^\alpha \mathbf{u}^a_m||_{\mathbf{L}^2}^2+\tilde{Fr}^a||\mathcal {D}^\alpha \theta ^a_m||_{L^2}^2\big \} +\frac{1}{Re^a}||\mathcal {D}^\alpha \nabla \mathbf{u}^a_m||_{\mathbf{L}^2}^2+\frac{1}{Pe^a}||\mathcal {D}^\alpha \nabla \theta ^a_m||_{L^2}^2\\&\quad \le 2||{div}(u^a_m)||_{\mathbf{L}^\infty }||\mathcal {D}^\alpha u^a_m||_{\mathbf{L}^2}^2 +K_s^2C_P^2Pe^a||\gamma ||_{H^{s_\theta ^a}}^2||\mathcal {D}^\alpha \theta ^o_m||_{L^2}^2\\&\qquad + 2C_s\big ( ||\nabla \mathbf{u}^a_m||_{\mathbf{L}^\infty }+||\nabla \theta ^a_m||_{L^\infty }\big ) \big ( ||\mathcal {D}^s\mathbf{u}^a_m||_{\mathbf{L}^2}||\mathcal {D}^\alpha \mathbf{u}^a_m||_{\mathbf{L}^2} +||\mathcal {D}^s\theta ^a_m||_{\mathbf{L}^2}||\mathcal {D}^\alpha \theta ^a_m||_{\mathbf{L}^2}\\&\qquad +||\mathcal {D}^s\mathbf{u}^a_m||_{\mathbf{L}^2}||\mathcal {D}^\alpha \theta ^a_m||_{\mathbf{L}^2} \big ). \end{aligned} \end{aligned}$$

(37)

We sum over all derivatives $\mathcal {D}^\alpha $ such that the degree $|\alpha |$ of any derivative is less than or equal to the degree of the corresponding $\mathbf{s}$-component. This implies with the Young inequality the following upper bound for the atmospheric state $\psi ^a=(\mathbf{u}^a,\theta ^a)$

$$\begin{aligned} \begin{aligned} \frac{d}{dt}||\psi ^a_m||_{\mathcal {H}^{\mathbf{s}}}^2 +\frac{1}{Re^a}||\nabla \mathbf{u}^a_m||_{\mathbf{H}^{s_u^a}}^2+\frac{1}{Pe^a}||\nabla \theta ^a_m||_{H^{s_\theta ^a}}^2&\le C||div\psi ^a_m||_{\mathcal {L}^\infty } ||\psi ^a_m||_{\mathcal {H}^{\mathbf{s}}}^2\\&\quad +CK_s^2C_P^2Pe^a||\gamma ||_{H^{s_\theta ^a}}^2||\theta ^o_m||_{H^s}^2. \end{aligned} \end{aligned}$$

(38)

For the oceanic state we proceed analogously to the atmosphere. After applying $\mathcal {D}^\alpha $ to (28) we take the $L^2$-inner product with $(\mathcal {D}^\alpha \mathbf{u}^o_m,\mathcal {D}^\alpha \theta ^o_m)$, integrate by parts and add the two equations. This yields

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\frac{d}{dt}\big \{||\mathcal {D}^\alpha \mathbf{u}^o_m||_{\mathbf{L}^2}^2+||\mathcal {D}^\alpha \theta ^o_m||_{L^2}^2\big \} +\frac{1}{Re^o}||\mathcal {D}^\alpha \nabla \mathbf{u}^o_m||_{\mathbf{L}^2}^2+\frac{1}{Pe^o}||\mathcal {D}^\alpha \nabla \theta ^o_m||_{L^2}^2\\&\quad = -\int _{\varOmega }\mathbf{u}^o_m\cdot \nabla \frac{|\mathcal {D}^\alpha \mathbf{u}^o_m|^2}{2}\, dx -\int _{\varOmega }\mathbf{u}^o_m\cdot \nabla \frac{|\mathcal {D}^\alpha \theta ^o_m|^2}{2}\,dx +\int _{\varOmega }\mathcal {D}^\alpha (\sigma \mathbf{u}^a_m)\cdot \mathcal {D}^\alpha \mathbf{u}^o_m\, dx \\&\qquad +\int _\varOmega \mathcal {G}^o_\mathbf{u}(\alpha )\cdot \mathcal {D}^\alpha \mathbf{u}^o_m\, dx +\int _\varOmega \mathcal {G}^o_\theta (\alpha )\cdot \mathcal {D}^\alpha \theta ^o_m\, dx\\&\quad \le \int _\varOmega \mathcal {G}^o_\mathbf{u}(\alpha )\cdot \mathcal {D}^\alpha \mathbf{u}^o_m\, dx +\int _\varOmega \mathcal {G}^o_\theta (\alpha )\cdot \mathcal {D}^\alpha \theta ^o_m\, dx +K_sC_P||\sigma ||_{H^{s_u^o}}||\mathcal {D}^\alpha \mathbf{u}^a_m||_{\mathbf{L}^2}||\mathcal {D}^\alpha \mathbf{u}^o_m||_{\mathbf{L}^2}, \end{aligned} \end{aligned}$$

(39)

where the pressure and the two gradient terms in the second line vanish after integration by parts due to incompressibility and the periodic boundary conditions. The coupling term in the velocity equations has been treated analogously to (33) with the temperature variable $\theta ^a_m$ replaced by the velocity variable $\mathbf{u}^o_m$:

$$\begin{aligned} \begin{aligned}&\left| \int _\varOmega \mathcal {D}^\alpha (\sigma \mathbf{u}^a_m)\cdot \mathcal {D}^\alpha \mathbf{u}^o_m\,dx\right| \le \int _\varOmega |\sigma \mathcal {D}^\alpha \mathbf{u}^a_m\cdot \mathcal {D}^\alpha \mathbf{u}^o_m|\, dx +\int _\varOmega |\mathbf{u}^a_m\mathcal {D}^\alpha \sigma \cdot \mathcal {D}^\alpha \mathbf{u}^o_m|\,dx\\&\le K_sC_P||\sigma ||_{H^{s_u^o}}||\mathcal {D}^\alpha \mathbf{u}^o_m||_{\mathbf{L}^2}||\mathcal {D}^\alpha \mathbf{u}^a_m||_{\mathbf{L}^2}. \end{aligned} \end{aligned}$$

(40)

With the estimates (35) and (36) for $\mathcal {G}^o_\mathbf{u}(\alpha )$ and $\mathcal {G}^o_\theta (\alpha )$ respectively we arrive analogously to the atmospheric estimate at the following inequality

$$\begin{aligned} \begin{aligned}&\frac{d}{dt}||\psi ^o_m||_{\mathcal {H}^{\mathbf{s}}}^2 +\frac{1}{Re^o}||\nabla \mathbf{u}^o_m||_{\mathbf{H}^{s_u^o}}^2+\frac{1}{Pe^o}||\nabla \theta ^o_m||_{H^{s_\theta ^o}}^2\\&\quad \le C ||div\psi ^o_m||_{\mathcal {L}^\infty }||\psi ^o_m||_{\mathcal {H}^{\mathbf{s}}}^2 +CK_s^2C_P^2 Re^o||\sigma ||_{H^{s_u^o}}^2||\mathbf{u}^a_m||_{\mathbf{H}^{s_u^a}}^2. \end{aligned} \end{aligned}$$

(41)

Adding (38) and (41) results in

$$\begin{aligned} \begin{aligned}&\frac{d}{dt}||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2 +\frac{1}{R}(||\nabla \mathbf{u}_m^a||_{\mathbf{H}^{s_u^a}}^2+||\nabla \mathbf{u}_m^o||_{\mathbf{H}^{s_u^o}}^2)+\frac{1}{P}(||\nabla \theta _m^a||_{H^{s_\theta ^a}}^2+||\nabla \theta _m^o||_{H^{s_\theta ^o}}^2)\\&\quad \le C||div\psi _m||_{\mathcal {L}^\infty }||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2 +CK_s^2C_P^2(||\gamma ||_{H^{s_\theta ^a}}^2Pe^a+||\sigma ||_{H^{s_u^o}}^2Re^o)||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2, \end{aligned} \end{aligned}$$

(42)

where $\frac{1}{R}:=\min \{\frac{1}{Re^a},\frac{1}{Re^o}\}$, $\frac{1}{P}:=\min \{\frac{1}{Pe^a}, \frac{1}{Pe^o}\}$. The inequality is still true if we neglect the positive gradient-terms on the left hand side

$$\begin{aligned} \begin{aligned}&\frac{d}{dt}||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2 \le C||div\psi _m||_{\mathcal {L}^\infty } ||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2 +CK_s^2C_P^2(||\gamma ||_{H^{s_\theta ^a}}^2Pe^a+||\sigma ||_{H^{s_u^o}}^2Re^o)||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2. \end{aligned} \end{aligned}$$

(43)

We now make use of our assumption that all components of the state vector belong to a Sobolev space $\mathcal {H}^{\mathbf{s}}$ and that all components of $\mathbf{s}$ are greater than or equal to 3. This allows us to apply Lemma 2 (with $s=2,\ k=1$) to the divergence term in (43), and it follows that

$$\begin{aligned} \begin{aligned}&\frac{d}{dt}||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2 \le C_s ||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^3 +CK_s^2C_P^2\left( ||\gamma ||_{H^{s_\theta ^a}}^2Pe^a+||\sigma ||_{H^{s_u^o}}^2Re^o\right) ||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2, \end{aligned} \end{aligned}$$

(44)

where the constant $C_s$ from Lemma 2 does depend on $\mathbf{s}$ but not on m. Hence

$$\begin{aligned} \begin{aligned}&\frac{d}{dt}||\psi _m||_{\mathcal {H}^{\mathbf{s}}} \le C_s ||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2 +K_s\left( ||\gamma ||_{H^{s_\theta ^a}}^2Pe^a+||\sigma ||_{H^{s_u^o}}^2Re^o\right) ||\psi _m||_{\mathcal {H}^{\mathbf{s}}}, \end{aligned} \end{aligned}$$

and with the Young inequality

$$\begin{aligned} \begin{aligned} \frac{d }{dt}||\psi _m||_{\mathcal {H}^{\mathbf{s}}}&\le (C_s+1)||\psi _m||_{\mathcal {H}^{\mathbf{s}}}^2+ M, \end{aligned} \end{aligned}$$

(45)

with $M:=CK_s^2C_P^2(||\gamma ||_{H^{s_\theta ^a}}^2Pe^a+||\sigma ||_{H^{s_u^o}}^2Re^o)$. Integrating this from $t_0$ to $t_1$ yields^{Footnote 2}

$$\begin{aligned} \begin{aligned} \arctan \sqrt{\frac{(C_s+1)}{M}}||\psi _m(t_1)||_{\mathcal {H}^{\mathbf{s}}} - \arctan \sqrt{\frac{(C_s+1)}{M}}||\psi _m(t_0)||_{\mathcal {H}^{\mathbf{s}}} \le \sqrt{(C_s+1)M}\,(t_1-t_0). \end{aligned} \end{aligned}$$

(46)

We now choose $t_1>t_0$ such that the following condition is satisfied

$$\begin{aligned} \begin{aligned} \sqrt{(C_s+1)M}\,(t_1-t_0)< \frac{\pi }{2}-\arctan \sqrt{\frac{(C_s+1)}{M}}||\psi _0||_{\mathcal {H}^{\mathbf{s}}}. \end{aligned} \end{aligned}$$

(47)

A $t_1$ with this property exists, because the right-hand side of (47) is positive. From (46) follows

$$\begin{aligned} \begin{aligned} \sqrt{\frac{(C_s+1)}{M}}||\psi _m(t_1)||_{\mathcal {H}^{\mathbf{s}}} \le \tan \bigg \{\sqrt{(C_s+1)M}\,(t_1-t_0) +\arctan \sqrt{\frac{(C_s+1)}{M}}||\psi _m(t_0)||_{\mathcal {H}^{\mathbf{s}}}\bigg \}. \end{aligned}\end{aligned}$$

(48)

With (30) it follows that

$$\begin{aligned} \begin{aligned} \sqrt{\frac{(C_s+1)}{M}}||\psi _m(t_1)||_{\mathcal {H}^{\mathbf{s}}} \le \tan \bigg \{\sqrt{(C_s+1)M}\,(t_1-t_0) +\arctan \sqrt{\frac{(C_s+1)}{M}}||\psi _0||_{\mathcal {H}^{\mathbf{s}}}\bigg \}. \end{aligned} \end{aligned}$$

(49)
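The bound (49) rests on comparing $||\psi _m||_{\mathcal {H}^{\mathbf{s}}}$ with the solution of the scalar Riccati-type equation $y'=ay^2+M$, which by separation of variables is $y(t)=\sqrt{M/a}\,\tan \big (\sqrt{aM}\,t+\arctan (\sqrt{a/M}\,y_0)\big )$, with t measured from the initial time. The sketch below evaluates this comparison solution and its blow-up time and checks both against a Runge–Kutta integration. The values of `a`, `M` and `y0` are hypothetical stand-ins for $C_s+1$, M and $||\psi _0||_{\mathcal {H}^{\mathbf{s}}}$.

```python
import math

def riccati_bound(y0, a, M, t):
    """Solution of y' = a*y**2 + M with y(0) = y0, before blow-up."""
    arg = math.sqrt(a * M) * t + math.atan(math.sqrt(a / M) * y0)
    if arg >= math.pi / 2:
        raise ValueError("t lies beyond the blow-up time of the comparison equation")
    return math.sqrt(M / a) * math.tan(arg)

def blow_up_time(y0, a, M):
    """Time at which the argument of the tangent reaches pi/2."""
    return (math.pi / 2 - math.atan(math.sqrt(a / M) * y0)) / math.sqrt(a * M)

def rk4(rhs, y0, t_end, n=4000):
    """Classical fourth-order Runge-Kutta scheme for an autonomous scalar ODE."""
    y, dt = y0, t_end / n
    for _ in range(n):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * dt * k1)
        k3 = rhs(y + 0.5 * dt * k2)
        k4 = rhs(y + dt * k3)
        y += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y

a, M, y0 = 2.0, 1.0, 0.5                  # hypothetical values
t = 0.5 * blow_up_time(y0, a, M)          # half of the admissible time span
exact = riccati_bound(y0, a, M, t)
numeric = rk4(lambda y: a * y * y + M, y0, t)
damped = rk4(lambda y: a * y * y + M - y, y0, t)  # satisfies y' <= a*y**2 + M
```

The value `blow_up_time` is the largest elapsed time for which the tangent stays finite. Because it depends only on `a`, `M` and `y0`, the admissible endpoint can be chosen independently of the Galerkin index m.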

Since the upper bound is independent of m, we have thus proven that the sequence $(\psi _m)_m$ is uniformly bounded in $L^\infty ([t_0,t_1],\mathcal {H}^{\mathbf{s}})$, where the endpoint $t_1$ satisfies (47). It follows that the solutions $\psi _m$ of the Galerkin system have a common interval of existence $[t_0,t_1]$ for all m. The boundedness of $(\psi _m)_m$ establishes the existence of a subsequence $(\psi _k)_k$ that converges weakly in $L^2(T;\mathcal {H}^{\mathbf{s}})$ to a $\psi \in L^2(T;\mathcal {H}^{\mathbf{s}})$. According to (49) it holds that $\psi \in L^\infty (T;\mathcal {H}^{\mathbf{s}})$.

Step 3: Estimate on the time derivative.

The uniform boundedness of $(\psi _m)_m$ in $L^2(T;\mathcal {H}^{\mathbf{s}})$ implies together with (45) that $(\frac{d\psi _m}{dt})_m$ is uniformly bounded in $L^2(T;\mathcal {H}^{\mathbf{s}})$. The sequence $(\frac{d\psi _m}{dt})_m$ is in particular uniformly bounded in $L^2(T,\mathcal {L}^2)$ and in $L^2(T;\mathcal {H}^{-s})$. The last fact follows from the continuous injection $\mathcal {H}^{\mathbf{s}}\subseteq \mathcal {L}^2\subset \mathcal {H}^{-s}$.

Step 4: The limit satisfies $\psi \in C(T,\mathcal {H}^{\mathbf{s}})$.

The boundedness of $(\psi _m)_m$ in $L^2(T;\mathcal {H}^{\mathbf{s}})$ and of $(\frac{d\psi _m}{dt})_m$ in $L^2(T,\mathcal {L}^2)$ implies with the Aubin compactness theorem (cf. [10], Lemma 8.2) that a subsequence $(\psi _{k})_k$ of $(\psi _m)_m$ exists that converges strongly in $L^2(T,\mathcal {L}^2)$ and weakly in $L^2(T;\mathcal {H}^{\mathbf{s}})$ to the limit $\psi \in L^2(T,\mathcal {H}^{\mathbf{s}})$. We consider the difference $(\psi _k-\psi )$ and denote an arbitrary component of it by $(f_k^{(i)}-f^{(i)})$, with $i=1,\ldots ,4$. From Lemma 3 applied to these components it follows that for all $s_i' < s_i$ and $t\in T$

$$\begin{aligned} ||f_k^{(i)}(t)-f^{(i)}(t)||_{\mathcal {H}^{{ s_i}'}}\le C_s ||f_k^{(i)}(t)-f^{(i)}(t)||_{\mathcal {L}^2}^{1-s'_i/s_i} ||f_k^{(i)}(t)-f^{(i)}(t)||_{\mathcal {H}^{{s_i}}}^{s_i'/s_i}. \end{aligned}$$

(50)

From (50) we derive with the convergence of $(\psi _k)_k$ in $L^2(T,\mathcal {L}^2)$ and with the boundedness in $ L^2(T,\mathcal {H}^{\mathbf{s}})$ the (strong) convergence of $(\psi _k)_k$ in $C(T;\mathcal {H}^{\mathbf{s}'})$ for all $\mathbf{s}'<\mathbf{s}$. We now show that $\psi \in C(T;\mathcal {H}^\mathbf{s})$. The strong convergence in $C(T;\mathcal {H}^{\mathbf{s}'})$ and the density of $\mathcal {H}^{-\mathbf{s}'}$ in $\mathcal {H}^{-\mathbf{s}}$ for $\mathbf{s}'<\mathbf{s}$ imply for all $\phi \in \mathcal {H}^{-\mathbf{s}'}$ that

$$\begin{aligned} \lim _{k\rightarrow \infty }\big \langle \psi _k(\cdot ,t),\phi \big \rangle _{\mathcal {L}^2}=\big \langle \psi (\cdot ,t),\phi \big \rangle _{\mathcal {L}^2}. \end{aligned}$$

(51)

This proves continuity in the weak sense, i.e. $\psi \in C_w(T;\mathcal {H}^\mathbf{s})$. The weak continuity implies for $\tau \in [t_0,t_1]$

$$\begin{aligned} \lim _{\tau \rightarrow t_0+}\inf ||\psi (\cdot ,\tau )||_{\mathcal {H}^{\mathbf{s}}}\ge ||\psi _0||_{\mathcal {H}^{\mathbf{s}}}. \end{aligned}$$

(52)

From (49) it follows that

$$\begin{aligned} \lim _{\tau \rightarrow t_0+}\sup ||\psi (\cdot ,\tau )||_{\mathcal {H}^{\mathbf{s}}}\le ||\psi _0||_{\mathcal {H}^{\mathbf{s}}}. \end{aligned}$$

(53)

From (52) and (53) we obtain continuity of the $\mathcal {H}^{\mathbf{s}}$-norm of the solution at initial time

$$\begin{aligned} \lim _{\tau \rightarrow t_0+}||\psi (\cdot ,\tau )||_{\mathcal {H}^{\mathbf{s}}}= ||\psi _0||_{\mathcal {H}^{\mathbf{s}}}. \end{aligned}$$

(54)

From (42) we get after integration over $T=[t_0,t_1]$

$$\begin{aligned} \begin{aligned}&\int _{t_0}^{t_1}\frac{1}{R}(||\nabla \mathbf{u}_m^a(s)||_{\mathbf{H}^s}^2+||\nabla \mathbf{u}_m^o(s)||_{\mathbf{H}^s}^2) +\frac{1}{P}(||\nabla \theta _m^a(s)||_{H^s}^2+||\nabla \theta _m^o(s)||_{H^s}^2)\, ds\\&\quad \le ||\psi _m(t_0)||_{\mathcal {H}^{\mathbf{s}}}^2 + C_s\int _{t_0}^{t_1}||\psi _m(s)||_{\mathbf{H}^s}^2\, ds +K_s^2 (||\gamma ||_{H^{s_\theta ^a}}^2Pe^a+||\sigma ||_{H^{s_u^o}}^2Re^o)^2(t_1-t_0). \end{aligned} \end{aligned}$$

(55)

From (49) it follows that the right-hand side of (55) is bounded independently of m. This implies that $\psi \in L^2(T,\mathcal {H}^{s+1})$. Consequently there exists a set $E\subseteq T$ of Lebesgue measure zero such that for all $\tau \in T{\setminus } E$ it holds that $\psi (\cdot ,\tau )\in \mathcal {H}^{s+1}$. This implies that for all $\delta >0$ there exists a $t_0^*\in (t_0,t_0+\delta )$ such that $\psi (\cdot ,t_0^*)\in \mathcal {H}^{s+1}$. If we use $\psi _{t_0^*}:=\psi (\cdot ,t_0^*)$ as initial condition we can repeat all the arguments of our proof to establish the existence of a solution $\tilde{\psi }\in C([t_0^*,t_1^*],\mathcal {H}^{\mathbf{s}^*})$, with $s^*<s+1$. The two solutions $\psi ,\tilde{\psi }$ coincide on their joint interval of existence $[t_0,t_1]\cap [t_0^*,t_1^*]$. For the two endpoints we have $t_1\le t_1^*$, and hence $\psi ,\tilde{\psi }$ coincide on $[t_0^*,t_1]$. Since $\delta >0$ was arbitrary we have $\psi \in C((t_0,t_1], \mathcal {H}^{\mathbf{s}})$, and combined with the continuity at $t_0$ (see (54)) it follows that $\psi \in C([t_0,t_1], \mathcal {H}^{\mathbf{s}})$.

Step 5: Existence Global in Time

Let $[t_0,t_1]$ be the maximal interval of existence of the solution $\psi $. If we assume that $t_1<\infty $ then this implies that

$$\begin{aligned} \limsup _{t\rightarrow t_1-}||\psi (t)||_{\mathcal {H}^{\mathbf{s}}}=\infty . \end{aligned}$$

This contradicts $\psi \in C(T,\mathcal {H}^{\mathbf{s}})$. Therefore $t_1=\infty $ and the solution exists globally in time.

$\square $

Remark 1

(Coupling Variants) Some variants of coupling our atmosphere and ocean equations can be included in our analysis. The atmospheric wind forcing term $\sigma \mathbf{u}^a$ in the oceanic velocity equation in (1)–(5) can be modified into $\sigma (\mathbf{u}^a-\mathbf{u}^o)$ without significantly changing the proof of Theorem 1. This can immediately be seen from (40).

A second variant that can easily be incorporated is to include heat transfer from atmosphere to ocean and momentum transfer from ocean to atmosphere. The additional term $\kappa \theta ^a$ for heat transfer can be added to the right-hand side of the ocean heat equation in (1)–(5) and the term $\lambda \mathbf{u}^o$ for the momentum transfer to the right-hand side of the atmospheric velocity equation, where $\kappa , \lambda $ are sufficiently smooth coupling functions. These changes can be included without altering essentially the proof of Theorem 1.

4.2 Linearized, Adjoint Coupled Equations and Differentiability

We linearize the coupled model equations around a solution $\psi =(\psi ^a,\psi ^o)$ of (1)–(5). The resulting model equations (also referred to as “tangent linear model”) are given by

$$\begin{aligned} \begin{aligned} \text {Atmosphere: }&\frac{\partial {U}^a}{\partial t} +(\mathbf{u}^a\cdot \nabla ){U}^a +({U}^a\cdot \nabla )\mathbf{u}^a +\frac{1}{Ro^a}{U}^{a\bot } +\frac{1}{Ro^a}\nabla \varTheta ^a\\&= \frac{1}{Re^a}\triangle {U}^a +{F}_{{U}}^a,\\&\frac{\partial \tilde{Fr}^a{\varTheta }^a}{\partial t} +\frac{1}{Ro^a}{div}({U}^a) = - \gamma {\varTheta }^o+\frac{1}{Pe^a}\triangle {\varTheta }^a +{F}_{{\varTheta }}^a.\\ \text {Ocean: }&\frac{\partial {U}^o}{\partial t} +(\mathbf{u}^o\cdot \nabla ){U}^o +({U}^o\cdot \nabla )\mathbf{u}^o +\frac{1}{Ro^o}{U}^{o\bot } +\frac{1}{Ro^o}\nabla P^o\\&=\sigma {U}^a +\frac{1}{Re^o}\triangle {U}^o +{F}_{{U}}^o,\\&\frac{\partial {\varTheta }^o}{\partial t} +(\mathbf{u}^o\cdot \nabla ){\varTheta }^o +({U}^o\cdot \nabla )\theta ^o = \frac{1}{Pe^o}\triangle {\varTheta }^o +{F}_{{\varTheta }}^o,\\&{div}({U}^o)=0,\\ \text {with initial conditions }&{U}^a(t_0)={U}^a_0, {\varTheta }^a(t_0)={\varTheta }^a_0,{U}^o(t_0)={U}^o_0, {\varTheta }^o(t_0)={\varTheta }_0^o, \end{aligned} \end{aligned}$$

(56)

where ${F}:=({F}_{{U}}^a,{F}_{{\varTheta }}^a,{F}_{{U}}^o,{F}_{{\varTheta }}^o)$ denotes the forcing of the linearized equations. In analogy to (6) we write the linearized equations of the (linear) state vector $\varPsi :=({U}^a,{\varTheta }^a,{U}^o,{\varTheta }^o)$ in the following form

$$\begin{aligned} \begin{aligned}&\frac{\partial \varPsi }{\partial t} +\mathcal {N}'[\psi ](\varPsi ) +L\varPsi +D\varPsi =\bar{\mathcal {C}}(\varPsi ^a,\varPsi ^o)+ F,\\ \text {with initial conditions }&\varPsi (t_0)=({U}^a(t_0), {\varTheta }^a(t_0),{U}^o(t_0), {\varTheta }^o(t_0)), \end{aligned} \end{aligned}$$

(57)

where the linear operator L includes the pressure gradient and the Coriolis force, D the dissipation, and $\mathcal {N}'$ the linearization of the advective operator.
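For a quadratic advective nonlinearity the linearization $\mathcal {N}'[\psi ](\varPsi )$ consists of the two cross terms, and the Taylor remainder is exactly quadratic in the perturbation. The sketch below illustrates this with a one-dimensional periodic finite-difference stand-in for $(\mathbf{u}\cdot \nabla )\mathbf{v}$. The discretization, the function names and the sample vectors are hypothetical and are not the discretization of the coupled model.

```python
def advect(u, v):
    """Discrete periodic stand-in for (u . grad) v: centred differences, unit spacing."""
    n = len(u)
    return [u[i] * (v[(i + 1) % n] - v[i - 1]) / 2.0 for i in range(n)]

def nonlinear(u):
    """Quadratic advective nonlinearity N(u) = (u . grad) u."""
    return advect(u, u)

def tangent_linear(u, U):
    """N'[u](U) = (u . grad) U + (U . grad) u."""
    return [x + y for x, y in zip(advect(u, U), advect(U, u))]

u = [0.3, -1.2, 0.7, 2.0, -0.5]   # hypothetical base state
U = [1.0, 0.4, -0.8, 0.1, 0.6]    # hypothetical perturbation
eps = 1e-3

perturbed = nonlinear([a + eps * b for a, b in zip(u, U)])
base = nonlinear(u)
finite_diff = [(p - q) / eps for p, q in zip(perturbed, base)]
remainder = [f - t for f, t in zip(finite_diff, tangent_linear(u, U))]
second_order = advect(U, U)       # remainder should equal eps * (U . grad) U
```

A check of this kind, comparing a finite-difference quotient with the tangent linear operator, is a common way to test an implementation of a linearized model before its adjoint is derived.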

The next theorem establishes a regularity result about the linearized equations. The proof is based on the energy method and classical inequalities.

Theorem 2

(Regularity of Linearized Equations) Let the time interval $T:=(t_0,t_1]$ be given and let $\mathbf{s}=(s_{u^a},s_{\theta ^a},s_{u^o},s_{\theta ^o})\in \mathbb {Z}_+^4$ such that all components of $\mathbf{s}$ are greater than or equal to 3. Let the initial condition of the coupled Eqs. (1)–(5) satisfy $\psi _0\in \mathcal {H}^{\mathbf{s}}(\varOmega )$. Assume furthermore that the coupling functions satisfy $\sigma \in C(T,\mathcal {H}^{s_u^o}(\varOmega )),\, \gamma \in C(T,\mathcal {H}^{s_\theta ^a}(\varOmega ))$. Suppose the initial condition of the linearized coupled equations satisfies $\varPsi _0\in \mathcal {H}^{\mathbf{s}}(\varOmega )$ and the forcing ${F}:=({F}_{{U}}^a,{F}_{{\varTheta }}^a,{F}_{{U}}^o,{F}_{{\varTheta }}^o) \in L^2(T,\mathbf{H}^{s-1})\times L^2(T,{ H}^{s-1})\times L^2(T,\mathbf{H}^{s-1})\times L^2(T,{ H}^{s-1})$. Then (56) has a unique solution on the time interval T with the properties

$$\begin{aligned} \varPsi \in C(T,\mathcal {H}^{\mathbf{s}})\cap L^2(T, \mathcal { H}^{s+1}). \end{aligned}$$

The state vector $\varPsi $ of (56) satisfies

$$\begin{aligned} \begin{aligned}&||\varPsi (t)||_{\mathcal {H}^{\mathbf{s}}}^2 \le ||\varPsi _0||_{\mathcal {H}^{\mathbf{s}}}^2e^{\int _{t_0}^t \big (M_s(y)+CK_s^2C_P^2(Pe^a||\gamma (y)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (y)||_{H^{s_u^o}}^2)-C_P^{-1}\nu _*\big )\,dy}\\&\quad +C\int _{t_0}^t \big [ (Pe^a+Pe^o)||F_{{\varTheta }}||_{H^{s-1}}^2+(Re^a+Re^o)||F_{{U}}||_{\mathbf{H}^{s-1}}^2\big ]\\&\qquad \times e^{\int _y^t \big (M_s(z)+CK_s^2C_P^2(Pe^a||\gamma (z)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (z)||_{H^{s_u^o}}^2)-C_P^{-1}\nu _*\big )\, dz}dy \end{aligned} \end{aligned}$$

(58)

where $M_s(t):=M(||\psi (t)||_{\mathcal {H}^{\mathbf{s}}},Re^a,Re^o,Pe^a,Pe^o)$ is a bounded function on T, $\nu _*:=\min \{\frac{1}{Re^a},\frac{1}{Re^o}, \frac{1}{Pe^a},\frac{1}{Pe^o}\}$ and where $||F_{U}||_{\mathbf{H}^{s-1}}^2:=||F_{{U}^a}||_{\mathbf{H}^{s_u^a-1}}^2+||F_{{U}^o}||_{\mathbf{H}^{s_u^o-1}}^2$ and $||F_{\varTheta }||_{{H}^{s-1}}^2:=||F_{{\varTheta }^a}||_{{H}^{s_\theta ^a-1}}^2+||F_{{\varTheta }^o}||_{{H}^{s_\theta ^o-1}}^2$.

Proof

The Galerkin approximation to the atmospheric component in (56) reads as follows

$$\begin{aligned} \begin{aligned}&\frac{\partial {U}_m^a}{\partial t} +(\mathbf{u}^a\cdot \nabla ){U}^a_m +({U}^a_m\cdot \nabla )\mathbf{u}^a +\frac{1}{Ro^a}{U}^{a\bot }_m +\frac{1}{Ro^a}\nabla \varTheta ^a_m = \frac{1}{Re^a}\triangle {U}^a_m +{F}_{{U}}^a,\\&\frac{\partial \tilde{Fr}^a{\varTheta }^a_m}{\partial t} +\frac{1}{Ro^a}{div}({U}^a_m) = - \gamma {\varTheta }^o_m +\frac{1}{Pe^a}\triangle {\varTheta }^a_m +F_{{\varTheta }}^a. \end{aligned} \end{aligned}$$

(59)

For the $\mathcal {H}^{\mathbf{s}}$-estimate we apply $\mathcal {D}^\alpha $ to (59), multiply by $\mathcal {D}^\alpha \varPsi _m$ and integrate over the spatial domain. If we now add velocity and scalar equations and integrate by parts, the gradient term of the velocity equation cancels with the divergence term in the $\theta $-equation

$$\begin{aligned} \frac{1}{Ro^a}\int _\varOmega \mathcal {D}^\alpha (\nabla \varTheta ^a_m)\cdot \mathcal {D}^\alpha {U}^a_m\, dx = -\frac{1}{Ro^a}\int _\varOmega \mathcal {D}^\alpha {div}({U}^a_m)\cdot \mathcal {D}^\alpha \varTheta ^a_m\,dx. \end{aligned}$$

This implies that it is sufficient to consider the following inequality

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\frac{d}{d t}(||\mathcal {D}^\alpha {U}^a_m||_{\mathbf{L}^2}^2+\tilde{Fr}^a||\mathcal {D}^\alpha {\varTheta }^a_m||_{L^2}^2) +\frac{1}{Re^a}||\nabla \mathcal {D}^\alpha {U}^a_m||_{\mathbf{L}^2}^2+\frac{1}{Pe^a}||\nabla \mathcal {D}^\alpha \varTheta ^a_m||_{L^2}^2\\&\quad \le \big \langle \mathcal {D}^\alpha [(\mathbf{u}^a\cdot \nabla ){U}^a_m],\mathcal {D}^\alpha {U}^a_m\big \rangle _{\mathbf{L}^2} +\big \langle \mathcal {D}^\alpha [({U}^a_m\cdot \nabla )\mathbf{u}^a],\mathcal {D}^\alpha {U}^a_m\big \rangle _{\mathbf{L}^2}\\&\qquad - \big \langle \mathcal {D}^\alpha (\gamma {\varTheta }^o_m),\mathcal {D}^\alpha {\varTheta }^a_m\big \rangle _{L^2} +\big \langle \mathcal {D}^\alpha F_{{\varTheta }}^a,\mathcal {D}^\alpha {\varTheta }^a_m\big \rangle _{L^2} +\big \langle \mathcal {D}^\alpha F_{{U}}^a,\mathcal {D}^\alpha {U}^a_m\big \rangle _{\mathbf{L}^2}. \end{aligned} \end{aligned}$$

(60)

We proceed by estimating all terms on the right-hand side of (60) in terms of the corresponding Sobolev norms. The first term on the right-hand side is estimated with the Cauchy–Schwarz inequality, Lemmas 1 and 2, and Theorem 1:

$$\begin{aligned} \begin{aligned}&\int _\varOmega \mathcal {D}^\alpha [(\mathbf{u}^a\cdot \nabla ){U}^a_m]\cdot \mathcal {D}^\alpha {U}^a_m\, dx \le ||\mathcal {D}^\alpha [(\mathbf{u}^a\cdot \nabla ){U}^a_m]||_{\mathbf{L}^2}||\mathcal {D}^\alpha {U}^a_m||_{\mathbf{L}^2}\\&\quad \le C_s\{ ||\mathbf{u}^a||_{\mathbf{L}^\infty }||\mathcal {D}^s\nabla {U}^a_m||_{\mathbf{L}^2} + ||\mathcal {D}^s\mathbf{u}^a||_{\mathbf{L}^2}||\nabla {U}^a_m||_{\mathbf{L}^\infty }\}||\mathcal {D}^\alpha {U}^a_m||_{\mathbf{L}^2}\\&\quad \le C_s K_s||\mathbf{u}^a||_{\mathbf{H}^{s_u^a}}||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}||\mathcal {D}^\alpha {U}^a_m||_{\mathbf{L}^2}\\&\quad \le N_s||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}||{U}^a_m||_{\mathbf{H}^{s_u^a}}, \end{aligned} \end{aligned}$$

(61)

where $N_s:=C_s K_s||\mathbf{u}^a||_{\mathbf{H}^{s_u^a}}$ and $\mathbf{u}^a\in L^\infty (T;\mathbf{H}^{s_u^a})$ according to Theorem 1. For the second term on the right-hand side of (60) we use integration by parts and the periodic boundary conditions, and we find with the Cauchy–Schwarz inequality and Lemmas 1 and 2

$$\begin{aligned} \begin{aligned}&\int _\varOmega \mathcal {D}^\alpha [({U}^a_m\cdot \nabla )\mathbf{u}^a]\cdot \mathcal {D}^\alpha {U}^a_m\, dx = -\int _\varOmega \mathcal {D}^{\alpha -1}[({U}^a_m\cdot \nabla )\mathbf{u}^a]\cdot \mathcal {D}^{\alpha +1}{U}^a_m\, dx\\&\quad \le ||\mathcal {D}^{\alpha -1}[({U}^a_m\cdot \nabla )\mathbf{u}^a]||_{\mathbf{L}^2}||\mathcal {D}^{\alpha +1}{U}^a_m||_{\mathbf{L}^2}\\&\quad \le K_s||\mathcal {D}^{\alpha -1}[({U}^a_m\cdot \nabla )\mathbf{u}^a]||_{\mathbf{L}^2} ||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}\\&\quad \le C_s\big (||{U}^a_m||_{L^\infty }||\mathcal {D}^{s-1}\nabla \mathbf{u}^a||_{\mathbf{L}^2} +||\nabla \mathbf{u}^a||_{\mathbf{L}^\infty }||\mathcal {D}^{s-1}{U}^a_m||_{\mathbf{L}^2}\big ) ||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}\\&\quad \le C_s K_s||\mathbf{u}^a||_{\mathbf{H}^{s_u^a}}||{U}^a_m||_{\mathbf{H}^{s_u^a}} ||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}\\&\quad \le N_s||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}||{U}^a_m||_{\mathbf{H}^{s_u^a}}. \end{aligned} \end{aligned}$$

(62)

For the forcing terms in (60) we integrate by parts^{Footnote 3} and use the inequality of Cauchy–Schwarz

$$\begin{aligned} \begin{aligned}&\big \langle \mathcal {D}^\alpha F_{{\varTheta }}^a,\mathcal {D}^\alpha {\varTheta }^a_m\big \rangle _{L^2} +\big \langle \mathcal {D}^\alpha F_{{U}}^a,\mathcal {D}^\alpha {U}^a_m\big \rangle _{\mathbf{L}^2}\\&\quad \le ||\mathcal {D}^{\alpha -1} F_{{\varTheta }}^a||_{L^2}||\nabla \mathcal {D}^\alpha {\varTheta }^a_m||_{L^2} +||\mathcal {D}^{\alpha -1} F_{{U}}^a||_{\mathbf{L}^2}||\nabla \mathcal {D}^\alpha {U}^a_m||_{\mathbf{L}^2}\\&\quad \le ||F_{{\varTheta }}^a||_{H^{{s}_\theta ^a-1}}||\nabla {\varTheta }^a_m||_{H^{s_\theta ^a}} +||F_{{U}}^a||_{\mathbf{H}^{{s_u^a}-1}}||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}, \end{aligned} \end{aligned}$$

(63)

where we have used that the boundary terms vanish due to the periodic boundary conditions. For the coupling term we obtain analogously to (33) with the inequalities of Cauchy–Schwarz, Poincaré and with Lemma 2

$$\begin{aligned} \begin{aligned} \big \langle \mathcal {D}^\alpha (\gamma {\varTheta }^o_m),\mathcal {D}^\alpha {\varTheta }^a_m\big \rangle _{L^2} \le K_sC_P||\gamma ||_{H^{s_\theta ^a}}||{\varTheta }^o_m||_{H^{s_\theta ^o}}||\nabla {\varTheta }^a_m||_{{ H}^{s_\theta ^a}}. \end{aligned} \end{aligned}$$

(64)

We collect (61)–(64) and sum over all derivatives $\mathcal {D}^\alpha $ whose degree is less than or equal to the corresponding component of $\mathbf{s}$. This yields for (60) the following estimate

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\frac{d}{d t}(||{U}^a_m||_{\mathbf{H}^{s_u^a}}^2+Fr^a||{\varTheta }^a_m||_{H^{s_\theta ^a}}^2) +\frac{1}{Re^a}||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}^2+\frac{1}{Pe^a}||\nabla \varTheta ^a_m||_{H^{s_\theta ^a}}^2\\&\quad \le N_s\big (||{U}^a_m||_{\mathbf{H}^{s_u^a}}+||F_{{U}}^a||_{\mathbf{H}^{{s_u^a}-1}}\big )||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}\\&\qquad + \big (K_sC_P||\gamma ||_{H^{s_\theta ^a}} ||{\varTheta }^o_m||_{{H}^{s_\theta ^o}}+||F_{{\varTheta }}^a||_{H^{{s_\theta ^a}-1}} \big )||\nabla {\varTheta }^a_m||_{{H}^{s_\theta ^a}} \end{aligned} \end{aligned}$$

(65)

After applying Young’s inequality (see Sect. 3) with $\epsilon =\frac{1}{Re^a}$ and $\epsilon =\frac{1}{Pe^a}$ to the two terms on the right-hand side, the resulting quadratic terms of the form $(a+b)^2$ are estimated by $2(a^2+b^2)$ and finally we obtain

$$\begin{aligned} \begin{aligned}&\frac{d}{d t}(||{U}^a_m||_{\mathbf{H}^{s_u^a}}^2+Fr^a||{\varTheta }^a_m||_{H^{s_\theta ^a}}^2) +\frac{1}{Re^a}||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}^2+\frac{1}{Pe^a}||\nabla \varTheta ^a_m||_{H^{s_\theta ^a}}^2\\&\quad \le CN_s^2Re^a ||{U}^a_m||_{\mathbf{H}^{s_u^a}}^2 +CN_s^2Re^a||F_{{U}}^a||_{\mathbf{H}^{{s_u^a}-1}}^2 +CC_P^2Pe^a||F_{{\varTheta }}^a||_{H^{{s_\theta ^a}-1}}^2\\&\quad +CK_s^2C_P^2Pe^a||\gamma ||_{H^{s_\theta ^a}}^2 ||{\varTheta }^o_m||_{\mathbf{H}^{s_\theta ^o}}^2. \end{aligned} \end{aligned}$$

(66)
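The passage from (65) to (66) can be made explicit for the velocity term. The following lines are a sketch that assumes Young's inequality in the form $ab\le \frac{\epsilon }{2}b^2+\frac{1}{2\epsilon }a^2$ with $\epsilon =\frac{1}{Re^a}$; the form and constants used in Sect. 3 may differ, which only affects the generic constant C.

$$\begin{aligned} \begin{aligned}&N_s\big (||{U}^a_m||_{\mathbf{H}^{s_u^a}}+||F_{{U}}^a||_{\mathbf{H}^{{s_u^a}-1}}\big )||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}\\&\quad \le \frac{1}{2Re^a}||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}^2 +\frac{Re^a}{2}N_s^2\big (||{U}^a_m||_{\mathbf{H}^{s_u^a}}+||F_{{U}}^a||_{\mathbf{H}^{{s_u^a}-1}}\big )^2\\&\quad \le \frac{1}{2Re^a}||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}^2 +Re^aN_s^2\big (||{U}^a_m||_{\mathbf{H}^{s_u^a}}^2+||F_{{U}}^a||_{\mathbf{H}^{{s_u^a}-1}}^2\big ). \end{aligned} \end{aligned}$$

The gradient term on the right is absorbed by the dissipation term $\frac{1}{Re^a}||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}^2$ on the left-hand side of (65). Treating the scalar term in the same way with $\epsilon =\frac{1}{Pe^a}$ and multiplying the result by two gives an inequality of the form (66), here with $C=2$ in the velocity terms.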

This can be written as an estimate for the atmospheric state

$$\begin{aligned} \begin{aligned}&\frac{d}{d t}||\varPsi ^a_m||_{\mathcal {H}^{\mathbf{s}}}^2 +\frac{1}{Re^a}||\nabla {U}^a_m||_{\mathbf{H}^{s_u^a}}^2 +\frac{1}{Pe^a}||\nabla \varTheta ^a_m||_{H^{s_\theta ^a}}^2\\&\quad \le M_s^a||\varPsi ^a_m||_{\mathcal {H}^{\mathbf{s}}}^2 +CPe^a||F_{{\varTheta }}^a||_{H^{{s_\theta ^a}-1}}^2 +CN_s^2Re^a||F_{{U}}^a||_{\mathbf{H}^{{s_u^a}-1}}^2\\&\qquad + CK_s^2C_P^2Pe^a||\gamma ||^2_{H^{s_\theta ^a}} ||{\varTheta }^o_m||_{H^{s_\theta ^o}}^2, \end{aligned} \end{aligned}$$

(67)

where $M_s^a:=M_s(C_s, K_s,Re^a,Pe^a,||\mathbf{u}^a(t)||_{\mathbf{H}^{s_u^a}})$ is bounded on T.
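The Sobolev norms in (65)–(67) collect the $L^2$-norms of all derivatives up to the order given by the corresponding component of $\mathbf{s}$. As a minimal illustration, the sketch below evaluates such a norm for a one-dimensional periodic grid function with the FFT. The function name `sobolev_norm_1d`, the 1D setting and the test signals are assumptions made for this example; they are not taken from the paper, whose model lives on a two-dimensional torus.

```python
import numpy as np

def sobolev_norm_1d(u, s, length=2.0 * np.pi):
    """H^s norm of a periodic grid function: sqrt(sum_{j<=s} ||d^j u/dx^j||_{L^2}^2)."""
    n = u.size
    # Angular wavenumbers of the discrete Fourier modes on [0, length).
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    u_hat = np.fft.fft(u)
    # Differentiating j times multiplies mode k by (ik)^j, so by Parseval the
    # squared L^2 norms of all derivatives up to order s add up to this weight.
    weight = sum(k ** (2 * j) for j in range(s + 1))
    norm_sq = (length / n**2) * np.sum(weight * np.abs(u_hat) ** 2)
    return np.sqrt(norm_sq)

x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
low = np.sin(x)          # large-scale signal (hypothetical test data)
high = np.sin(8.0 * x)   # small-scale signal with the same L^2 norm
for s in (0, 1, 3):
    print(s, sobolev_norm_1d(low, s), sobolev_norm_1d(high, s))
```

For $s=0$ both signals have the same norm, while for larger $s$ the small-scale signal is weighted more strongly. This is the scale-selective effect of the Sobolev index mentioned in the abstract.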

For the ocean component in (56) we proceed similarly and apply $\mathcal {D}^\alpha $ to the Galerkin system and take the $L^2$-inner product with $\mathcal {D}^\alpha {U}^o_m,\mathcal {D}^\alpha {\varTheta }^o_m$ and arrive at

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\frac{d}{d t}(||\mathcal {D}^\alpha {U}^o_m||_{\mathbf{L}^2}^2+||\mathcal {D}^\alpha {\varTheta }^o_m||_{L^2}^2) +\frac{1}{Re^o}||\nabla \mathcal {D}^\alpha {U}^o_m||_{\mathbf{L}^2}^2+\frac{1}{Pe^o}||\nabla \mathcal {D}^\alpha \varTheta ^o_m||_{L^2}^2\\&\quad \le \big \langle \mathcal {D}^\alpha [(\mathbf{u}^o\cdot \nabla ){U}^o_m],\mathcal {D}^\alpha {U}^o_m\big \rangle _{\mathbf{L}^2} +\big \langle \mathcal {D}^\alpha [({U}^o_m\cdot \nabla )\mathbf{u}^o],\mathcal {D}^\alpha {U}^o_m\big \rangle _{\mathbf{L}^2}\\&\qquad +\big \langle \mathcal {D}^\alpha ({U}^o_m\cdot \nabla \theta ^o),\mathcal {D}^\alpha {\varTheta }^o_m \big \rangle _{L^2} +\big \langle \mathcal {D}^\alpha ({\varTheta }^o_m\nabla u^o),\mathcal {D}^\alpha {\varTheta }^o_m \big \rangle _{L^2} \\&\qquad + \big \langle \mathcal {D}^\alpha (\sigma {U}^a_m),\mathcal {D}^\alpha {U}^o_m\big \rangle _{\mathbf{L}^2} +\big \langle \mathcal {D}^\alpha F_{{\varTheta }}^o,\mathcal {D}^\alpha {\varTheta }^o_m\big \rangle _{L^2} +\big \langle \mathcal {D}^\alpha F_{{U}}^o,\mathcal {D}^\alpha {U}^o_m\big \rangle _{\mathbf{L}^2}. \end{aligned} \end{aligned}$$

(68)

For the coupling term in the velocity equation we obtain analogously to (40) with the inequalities of Cauchy–Schwarz, Poincaré and with Lemma 2

$$\begin{aligned} \begin{aligned} \big \langle \mathcal {D}^\alpha (\sigma {U}^a_m),\mathcal {D}^\alpha {U}^o_m\big \rangle _{\mathbf{L}^2} \le K_sC_P||\sigma ||_{H^{s_u^o}}||\mathcal {D}^\alpha {U}^a_m||_{\mathbf{L}^2}||\nabla \mathcal {D}^\alpha {U}^o_m||_{\mathbf{L}^2}. \end{aligned} \end{aligned}$$

(69)

The other terms in the velocity equation can be estimated analogously to the atmospheric case (see (61), (62))

$$\begin{aligned} \begin{aligned}&\big \langle \mathcal {D}^\alpha [(\mathbf{u}^o\cdot \nabla ){U}^o_m],\mathcal {D}^\alpha {U}^o_m\big \rangle _{\mathbf{L}^2} +\big \langle \mathcal {D}^\alpha [({U}^o_m\cdot \nabla )\mathbf{u}^o],\mathcal {D}^\alpha {U}^o_m\big \rangle _{\mathbf{L}^2}\\&\quad \le N_s||\nabla {U}^o_m||_{\mathbf{H}^{s_u^o}}||{U}^o_m||_{\mathbf{H}^{s_u^o}}. \end{aligned} \end{aligned}$$

(70)

We have to estimate the additional terms $({U}^o_m\cdot \nabla )\theta ^o$ and ${\varTheta }^o_m\nabla u^o$ in the third line of (68). For the first term we find after integration by parts with the inequality of Cauchy–Schwarz

$$\begin{aligned} \begin{aligned}&\int _\varOmega \mathcal {D}^\alpha ({U}^o_m\cdot \nabla \theta ^o)\mathcal {D}^\alpha {\varTheta }^o_m\,dx\\&\quad = \int _\varOmega [(\mathcal {D}^\alpha {U}^o_m)\cdot \nabla \theta ^o]\mathcal {D}^\alpha {\varTheta }^o_m\,dx + \int _\varOmega [({U}^o_m\cdot \nabla )\mathcal {D}^\alpha \theta ^o]\mathcal {D}^\alpha {\varTheta }^o_m\,dx\\&\quad = \int _\varOmega [(\mathcal {D}^\alpha {U}^o_m)\cdot \nabla \theta ^o]\mathcal {D}^\alpha {\varTheta }^o_m\,dx - \int _\varOmega \nabla {U}^o_m \mathcal {D}^\alpha \theta ^o\mathcal {D}^\alpha {\varTheta }^o_m\,dx - \int _\varOmega {U}^o_m\mathcal {D}^\alpha \theta ^o\mathcal {D}^\alpha \nabla {\varTheta }^o_m\,dx\\&\quad \le ||\nabla \theta ^o||_{L^\infty }||\mathcal {D}^\alpha {U}^o_m||_{\mathbf{L}^2}||\mathcal {D}^\alpha {\varTheta }^o_m||_{L^2} + || \nabla {U}^o_m||_{\mathbf{L}^\infty } ||\mathcal {D}^\alpha \theta ^o||_{L^2}||\mathcal {D}^\alpha {\varTheta }^o_m ||_{L^2} \\&\qquad + || {U}^o_m||_{\mathbf{L}^\infty } ||\mathcal {D}^\alpha \theta ^o||_{L^2}||\mathcal {D}^\alpha \nabla {\varTheta }^o_m ||_{L^2}. \end{aligned} \end{aligned}$$

(71)

For the second term in the third line of (68) we have similarly with integration by parts, and the inequality of Cauchy–Schwarz

$$\begin{aligned} \begin{aligned}&\int _\varOmega \mathcal {D}^\alpha ({\varTheta }^o_m\nabla u^o)\mathcal {D}^\alpha {\varTheta }^o_m\,dx\\&\quad =\int _\varOmega [(\mathcal {D}^\alpha {\varTheta }^o_m)\nabla u^o]\mathcal {D}^\alpha {\varTheta }^o_m\,dx -\int _\varOmega \nabla {\varTheta }^o_m\ \mathcal {D}^\alpha u^o\mathcal {D}^\alpha {\varTheta }^o_m\,dx\\&\qquad -\int _\varOmega {\varTheta }^o_m \mathcal {D}^\alpha u^o\mathcal {D}^\alpha \nabla {\varTheta }^o_m\,dx\\&\quad \le ||\nabla u^o||_{\mathbf{L}^\infty }||\mathcal {D}^\alpha {\varTheta }^o_m||_{L^2}^2 +||\nabla {\varTheta }^o_m||_{L^\infty }||\mathcal {D}^\alpha u^o||_{\mathbf{L}^2}||\mathcal {D}^\alpha {\varTheta }^o_m||_{L^2}\\&\qquad +||{\varTheta }^o_m||_{L^\infty }||\mathcal {D}^\alpha u^o||_{\mathbf{L}^2}||\mathcal {D}^\alpha \nabla {\varTheta }^o_m||_{L^2}. \end{aligned} \end{aligned}$$

(72)

The estimates (71), (72) imply with Lemma 15

$$\begin{aligned} \begin{aligned}&\int _\varOmega \mathcal {D}^\alpha ({U}^o_m\cdot \nabla \theta ^o)\mathcal {D}^\alpha {\varTheta }^o_m\,dx +\int _\varOmega \mathcal {D}^\alpha ({\varTheta }^o_m\nabla u^o)\mathcal {D}^\alpha {\varTheta }^o_m\,dx\\&\quad \le ||\theta ^o||_{H^{s_\theta ^o}}||{U}^o_m||_{\mathbf{H}^{s_u^o}}||{\varTheta }^o_m||_{H^{s_\theta ^o}} +||{U}^o_m||_{\mathbf{H}^{s_u^o}}||{\varTheta }^o_m||_{H^{s_\theta ^o}}||\theta ^o||_{H^{s_\theta ^o}}\\&\qquad +||{U}^o_m||_{\mathbf{H}^{s_u^o}}||\theta ^o||_{H^{s_\theta ^o}}||\nabla {\varTheta }^o_m||_{H^{s_\theta ^o}}\\&\qquad +|| u^o||_{\mathbf{H}^{s_u^o}}||{\varTheta }^o_m||_{H^{s_\theta ^o}}^2 +||u^o||_{\mathbf{H}^{s_u^o}}||{\varTheta }^o_m||_{H^{s_\theta ^o}}^2+||u^o||_{\mathbf{H}^{s_u^o}}||{\varTheta }^o_m||_{H^{s_\theta ^o}}||\nabla {\varTheta }^o_m||_{H^{s_\theta ^o}}\\&\quad \le K_s\big (||{U}^o_m||_{\mathbf{H}^{s_u^o}}^2+||{\varTheta }^o_m||_{H^{s_\theta ^o}}^2+(||{U}^o_m||_{\mathbf{H}^{s_u^o}}+||{\varTheta }^o_m||_{H^{s_\theta ^o}})||\nabla {\varTheta }^o_m||_{H^{s_\theta ^o}}\big ), \end{aligned} \end{aligned}$$

(73)

where $K_s:=K_s(||\theta ^o||_{H^{s_\theta ^o}},|| u^o||_{\mathbf{H}^{s_u^o}},Re^o, Pe^o)$ is bounded on $T=[t_0,t_1]$ according to Theorem 1. After an application of Young’s inequality to the last term in (73) and the usual compensation with the term $\frac{1}{Pe^o}||\nabla {\varTheta }^o_m||_{H^{s^o_\theta }}^2$ on the left-hand side of (68), it follows with (70) for the ocean equations

$$\begin{aligned} \begin{aligned}&\frac{d}{d t}(||{U}^o_m||_{\mathbf{H}^{s_u^o}}^2+||{\varTheta }^o_m||_{H^{s_\theta ^o}}^2) +\frac{1}{Re^o}||\nabla {U}^o_m||_{\mathbf{H}^{s_u^o}}^2+\frac{1}{Pe^o}||\nabla \varTheta ^o_m||_{H^{s_\theta ^o}}^2\\&\quad \le K_s( ||{U}^o_m||_{H^{s_u^o}}^2+||{\varTheta }^o_m||_{H^{s_\theta ^o}}^2) +CPe^o||F_{{\varTheta }}^o||_{H^{{s_\theta ^o}-1}}^2 +CRe^o||F_{{U}}^o||_{\mathbf{H}^{{s_u^o}-1}}^2\\&\quad \quad + CK_s^2C_P^2Re^o ||\sigma ||_{H^{s_u^o}}^2 ||{U}^a_m||_{\mathbf{H}^{s_u^o}}^2, \end{aligned} \end{aligned}$$

(74)

where we have applied the Young inequality to the forcing term as in the atmospheric case (cf. (65), (66)). This implies for the ocean state

$$\begin{aligned} \begin{aligned}&\frac{d}{d t}||\varPsi ^o_m||_{\mathcal {H}^{\mathbf{s}}}^2 +\frac{1}{Re^o}||\nabla {U}^o_m||_{\mathbf{H}^{s_u^o}}^2 +\frac{1}{Pe^o}||\nabla \varTheta ^o_m||_{H^{s_\theta ^o}}^2\\&\quad \le M_s^o||\varPsi ^o_m||_{\mathcal {H}^{\mathbf{s}}}^2 +CPe^o||F_{{\varTheta }}^o||_{H^{{s_\theta ^o}-1}}^2 +CRe^o||F_{{U}}^o||_{\mathbf{H}^{{s_u^o}-1}}^2\\&\qquad + CK_s^2C_P^2Re^o ||\sigma ||_{H^{s_u^o}}^2||{U}^a_m||_{\mathbf{H}^{s_u^o}}^2, \end{aligned} \end{aligned}$$

(75)

where $M_s^o:=M_s(C_s, ||\theta ^o||_{H^{s_\theta ^o}},||\mathbf{u}^o||_{\mathbf{H}^{s_u^o}},Re^o,Pe^o)$ is bounded on T. Adding (67) and (75) we infer the following inequality for the coupled system

$$\begin{aligned} \begin{aligned}&\frac{d}{d t}||\varPsi _m||_{\mathcal {H}^{\mathbf{s}}}^2 +\frac{1}{R}(||\nabla {U}_m^a||_{\mathbf{H}^{s_u^a}}^2+||\nabla {U}_m^o||_{\mathbf{H}^{s_u^o}}^2) +\frac{1}{P}(||\nabla \varTheta _m^a||_{H^{s_\theta ^a}}^2+||\nabla \varTheta _m^o||_{H^{s_\theta ^o}}^2)\\&\quad \le \big (M_s+CK_s^2C_P^2(Pe^a||\gamma ||_{H^{s_\theta ^a}}^2+Re^o||\sigma ||_{H^{s_u^o}}^2)\big )||\varPsi _m||_{\mathcal {H}^{\mathbf{s}}}^2\\&\quad \quad +C(Pe^a+Pe^o)(||F_{{\varTheta }^a}||_{H^{s_\theta ^a-1}}^2+||F_{{\varTheta }^o}||_{H^{s_\theta ^o-1}}^2)\\&\qquad +C(Re^a+Re^o)(||F_{{U}^a}||_{\mathbf{H}^{s_u^a-1}}^2+||F_{{U}^o}||_{\mathbf{H}^{s_u^o-1}}^2), \end{aligned} \end{aligned}$$

(76)

with $M_s:=M_s^a+M_s^o$, $\frac{1}{R}:=\min \{\frac{1}{Re^a},\frac{1}{Re^o}\}$, $\frac{1}{P}:=\min \{\frac{1}{Pe^a},\frac{1}{Pe^o}\}$. From the Poincaré inequality we have

$$\begin{aligned} \begin{aligned} \frac{d}{dt}||\varPsi _m||_{\mathcal {H}^{\mathbf{s}}}^2 +C_P^{-1}\nu _*||\varPsi _m||_{\mathcal {H}^{\mathbf{s}}}^2 \le&\big (M_s+CK_s^2C_P^2(Pe^a||\gamma ||_{H^{s_\theta ^a}}^2+Re^o||\sigma ||_{H^{s_u^o}}^2)\big ) ||\varPsi _m||_{\mathcal {H}^{\mathbf{s}}}^2\\&+C(Pe^a+Pe^o)(||F_{{\varTheta }^a}||_{H^{s_\theta ^a-1}}^2+||F_{{\varTheta }^o}||_{H^{s_\theta ^o-1}}^2)\\&+C(Re^a+Re^o)(||F_{{U}^a}||_{\mathbf{H}^{s_u^a-1}}^2+||F_{{U}^o}||_{\mathbf{H}^{s_u^o-1}}^2), \end{aligned} \end{aligned}$$

(77)

where $\nu _*:=\min \{\frac{1}{R},\frac{1}{P}\}$. Using Gronwall’s inequality it follows

$$\begin{aligned} \begin{aligned}&||\varPsi _m(t)||_{\mathcal {H}^{\mathbf{s}}}^2 \le ||\varPsi _0||_{\mathcal {H}^{\mathbf{s}}}^2e^{\int _{t_0}^t \big (M_s(y)+CK_s^2C_P^2(Pe^a||\gamma (y)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (y)||_{H^{s_u^o}}^2)-C_P^{-1}\nu _*\big )\,dy}\\&\quad +C\int _{t_0}^t \big [ (Pe^a+Pe^o)(||F_{{\varTheta }^a}||_{H^{s_\theta ^a-1}}^2+||F_{{\varTheta }^o}||_{H^{s_\theta ^o-1}}^2)\\&\quad +(Re^a+Re^o)(||F_{{U}^a}||_{\mathbf{H}^{s_u^a-1}}^2+||F_{{U}^o}||_{\mathbf{H}^{s_u^o-1}}^2)\big ]\\&\qquad \times e^{\int _y^t \big (M_s(z)+CK_s^2C_P^2(Pe^a||\gamma (z)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (z)||_{H^{s_u^o}}^2)-C_P^{-1}\nu _*\big )\, dz}dy. \end{aligned} \end{aligned}$$

(78)

Since $M_s$ is bounded on T this implies that $\varPsi \in L^\infty (T,{\mathcal {H}^{\mathbf{s}}})$. Integrating (76) over $T=[t_0,t_1]$ and using this bound yields that $\varPsi \in L^2(T,{\mathcal {H}^{\mathbf{s}+1}})$. $\square $
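The Gronwall step leading from (77) to (78) can be illustrated on a scalar model problem. The sketch below is not part of the proof: it assumes a hypothetical differential inequality $y'\le a(t)y+b(t)$ with made-up coefficients `a` and `b`, and compares a numerical solution with the bound $y(t_0)e^{\int _{t_0}^t a}+\int _{t_0}^t b(s)e^{\int _s^t a}\,ds$, which has the same structure as the right-hand side of (78). The helper names are chosen for this example only.

```python
import numpy as np

def cumulative_trapezoid(f, t):
    """Running trapezoidal integral of the samples f over the grid t, starting at zero."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(t))))

def gronwall_bound(a, b, y0, t):
    """y0*exp(A(t)) + int_{t0}^t b(s)*exp(A(t)-A(s)) ds, where A(t) = int_{t0}^t a."""
    A = cumulative_trapezoid(a(t), t)
    return np.exp(A) * (y0 + cumulative_trapezoid(b(t) * np.exp(-A), t))

def rk4(rhs, y0, t):
    """Classical fourth-order Runge-Kutta integration of y' = rhs(t, y)."""
    y = np.empty_like(t)
    y[0] = y0
    for n in range(t.size - 1):
        h = t[n + 1] - t[n]
        k1 = rhs(t[n], y[n])
        k2 = rhs(t[n] + 0.5 * h, y[n] + 0.5 * h * k1)
        k3 = rhs(t[n] + 0.5 * h, y[n] + 0.5 * h * k2)
        k4 = rhs(t[n] + h, y[n] + h * k3)
        y[n + 1] = y[n] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y

# Hypothetical coefficients; the damping term -0.3*y**2 makes y' <= a*y + b.
a = lambda t: 1.0 + 0.5 * np.sin(t)
b = lambda t: 0.2 + 0.0 * t
t = np.linspace(0.0, 2.0, 2001)
y = rk4(lambda s, v: a(s) * v + b(s) - 0.3 * v**2, 1.0, t)
bound = gronwall_bound(a, b, 1.0, t)
print(np.all(y <= bound + 1e-9))
```

If the damping term is dropped, the inequality becomes an equality and the numerical solution should coincide with the bound up to discretization error, which shows that the estimate cannot be improved without further structure.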

Proof of continuous dependency on the initial conditions and uniqueness of the coupled nonlinear Eqs. (1)–(5)

Let $\psi _1=(\mathbf{u}^a_1,\theta ^a_1,\mathbf{u}^o_1,\theta ^o_1)$ and $\psi _2=(\mathbf{u}^a_2,\theta ^a_2,\mathbf{u}^o_2,\theta ^o_2)$ be two solutions of the coupled nonlinear Eqs. (1)–(5). Define the difference $\delta \psi :=(\delta \mathbf{u}^a,\delta \theta ^a,\delta \mathbf{u}^o,\delta \theta ^o)$

$$\begin{aligned} \begin{aligned}&\delta \mathbf{u}^a:=\mathbf{u}^a_1-\mathbf{u}^a_2\quad \text {and}\quad \delta \theta ^a:=\theta ^a_1-\theta ^a_2,\\&\delta \mathbf{u}^o:=\mathbf{u}^o_1-\mathbf{u}^o_2\quad \text {and}\quad \delta \theta ^o:=\theta ^o_1-\theta ^o_2. \end{aligned} \end{aligned}$$

Then the difference for the atmospheric component satisfies the equations

$$\begin{aligned} \begin{aligned}&\frac{\partial \delta \mathbf{u}^a}{\partial t} +\mathbf{u}^a_2\cdot \nabla \delta \mathbf{u}^a+\delta \mathbf{u}^a\cdot \nabla \mathbf{u}^a_1 +\delta \mathbf{u}^{a \bot } +\nabla (g\delta \theta ^a) =\frac{1}{Re^a}\triangle \delta \mathbf{u}^a,\\&\frac{\partial \delta \theta ^a}{\partial t} +{div}(\delta \mathbf{u}^a) =\frac{1}{Pe^a}\triangle \delta \theta ^a - \gamma \delta \theta ^o, \end{aligned} \end{aligned}$$

and analogously for the ocean. A comparison with the linearized equations (56) shows that the system above has an identical structure. Analogously to the linear equations (cf. (77) with vanishing forcing) one can derive the inequality

$$\begin{aligned} \begin{aligned} \frac{d}{dt}||\delta \psi ||_{\mathcal {L}^2}^2\le M||\delta \psi ||_{\mathcal {L}^2}^2. \end{aligned} \end{aligned}$$

The Gronwall lemma implies

$$\begin{aligned} \begin{aligned} ||\delta \psi (t)||_{\mathcal {L}^2}^2\le ||\delta \psi (t_0)||_{\mathcal {L}^2}^2e^{M(t-t_0)}. \end{aligned} \end{aligned}$$

The above inequality proves the continuous dependency on the initial conditions. In particular, if the two solutions $\psi _1$, $\psi _2$ have the same initial conditions, i.e. $\delta \psi (t_0)=0$, uniqueness follows. $\square $

The 4D-Var algorithm applies a gradient-based minimization such as steepest descent, which relies on the derivative of the model state with respect to the initial conditions. For this purpose we have to ensure that the mapping of the initial state to the model state at a given time instant is differentiable. This is the content of the following lemma.

Lemma 7

The mapping $\psi _0\mapsto \psi (t;\psi _0)$ from $\mathcal {H}^{\mathbf{s}}$ into $L^2(T;\mathcal {H}^{\mathbf{s}})$, with $T:=(t_0,t_1]$, that assigns the solution of the coupled model equations to an initial condition has a Gateaux derivative $\frac{D\psi }{D\psi _0}h$ in every direction $h\in \mathcal {H}^{\mathbf{s}}$. Furthermore, $\frac{D\psi }{D\psi _0}h$ solves the linearized coupled equations (56) with initial condition $\varPsi (t_0)=h$ and forcing $F=0$.

Proof

Let $h\in \mathcal {H}^{\mathbf{s}}$. Denote by $\psi _0, \psi _0+\tau h\in \mathcal {H}^{\mathbf{s}}$ two initial conditions and by $\psi $ and $\psi _{\tau h}$ the corresponding solutions of the coupled model equations

$$\begin{aligned} \begin{aligned}&\frac{\partial \psi }{\partial t} +\mathcal {N}(\psi ,\psi ) +L\psi +D\psi =\mathcal {C}(\psi ^a,\psi ^o), \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned}&\frac{\partial \psi _{\tau h}}{\partial t} +\mathcal {N}(\psi _{\tau h},\psi _{\tau h}) +L\psi _{\tau h} +D\psi _{\tau h} =\mathcal {C}(\psi ^a_{\tau h},\psi ^o_{\tau h}). \end{aligned} \end{aligned}$$

Let $\varPsi $ be the solution of the linearized equations, which is linearized around $\psi $. Then $\varPsi $ satisfies

$$\begin{aligned} \begin{aligned}&\frac{\partial \varPsi }{\partial t} +\mathcal {N}'[\psi ](\varPsi ) +L\varPsi +D\varPsi =\bar{\mathcal {C}}(\varPsi ^a,\varPsi ^o), \end{aligned} \end{aligned}$$

with zero forcing and initial condition $\varPsi (t_0)=h$. The assertion of the lemma is proven if we have shown that $y(\tau ):=\psi _{\tau h}-\psi -\tau \varPsi $ satisfies

$$\begin{aligned} \lim _{\tau \rightarrow 0}\frac{||y(\tau )||_{L^2(T;{\mathcal {H}^{\mathbf{s}}})}}{|\tau |} =0. \end{aligned}$$

(79)

The function $y=(y^a_u,y^a_\theta ,y^o_u,y^o_\theta )$ solves the equation

$$\begin{aligned} \frac{dy}{dt} +\mathcal {N}(\psi _{\tau h},\psi _{\tau h}) -\mathcal {N}(\psi ,\psi ) -\mathcal {N}'[\psi ](\tau \varPsi ) +L y +D y = \mathcal {C}(y^a,y^o), \end{aligned}$$

with initial condition $y_0=0$. If we introduce

$$\begin{aligned} k(x,t):=\mathcal {N}(\psi ,\psi )-\mathcal {N}(\psi _{\tau h},\psi _{\tau h}) + \mathcal {N}'[\psi ](\psi _{\tau h}-\psi ), \end{aligned}$$

the equation for y becomes

$$\begin{aligned} \frac{dy}{dt}+\mathcal {N}'[\psi ](y) +L y +D y = \mathcal {C}(y^a,y^o)+k. \end{aligned}$$

(80)

Note that (80) is a linearized coupled equation with initial condition $y_0=0$ and forcing given by $k=(k^a,k^o)$ and $k^a=(k^a_u,0)$, $k^o=(k^o_u,k^o_\theta )$. The forcing $k^a_u$ reads explicitly

$$\begin{aligned} \begin{aligned} k^a_u:=&(\mathbf{u}^a\cdot \nabla )\mathbf{u}^a-(\mathbf{u}^a_{\tau h}\cdot \nabla )\mathbf{u}^a_{\tau h} +(\mathbf{u}^a\cdot \nabla )(\mathbf{u}^a_{\tau h}-\mathbf{u}^a) +((\mathbf{u}^a_{\tau h}-\mathbf{u}^a)\cdot \nabla )\mathbf{u}^a\\ =&-((\mathbf{u}^a_{\tau h}-\mathbf{u}^a)\cdot \nabla )(\mathbf{u}^a_{\tau h}-\mathbf{u}^a). \end{aligned} \end{aligned}$$

The expression for $k^o_u$ is analogous to $k^a_u$ but with $\mathbf{u}^a$ replaced by $\mathbf{u}^o$. The forcing $k^o_\theta $ is given by

$$\begin{aligned} \begin{aligned} k^o_\theta :=&(\mathbf{u}^o_{\tau h}-\mathbf{u}^o)\cdot \nabla \theta _{\tau h}^o+(\theta _{\tau h}^o-\theta ^o)\nabla u^o. \end{aligned} \end{aligned}$$

We prove now the following two inequalities:

$$\begin{aligned} \begin{aligned} \mathrm{i)}&\ \exists \ K>0:\ \int _T||y(t)||^2_{\mathcal {H}^{\mathbf{s}}}dt\le K\int _T ||k||_{\mathcal {H}^{\mathbf{s}-1}}^2dt,\\ \mathrm{ii)}&\ \exists \ C>0:\ ||k||_{\mathcal { H}^{\mathbf{s}-1}}\le C(||\mathbf{u}^a-\mathbf{u}^a_{\tau h}||_{\mathbf{H}^{s_u^a}}^2+||\theta _{\tau h}^o-\theta ^o||_{H^{s_\theta ^o}}^2+||\mathbf{u}^o-\mathbf{u}^o_{\tau h}||_{\mathbf{H}^{s_u^o}}^2). \end{aligned} \end{aligned}$$

(81)

From (78) in the proof of Theorem 2 it follows for $t\in T:=(t_0,t_1]$ and with $F_{U}^a=k^a_u,F^a_\varTheta =0,F_{U}^o=k^o_u,F^o_\varTheta =k^o_\theta $ that

$$\begin{aligned} \begin{aligned} ||y(t)||_{\mathcal { H}^{\mathbf{s}}}^2 \le&CRe^ae^{\int _T\big (M_s(z)+CK_s^2C_P^2(Pe^a||\gamma ||_{H^{s_\theta ^a}}^2+Re^o||\sigma ||_{H^{s_u^o}}^2)-C_P^{-1}\nu _*\big )\, dz}\\&\times \int _T||k_u^a(\tau )||_{\mathbf{H}^{{s_u^a}-1}}^2+||k_u^o(\tau )||_{\mathbf{H}^{{s_u^o}-1}}^2+||k_\theta ^o(\tau )||_{{H}^{{s_\theta ^o}-1}}^2d\tau , \end{aligned} \end{aligned}$$

with $\nu _*=\min \{\frac{1}{Re^a},\frac{1}{Re^o},\frac{1}{Pe^a},\frac{1}{Pe^o}\}$. This implies that a $K>0$ exists that depends on the length of the time interval T such that

$$\begin{aligned} \begin{aligned} \int _T||y(z)||_{\mathcal {H}^{\mathbf{s}}}^2dz \le&K\int _T ||k_u^a(\tau )||_{\mathbf{H}^{{s_u^a}-1}}^2+||k_u^o(\tau )||_{\mathbf{H}^{{s_u^o}-1}}^2+||k_\theta ^o(\tau )||_{\mathbf{H}^{{s_\theta ^o}-1}}^2d\tau , \end{aligned} \end{aligned}$$

where K is an exponentially growing but bounded function on T. This proves assertion i). For assertion ii) it follows from Lemma 1 that

$$\begin{aligned} \begin{aligned} ||k^a_u||_{\mathbf{H}^{s_u^a-1}} \le C||\mathbf{u}^a-\mathbf{u}^a_{\tau h}||_{\mathbf{H}^{s_u^a}}^2. \end{aligned} \end{aligned}$$

Analogous estimates apply to $k^o_u, k^o_\theta $ and this proves ii). Combining i) and ii) we conclude that

$$\begin{aligned} \int _T||y(t)||_{\mathcal {H}^{\mathbf{s}}}^2dt \le C\int _T||\mathbf{u}^a_{\tau h}-\mathbf{u}^a||_{\mathbf{H}^{s_u^a}}^4+||\mathbf{u}^o_{\tau h}-\mathbf{u}^o||_{\mathbf{H}^{s_u^o}}^4dt. \end{aligned}$$

(82)

We now show an upper bound on the right-hand side of (82):

$$\begin{aligned} iii)\ \exists \ K>0:\ \sup _{t\in T}||\mathbf{u}^a_{\tau h}-\mathbf{u}^a||_{\mathbf{H}^{s_u^a}}^2 \le K\tau ^2||h||_{\mathcal {H}^{\mathbf{s}}}^2, \end{aligned}$$

(83)

with an analogous estimate for the ocean term in (82). Define $\hat{\psi }:=\psi _{\tau h}-\psi $, i.e. for the atmospheric component we have $\hat{\mathbf{u}}^a:=\mathbf{u}_{\tau h}^a-\mathbf{u}^a$ and $\hat{\theta }^a:=\theta _{\tau h}^a-\theta ^a$. According to Theorem 1 we have $(\hat{\mathbf{u}}^a,\hat{\theta }^a)\in C(T;\mathbf{H}^{s_u^a})\times C(T;H^{s_\theta ^a})$. Furthermore $\hat{\mathbf{u}}^a,\hat{\theta }^a$ solve the equations

$$\begin{aligned} \begin{aligned}&\frac{\partial \hat{\mathbf{u}}^a}{\partial t} +\mathbf{u}^a\cdot \nabla \hat{\mathbf{u}}^a+\hat{\mathbf{u}}^a\cdot \nabla \mathbf{u}^a_{\tau h} +\frac{1}{Ro^a}\hat{\mathbf{u}}^{a \bot } +\frac{1}{Ro^a}\nabla \hat{\theta }^a =\frac{1}{Re^a}\triangle \hat{\mathbf{u}}^a,\\&\tilde{Fr}^a\frac{\partial \hat{\theta }^a}{\partial t} +\frac{1}{Ro^a}{div}(\hat{\mathbf{u}}^a) =\frac{1}{Pe^a}\triangle \hat{\theta }^a - \gamma \hat{\theta }^o, \end{aligned} \end{aligned}$$

with initial condition $\hat{\psi }(t_0)=\tau h$. This equation has a structure similar to that of the linearized equations (56) with zero forcing. We can repeat all steps that have led us to (78), and this inequality implies

$$\begin{aligned} \begin{aligned} ||\hat{\psi }(t)||_{\mathcal {H}^{\mathbf{s}}}^2&\le C(t)||\hat{\psi }_0||_{\mathcal {H}^{\mathbf{s}}}^2 =C(t)\tau ^2||h||^2_{\mathcal {H}^{\mathbf{s}}}, \end{aligned} \end{aligned}$$

(84)

with C(t) bounded on T. With the definition of $\hat{\psi }$ this implies

$$\begin{aligned} ||\mathbf{u}_{\tau h}^a-\mathbf{u}^a||_{\mathbf{H}^{s_u^a}}^2 + ||\theta _{\tau h}^a-\theta ^a||_{H^{s_\theta ^a}}^2 \le C(t) \tau ^2||h||_{\mathcal {H}^{\mathbf{s}}}^2. \end{aligned}$$

This proves (83). Together with the corresponding bound for the ocean term in (82) this implies (79). From (79) and the definition of the Gateaux derivative it follows that the solution $\varPsi $ of the linearized equations satisfies $\varPsi =\frac{D\psi }{D\psi _0}h$. $\square $
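The limit (79) can be checked numerically on a toy problem. The sketch below does not use the coupled model of this paper; as a stand-in it assumes the Lorenz-63 system, integrates it together with its linearization by a Runge–Kutta scheme, and evaluates a discrete analogue of $||y(\tau )||_{L^2(T)}/|\tau |$ for decreasing $\tau $. All names (`rhs`, `trajectory`, `remainder_ratio`) and parameter values are assumptions made for this illustration.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # standard Lorenz-63 parameters

def rhs(state):
    """Augmented right-hand side: nonlinear model for psi, linearized model for Psi."""
    x, y, z = state[:3]
    lin = state[3:]
    f = np.array([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z])
    jac = np.array([[-SIGMA, SIGMA, 0.0],
                    [RHO - z, -1.0, -x],
                    [y, x, -BETA]])
    return np.concatenate((f, jac @ lin))

def trajectory(psi0, lin0, dt=0.01, n_steps=50):
    """RK4 trajectory of the augmented state (psi, Psi), including the initial state."""
    state = np.concatenate((psi0, lin0))
    out = [state]
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out.append(state)
    return np.array(out)

def remainder_ratio(psi0, h, tau, dt=0.01, n_steps=50):
    """Discrete L^2-in-time norm of y = psi_{tau h} - psi - tau*Psi, divided by |tau|."""
    base = trajectory(psi0, h, dt, n_steps)
    pert = trajectory(psi0 + tau * h, h, dt, n_steps)
    y = pert[:, :3] - base[:, :3] - tau * base[:, 3:]
    return np.sqrt(dt * np.sum(y**2)) / abs(tau)

psi0 = np.array([1.0, 1.0, 1.0])
h = np.array([1.0, 0.0, 0.0])
ratios = [remainder_ratio(psi0, h, tau) for tau in (1e-2, 1e-3, 1e-4)]
print(ratios)
```

Because the remainder is quadratic in $\tau $, the printed ratios are expected to shrink roughly in proportion to $\tau $. The same kind of test is commonly used to validate a tangent linear code against its nonlinear model.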

The adjoint model of the coupled system (6) is defined as the adjoint of the equations (56) that are linearized around a model trajectory. The adjoint equations are explicitly given by the following equations (see e.g. [20])

$$\begin{aligned} \begin{aligned} \text {Atmosphere: }&-\frac{\partial \widetilde{U}^a}{\partial t} -u^a\frac{\partial \widetilde{U^a}}{\partial x} -v^a\frac{\partial \widetilde{V^a}}{\partial y} -\widetilde{U^a}\frac{\partial v^a}{\partial y} +\widetilde{V^a}\frac{\partial v^a}{\partial x} -\frac{1}{Ro^a}\widetilde{V^a}^\bot +\frac{1}{Ro^a}\theta ^a\frac{\partial \widetilde{\varTheta ^a}}{\partial x}\\&=\sigma \widetilde{U^o} +\frac{1}{Re^a}\triangle \widetilde{U^a}+\tilde{F}^a_{\tilde{U}},\\&-\frac{\partial \widetilde{V}^a}{\partial t} -\widetilde{U^a}\frac{\partial u^a}{\partial y} +u^a\frac{\partial \widetilde{V^a}}{\partial x} -\widetilde{V^a}\frac{\partial u^a}{\partial x} -v^a\frac{\partial \widetilde{V^a}}{\partial y} +\frac{1}{Ro^a}\widetilde{U^a}^\bot +\frac{1}{Ro^a}\theta ^a\frac{\partial \widetilde{\varTheta ^a}}{\partial y}\\&=\sigma \widetilde{V^o} +\frac{1}{Re^a}\triangle \widetilde{V^a}+\tilde{F}^a_{\tilde{V}},\\&-\tilde{Fr}^a\frac{\partial \widetilde{{\varTheta }}^a}{\partial t} -\frac{1}{Ro^a}{div}(\widetilde{{U}}^a) = \frac{1}{Pe^a}\triangle \widetilde{{\varTheta }}^a + \tilde{F}^a_{\widetilde{{\varTheta }}}.\\ \text {Ocean: }&-\frac{\partial \widetilde{U}^o}{\partial t} -u^o\frac{\partial \widetilde{U^o}}{\partial x} -v^o\frac{\partial \widetilde{V^o}}{\partial y} -\widetilde{U^o}\frac{\partial v^o}{\partial y} +\widetilde{V^o}\frac{\partial v^o}{\partial x} -\frac{1}{Ro^o}\widetilde{U^o}^\bot +\frac{1}{Ro^o}\theta ^o\frac{\partial \widetilde{\varTheta ^o}}{\partial x}\\&=\frac{1}{Re^o}\triangle \widetilde{U^o}+\tilde{F}^o_{\tilde{U}},\\&-\frac{\partial \widetilde{V}^o}{\partial t} +\widetilde{V^o}\frac{\partial u^o}{\partial y} -v^o\frac{\partial \widetilde{V^o}}{\partial y} -\frac{\partial }{\partial x}[u^o\widetilde{V^o}] +\frac{1}{Ro^o}\widetilde{V^o} +\frac{1}{Ro^o}\theta ^o\frac{\partial \widetilde{\varTheta ^o}}{\partial y}\\&=\frac{1}{Re^o}\triangle \widetilde{V^o}+\tilde{F}^o_{\tilde{V}},\\&-\frac{\partial \widetilde{{\varTheta }}^o}{\partial t} 
-(u^o\nabla )\widetilde{{\varTheta }}^o = \frac{1}{Pe^o}\triangle \widetilde{{\varTheta }}^o-\gamma \widetilde{{\varTheta }}^a+\tilde{F}^o_{\widetilde{{\varTheta }}},\\&\frac{\partial \widetilde{U}^o }{\partial x}+\frac{\partial \widetilde{V}^o }{\partial y}=0,\\ \text {with initial}&\text { conditions } \widetilde{U}^a(t_0)= \widetilde{U}^a_0,\quad \widetilde{V}^a(t_0)= \widetilde{V}^a_0,\quad \widetilde{{\varTheta }}^a(t_0)=\widetilde{{\varTheta }}^a_0,\\&\qquad \qquad \ \ \widetilde{U}^o(t_0)= \widetilde{U}^o_0,\quad \widetilde{V}^o(t_0)= \widetilde{V}^o_0,\quad \widetilde{{\varTheta }}^o(t_0)=\widetilde{{\varTheta }}_0^o, \end{aligned}\nonumber \\ \end{aligned}$$

(85)

and with forcing terms $\tilde{F}:=(\tilde{F}_{{U}}^a, \tilde{F}_{{\varTheta }}^a, \tilde{F}_{{U}}^o, \tilde{F}_{{\varTheta }}^o)$. Observe the “inverse coupling” in the adjoint equations. While in the coupled Eqs. (1)–(5) and in the linearized equations (56) the ocean is influenced by the atmospheric component via the velocity equation and the atmosphere is coupled to the ocean via the advection–diffusion equation, these roles are reversed in the adjoint equations above. In analogy to (6) we write the adjoint equations in the following form

$$\begin{aligned} \begin{aligned}&-\frac{\partial \tilde{\varPsi }}{\partial t} +\mathcal {N}'^*[\psi ](\tilde{\varPsi }) +L\tilde{\varPsi } -\tilde{C}(\tilde{\varPsi ^a},\tilde{\varPsi ^o}) +D\tilde{\varPsi } =\tilde{F},\\ \text {with initial conditions }&\tilde{\varPsi }(t_0)=(\widetilde{U}^a(t_0),\widetilde{V}^a(t_0),\widetilde{{\varTheta }}^a(t_0), \widetilde{U}^o(t_0),\widetilde{V}^o(t_0),\widetilde{{\varTheta }}^o(t_0)), \end{aligned} \end{aligned}$$

(86)

where $\tilde{\varPsi }:=(\widetilde{{U}}^a,\widetilde{{\varTheta }}^a,\widetilde{{U}}^o,\widetilde{{\varTheta }}^o)$ denotes the adjoint state vector and $\psi $ the solution to (1)–(5). The following result follows immediately from the corresponding result about the linearized equations (56) and the definition of the adjoint equations by means of the $L^2$-scalar product.

Theorem 3

(Regularity of Adjoint Equations) Let $\mathbf{s}=(s_{u^a},s_{\theta ^a},s_{u^o},s_{\theta ^o})\in \mathbb {Z}_+^4$ such that all components of $\mathbf{s}$ are greater than or equal to 3. Let the coupling functions satisfy $\sigma \in C(T,\mathcal {H}^{s_u^o}(\varOmega )),\, \gamma \in C(T,\mathcal {H}^{s_\theta ^a}(\varOmega ))$. Assume that the initial condition of the coupled equations (1)–(5) satisfies $\psi _0\in \mathcal {H}^{\mathbf{s}}(\varOmega )$. Suppose that the initial condition of the coupled adjoint equations, specified at $t=t_1$, satisfies $\tilde{\varPsi }(t_1)\in \mathcal {H}^{\mathbf{s}}(\varOmega )$ and that $\tilde{F}:=(\tilde{F}_{{U}}^a, \tilde{F}_{{\varTheta }}^a, \tilde{F}_{{U}}^o, \tilde{F}_{{\varTheta }}^o) \in L^2(T,\mathbf{H}^{s-1})\times L^2(T,{ H}^{s-1})\times L^2(T,\mathbf{H}^{s-1})\times L^2(T,{ H}^{s-1})$. Then the system (85), integrated backward in time from $t_1$ to $t_0$, has a unique solution on $T:=[t_0,t_1]$ with the properties

$$\begin{aligned} \tilde{\varPsi }(t)\in C(T,\mathcal {H}^{\mathbf{s}})\cap L^2(T, \mathcal { H}^{s+1}). \end{aligned}$$

The state vector $\tilde{\varPsi }$ of the adjoint equations (85) satisfies

$$\begin{aligned} \begin{aligned}&||\tilde{\varPsi }(t)||_{\mathcal { H}^{\mathbf{s}}}^2 \le ||\tilde{\varPsi }(t_1)||_{\mathcal {H}^{\mathbf{s}}}^2 e^{\int _{t_0}^t \big (M_s(y)+CK_s^2C_P^2(Pe^a||\gamma (y)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (y)||_{H^{s_u^o}}^2)-C_P^{-1}\nu _*\big )\,dy}\\&\quad +C\int _{t_0}^t \big [ (Pe^a+Pe^o)||\tilde{F}_{\widetilde{{\varTheta }}}||_{H^{s-1}}^2 +(Re^a+Re^o)||\tilde{F}_{\widetilde{{U}}}||_{\mathbf{H}^{s-1}}^2\big ]\\&\quad \times e^{\int _y^t \big (M_s(z)+CK_s^2C_P^2(Pe^a||\gamma (z)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (z)||_{H^{s_u^o}}^2)-C_P^{-1}\nu _*\big )\, dz}dy, \end{aligned} \end{aligned}$$

(87)

where $M_s$, $\nu ^*$ are defined in Theorem 2 and where the norms of $\tilde{F}_{\widetilde{{U}}}, \tilde{F}_{\widetilde{{\varTheta }}}$ are defined by $||\tilde{F}_{\widetilde{{U}}}||_{\mathbf{H}^{s-1}}^2:=||\tilde{F}_{\widetilde{{U}}^a}||_{\mathbf{H}^{s_u^a-1}}^2+||\tilde{F}_{\widetilde{{U}}^o}||_{\mathbf{H}^{s_u^o-1}}^2$ and $||\tilde{F}_{\widetilde{{\varTheta }}}||_{{H}^{s-1}}^2:=||\tilde{F}_{\widetilde{{\varTheta }}^a}||_{{H}^{s_\theta ^a-1}}^2+||\tilde{F}_{\widetilde{{\varTheta }}^o}||_{{H}^{s_\theta ^o-1}}^2$.

Remark 2

The upper bound on the norm of the adjoint state in (87) depends via $M_s$ on the regularity of the coupled model solution $||\psi ||_{\mathcal {H}^s}$, on the coupling parameters $(\gamma ,\sigma )$ and, through the Reynolds and Péclet numbers, on viscosity and diffusivity. The smoother the underlying model solution, the smaller $M_s$ is and consequently the smaller the upper bound becomes. A stronger coupling, with increasing norm of the coupling functions $\gamma ,\sigma $, will increase the upper bound, while an increase of viscosity and diffusivity will decrease the upper bound in (87), such that we observe competing effects between diffusivity and coupling functions. Additionally, the right-hand side in (87) depends on the adjoint initial condition and the regularity of the adjoint forcing over the time interval T. In the adjoint method of data assimilation the adjoint state at time $t=t_0$ is identified with the gradient of the cost functional (9) with respect to the initial condition. This illustrates that the flow parameters influence the gradient’s smoothness and thereby the convergence rate of the variational minimization.

The following relation follows from a direct calculation using the definitions of the tangent linear and the adjoint equations.

Lemma 8

(Adjoint Relation) Let F and $\tilde{F}$ be the forcing of the linearized coupled equations and the adjoint coupled equations, respectively. By $\varPsi $ and $\tilde{\varPsi }$ we denote the variables of the linear and adjoint equations. Then

$$\begin{aligned} \big \langle F,\tilde{\varPsi }\big \rangle _{\mathcal {L}^2}=\big \langle \tilde{F},\varPsi \big \rangle _{\mathcal {L}^2}. \end{aligned}$$
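A finite-dimensional analogue of the adjoint relation is the familiar dot-product test for a discrete adjoint. The sketch below is only an illustration under simplifying assumptions: the matrix `A` is a hypothetical stand-in for a discretized linearized operator, its transpose plays the role of the adjoint operator, and boundary terms in time are ignored. It is not the discretization of the coupled model.

```python
import random

def matvec(A, x):
    """Multiply the matrix A (list of rows) with the vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    """Return the transpose of A, the discrete adjoint w.r.t. the Euclidean product."""
    return [list(col) for col in zip(*A)]

def dot(x, y):
    """Euclidean scalar product, the discrete analogue of the L^2 product."""
    return sum(a * b for a, b in zip(x, y))

random.seed(0)
n = 5
# Hypothetical linear operator standing in for the linearized equations.
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
Psi = [random.uniform(-1, 1) for _ in range(n)]        # linear variable
Psi_tilde = [random.uniform(-1, 1) for _ in range(n)]  # adjoint variable

F = matvec(A, Psi)                           # forcing of the linear problem
F_tilde = matvec(transpose(A), Psi_tilde)    # forcing of the adjoint problem

# Discrete adjoint relation: <F, Psi_tilde> = <F_tilde, Psi> up to round-off.
mismatch = abs(dot(F, Psi_tilde) - dot(F_tilde, Psi))
```

In practice such a test is a common way to check that an adjoint code is consistent with its tangent linear counterpart.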

Remark 3

(The case of two uncoupled models) The well-posedness results of Sect. 4 remain valid if we replace the coupling term by an external forcing of the same regularity. This follows from an inspection of the proofs of this section. More precisely, in the atmospheric temperature equation (2) we can replace the temperature $\gamma \theta ^o$ that stems from the ocean component by an external temperature source $F_{\theta ^a}(x,t)\in L^2(T,H^s(\varOmega ))$. The same applies to the ocean component: the wind forcing from the atmosphere in (3) can be replaced by an external force $F_{\mathbf{u}^o}(x,t)\in L^2(T,H^s(\varOmega ))$. The proofs of this section show that with these modifications one obtains two models to which Theorems 1, 2, 3 apply, as well as Lemma 7.

5 Existence of Local Minima of the Data Assimilation Problem

Based on the regularity results of the previous sections we now formulate the data assimilation cost functional. The formal definition of the data assimilation problem (8) and the cost functional (9)–(11) will now be filled with a precise mathematical definition that is consistent with the analysis of Sect. 4.

The model dynamics that we are considering consist of trajectories in $C(T,\mathcal {H}^{\mathbf{s}})$ and are controlled by initial conditions in $\mathcal {H}^{\mathbf{s}}$. Let observations $\psi _{obs}$ and a background guess $\psi _{back}$ be given. We define the cost functional by

$$\begin{aligned} \begin{aligned} \mathcal {J}^\mathbf{s}(\psi _0)&= \mathcal {J}_b^{\mathbf{s}^b}(\psi _0)+\mathcal {J}_{obs}^{\mathbf{s}^o}(\psi _0)\\&:= ||\psi _0-\psi _{back}||_{\mathcal {H}^{\mathbf{s}^b}(d\mu _{\mathcal {B}})}^2 + \int _T ||\mathcal {M}[\psi _0]-\psi _{obs}||_{\mathcal {H}^{\mathbf{s}^{obs}}(d\mu _{\mathcal {R}})}^2\,dt. \end{aligned} \end{aligned}$$

(88)

The background term is given by

$$\begin{aligned} \begin{aligned} \mathcal {J}_b^{\mathbf{s}^b}(\psi _0)&= \big \langle \mathcal {B}(\psi _0-\psi _{back}),\psi _0-\psi _{back}\big \rangle _{\mathcal {H}^{s^b}(dx)}\\&= \sum _{\alpha \in \mathcal {I}_{s^{b}}} \int _\varOmega \mathcal {D}^\alpha \mathcal {B}(\psi _0-\psi _{back})\cdot \mathcal {D}^\alpha (\psi _0-\psi _{back})\, dx, \end{aligned} \end{aligned}$$

(89)

where $\mathbf{s}^b=(s^b_a,s^b_o)\in \mathbb {Z}_+^4\cup \{0\}$ is a non-negative index set for the order of the Sobolev spaces in the background term. We introduce the following notation for index sets of the background term

$$\begin{aligned} \mathcal {I}_{\mathbf{s}^{b}}:= & {} \left\{ \alpha =(\alpha _{u^a},\alpha _{\theta ^a},\alpha _{u^o},\alpha _{\theta ^o} )\in \mathbb {Z}^4_+\cup \{0\}: \alpha _{u^a}\le s_{u^a}^{b}, \alpha _{\theta ^a}\le s_{\theta ^a}^{b},\right. \nonumber \\&\left. \alpha _{u^o}\le s_{u^o}^{b}, \alpha _{\theta ^o}\le s_{\theta ^o}^{b} \right\} . \end{aligned}$$

(90)

The definition of the observational term unfolds to (cf. (20), (21))

$$\begin{aligned} \begin{aligned} \mathcal {J}_{obs}^{\mathbf{s}^{obs}}(\psi _0)&= \int _T \big \langle \mathcal {M}[\psi _0]-\psi _{obs},\mathcal {M}[\psi _0]-\psi _{obs}\big \rangle _{\mathcal {H}^{\mathbf{s}^{obs}}(d\mu _{\mathcal {R}})}\,dt\\&= \int _T \big \langle \mathcal {R}(\mathcal {M}[\psi _0]-\psi _{obs}),\mathcal {M}[\psi _0]-\psi _{obs}\big \rangle _{\mathcal {H}^{\mathbf{s}^{obs}}(dx)}\,dt\\&= \sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}\int _T\int _\varOmega \big \{ \triangle ^{\alpha }\mathcal {R}(\mathcal {M}[\psi _0]-\psi _{obs})\big \}\cdot \big \{\mathcal {M}[\psi _0]-\psi _{obs}\big \}\, dxdt, \end{aligned} \end{aligned}$$

(91)

with the index set $\mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]$ given as follows. Let $\mathbf{s}^{obs}:=(s^{obs}_{u^a},s^{obs}_{\theta ^a},s^{obs}_{u^o},s^{obs}_{\theta ^o})\in \mathbb {Z}^4$ and $\mathbf{t}^{obs}:=(t^{obs}_{u^a},t^{obs}_{\theta ^a},t^{obs}_{u^o},t^{obs}_{\theta ^o})\in \mathbb {Z}^4$ be given such that $\mathbf{t}^{obs}\le \mathbf{s}^{obs}$. Define now

$$\begin{aligned} \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]:= & {} \left\{ \alpha =(\alpha _{u^a},\alpha _{\theta ^a},\alpha _{u^o},\alpha _{\theta ^o} )\in \mathbb {Z}^4 : t_{u^a}^{obs}\le \alpha _{u^a}\le s_{u^a}^{obs},\right. \nonumber \\&\left. \quad t_{\theta ^a}^{obs}\le \alpha _{\theta ^a}\le s_{\theta ^a}^{obs}, t_{u^o}^{obs}\le \alpha _{u^o}\le s_{u^o}^{obs}, t_{\theta ^o}^{obs}\le \alpha _{\theta ^o}\le s_{\theta ^o}^{obs}\right\} . \end{aligned}$$

(92)

Note that the indices in the observational term are allowed to be negative, while the indices for the background term are non-negative.

The following theorem establishes the existence of stationary points of the cost functional. We refer to these points as “optimal initial conditions”.

Theorem 4

(Optimal Initial Conditions) Let observations $\psi _{obs}\in L^2(T;\mathcal { H}^{s^{*}})$ be given, with $\mathbf{s}^{*}=(s^{*}_{u^a},s^{*}_{\theta ^a},s^{*}_{u^o},s^{*}_{\theta ^o})\in \mathbb {Z}^4_+\cup \{0\}$. Let $\mathbf{s}^b:=(s^b_{u^a},s^b_{\theta ^a},s^b_{u^o},s^b_{\theta ^o})\in \mathbb {Z}^4_+$ be an index set for the order of the Sobolev spaces in the background component of the cost functional that is chosen such that

$$\begin{aligned} \begin{aligned}&s^b_{u^a},s^b_{\theta ^a},s^b_{u^o},s^b_{\theta ^o}\ge 3. \end{aligned} \end{aligned}$$

(93)

Let $\mathbf{s}^{obs}:=(s^{obs}_{u^a},s^{obs}_{\theta ^a},s^{obs}_{u^o},s^{obs}_{\theta ^o})\in \mathbb {Z}^4$ be an index set for the order of the Sobolev spaces in the observational component of the cost functional that is chosen such that

$$\begin{aligned} \begin{aligned}&s^{obs}_{u^a}\le \min \{s^{*}_{u^a},s^{b}_{u^a}\},\quad s^{obs}_{\theta ^a}\le \min \{s^{*}_{\theta ^a},s^b_{\theta ^a}\},\\&s^{obs}_{u^o}\le \min \{s^{*}_{u^o},s^{b}_{u^o}\},\quad s^{obs}_{\theta ^o} \le \min \{s^{*}_{\theta ^o},s^b_{\theta ^o}\}. \end{aligned} \end{aligned}$$

(94)

Then there exist optimal initial conditions $\bar{\psi }_0\in \mathcal { H}^{\mathbf{s}^b}$ for the coupled data assimilation problem (8) using the cost functional (88).

Remark 4

(Cost functional for smooth and non-smooth observations) Let observations be given that are more regular than the model dynamics, such that $\mathbf{s}^{*}\ge \mathbf{s}^b$. Following (94) in Theorem 4 we choose the observational norm $\mathbf{s}^{obs}=\mathbf{s}^b$. Then the observational term of the cost functional will involve derivatives up to degree $\mathbf{s}^b$. This will not use the full regularity of the observations if $\mathbf{s}^{*}> \mathbf{s}^b$. The lower bound of the degree of derivatives that enter the cost functional is given by the lower index of the observational index set $\mathbf{t}^{obs}\in \mathbb {Z}^4$ and can be chosen independently.

For less smooth observations that are for example square integrable only, we have $\mathbf{s}^{*} =0$. Consequently the observational norm is to be chosen as $\mathbf{s}^{obs}=\mathbf{s}^*$. Now only the $L^2$-norm is used in the observational term of the cost functional. The lower bound $\mathbf{t}^{obs}\in \mathbb {Z}^4$ can be chosen such that the desired scale selectivity is implemented.

These examples illustrate that the highest degree of the derivatives that appear in the cost functional $\mathcal {J}_{obs}$, measured by the upper index $\mathbf{s}^{obs}$ in $\mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]$ (see (92)), is determined by the regularity of the model and by the regularity of the observations. The lowest degree of derivatives that enter $\mathcal {J}_{obs}$ is given by the second multi-index $\mathbf{t}^{obs}$. This index can be chosen freely and it can even be negative. With negative indices one filters out small-scale features (see Remark 5 below).
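The choice of the upper observational index described in this remark can be written down in a few lines. The following sketch assumes that indices are stored as 4-tuples ordered as $(u^a,\theta ^a,u^o,\theta ^o)$; the function names are hypothetical and serve only to illustrate condition (94).

```python
def largest_admissible_obs_index(s_star, s_b):
    """Componentwise minimum of observation regularity s_star and background index s_b.

    This is the largest s_obs that satisfies condition (94)."""
    return tuple(min(a, b) for a, b in zip(s_star, s_b))

def satisfies_condition_94(s_obs, s_star, s_b):
    """Check s_obs <= min(s_star, s_b) in every component."""
    return all(o <= min(a, b) for o, a, b in zip(s_obs, s_star, s_b))

# Smooth observations: the model regularity s_b limits the observational norm.
smooth_case = largest_admissible_obs_index((5, 5, 5, 5), (3, 3, 4, 3))

# Square-integrable observations: only the L^2 norm is admissible as upper index.
rough_case = largest_admissible_obs_index((0, 0, 0, 0), (3, 3, 3, 3))
```

The lower index $\mathbf{t}^{obs}$ is not constrained by (94) and is therefore not part of this sketch.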

Remark 5

(Smoothing via Sobolev norm with negative index) Consider the operator $\triangle ^{\alpha }$ in (91), applied to $\mathcal {R}\delta X$ with model-data misfit $\delta X:=\mathcal {M}[\psi _0]-\psi _{obs}$ and observation error covariance operator $\mathcal {R}$. For $\alpha =(\alpha _{u^a},\alpha _{\theta ^a},\alpha _{u^o},\alpha _{\theta ^o})\in \mathbb {Z}^4$, the positive indices emphasize the importance of derivatives in the minimization process. According to the Fourier characterization (19) this puts weight on fitting the high wave numbers (small spatial scales). For negative $\alpha $-indices the operator $\triangle ^{\alpha }$ denotes the solution to the equation

$$\begin{aligned} \triangle ^{\alpha }\delta Y=\mathcal {R}\delta X, \end{aligned}$$

which is supplemented by periodic boundary conditions. This Laplacian acts on the respective component of $\mathcal {R}\delta X$, i.e. if all components of the multi-index $\alpha \in \mathbb {Z}^4$ are negative, then we have to solve the equations for $Y=(Y_{u^a},Y_{\theta ^a},Y_{u^o},Y_{\theta ^o})$

$$\begin{aligned} \begin{aligned}&\triangle ^{\alpha _{u^a}} Y_{u^a}= [\mathcal {R}\delta X]_{u^a},\ \triangle ^{\alpha _{\theta ^a}} Y_{\theta ^a}= [\mathcal {R}\delta X]_{\theta ^a},\\&\triangle ^{\alpha _{u^o}} Y_{u^o}= [\mathcal {R}\delta X]_{u^o},\ \triangle ^{\alpha _{\theta ^o}} Y_{\theta ^o}= [\mathcal {R}\delta X]_{\theta ^o}, \end{aligned} \end{aligned}$$

where the components referring to velocities are vector Laplacians, while the remaining two are scalar Laplacians. For negative Sobolev space indices the small scales are filtered out via the inverse Laplacian and are excluded from the minimization of the cost functional. The decision whether positive or negative indices are appropriate is a design decision of the data assimilation process and depends on the problem under investigation.
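The scale selectivity described in this remark can be made concrete through the Fourier multiplier of a power of the Laplacian. The sketch below assumes a scalar field on a $2\pi $-periodic torus, so that wave numbers are integers (the scaling of the torus in the model may differ), and it assumes zero-mean fields when the index is negative. The function name is hypothetical.

```python
def laplacian_power_symbol(kx, ky, alpha):
    """Fourier multiplier of the alpha-th power of the Laplacian for wave vector (kx, ky).

    The Laplacian has the multiplier -(kx^2 + ky^2); alpha may be a negative
    integer, in which case the multiplier is that of the inverse operator."""
    k2 = kx * kx + ky * ky
    if k2 == 0:
        # Mean mode: identity for alpha = 0, otherwise set to zero
        # (assumption: inverse powers act on zero-mean fields only).
        return 1.0 if alpha == 0 else 0.0
    return (-float(k2)) ** alpha

# Weight given to three wave numbers for a positive and a negative index.
weights_positive = [abs(laplacian_power_symbol(k, 0, 1)) for k in (1, 4, 16)]
weights_negative = [abs(laplacian_power_symbol(k, 0, -1)) for k in (1, 4, 16)]
```

For the positive index the weights grow with the wave number, for the negative index they decay, which is the filtering effect referred to above.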

Proof of Theorem 4

Let $(\psi _{0,n})_n\subseteq \mathcal { H}^{\mathbf{s}^b}(\varOmega )$ be a minimizing sequence of initial conditions for the data assimilation problem. We denote by $\psi _{n}:=\mathcal {M}[\psi _{0,n}]\in C(T, \mathcal { H}^{\mathbf{s}^b}(\varOmega ))\cap L^2(T, \mathcal { H}^{\mathbf{s}^b+1}(\varOmega ))$ the corresponding solutions of equations (1)–(5). The model-observation difference satisfies $(\mathcal {M}[\psi _0]-\psi _{obs})\in L^2(T,\mathcal { H}^{\mathbf{s}^{obs}}(\varOmega ))$, with $\mathbf{s}^{obs}$ satisfying (94). Since the observation error covariance operator $\mathcal {R}$ preserves the space (cf. (26)) it follows that $\mathcal {R}(\mathcal {M}[\psi _0]-\psi _{obs})\in L^2(T,\mathcal { H}^{\mathbf{s}^{obs}}(\varOmega ))$ and from (91) we infer that the cost functional is well-defined. From (88) it follows that the sequence of initial conditions $(\psi _{0,n})_n$ is bounded in $\mathcal { H}^{\mathbf{s}^b}$, i.e., there exists a $c>0$ such that uniformly for all $n\in \mathbb {N}$

$$\begin{aligned} ||\psi _{0,n}||_{\mathcal { H}^{\mathbf{s}^b}}\le c. \end{aligned}$$

(95)

From Theorem 1 we see that the sequence of associated solutions $(\psi _{n})_n$ is bounded in $C(T;\mathcal { H}^{\mathbf{s}^b})\cap L^2(T,\mathcal { H}^{\mathbf{s}^b+1})$, in particular there exists a $C>0$ such that for all $n\in \mathbb {N}$

$$\begin{aligned} \int _T||\nabla \psi _n||_{\mathcal { H}^{\mathbf{s}^b}}^2\le C. \end{aligned}$$

(96)

Since $\mathcal { H}^{\mathbf{s}^b+1}$ is compactly embedded in $\mathcal { H}^{\mathbf{s}^b}$ we conclude that a subsequence, still denoted $(\psi _{n})_n$, exists and a limit $\bar{\psi }$, such that $(\psi _{n})_n$ converges strongly in $L^2(T,\mathcal { H}^{\mathbf{s}^b})$ to $\bar{\psi }\in L^2(T,\mathcal { H}^{\mathbf{s}^b})$ and weakly in $L^2(T,\mathcal { H}^{\mathbf{s}^b+1})$. For the limit $\bar{\psi }$ lower semi-continuity implies for all $n\in \mathbb {N}$

$$\begin{aligned} \begin{aligned} ||\bar{\psi }_0||_{\mathcal { H}^{\mathbf{s}^b}}&\le ||\psi _{0,n}||_{\mathcal { H}^{\mathbf{s}^b}(d\mu _\mathcal {B})},\\ \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \int _T||\mathcal {M}[\bar{\psi }_0]-\psi _{obs}||_{\mathcal {H}^{\mathbf{s}^{obs}}(d\mu _\mathcal {R})}^2\,dt&\le \liminf _n\int _T||\mathcal {M}[\psi _{0,n}]-\psi _{obs}||_{\mathcal { H}^{\mathbf{s}^{obs}}(d\mu _\mathcal {R})}^2\, dt. \end{aligned} \end{aligned}$$

Consequently

$$\begin{aligned} ||\bar{\psi }_0||_{\mathcal {H}^{\mathbf{s}^b}(d\mu _\mathcal {B})}^2 + \int _T||\mathcal {M}[\bar{\psi }_0]-\psi _{obs}||_{\mathcal {H}^{\mathbf{s}^{obs}}(d\mu _\mathcal {R})}^2\, dt \le \liminf _n\mathcal {J}(\psi _{0,n},\psi _{obs}). \end{aligned}$$

We show now that the limit $\bar{\psi }$ is a regular solution of (1)–(5) in the sense of Definition 1. We can adopt the arguments from the proof of Theorem 1 (cf. Step 4) to show that $\bar{\psi }\in C(T,\mathcal {H}^{\mathbf{s}^b})$. We consider the components of $(\psi _n-\bar{\psi })$, and denote them by $(f_n^{(i)}-\bar{f}^{(i)})$, with $i=1,\ldots ,4$. By applying Lemma 3 to these components it follows that for all $s_i' < s^b_i$ and $t\in T$

$$\begin{aligned} ||f^{(i)}_n(t)-\bar{f}^{(i)}(t)||_{\mathcal {H}^{s'_i}} \le C_{s_i} ||f^{(i)}_n(t)-\bar{f}^{(i)}(t)||_{\mathcal {L}^2}^{1-s_i'/{s^b_i}} ||f^{(i)}_n(t)-\bar{f}^{(i)}(t)||_{\mathcal {H}^{s^b_i}}^{s_i'/{s^b_i}}. \end{aligned}$$

(97)

This proves the convergence of $(\psi _n)_n$ in $C(T;\mathcal {H}^{\mathbf{s}'})$ to $\bar{\psi }$ for $\mathbf{s}'<\mathbf{s}^b$, because $(\psi _n)_n$ converges in $L^2(T,\mathcal {L}^2)$ and is bounded in $ L^2(T,\mathcal {H}^{\mathbf{s}^b})$. From the strong convergence in $C(T;\mathcal {H}^{s'})$ for $s'<s^b$ and the density of $\mathcal {H}^{-\mathbf{s}'}$ in $\mathcal {H}^{-\mathbf{s}}$ we conclude for $\phi \in \mathcal {H}^{-\mathbf{s}'}$ that

$$\begin{aligned} \lim _{n\rightarrow \infty }\big \langle \psi _n(\cdot ,t),\phi \big \rangle _{\mathcal {L}^2}=\big \langle \bar{\psi }(\cdot ,t),\phi \big \rangle _{\mathcal {L}^2}. \end{aligned}$$

(98)

The weak continuity implies for $\tau \in [t_0,t_1]$

$$\begin{aligned} \liminf _{\tau \rightarrow t_0+} ||\bar{\psi }(\cdot ,\tau )||_{\mathcal {H}^{\mathbf{s}^b}}\ge ||\bar{\psi }_0||_{\mathcal {H}^{\mathbf{s}^b}}. \end{aligned}$$

(99)

From (49) follows

$$\begin{aligned} \limsup _{\tau \rightarrow t_0+} ||\bar{\psi }(\cdot ,\tau )||_{\mathcal {H}^{\mathbf{s}^b}}\le ||\bar{\psi }_0||_{\mathcal {H}^{\mathbf{s}^b}}. \end{aligned}$$

(100)

This proves the continuity of the $\mathcal {H}^{\mathbf{s}^b}$-norm of the solution at initial time

$$\begin{aligned} \lim _{\tau \rightarrow t_0+}||\bar{\psi }(\cdot ,\tau )||_{\mathcal {H}^{\mathbf{s}^b}}= ||\bar{\psi }_0||_{\mathcal {H}^{\mathbf{s}^b}}. \end{aligned}$$

(101)

From (96) and the weak convergence of $(\psi _{n})_n$ to $\bar{\psi }$ in $L^2(T,\mathcal { H}^{\mathbf{s}^b+1})$ follows

$$\begin{aligned} \int _T||\nabla \bar{\psi }||_{\mathcal { H}^{\mathbf{s}^b}}^2\le C. \end{aligned}$$

(102)

This implies that $\bar{\psi }\in L^2(T,\mathcal {H}^{\mathbf{s}^b+1})$. Consequently there exists a set $E\subseteq T$ of Lebesgue-measure zero such that for all $\tau \in T{\setminus } E$ it holds that $\bar{\psi }(\cdot ,\tau )\in \mathcal {H}^{\mathbf{s}^b+1}$. This implies that for all $\delta >0$ there exists a $t_0^* < \delta $ such that $\bar{\psi }(\cdot ,t_0^*)\in \mathcal {H}^{\mathbf{s}^b+1}$. If we use $\bar{\psi }_{t_0^*}:=\bar{\psi }(\cdot ,t_0^*)$ as initial condition we can repeat all the arguments of our proof to establish the existence of a solution $\tilde{\psi }\in C([t_0^*,t_1^*],\mathcal {H}^{\mathbf{s}^*})$, with $\mathbf{s}^*<\mathbf{s}^b+1$. The two solutions $\bar{\psi },\tilde{\psi }$ coincide on their joint interval of existence $[t_0,t_1]\cap [t_0^*,t_1^*]$. We obviously have for the two endpoints $t_1\le t_1^*$ and hence $\bar{\psi },\tilde{\psi }$ coincide on $[t_0^*,t_1]$. Since $\delta >0$ was arbitrary we have $\bar{\psi }\in C((t_0,t_1], \mathcal {H}^{\mathbf{s}^b})$ and combined with the continuity at $t_0$ (see (54)) it follows that $\bar{\psi }\in C([t_0,t_1], \mathcal {H}^{\mathbf{s}^b})$. From (1)–(5) follows then $\bar{\psi }\in C^1([t_0,t_1], \mathcal {H}^{{\mathbf{s}^b}-2})$. $\square $

Remark 6

The geostrophic balance

$$\begin{aligned} \mathbf{u}^\bot \sim \nabla \theta \end{aligned}$$

constitutes an important constraint on large-scale atmosphere and ocean dynamics. The $H^s$-approach drives the gradient $\nabla \theta $ towards $\nabla \theta ^{obs}$ and $\mathbf{u}^\bot $ towards $\mathbf{u}^{\bot ,obs}$ and thereby automatically accounts for geostrophic balance, provided the observational data are in approximate geostrophic balance. Therefore it is not necessary to add a penalty term to the cost functional to prevent deviations from geostrophic balance.

Proposition 1

(Stability) Let the assumptions of Theorem 4 be satisfied. Denote by $\bar{\psi }_0\in \mathcal { H}^{s^b}$ the optimal initial conditions for the coupled data assimilation problem (8) using the cost functional (88) and by $\psi =\psi (\bar{\psi }_0):=(u^a_1,\theta ^a_1,u^o_1,\theta ^o_1)$ the solution of (1)–(5) with initial condition $\bar{\psi }_0$. Let $\phi =\phi (\phi _0):=(u^a_2,\theta ^a_2,u^o_2,\theta ^o_2)$ be a second solution with initial condition $\phi _0\in \mathcal { H}^{s^b}$. If $||\bar{\psi }_0-\phi _0||_{ \mathcal { H}^{s^b}}\le \varepsilon $, then

$$\begin{aligned} \begin{aligned} ||\psi (t)-\phi (t)||_{ \mathcal {H}^{\mathbf{s}}}\le \varepsilon e^{\int _{t_0}^t \big (M_s(y)+CK_s^2C_P^2(Pe^a||\gamma (y)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (y)||_{H^{s_u^o}}^2\big )-C_P^{-1}\nu _*\,dy} \end{aligned} \end{aligned}$$

(103)

where $M_s,\nu ^*$ are defined in Theorem 2.

Proof

Subtracting the two model solutions $\psi $ and $\phi $ leads to a difference equation that resembles the linear equation (56) with linear variables ${U}^a=u^a_2-u^a_1,{\varTheta }^a:=\theta ^a_2-\theta ^a_1$ and ${U}^o=u^o_2-u^o_1,{\varTheta }^o:=\theta ^o_2-\theta ^o_1$. The assertion follows now from Theorem 2. $\square $

6 Calculation of Minimizers and Convergence of Gradient Algorithm

In this section we study the convergence of a gradient-based algorithm to calculate the optimal initial conditions for the data assimilation problem (8) with cost functional specified by (88). We demonstrate convergence by invoking a classical result about convergence of gradient algorithms (see Lemma 9 below). This lemma involves the second derivative of the cost functional. We calculate this derivative with the help of the second-order adjoint equations of the model.

6.1 Characterization of Local Minima

The existence of a minimizer allows us to investigate the problem of establishing an algorithm for the calculation of this minimum.

Theorem 5

(First Order Necessary Condition) Let the assumptions of Theorem 4 be satisfied. Denote by $\bar{\psi }_0\in \mathcal {H}^{\mathbf{s}}$ an optimal initial condition of the data assimilation problem (8) and by $\bar{\psi }$ the associated solution of the coupled equations (1)–(5). Then $\bar{\psi }_0$ satisfies

$$\begin{aligned} \bar{\psi }_0= \psi _{back} - \mathcal {B}^{-1}\mathcal {S}^{-1}_{\mathbf{s}^b}\tilde{\psi }_0, \end{aligned}$$

(104)

where

$$\begin{aligned} \mathcal {S}_{\mathbf{s}^b}:=\sum _{\alpha \in \mathcal {I}_{\mathbf{s}^b}}(-1)^{|\alpha |}\triangle ^{\alpha }, \end{aligned}$$

(105)

and where $\tilde{\psi }$ is the solution of the adjoint linearized coupled equations (85) (or equivalently (86))

$$\begin{aligned} \begin{aligned}&-\frac{\partial \tilde{\varPsi }}{\partial t} +\mathcal {N}'^*[\psi ](\tilde{\varPsi }) +L\tilde{\varPsi } +D\tilde{\varPsi } -\tilde{C}(\tilde{\varPsi ^a},\tilde{\varPsi ^o}) =\tilde{F}, \end{aligned} \end{aligned}$$

(106)

with forcing $\tilde{F}:=\sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}\triangle ^{\alpha }\mathcal {R}(\bar{\psi }-\psi _{obs})$, and with initial condition $\tilde{\psi }(t_1)=0$, specified at $t=t_1$.

Proof

For a minimizer $\bar{\psi _0}$, which exists according to Theorem 4, the Gâteaux derivative vanishes such that $\mathcal {J}'(\bar{\psi _0};h)=0$ for all perturbations h. We calculate the Gâteaux derivative of $\mathcal {J}$ at an arbitrary state $\psi _0$ in direction h, by using (19), as follows

$$\begin{aligned} \begin{aligned} \mathcal {J}'(\psi _0;h)&= \big \langle \mathcal {B}(\psi _0-\psi _{back}),h\big \rangle _{\mathcal {H}^{\mathbf{s}^b}} + \sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}\int _T\int _\varOmega \triangle ^{\alpha }\mathcal {R}\big ( \mathcal {M}[\psi _0]-\psi _{obs}\big )\\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \times \big ( \frac{D\mathcal {M}[\psi _0]}{D\psi _0}h \big )\, dxdt\\&= \big \langle \mathcal {B}(\psi _0-\psi _{back}),h\big \rangle _{\mathcal {H}^{\mathbf{s}^b}} + \sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]} \int _T\int _\varOmega \triangle ^{\alpha }\mathcal {R}\big ( \mathcal {M}[\psi _0]-\psi _{obs} \big )\cdot \varPsi \, dxdt, \end{aligned}\end{aligned}$$

where we have applied the chain rule and the fact that $\varPsi :=\frac{D\mathcal {M}[\psi _0]}{D\psi _0}h$, according to Lemma 7, satisfies the linearized equation. Now we define the forcing of the adjoint equation by $\tilde{F}:=\sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}\triangle ^{\alpha } \mathcal {R}\big (\mathcal {M}[\psi _0]-\psi _{obs}\big )$ and its initial condition by $\tilde{\psi }(t_1)=0$. Then Lemmas 7 and 8 imply that

$$\begin{aligned} \mathcal {J}'(\psi _0;h)= & {} \big \langle \mathcal {B}(\psi _0-\psi _{back}),h\big \rangle _{\mathcal {H}^{\mathbf{s}^b}} +\int _T\int _\varOmega \tilde{F}\cdot \tilde{\psi }\ dxdt -\int _\varOmega \tilde{\psi }\psi |_{t_0}^{t_1}\, dx \nonumber \\= & {} \big \langle \mathcal {B}(\psi _0-\psi _{back}),h\big \rangle _{\mathcal {H}^{\mathbf{s}^b}} +\int _\varOmega \tilde{\psi }(t_0)\psi (t_0)\, dx\nonumber \\= & {} \sum _{\alpha \in \mathcal {I}_{\mathbf{s}^b}} \big \langle \mathcal {D}^\alpha \mathcal {B}(\psi _0-\psi _{back}),\mathcal {D}^\alpha h\big \rangle _{\mathcal {L}^2} +\int _\varOmega \tilde{\psi }(t_0)h\, dx\nonumber \\= & {} \sum _{\alpha \in \mathcal {I}_{\mathbf{s}^b}}(-1)^{|\alpha |}\big \langle \triangle ^{\alpha }\mathcal {B}(\psi _0-\psi _{back}),h\big \rangle _{L^2} +\int _\varOmega \tilde{\psi }(t_0)h\, dx\nonumber \\= & {} \sum _{\alpha \in \mathcal {I}_{\mathbf{s}^b}}(-1)^{|\alpha |}\int _\varOmega \big (\triangle ^{\alpha }\mathcal {B}(\psi _0-\psi _{back}) +\tilde{\psi }(t_0)\big )\cdot h\, dx\nonumber \\= & {} \int _\varOmega \big ( \mathcal {S}_{\mathbf{s}^b} \mathcal {B}(\psi _0-\psi _{back}) +\tilde{\psi }(t_0)\big )\cdot h\, dx. \end{aligned}$$

(107)

For a minimum we have $\mathcal {J}'(\bar{\psi _0};h)=0$ for all h, which implies with (107)

$$\begin{aligned} \mathcal {S}_{\mathbf{s}^b} \mathcal {B}(\bar{\psi _0}-\psi _{back}) + \tilde{\psi }(t_0)=0, \end{aligned}$$

and we finally derive

$$\begin{aligned} \bar{\psi _0}= \psi _{back} -\mathcal {B}^{-1}\mathcal {S}^{-1}_{\mathbf{s}^b}\tilde{\psi }_0. \end{aligned}$$

$\square $

Remark 7

On the left-hand side of (106) we find the linear operator $\mathcal {L}_\psi \tilde{\varPsi }:=-\frac{\partial \tilde{\varPsi }}{\partial t} +\mathcal {N}'^*[\psi ](\tilde{\varPsi }) +L\tilde{\varPsi }+D\tilde{\varPsi }-\tilde{C}(\tilde{\varPsi ^a},\tilde{\varPsi ^o})$. In order to solve the equation $\mathcal {L}_\psi \tilde{\varPsi }=\tilde{F}$, the right-hand side with the observational information has to be in the range of $\mathcal {L}_\psi $.

Remark 8

The appearance of the Sobolev norm $H^{s}$ in the background term is equivalent to a smoothing of the adjoint field at time $t=t_0$ via the smoothing operator $\mathcal {S}_{\mathbf{s}^b}^{-1}$. Since the adjoint field at time $t=t_0$ is identified with the gradient of the cost functional with respect to the initial conditions (cf. Theorem 5) this implies a smoothing of the gradient. The Sobolev norm with negative index $H^{-s}$ in the observational term leads to a smoothing of the adjoint forcing, which may result in a more regular adjoint field at time $t=t_0$.
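The smoothing effect of $\mathcal {S}_{\mathbf{s}^b}^{-1}$ can be illustrated in Fourier space. The sketch below makes strong simplifying assumptions: a single scalar field with a single index `s` instead of the four-component multi-index, and a $2\pi $-periodic torus so that the Laplacian has the multiplier $-|k|^2$. Under these assumptions the operator in (105) reduces to $\sum _{j=0}^{s}(-1)^j\triangle ^j$ with multiplier $\sum _{j=0}^{s}|k|^{2j}$. The function names are hypothetical.

```python
def sobolev_symbol(k2, s):
    """Multiplier of sum_{j=0}^{s} (-1)^j Laplacian^j at squared wave number k2.

    Since the Laplacian has the multiplier -k2, each term contributes k2**j."""
    return sum(float(k2) ** j for j in range(s + 1))

def smooth_coefficient(coeff, k2, s):
    """Apply the inverse operator to one Fourier coefficient of the adjoint field."""
    return coeff / sobolev_symbol(k2, s)

# Damping factors of the inverse operator for s = 3 at increasing wave numbers.
damping = [smooth_coefficient(1.0, k * k, 3) for k in (0, 1, 2, 8)]
```

The mean mode is left unchanged while high wave numbers are damped increasingly strongly as `s` grows, which is the sense in which the gradient is smoothed.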

Remark 9

(The case of two uncoupled models) The results on data assimilation in Sects. 5 and 6 also apply to the two uncoupled models that are constructed if one replaces the coupling term by an appropriate external forcing (cf. Remark 3). Then our proofs imply the existence of optimal initial conditions (Theorem 4) and the characterization of the optimal initial condition by an adjoint condition (Theorem 5).

6.2 Convergence of Gradient-Based Descent Algorithm

In this section we always assume that the assumptions of Theorem 4 are satisfied. The goal of this section is to prove the convergence of an iterative gradient based method for determining the optimal initial condition. In order to prove convergence we investigate the Hessian of the cost functional and its computation via the second-order adjoint equations. This iterative gradient algorithm reads as follows:

Remark 10

The differences between the iterative algorithm above and classical data assimilation algorithms are the presence of the Laplace operator in the adjoint forcing in step 2, and the occurrence of the smoothing operator $\mathcal {S}^{-1}_{\mathbf{s}^b}$ in steps 3 and 4. The smoothing operator ensures that the adjoint state $\tilde{\psi }^n$ resides in the same space as the initial state $\psi ^{n+1}_0$. Both modifications are a consequence of the Sobolev norms in the background and observational terms of the cost functional.

The algorithm above also shows that the required modifications can be integrated into an existing data assimilation framework without fundamental difficulties.
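The generic structure of such an iteration (forward model run, backward adjoint run forced by the model-data misfit, gradient update of the initial condition) can be illustrated with a toy problem. The following sketch is not the algorithm of this section: it assumes a scalar linear model $x_{k+1}=m\,x_k$, scalar weights `b` and `r` in place of $\mathcal {B}$ and $\mathcal {R}$, synthetic noise-free observations, and a fixed step size. In the scalar case the Laplacian and the smoothing operator play no role and are omitted. All names and parameter values are hypothetical.

```python
def forward(x0, m, n_steps):
    """Integrate the toy model x_{k+1} = m * x_k and return the whole trajectory."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(m * traj[-1])
    return traj

def adjoint_gradient(x0, xb, obs, m, b, r):
    """Gradient of J(x0) = b*(x0 - xb)^2 + r*sum_k (x_k - obs_k)^2.

    The observational part is obtained by a backward adjoint sweep that
    starts from a zero adjoint state at the final time."""
    traj = forward(x0, m, len(obs) - 1)
    lam = 0.0
    for x_k, y_k in zip(reversed(traj), reversed(obs)):
        # Adjoint step: transport backward with m, add the misfit forcing.
        lam = m * lam + 2.0 * r * (x_k - y_k)
    return 2.0 * b * (x0 - xb) + lam

m, b, r, step = 0.9, 1.0, 1.0, 0.05
obs = forward(2.0, m, 10)   # synthetic observations from a "true" state 2.0
xb = 1.0                    # background guess
x0 = xb
for _ in range(200):
    x0 -= step * adjoint_gradient(x0, xb, obs, m, b, r)
```

For this quadratic toy cost the iteration converges to a compromise between the background guess and the state that generated the observations.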

The next lemma gives a condition for the convergence of gradient algorithms in a Hilbert space in terms of the second derivative of the cost functional.

Lemma 9

[1] Let J be a real-valued function on a Hilbert space X with norm $|\cdot |$. We make the following assumptions:

(i) J is of class $C^2$ and has a local minimum at a point $x^*\in X$,

(ii) there exists a ball $B(x^*)\subseteq X$ around $x^*$, and two real numbers m, M, such that the following inequalities hold:

$$\begin{aligned} m |x||y| \le J''(u;x,y)\le M |x||y|,\qquad \text {for all }u\in B, \text {and }x,y\in X, \end{aligned}$$

where $J''(u;x, y)$ is the bilinear form associated with the second derivative of J. Then the gradient algorithm with initial value $x_0\in B$ converges to $x^*$.
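The role of the two bounds m and M can be seen in the simplest finite-dimensional case. The sketch below assumes a quadratic functional with diagonal Hessian whose entries lie between m and M, with minimum at the origin, and it assumes the fixed step size $2/(m+M)$. The lemma as stated does not prescribe a step size, so this choice is an assumption made for illustration only.

```python
def gradient_descent(grad, x0, step, n_iter):
    """Fixed-step gradient iteration x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(n_iter):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Quadratic J(x) = 0.5 * sum_i h_i * x_i^2 with Hessian bounds m = 1, M = 10.
hess_diag = [1.0, 4.0, 10.0]
m_low, m_up = min(hess_diag), max(hess_diag)

def grad(x):
    """Gradient of the quadratic functional."""
    return [h * xi for h, xi in zip(hess_diag, x)]

step = 2.0 / (m_low + m_up)          # assumed step size
x_final = gradient_descent(grad, [1.0, -2.0, 3.0], step, 100)
contraction = (m_up - m_low) / (m_up + m_low)
```

With this step each component of the error is multiplied by a factor of modulus at most $(M-m)/(M+m)$ per iteration, so the closer the two Hessian bounds are, the faster the iteration contracts.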

The second derivative of the cost functional $\mathcal {J}$ is related to the Hessian $H_{\mathcal {J}}[\psi ]$ via

$$\begin{aligned} \mathcal {J}''(\psi _0; \mathcal {W},\mathcal {Z}) = \big \langle \mathcal {Z},H_{\mathcal {J}}[\psi ]\mathcal {W}\big \rangle ,\qquad \text {for }\mathcal {W},\mathcal {Z}\in \mathcal {H}^{s^b}(\varOmega ). \end{aligned}$$

(110)

The calculation of the Hessian $H_\mathcal {J}$ of the cost functional proceeds in the following steps

Equations (110) and (111) are used to verify the boundedness of the second derivative of the cost functional in order to apply Lemma 9. This requires information about the regularity of the second-order adjoint equations.

Second-Order Adjoint Equations The second-order adjoint equations are derived by linearizing the system of model and adjoint equations (1)–(5) and (85). For more information we refer to [20, 37]. The evolution of second-order adjoint variables $\bar{\varPsi }:=(\bar{U}^a,\bar{V}^a,\bar{\varTheta }^a, \bar{U}^o,\bar{V}^o,\bar{\varTheta }^o)$ is governed by the following equations

$$\begin{aligned} \begin{aligned} \text {Atmosphere: } -\frac{\partial \bar{U}^a}{\partial t}&-u^a\frac{\partial \bar{U}^a}{\partial x} -v^a\frac{\partial \bar{V}^a}{\partial y} -\bar{U}^a\frac{\partial v^a}{\partial y} +\bar{V}^a\frac{\partial v^a}{\partial x} -\frac{1}{Ro^a}\bar{V}^{a\bot }\\&+\frac{1}{Ro^a}\theta ^a\frac{\partial \bar{\varTheta }^a}{\partial x} =\sigma \bar{U}^o +\frac{1}{Re^a}\triangle \bar{U}^a+\mathcal {G}^a_{\bar{U}} +\bar{F}^a_{\bar{U}},\\ -\frac{\partial \bar{V}^a}{\partial t}&-\bar{U}^a\frac{\partial u^a}{\partial y} +u^a\frac{\partial \bar{V}^a}{\partial x} -\bar{V}^a\frac{\partial u^a}{\partial x} -v^a\frac{\partial \bar{V}^a}{\partial y} +\frac{1}{Ro^a}\bar{U}^{a\bot }\\&+\frac{1}{Ro^a}\theta ^a\frac{\partial \bar{\varTheta }^a}{\partial y} =\sigma \bar{V}^o +\frac{1}{Re^a}\triangle \bar{V}^a+\mathcal {G}^a_{\bar{V}} +\bar{F}^a_{\bar{V}},\\ -\tilde{Fr}^a\frac{\partial \bar{\varTheta }^a}{\partial t}&-\frac{1}{Ro^a}\left( \frac{\partial \bar{U}^a}{\partial x}+\frac{\partial \bar{V}^a}{\partial y}\right) -u^a\frac{\partial \bar{\varTheta }^a}{\partial x} -v^a\frac{\partial \bar{\varTheta }^a}{\partial y} = \frac{1}{Pe^a}\triangle \bar{\varTheta }^a + \mathcal {G}^a_{\bar{\varTheta }} +\bar{F}^a_{\bar{\varTheta }},\\ \text {Ocean: } -\frac{\partial \bar{U}^o}{\partial t}&-u^o\frac{\partial \bar{U}^o}{\partial x} -v^o\frac{\partial \bar{V}^o}{\partial y} -\bar{U}^o\frac{\partial v^o}{\partial y} +\bar{V}^o\frac{\partial v^o}{\partial x} -\frac{1}{Ro^o}\bar{V}^{o\bot } +\frac{1}{Ro^o}\theta ^o\frac{\partial \bar{\varTheta }^o}{\partial x}\\&= \frac{1}{Re^o}\triangle \bar{U}^o+\mathcal {G}^o_{\bar{U}} +\bar{F}^o_{\bar{U}},\\ -\frac{\partial \bar{V}^o}{\partial t}&-\bar{U}^o\frac{\partial u^o}{\partial y} +u^o\frac{\partial \bar{V}^o}{\partial x} -\bar{V}^o\frac{\partial u^o}{\partial x} -v^o\frac{\partial \bar{V}^o}{\partial y} +\frac{1}{Ro^o}\bar{U}^{o\bot } +\frac{1}{Ro^o}\theta ^o\frac{\partial \bar{\varTheta }^o}{\partial y}\\&= \frac{1}{Re^o}\triangle \bar{V}^o+\mathcal {G}^o_{\bar{V}} +\bar{F}^o_{\bar{V}},\\ -\frac{\partial \bar{\varTheta }^o}{\partial t}&- u^o\frac{\partial \bar{\varTheta }^o}{\partial x} -v^o\frac{\partial \bar{\varTheta }^o}{\partial y} = \frac{1}{Pe^o}\triangle \bar{\varTheta }^o-\gamma \bar{\varTheta }^a +\mathcal {G}^o_{\bar{\varTheta }} +\bar{F}^o_{\bar{\varTheta }},\\&\frac{\partial \bar{U}^o}{\partial x} +\frac{\partial \bar{V}^o}{\partial y}=0, \end{aligned} \end{aligned}$$

(112)

where the additional terms on the right-hand side are defined by

$$\begin{aligned} \begin{aligned}&\mathcal {G}_{\bar{U}}^a:= -\widetilde{{V}}^a\frac{\partial {V}^a}{\partial x} +{U}^a\frac{\partial \widetilde{{U}}^a}{\partial x} +{V}^a\frac{\partial {U}^a}{\partial y} -\widetilde{{U}}^a\frac{\partial {V}^a}{\partial y} +{\varTheta }^a\frac{\partial \widetilde{{\varTheta }}^a}{\partial x}\\&\mathcal {G}_{\bar{V}}^a:= -\widetilde{{U}}^a\frac{\partial {U}^a}{\partial x} -{U}^a\frac{\partial \widetilde{{V}}^a}{\partial x} +\widetilde{{V}}^a\frac{\partial {U}^a}{\partial y} -{U}^a\frac{\partial \widetilde{{V}}^a}{\partial y} +{\varTheta }^a\frac{\partial \widetilde{{\varTheta }}^a}{\partial y}\\&\mathcal {G}_{\bar{\varTheta }}^a:= {U}^a\frac{\partial \widetilde{{\varTheta }}^a}{\partial x} +{V}^a\frac{\partial \widetilde{{\varTheta }}^a}{\partial y}. \end{aligned} \end{aligned}$$

(113)

The oceanic terms $\left( \mathcal {G}_{\bar{U}}^o,\mathcal {G}_{\bar{V}}^o,\mathcal {G}_{\bar{\varTheta }}^o\right) $ are defined analogously. The forcing is denoted by $\bar{F}=\left( \bar{F}^a_{\bar{U}},\bar{F}^a_{\bar{V}},\bar{F}^a_{\bar{\varTheta }}, \bar{F}^o_{\bar{U}},\bar{F}^o_{\bar{V}},\bar{F}^o_{\bar{\varTheta }}\right) $. For the data assimilation problem under investigation the forcing is given in terms of the linearized state

$$\begin{aligned} \bar{F}:= \sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}\triangle ^{\alpha }\mathcal {R}{\varPsi }. \end{aligned}$$

(114)
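As an illustration of how a forcing of the form (114) could be evaluated on the torus, the following minimal Python sketch applies powers of the Laplacian in Fourier space. It rests on assumptions that are not taken from the text: the index set $\mathcal {I}$ is replaced by the integers $0,\dots ,s^{obs}$, the covariance operator $\mathcal {R}$ by a scalar weight, the domain by a $2\pi $-periodic square, and the names `laplacian_power` and `observation_forcing` are hypothetical.

```python
import numpy as np


def laplacian_power(field, alpha, length=2.0 * np.pi):
    """Apply the alpha-th power of the Laplacian to a periodic 2D field via FFT."""
    n_y, n_x = field.shape
    # Angular wavenumbers of the periodic grid (integers when length = 2*pi).
    kx = 2.0 * np.pi * np.fft.fftfreq(n_x, d=length / n_x)
    ky = 2.0 * np.pi * np.fft.fftfreq(n_y, d=length / n_y)
    k2 = kx[None, :] ** 2 + ky[:, None] ** 2
    symbol = (-k2) ** alpha  # Fourier symbol of the Laplacian raised to alpha
    return np.real(np.fft.ifft2(symbol * np.fft.fft2(field)))


def observation_forcing(linear_state, s_obs, weight=1.0):
    """Sum of Laplacian powers alpha = 0..s_obs applied to weight * linear_state.

    The scalar weight stands in for the covariance operator R (an assumption).
    """
    return sum(laplacian_power(weight * linear_state, alpha)
               for alpha in range(s_obs + 1))


# Usage: a single Fourier mode on a 32 x 32 grid.
n = 32
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x)
field = np.sin(X) * np.cos(2.0 * Y)
forcing_field = observation_forcing(field, s_obs=1)
```

For this single mode the Laplacian acts as multiplication by $-5$, so the sum over $\alpha =0,1$ returns $-4$ times the input field, which gives a quick consistency check of the sketch.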

Theorem 6

(Regularity of Second-Order Adjoint Equations) Let $\mathbf{s}=(s_{u^a},s_{\theta ^a},s_{u^o},s_{\theta ^o})\in \mathbb {Z}_+^4$ such that all components of $\mathbf{s}$ are greater than or equal to 3. Let the following conditions be fulfilled:

1. the initial condition of the coupled equations (1)–(5) satisfies $\psi _0\in \mathcal {H}^{\mathbf{s}}(\varOmega )$,
2. the initial condition of the linearized coupled equations (56) satisfies $\varPsi _0\in \mathcal {H}^{\mathbf{s}}(\varOmega )$,
3. the initial condition of the coupled adjoint equations (85), specified at $t=t_1$, satisfies $\tilde{\varPsi }(t_1)\in \mathcal {H}^{\mathbf{s}}(\varOmega )$,
4. the forcing satisfies $\bar{F}\in L^2(T,\mathcal {H}^{\mathbf{s}-1})$.

Then the system (112) has a unique solution on T with the properties

$$\begin{aligned} \bar{\varPsi }(t)\in C(T,\mathcal {H}^{\mathbf{s}})\cap L^2(T,\mathcal { H}^{\mathbf{s}+1}), \end{aligned}$$

and the state vector $\bar{\varPsi }$ of (112) satisfies

$$\begin{aligned} \begin{aligned}&||\bar{\varPsi }(t)||_{\mathcal {H}^{\mathbf{s}}}^2 \le ||\bar{\varPsi }(t_1)||_{\mathcal {H}^{\mathbf{s}}}^2e^{\int _{t_0}^t \big (M_s(y)+CK_s^2C_P^2(Pe^a||\gamma (y)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (y)||_{H^{s_u^o}}^2\big )-C_P^{-1}\nu _*\, dy}\\&\quad \quad +\int _{t_0}^t \big [ (Pe^a+Pe^o)(||\bar{F}_{{\varTheta }^a}||_{H^{s_{\theta ^a}-1}}^2+||\bar{F}_{{\varTheta }^o}||_{H^{s_{\theta ^o}-1}}^2+||\mathcal {G}_{\varTheta }^a||_{H^{s_{\theta ^a}-1}}^2 +||\mathcal {G}_{\varTheta }^o||_{H^{s_{\theta ^o}-1}}^2)\\&\quad \quad +(Re^a+Re^o)(||\bar{F}_{{U}^a}||_{\mathbf{H}^{s_{u^a}-1}}^2+||\bar{F}_{{U}^o}||_{\mathbf{H}^{s_{u^o}-1}}^2 +||\mathcal {G}_{U}^a||_{\mathbf{H}^{s_{u^a} -1}}^2+||\mathcal {G}_{U}^o||_{\mathbf{H}^{s_{u^o} -1}}^2)\big ]\\&\qquad \times e^{\int _y^t \big (M_s(z)+CK_s^2C_P^2(Pe^a||\gamma (z)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (z)||_{H^{s_u^o}}^2\big )-C_P^{-1}\nu _*\, dz}dy \end{aligned} \end{aligned}$$

(115)

where $M_s$, $\nu _*$ and $F_{U}, F_{\varTheta }$ are defined in Theorem 2.

Proof

(Sketch of proof) The second-order adjoint equations (112) formally resemble the first-order adjoint equations (85) if one identifies the first-order adjoint variable $\tilde{\varPsi }$ with the second-order adjoint variable $\bar{\varPsi }$. The only difference between the two systems is then the additional $\mathcal {G}$-terms defined in (113). These terms consist of products of the linear variable $\varPsi :=({U}^a,{\varTheta }^a,{U}^o,{\varTheta }^o)$ and the adjoint variable $\tilde{\varPsi }:=(\widetilde{{U}}^a,\widetilde{{\varTheta }}^a,\widetilde{{U}}^o,\widetilde{{\varTheta }}^o)$ and their respective derivatives. From the regularity of the linear and adjoint states in Theorems 2 and 3 we conclude that the atmospheric terms $\mathcal {G}_{\bar{U}}^a,\mathcal {G}_{\bar{V}}^a,\mathcal {G}_{\bar{\varTheta }}^a$ and the oceanic terms $\mathcal {G}_{\bar{U}}^o,\mathcal {G}_{\bar{V}}^o,\mathcal {G}_{\bar{\varTheta }}^o$ are bounded in $H^{s-1}$. If we now define $\tilde{F}:=\mathcal {G}+\bar{F}$, we can cast the second-order adjoint equations in the form of the first-order adjoint equations and apply the arguments in the proof of Theorem 3 to prove the assertion. $\square $
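Estimates such as (115) have the structure of an integrated Gronwall inequality: an initial norm amplified by an exponential factor, plus a forcing integral weighted by an exponential kernel. The scalar Python sketch below illustrates this structure only. The coefficient `a`, the forcing `f`, the quadratic damping term and all function names are hypothetical choices, and time runs forward here, whereas the adjoint equations are integrated backward from $t_1$.

```python
import numpy as np


def cumulative_trapezoid(values, dt):
    """Cumulative trapezoidal integral on a uniform grid, starting at zero."""
    increments = 0.5 * (values[1:] + values[:-1]) * dt
    return np.concatenate(([0.0], np.cumsum(increments)))


def gronwall_bound(y0, a, f, dt):
    """Evaluate y0*exp(int_0^t a) + int_0^t f(tau)*exp(int_tau^t a) dtau."""
    growth = cumulative_trapezoid(a, dt)
    weighted_forcing = cumulative_trapezoid(f * np.exp(-growth), dt)
    return np.exp(growth) * (y0 + weighted_forcing)


# Hypothetical scalar model: y' = a(t) y + f(t) - 0.5 y^2, so that y' <= a y + f.
t = np.linspace(0.0, 5.0, 5001)
dt = t[1] - t[0]
a = np.cos(t)
f = np.ones_like(t)

y = np.empty_like(t)
y[0] = 1.0
for i in range(t.size - 1):
    # Explicit Euler step; the damping term only makes y smaller.
    y[i + 1] = y[i] + dt * (a[i] * y[i] + f[i] - 0.5 * y[i] ** 2)

bound = gronwall_bound(y[0], a, f, dt)
```

In this toy setting the computed trajectory `y` stays below `bound` on the whole interval, mirroring how the damping and forcing contributions enter the right-hand side of (115).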

From the previous theorem we infer that the right-hand side of (111) is well-defined in $\mathcal {H}^{\mathbf{s}}$, i.e. for $\mathcal {W}\in \mathcal {H}^{\mathbf{s}}(\varOmega )$ we have

$$\begin{aligned} H_{\mathcal {J}}[\psi ]\mathcal {W} = \mathcal {S}_s \mathcal {B}\mathcal {W} -\bar{\varPsi }(t_0). \end{aligned}$$

(116)

The convergence of the descent algorithm is the content of the following theorem.

Theorem 7

(Convergence) We assume that the assumptions of Theorem 4 are satisfied. Additionally we impose for the observational part of the cost functional that

$$\begin{aligned} s^{obs}_{u^a}\le 1,\ s^{obs}_{\theta ^a}\le 1, s^{obs}_{u^o}\le 1,\ s^{obs}_{\theta ^o} \le 1. \end{aligned}$$

(117)

Let ${\psi }_0^*\in \mathcal { H}^{\mathbf{s}^b}(\varOmega )$ be an optimal initial condition for the data assimilation problem (8) with cost functional specified by (88). Let ${\psi }_0^{0}\in \mathcal { H}^{\mathbf{s}^b}(\varOmega )$ be an initial value for the descent algorithm of Sect. 6.2 that lies within a ball $B({\psi }_0^*)\subseteq \mathcal { H}^{\mathbf{s}^b}(\varOmega )$ around ${\psi }_0^*$. Define the sequence $(\psi _0^n)_n$ by (108). Then $(\psi _0^n)_n$ converges to ${\psi }_0^*$ in $\mathcal { H}^{\mathbf{s}^b}(\varOmega )$.

Proof

To ease notation we denote the Sobolev index of the background term by $\mathbf{s}:=\mathbf{s}^b$. We establish the convergence of the descent algorithm in Sect. 6.2 by invoking Lemma 9. The necessary bounds on the derivative of the cost functional are obtained from the regularity of the second-order adjoint by means of equations (110) and (111). For the derivation of upper and lower bounds on the Hessian we need an estimate of the second-order adjoint state. We infer from Theorem 6 with $\bar{\varPsi }(t_1)=0$

$$\begin{aligned} \begin{aligned} ||\bar{\varPsi }(t)||_{\mathcal {H}^{\mathbf{s}}}^2 \le \mu&\int _{t_0}^t \big [ ||\bar{F}||_{\mathcal {H}^{\mathbf{s}-1}}^2+||\mathcal {G}||_{\mathcal {H}^{\mathbf{s}-1}}^2\big ]\\&\times e^{\int _y^t \big (M_s(z)+CK_s^2C_P^2(Pe^a||\gamma (z)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (z)||_{H^{s_u^o}}^2\big )-C_P^{-1}\nu _*\, dz}dy, \end{aligned} \end{aligned}$$

(118)

where $\mu :=\max \{(Pe^a+Pe^o),(Re^a+Re^o) \}$ and where the components of $\mathcal {G}$ are defined in (113). The forcing $\bar{F}$ in (118) is given by (114) and satisfies with assumption (26) on the covariance operators for $t\in T=[t_0,t_1]$ the estimate

$$\begin{aligned} \begin{aligned} ||\bar{F}(t)||_{\mathcal { H}^{\mathbf{s}-1}}&\le \sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}||\triangle ^{\alpha }\mathcal {R}{\varPsi }(t)||_{\mathcal {H}^{\mathbf{s}-1}} = \sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}||\mathcal {R}{\varPsi }(t)||_{\mathcal {H}^{2\alpha +\mathbf{s}-1}}\\&\le ||\mathcal {R}||\sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}||{\varPsi }(t)||_{\mathcal {H}^{2\alpha +\mathbf{s}-1}} \le C_{s^{obs}}||\mathcal {R}||\,||{\varPsi }(t)||_{\mathcal {H}^{\mathbf{s}+1}}, \end{aligned} \end{aligned}$$

(119)

with $\varPsi \in L^\infty ([t_0,t_1],\mathcal {H}^{\mathbf{s}})\cap L^2([t_0,t_1],\mathcal {H}^{\mathbf{s}+1})$ and where we have used that due to (117) we have $2\alpha +\mathbf{s}-1\le \mathbf{s}+1$ for $\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]$. For the non-linear forcing terms $\mathcal {G}$ in (113) we derive with the inequalities of Cauchy–Schwarz and Poincaré the following estimate that is valid on the time interval $T=[t_0,t_1]$
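The identity used in the middle of (119), which trades $\alpha $ powers of the Laplacian for $2\alpha $ additional Sobolev derivatives, can be checked numerically on the torus. The Python sketch below assumes a homogeneous Fourier-based Sobolev norm, $\big (\sum _k |k|^{2r}|\hat{f}_k|^2\big )^{1/2}$, for mean-free functions on a $2\pi $-periodic square; this convention and the function names are assumptions made for illustration, and the norm defined in the paper may differ from it by equivalent constants.

```python
import numpy as np


def squared_wavenumbers(n):
    """Return |k|^2 for the integer wavenumbers of an n x n periodic grid."""
    k = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k, k)
    return kx ** 2 + ky ** 2


def laplacian_power(field, alpha):
    """Apply the alpha-th power of the Laplacian spectrally."""
    k2 = squared_wavenumbers(field.shape[0])
    return np.real(np.fft.ifft2((-k2) ** alpha * np.fft.fft2(field)))


def homogeneous_sobolev_norm(field, r):
    """sqrt(sum_k |k|^(2r) |f_k|^2), an assumed convention for the H^r norm."""
    k2 = squared_wavenumbers(field.shape[0])
    coeffs = np.fft.fft2(field) / field.size
    return float(np.sqrt(np.sum(k2 ** r * np.abs(coeffs) ** 2)))


n = 32
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x)
field = np.sin(X) * np.cos(2.0 * Y) + 0.5 * np.cos(3.0 * X + Y)

s, alpha = 3, 1
lhs = homogeneous_sobolev_norm(laplacian_power(field, alpha), s - 1)
rhs = homogeneous_sobolev_norm(field, 2 * alpha + s - 1)
```

Both sides agree up to rounding for this test field. With $\alpha \le 1$, as required by (117), the index $2\alpha +s-1$ does not exceed $s+1$, which is the regularity available for the linearized state.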

$$\begin{aligned} \begin{aligned} ||\mathcal {G}_{\bar{U}}^a||_{\mathbf{H}^{s-1}} \le&C||\nabla \widetilde{{V}}^a(t)||_{\mathbf{H}^{s-1}} ||\nabla {V}^a||_{\mathbf{H}^{\mathbf{s}-1}} +C||\nabla {U}^a||_{\mathbf{H}^{s-1}} ||\nabla \widetilde{{U}}^a||_{\mathbf{H}^{\mathbf{s}-1}}\\&+C||\nabla {V}^a||_{\mathbf{H}^{s-1}} ||\nabla {U}^a||_{\mathbf{H}^{\mathbf{s}-1}} +C||\nabla \widetilde{{U}}^a||_{\mathbf{H}^{s-1}} ||\nabla {V}^a||_{\mathbf{H}^{\mathbf{s}-1}}\\&+C||\nabla {\varTheta }^a||_{\mathbf{H}^{s-1}} ||\nabla \widetilde{{\varTheta }}^a||_{\mathbf{H}^{\mathbf{s}-1}} \le C||\nabla \varPsi ^a||_{\mathcal {H}^{s-1}}||\nabla \tilde{\varPsi }^a||_{\mathcal {H}^{\mathbf{s}-1}}. \end{aligned} \end{aligned}$$

(120)

Since an analogous estimate holds for all other terms in (113) we find

$$\begin{aligned} \begin{aligned} ||\mathcal {G}||_{\mathbf{H}^{s-1}} =&||\mathcal {G}^a||_{\mathbf{H}^{\mathbf{s}-1}}+||\mathcal {G}^o||_{\mathbf{H}^{\mathbf{s}-1}}\\&\le C||\nabla \varPsi ^a||_{\mathcal {H}^{\mathbf{s}-1}}||\nabla \tilde{\varPsi }^a||_{\mathcal {H}^{\mathbf{s}-1}}+C||\nabla \varPsi ^o||_{\mathcal {H}^{s-1}}||\nabla \tilde{\varPsi }^o||_{\mathcal {H}^{\mathbf{s}-1}}\\&\le C||\varPsi ||_{\mathcal {H}^{\mathbf{s}}}||\tilde{\varPsi }||_{\mathcal {H}^{\mathbf{s}}}, \end{aligned} \end{aligned}$$

(121)

with $\tilde{\varPsi }\in L^\infty ([t_0,t_1],\mathcal {H}^{\mathbf{s}})\cap L^2([t_0,t_1],\mathcal {H}^{\mathbf{s}+1})$. The linear state $||\varPsi ||_{\mathcal {H}^{\mathbf{s}}}$ in (121) is according to Theorem 2 bounded for $t\in T$

$$\begin{aligned} \begin{aligned} ||\varPsi (t)||_{\mathcal {H}^{\mathbf{s}}}^2&\le ||\varPsi _0||_{\mathcal {H}^{\mathbf{s}}}^2e^{\int _{t_0}^t \big (M_s(y)+CK_s^2C_P^2(Pe^a||\gamma (y)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (y)||_{H^{s_u^o}}^2\big )-C_P^{-1}\nu _*\, dy}\\&=L_1(t)||\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}}^2, \end{aligned} \end{aligned}$$

(122)

with $\mathcal {W}:=\varPsi _0$ (cf. box below (110)), where $L_1(t):=e^{\int _{t_0}^t \big (M_s(y)+CK_s^2C_P^2(Pe^a||\gamma (y)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (y)||_{H^{s_u^o}}^2\big )-C_P^{-1}\nu _*\, dy}$ is bounded on T, and where we have used the fact that the forcing of the linearized equations is zero. For the adjoint state $||\tilde{\varPsi }||_{\mathcal {H}^{\mathbf{s}}}$ in (121) we find with the zero initial condition from Theorem 3 that for $t\in T$

$$\begin{aligned} \begin{aligned} ||\tilde{\varPsi }||_{\mathcal { H}^{\mathbf{s}}}^2 \le&\mu \int _{t_0}^t \big [ ||\mathcal {\tilde{F}}_{{\varTheta }}||_{H^{s-1}}^2 +||\mathcal {\tilde{F}}_{{U}}||_{\mathbf{H}^{s-1}}^2\big ]\\&\times e^{\int _y^t \big (M_s(z)+CK_s^2C_P^2(Pe^a||\gamma (z)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (z)||_{H^{s_u^o}}^2\big )-C_P^{-1}\nu _*\, dz}dy. \end{aligned} \end{aligned}$$

(123)

The adjoint forcing in (123) is given by

$$\begin{aligned} \tilde{F}=\sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]} \triangle ^{\alpha }\mathcal {R}\big ({\psi }^n-\psi _{obs}\big ) \end{aligned}$$

and satisfies the estimate

$$\begin{aligned} \begin{aligned} ||\tilde{F}(t)||_{\mathcal {H}^{\mathbf{s}-1}}&\le \sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}||\triangle ^{\alpha }\mathcal {R}\big ({\psi }^n-\psi _{obs}\big )||_{\mathcal {H}^{\mathbf{s}-1}}\\&= \sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}||\mathcal {R}\big ({\psi }^n-\psi _{obs}\big )||_{\mathcal {H}^{2\alpha +\mathbf{s}-1}}\\&\le ||\mathcal {R}||\sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}||{\psi }^n-\psi _{obs}||_{\mathcal {H}^{2\alpha +\mathbf{s}-1}}\\&\le ||\mathcal {R}||\sum _{\alpha \in \mathcal {I}[{\mathbf{t}^{obs}},{\mathbf{s}^{obs}}]}(||{\psi }^*-{\psi }^n||_{\mathcal {H}^{2\alpha +\mathbf{s}-1}}+||{\psi }^*-\psi _{obs}||_{\mathcal {H}^{2\alpha +\mathbf{s}-1}})\\&\le ||\mathcal {R}||(||{\psi }^*-{\psi }^n||_{\mathcal {H}^{\mathbf{s}+1}}+||{\psi }^*-\psi _{obs}||_{\mathcal {H}^{\mathbf{s}+1}}) =: L_2(t), \end{aligned} \end{aligned}$$

(124)

where $L_2(t)>0$ is bounded on T, since ${\psi }^*,{\psi }^n,\psi _{obs} \in L^\infty ([t_0,t_1],\mathcal {H}^{\mathbf{s}})\cap L^2([t_0,t_1],\mathcal {H}^{\mathbf{s}+1})$. The function $L_2$ is bounded uniformly in n, because $(\psi ^n)_n\subseteq B({\psi }^*_0)$. From (124) we conclude for (123)

$$\begin{aligned} \begin{aligned} ||\tilde{\varPsi }||_{\mathcal {H}^{\mathbf{s}}}^2&\le \mu \int _{t_0}^t L_2(y) e^{\int _y^t \big (M_s(z)+CK_s^2C_P^2(Pe^a||\gamma (z)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (z)||_{H^{s_u^o}}^2\big )-C_P^{-1}\nu _*\, dz}dy=:L_3(t), \end{aligned} \end{aligned}$$

(125)

where the function $L_3(t)$ is bounded on T. From (125) and (122) it follows for (121) that on T

$$\begin{aligned} ||\mathcal {G}(t)||_{\mathbf{H}^{\mathbf{s}-1}}\le CL_3(t)L_1(t)||\mathcal {W}(t)||_{\mathcal {H}^{\mathbf{s}}}^2. \end{aligned}$$

(126)

With (126) and (119) we derive for the upper bound on the second-order state in (118) for $t\in T$

$$\begin{aligned} \begin{aligned}&||\bar{\varPsi }(t)||_{\mathcal {H}^{\mathbf{s}}}^2\\&\quad \le \mu M_se^{\int _T \big (M_s(z)+CK_s^2C_P^2(Pe^a||\gamma (z)||_{H^{s_\theta ^a}}^2+Re^o||\sigma (z)||_{H^{s_u^o}}^2\big )-C_P^{-1}\nu _*\, dz}\\&\qquad \times \int _T \bigg [ ||\mathcal {R}||C_{s^{obs}}L_1(t)+CL_3(t)L_1(t) \bigg ] dy||\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}}^2 \le K||\mathcal {W}(t)||_{\mathcal {H}^{\mathbf{s}}}^2, \end{aligned} \end{aligned}$$

(127)

where $K>0$ is defined by the right-hand side of the second inequality and depends on the parameters of the problem but not on time. We are now in the position to derive bounds on the Hessian. In view of (110) the following estimate follows for $\mathcal {W},\mathcal {Z}\in \mathcal {H}^{\mathbf{s}}(\varOmega )$

$$\begin{aligned} \begin{aligned} \big \langle \mathcal {Z},H_{\mathcal {J}}[\psi ]\mathcal {W}\big \rangle _{\mathcal {H}^{\mathbf{s}}}&= \big \langle \mathcal {Z},\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W} -\bar{\varPsi }(t_0)\big \rangle _{\mathcal {H}^{\mathbf{s}}}\\&\ge -||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}\, ||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W} -\bar{\varPsi }(t_0)||_{\mathcal {H}^{\mathbf{s}}}\\&\ge -||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}\, \big | ||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}}^2 -||\bar{\varPsi }(t_0)||_{\mathcal {H}^{\mathbf{s}}}\big |. \end{aligned} \end{aligned}$$

(128)

In case that $||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}} >||\bar{\varPsi }(t_0)||_{\mathcal {H}^{\mathbf{s}}}$ we derive the lower bound

$$\begin{aligned} \begin{aligned} \big \langle \mathcal {Z},H_{\mathcal {J}}[\psi ]\mathcal {W}\big \rangle _{\mathcal {H}^{\mathbf{s}}}&\ge -||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}\, ||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}} +||\bar{\varPsi }(t_0)||_{\mathcal {H}^{\mathbf{s}}}\\&\ge -||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}\, ||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}||\,||\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}}. \end{aligned} \end{aligned}$$

(129)

If $||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}}^2 \le ||\bar{\varPsi }(t_0)||_{\mathcal {H}^{\mathbf{s}}}^2$ we find with (127)

$$\begin{aligned} \begin{aligned} \big \langle \mathcal {Z},H_{\mathcal {J}}[\psi ]\mathcal {W}\big \rangle _{\mathcal {H}^{\mathbf{s}}}&\ge ||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}\, ||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}} -||\bar{\varPsi }(t_0)||_{\mathcal {H}^{\mathbf{s}}}\\&\ge -||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}\,||\bar{\varPsi }(t_0)||_{\mathcal {H}^{\mathbf{s}}}\\&\ge -||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}\,\sqrt{K}||\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}}. \end{aligned} \end{aligned}$$

(130)

This proves the lower bound of the Hessian. The upper bound follows from (127)

$$\begin{aligned} \begin{aligned} \big \langle \mathcal {Z},H_{\mathcal {J}}[\psi ]\mathcal {W}\big \rangle _{\mathcal {H}^{\mathbf{s}}}&= \big \langle \mathcal {Z},\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W} -\bar{\varPsi }(t_0)\big \rangle _{\mathcal {H}^{\mathbf{s}}}\\&\le ||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}\, ||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W} -\bar{\varPsi }(t_0)||_{\mathcal {H}^{\mathbf{s}}}\\&\le ||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}( ||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}} +||\bar{\varPsi }(t_0)||_{\mathcal {H}^{\mathbf{s}}})\\&\le ||\mathcal {Z}||_{\mathcal {H}^{\mathbf{s}}}\big ( ||\mathcal {S}_{\mathbf{s}^b} \mathcal {B}|| + \sqrt{K}\big )||\mathcal {W}||_{\mathcal {H}^{\mathbf{s}}}. \end{aligned} \end{aligned}$$

(131)

Equations (129), (130) and (131) establish together with (110) the bounds on the second derivative of the cost functional and the application of Lemma 9 proves the assertion of the Theorem. $\square $
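The role that two-sided bounds on the second derivative play for a descent iteration can be illustrated in finite dimensions. The Python sketch below is not the algorithm of Sect. 6.2 or the setting of Lemma 9; it is a generic steepest descent for a quadratic cost $\tfrac{1}{2}x^TAx-b^Tx$, under the additional assumption that the Hessian $A$ is symmetric with a strictly positive lower bound. The matrix, the right-hand side and the function name `steepest_descent` are hypothetical.

```python
import numpy as np


def steepest_descent(hessian, rhs, x0, n_iter=100):
    """Steepest descent for 0.5*x^T A x - b^T x with step 1/upper bound."""
    eigenvalues = np.linalg.eigvalsh(hessian)  # ascending order
    lower, upper = eigenvalues[0], eigenvalues[-1]
    minimizer = np.linalg.solve(hessian, rhs)
    x = np.array(x0, dtype=float)
    errors = [np.linalg.norm(x - minimizer)]
    for _ in range(n_iter):
        gradient = hessian @ x - rhs
        x = x - gradient / upper
        errors.append(np.linalg.norm(x - minimizer))
    # The error contracts by at most 1 - lower/upper per iteration.
    return x, minimizer, np.array(errors), 1.0 - lower / upper


hessian = np.array([[4.0, 1.0, 0.0],
                    [1.0, 3.0, 0.5],
                    [0.0, 0.5, 2.0]])
rhs = np.array([1.0, -2.0, 0.5])
x, minimizer, errors, rate = steepest_descent(hessian, rhs, x0=np.zeros(3))
```

In this toy problem the distance to the minimizer shrinks by at least the factor `rate` in every step, so the upper bound fixes an admissible step size and the lower bound fixes the contraction rate.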

Remark 11

(The case of two uncoupled models) We continue here Remarks 3 and 9 on two uncoupled models that are obtained if one replaces the coupling term by external forcing. The convergence of the steepest descent algorithm in Theorem 7 applies to this case as well.

7 Conclusion

We have suggested a formulation of the variational data assimilation problem that reflects the regularity of the model solution in the norms appearing in the background and observational terms of the cost functional. The use of derivative-based norms implements a scale-selective filtering capability within the data assimilation algorithm without affecting the model dynamics. For an idealized coupled atmosphere–ocean model we could show that this formulation leads to a solvable optimization problem for which we can prove the existence of optimal initial conditions. The coupled optimization problem is also computable in the sense of local convergence of a gradient-based descent algorithm to optimal initial conditions. This work can be extended and continued in several respects.

Most importantly, the impact of the Sobolev-type cost functional has to be investigated in numerical experiments. These experiments need to study the difference between the well-defined (Sobolev-type) cost functional suggested here and a standard cost functional with square-integrable norms only. The scale-selective filtering property of the Sobolev norms has to be investigated experimentally with respect to its physical impact on the optimal initial condition that is determined by coupled data assimilation. The filtering also has to be compared against regularization approaches such as using increased dissipation in the adjoint equations, relative to the nonlinear equations, to regularize the computation of the gradient of the cost functional (see e.g. [17]). Most notably, it has to be studied whether our scale-selective filtering is beneficial in avoiding local minima during the cost functional minimization.
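The scale-selective mechanism can be made concrete with a small Python sketch. It assumes the Fourier characterization of the Sobolev norm on a one-dimensional periodic domain, with weights $(1+k^2)^s$, which may differ by equivalent constants from the derivative-based norm of the paper; the function name `sobolev_norm_sq` and the two test signals are hypothetical.

```python
import numpy as np


def sobolev_norm_sq(signal, s):
    """Squared H^s norm of a 2*pi-periodic signal via sum_k (1+k^2)^s |f_k|^2."""
    n = signal.size
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers
    coeffs = np.fft.fft(signal) / n
    return float(np.sum((1.0 + k ** 2) ** s * np.abs(coeffs) ** 2))


x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
large_scale = np.cos(x)        # wavenumber 1
small_scale = np.cos(8.0 * x)  # wavenumber 8

for s in (0, 2):
    print(s, sobolev_norm_sq(large_scale, s), sobolev_norm_sq(small_scale, s))
```

For $s=0$ both signals carry the same weight in the cost, whereas for $s=2$ a misfit at wavenumber 8 is penalized roughly a thousand times more strongly than the same misfit at wavenumber 1. In this sense the Sobolev index acts as a tunable filter on the scales of the misfit.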

For both coupled and uncoupled data assimilation algorithms the modelling of multivariate covariance matrices for model error and observational error is of paramount importance. Modelling the error covariance between different components of the state vector of the coupled system poses a fundamental challenge. In our approach derivatives are additionally taken of the model-background difference, weighted by the model error covariance matrix $\mathcal {B}$. An analogous procedure is applied to the model-data difference (see (89) and (91)). The interaction between the derivatives, created by the Sobolev norm, and the two error covariance matrices constitutes a new ingredient of the data assimilation algorithm that was not addressed in this paper and remains to be understood. Since we have not focused on model error covariance modelling, we have used for simplicity a single model error covariance operator that is applied to all derivatives of the model-background difference. This is not necessarily the optimal choice, and one can imagine using different error covariance operators for different derivative terms. The standard cost functional with the $L^2$-norm has an interpretation as a statistical least-squares estimation technique (see e.g. [18, 40]). Whether there is a corresponding interpretation of the Sobolev-norm cost functional as a Bayesian estimator is an open question.

We also mention a complementary research path towards the understanding and practical implementation of coupled data assimilation. Here one makes a compromise on the complexity of the assimilation algorithm but not on the coupled model, i.e. one applies less sophisticated data assimilation algorithms to complex three-dimensional coupled atmosphere–ocean equations. Examples of such assimilation techniques are downscaling and continuous data assimilation algorithms that have been used in a series of papers (see e.g. [4, 13] and references therein) to study fundamental aspects of data assimilation for a variety of (uncoupled) models.

The final frontier of coupled data assimilation is of course the data assimilation for coupled models consisting of three-dimensional general circulation models of atmosphere and ocean. For such models well-posedness theorems are not available and the knowledge about the regularity of the solution is limited. The deepest results are contained in the series of papers by Lions–Temam–Wang [21,22,23], who established for a coupled atmosphere–ocean model based on the (hydrostatic) primitive equations the existence of weak solutions, with uniqueness remaining an open question and only weakly continuous dependence on initial conditions. For uncoupled primitive equations the breakthrough result of [9] establishes well-posedness for initial data in $H^1$. For these equations the solvability of the variational data assimilation problem using weak solutions has been shown in [3]. The extension to strong solutions remains open. For coupled general circulation models we are not aware of any solvability or computability result for the data assimilation problem. In order to make progress towards general circulation models we have to abandon our notion of classical solutions of the atmosphere–ocean equations and extend our approach to more general notions such as weak and strong solutions.

Notes

The space-time function space $L^p(T, X)$ denotes functions from the time interval T into a function space X whose norm in X is p-times integrable ($1\le p\le \infty $). In standard variational algorithms the space-time metric $L^2(T,L^2)$ is used.
The argument that leads to (48) follows [10], see Sect. 11, (11.10).
The reason to integrate by parts and reduce the order of differentiation of the forcing term is that we need this regularity in the differentiability proof in Lemma 7, see (81).

References

1. Abergel, F., Temam, R.: On some control problems in fluid mechanics. Theor. Comput. Fluid Dyn. 1, 303–325 (1990)
2. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Academic Press, Cambridge (2003)
3. Agoshkov, V.I., Ipatova, V.M.: Solvability of the observation data assimilation problem in the three-dimensional model of ocean dynamics. Differ. Equ. 43, 1088–1100 (2007)
4. Altaf, M.U., Titi, E.S., Gebrael, T., Knio, O.M., Zhao, L., McCabe, M.F., Hoteit, I.: Downscaling the 2D Benard convection equations using continuous data assimilation. Comput. Geosci. 21, 393–410 (2017)
5. Bardos, C., Pironneau, O.: Data assimilation for conservation laws. Methods Appl. Anal. 12, 103–134 (2005)
6. Bewley, T.R., Moin, P., Temam, R.: DNS-based predictive control of turbulence: an optimal benchmark for feedback algorithms. J. Fluid Mech. 447, 179–225 (2001)
7. Bresch, D., Desjardins, B.: Existence of global weak solutions for a 2D viscous shallow-water equations and convergence to the quasi-geostrophic model. Commun. Math. Phys. 238, 211–223 (2003)
8. Bresch, D.: Shallow-water equations and related topics. Handb. Differ. Equ. 5, 1–104 (2009)
9. Cao, C., Titi, E.S.: Global well-posedness of the three-dimensional viscous primitive equations of large scale ocean and atmosphere dynamics. Ann. Math. 166, 245–267 (2007)
10. Constantin, P., Foias, C.: Navier–Stokes Equations. University of Chicago Press, Chicago (1988)
11. Dijkstra, H.A.: Nonlinear Physical Oceanography. Springer, Berlin (2005)
12. Evans, L.C.: Partial Differential Equations. American Mathematical Society, Philadelphia (1998)
13. Farhat, A., Lunasin, E., Titi, E.S.: On the Charney conjecture of data assimilation employing temperature measurements alone: the paradigm of 3D planetary geostrophic model. Math. Clim. Weather Forecast. 2, 61–74 (2016)
14. Fedorov, A.V.: Ocean-atmosphere coupling. In: Goudie, A., Cuff, D. (eds.) Oxford Companion to Global Change, pp. 369–374. Oxford University Press, Oxford (2008)
15. Fowler, A.M., Lawless, A.S.: An idealized study of coupled atmosphere-ocean 4D-Var in the presence of model error. Mon. Weather Rev. 144, 4007–4030 (2016)
16. Frolov, S., Bishop, C.H., Holt, T., Cummings, J., Kuhl, D.: Facilitating strongly coupled ocean-atmosphere data assimilation with an interface solver. Mon. Weather Rev. 144, 3–20 (2016)
17. Hoteit, I., Cornuelle, B., Köhl, A., Stammer, D.: Treating strong adjoint sensitivities in tropical eddy-permitting variational data assimilation. Q. J. R. Meteorol. Soc. 131, 3659–3682 (2005)
18. Kalnay, E.: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, Cambridge (2003)
19. Klainerman, S., Majda, A.: Singular limits of quasilinear hyperbolic systems with large parameters and the incompressible limit of compressible fluids. Commun. Pure Appl. Math. 34, 481–524 (1981)
20. Le Dimet, F.X., Navon, I.M., Daescu, D.: Second-order information in data assimilation. Mon. Weather Rev. 130, 629–648 (2002)
21. Lions, J.-L., Temam, R., Wang, S.: Models of the coupled atmosphere and ocean. In: Oden, J.T. (ed.) Computational Mechanics Advances, vol. 1, pp. 3–54. Elsevier, Amsterdam (1993)
22. Lions, J.-L., Temam, R., Wang, S.: Numerical analysis of coupled atmosphere-ocean models. In: Oden, J.T. (ed.) Computational Mechanics Advances, vol. 1, pp. 55–120. Elsevier, Amsterdam (1993)
23. Lions, J.-L., Temam, R., Wang, S.: Mathematical theory for the coupled atmosphere-ocean models. J. Math. Pures Appl. 74(2), 105–163 (1995)
24. Lu, J., Hsieh, W.W.: On determining initial conditions and parameters in a simple coupled atmosphere-ocean model by adjoint data assimilation. Tellus 50 A, 534–544 (1998)
25. Majda, A.: Introduction to PDEs and Waves for Atmosphere and Ocean, Courant Lecture Notes in Mathematics, vol. 5. AMS, New York (2003)
26. Majda, A., Bertozzi, A.: Vorticity and Incompressible Flow. Cambridge University Press, Cambridge (2002)
27. Maz’ya, V.: Sobolev Spaces: With Applications to Elliptic Partial Differential Equations, Grundlehren der mathematischen Wissenschaften, vol. 341. Springer, Berlin (2011)
28. Neelin, J.D., Battisti, D.S., Hirst, A.C., Jin, F.-F., Wakata, Y., Yamagata, T., Zebiak, S.E.: ENSO theory. J. Geophys. Res. 103, 14261–14290 (1998)
29. Penny, S.G., Akella, S., Alves, O., Bishop, C., Buehner, M., Chevallier, M., Counillon, F., Draper, C., Frolov, S., Fujii, Y., Karspeck, A., Kumar, A., Laloyaux, P., Mahfouf, J.-F., Martin, M., Pena, M., de Rosnay, P., Subramanian, A., Tardif, R., Wang, Y., Wu, X.: Coupled data assimilation for integrated earth system analysis and prediction: goals, challenges and recommendations. In: WWRP 2017 - 3, World Meteorological Organization, 2017. https://www.wmo.int/pages/prog/arep/wwrp/new/documents/Final_WWRP_2017_3_27_July.pdf
30. Sugiura, N., Awaji, T., Masuda, S., Mochizuki, T., Toyoda, T., Miyama, T., Igarashi, H., Ishikawa, Y.: Development of a four-dimensional variational data assimilation system for enhanced analysis and prediction of seasonal to interannual climate variations. J. Geophys. Res. 113, C10017 (2008)
31. Stammer, D., Wunsch, C., Giering, R., Eckert, C., Heimbach, P., Marotzke, J., Adcroft, A., Hill, C.N., Marshall, J.: The global ocean circulation during 1992–1997, estimated from ocean observations and a general circulation model. J. Geophys. Res. 107(C9), 3118 (2002)
32. Sullivan, P.P., McWilliams, J.C.: Dynamics of winds and currents coupled to surface waves. Annu. Rev. Fluid Mech. 42, 19–41 (2010)
33. Tachim Medjo, T., Temam, R., Ziane, M.: Optimal and robust control of fluid flow: some theoretical and computational aspects. Appl. Mech. Rev. 61, 1–23 (2008)
34. Temam, R.: Navier–Stokes equations and nonlinear functional analysis. In: SIAM CBMS-NSF Regional Conference Series in Applied Mathematics (1995)
35. Ulbrich, S.: A sensitivity and adjoint calculus for discontinuous solutions of hyperbolic conservation laws with source terms. SIAM J. Control Optim. 41(3), 740–797 (2002)
36. Vallis, G.: Atmospheric and Oceanic Fluid Dynamics. Cambridge University Press, Cambridge (2006)
37. Wang, Z., Navon, I.M., Le Dimet, F.X., Zou, X.: The second order adjoint analysis: theory and application. Meteorol. Atmos. Phys. 50, 3–20 (1992)
38. Washington, W.M., Parkinson, C.L.: Introduction To Three-dimensional Climate Modeling. University Science Books, Herndon (2005)
39. Weaver, A., Courtier, P.: Correlation modelling on the sphere using a generalized diffusion equation. Q. J. R. Meteorol. Soc. 127, 1815–1846 (2001)
40. Wunsch, C.: The Ocean Circulation Inverse Problem. Cambridge University Press, Cambridge (1997)


Acknowledgements

Open access funding provided by Max Planck Society.

Author information

Authors and Affiliations

Max Planck Institute for Meteorology, Bundesstr. 53, Hamburg, Germany
Peter Korn


Corresponding author

Correspondence to Peter Korn.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Korn, P. A Regularity-Aware Algorithm for Variational Data Assimilation of an Idealized Coupled Atmosphere–Ocean Model. J Sci Comput 79, 748–786 (2019). https://doi.org/10.1007/s10915-018-0871-y


Received: 05 April 2018
Revised: 07 October 2018
Accepted: 01 November 2018
Published: 14 November 2018
Issue Date: 15 May 2019
DOI: https://doi.org/10.1007/s10915-018-0871-y
