1 Introduction

Over the past few decades, operational numerical weather prediction (NWP) and climate models have evolved tremendously thanks to the continuous improvement of computing technologies and of the underlying algorithms at the foundations of these models. Today, these algorithms are facing a major challenge, as NWP and climate models are transitioning to more sophisticated Earth System Models (ESMs), which will incorporate more components and more accurate representations of the flow physics. In addition, the hardware used to perform the simulations is undergoing a dramatic change, with many-core architectures and co-processors—e.g., graphics processing units (GPUs) and Intel’s Many Integrated Core (MIC) processor architecture—becoming the prevailing technologies. Therefore, it is necessary to review the core algorithmic strategies—in particular, the numerical discretizations and the time-integration strategies—currently adopted in the industry, to understand their potential in the future landscape.

To begin the discussion, we first need to characterize in more detail the models adopted in weather and climate applications. A numerical weather or climate model is constituted by a set of prognostic partial differential equations (PDEs) governing the fluid motion in the atmosphere (i.e., the geophysical flow) and by all those physical processes acting at a subgrid scale, whose statistical effects on the mean flow are expressed as a function of resolved-scale quantities [64]. The former is known as the dynamical core and represents the scale-resolved part of the model, whereas the latter are referred to as the physical parameterizations and include the under-resolved processes. The dynamical core is usually described by the laws of thermodynamics and Newton’s laws of motion applied to air as a fluid (e.g., the compressible Navier–Stokes or Euler equations). Typical examples of physical parameterizations are convective processes, cloud microphysics, solar radiation, boundary layer turbulence and drag processes, which are handled by vertically-columnar submodels. The overall model is then discretized in space and time via suitable algorithmic blocks that approximate the continuous system of equations, thereby providing a solution that is otherwise not attainable analytically.

In the short description above, we did not distinguish between a weather and a climate model, but treated them as though they were identical. This deserves an explanation. While there are certainly overlaps between them, the time-scale and scope of a weather model generally differ substantially from those of a climate model. For example, the typical forecast window of weather models spans a range up to several days ahead, whilst for climate models the forecast range is several months to years ahead. However, the sets of PDEs defining the dynamical cores are similar, if not identical; in fact, weather models are also referred to as the “higher-resolution siblings of the climate models’ atmospheric component” [85], since they typically have higher spatial and temporal resolution than climate models. Weather models also include a smaller number of physical processes, although both weather and climate models are evolving towards Earth System Models (ESMs), which will include dynamic oceans, cryospheres and biochemical cycles [85]. Yet, despite these differences, in both weather and climate the “numerical engine” used to solve the PDEs constituting the model must be effective under several evaluation metrics in order to provide a ‘fast’ and ‘quantitatively satisfactory’ forecast. Hereafter, we will make no distinction between NWP and climate models, as the scope of this review targets a shared aspect of both, despite the usually different operational and scientific objectives of weather and climate applications. In addition, we will refer to NWP operational constraints and omit those of climate simulations, as the former are commonly the most severe.

The “numerical engine” employed to discretize the equations governing weather and climate models is the key to achieving the demanding performance required in operational NWP and climate simulations. From this perspective, the globally scale-resolved numerical solution, also referred to as direct numerical simulation (DNS), of weather and climate is not feasible: the continuous spectrum of spatial and temporal scales and their nonlinear interactions would require computational resources that do not exist today (and will not be available in the near future). On the other hand, large-eddy simulations (LES) of weather and climate are an emerging area of atmospheric (and oceanic) research, for both limited-area models (i.e., models that work on a limited portion of the Earth) and some global models (i.e., models that work on the entire planet) [44]. However, the major global operational and research centers use the compressible Euler equations (often further approximated due to the dominant hydrostatic balance of the atmosphere), together with physical subgrid-scale processes. This system of prognostic equations constitutes the backbone of all modern operational weather and climate models; their numerical solution, together with the physical parameterizations and boundary conditions, provides the spatio-temporal evolution of the atmosphere in terms of wind, pressure, temperature and density, including moist variables such as specific humidity, rain, snow, cloud water and ice, precipitation, and other atmospheric constituents. Although the overall numerical discretization strategy affects the quality metrics of the forecast system, the time integration of the PDEs governing geophysical flows in global weather and climate applications—which is the focus of this review—constitutes one of the most important aspects in the design of the computational model. The time integration in fact drives several key aspects of an NWP or climate model:

  1. Solution accuracy,

  2. Effectiveness of uncertainty quantification,

  3. Time-to-solution,

  4. Energy- or money-to-solution, and

  5. Robustness (e.g., numerical stability).

The forecast accuracy and the quantification of its uncertainty bounds are the highest goals for a successful NWP or climate model, together with a strict requirement on the timeliness of delivery of the forecast. The latter is strongly constrained by the time of arrival of the global observations that feed into the model initialisation and by the deadline to deliver the forecast. The data assimilation of observations to derive the initial conditions (the analysis) already makes heavy use of, and strongly depends on, the quality of the forecast model and its efficiency. For instance, in terms of time-to-solution, the time threshold required operationally to run the entire model is 8.5 min per simulated forecast day, as defined in 2014 in the dynamical core evaluation criteria by the Advanced Computing Evaluation Committee (AVEC) for the Next Generation Global Prediction System (NGGPS). This time constraint places huge demands on algorithmic efficiency, in terms of both parallel scalability and single-node performance, and can strongly influence the choice of the numerical discretization strategy (e.g., time-integration) that is to be adopted. For example, for NWP it is deemed acceptable to make some compromises on the level of required machine accuracy for each physical process (given the underlying uncertainty of some of the modeled aspects) in order to achieve better time-to-solution. More recently, many operational centers are also including the energy-to-solution as an essential requirement to achieve a sustainable and cost-effective path to the future, given that the cost of electricity to run a state-of-the-art global NWP assimilation and forecast system is already substantial and will grow as the models become more complex and refined [9, 103]. Finally, numerical stability and the associated reliability of forecast delivery allow a robust operational workflow that guarantees, on the one hand, that the operational model will not fail (e.g., crash), and, on the other hand, that it will provide reproducible results.

Today, a number of highly successful strategies have emerged as the methods of choice for the temporal (and spatial) discretization of the set of PDEs underlying NWP and climate models. Yet, the evolution of high-performance computing architectures requires a careful review of these strategies. From this perspective, the development of novel mathematical algorithms and their combination with existing successful methods, as well as hardware–software co-design, are becoming essential activities undertaken by many practitioners in the weather industry. In this work, we provide a broad overview of the different numerical time-integration strategies used to solve the PDEs arising in global NWP and climate models, emphasizing the most prominent techniques adopted operationally in the weather community in the past few decades, and describing the emerging solutions that operational weather centers (and the European Centre for Medium-Range Weather Forecasts (ECMWF) in particular) are considering for the future. This review also aims to clearly categorize the currently adopted and emerging time-integration strategies under a more structured nomenclature.

The rest of the review is organized as follows. In Sect. 2, we introduce the set of equations used in weather and climate models. In Sect. 3, we categorize the most prominent time-integration approaches used in the industry. In Sect. 4, we highlight the time-stepping strategies adopted by the main operational and research models. In Sect. 5, we introduce three time-integration schemes that are being considered as potentially competitive for the future. Finally, in Sect. 6, we discuss the possible evolution that the weather and climate industry might undergo in the near to long-term future.

2 Equations Modeling the Atmosphere

Operational global NWP and climate models employ the compressible Euler equations to describe the fluid motion in the atmosphere, where the missing viscous stresses and the sensible and latent heat fluxes resulting from diabatic processes are modeled as part of the physical parameterizations and represented as forcing terms on the right-hand side of the equations. These equations are usually written in spherical coordinates—or coordinate mappings to a tangential plane for limited-area studies—and include the effects of gravity and of the Earth’s rotation (i.e., the Coriolis force).

The formulation of the Euler equations can be either conservative (also referred to as flux-form, obtained through including the action of the continuity equation) or non-conservative (also referred to as advective-form). The chosen form has implications for the formal accuracy of global integrals such as mass, momentum and energy, which is important for climate projections, as well as for local conservation properties, which are important for very high-resolution simulations of clouds and convection. The chosen formulation of the equations also influences the numerical schemes that can be most efficiently employed.

The compressible Euler equations written in conservative form are given as

$$ \frac{\partial \rho }{\partial t} + \nabla \cdot (\rho {\mathbf {u}}) = 0, $$
(1a)
$$\frac{\partial (\rho {\mathbf {u}})}{\partial t} + \nabla \cdot (\rho {\mathbf {u}} \otimes {\mathbf {u}}) + \nabla p = -\rho \mathbf {g} - 2\rho ({\varvec{\omega }} \times {\mathbf {u}}) + {\mathcal {P}},$$
(1b)
$$\frac{\partial (\rho \theta )}{\partial t} + \nabla \cdot (\rho \theta {\mathbf {u}}) = {\mathcal {Q}}, $$
(1c)

where \(\rho \) is the density, p is the pressure, \({\mathbf {u}}\) is the velocity vector, \(\mathbf {g}\) is the gravitational acceleration, \({\varvec{\omega }}\) is the Earth’s rotation vector and \(\theta \) is the potential temperature, which is related to pressure, p, and temperature, T, via \(\theta = T/\pi \), with \(\pi = (p/p_{0})^{R/c_{p}}\) being the Exner pressure and \(p_{0}\), R and \(c_{p}\) being a reference pressure, the gas constant and the heat capacity at constant pressure, respectively. Equation (1) needs to be complemented by an equation of state, namely:

$$ p = p_{0}\left( \frac{\rho R \theta }{p_0}\right) ^{c_{p}/c_{v}}, $$
(2)

where \(c_{v}\) is the heat capacity at constant volume. In addition, the terms \({\mathcal {P}}\) and \({\mathcal {Q}}\) in Eq. (1) represent the physical parametrizations for the momentum and energy equations, respectively. The system of Eq. (1) is just one of several possibilities for writing the equations governing the atmosphere (for alternatives see, for instance, [41, 42]). In addition, we do not formally define the physical parametrization terms \({\mathcal {P}}\) and \({\mathcal {Q}}\) here, as they are not relevant for the purpose of this review; the interested reader can refer to [9].
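To make these thermodynamic relations concrete, the following minimal Python sketch evaluates the Exner pressure, the potential temperature and the equation of state (2). The function names and the dry-air values used for \(R\), \(c_{p}\) and \(p_{0}\) are illustrative assumptions, not quantities prescribed by the text:

```python
def exner(p, p0=1.0e5, R=287.0, cp=1004.5):
    """Exner pressure: pi = (p / p0)**(R / cp)."""
    return (p / p0) ** (R / cp)

def potential_temperature(T, p, p0=1.0e5, R=287.0, cp=1004.5):
    """Potential temperature: theta = T / pi."""
    return T / exner(p, p0, R, cp)

def pressure_from_state(rho, theta, p0=1.0e5, R=287.0, cp=1004.5):
    """Equation of state (2): p = p0 * (rho * R * theta / p0)**(cp / cv),
    with cv = cp - R for an ideal gas."""
    cv = cp - R
    return p0 * (rho * R * theta / p0) ** (cp / cv)

# Consistency check: with the ideal gas law p = rho * R * T, Eq. (2)
# recovers the same pressure from (rho, theta).
T, p = 280.0, 8.5e4
rho = p / (287.0 * T)
assert abs(pressure_from_state(rho, potential_temperature(T, p)) - p) < 1e-6
```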

Fig. 1 Characteristic waves in the atmosphere

The system of Euler equations complemented with the parametrized physical processes simulates a wide range of characteristic waves whose propagation speeds, in addition to the advective time-scale (of the wind), dictate the resolvable temporal and spatial scales of the solution. These span from slower planetary waves (Rossby waves), to faster-propagating Kelvin waves (in equatorial regions), to ubiquitous inertia-gravity waves as well as acoustic waves (the latter only if not filtered by the hydrostatic simplifications), see Fig. 1. In NWP, large-scale Rossby and Kelvin waves as well as inertia-gravity waves are important features that are ideally resolved within the model, while the energetic impact of acoustic waves is small and their subsequent impact on the weather forecast is usually considered marginal. The latter is a key aspect in NWP, since acoustic waves impose a severe restriction on the maximum time-step that can be used when an explicit time-integration scheme is adopted. In particular, the vertically-propagating acoustic waves are the most restrictive in terms of time-step, because in global simulations the vertical grid-spacing is much finer than the horizontal one, i.e., \(z_{h} \ll s_{h}\). In the past few decades, most models adopted in global NWP have widely exploited the hydrostatic approximation, where the upward-directed pressure gradient force (i.e., the decrease of pressure with height) is balanced by the downward-directed gravitational pull of the Earth. This approximation simplifies the vertical momentum equation as follows

$$ \frac{\partial p}{\partial z} = -\,g\rho,$$
(3)

where z is the vertical coordinate. The hydrostatic approximation is well satisfied in the global atmosphere over a wide range of scales [53], and the resulting hydrostatic NWP model does not have any acoustic wave propagation in the vertical direction, thus removing the associated time-step restrictions. In addition, hydrostatic models substantially simplify the treatment of vertical boundary conditions, with important implications for the stability of the underlying numerical discretization in complex terrain. These features facilitate the adoption of time-stepping strategies that are accurate and stable for very large time-steps, \(\varDelta t \sim [120{-}1800] {\rm s}\) for horizontal resolutions \(\varDelta s \sim [2{-}18] {\rm km}\). However, in global (sub-) km-scale simulations, as required to make fundamental advances in resolving existing model uncertainties with respect to cloud-radiation interactions, the hydrostatic approximation is questionable, although not proven to be inadequate for weather and climate simulations at least up to \(\mathcal {O}(1)\) km resolutions, cf. [53]. Many global operational NWP centers also operate local sub-km grid refinements and, driven by the desire to maintain a single code framework, will in the next decade transition from hydrostatic to non-hydrostatic models. This transition also has important consequences for the numerical discretization strategies that can be adopted. In a non-hydrostatic model the vertically-propagating acoustic waves need to be appropriately treated to avoid severe restrictions on the time-step when explicit time-integration schemes are employed. Different solutions have been proposed in the literature, such as a priori filtering of the acoustic modes (i.e., sound-proof models), or solving the compressible Euler equations and treating the acoustic waves via implicit-explicit or fully-implicit time-integrators [4, 10, 30, 90, 92]. While filtering the acoustic modes is attractive, the numerical solution procedure of the resulting filtered equations may be more difficult; hence there is still no consensus in the weather community towards one particular model [69], although the unfiltered compressible Euler equations with a numerical handling of acoustic modes are currently favored. The particular algorithm needs to consider physical and numerical constraints, such as the lower boundary treatment, the conservation properties of the cloud-resolving algorithms and the correct treatment of acoustic and gravity waves (cf. [1]).

In this review, we consider time-integration strategies that target both hydrostatic and non-hydrostatic models and that are adopted by the major global operational centers. In the following we will denote, without loss of generality, the system of equations governing the NWP model as follows:

$$\frac{\partial {\mathbf {y}}}{\partial t} = {\mathcal {R}}({\mathbf {y}}, t),$$
(4)

where \({\mathcal {R}}({\mathbf {y}}, t)\) denotes the right-hand side of the system of equations, \({\mathbf {y}}\) are the prognostic variables and the system is complemented by suitable initial and boundary conditions, which make it formally an initial boundary value problem (IBVP). In the following, we will manipulate Eq. (4) to highlight the characterizing aspects of each time-integration strategy that will be described.

3 Time-Integration Strategies in Global NWP and Climate Modeling

Broadly speaking, there are two distinct, high-level classes of time-integration schemes for IBVPs such as (4), arising in NWP and climate models:

  A. Eulerian-based time-integration (EBTI), where spatial and temporal discretizations are viewed as independent from each other; and

  B. Lagrangian or path-based time-integration (PBTI), where space and time are solved together, or where temporal derivatives may be expressed as spatial derivatives.

EBTI and PBTI have different characteristics in terms of stability properties, accuracy, suitability to emerging hardware, etc., and their application in the context of NWP has been rather different in the past few decades, with PBTI methods, especially semi-implicit semi-Lagrangian schemes, serving as the method of choice in operational global NWP. Both EBTI and PBTI methods can be explicit (the current time-level is calculated using information coming from the previous time-steps only) or implicit (the current time-level is obtained by solving a nonlinear problem that uses information from the current time-step). In the following, we describe in more detail each of the two classes and, for each class, we outline the prevailing operational time-integration practices adopted by the main weather and climate centers (cf. Table 1). In particular, we focus on two EBTI time-integration methods, namely (i) split-explicit (SE) schemes and (ii) horizontally-explicit vertically-implicit (HEVI) schemes, and on the PBTI-based family of methods referred to as (iii) semi-implicit semi-Lagrangian schemes.

3.1 Eulerian-Based Time-Integration (EBTI)

EBTI schemes [84], also referred to as Method Of Lines-based schemes (MOL), recast the IBVP constituted by the set of prognostic PDEs describing the physical model (4), into two sequential problems, a semi-discrete boundary value problem (BVP), where the equations are discretized in space, and an initial value problem (IVP), where the spatially discretized equations are discretized in time via a suitable time-integration scheme, as depicted in Fig. 2.

Fig. 2 Conceptual schematics of EBTI schemes

In fact, the continuous system of PDEs (4), after being spatially-discretized, is reduced to a semi-discrete BVP that is formally a system of ordinary differential equations (ODEs)

$$ \frac{\text {d}\textsf {y}_{h}}{\text {d}t} = {\mathcal {R}}_{h}(\textsf {y}_{h}, t)$$
(5)

where \(\textsf {y}_{h}\) is the vector of spatially discretized prognostic variables and \({\mathcal {R}}_{h}\) is the spatially-discretized right-hand side. Equation (5) can be solved to a desired time accuracy at all spatial locations of the model domain

$$ \frac{\textsf {y}_{h}^{n+1} - \textsf {y}_{h}^{*}}{\alpha \varDelta t} = {\mathcal {R}}_{h}(\widetilde{\textsf {y}}_{h}),$$
(6)

where n indicates time-level \(t^{n}\), \(\varDelta t = (t^{n+1} - t^{n})\) is the time-step and the factor \(\alpha \) signifies the time-interval over which the temporal approximation is made. In addition, \(\textsf {y}_{h}^{*}\) and \(\widetilde{\textsf {y}}_{h}\) are combinations of model solutions. In particular, the first contains only known quantities (i.e., quantities from previous time-steps), while the second contains either quantities from previous time-steps only—explicit time-integration—or also future quantities—implicit time-integration. The right-hand side \({\mathcal {R}}_{h}(\textsf {y}_{h}, t)\) of Eq. (5) includes the spatially-discretized (nonlinear) advection term \((\textsf {u}_{h}\cdot \nabla _{h})\textsf {y}_{h}\) and terms describing wave propagation. The fastest of these terms impose the most severe restrictions on the time-step that can be adopted in explicit time-integration schemes. In particular, fast gravity and acoustic waves need to be handled appropriately to avoid time-steps that would otherwise be too small to be used in the context of operational global NWP and climate simulations.
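For instance, the two simplest instances of (6) are obtained with \(\textsf {y}_{h}^{*} = \textsf {y}_{h}^{n}\) and \(\alpha = 1\): the explicit (forward) Euler scheme, with \(\widetilde{\textsf {y}}_{h} = \textsf {y}_{h}^{n}\),

$$ \textsf {y}_{h}^{n+1} = \textsf {y}_{h}^{n} + \varDelta t \, {\mathcal {R}}_{h}(\textsf {y}_{h}^{n}), $$

and the implicit (backward) Euler scheme, with \(\widetilde{\textsf {y}}_{h} = \textsf {y}_{h}^{n+1}\),

$$ \textsf {y}_{h}^{n+1} = \textsf {y}_{h}^{n} + \varDelta t \, {\mathcal {R}}_{h}(\textsf {y}_{h}^{n+1}), $$

where the latter requires the solution of a (generally nonlinear) algebraic system at each time-step.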

EBTI schemes are conceptually simpler than PBTI and encapsulate two subcategories [67]:

  i. multistage Runge–Kutta methods, which use multiple stages between two consecutive time-levels (or time-steps), discarding information from earlier time-steps; and

  ii. linear multistep methods, which use information from multiple earlier time-steps.

These two subcategories can be effectively represented within the General Linear (GL) method proposed by Butcher in 1987 [15], a unifying framework for which there exist many reviews in the literature—see for instance [16, 51]—as well as several implementation strategies—see for example [102]. Both subcategories, multistage Runge–Kutta and linear multistep methods, can be fully-implicit (the current time-level is obtained by solving a nonlinear problem that uses information from the current time-step) or fully-explicit (the current time-level is calculated using information coming from the previous time-steps only). Representative classes of time-integration schemes embedded in the GL method include implicit multistep methods, such as Adams–Moulton (AM) [22] and backward differentiation (BDF) methods [13, 20, 21]; implicit multistage Runge–Kutta schemes, such as diagonally (DIRK) and singly-diagonally (SDIRK) implicit Runge–Kutta schemes [3, 19, 59]; explicit multistep methods, such as leapfrog and Adams–Bashforth methods [28, 43]; explicit Runge–Kutta schemes, such as the fourth-order Runge–Kutta scheme [55]; and partitioned methods, such as Implicit–Explicit (IMEX) schemes, whereby the operators are partitioned in some fashion and integrated with, e.g., two Butcher tableaux, one explicit and one implicit [5, 40, 106].
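As a minimal illustration of the multistage subcategory, the sketch below advances the semi-discrete system (5) with a generic explicit Runge–Kutta method defined by its Butcher tableau; the function and variable names are ours, and the classical fourth-order tableau [55] is shown purely as an example:

```python
import numpy as np

def explicit_rk_step(f, t, y, dt, A, b, c):
    """One explicit Runge-Kutta step for dy/dt = f(t, y), defined by a
    Butcher tableau (A, b, c) with A strictly lower-triangular."""
    k = []
    for i in range(len(b)):
        yi = y + dt * sum(A[i][j] * k[j] for j in range(i))
        k.append(f(t + c[i] * dt, yi))
    return y + dt * sum(b[i] * k[i] for i in range(len(b)))

# Classical 4th-order Runge-Kutta scheme, written as a Butcher tableau:
A = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]]
b = [1 / 6, 1 / 3, 1 / 3, 1 / 6]
c = [0, 0.5, 0.5, 1]
y_next = explicit_rk_step(lambda t, y: -y, 0.0, np.array([1.0]), 0.1, A, b, c)
```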

While EBTI schemes are widely used in computational fluid dynamics, especially in the engineering sector [18, 52], their adoption in the weather and climate communities has been less widespread, with SE schemes [54, 88, 107] and horizontally-explicit vertically-implicit schemes [8, 40, 63]—i.e., schemes where the horizontal direction is treated explicitly and the vertical is treated implicitly—becoming more prominent but still confined mainly to research and limited-area models (with very few exceptions—see Table 1). Within this context, Eq. (5) is further expressed as

$$ \frac{\partial {\mathbf {y}}}{\partial t} = {\mathcal {R}}_{f}({\mathbf {y}}, t) + {\mathcal {R}}_{g}({\mathbf {y}}, t),$$
(7)

where \(\left| \left| {\mathcal {R}}_{f}\right| \right| \gg \left| \left| {\mathcal {R}}_{g}\right| \right| \) (by some norm). The difference in magnitude of \({\mathcal {R}}_{f}\) and \({\mathcal {R}}_{g}\) comes from two aspects:

  1. solutions to the continuous model (7) comprise fast and slow modes, i.e., \({\mathcal {R}}_{f}\) (for the fast modes) and \({\mathcal {R}}_{g}\) (for the slow modes) can describe processes whose propagation time-scales differ by orders of magnitude; and, in addition,

  2. in the discretized model, as already highlighted, the grid-spacings used to resolve the horizontal and vertical directions are highly anisotropic (\(z_h \ll s_h\)), reflecting the different scales that characterize the important processes in each direction. Since the terms on the right-hand side of (7) include spatial gradients, if \({\mathcal {R}}_{f}\) represents contributions from the vertical direction and \({\mathcal {R}}_{g}\) from the horizontal, the mesh-anisotropy leads to a separation of scales between the two terms.

The separation of scales that arises between the vertical and horizontal directions is a key attribute for EBTI approaches in global NWP and climate simulations, as it motivates the use of different solution methods in the two directions. The “special-ness” of the vertical direction is further enhanced by the typical method of domain decomposition for parallelization of modern atmospheric models. The domain decomposition is limited to the horizontal dimension, with entire vertical columns preserved on each processor (based on the principle that important physical processes, such as radiative balance, act through the entire atmospheric depth, more or less in a pure vertical direction). EBTI approaches exploit the locality of the model data in the vertical direction in their choice of solution methods.

In the rest of this subsection, two EBTI approaches will be discussed—both are formally horizontally-explicit, vertically-implicit approaches, but the “split-explicit” approach, discussed in Sect. 3.1.1, adds an additional level of complexity in its use of sub-steps to handle the integration of fast processes. The “HEVI” approaches discussed in Sect. 3.1.2 highlight more recent developments, motivated by high-resolution global atmospheric models, which do not use sub-stepping.

3.1.1 Split-Explicit Schemes

Split-explicit schemes have mainly been used for high-resolution atmospheric simulations, where the horizontal mesh spacing is \(\le \mathcal {O}(10)\) km, and have thus been confined to limited-area models in the past few decades. One of the first SE (or “time-splitting”) approaches presented in the literature is in fact [57] (hereafter “KW78”). This was a limited-area cloud-resolving (grid-spacing of 1 km) atmospheric dynamical model, where advection, mixing and buoyancy were identified as the physically important processes in the system. The compressible equations adopted, however, meant that relatively fast acoustic waves were also present. Their system can be characterized by further extending (7) to

$$ \frac{\partial {\mathbf {y}}}{\partial t} = {\mathcal {R}}_{f_s}({\mathbf {y}}, t) + {\mathcal {R}}_{f_z}({\mathbf {y}}, t) + {\mathcal {R}}_{g}({\mathbf {y}}, t),$$
(8)

where \({\mathcal {R}}_{f_s}\) and \({\mathcal {R}}_{f_z}\) represent horizontally- and vertically-propagating fast waves respectively, and \({\mathcal {R}}_{g}\) represents the slower modes.

Under the SE approach, a scheme with the required accuracy over an appropriate time-step, \(\varDelta t\), is chosen to solve the physically important terms in \({\mathcal {R}}_{g}\)—KW78 used the 3-time-level leapfrog scheme. The fast terms are advanced on short sub-steps, \(\varDelta \tau \), such that \(\varDelta t=M\varDelta \tau \) (\(M>1\)), using simpler and cheaper schemes that guarantee stability but sacrifice accuracy. For terms in \({\mathcal {R}}_{f_s}\), the single-step 2nd-order forward-backward scheme [68] was used, where the horizontally-propagating acoustic wave terms of the continuity equation are integrated forward in time and those of the momentum equation are integrated backward in time. For \({\mathcal {R}}_{f_z}\), the implicit trapezoidal (or Crank–Nicolson) scheme was used. This yields a tridiagonal system of equations for each vertical model column, which is computationally cheap to solve. A pictorial representation of a time-integration step with the leapfrog-based SE approach used in KW78 is presented in Fig. 3a.

Fig. 3 Illustrations of two commonly used SE integration steps for solving (8): a the terms in \({\mathcal {R}}_{g}\) contribute to the model solution using a leapfrog scheme to step from \(t-\varDelta t\) to \(t+\varDelta t\); and b a 3-stage 3rd-order Runge–Kutta scheme to step from t to \(t+\varDelta t\) using contributions from predictor-stages at \(t+\varDelta t\slash 3\) and \(t+\varDelta t\slash 2\). In both cases, contributions from the fast terms \({\mathcal {R}}_{f_s}\) and \({\mathcal {R}}_{f_z}\) are updated at each sub-step \(\varDelta \tau \)

The SE approach gains efficiency by computing the contributions from \({\mathcal {R}}_{g}\) only on the longer time-step, \(\varDelta t\). The terms in \({\mathcal {R}}_{g}\) include advection and mixing terms that require a relatively large stencil of data and are therefore more computationally expensive. Meanwhile, the contributions from \({\mathcal {R}}_{f}\) (due to the fast waves) are computed every sub-step, \(\varDelta \tau \), but involve only immediate neighbour data-points to calculate local gradients. The latter aspect is particularly attractive for emerging computing technologies, given the reduced communication-to-flop ratio required, thus favoring co-processors, accelerators and many-core architectures.
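A minimal sketch of this cost structure, for a linearized 1D acoustic system, is given below: the (expensive) slow tendencies are evaluated once per time-step \(\varDelta t\) and held fixed, while the fast acoustic terms are advanced over M sub-steps with the forward-backward scheme (continuity forward, then momentum backward using the updated pressure). The 1D setting, the variable names and the use of np.gradient are illustrative assumptions:

```python
import numpy as np

def split_explicit_step(u, p, dt, M, slow_u, slow_p, c, dx):
    """One forward-in-time split-explicit step for the 1D linear
    acoustic system du/dt = -dp/dx + Su, dp/dt = -c**2 du/dx + Sp."""
    dtau = dt / M
    Su = slow_u(u, p)   # slow tendencies (e.g., advection, mixing):
    Sp = slow_p(u, p)   # computed once and frozen over the whole step
    for _ in range(M):
        # forward step for the continuity/pressure equation ...
        p = p + dtau * (-c**2 * np.gradient(u, dx) + Sp)
        # ... then backward step for momentum, using the updated p
        u = u + dtau * (-np.gradient(p, dx) + Su)
    return u, p
```

The expensive wide-stencil evaluations (slow_u, slow_p) are thus amortized over M cheap nearest-neighbour sub-steps.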

Based on a stability analysis of the KW78 SE approach, [87] argued that greater efficiency could be gained by handling the buoyancy terms with the implicit scheme on the sub-step alongside the vertically-propagating acoustic waves. Under this approach, the longer time-step \(\varDelta t\) is limited by the maximum speed of advection. Meanwhile, the sub-step \(\varDelta \tau \) continues to be limited by the horizontally-propagating acoustic waves. In addition, it was found that the SE approach needs some damping in its formulation to ensure an acceptable stability region. Skamarock and Klemp [87] proposed a “divergence damping” term to filter the acoustic modes in their analysis of the leapfrog-based KW78 approach. More recently, [35] has demonstrated the importance of an isotropic application of the divergence damping to the acoustic waves. Alternatively, they proposed using simple off-centering for the sub-step implicit solver. Baldauf [6] includes a comprehensive stability analysis of RK-based SE approaches, where the free parameters associated with the various components of the method are optimised, in terms of accuracy and stability. In particular, optimal values are proposed for the off-centering of the implicit solution of the acoustic and buoyancy terms, and the magnitude of the divergence damping applied to the acoustic waves.

The precise integration schemes used in a SE approach are open to choice: simple, efficient methods for the fast components; and a method with good accuracy and an acceptable window of stability (in terms of time-step length) for the slow components. A SE approach has been adopted in a number of active research and operational limited-area atmospheric models: the JMA’s non-hydrostatic Mesoscale Model (JMA-NHM) continues to use a leapfrog-based approach for the long time-step integrations, but includes some low-order advection components in the short time-step computations to improve the computational stability [79, 80]. Other groups have moved towards single-time-level, multistage explicit Runge–Kutta schemes to integrate the slow components—both the COSMO [6, 7] and WRF [56, 88] models use a 3-stage 3rd-order Runge–Kutta (RK) scheme for integrating the slow components, retaining the forward–backward and trapezoidal schemes (previously described) for the fast components (following the analyses of RK methods for time-splitting in [108, 109]). Figure 3b illustrates the 3-stage RK-based approach.

With the recent developments of global models with “very” high resolution (grid-spacings \(\le \mathcal {O}\left( 10\right)\, \mathrm {km}\)), SE methods are now being used also for global NWP: the MPAS model [89] has adopted the SE approach (as described in [56]) based directly on the successful experiences with the 3-stage 3rd-order RK-based SE approach in WRF. The high resolution global non-hydrostatic model, NICAM, also uses a RK-based SE approach (with options for 2nd- or 3rd-order RK schemes) [82, 83].

3.1.2 Horizontally-Explicit Vertically-Implicit (HEVI) Schemes

Similarly to SE approaches, HEVI schemes are becoming increasingly attractive due to the latest advancements in computing that are driving the development of “very” high-resolution global NWP models. For global NWP, the stratosphere plays a significant role in the global circulation [48, 50, 71]. Inclusion of a well-represented stratosphere has implications for the chosen time-integration methods, since the stratospheric polar jet (which contributes via the advection term) reaches speeds exceeding \(100\;\mathrm {m\;s}^{-1}\), i.e., the advective Courant number approaches the acoustic one. As highlighted in [34], in this context the efficiency gains from the SE approach become less clear: the horizontal splitting (which defines the sub-stepping) in SE schemes is only relevant when there is a scale-separation between fast insignificant and slow significant processes. With the acoustic and advective Courant numbers being similar, the sub-step \(\varDelta \tau \) and the model time-step \(\varDelta t\) are constrained by similar stability limits and little efficiency can be gained from sub-stepping. In addition, as already noted, SE models require artificial damping for stabilization, with atmospheric models typically employing divergence damping (see, e.g., [6]). HEVI-based alternatives can be efficiently used for global non-hydrostatic equation sets and do not suffer from these drawbacks of SE schemes.

Consider again (7), describing the atmosphere as containing contributions from two scale-separated processes:

$$\begin{aligned} \frac{\partial {\mathbf {y}}}{\partial t}={\mathcal {R}}_{g}(t,{\mathbf {y}})+ {\mathcal {R}}_{f}(t,{\mathbf {y}}), \;\; \left| \left| {\mathcal {R}}_{f}\right| \right| \gg \left| \left| {\mathcal {R}}_{g}\right| \right| , \end{aligned}$$

where, for global atmospheric models, the scale-separation occurs due to the order-of-magnitude difference in grid-spacings in the horizontal and vertical directions, that is \(z_{h} \ll s_{h}\). In this context, \({\mathcal {R}}_{f}\) contains vertically-propagating processes and \({\mathcal {R}}_{g}\) contains horizontally-propagating terms. The problem naturally lends itself to an IMEX (Implicit–Explicit) approach, examples of which have been widely analyzed in the literature for use in very stiff diffusion-dominated (parabolic) systems. For the atmospheric (hyperbolic) system, IMEX schemes have only recently been analyzed in the context of HEVI solutions. The analyses have tended to focus on (single time-step) Runge–Kutta (RK) based approaches [8, 23, 40, 63, 100, 106], which inherently avoid any problems associated with the computational modes that derive from multi-step methods.

A \(\nu \)-stage IMEX-RK method can be expressed by a so-called “double Butcher tableau”,

$$\begin{aligned} \begin{array}{c|c} \tilde{c}_{i} & \tilde{\alpha }_{ij} \\ \hline & \tilde{b}_{j} \end{array} \qquad \begin{array}{c|c} c_{i} & \alpha _{ij} \\ \hline & b_{j} \end{array} \end{aligned}$$

where the explicit tableau \(\tilde{\alpha }_{ij}\) is strictly lower-triangular, the implicit tableau \(\alpha _{ij}\) is lower-triangular (including the diagonal), \(\tilde{c}_i=\sum _{j=1}^{\nu }\tilde{\alpha }_{ij}\) and \(c_i=\sum _{j=1}^{\nu }{\alpha }_{ij}\), and \(\sum \tilde{b}_j =1\) and \(\sum b_j = 1\). Applied to the atmospheric system of interest, subject to a HEVI-based discretization, this notation leads to:

$$\begin{aligned} {\mathbf {Y}}^{(j)}=&\;{\mathbf {y}}^n+\varDelta t \sum _{\ell =1}^{j-1}\tilde{\alpha }_{j\ell }{\mathcal {R}}_{g}\left( t^n+\tilde{c}_{\ell } \varDelta t,{\mathbf {Y}}^{(\ell )}\right) + \varDelta t \sum _{\ell =1}^{j}\alpha _{j\ell }{\mathcal {R}}_{f}\left( t^n+ c_{\ell }\varDelta t,{\mathbf {Y}}^{(\ell )}\right) , \end{aligned}$$
(9)
$$\begin{aligned} {\mathbf {y}}^{n+1}=&\;{\mathbf {y}}^{n} +\varDelta t \sum _{j=1}^{\nu }\tilde{b}_{j}{\mathcal {R}}_{g}\left( t^{n}+\tilde{c}_j\varDelta t, {\mathbf {Y}}^{(j)}\right) + \varDelta t \sum _{j=1}^{\nu }b_{j}{\mathcal {R}}_{f}\left( t^n + c_j\varDelta t,{\mathbf {Y}}^{(j)}\right) , \end{aligned}$$
(10)

where the explicit scheme is used to integrate the horizontally-propagating terms in \({\mathcal {R}}_{g}\), and the implicit scheme is used for the vertically-propagating terms in \({\mathcal {R}}_{f}\).

Similar to the SE approach, the aim is to optimize the computational cost by selecting a relatively cheap (due to its local nature) but appropriately accurate (say, 3rd order) conditionally stable explicit scheme, which places a limit on the time-step \(\varDelta t\); and a less accurate (say, 2nd order) unconditionally stable implicit scheme to handle the large Courant number vertical processes. The loss of accuracy in computing the vertical processes is offset by their lesser physical importance. As for the SE case, the implicit problem is also cheap (e.g., a tridiagonal system), since vertical grid columns remain complete on each compute node.
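The following Python sketch implements one step of (9)-(10) under the simplifying assumption that the implicit (vertical) tendency is linear, \({\mathcal {R}}_{f}({\mathbf {y}}) = L{\mathbf {y}}\), so that each stage reduces to a linear solve; in a HEVI model, L would typically be tridiagonal per vertical column. All names are illustrative:

```python
import numpy as np

def imex_rk_step(y, t, dt, Rg, L, At, Ai, bt, bi, ct):
    """One nu-stage IMEX-RK step, Eqs. (9)-(10): explicit tableau
    (At, bt, ct) for the horizontal tendency Rg(t, y), implicit tableau
    (Ai, bi) for the linear vertical tendency L @ y."""
    nu = len(bt)
    I = np.eye(len(y))
    Kg, Kf = [], []                      # stage tendencies
    for j in range(nu):
        rhs = y + dt * sum(At[j][l] * Kg[l] + Ai[j][l] * Kf[l]
                           for l in range(j))
        # implicit stage solve: (I - dt * a_jj * L) Y_j = rhs
        Yj = np.linalg.solve(I - dt * Ai[j][j] * L, rhs)
        Kg.append(Rg(t + ct[j] * dt, Yj))
        Kf.append(L @ Yj)
    return y + dt * sum(bt[j] * Kg[j] + bi[j] * Kf[j] for j in range(nu))
```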

Expressing the RK-HEVI approach as a double Butcher tableau makes it straightforward to explore many alternative combinations of schemes—both through linear analyses [23, 40, 63] and numerical simulations [23, 40, 106]. The analyses focus on the accuracy and stability implied for components of simple linear systems (acoustic and gravity waves, and advection), and on the performance of numerical implementations for idealised (dry) atmospheric tests, often compared to solutions from a very high-resolution, high-order explicit RK method. Substantially different approaches have been recommended: some completely split the vertical and horizontal integrations using Strang-type splitting [8, 97, 100]; others propose schemes that keep the vertical and horizontal solutions balanced in time, by integrating over the same time-interval at each predictor-stage [40, 63, 106]. In keeping with the semi-implicit approach (Sect. 3.2), [106] stresses the importance of ending the integration with a stage that includes an implicit integration, thereby ensuring a balanced final solution.

Colavolpe et al. [23] propose a further extension of the double Butcher tableau approach—a quadruple Butcher tableau, whereby the horizontal pressure-gradient and divergence terms are treated separately from the horizontal advection. Under their scheme, all four solutions are balanced in time at each predictor-stage but, in addition, a forward–backward type operation is introduced for the pressure-gradient and divergence terms (based on [68]) that alternates the forward/backward operations for the windspeed/pressure solutions at two predictor stages. They demonstrate that the additional splitting brings greater stability and accuracy at no extra computational cost.

To date, only one operational global NWP model, but a number of global non-hydrostatic research models (cf. Table 1), have adopted HEVI time-integration methods. This trend indicates that HEVI schemes are being considered as a valuable alternative to more commonly adopted time-integration strategies (such as the semi-implicit semi-Lagrangian method), especially for high-resolution models.

3.2 Path-Based Time-Integration (PBTI)

PBTI schemes, also known as Lagrangian methods, solve the original IBVP simultaneously in space and time, without separating the BVP from the IVP. The PDEs in this case are seen as physical constraints on the path that can be followed to connect two states in the four-dimensional time-space continuum [93], as depicted in Fig. 4. The connection between two arbitrary states is obtained through a trajectory integral applied to the equation

$$\begin{aligned} \overbrace{\frac{\partial {\mathbf {y}}}{\partial t} + ({\mathbf {u}}\cdot \nabla ) {\mathbf {y}}}^{\frac{\text {D}{\mathbf {y}}}{\text {D}t}} = {\mathcal {R}}({\mathbf {y}}, t). \end{aligned}$$
(11)

In Eq. (11), the advective term on the left-hand side (where \({\mathbf {u}}\) is the advection velocity, which may depend on \({\mathbf {y}}\)) is absorbed into the material (or path) derivative \(\frac{\text {D}{\mathbf {y}}}{\text {D}t}\), and the right-hand side \({\mathcal {R}}\) consists of forcing terms, e.g., the pressure-gradient term, the Coriolis terms and the source terms arising from the parametrization of the sub-grid physical processes in an NWP model.

Fig. 4 Conceptual schematic of PBTI schemes

The trajectory integral of \({\mathcal {R}}\) can be approximated using a weighted average of the integrand values from the physical space at a past time \(t_{\text {D}}\), the “departure time”, where the state of the system is known, and the values from the physical space at a future time \(t_{\text {A}}\), the “arrival time”, where the solution is sought. In the PBTI approach, when the approximation relies solely on the past values the integration scheme is explicit, while when it depends on the future (unknown) values of the integrand \({\mathcal {R}}\) the scheme is implicit. In the case of explicit integration schemes, the solution of the system is fairly straightforward, but the approximation can become numerically unstable for time-steps exceeding the Eulerian CFL condition of the fast processes, leading to failure of the simulation. On the other hand, in the case of implicit integration schemes, such as the commonly used trapezoidal rule (semi-implicit Crank–Nicolson), the approximation is guaranteed to be unconditionally stable, but the resulting system of equations, usually in the form of a BVP, becomes more complicated and its numerical solution more difficult.

PBTI techniques have been very successful in NWP (as indicated in Table 1, with adoption of the technique as late as 2014). The most common PBTI strategy employed in NWP is the semi-Lagrangian (SL) scheme, which revolutionized the field two decades ago [94, 98]. Pure Lagrangian approaches, where the exact solution at the next time-step is sought by translating the flow information on the mesh at the current time-level along the trajectory integrals, with remapping only for postprocessing purposes, have never been adopted in operational NWP models. This is mainly because the initial mesh becomes significantly deformed within a few time-steps, which results in large spatial truncation errors; having a mesh with vertically aligned grid-points is essential for resolving complex diabatic processes and vertically-propagating gravity waves in an NWP model. In contrast to the pure Lagrangian approach, there is no mesh deformation in a semi-Lagrangian scheme: backward trajectories are calculated at each time-step, ending at the model grid-points and starting from locations between mesh grid-points that must be determined.

More recently, forward-in-time finite volume (FTFV) integrators, which can be written in a form congruent with the SL scheme, have also emerged [91], with applications in NWP and climate. Furthermore, vertical Lagrangian coordinates have been successfully applied in hydrostatic models [46].

3.2.1 The Semi-Implicit Semi-Lagrangian (SISL) Scheme

The semi-Lagrangian (SL) method [74, 94] is an unconditionally stable scheme for solving the generic transport equation

$$\begin{aligned} \frac{\text {D}y}{\text {D}t} = \text {S}, \qquad \frac{\text {D}}{\text {D}t}=\frac{\partial }{\partial t} + {\mathbf {u}} \cdot \nabla ,\quad {\mathbf {u}}=(u,v,w) \end{aligned}$$
(12)

where \({\mathbf {u}}\) denotes a wind vector, y a transported variable and S a source term. Beyond stability, an additional strength of the SL numerical technique is that it exhibits very good phase speeds and little numerical dispersion (see, e.g., [38, 94]). Because of these properties, SL solvers can stably integrate the prognostic equation sets of atmospheric models with long time-steps at Courant numbers much larger than unity, without distorting the important atmospheric Rossby waves. When a SL scheme is coupled with a semi-implicit (SI) time discretization, long time-steps can be used in realistic atmospheric flow conditions where a multitude of fast and slow processes coexist. In semi-implicit semi-Lagrangian (SISL) schemes, the terms responsible for the high-speed gravity waves associated with high-frequency fluctuations in the wind divergence are identified and treated implicitly, thereby slowing down the fastest gravity waves.

The SISL approach is currently the most popular option for operational global NWP models, while it is also often used in limited-area modeling. As shown in Table 1, the vast majority of the listed global NWP centers use a model with a SISL dynamical core. A typical example is the ECMWF forecast model IFS, which has used a SISL approach since 1991. As discussed in [86], the change from Eulerian to semi-Lagrangian numerics improved the efficiency of IFS by a factor of six, thus enabling a significant resolution upgrade at that time. Since 1991, further successful upgrades have followed, and currently the (high-resolution) global forecast model is run at 9 km resolution in grid-point space, to date the highest in the world.

To explain how a SISL method works we shall write the prognostic equations of the atmosphere in the compact form:

$$ \frac{\text {D}{\mathbf {y}}}{\text {D}t}={\mathcal {R}}({\mathbf {y}}),$$
(13)

where \({\mathbf {y}}=(y_{i})\), \(i=1,2,\ldots ,N\) is a vector of N three-dimensional prognostic scalar fields \(y_i\) (such as the wind components, temperature, density, water vapour and other tracers) and \({\mathcal {R}}=(R_{i})\) is the corresponding forcing term. Integrating (13) along a trajectory, which starts at a point in space D, the departure point, and terminates at a point in space A, the arrival point

$$\begin{aligned} \frac{{\mathbf {y}}^{t+\varDelta t}_{\text {A}}-{\mathbf {y}}^{t}_{\text {D}}}{\varDelta t}=\frac{1}{\varDelta t}\int _{t}^{t+\varDelta t}{\mathcal {R}}\left( {\mathbf {y}}(t) \right) \text {d}t \end{aligned}$$
(14)

and approximating the right-hand side integral using the second-order trapezoidal rule yields the following SISL discretization

$$\begin{aligned} \frac{{\mathbf {y}}^{t+\varDelta t}_{\text {A}} - {\mathbf {y}}^t_{\text {D}}}{\varDelta t} = \frac{1}{2} \left( {\mathcal {R}}^{t}_{\text {D}} + {\mathcal {R}}^{t+\varDelta t}_{\text {A}} \right) . \end{aligned}$$
(15)

In any SISL scheme there are three crucial steps that influence the numerical properties of the discretization, namely (i) the calculation of the departure-point locations and the related interpolation of the prognostic variables at these points, (ii) the semi-implicit time discretization of the nonlinear forcing terms and (iii) the solution of the final semi-implicit system, reduced to the form of a Helmholtz elliptic equation. We detail each of these steps in the following.

(i):

SL advection and calculation of the departure points All operational SL codes work “backwards”, in the sense that at a given discrete point in time t, and with a model time-step of \(\varDelta t\), an air parcel will start from a point in space between grid-points and will terminate at a given mesh grid-point. The latter are called “arrival points” and coincide with the model mesh grid-points, while the former are called “departure points” and must be found, as they are not known a priori. There is a unique departure point associated with each grid-point to be computed (this assumes that characteristics do not intersect, i.e., no discontinuities are permitted). Therefore, for a simple passive scalar advection of a generic field y without forcing, the solution at a new time-step is:

$$ y^{t+\varDelta t}_{\text {A}} = y_{\text {D}}^{t}.$$
(16)

This means that to compute the field y values at the new time-step \(t+\varDelta t\), it suffices to compute a departure point “D” for each model grid point and then interpolate the transported field y at these departure points. The interpolation method uses the known y-values at time t, at a set of grid points nearest to “D”; the number and location of these grid points depend on the order of interpolation method used. For the more general problem (13), the forcing terms should also be interpolated at the departure point. To compute the location of the departure points the following trajectory equation must be solved:

$$ \frac{\text {D}\mathbf {r}}{\text {D}t}={\mathbf {u}}(\mathbf {r},t), $$
(17)

where \(\mathbf {r}\) denotes the coordinates of a moving fluid parcel, for example \(\mathbf {r}=(x,y,z)\) if a Cartesian system is used. By integrating Eq. (17), we obtain:

$$ \mathbf {r}_{\text {A}} - \mathbf {r}_{\text {D}} = \int _{t}^{t+\varDelta t}{\mathbf {u}}(\mathbf {r},t)\text {d}t.$$
(18)

The right-hand side integral of (18) is usually approximated using a 2nd order scheme such as the midpoint rule, resulting in an implicit equation of the form

$$ \mathbf {r}_{\text {A}}-\mathbf {r}_{\text {D}} = \varDelta t \,{\mathbf {u}}\left( \frac{\mathbf {r}_{\text {A}}+\mathbf {r}_{\text {D}}}{2},t+\frac{\varDelta t}{2}\right),$$
(19)

which is solved iteratively (for details see [27]). The accuracy with which the departure points are computed greatly influences the overall accuracy of the model, as shown in [27].
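The fixed-point iteration commonly used to solve (19) can be sketched as follows in one dimension; the wind function, the first-guess strategy and the iteration count are illustrative choices rather than any specific operational configuration:

```python
def departure_points(x_arr, u, t, dt, n_iter=3):
    """Iteratively solve Eq. (19) for the departure points of a 1D
    semi-Lagrangian step.

    x_arr : arrival points (the model grid-points), as a numpy array
    u     : callable u(x, t) returning the wind (e.g., an interpolator)
    """
    x_dep = x_arr - dt * u(x_arr, t)           # first guess
    for _ in range(n_iter):
        x_mid = 0.5 * (x_arr + x_dep)          # trajectory midpoint
        x_dep = x_arr - dt * u(x_mid, t + 0.5 * dt)
    return x_dep
```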

In addition, the method employed to interpolate the terms of Eq. (13) to the departure points also has important implications for the model accuracy. From this perspective, it is common practice in operational SISL models to use a cubic interpolation formula, most often based on tri-cubic Lagrange interpolation, followed in popularity by formulae based on cubic Hermite or cubic spline polynomials. The interpolation is directional, i.e., it is performed separately in each of the three spatial coordinates. There is an intriguing interplay between the spatial and temporal truncation errors in the SL advection method. Following the convergence analysis in [31], verified experimentally in [112] using the Navier–Stokes system, the leading-order truncation error term for a SL method solving a 1D constant-wind advection equation, with an interpolation formula of order p on a grid with constant spacing \(\varDelta x\) and a time-integration method of order k for the departure point with time-step \(\varDelta t\), is \(\mathcal {O}(\varDelta t^k + \varDelta x^{p+1}/\varDelta t)\). This suggests that reducing the time-step alone, without refining the mesh resolution, may not improve the overall solution accuracy, as it increases the contribution from the error term with \(\varDelta t\) in the denominator. However, with a shorter time-step the accuracy of the departure-point calculation improves, and a higher-order interpolation scheme improves the accuracy of spatial structures such as waves [29].
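For the interpolation step itself, the one-dimensional building block of the tri-cubic Lagrange interpolation mentioned above can be sketched as follows; the uniform grid-spacing and periodic indexing are simplifying assumptions:

```python
import numpy as np

def cubic_lagrange_interp(y, x_dep, dx):
    """Interpolate a periodic 1D grid field y (uniform spacing dx) to
    the departure points x_dep using 4-point cubic Lagrange weights."""
    n = len(y)
    s = x_dep / dx
    i = np.floor(s).astype(int)    # grid index immediately left of x_dep
    a = s - i                      # fractional displacement in [0, 1)
    w = [-a * (a - 1) * (a - 2) / 6,        # weight for point i - 1
         (a + 1) * (a - 1) * (a - 2) / 2,   # weight for point i
         -(a + 1) * a * (a - 2) / 2,        # weight for point i + 1
         (a + 1) * a * (a - 1) / 6]         # weight for point i + 2
    return sum(w[k] * y[(i + k - 1) % n] for k in range(4))
```

In an operational three-dimensional model this formula is applied direction by direction, consistent with the directional interpolation described above.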

(ii):

Semi-implicit time-discretization of forcing terms Eq. (15) is expensive and complex to solve due to its large dimension, its implicitness and, in general, its nonlinear form (the right-hand side \({\mathcal {R}}\) includes nonlinear terms). For this reason, an approach commonly used in NWP is to extract fast terms from the right-hand side and linearise them around a constant reference profile. For example, in the IFS model the right-hand forcing term is split as follows:

$$ {\mathcal {R}}=\mathcal {N}+\mathcal {L} $$

where \(\mathcal {L}\) contains the linear and linearised fast terms which are integrated implicitly and \(\mathcal {N}\) the remaining nonlinear terms \(\mathcal {N}={\mathcal {R}} - \mathcal {L}\) which are integrated explicitly. A two-time-level second order SISL discretization of (15) can be written as follows:

$$\begin{aligned} \frac{{\mathbf {y}}^{t+\varDelta t}_{\text {A}}-{\mathbf {y}}^t_{\text {D}}}{\varDelta t}=\frac{1}{2}\left( \mathcal {L}^t_{\text {D}} + \mathcal {L}^{t+\varDelta t}_{\text {A}}\right) + \frac{1}{2} \left( \mathcal {N}^{t+\varDelta t/2}_{\text {D}} + \mathcal {N}^{t+\varDelta t/2}_{\text {A}} \right) . \end{aligned}$$
(20)

The slowly varying nonlinear terms at \(t+\varDelta t/2\) can be “safely” approximated by a second order extrapolation formula such as

$$ \mathcal {N}^{t+\varDelta t/2} =\frac{3}{2} \mathcal {N}^t - \frac{1}{2}\mathcal {N}^{t-\varDelta t}$$

or alternatives such as SETTLS [49], which are less prone to generating numerical noise. The latter aspect—i.e., the numerical noise issue—is particularly relevant in the stratosphere, where large vertically stable areas occur and any small-scale oscillations appearing due to the three-time-level form of the extrapolation formula may be amplified. Using “iterative semi-implicit” schemes [11, 26, 111], in which a future model state is predicted with two (or more) iterations, the first serving as a predictor and the second as a corrector, is the most effective method for solving the noise issue; however, it is more costly due to its iterative nature.

There is considerable variation in the implementation of the SI time-stepping by different models. An alternative, iterative, approach to the SI method (20) is followed by the UK Met Office (UKMO) Unified Model, where there is no separate treatment of linear and nonlinear terms. Here, the standard off-centered semi-implicit discretization is used:

$$\begin{aligned} \frac{{\mathbf {y}}^{t+\varDelta t}_{\text {A}}-{\mathbf {y}}^t_{\text {D}}}{\varDelta t}=(1-\alpha ){\mathcal {R}}^t_{\text {D}} +\alpha {\mathcal {R}}^{t+\varDelta t}_{\text {A}}. \end{aligned}$$
(21)

The weight \(\alpha \) is either 0.5 or slightly larger, to avoid non-physical numerical oscillations (noise) which may arise due to spurious orographic resonance [76]. To tackle the implicitness of Eq. (21), an iterative method with an outer and an inner loop is used, functioning as a predictor-corrector two-time-level scheme. As stated in [66], in the outer iteration loop the departure-point locations are updated using the latest available estimates of the winds at the next time-step. In the inner loop, the nonlinear terms, together with the Coriolis terms, are evaluated using estimates of the prognostic variables obtained at the previous iteration. Further details on the iterative approach followed by the UM can be found in [66, 111]. Iterative SI schemes are expensive algorithms; however, they are used by most non-hydrostatic SISL global models since, in practice, the cheaper non-iterative schemes based on time-extrapolation become unstable when long time-steps are used.
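The stabilizing effect of the off-centering in (21) can be verified on the scalar oscillation equation \(\text {d}y/\text {d}t = i\omega y\), a standard proxy for wave propagation. The sketch below computes the amplification factor of the off-centered scheme, which equals 1 for \(\alpha = 0.5\) (neutral) and falls below 1 (damping) for \(\alpha > 0.5\), for any \(\omega \varDelta t\):

```python
def amplification(omega_dt, alpha):
    """|G| for the off-centered scheme (21) applied to dy/dt = i*w*y:
    y_new = y + dt * ((1 - alpha)*i*w*y + alpha*i*w*y_new)."""
    z = 1j * omega_dt
    return abs((1 + (1 - alpha) * z) / (1 - alpha * z))

# amplification(50.0, 0.5)  ~ 1.0  (neutral, even at large omega*dt)
# amplification(50.0, 0.55) < 1.0  (slightly damped, unconditionally stable)
```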

(iii):

Helmholtz solver Once the right-hand side of (20) has been evaluated the semi-implicit system can be solved. To avoid solving simultaneously all implicit equations in (20), it is common practice to derive a Helmholtz equation from these. The form of the Helmholtz equation depends on the type of space discretization. In spectral transform methods, such as the one used in IFS [104], the specific form of the semi-implicit system is derived from subtracting a system of equations linearised around a horizontally homogeneous reference state. The solution of this system is greatly accelerated by the separation of the horizontal and the vertical part, which matches the large anisotropy of horizontal to vertical grid dimensions prevalent in atmospheric models. In spectral transform methods, one uses the special property of the horizontal Laplacian operator in spectral space on the sphere

$$ \nabla ^2\psi _n^m = -{n(n+1) \over a^2} \psi _n^m,$$
(22)

where \(\psi \) symbolises a prognostic variable, a is the Earth radius, and (n, m) are the total and zonal wavenumbers of the spectral discretization [104]. This conveniently transforms the 3D Helmholtz problem into a 2D matrix-operator inversion with the dimension of the vertical levels only, resulting in a very cheap direct solve [75]. Even in the non-hydrostatic context, formulated in mass-based vertical coordinates [60], the solution of essentially only two coupled Helmholtz problems allows the reduction of the system in a similar way [12, 14, 115]. This technique requires a transformation from grid-point space to spectral space and vice versa at each time-step, an aspect that increases the associated computational cost, although the spectral-space computations are based on FFTs and matrix–matrix multiplications that are well suited to modern computing architectures. One disadvantage of this technique is the need for a somewhat simple reference state, which does not allow, by definition, the inclusion of horizontal variability (as would be desirable for terms involving orography). The relaxation of this constraint and some alternatives are discussed, for example, in [17].
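A minimal sketch of this direct spectral solve, for a single-variable constant-coefficient Helmholtz problem \((1 - \beta \nabla ^{2})\psi = r\), is given below; thanks to (22), each spherical-harmonic coefficient decouples and the inversion is a pointwise division. The single-level, constant-coefficient setting is a simplifying assumption (operational formulations couple the vertical levels through a small matrix inversion):

```python
import numpy as np

def solve_helmholtz_spectral(r_nm, n, a, beta):
    """Solve (1 - beta * lap) psi = r in spherical-harmonic space,
    where lap psi_n^m = -n*(n+1)/a**2 * psi_n^m, cf. Eq. (22).

    r_nm : array of spectral coefficients r_n^m
    n    : array of the corresponding total wavenumbers
    """
    eig = -n * (n + 1.0) / a**2         # Laplacian eigenvalues
    return r_nm / (1.0 - beta * eig)    # pointwise division, no matrix solve
```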

For grid-point models using finite differences, such as the UKMO Unified Model, a variable-coefficient 3D Helmholtz problem is solved using an iterative Krylov-subspace linear solver (e.g., BiCGStab, GCR(k), GMRES) [111]. This type of solver is generally more expensive, even though grid-point models do not require transformations between spectral and grid-point space, which offsets some of the extra cost. However, typically up to 80% of the computations are spent in the solver in grid-point-based semi-implicit methods, compared with 10–40% in spectral-transform methods (depending on the resolution and the number of MPI communications involved) on today's high-performance computing (HPC) architectures. For emerging and future architectures, which may heavily penalize global communication patterns as they move towards high-throughput capabilities, this is a serious concern that needs to be properly investigated and addressed.
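To make these ideas concrete, the following minimal sketches illustrate the building blocks discussed above; they are written under stated simplifying assumptions and do not reproduce any operational code. The first is a structural sketch of the two-time-level iterative semi-implicit update of Eq. (21); the helper names (`rhs`, `departure_state`, `solve_helmholtz`) are hypothetical placeholders for model-specific operations.

```python
# Structural sketch (not any operational model's code) of the iterative
# semi-implicit update of Eq. (21). The callables are hypothetical:
#   rhs(y)                    -- evaluates the full right-hand side R
#   departure_state(...)      -- recomputes departure points from the latest
#                                winds and interpolates y and R to them
#   solve_helmholtz(b, c, x0) -- inverts the implicit (Helmholtz-type) part
import numpy as np

def iterative_si_step(y, dt, rhs, solve_helmholtz, departure_state,
                      alpha=0.55, n_outer=2, n_inner=2):
    y_new = y.copy()                      # first guess: previous time level
    for _ in range(n_outer):
        # Outer loop: update departure points with the latest wind estimate.
        y_D, R_D = departure_state(y, rhs(y), y_new)
        for _ in range(n_inner):
            # Inner loop: the explicit part of Eq. (21) is frozen at the
            # latest iterate; the alpha-weighted implicit part yields a
            # Helmholtz-type problem.
            b = y_D + dt * (1.0 - alpha) * R_D
            y_new = solve_helmholtz(b, dt * alpha, y_new)
    return y_new
```

The second sketch shows how, in a spectral-transform model, the eigenvalue relation (22) decouples the semi-implicit system into one small vertical problem per spectral coefficient. The vertical-structure matrix B and the constant c (collecting time-step and reference-state factors) are assumptions introduced for illustration.

```python
import numpy as np

def spectral_helmholtz_solve(b_spec, B, c, a, n_max, n_lev):
    # Solve (I - c * eig_n * B) x_nm = b_nm for each (n, m): by Eq. (22)
    # the horizontal Laplacian is diagonal in spectral space, so each total
    # wavenumber n yields an independent n_lev x n_lev vertical system.
    x_spec = {}
    for n in range(n_max + 1):
        eig = -n * (n + 1) / a**2                     # eigenvalue, Eq. (22)
        A_inv = np.linalg.inv(np.eye(n_lev) - c * eig * B)   # precomputable
        for m in range(-n, n + 1):
            x_spec[(n, m)] = A_inv @ b_spec[(n, m)]
    return x_spec

# Illustrative usage with random data:
a, c, n_max, n_lev = 6.371e6, 1.0e9, 10, 5
B = np.eye(n_lev) + 0.1 * np.random.rand(n_lev, n_lev)
b_spec = {(n, m): np.random.rand(n_lev)
          for n in range(n_max + 1) for m in range(-n, n + 1)}
x_spec = spectral_helmholtz_solve(b_spec, B, c, a, n_max, n_lev)
```

The third sketch illustrates the grid-point alternative: a matrix-free Krylov solve of a variable-coefficient 3D Helmholtz problem, here with a 7-point Laplacian stencil on a small periodic grid (the coefficient field, grid and boundary conditions are illustrative, not those of any specific model).

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, bicgstab

nx = ny = nz = 16
shape = (nx, ny, nz)
N = nx * ny * nz
coeff = 1.0 + 0.1 * np.random.rand(*shape)     # spatially varying coefficient

def helmholtz_apply(v):
    # Apply (I - coeff * Laplacian) with a nearest-neighbour 7-point stencil
    # (periodic boundaries via np.roll, for simplicity).
    u = v.reshape(shape)
    lap = -6.0 * u
    for ax in range(3):
        lap += np.roll(u, 1, axis=ax) + np.roll(u, -1, axis=ax)
    return (u - coeff * lap).ravel()

A = LinearOperator((N, N), matvec=helmholtz_apply)
x, info = bicgstab(A, np.random.rand(N))       # info == 0 on convergence
```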

3.3 Summary

In this section, we categorized time-integration schemes into two classes: Eulerian-based (EBTI) and path-based (PBTI). The former discretizes the original PDE problem in space first, thus obtaining a system of ODEs, and subsequently in time through a suitable time-integration strategy. The PBTI class, instead, solves the PDE problem in a single step, where the advection term is absorbed into the path (or material) derivative and the right-hand side is formed by forcing terms only. In this case, the system of PDEs can be seen as a physical constraint on the path that can be followed to link two states in the four-dimensional continuum constituted by space and time.

For each of the two categories, EBTI and PBTI, we outlined the most prominent time-integration schemes adopted in the NWP and climate communities: SE and HEVI for the EBTI class, and the SISL approach for the PBTI class. The latter has been the most widely adopted in the past few decades, thanks to the hydrostatic approximation ubiquitously used in global weather and climate models. Indeed, PBTI strategies, and SISL in particular, were extremely competitive given the large time-steps they allow and their highly favorable dispersion properties, which yield the correct representation of wave-like solutions, e.g., Rossby and gravity waves. EBTI strategies are instead now emerging as a potential alternative to PBTI in non-hydrostatic models, mainly because they can be constructed to 'filter' the fast and atmospherically irrelevant acoustic modes that propagate vertically. In fact, SE and HEVI schemes, which had been mainly used in the context of limited-area models, are now being taken into consideration in global weather models, as they seem to represent an attractive compromise between solution accuracy, time- and energy-to-solution, and reliability. In addition, they can address the non-conservation issues typical of PBTI-based approaches, a feature that is particularly relevant for climate simulations. Some additional strategies beyond HEVI, SE and SISL are also under investigation for global NWP and climate simulations, namely IMEX schemes, fully implicit methods and conservative semi-Lagrangian schemes, the latter addressing the conservation issues of traditional SISL schemes. These additional strategies will be briefly discussed in Sect. 5.

Note that while the time-integration approaches described in this section might differ in their implementation details from one model to another, the general concepts and properties that motivate their use hold true across all the models adopting each strategy. In the next section (Sect. 4), we will highlight the time-stepping strategies employed by the main operational global NWP and climate centres and explain in more detail why they were selected. In addition, we will outline the implications these choices may have in the context of the changing hardware and weather-modeling landscape. Finally, we will introduce some of the (several) projects undertaken within the weather and climate industry to address the computational challenges of the coming decades.

4 Overview of Operational Time-Integration Strategies Adopted by the NWP and Climate Centers

The time-stepping practices adopted by the leading NWP and climate operational centers are reported in Table 1, where we also indicate the model name, the application type, the set of equations solved (i.e., either hydrostatic (H) or non-hydrostatic (NH)), the specific time-integration scheme used and its class (i.e., either EBTI or PBTI), the approximate date of operational activity or documented performance, and the country of origin. All these models vary widely with respect to the choices of physical parameterization and spatial discretization; here, however, we concentrate on the time-integration schemes.

Table 1 Time-stepping strategies adopted by the main operational global NWP and climate models as of 2017

From this point of view, Table 1 clearly shows that nearly all operational NWP and climate models use PBTI (SISL) approaches and that the majority use the hydrostatic approximation (see the top part of the table: apart from the German ICON model, which uses a non-hydrostatic formulation and a HEVI scheme, all the others use PBTI schemes, and SISL in particular). These choices are dictated by the desire to maximize time-to-solution performance, which has been (and still is) one of the main objectives in the industry. It is also worth noting that, despite the continuous upgrades of the operational models (see the column "Date" in the table), all of them keep adopting SISL, even when transitioning to non-hydrostatic equation sets. This might be an indicator of the inevitable algorithmic inertia within the weather and climate industry, due to the strict operational constraints and to the relatively complex and large code frameworks developed over several decades.

On the other hand, the research models (bottom part of Table 1, denoted with RD) are predominantly non-hydrostatic and use EBTI approaches with substantially smaller permitted time-steps. The use of EBTI schemes is justified by their better parallel efficiency (compared to SISL), which may compensate for the shorter forward-in-time stepping of the dynamical core. Indeed, there seems to be a trend in the weather and climate industry to favour EBTI approaches over PBTI for next-generation non-hydrostatic models, as briefly mentioned in Sect. 3. The associated spatial discretization of the derivatives arising on the right-hand side of each time-stepping strategy can assume various forms, including global spherical-harmonics bases, finite differences, finite volumes, and spectral-element methods (e.g., continuous and discontinuous Galerkin).

Given what we discussed above, it should be clear that the choice of the time-stepping strategy is strongly influenced by the choice of equations, as most non-hydrostatic equation sets require numerical control of vertical acoustic wave propagation, whereas hydrostatic models filter these modes a priori. The stiffness of the problem arising from the fast-propagating acoustic waves needs to be handled carefully; options include SE methods, HEVI schemes, and, more recently, IMEX methods (discussed in Sect. 5). In essence, many of the approaches originally used in limited-area modeling have been applied to non-hydrostatic global models. It should also be noted that these schemes are favourable in terms of parallel communication patterns: they require a reduced amount of data to be transferred at each time-step, being mostly nearest-neighbour communication algorithms, and have a higher throughput (flops/communication) than PBTI approaches. This attractive feature is particularly suited to emerging computing architectures, which are commonly communication-bound. However, the larger number of time-steps (due to the shorter permitted time-steps) can counterbalance this advantage in favour of larger-time-step algorithms, including SISL.

Overall, the suitability of a given time-stepping strategy to emerging hardware architectures is of crucial importance for planning a sustainable path to future high-resolution weather and climate prediction systems, given the pressing need to mitigate the rapidly increasing running costs of operational centres. In fact, the use of emerging many-core computing architectures (e.g., GPUs and MICs) helps reduce the power consumption of the overall HPC centre where the simulations are run (the power required grows superlinearly with the clock speed of the processor; therefore, many-core technologies, which have a lower clock rate, mitigate the growth in energy consumption; see, e.g., [9]). With the increased model resolutions envisioned for the next few decades, another critical aspect is the efficient treatment of a massive amount of gridded data (the so-called data-tsunami problem). Here too the adopted time-stepping strategy has important implications, since a larger number of time-steps could imply a larger amount of gridded data produced. This has direct consequences on the costs of maintaining and powering the servers where the data are stored, and might affect the efficient post-processing and dissemination (to clients or member states) of the results.

Indeed, these issues are being closely evaluated by several weather and climate agencies through various projects, including the European Exascale Software Initiative (EESI), Energy-efficient SCalable Algorithms for weather Prediction at Exascale (ESCAPE) and Excellence in Simulation of Weather And Climate in Europe (ESiWACE). All these initiatives share the general recommendation to pursue research into efficient numerical methods and solvers capable of coping with exabyte data sets and achieving exascale computational efficiency on next-generation HPC systems.

The efforts being spent to address the computational-efficiency issues of weather and climate algorithms run in parallel with the development of high-resolution medium-range global and nested NWP models, as many government agencies are extensively researching new candidates for their operational services. In fact, these new models must make a quantum leap forward in forecast skill by efficiently exploiting the latest developments in data assimilation (e.g., 4D-En-Var [45]), scale-aware physical parameterizations, and, as just mentioned, modern HPC technologies (e.g., GPU/MIC). From this perspective, several international and inter-agency activities have been organized to select the best candidates for next-generation models. For example, the NOAA High Impact Weather Prediction Project (HIWPP) seeks to improve hydrostatic-scale global modeling systems and demonstrate their skill at resolutions down to \(\sim \) 10 km, while also accelerating the development and evaluation of higher-resolution, non-hydrostatic global modeling systems at cloud-resolving (\(\sim \) 3 km) scales. As part of the Research to Operations (R2O) Initiative, the NOAA/National Weather Service (NWS) led the inter-agency effort to develop a unified Next Generation Global Prediction System (NGGPS) for 0–100-day predictions, to be used for the next 10–20 years. The new prediction system, designed to upgrade the current operational system (GFS), has to be adaptable to, and scalable on, evolving HPC architectures; the research and development efforts included the U.S. Navy, NOAA, NCAR and university partners. A similar effort has been undertaken in Japan, where the Japan Meteorological Agency (JMA) has been exploring its own next-generation non-hydrostatic global NWP model since 2009. The UK Met Office is also pursuing new modeling efforts through the development of a scalable dynamical core, "GungHo" (Globally Uniform, Next Generation, Highly Optimised) [95]. Dynamical cores are also the subject of many intercomparison projects, such as the Dynamical Core Model Intercomparison Project (DCMIP) [101], which in a few editions has hosted more than 20 different models. In addition to the current operational models, several research models may offer good candidates for next-generation operational systems, replacing the current state-of-the-art solutions.

The overarching objective is to increase the computational efficiency of the models, thereby reducing the operating costs (e.g., energy-to-solution) and the time-to-solution, in order to increase the accuracy and resolution of the models at a sustainable economic cost. This aspect is intimately related to the overall co-design of the algorithms underlying weather and climate models, in which current and emerging hardware and the adopted time-integration method will play a major role. Specifically, the amount of data communicated during each simulation, the length of the permitted time-step and the percentage of peak performance achieved on a given computing machine are direct consequences of the chosen numerical discretization, and of the time-integration in particular. Emerging and future time-integration strategies should take all these factors into account to provide an effective path to sustainable NWP and climate simulations.

5 Emerging Alternatives

For both EBTI and PBTI, we outline some emerging alternatives that are being considered within the community, namely IMEX and fully implicit schemes for the EBTI class, and conservative semi-Lagrangian schemes for the PBTI class. The investigation of these time-stepping schemes for weather and climate is aligned with the general guidelines discussed in Sect. 4, where the co-design of software and hardware, energy-to-solution and time-to-solution are the key drivers. In fact, in terms of EBTI strategies, there seems to be a trend to explore (semi-)implicit solutions, which can ultimately improve the time-to-solution but require the development of efficient algorithms to achieve an increased percentage of peak performance (on a given machine) and better scalability properties. On the other hand, for PBTI strategies, efforts have been spent on developing conservative SISL schemes, thereby addressing concerns about the lack of conservation, and hence accuracy, in long-range weather and climate simulations.

5.1 EBTI Strategies

Both strategies described below aim to improve the time-to-solution of weather and climate simulations. In addition, if coupled with compact-stencil spatial discretizations, such as finite-volume and spectral-element methods, they can be used efficiently on emerging computing architectures, thanks to their reduced parallel communication costs. However, the solution procedure for both requires iterative elliptic solvers and iterative Newton-type methods, with all the associated numerical complexities that must be addressed to make them a suitable alternative for operational NWP and climate.

5.1.1 Implicit–Explicit (IMEX) Methods

Although not as prevalent as HEVI schemes, IMEX methods are gaining interest in many fields (including the geosciences) for evolving PDEs forward in time. We can outline IMEX methods starting from the scale-separated problem described earlier:

$$\begin{aligned} \frac{\partial {\mathbf {y}}}{\partial t}={\mathcal {R}}_{g}(t,{\mathbf {y}})+{\mathcal {R}}_{f}(t,{\mathbf {y}}), \;\; \left| \left| {\mathcal {R}}_{f}\right| \right| \gg \left| \left| {\mathcal {R}}_{g}\right| \right| , \end{aligned}$$

where, for global atmospheric models, the scale separation arises from the physical processes contained within \({\mathcal {R}}_f\) and \({\mathcal {R}}_g\). For instance, \({\mathcal {R}}_{f}\) contains a linearized form of not only the vertically propagating acoustic waves but also the horizontally propagating ones, while \({\mathcal {R}}_{g}\) contains the remainder of the nonlinear processes. We then apply double Butcher tableaux to this form of the equations, using the explicit tableau for \({\mathcal {R}}_g\) and the implicit tableau for \({\mathcal {R}}_f\); this results in exactly the form given in Eqs. (9)–(10). The difference between the IMEX method described here and the HEVI scheme described previously is that the resulting linear implicit problem is now fully three-dimensional, whereas in the HEVI case it is only one-dimensional, along the vertical direction. For the IMEX method, this means that the solution procedure requires iterative (elliptic) solvers, usually handled via matrix-free approaches, since the resulting system of equations is too large to store in memory; for HEVI methods, the resulting system is quite small and hence direct solvers are appropriate. Note that the HEVI and IMEX methods can be written within the same unified temporal discretization, as described in [40, 106], where the HEVI scheme is denoted as the 1d-IMEX method, meaning that HEVI is simply an IMEX method that has been partitioned to be implicit in only one of the spatial directions. Further note that whereas HEVI schemes only circumvent the CFL condition related to the vertically propagating acoustic waves, IMEX methods circumvent the CFL condition related to all acoustic waves; however, IMEX methods must still adhere to the explicit CFL condition due to the nonlinear wind field. IMEX methods become competitive with HEVI methods only when the vertical-to-horizontal grid aspect ratio, \(z_{h}/s_{h}\), approaches unity, i.e., when the stiffness of the problem is no longer dominated by the vertical direction due to the anisotropy of the grid.
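As a deliberately simplified illustration of the splitting, the sketch below applies a first-order IMEX scheme (implicit-explicit Euler) to the scale-separated problem above, assuming the fast part is a small linear operator \(L_f\) amenable to a direct solve; in a realistic 3D model this direct solve would be replaced by the iterative, matrix-free elliptic solvers just mentioned. Both \(L_f\) and \(R_g\) are illustrative stand-ins.

```python
import numpy as np

def imex_euler_step(y, dt, L_f, R_g):
    # Explicit Euler on the slow nonlinear part R_g, implicit Euler on the
    # fast linear part L_f: (I - dt*L_f) y^{n+1} = y^n + dt*R_g(y^n).
    rhs = y + dt * R_g(y)
    return np.linalg.solve(np.eye(y.size) - dt * L_f, rhs)

# Illustrative usage: stiff linear decay plus a mild nonlinearity.
n = 8
L_f = -50.0 * np.eye(n)                  # fast (stiff) linear operator
R_g = lambda y: 0.1 * np.sin(y)          # slow nonlinear forcing
y = np.ones(n)
for _ in range(10):
    y = imex_euler_step(y, dt=0.1, L_f=L_f, R_g=R_g)
```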

5.1.2 Fully-Implicit Methods

While the most common option remains HEVI schemes, many research groups are exploring variants of fully-implicit methods. The fully-implicit approach begins in a similar fashion to that for the explicit problem

$$ \frac{\partial {\mathbf {y}}}{\partial t}={\mathcal {R}}(t,{\mathbf {y}}),$$

where \({\mathcal {R}}\) contains the full nonlinear right-hand-side operator. Next, we apply the implicit Butcher tableau (using the same notation as previously for the IMEX methods) to \({\mathcal {R}}\), resulting in the following fully-discrete problem:

$$\begin{aligned} {\mathbf {Y}}^{(j)}=&\;{\mathbf {y}}^n + \varDelta t\sum _{\ell =1}^{j}\alpha _{j\ell }{\mathcal {R}}\left( t^n+c_{\ell }\varDelta t,{\mathbf {Y}}^{(\ell )}\right) \end{aligned}$$
(23)
$$\begin{aligned} {\mathbf {y}}^{n+1}=&\;{\mathbf {y}}^{n} + \varDelta t\sum _{j=1}^{\nu }b_{j}{\mathcal {R}}\left( t^n + c_j\varDelta t,{\mathbf {Y}}^{(j)}\right) , \end{aligned}$$
(24)

which looks conceptually simpler than the IMEX problem given by Eqs. (9)–(10). However, the complexity enters through the operator \({\mathcal {R}}\), which is a fully three-dimensional nonlinear function of \({\mathbf {y}}\). Consequently, we must solve this problem iteratively using Newton-type methods. To simplify the exposition, let us assume that we are using a first-order implicit RK method (i.e., backward Euler, which is strong-stability preserving and highly stable) as follows:

$$ {\mathbf {y}}^{n+1}= \;{\mathbf {y}}^n + \varDelta t {\mathcal {R}}\left( t^{n+1},{\mathbf {y}}^{n+1}\right).$$
(25)

Before employing Newton’s method, we first write Eq. (25) as the functional

$$ \mathbf {F}^{n+1}= {\mathbf {y}}^{n+1} - \;{\mathbf {y}}^n - \varDelta t {\mathcal {R}}\left( t^{n+1},{\mathbf {y}}^{n+1}\right) \equiv 0$$

where we note that a first-order Taylor series expansion (i.e., a linearization about \({\mathbf {y}}^{n}\)) yields

$$ \mathbf {F}^{n+1}= \mathbf {F}^{n} + \frac{ \partial \mathbf {F}^n }{\partial {\mathbf {y}} } \left( {\mathbf {y}}^{n+1} - {\mathbf {y}}^n \right) \equiv 0$$

and, after rearranging, results in the classical Newton’s method

$$ \frac{ \partial \mathbf {F}^n }{\partial {\mathbf {y}} } \left( {\mathbf {y}}^{n+1} - {\mathbf {y}}^n \right) = - \mathbf {F}^{n}. $$
(26)

Note that Eq. (26) is solved iteratively whereby we replace \(n \rightarrow k\) where k is the iteration counter such that \(k=0\) implies starting from the previous time-step (\(t^n\)) as follows:

$$\begin{aligned} \frac{ \partial \mathbf {F}^{(k)} }{\partial {\mathbf {y}} } \varDelta {\mathbf {y}} = - \mathbf {F}^{(k)}, \;\; \;{\mathbf {y}}^{(0)}={\mathbf {y}}^n \end{aligned}$$
(27)

with \(\varDelta {\mathbf {y}}=\left( {\mathbf {y}}^{(k+1)} - {\mathbf {y}}^{(k)} \right) \). Equation (27) represents a linear, fully three-dimensional matrix problem that must be solved until an acceptable stopping criterion is reached (e.g., \(\parallel {\mathbf {y}}^{(k+1)}-{\mathbf {y}}^{(k)} \parallel _p < \epsilon _{stop}\), where p denotes a selected norm). In classical Newton-type methods, one of the largest costs is the formation of the Jacobian matrix \(\mathbf {J}=\frac{ \partial \mathbf {F} }{\partial {\mathbf {y}} }\). However, this cost can be substantially mitigated by Jacobian-free Newton–Krylov methods (see, e.g., [58]), whereby we recognize that the action of the Jacobian on a vector can be approximated as follows

$$\begin{aligned} \mathbf {J}^{(k)}\varDelta {\mathbf {y}}=\frac{ \mathbf {F}\left( {\mathbf {y}}^{(k)} + \epsilon \varDelta {\mathbf {y}}\right) - \mathbf {F}\left( {\mathbf {y}}^{(k)}\right) }{ \epsilon } \end{aligned}$$

(where \(\epsilon \) is a small perturbation parameter, typically of the order of the square root of machine precision) and direct substitution into Eq. (27) leads to

$$\begin{aligned} \frac{ \mathbf {F}\left( {\mathbf {y}}^{(k)} + \epsilon \varDelta {\mathbf {y}}\right) - \mathbf {F}\left( {\mathbf {y}}^{(k)}\right) }{ \epsilon }= - \mathbf {F}^{(k)}, \end{aligned}$$
(28)

which we can write in the residual form

$$\begin{aligned} \mathbf {Res}=\frac{ \mathbf {F}\left( {\mathbf {y}}^{(k)} + \epsilon \varDelta {\mathbf {y}}\right) - \mathbf {F}\left( {\mathbf {y}}^{(k)}\right) }{ \epsilon } + \mathbf {F}\left( {\mathbf {y}}^{(k)}\right) . \end{aligned}$$
(29)

Equation (29) is a linear, fully three-dimensional problem that has been written in a matrix-free (Jacobian-free) form, since we only need to evaluate the vectors \(\mathbf {F}\) and then solve the residual problem approximately by any Krylov subspace method (e.g., GMRES, BiCGStab). One of the advantages of the Jacobian-free Newton–Krylov (JFNK) method described here is that we can exploit the iterative nature of both the Krylov and Newton methods: typically, the Krylov stopping criterion need not be as stringent as it would be for an IMEX method, as long as the final solution of the Newton solver satisfies certain physical conditions.
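To illustrate the procedure end-to-end, the following minimal sketch applies JFNK to the backward-Euler step of Eq. (25), using the matrix-free Jacobian-vector product of Eq. (28); the right-hand side R below is a hypothetical stand-in for the model operator, not that of any specific model.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def R(y):
    # Illustrative nonlinear right-hand side: discrete diffusion plus a
    # cubic damping term, periodic in space via np.roll.
    return np.roll(y, 1) - 2.0 * y + np.roll(y, -1) - y**3

def jfnk_backward_euler(y_n, dt, tol=1e-10, max_newton=20):
    F = lambda y: y - y_n - dt * R(y)            # residual of Eq. (25)
    y = y_n.copy()                               # k = 0: start from t^n
    eps = np.sqrt(np.finfo(float).eps)           # perturbation size
    for _ in range(max_newton):
        Fy = F(y)
        if np.linalg.norm(Fy) < tol:             # Newton stopping criterion
            break
        # Matrix-free Jacobian-vector product, cf. Eq. (28):
        Jv = lambda v, y=y, Fy=Fy: (F(y + eps * v) - Fy) / eps
        J = LinearOperator((y.size, y.size), matvec=Jv)
        dy, _ = gmres(J, -Fy)                    # inner Krylov solve
        y = y + dy
    return y

y_next = jfnk_backward_euler(np.linspace(0.0, 1.0, 50), dt=0.1)
```

In an operational setting the inner Krylov solve would be preconditioned and given a relaxed tolerance, in line with the loose inner-solve strategy discussed above.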

Fully-implicit methods are unconditionally stable, like the PBTI methods described previously. However, this flexibility to take a time-step of any length (restricted only by accuracy considerations) is in practice offset by the prohibitive cost of the iterative solution of both the inner (Krylov) and outer (Newton) loops of the JFNK method. The condition number of the linear Krylov problem grows with the time-step size, so taking a large time-step translates into an increase in the number of Krylov iterations. For this reason, it is especially important to choose the proper Krylov method (e.g., the cost of GMRES increases quadratically with the number of iterations/Krylov vectors). Preconditioners become all the more important for this class of time-integration methods if one wishes to build strategies that are competitive under the time-to-solution metric. In fact, at the past two Supercomputing conferences, the Gordon Bell prize was awarded to geoscience models using fully-implicit time-integrators: in 2015 for the simulation of the Earth's mantle [78] and in 2016 for the (dry) simulation of atmospheric flows [113].

5.2 PBTI Strategies

The main drawback of SISL schemes is their inability to conserve quantities such as mass and scalar tracers. Conservative semi-Lagrangian methods exist and are being developed; their main issue is that they usually require a computationally demanding re-meshing at each time-step, due to the time-varying control volumes needed to maintain the exactly conservative nature of the method. Therefore, their use in the context of weather and climate is limited and currently under investigation.

5.2.1 Conservative Semi-Lagrangian Methods

In Eq. (11), we described the classical semi-Lagrangian methods that are typically used in operational NWP and climate models, and we noted that they do not formally conserve quantities such as mass, although in practice mass is conserved to within an acceptable level. However, conservative semi-Lagrangian methods do exist, and we describe a specific class of them below.

Instead of writing the original problem in Eq. (11) as follows

$$ \frac{\partial {\mathbf {y}}}{\partial t} + ({\mathbf {u}}\cdot \nabla ) {\mathbf {y}} = {\mathcal {R}}({\mathbf {y}}, t),$$
(30)

we can write it in conservation form, that is

$$\begin{aligned} \frac{\partial {\mathbf {Y}}}{\partial t} + \nabla \cdot \left( {\mathbf {Y}} {\mathbf {u}} \right) = {\mathcal {R}}({\mathbf {Y}}, t), \, \, {\mathbf {Y}}={\mathbf {y}} \rho \end{aligned}$$
(31)

where \(\rho \) is a mass variable (e.g., density) and \({\mathbf {Y}}\) is now a conservation variable.

Integrating Eq. (31) in space yields

$$\begin{aligned} \int _{\varOmega _e} \left( \frac{\partial {\mathbf {Y}}}{\partial t} + \nabla \cdot \left( {\mathbf {Y}} {\mathbf {u}} \right) \right) d \varOmega _{e} =\int _{\varOmega _e} {\mathcal {R}}({\mathbf {Y}}, t) \, d\varOmega _e \end{aligned}$$
(32)

where \(\varOmega _{e}\) is a control volume. Using the Reynolds transport theorem allows us to rewrite Eq. (32) as follows

$$\begin{aligned} \frac{d}{dt} \int _{\varOmega _e(t)} {\mathbf {Y}} \, d\varOmega _e = \int _{\varOmega _e(t)} {\mathcal {R}}({\mathbf {Y}}, t) \, d\varOmega _{e}, \end{aligned}$$
(33)

where \(\varOmega _e(t)\) indicates that the shape of the control volume changes in time, thereby revealing the Lagrangian nature of the approach. This approach is exactly conservative provided that the integrals over the departure control volumes are computed exactly (see, e.g., [37, 39, 61, 117]). The difficulty with this approach is that it requires tracking the control volumes back to the departure points, which essentially amounts to constructing the mesh defined by the departure points. A minimal one-dimensional illustration of this remapping is given below.
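The following sketch, assuming a piecewise-constant reconstruction, a constant wind and periodic boundaries (all simplifications introduced here for illustration), advances Eq. (31) with \({\mathcal {R}}=0\) in one dimension by remapping cell masses from the departure intervals; total mass is preserved exactly by construction.

```python
import numpy as np

def cumulative_mass(Y, dx, x):
    # Integral of the piecewise-constant field Y from 0 to x, extended
    # periodically (full periods contribute the total mass).
    L = Y.size * dx
    total = Y.sum() * dx
    wraps, xr = np.divmod(x, L)
    k = int(xr // dx)                        # cell containing the point xr
    partial = Y[:k].sum() * dx + Y[k % Y.size] * (xr - k * dx)
    return wraps * total + partial

def conservative_sl_step(Y, u, dt, dx):
    # Trace the arrival cell edges back to their departure positions and
    # assign each arrival cell the mass contained in its departure interval.
    edges = np.arange(Y.size + 1) * dx       # arrival cell edges
    dep = edges - u * dt                     # departure edge positions
    masses = np.array([cumulative_mass(Y, dx, dep[i + 1])
                       - cumulative_mass(Y, dx, dep[i])
                       for i in range(Y.size)])
    return masses / dx                       # new cell averages

Y0 = np.exp(-((np.arange(64) + 0.5) * 0.1 - 3.2) ** 2)
Y1 = conservative_sl_step(Y0, u=2.5, dt=0.3, dx=0.1)   # advective CFL = 7.5
assert np.isclose(Y0.sum(), Y1.sum())                  # mass conserved exactly
```

Note that, unlike the interpolation-based SISL update, the remap transfers mass rather than point values, which is what delivers the exact conservation; the price, as noted above, is the reconstruction of the departure volumes, which becomes much more involved on deformed multi-dimensional meshes.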

6 Discussion and Concluding Remarks

There are two key aspects that the weather and climate industry needs to face in the coming years in terms of time-integration strategies: (i) the continuous increase in spatial resolution, which is moving the equations of choice towards non-hydrostatic formulations, such that the vertical acoustic modes are not filtered a priori and a separation of horizontal and vertical motions is no longer readily achieved, placing substantially increased constraints on the time-stepping methods; (ii) the emerging computing technologies, which are significantly changing the paradigms traditionally adopted in parallel programming, thus demanding a review of algorithms and numerical solutions to maintain, and possibly improve, computational efficiency on next-generation HPC systems. In addition to these two points, the overall approach to the time-integration of weather and climate models should consider the operational constraints, time-to-solution (e.g., the requirement to deliver a 10-day global forecast in 1 h of real time) and cost-to-solution (usually translated into energy-to-solution), as well as the accuracy and robustness of the entire simulation framework.

Taking into account all these factors, it is clear that highly efficient time-stepping algorithms are required to allow operational NWP centers to complete extensive simulations with limited computer resources and to satisfy the strict operational bounds. For this reason, thanks to the good efficiency of semi-Lagrangian advection schemes at high advective Courant numbers, SISL methods have been at the heart of the most successful operational NWP and climate systems, e.g., IFS, UM, GSM and GFS. SISL-based schemes guarantee boundedness of the solution and unconditional stability [96], which, compared to explicit schemes, permits a relatively large time-step and a very competitive time-to-solution performance [24, 73, 105, 111, 114].

The efficiency and robustness achieved with SISL are not easily matched by alternative choices [110], although the better scalability that can be exploited on future HPC architectures, due to less reliance on communication beyond nearest neighbours, will work in favor of techniques with compact stencils, e.g., finite-volume (FV) and spectral-element methods (SEM), coupled with EBTI-based approaches. The additional problem of the convergence of meridians at the poles in classical latitude-longitude grids, which results in very small zonal grid distances (shorter than the physical distance between compute processors in some of today's very high resolution models) and, thus, extremely small time-steps, has been widely addressed: reduced quasi-homogeneous grids increase the stability of the solution, while icosahedral (cf. ICON, MPAS, NICAM) or cubed-sphere (cf. FV3, CAM-SE, NUMA) grids improve the computational efficiency [62, 65, 72]. Despite these improvements, compact-stencil EBTI-based techniques need to overcome additional issues in order to become competitive with PBTI schemes, especially in terms of time-to-solution. From this perspective, the development of efficient parallel preconditioners and their co-design with the underlying hardware is key, especially for IMEX, semi-implicit and fully-implicit methods. The use of compact-stencil EBTI techniques, including FV and SEM, also allows local grid-refinement, which, in conjunction with local sub-time-stepping, can provide improved resolution in the vicinity of steep topographic slopes at the lower boundary of models (up to 70 degrees in high-resolution global models); such features can impose severe restrictions on the time-step and subsequently undermine numerical stability [116]. Also, the conservation of important quantities, e.g., mass and scalar tracers, which is of critical importance for climate simulations, is favored in EBTI schemes compared to classical SISL (although note the recent developments in conservative semi-Lagrangian schemes outlined in Sect. 5.2).

Given all the aspects discussed, it is clear that the choice of the time-integration scheme is a constrained, multifold problem, for which a single, unified answer might not exist. However, the most promising future directions in terms of time-stepping should involve three key points:

(a) to overcome the bottlenecks of today's highly efficient SISL schemes and the associated cost of the solver, by overlapping communications and computations [70, 81], and to overcome the accuracy drawbacks related to the large time-step choice while still correctly simulating all relevant wave dispersion relations; promising approaches for satisfying the latter condition are exponential time integrators [36, 47] (a minimal sketch of one such integrator is given after this list);

(b) to overcome the overly restrictive time-step limitations of EBTI schemes combined with highly scalable horizontal discretizations, either through horizontal/vertical splitting (HEVI) [2, 8, 40] or through combining SISL PBTI methods with discontinuous Galerkin (DG) discretizations [99]; and

(c) to further the scalability and the adaptation of algorithms to emerging HPC architectures, involving SE [32] or fully-implicit time-stepping approaches [113], and to exploit additional parallelism through time-parallel algorithms [33].
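With reference to point (a), exponential time integrators advance the stiff linear part of the equations exactly through the matrix exponential. Below is a minimal sketch of the first-order exponential (ETD1) scheme for \(y' = Ly + N(y)\), assuming a hypothetical invertible linear operator L small enough to exponentiate directly; operational implementations would instead approximate the action of these matrix functions iteratively. Both L and N are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def etd1_step(y, dt, L, N):
    # y^{n+1} = e^{dt L} y^n + dt * phi_1(dt L) N(y^n),
    # with phi_1(z) = (e^z - 1)/z; here L is assumed invertible.
    E = expm(dt * L)
    phi1 = np.linalg.solve(dt * L, E - np.eye(L.shape[0]))
    return E @ y + dt * (phi1 @ N(y))

# Illustrative usage: stiff linear decay plus a mild nonlinearity.
n = 8
L = -40.0 * np.eye(n)
N = lambda y: 0.1 * y * (1.0 - y)
y = 0.5 * np.ones(n)
for _ in range(10):
    y = etd1_step(y, dt=0.05, L=L, N=N)
```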

The success of any of these approaches may depend on a closer integration of software and hardware development (co-design), with dedicated hardware features accelerating specific aspects of a given algorithm. From the algorithm developer's perspective, expressing algorithms through Domain Specific Language (DSL) concepts can lead to enhanced flexibility and modularity [25], thereby shortening the deployment to potentially novel and disruptive hardware technologies (e.g., optical processors and quantum computing). Also, hardware development is currently (and will be, in the next decade) strongly driven by the requirements of artificial intelligence. From this point of view, efforts are being undertaken to adapt solution algorithms to this new programming paradigm. This implies the use of neural networks whose weights are trained outside critical time windows and subsequently applied to problems (including PDEs) described by approximate models constructed from those weights (see, e.g., [77]). The weather and climate community should be aware of the factors driving hardware vendors and should facilitate the co-design of the underlying algorithms, in order to exploit new-generation computing machines at their best. This, in turn, might help increase the percentage of peak machine performance achieved by the simulations, and might significantly improve time-to- and cost-to-solution, while increasing the resolution and accuracy of the models.