1 Introduction

Offshore development and coastal shipping require regional configurations of numerical wind wave models to reproduce historical extreme events and predict potential hazards. To obtain forecasts and hindcasts of the desired quality, suitable physical parameters of the models should be identified for the specific simulation conditions.

Calibration of an ocean wind wave model involves fitting the simulation results to in-situ and satellite wave measurements. Its purpose is to identify the set of physical parameters that minimises the discrepancy between the model and the observations.

However, calibrating the model manually is a sophisticated task even with the involvement of metocean experts. Modern wind wave models are computationally intensive, and each simulation run can take hours. Also, the dramatically low spatio-temporal coverage of the available historical wave measurements and the low quality of atmospheric reanalyses in some regions (such as the Arctic seas, in particular the Kara Sea region described in the paper) make it hard to validate the parameter set reliably. Parameters obtained with minimal discrepancy can be very specialised if they are over-fitted to a small number of observed data points, and can actually decrease the quality of long-term simulation results in non-observed locations or time ranges [3].

There are many well-known optimisation approaches that can be applied to automate parameter tuning for environmental models [8]. Despite this, a task-specific robust evolutionary algorithm is proposed in this paper. It allows making reliable calibration decisions in situations with high environmental uncertainty and aims to ensure that a tolerable solution is identified.

At the moment, the modern atmospheric reanalysis still has quality issues in the Arctic region [7]. We proposed an algorithm that establishes artificial diversity for wind velocity fields. It was used to generate the probabilistic ensemble of input wind fields to take the impact of the surface forcing uncertainty into account. Then, the multi-objective fitness function was used to achieve the trade-off between robustness and performance of the optimised model.

We conducted a set of experiments to verify the effectiveness of the proposed approach against the baseline SPEA2 algorithm, using the Kara Sea domain and the SWAN (Simulating WAves Nearshore) [2] model as the case study. Nine spatially distributed points were chosen to analyse the performance and robustness of the model configurations obtained after calibration in one-month training runs of the model. Several configurations with different subsets of calibration and validation points were compared to estimate the statistical metrics of optimisation effectiveness for both algorithms.

This paper is structured as follows. Section 2 describes the problem statement and mathematical formalisation of the robust optimisation task. Section 3 provides an overview of various calibration approaches and their applicability to the problem. Section 4 contains a detailed description of the baseline SPEA2 algorithm and the proposed robust algorithm. Section 5 is dedicated to the experimental studies (model configuration, datasets, results and metrics). Section 6 summarises the obtained results and highlights the key findings.

2 Problem Statement

As noted in the introduction, the coverage of observed met-ocean data (especially oceanic observations) is extremely sparse, although reliable information about met-ocean characteristics is needed in many regions (e.g. the Arctic seas). For this reason, in recent decades it has become common practice to obtain information about met-ocean events and processes from forecasting or hindcasting (retrospective) simulations with numerical hydrodynamic models. Nevertheless, to solve such a task the numerical models should be fitted (through model parameters) to the specific water area. Given the few spatial points and the small size of the observational datasets, there is a serious risk of overfitting, when the model fits specific features of the observed data instead of the common features of the target region. Describing a solution to this problem is the main goal of this article.

Hydrodynamic model fitting through the tuning of model parameters (or model calibration) can be formulated as an optimisation task. For this purpose, it is reasonable to present the simulation process in a general mathematical notation (1).

$$\begin{aligned} Y = \{Y_1, Y_2, ..., Y_k\} = M(\xi \mid \theta ), \end{aligned}$$
(1)

where \( Y = \{Y_1, Y_2, ..., Y_k\} \) denotes multivariate output data (simulated fields, e.g. wave heights), \( M(\bullet ) \) is the model operator, \( \xi \) is the input data (boundary and initial conditions), \( \theta \) is the set of model parameters.

With that, the tuning of model parameters (or model calibration) can be formalized in terms of multi-objective optimisation in the model parameter space and written as:

$$\begin{aligned} \begin{aligned}&\theta _{opt} = {{\,\mathrm{arg\,min}\,}}_{\theta }{F(\theta )}, \\&F(\theta ) = \mathcal {G}(f_i(\theta , Y, \{x,y\})), \\ \end{aligned} \end{aligned}$$
(2)

where \( \mathcal {G}(\bullet ) \) is an operator for multiobjective transformation to \( F\), \( f_i\) is the objective function, i = 1 ... n , \(\{x,y\}\) are spatial coordinates of a point-of-interest.
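The structure of Eq. (2) can be illustrated with a minimal sketch. It assumes that each \(f_i\) is an RMSE at one point-of-interest and that \( \mathcal {G} \) simply keeps the errors as a vector; `toy_model`, the `gain` parameter and the data are hypothetical stand-ins for \(M(\xi \mid \theta )\), not part of SWAN.

```python
import math

def rmse(simulated, observed):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))

def toy_model(forcing, theta):
    """Hypothetical stand-in for M(xi | theta): scales the forcing at every point."""
    return {point: [theta["gain"] * v for v in series] for point, series in forcing.items()}

def objective_vector(theta, forcing, observations):
    """F(theta): one error value f_i per point-of-interest (G keeps them as a vector)."""
    simulated = toy_model(forcing, theta)
    return [rmse(simulated[p], observations[p]) for p in sorted(observations)]

# Hypothetical forcing and observations at two points
forcing = {"P1": [1.0, 2.0, 3.0], "P2": [2.0, 2.0, 2.0]}
observations = {"P1": [2.0, 4.0, 6.0], "P2": [4.0, 4.0, 4.0]}

perfect = objective_vector({"gain": 2.0}, forcing, observations)
biased = objective_vector({"gain": 1.0}, forcing, observations)
```

A multi-objective optimiser would then search the parameter space for values of `theta` whose error vectors are not dominated by any other candidate.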

In the case of wind wave hindcasting, the poor temporal and spatial coverage of observations makes the model optimisation much harder. Over-fitting of the solution to the specific events represented in small data samples can produce a non-optimal model configuration with lower quality under different external conditions. One way to improve the robustness of the optimisation results is to enlarge the training dataset with new instances that contain relatively small artificial disturbances. This makes it necessary to take the simulation uncertainty factors into account.

The uncertainty in the wind wave model can be represented not only by disturbances in design variables [18]. There are deviations in the environment variables that can be represented through input data sets diversity (for the SWAN model the wind forcing obtained from atmospheric reanalysis is most important). In this case input data \( \xi \) should be transformed to ensemble realisation \( \{\xi \}_n = \{\xi _1, ..., \xi _n\} \) by addition of artificial disturbance (or noise) and Eq. (2) transforms into Eq. (3). A detailed description of the ensemble procedure is given in Sect. 4.3.

$$\begin{aligned} \begin{array}{cc} \theta _{rob} = {{\,\mathrm{arg\,min}\,}}_{\theta }{\tilde{F}(\theta \mid \{\xi \}_n)}, \\ \tilde{F}(\theta \mid \{\xi \}_n) = \mathcal {G}(\tilde{f_i}(\theta \mid \{\xi \}_n,Y, \{x,y\})). \\ \end{array} \end{aligned}$$
(3)

An ensemble objective function \(\tilde{f_i}\) defines the landscape of the objective function over the parameter space, considering the ensemble of input states \( \{\xi \}_n \). As an example, the ensemble fitness function can be represented by the expected value of the objective function over an ensemble of runs with small disturbances in the input data (shown in Eq. (4)). This approach can be used to produce better solutions for a set of diverse environmental scenarios and to increase the expected performance.

$$\begin{aligned} \tilde{f}(\theta \mid \{\xi \}_n) = \int _{-\infty }^{\infty } f(\theta \mid \xi + \varvec{\delta }) \cdot p(\varvec{\delta }) \, d\varvec{\delta } \end{aligned}$$
(4)
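With a finite ensemble, the integral in Eq. (4) can be approximated by a sample mean. The expression below is a sketch of one such discretisation, assuming that the disturbances \(\delta _k\) are drawn independently from \(p(\delta )\) and writing the first argument of \(f\) as the parameter set \(\theta \):

$$\begin{aligned} \tilde{f}(\theta \mid \{\xi \}_n) \approx \frac{1}{n} \sum _{k=1}^{n} f(\theta \mid \xi _k), \qquad \xi _k = \xi + \delta _k, \quad \delta _k \sim p(\delta ). \end{aligned}$$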

As an example of a hydrodynamic model for the experimental studies, the third-generation wind wave model SWAN [2] was chosen. Wind waves are surface waves in the oceans and seas that are caused by the interaction between water masses and sea-level wind. Third-generation wind wave models (e.g. SWAN) allow simulating the wave spectra and reconstructing the characteristics of waves (e.g. heights, periods, directions). The SWAN model can be described with the action balance equation (5).

$$\begin{aligned} \frac{\partial }{\partial t} N + \frac{\partial }{\partial x} c_x N + \frac{\partial }{\partial y} c_y N + \frac{\partial }{\partial \sigma } c_\sigma N + \frac{\partial }{\partial \theta } c_\theta N = \frac{S}{\sigma }, \end{aligned}$$
(5)

where on the left-hand side \(N = \frac{E}{\sigma }\) denotes the wave action density and \(E\) is the energy of the wave spectrum, \(\sigma \) is the relative frequency, \(\theta \) is the group wave direction, \(c\) is the group velocity in the corresponding space. The right-hand side represents the source and sink term in the form of Eq. (6).

$$\begin{aligned} S = S_{in} + S_{ds} + S_{nl}, \end{aligned}$$
(6)

where \(S_{in}\) is the input energy obtained by wind, \(S_{ds}\) is the energy of dissipation and \(S_{nl}\) denotes the energy of wave-wave nonlinear interaction.

These three terms represent the genesis of wave energy sources and sinks and are a powerful handle for wave model fitting. From this point of view, it is convenient to express the energy sources through model parameters. Wind energy is characterised by the drag function (DRG), and wave dissipation by the wave breaking (STPM) and bottom friction (CFW) functions. The energy flow from nonlinear interactions is relatively small and was not taken into account in the current paper.

Within this article, the experimental study (Sect. 5) was conducted to assess the practical effectiveness of the proposed robust calibration method in comparison with general-purpose calibration algorithms. The SWAN model configuration for the Kara Sea was chosen as a case study because of the importance of this region for offshore industrial development and the extremely low density of sensors in the areas of interest.

3 Related Work

Model calibration or tuning is a subject with extensive literature [8, 25]. The conservative approach is to estimate the parameters in an expert way [10, 16]. It includes the development of several candidate sets of parameters based on previous simulation experience and manual individual adjustment of each parameter. The quality metrics for the model quality assessment are calculated with the comparison of model time series and historical values obtained from the reanalyses and observations.

Since the manual “trial-and-error” method is time-consuming and gives a solution only for a particular model setup, automatic calibration of models is widely used for different kinds of environmental simulations, such as atmospheric [6] and ocean [26] forecasting tasks. As a basic approach, a space-filling design of the parameter space can be used [24] for model calibration. However, high-resolution configurations of third-generation wind wave models are computationally intensive and require a lot of time to process the appropriate date range and spatial domain. This problem makes it necessary to reduce the number of runs required for calibration.

There are many well-known optimisation methods applied to environmental models like derivative-free optimisation [22], various Bayesian optimisation methods [5] and surrogate-assisted methods [9].

However, the evolutionary (genetic) algorithms are efficient enough to perform a robust solution search [14] in complex parameter space with a lack of historical data for quality assessment [20]. The applicability of evolutionary algorithms for SWAN wave model calibration is demonstrated in [12].

Robust optimal design approaches have many applications in various fields [27]. They are often based on Monte Carlo methods that allow representing the uncertainty from different sources [4]. A perturbation-based ensemble allows sampling the modelling uncertainty in a more systematic way [19]. A set of simulations with small differences induced by stochastic modifications makes it possible to increase the variability of the calibration dataset and improve the quality of models [17].

Nevertheless, the discrepancy is usually simulated as additional noise in the model output and observations [1], without taking the actual sources of external uncertainty (e.g. wind forcing from reanalysis) into account. The task of reliably calibrating a wave model for a specific domain with poor observational coverage makes it necessary to implement an approach that combines the ensemble-based diversity of environmental variables with multi-objective evolutionary optimisation.

4 Evolutionary Algorithms for Models Fitting

We compared the robust wave model calibration with a baseline solution: a multi-objective evolutionary algorithm that estimates the most suitable solution without taking uncertainty into account. The other approach is based on the same algorithm with modified fitness functions: it estimates the performance and robustness of the solution with an ensemble of forecasts obtained from several model runs with noised inputs. The source code of both algorithms was implemented in Python and is available in [23].

4.1 Baseline Approach

The commonly used SPEA2 multi-objective optimisation algorithm [28] was chosen as a baseline solution for the calibration task. In terms of evolutionary algorithms, in our case, each individual corresponds to a genotype represented by a certain set of model parameters, and the phenotype (values of the objective function) consists of the errors of the model predictions corresponding to these parameters. At each iteration of evolution, the Pareto-optimal set of individuals is selected according to the values of the fitness function, and all non-dominated solutions are saved in the archive. Then the mating pool is filled using binary tournament selection, and the recombination and mutation operators are applied to each individual. The resulting mating pool becomes the new population at the next iteration of the algorithm.
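The two selection steps described above can be sketched as follows. This is a simplified illustration rather than the full SPEA2 procedure: it compares individuals by plain Pareto dominance, whereas SPEA2 itself assigns a strength-based fitness with density estimation. The individuals and their error values are hypothetical.

```python
import random

def dominates(a, b):
    """True if error vector a Pareto-dominates b (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(population):
    """Individuals whose error vectors are not dominated by any other individual."""
    return [p for p in population
            if not any(dominates(q["errors"], p["errors"]) for q in population if q is not p)]

def binary_tournament(population, rng):
    """Pick two distinct individuals at random and keep the dominating one (ties are random)."""
    a, b = rng.sample(population, 2)
    if dominates(a["errors"], b["errors"]):
        return a
    if dominates(b["errors"], a["errors"]):
        return b
    return rng.choice([a, b])

# Hypothetical individuals: genotype = parameter set, phenotype = model errors
population = [
    {"params": (1.0,), "errors": (0.30, 0.20)},
    {"params": (1.2,), "errors": (0.25, 0.22)},
    {"params": (0.8,), "errors": (0.35, 0.25)},  # dominated by the first individual
]

archive = non_dominated(population)
rng = random.Random(42)
mating_pool = [binary_tournament(population, rng) for _ in range(4)]
```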

Despite the fact that some modern evolutionary algorithms outperform SPEA2 in some synthetic tasks [13], we decided to base the experiments on a well-studied [12] algorithm to separate the impact from the proposed ensemble-based modifications from other features’ influence.

4.2 Robust Ensemble-Based Evolutionary Calibration (REBEC) Approach

The main disadvantage of the baseline algorithm is that the model variables are optimised exactly for the specific conditions that were used for the fitness function evaluation. This maximises the performance for the observed case, but the solution found can be unstable even after small changes in external conditions. The lack of spatio-temporal coverage of observational data for wave parameters in the target regions makes it complicated to take the different external uncertainties (e.g. forcing-induced, resolution-induced, etc.) into account.

The more robust approach to model parameters optimisation can be implemented using the ensemble of wave models configured using different input data sets. We can form the stochastic ensemble of wave models with the perturbed wind forcings and search for more robust model parameters using this ensemble instead of a single model with certain forcing.

For this purpose, we can adapt the baseline SPEA2 algorithm (that was introduced above) by changing the fitness assignment strategy: for a given genotype, the set of phenotypes corresponding to the elements of the ensemble is estimated and based on its values the robust metric is calculated. The flowchart of the proposed algorithm is presented in Fig. 2.

Fig. 1. The landscape of an objective function for a probabilistic ensemble

It is important to find a compromise between the performance and robustness of the obtained solution [11], so the fitness function for the algorithm is based on a composite estimation of robustness and performance metrics. The performance can be calculated as a vector of root-mean-square errors (RMSE) against observations for a set of target points, and the robustness can be estimated in various ways [15]. Figure 1 depicts the set of ensemble error surfaces that are used for the metric calculation.

We tried to use the mean-variance as a robust metric, but it causes the domination of the solutions with low wind drag and, consequently, near-zero wind-induced variability. So, the ensemble mean was chosen as a trade-off metric. The pseudocode of the final implementation of the robust algorithm is presented in Algorithm 1.

Algorithm 1. Pseudocode of the robust ensemble-based evolutionary calibration algorithm

Fig. 2. The main logical blocks and interconnections of the proposed robust evolutionary algorithm
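The ensemble-mean fitness assignment can be sketched in a few lines. The sketch assumes a single target point and a single `drag` parameter; `toy_wave_model` is a hypothetical stand-in for one SWAN run with a given wind forcing, and the wind and wave values are invented for illustration.

```python
import math
from statistics import mean

def rmse(simulated, observed):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))

def toy_wave_model(wind, drag):
    """Hypothetical stand-in for one model run: wave height proportional to drag * wind."""
    return [drag * w for w in wind]

def ensemble_fitness(drag, wind_ensemble, observed):
    """Mean RMSE of one genotype over all perturbed forcings (ensemble mean metric)."""
    errors = [rmse(toy_wave_model(wind, drag), observed) for wind in wind_ensemble]
    return mean(errors)

# Hypothetical ensemble: the original wind series plus two perturbed copies
wind_ensemble = [[1.0, 2.0], [1.1, 2.2], [0.9, 1.8]]
observed = [2.0, 4.0]

good = ensemble_fitness(2.0, wind_ensemble, observed)
poor = ensemble_fitness(1.0, wind_ensemble, observed)
```

Each fitness evaluation costs one model run per ensemble member, so the price of the robust estimate grows linearly with the ensemble size.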

4.3 Synthetic Input Data Generation with Artificial Noise

To implement the proposed probabilistic optimisation method, we developed a supplementary algorithm that adds specific noise to the wind velocity variables, i.e. the U (eastward) and V (northward) vector components from the atmospheric reanalysis data that are used by the wave model as external forcing.

The algorithm starts from uniform scattering of the randomly-located sources of artificial noise in the gridded data. To obtain the realistic wind field after the application of noise, the time-spatial correlation terms are added to control the noise spreading from the source.

The noise function for the wind vector component U produced by one noise point can be written as:

$$\begin{aligned} f^{*}(j,t)=N(0,\sigma ) \cdot corr(U_{j},V_{j}) \cdot corr(U_{t},U_{t-1}) \end{aligned}$$
(7)

where j is the spatial index of source points, t is the time step index, \(\sigma \) is the standard deviation parameter of the Gaussian distribution, U is the matrix of wind U-components.

Then, the aggregated noise induced by all N source points at a specific data point can be obtained as:

$$\begin{aligned} f(i,t)=\sum _{j=1}^{N}f^{*}(j,t) \cdot corr(U_{i},U_{j}) \end{aligned}$$
(8)

where i is the spatial index of the data point, j is the spatial index of the noise point, t is the time step index, N is the number of noise points, U is the matrix of wind U-components.
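Equations (7) and (8) can be sketched with the standard library only. The sketch rests on several assumptions that the text does not fix: \(corr(U_j, V_j)\) and \(corr(U_i, U_j)\) are taken as Pearson correlations between time series at grid points, \(corr(U_t, U_{t-1})\) as the correlation between consecutive spatial fields, the first time step is left unperturbed, and `sigma` is a fixed scalar. The function names and the tiny wind fields are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def noise_field(U, V, sources, sigma, rng):
    """Noise for the U component; U and V are lists of time steps with one value per point."""
    n_steps, n_points = len(U), len(U[0])
    u_series = [[U[t][i] for t in range(n_steps)] for i in range(n_points)]
    v_series = [[V[t][i] for t in range(n_steps)] for i in range(n_points)]
    noise = [[0.0] * n_points for _ in range(n_steps)]
    for t in range(1, n_steps):  # step 0 has no predecessor and stays unperturbed
        temporal = pearson(U[t], U[t - 1])
        for j in sources:
            # Eq. (7): Gaussian noise of one source, damped by the correlation terms
            source = rng.gauss(0.0, sigma) * pearson(u_series[j], v_series[j]) * temporal
            for i in range(n_points):
                # Eq. (8): spread the source noise to point i and sum over sources
                noise[t][i] += source * pearson(u_series[i], u_series[j])
    return noise

# Hypothetical wind components: 4 time steps, 3 grid points
U = [[5.0, 6.0, 4.0], [6.0, 7.5, 4.5], [4.0, 5.0, 3.8], [7.0, 8.0, 5.0]]
V = [[1.0, 2.0, 0.5], [2.0, 2.5, 1.0], [0.5, 1.5, 0.2], [2.5, 3.0, 1.5]]

noise = noise_field(U, V, sources=[0, 2], sigma=1.0, rng=random.Random(1))
U_perturbed = [[u + d for u, d in zip(u_row, n_row)] for u_row, n_row in zip(U, noise)]
```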

An example of the wind field augmented with noise by the described method (with \(\sigma \) equal to 25% of the base value magnitude) is presented in Fig. 3.

Fig. 3. Comparison of (a) the basic ERA-Interim wind field for the Kara Sea region and (b) the wind field augmented with noise for the same region. The blue marks depict the locations of the noise source points. (Color figure online)

It can be seen that the common wind patterns are similar, but some wind speed variability exists. An additional post-processing procedure was applied to the output of the perturbed model runs to suppress non-realistic wave height peaks in the observed calm periods. However, the near-peak variability was preserved.

In this way, the ten wind data sets augmented with artificial noise were generated and used in an experimental study.

5 Experimental Study

The case study for the calibration task is based on the SWAN model configuration for the Kara Sea region. The significant wave height (Hsig) variable was chosen as a target variable. Moreover, the results in nine representative points were analyzed (P1–P9 presented in Fig. 4) to take into account possible spatial variability of the optimal solution.

5.1 Synthetic Data for Wave Observations

The wave observations data are required for the validation of the model quality and calibration algorithm effectiveness. However, such data often cannot be obtained from open data sources. To perform a reproducible experiment with Kara Sea configuration, we used the simulation results from the high-resolution WaveWatch III model [21] configuration. The systematic biases of synthetic observations against model were removed. Then we analysed the error metrics for the significant wave height variable against real observations in points 1–3 (RMSE is 0.29 m and MAE is 0.21 m). We accept the quality of the WaveWatch III output as sufficient to be used as the reference dataset for the optimisation algorithms’ evaluation.

To maintain the variability of experimental scenarios, we prepared 18 subsets of synthetic observational points to be used for calibration. They consist of observation points located in various spatial areas with different depths and distances to the coast. To reproduce various scenarios, the calibration subsets were initialised with random point groups of a certain size: from a single-point situation to the all-points-except-one case. For each subset, the points that were not used during the calibration were assigned to validation sets. This allows comparing the effectiveness of the robust algorithm with the baseline SPEA2 and analysing the dependency of the results on the selected set of observation points.
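The construction of calibration and validation subsets can be sketched as below. The list of subset sizes and the random seed are hypothetical and do not reproduce the 18 scenarios of the study; only the splitting rule (unused points go to validation) follows the description above.

```python
import random

POINTS = [f"P{i}" for i in range(1, 10)]  # the nine observation points P1..P9

def make_scenarios(points, sizes, rng):
    """For each requested size, draw a random calibration subset; the rest is validation."""
    scenarios = []
    for size in sizes:
        calibration = sorted(rng.sample(points, size))
        validation = [p for p in points if p not in calibration]
        scenarios.append((calibration, validation))
    return scenarios

# Hypothetical sizes: from a single point to all points except one
sizes = [1, 3, 5, 8]
scenarios = make_scenarios(POINTS, sizes, random.Random(0))
```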

5.2 Model Configuration

The SWAN model was configured with the regular curvilinear grid in Cartesian coordinates. The initial conditions were obtained from a preliminary monthly spin-up run. The boundary conditions were not set (since the control points are distant from the grid boundaries, see Fig. 4). The simulation date range was set from 20140814.120000 to 20140915.000000. The integration time step was set to 120 min and the output time step to 3 h. The parameterisations GEN3, COLLINS, QUADRUPL, TRIAD and DIFFRACtion were enabled. The output was configured to obtain the significant wave height (HS) values in 9 spatial points. Their locations are specified in Fig. 4.

Fig. 4. The part of the bathymetry of the simulation domain: the Kara Sea and Ob bay. The land cells are shaded with a grey mask. The locations of observation points and their indices are specified with green marks. (Color figure online)

5.3 Sensitivity Analysis

The sensitivity analysis of the model parameters described in Sect. 2 was performed to estimate their significance. We ran a set of experiments in which every parameter was independently perturbed by additive Gaussian noise with \(\sigma {/} \mu = 0.25\). The results were obtained from 50 experiments with sequential noising of each variable from the chosen set.
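A one-at-a-time experiment of this kind can be sketched as follows. The sensitivity measure used here (ratio of the relative output spread to the relative input spread) is an assumption, and `toy_response` is a hypothetical linear function with arbitrarily chosen coefficients, not a SWAN run; only the default parameter values, the noise level and the number of runs are taken from the text.

```python
import random
from statistics import mean, pstdev

# Default parameter values as listed for the default model configuration
DEFAULTS = {"drag": 1.0, "cfw": 0.015, "stpm": 0.00302}

def toy_response(params):
    """Hypothetical scalar model output; the coefficients are arbitrary."""
    return 2.0 * params["drag"] - 1.0 * params["cfw"] + 100.0 * params["stpm"]

def sensitivity(name, n_runs=50, rel_sigma=0.25, seed=0):
    """Perturb one parameter with Gaussian noise (sigma / mu = rel_sigma) and
    return the ratio of relative output spread to relative input spread."""
    rng = random.Random(seed)
    inputs, outputs = [], []
    for _ in range(n_runs):
        params = dict(DEFAULTS)
        params[name] = rng.gauss(DEFAULTS[name], rel_sigma * DEFAULTS[name])
        inputs.append(params[name])
        outputs.append(toy_response(params))
    cv_in = pstdev(inputs) / abs(mean(inputs))
    cv_out = pstdev(outputs) / abs(mean(outputs))
    return cv_out / cv_in

ratios = {name: sensitivity(name) for name in DEFAULTS}
```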

The wind drag was identified as the most sensitive parameter, with a high ratio of relative output variance to input variance; the wave steepness is the second, and the sensitivity to the bottom friction coefficient is quite low for most of the comparison points. It can be concluded that the SWAN model error function has a wide “plateau” with similar error values and many local minima in the “valley” area, which can affect the algorithm’s convergence and robustness.

5.4 Validation of REBEC Approach

A set of experiments was conducted to compare the results of the optimisation experiments. The initial population for both calibration approaches was produced using Latin hypercube sampling (LHS) in the parameter space. The parameters of both calibration algorithms were chosen as follows: a population size of 20 individuals, 60 generations, an archive size of 5 individuals, and a probability of 0.2 for both mutation and crossover.
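Latin hypercube sampling of the initial population can be sketched with the standard library. The parameter bounds below are hypothetical, since the search ranges are not given in the text; the population size of 20 follows the settings above.

```python
import random

def latin_hypercube(n_samples, bounds, rng):
    """Split every parameter range into n_samples equal strata and use each stratum once."""
    columns = {}
    for name, (low, high) in bounds.items():
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across parameters
        width = (high - low) / n_samples
        columns[name] = [low + (s + rng.random()) * width for s in strata]
    return [{name: columns[name][k] for name in bounds} for k in range(n_samples)]

# Hypothetical search ranges for the three calibrated parameters
BOUNDS = {"drag": (0.2, 3.0), "cfw": (0.005, 0.05), "stpm": (0.001, 0.01)}

population = latin_hypercube(20, BOUNDS, random.Random(7))
```

Compared with plain uniform sampling, every one-dimensional projection of such a population covers its parameter range evenly, which is useful when only 20 individuals are available.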

Two objective functions were chosen for model results quality assessment: the mean absolute error (MAE) and root mean square error (RMSE).

The calibration for every scenario was repeated 100 times to obtain the distribution of the relative improvement of RMSE and MAE against the model configuration with default parameter values (DRF = 1.0, CFW = 0.015, STPM = 0.00302). Also, the mean relative standard deviation of the calibrated parameters set is provided. The boxplots for the results with different scenarios are presented in Fig. 5.
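The relative improvement of the two metrics can be computed as sketched below. The definition of relative improvement as the fractional reduction of the error against the default configuration is an assumption, and the three-value series are hypothetical.

```python
import math

def rmse(simulated, observed):
    """Root-mean-square error."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))

def mae(simulated, observed):
    """Mean absolute error."""
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / len(observed)

def relative_improvement(default_error, calibrated_error):
    """Fraction by which the calibrated error is lower than the default one."""
    return (default_error - calibrated_error) / default_error

# Hypothetical significant wave heights (m) at one validation point
observed = [1.0, 2.0, 3.0]
default_run = [1.5, 2.5, 3.5]
calibrated_run = [1.25, 2.25, 3.25]

rmse_gain = relative_improvement(rmse(default_run, observed), rmse(calibrated_run, observed))
mae_gain = relative_improvement(mae(default_run, observed), mae(calibrated_run, observed))
```

Repeating the calibration many times and collecting these values gives the distributions that the boxplots summarise.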

Fig. 5. The comparison of the baseline and robust algorithms’ performance on the validation set of stations in all scenarios. The RMSE and MAE metrics are presented as an improvement against the corresponding values for the default configuration.

It can be seen that the variance of metrics for the robust algorithm is lower and the quality is better. The detailed metrics for all scenarios and stations sets are provided in Table 1.

Table 1. Error metrics for the baseline and robust algorithms. The “test” block contains the metrics for the verification points. The “train” block contains the metrics for the calibration points. The boldface numbers indicate the best metrics for all station sets (the higher improvement and lower standard deviation is better)

As can be seen, the robust approach provides a better or equivalent improvement of model performance for the validation points in all groups of scenarios. The standard deviations of both the model parameters and the relative improvement values are also lower than for the baseline. It can be concluded that the optimal algorithm choice for validation points varies in different scenarios. Scenarios 1–9 operate with a single-point calibration set. The performance of the robust algorithm for this group of validation points is similar to the baseline in terms of RMSE (but it outperforms the baseline for the MAE metric and the calibration points metrics). For the other scenarios, the gain is about 1–2% in RMSE and 10% in MAE against the baseline.

The calibration set quality averaged over all scenarios is also better for the robust approach than for the baseline. The standard deviation of the obtained metrics is smaller for all scenarios, as is the mean standard deviation of the model parameters. We can claim that the robust approach is effective for cases with several spatially scattered points that can be used for calibration. It is important to note that the quality at the calibration points is not affected negatively.

6 Conclusion

In the paper, the practical approach to the calibration of numerical wave models under data quality and availability constraints was proposed. The algorithm for the simulation of artificial data diversity was implemented and applied to the ERA-Interim reanalysis wind data. The regional configuration of the SWAN model was used as a case study for the parameters tuning algorithm effectiveness evaluation.

The proposed REBEC approach was compared with the baseline SPEA2 algorithm in a set of experiments. The lower variability and better performance metrics for the spatially distributed calibration and verification points were obtained. It confirms the effectiveness of the robust calibration approach for the simulation domains with a small number and poor coverage of real observations. However, the negative impact of the proposed approach for computational performance (several simulations should be performed for each candidate parameters set) makes the robust optimisation potentially non-preferable for the model configurations with the sufficient spatial coverage of observations and high-quality atmospheric reanalyses.

The source code of the algorithms for calibration, pre- and post- processing as well as the configuration files for SWAN are available in an open repository [23].