The methodology to construct the future weather dataset for Europe is described in the following sections and summarized in Fig. 1.
Complementing the weather dataset with biophysical variables necessary for crop simulation
Despite the bias-corrections, the future weather dataset proposed by Dosio and Paruolo (2011) is still inadequate to properly run process-based crop simulation models to assess climate change impacts on crop growth and yield. The main issue is the lack of consistency of weather variables resulting from the fact that the bias-correction is done only on a subset of the necessary variables: surface air temperature and rainfall. Other required variables, such as global solar radiation and wind speed, may still have unrealistic distributions when compared to observed data. Other input variables for crop growth models, such as evapotranspiration, are not directly available and must thus be calculated. The solutions below have been adopted to consolidate the weather dataset.
Global solar radiation
To ensure that global solar radiation is coherent with the bias-corrected temperature values, it was estimated using the Bristow-Campbell model (Bristow and Campbell 1984). Methods that estimate global solar radiation from the daily amplitude of surface air temperature assume that the site is not significantly affected by advection. This assumption does not necessarily hold when estimating the solar radiation pattern of a specific site, but when working with abstractions such as interpolated time series associated with a spatial grid, and when site-specific information is lacking, the assumption can be considered non-limiting. This is because the range-based method is physically consistent: clear days show a greater temperature range because solar irradiance is not filtered by clouds during the daytime, while long-wave emission from the soil surface is lost more rapidly to the atmosphere during the night. Seasonality is accounted for in the Bristow-Campbell model. Proper calibration of the model requires a continuous series of observed global solar radiation spanning at least 2 years, which is not available for all of Europe. Consequently, the auto-calibration procedure proposed by Bojanowski et al. (2013), which does not require reference data, was used. The auto-calibration method provides robust estimates of solar radiation that are consistent with temperature data. Given that GCM climate change scenarios do estimate changes in temperature, the Bristow-Campbell b parameter can be estimated for each scenario and solar radiation data derived accordingly. Clear sky transmissivity (CST) is estimated for each grid cell from remote sensing data as in Bojanowski et al. (2013). Once CST has been estimated, the b parameter is estimated while keeping the c parameter constant at c = 2.
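As an illustration of the temperature-range approach, the following minimal sketch evaluates the Bristow-Campbell relation, Rs = Ra × CST × (1 − exp(−b ΔT^c)), for a single day. It is not the auto-calibration procedure of Bojanowski et al. (2013). The function and argument names are hypothetical, extraterrestrial radiation `ra` is assumed to be supplied by the caller, and ΔT is simplified here to the same-day difference between maximum and minimum temperature.

```python
import math


def bristow_campbell(ra, tmax, tmin, cst, b, c=2.0):
    """Daily global solar radiation from the diurnal temperature range.

    ra   : extraterrestrial radiation (assumed given; result has the same units)
    tmax : daily maximum air temperature (deg C)
    tmin : daily minimum air temperature (deg C)
    cst  : clear sky transmissivity (dimensionless)
    b, c : empirical parameters; c is held at 2 as in the text
    """
    # Simplification: same-day range, clipped at zero to guard against bad input.
    delta_t = max(tmax - tmin, 0.0)
    # Atmospheric transmissivity grows with the temperature range and saturates at cst.
    transmissivity = cst * (1.0 - math.exp(-b * delta_t ** c))
    return ra * transmissivity


# Illustrative values only (not taken from the study).
rs_cloudy = bristow_campbell(ra=30.0, tmax=18.0, tmin=14.0, cst=0.75, b=0.01)
rs_clear = bristow_campbell(ra=30.0, tmax=27.0, tmin=12.0, cst=0.75, b=0.01)
```

With these illustrative inputs, the day with the larger temperature range yields the higher radiation estimate, and the estimate can never exceed `ra * cst`.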
The uncertainties in the temperature estimates of GCMs and RCMs can be propagated to solar radiation; however, elaborating on these exogenous variables is beyond the scope of this study.
Wind speed and relative air humidity
Wind speed and relative air humidity are of direct interest for plant disease models and for the estimation of reference evapotranspiration with models such as Penman-Monteith. As illustrated in Fig. 2, the distributions of these variables produced by GCMs are not realistic when compared to observations from MCYFS. The wind speed data of the METO-HC-HadRM3Q0-HadCM3Q0 model simulation contained some unrealistically high values. To avoid distorting the Q-Q plot, values above 15 m/s (less than 2 % of the total samples considered here) were discarded from the simulated data and the corresponding records were removed from the observed data. Even after this operation, wind speeds appear overestimated when compared to the observed data.
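The filtering step used before drawing the Q-Q plot can be sketched as follows. This is a minimal illustration that assumes the simulated and observed series are paired record by record; the function name, the number of quantiles and the sample values are hypothetical.

```python
import numpy as np


def qq_after_cap(simulated, observed, cap=15.0, n_quantiles=99):
    """Quantile pairs for a Q-Q plot after discarding capped records.

    Records whose simulated wind speed exceeds `cap` (m/s) are removed from
    both series, so that the two samples keep the same size.
    """
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    keep = simulated <= cap
    probs = np.linspace(0.01, 0.99, n_quantiles)
    q_sim = np.quantile(simulated[keep], probs)
    q_obs = np.quantile(observed[keep], probs)
    # keep.mean() is the fraction of records retained.
    return q_sim, q_obs, keep.mean()


# Illustrative paired records (m/s), not data from the study.
sim = [2.1, 3.4, 5.0, 22.0, 4.2, 6.3]
obs = [1.8, 2.9, 3.7, 4.0, 3.1, 4.4]
q_sim, q_obs, kept = qq_after_cap(sim, obs, n_quantiles=5)
```

Points of `q_sim` plotted against `q_obs` that lie above the 1:1 line indicate that the simulated values are overestimated at that quantile.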
Some relationship between these variables and temperature and precipitation patterns exists, but deriving the former from the latter is much less straightforward than it was for global solar radiation. Here, it is conservatively assumed that patterns of wind speed and humidity should not change considerably in the near future. The historical observed data from MCYFS during the period 1993–2007 are thus used to represent all time horizons (the baseline, 2020 and 2030).
Evapotranspiration and vapour pressure deficit
Reference evapotranspiration and vapour pressure deficit were estimated from the GCM and/or derived weather variables using the FAO56 realization of the Penman-Monteith model (Allen et al. 1998) as implemented in the CLIMA libraries (Donatelli et al. 2006; Donatelli et al. 2009). While simpler temperature-driven empirical methods can also be used, this physically based approach is preferred to ensure that the bio-meteorological values are consistent with the driving thermal, aerodynamic and radiative elements.
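For orientation, the daily FAO56 Penman-Monteith calculation can be sketched as below. This is not the CLIMA implementation. It is a minimal sketch that assumes net radiation `rn` (MJ m-2 day-1) is already available, that actual vapour pressure is derived from mean relative humidity, and that soil heat flux is negligible at the daily scale. All names and input values are hypothetical.

```python
import math


def sat_vapour_pressure(t):
    """Saturation vapour pressure (kPa) at air temperature t (deg C)."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def vpd_and_et0(tmax, tmin, rh_mean, u2, rn, g=0.0, pressure=101.3):
    """Vapour pressure deficit (kPa) and reference evapotranspiration (mm/day).

    rh_mean  : mean relative humidity (%)
    u2       : wind speed at 2 m height (m/s)
    rn, g    : net radiation and soil heat flux (MJ m-2 day-1)
    pressure : atmospheric pressure (kPa)
    """
    tmean = (tmax + tmin) / 2.0
    es = (sat_vapour_pressure(tmax) + sat_vapour_pressure(tmin)) / 2.0
    ea = rh_mean / 100.0 * es          # actual vapour pressure from mean RH
    vpd = es - ea
    # Slope of the saturation vapour pressure curve at tmean (kPa per deg C).
    delta = 4098.0 * sat_vapour_pressure(tmean) / (tmean + 237.3) ** 2
    gamma = 0.000665 * pressure        # psychrometric constant (kPa per deg C)
    radiative = 0.408 * delta * (rn - g)
    aerodynamic = gamma * 900.0 / (tmean + 273.0) * u2 * vpd
    et0 = (radiative + aerodynamic) / (delta + gamma * (1.0 + 0.34 * u2))
    return vpd, et0


# Illustrative mid-latitude summer day, not data from the study.
vpd, et0 = vpd_and_et0(tmax=21.5, tmin=12.3, rh_mean=73.5, u2=2.1, rn=13.3,
                       pressure=100.1)
```

Because temperature, humidity, wind and radiation all enter the same expression, an inconsistency in any one input propagates directly to both outputs, which is why coherent inputs matter here.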
Processing the dataset for short-term time horizons
The data produced by GCMs (or RCMs) are often used to represent the general trend that climate variables such as temperature are expected to follow. However, there is some variability around such a trend, representing the weather patterns that are simulated by GCMs. For a given time horizon, climate studies will typically use datasets of 30 years or more to characterize a given variable or to derive biophysical variables from impact models (such as crop yields). Such a sample size is deemed large enough that short-term random fluctuations, such as yearly weather pattern variations, do not significantly affect the characterization of the climate during the target time horizon.
Typically, climate studies distinguish time horizons that are well separated in time, e.g. 2020, 2050 and 2100. Using the larger sample size tempers statistical descriptors such that identified trends are not significantly influenced by short-term anomalous fluctuations. If time horizons of interest are close in time, such as 2020 and 2030, taking windows of 30 years around these horizons results in an overlap that is too large, rendering the separation into two horizons not significant. Conversely, when considering only 10 years (thereby avoiding the overlap in the above-mentioned case of 2020 vs. 2030), the sample size becomes too small to assume that specific short-term weather fluctuations do not dominate the trend. Indeed, 3 or 4 years that are much warmer than the average during a period of 10 years will have stronger consequences on the average indicators derived by impact models, such as crop growth models, than if these 3–4 years occur within a period of 30 years.
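The effect of sample size on a period mean can be made concrete with a small calculation. The sketch below uses a hypothetical anomaly of 2 °C for the warm years and assumes all other years sit exactly at the long-term average.

```python
def mean_shift(n_years, n_warm, anomaly):
    """Shift of the period mean caused by n_warm years lying `anomaly` above normal."""
    return n_warm * anomaly / n_years


# Three warm years, each 2 degrees C above the long-term average (illustrative).
shift_10 = mean_shift(n_years=10, n_warm=3, anomaly=2.0)
shift_30 = mean_shift(n_years=30, n_warm=3, anomaly=2.0)
ratio = shift_10 / shift_30
```

Under these assumptions the same three warm years move the 10-year mean three times as far as the 30-year mean, and any indicator an impact model derives from that mean inherits the difference.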
A compromise was made to characterize the climate of the target time horizons by using a stochastic weather generator, ClimGen (Stöckle et al. 2001), to increase the sample size corresponding to each time horizon. We used 15 years of data around each time horizon to derive monthly parameters for the weather generator (WG); these parameters summarize the distribution of each weather variable for each grid cell. These parameters are then used to generate a set of 30 synthetic years for every grid cell, which have the characteristics of the 15-year period. Although the 15-year periods used to generate parameters do overlap by 4 years, the new synthetic years within each time horizon are distinct since they are regenerated randomly.
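The two-step logic (fit monthly parameters on 15 years, then generate 30 synthetic years) can be illustrated with a deliberately simplified generator. This is not ClimGen: it handles one variable, assumes normally distributed daily values, uses 30-day months and ignores serial correlation and dependence between variables. All names and the input series are hypothetical.

```python
import numpy as np


def fit_monthly_params(values, months):
    """Monthly mean and standard deviation of a daily variable for one grid cell."""
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    return {m: (values[months == m].mean(), values[months == m].std(ddof=1))
            for m in range(1, 13)}


def generate_years(params, n_years=30, days_per_month=30, seed=0):
    """Draw synthetic daily values, month by month, from the fitted parameters."""
    rng = np.random.default_rng(seed)
    years = []
    for _ in range(n_years):
        monthly = [rng.normal(params[m][0], params[m][1], days_per_month)
                   for m in range(1, 13)]
        years.append(np.concatenate(monthly))
    return np.array(years)  # shape: (n_years, 12 * days_per_month)


# Hypothetical 15-year daily temperature series for one cell (360-day years).
rng = np.random.default_rng(42)
months = np.tile(np.repeat(np.arange(1, 13), 30), 15)
# Seasonal cycle peaking in July, plus day-to-day noise.
temperature = 10.0 + 8.0 * np.sin((months - 4) * np.pi / 6.0) + rng.normal(0.0, 2.0, months.size)

params = fit_monthly_params(temperature, months)
synthetic = generate_years(params, n_years=30)
```

The synthetic years reproduce the monthly statistics of the 15-year input, but the individual daily values are new random draws.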
It must be acknowledged that, owing to the stochastic nature of the weather generation process and to the fact that the generation is applied independently to every grid cell, there is a lack of spatial consistency: a generated value for a variable in any given cell on any given day will not necessarily be similar to that variable's values in adjacent cells. In reality, there would generally be a continuum of values between cells, if not throughout the region. This also applies to any biophysical variables derived from this synthetic weather dataset. The weather dataset built here is consequently intended for use with impact models in which runs are spatially independent. Spatially continuous results are obtained after averaging all the simulated variables at each grid cell. Hence, results can only be analysed in terms of statistical properties, and not for investigating patterns of individual years.
Comparison against gridded observed weather data
A comparison between the observed MCYFS and the generated weather datasets for the overlapping period of the baseline serves to assess how well the dataset corresponds to the past climate. Three evaluations were devised to ensure the cogency of the assumptions: (1) an assessment of the bias-correction process by analysing potential differences between the GCM-RCMs (the reference database MCYFS is indeed different from that of E-OBS used for the bias-correction); (2) an analysis of how the duration, or sample size, of generated weather time horizons (section 3.2) may affect results; and (3) an evaluation of how well global solar radiation estimation is improved by the auto-calibration of the temperature-based model (section 3.1.1).
The evaluations were conducted on a subset of grid cells encompassing key sites representing regions in Europe with contrasting agro-ecological conditions (as shown in Fig. 3). In each case, every year in the baseline time horizon of 1993 to 2007 (2000 ± 7 years) was included, but only during a typical growing season ranging from 1 April to 30 September. This period, covering much of the spring and summer, is considered most relevant for crop growth in Europe.
The bias-correction process evaluation was performed for each meteorological variable. Each variable was sorted by value irrespective of spatial location for every GCM-RCM dataset separately. Each series was matched with a series obtained with the same procedure from data of the same cells in the MCYFS dataset. Both GCM and WG series were compared; in the case of WG data, 15 years were sampled from the 30 generated to have the same number of items in each series. This analysis quickly identifies whether biases persist and whether extreme values are correctly represented.
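The rank-matching procedure can be sketched as follows. This is a minimal illustration with hypothetical names, and it assumes that the 15 WG years are drawn at random without replacement, which the text does not specify.

```python
import numpy as np


def subsample_years(generated, n_keep=15, seed=0):
    """Keep n_keep of the generated years (axis 0), drawn without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(generated.shape[0], size=n_keep, replace=False)
    return generated[idx]


def sorted_pairs(model, observed):
    """Pool values over all cells and days, sort each series, and pair by rank."""
    m = np.sort(np.asarray(model, dtype=float).ravel())
    o = np.sort(np.asarray(observed, dtype=float).ravel())
    if m.size != o.size:
        raise ValueError("series must contain the same number of values")
    return m, o


# Hypothetical arrays shaped (years, days x cells), not data from the study.
rng = np.random.default_rng(1)
wg = rng.normal(18.0, 4.0, size=(30, 183 * 5))   # 30 generated years
obs = rng.normal(17.5, 4.0, size=(15, 183 * 5))  # 15 observed years

m, o = sorted_pairs(subsample_years(wg), obs)
mean_bias = (m - o).mean()              # persistent overall bias
tail_bias = (m[-50:] - o[-50:]).mean()  # behaviour of the highest values
```

Plotting `m` against `o` gives the Q-Q view. A systematic offset from the 1:1 line points to a remaining bias, while divergence at either end points to poorly represented extremes.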