The aim of this study is to use a probabilistic event attribution (PEA) approach to investigate whether the odds of extreme precipitation in July, given the boundary conditions of observed forcings and sea surface temperatures (SSTs), have changed due to anthropogenic influences, making use of very large ensembles of regional climate model simulations.
Probabilistic event attribution (PEA)
To achieve our aim of a probabilistic event attribution study of extreme July precipitation in England and Wales (such as the precipitation that led to severe flooding in large parts of central England, Northeast England and Wales), at least two climate model ensembles are required. The first is a historical ensemble using observed forcings and sea surface temperatures (SSTs), while the second is a counterfactual ensemble for the “world that might have been” without anthropogenic forcing. Both ensembles must be sufficiently large to ensure that the statistical analysis of changes in extreme events is robust (e.g., Pall et al. 2011; Kay et al. 2011). Some authors have also compared the change in event occurrence in model simulations between a recent decade (e.g., the 2000s) and an earlier decade when the anthropogenic forcing was not as strong (e.g., the 1960s) (e.g., Otto et al. 2012; Massey et al. 2012). Both approaches have their advantages and disadvantages. A comparison of the 1960s with the 2000s does not allow changes in the probability of occurrence of extreme events to be attributed to anthropogenic climate change alone, as other climate conditions also differed between the two decades. It does, however, allow the modelling results to be validated against observations. Additionally, comparing whole decades instead of ensembles of single years minimises the influence of large-scale teleconnection patterns, e.g. the North Atlantic Oscillation (NAO), as decadal ensembles smooth the interannual variability of such oscillations. This does not apply, however, to the Atlantic Multidecadal Oscillation (AMO), which switches modes only on decadal and longer timescales (Sutton and Dong 2012). For precipitation in particular, which is resolved at large scales but, at the local scales important for flooding, is largely either not understood or not resolvable even in a regional climate model, both aspects, the ability to validate the model and the reduced influence of specific SST patterns, are important.
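As an illustration of the PEA calculation itself, the sketch below computes the change in odds of exceeding an event threshold between an all-forcings ensemble and a counterfactual ensemble. All inputs (the gamma-distributed precipitation values, the threshold and the variable names) are synthetic placeholders, not output from the experiment described in this study.

```python
# Minimal sketch of the PEA risk-ratio calculation, using two synthetic
# ensembles of July precipitation maxima (mm/day); the distributions and
# the event threshold are illustrative, not taken from the study.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the all-forcings and natural ("world that might have been") ensembles
af_2000s = rng.gamma(shape=2.0, scale=3.0, size=2000)   # hypothetical AF2000s maxima
nat_2000s = rng.gamma(shape=2.0, scale=2.7, size=2000)  # hypothetical NAT2000s maxima

threshold = 12.0  # illustrative extreme-precipitation threshold (mm/day)

# Probability of exceeding the threshold in each ensemble
p_af = np.mean(af_2000s > threshold)
p_nat = np.mean(nat_2000s > threshold)

# Risk ratio (change in probability) and fraction of attributable risk
risk_ratio = p_af / p_nat
far = 1.0 - p_nat / p_af

print(f"P(event | all forcings) = {p_af:.4f}")
print(f"P(event | natural only) = {p_nat:.4f}")
print(f"Risk ratio = {risk_ratio:.2f}, FAR = {far:.2f}")
```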
Modelling analysis
Assessing the influence of external drivers (e.g., increased greenhouse gas concentrations in the atmosphere) on extreme weather is challenging because the most important events are typically rare, so their observed frequency is dominated by chance. In order to compile robust statistics of extreme weather events, large ensembles of model simulations at relatively high resolution are required. This project makes use of the large-ensemble capability provided by the on-going climateprediction.net ‘weather@home’ volunteer computing network (Allen 1999; Massey et al. 2006), in which members of the public produce multi-thousand-member ensembles of weather simulations using regional climate models (RCMs) of different parts of the world. We have applied the model set-up described in Massey et al. (2012), with the regional climate model (RCM) embedded within a general circulation model (GCM). The increased resolution of the RCM results in a more realistic simulation of localised weather events, including high and low temperatures and extreme precipitation over a relatively small area (Jones et al. 2004). The standard ensemble described below is an initial-condition ensemble hindcast experiment over Europe which simulates the historical period 1960–2010 including all observed forcings (AF) (we analyse only the 1960s (AF1960s) and the 2000s (AF2000s) in this study), as well as the decade from 2000 to 2010 with the anthropogenic climate change signal removed (NAT2000s), as described in Section 2.3. For the results used in this study, the RCMs run by volunteers are at 50 km resolution over Europe, driven by a global atmospheric model. This is a relatively low resolution for an RCM but, given that natural variability is the largest source of uncertainty (Section 2.4), the best methodology to account for this uncertainty (given that resources are not unlimited) is to trade accuracy for precision and employ large ensembles at relatively low resolution. The models used are HadAM3P, an atmosphere-only GCM at 1.25 × 1.875 degrees resolution, forced with observed SSTs from the HadISST data set (Rayner et al. 2003), and the RCM HadRM3P. Both models have been developed by the UK Met Office and are based upon the atmospheric component of HadCM3 (Pope et al. 2000; Gordon et al. 2000), with some improvements in the model physics described in Massey et al. (2012). Both models are run many hundreds of times with varied initial conditions. In this way, very large ensembles of RCM simulations, of the order of thousands, can be computed, which in turn allows greater confidence when examining the statistics of rare events. We follow a similar methodology to Massey et al. (2012), who used very large ensembles of GCM simulations to assess the change in risk of very warm Novembers in central England. Here we compare two different climate scenarios: observed Julys in the decade 1960–1970 and observed Julys in 2000–2010. In addition, we analyse the same decade, 2000–2010, in a representation of a world that might have been without anthropogenic climate change.
Although our method of comparing the 1960s with the 2000s does not permit a clean decoupling of anthropogenic and natural drivers, using decade-long scenarios of precipitation reduces some of the effects of natural variability and allows both scenarios to be validated against observed data. The main observational dataset used here is HadEWP, part of HadUKP, the UK regional precipitation series provided by the Met Office Hadley Centre (Alexander and Jones 2001). Besides the analysis of large ensembles of at least two scenarios representing different climate conditions, the other crucial component of an event attribution study is the validation of the model, to investigate whether it is capable of representing the extreme event.
Figure 1 shows quantile-quantile plots as a measure of the model’s ability to represent the observed distribution of precipitation. Both model (HadRM3P) and observational (HadEWP) data are shown for the two different decades; daily averages as well as 5-day averages, computed as a running 5-day mean, are shown. Pall et al. (2011) showed that daily means are the important timescale for deriving river run-off; however, if river run-off is not being computed, 5-day means are a better proxy for flood risk than daily means, giving some indication of ground saturation while maintaining sub-monthly variability. Fowler and Kilsby (2003) demonstrated that multi-day precipitation is an important causal factor in floods, and found no significant changes in 1-2 day precipitation in the period 1991-2000 compared to earlier decades but significant changes in 5-day events. Furthermore, a relatively low-resolution RCM is probably more reliable on 5-day timescales, as the very local mechanisms responsible for extremes in daily precipitation are generally not well represented by these types of models, especially in the summer months. Fowler et al. (2007) demonstrated that RCMs are in general more likely to reproduce the observed distribution of 5-day events than that of daily precipitation, an analysis our model validation below supports. The observed data set used for validation gives the area average over England and Wales, encompassing roughly the region 51.3-53.2N, 356.2-0.3E. This region both encompasses the areas of highest flooding in 2007 in Central and North East England and is large enough to be expected to be represented well in an RCM of this resolution. From this figure and complementary reliability diagrams (not shown) it becomes evident that the model underestimates the observed precipitation: the distribution is modelled quite well within the lowest 50 %-quantile, but insufficient probability is allocated to the upper percentiles. The straight line in Fig. 1 is the linear fit between the two distributions; if the distributions were identical, the quantiles would lie on the 1-1 line. The fact that the fitted line is not the 1-1 line shows that the magnitude of precipitation in the model does not match observations. Comparing 5-day means of precipitation, as shown on the right-hand side of Fig. 1, reveals a much improved goodness-of-fit even at the wet tail of the distribution, which is important with respect to extreme precipitation events, but also highlights that the magnitude of precipitation is underestimated in the model. Attempting to counteract the bias in magnitude with a simple multiplicative correction based on the mean difference leads to better reliability of the overall mean but does not improve the fit at the tails of the distribution. We thus refrain from applying a bias correction: since we only compare the model with the model, the absolute magnitude is irrelevant for the analysis. We concentrate our analysis on 5-day means, which give the better distribution in the model and are furthermore an important indicator of flood risk (e.g., Fowler and Kilsby 2003; Fowler et al. 2010). Additionally, the underestimation of the observations is consistent between the 1960s and the 2000s, so any change in frequency and magnitude between the two decades represents a change in the relative risk of extreme precipitation.
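The sketch below illustrates how a running 5-day mean and a quantile-quantile comparison of the kind shown in Fig. 1 can be constructed. The arrays are synthetic stand-ins for the HadRM3P and HadEWP July daily precipitation series; the values and distribution parameters are purely illustrative.

```python
# Minimal sketch of the 5-day running mean and quantile-quantile comparison
# used for model validation; synthetic data stand in for HadRM3P and HadEWP.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
model_daily = rng.gamma(shape=1.5, scale=2.0, size=(100, 31))  # 100 model Julys (mm/day)
obs_daily = rng.gamma(shape=1.5, scale=2.5, size=(10, 31))     # 10 observed Julys (mm/day)

def running_5day_mean(daily):
    """Running 5-day mean along the time axis (valid part of the window only)."""
    kernel = np.ones(5) / 5.0
    return np.apply_along_axis(lambda x: np.convolve(x, kernel, mode="valid"), 1, daily)

model_5day = running_5day_mean(model_daily).ravel()
obs_5day = running_5day_mean(obs_daily).ravel()

# Compare the two distributions at matching quantiles (Q-Q plot)
quantiles = np.linspace(0.01, 0.99, 99)
q_model = np.quantile(model_5day, quantiles)
q_obs = np.quantile(obs_5day, quantiles)

plt.plot(q_obs, q_model, "k.")
plt.plot([0, q_obs.max()], [0, q_obs.max()], "b--", label="1-1 line")
plt.xlabel("observed 5-day mean precipitation (mm/day)")
plt.ylabel("modelled 5-day mean precipitation (mm/day)")
plt.legend()
plt.show()
```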
Modelling “the world that might have been”
The natural-only forcing used to simulate a counterfactual ensemble of the decade 2001-2010 is identical to the forcing observed over the last decade but with greenhouse gases, ozone, SO2 and DMS held at preindustrial levels, and with the correspondingly lower SSTs prescribed. It thus simulates a world that might have been without anthropogenic influences.
The attributable twentieth-century greenhouse gas warming in SSTs cannot be found directly from observations, because the observations also contain the signal of both natural (e.g., solar and volcanic) and other anthropogenic (e.g., sulphate aerosol) drivers, as well as internal variability. Prior studies (Stott et al. 2006) derived this warming from estimates based on established ‘optimal fingerprinting’ analysis (Hegerl et al. 2007; Stott et al. 2010), which uses multiple linear regression to compare observed surface temperature change with model-simulated responses to individual forcings, and is described in the supplementary text of Pall et al. (2011).
We use a new method based on Pall et al. (2011) in which we subtract warming patterns from observed SSTs. The Met Office state-of-the-art coupled climate model HadGEM2 is used to compute delta values (SST difference fields) by subtracting the SSTs of a HadGEM2 ‘natural’ run over the last decade, without anthropogenic greenhouse gas, ozone, SO2 and DMS forcing, from the same model’s ‘all forcings’ runs. These delta SSTs are then subtracted from the HadISST SSTs. To reduce noise, the deltas are produced using decadal averages. The counterfactual SSTs are then used to estimate the sea ice concentration for the ‘world that might have been’. As current state-of-the-art GCM sea ice projections are inconsistent with observations and with one another (Eisenmann et al. 2007), we apply an empirically based method (Rayner et al. 2003) to provide sea ice fields for the natural model runs. This method of generating possible sea ice fields is independent of GCMs, fitting a quadratic relationship between SSTs and sea ice extent. Using observational records of SSTs (SST) from HadISST and of sea ice extent (SIE), the approximation:
$$SST = a \times SIE^{2} + b \times SIE + c$$
is derived. Under the assumption that this approximation holds on the time scale of interest, so that the parameters a, b, and c are time invariant, the sea ice extent (SIE) for given SSTs can be calculated. To test the algorithm, this method is used to calculate the observed SIE for the period 1999-2010. Figure 2 shows the result of this validation as well as the sea ice extent for the natural climate simulations for the same period; the algorithm is seen to perform well.
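A minimal sketch of this fit-and-invert procedure is given below, assuming synthetic SST/SIE pairs in place of the HadISST records; the inversion keeps the physically meaningful root of the quadratic.

```python
# Minimal sketch of fitting SST = a*SIE^2 + b*SIE + c and inverting it to
# estimate sea ice extent for counterfactual SSTs; the paired values are
# synthetic and purely illustrative, not HadISST data.
import numpy as np

# Hypothetical paired observations of sea ice extent (fraction) and SST (degC)
sie_obs = np.array([0.9, 0.8, 0.6, 0.4, 0.2, 0.05])
sst_obs = np.array([-1.7, -1.5, -0.8, 0.2, 1.5, 2.5])

# Least-squares fit of the quadratic relation SST = a*SIE^2 + b*SIE + c
a, b, c = np.polyfit(sie_obs, sst_obs, deg=2)

def sie_for_sst(sst, a=a, b=b, c=c):
    """Invert the quadratic for SIE at a given SST, keeping the root in [0, 1]."""
    roots = np.roots([a, b, c - sst])
    real = roots[np.isreal(roots)].real
    physical = real[(real >= 0.0) & (real <= 1.0)]
    return physical[0] if physical.size else 0.0  # no ice if no root in range

# Example: sea ice extent implied by a counterfactual (cooler) SST
print(sie_for_sst(-1.0))
```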
Uncertainty assessment
The main source of uncertainty in regional precipitation in the present-day climate is natural variability (Hawkins and Sutton 2011). The modelling framework employed in this study is ideally suited to quantifying this uncertainty, using a very large ensemble of simulations of the very same climate conditions with slightly varied initial conditions. However, another large uncertainty, the modelling uncertainty, cannot be addressed, as only one model is used for the study. Validating the model against observations gives insights into the shortcomings of the model but cannot quantify this uncertainty. Introducing observations into the experiment reveals another source of uncertainty: the uncertainty within the observations themselves. However, as observed data sets in this part of the world are fairly consistent, we assume this to be a small part of the overall uncertainty. To validate the model’s ability to reproduce observations, we compare model precipitation with the observed timeseries from HadEWP, making use of quantile-quantile plots and reliability diagrams (not shown) as statistical validation tools, as described in Section 2.2. England and Wales have long records of observed precipitation and a very dense network of weather stations, so we might expect the uncertainty in this observed dataset to be somewhat smaller than in most other regions. For a qualitative assessment of the uncertainty in the observed data we additionally compute quantile-quantile plots for the decades 2001-2010 and 1961-1970 using NCEP reanalyses and our model’s data. NCEP is certainly not the ideal data set for regional precipitation, but it is one of the few spanning the considered time frame, 1960-2010. Furthermore, it is derived very differently from HadEWP and is thus suitable for checking whether validation against very different data sets gives very different results, which turns out not to be the case. The comparison (not shown) shows good agreement, with the model 5-day means slightly overestimating the NCEP data in the higher quantiles. We made no attempt to quantify the uncertainty in the observed data sets, but assessments can be found in Alexander and Jones (2001) and Kalnay et al. (1996). It is important to note that we are less interested in the model’s ability to simulate the correct precipitation at the correct time than in its ability to represent the right distribution of precipitation within the relevant decade. The validation analysis is thus done separately for the two analysed decades.
The model set-up we use for the first part of this study has been especially designed to account for uncertainty in the initial conditions input to the model. In order to produce a robust statistical analysis, we use an ensemble of 1741 initial conditions. By current standards in this area of science this is a large ensemble, so we believe we are able to sample the initial-condition uncertainty exceptionally well. Using this standard ensemble we focus purely on initial-condition uncertainty; all other aspects, of model structure and external forcing, have not been varied at all. The uncertainties in these other inputs therefore cannot be quantified in this approach.
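As an illustration of how the sampling uncertainty arising from a finite initial-condition ensemble could be assessed, the sketch below bootstraps the exceedance probability of a synthetic 1741-member ensemble. The bootstrap, threshold and distribution are illustrative assumptions, not a description of the study’s own procedure.

```python
# Minimal sketch, under illustrative assumptions, of quantifying sampling
# uncertainty in an exceedance probability by bootstrap resampling of a
# large ensemble; the ensemble here is synthetic, not the 1741-member run.
import numpy as np

rng = np.random.default_rng(2)
ensemble = rng.gamma(shape=2.0, scale=3.0, size=1741)  # stand-in for ensemble maxima
threshold = 12.0                                       # illustrative event threshold

n_boot = 5000
boot_p = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(ensemble, size=ensemble.size, replace=True)
    boot_p[i] = np.mean(resample > threshold)

lo, hi = np.percentile(boot_p, [5, 95])
print(f"P(exceed) = {np.mean(ensemble > threshold):.4f}, 5-95% range: [{lo:.4f}, {hi:.4f}]")
```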
In the later part of the study, we examine differences between simulations representing conditions of the 1960s and simulations representing conditions of the 2000s with the anthropogenic signal in the SST forcing removed. This method can give an initial bound on the uncertainty originating in the fact that we do not know what present-day climate without anthropogenic emissions would look like, but it is likely to underestimate the full range. To make more comprehensive statements about the true range of uncertainty, further studies should apply different warming patterns from different GCMs to remove the anthropogenic signal from the SSTs (as was done by Pall et al. 2011). The use of a regional climate model driven by a global model, and the consistency of the two models, allows us to gain confidence in the model physics on different scales, even though it does not allow us to quantify the uncertainty in these methods explicitly.
In this study we have not attempted to quantify any other type of modelling uncertainty; for example, we have not varied any internal model parameters. All other sources of uncertainty affecting these results, e.g. those stemming from stochastic variability and from our own expert judgment in interpreting the results, cannot be accounted for in this approach.