Introduction

Coral reefs are vital ecosystems that are rich in biodiversity and provide enormous economic benefits, primarily via fisheries and tourism (Moberg and Folke 1999), as well as playing an important role in coastal protection (Ferrario et al. 2014). The Great Barrier Reef (GBR) is the world’s largest coral reef ecosystem, with its economic benefits estimated to be $56 billion AUD, supporting 64,000 jobs and contributing $6.4 billion AUD per annum to the Australian economy (Deloitte Access Economics 2017). In addition to economic value, the GBR is a World Heritage site with extensive indigenous, historic, social, scientific, aesthetic, and natural value (Great Barrier Reef Marine Park Authority 2014a).

Key environmental variables generally exhibit relatively little variance over seasonal and diurnal time frames in tropical oceans, making marine organisms sensitive to small changes. Healthy coral reefs survive in a narrow range of environmental conditions and are therefore particularly vulnerable to climate variability and change (Wells 1957; Johnson and Marshall 2007). When corals are exposed to conditions outside their tolerance thresholds, they can become stressed and bleach in response (Glynn 1993). Mass coral bleaching refers to bleaching over large regions and is primarily caused by higher-than-normal ocean temperatures (Brown 1997).

The GBR has been identified as particularly vulnerable to climate variability and change due to its sensitivity and exposure, especially with regard to mass coral bleaching (Fabricius et al. 2007). Various inshore and offshore regions across the entire length of the GBR have experienced significant mass bleaching due to elevated ocean temperatures in recent decades. Two unprecedented, severe events occurred in consecutive years, 2016 and 2017, highlighting the vulnerability and sensitivity of the GBR to increases in ocean temperature in a changing climate (Hughes and Kerry 2017). Coral bleaching from warm ocean temperatures in the GBR typically occurs in mid-to-late austral summer, from January to March (Hoegh-Guldberg 1999). Drivers of warm ocean temperatures in the GBR include both the El Niño–Southern Oscillation (ENSO) and regional weather, although ENSO-related bleaching is likely to diminish in importance as climate change is predicted to elevate ocean temperatures above bleaching thresholds on an annual basis (Goreau and Hayes 2005). In the year following the formation of an El Niño event, the GBR is generally anomalously warm due to reduced cloud cover from a weakened monsoon, which enhances radiative heating (Lough 1994), and an enhanced South Equatorial Current flowing into the Coral Sea (Kessler and Cravatte 2013; Ganachaud et al. 2014). In the northern GBR, this anomalous warming occurs during the normal seasonal ocean temperature maximum (summer), whereas in the south it occurs after the normal seasonal maximum, in late summer or early autumn (Lough 1999; Redondo-Rodriguez et al. 2012). Regional weather drivers of anomalous ocean warming include increased solar radiation, low cloud cover at middle to high levels (Marshall and Schuttenberg 2006), and reduced wind and rain. A reduction in wind speed weakens turbulence-driven cooling mechanisms such as vertical mixing, upwelling, and the air–sea latent and sensible heat fluxes (Praveen Kumar et al. 2017; Chen et al. 1994).

Past bleaching events on the GBR

In the summer of 1998, a global mass bleaching event occurred that coincided with a strong El Niño event. Bleaching was mostly inshore in the southern and central GBR. In 2002, another event again affected the central and southern GBR, this time impacting offshore corals that had largely avoided bleaching during the 1998 event (Berkelmans et al. 2004). This latter event did not coincide with a significant phase of the ENSO cycle. Elevated ocean temperatures were attributed to regional weather, specifically an extended period of higher-than-normal air temperatures from December 2001 to March 2002 and frequent low winds (Done et al. 2002). The next significant mass bleaching event occurred in 2006 and, as in 2002, was attributed to regional weather (Maynard et al. 2007). Thermal accumulation during extended periods of clear skies (Masiri et al. 2008) caused bleaching in the far south, primarily around the Keppel Islands and, to a lesser extent, the Capricorn Bunker Group (Great Barrier Reef Marine Park Authority 2007). In the summer of 2016, another global bleaching event occurred, coinciding with a strong El Niño, with 81% of surveyed corals in the northern sector of the marine park experiencing severe bleaching, compared with 33% in the central GBR (Hughes et al. 2017), resulting in an overall loss of 30% of corals in the GBR Marine Park (Hughes et al. 2018). The southern GBR avoided major bleaching due to significant local cooling during the passage of Cyclones Winston and Tatiana, which increased upwelling, cloud cover, and rainfall (Stella et al. 2016). Severe mass bleaching was observed again in the following summer of 2017, an ENSO-neutral period, particularly affecting the middle third of the marine park (Hughes and Kerry 2017). This event has been attributed to regional weather conditions, including heatwaves and relatively few summer storms over the reef until late in the season, which led to increased surface heating and reduced mixing.

Seasonal forecasting in the GBR

Forecasting sea surface temperatures (SST) at seasonal scales up to 9 months in advance has been shown to be a critical tool for marine stakeholders and management groups (Spillman et al. 2011; Stock et al. 2015). Such forecasts support decision making for both risk management and risk reduction, and have demonstrated value in determining the likelihood of mass bleaching in an upcoming season (Spillman 2011a; Eakin et al. 2012). In the GBR, marine park managers use seasonal forecasts in the Early Warning System component of the Coral Bleaching Response Plan (Great Barrier Reef Marine Park Authority 2013a; Maynard et al. 2009; Marshall and Schuttenberg 2006). An incident response framework is prepared when seasonal forecasts indicate an increased risk of coral bleaching, defined as SSTs that are 0.6 °C above average and persist for 2 months or more. In the event of severe or widespread bleaching in the GBR, an assessment and monitoring programme is implemented to measure and verify its impact and severity. Seasonal forecasts over winter are also useful for marine park managers, as anomalous ocean warmth during this period is a known precursor to marine disease outbreaks (Heron et al. 2010; Great Barrier Reef Marine Park Authority 2013b).
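The trigger described above can be expressed as a simple rule applied to a forecast series of monthly SST anomalies. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "persist for 2 months or more" means consecutive months at or above the 0.6 °C threshold, which may differ from the operational implementation.

```python
def longest_run_at_or_above(ssta, threshold=0.6):
    """Length of the longest run of consecutive monthly SSTA values >= threshold (°C)."""
    longest = current = 0
    for value in ssta:
        current = current + 1 if value >= threshold else 0
        longest = max(longest, current)
    return longest


def bleaching_risk_flag(ssta, threshold=0.6, min_months=2):
    """Hypothetical trigger: True if forecast SSTA stays >= threshold for min_months in a row."""
    return longest_run_at_or_above(ssta, threshold) >= min_months


# Hypothetical monthly SSTA outlook (°C) for December to March.
outlook = [0.3, 0.7, 0.8, 0.5]
print(bleaching_risk_flag(outlook))  # True: January and February both reach 0.6 °C
```

Requiring consecutive months, rather than any two months in the outlook, reflects the idea of sustained thermal stress; a different persistence rule would only change `longest_run_at_or_above`.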

Since 2009, seasonal SST forecasts from the Bureau of Meteorology's Predictive Ocean Atmosphere Model for Australia (POAMA), a global dynamical coupled ocean–atmosphere model that forecasts ocean and atmospheric conditions up to 9 months in advance, have been used to derive real-time forecast metrics of coral bleaching risk for the GBR (Spillman et al. 2009). Bleaching metrics derived from POAMA outlooks were verified against recorded mass bleaching events in the GBR to assess the model's suitability for this application (Spillman et al. 2013). The accuracy of the SST forecast, and therefore its usefulness for marine applications, varies with region, forecast lead time (how far into the future the forecast extends), and the time of year at which the forecast is initialised (Spillman and Alves 2009; Griesser and Spillman 2016).

The Bureau of Meteorology has since upgraded its seasonal forecasting system to ACCESS-S1 (Australian Community Climate and Earth–System Simulator–Seasonal), which features improved physics and higher temporal and spatial resolution (Hudson et al. 2017; Lim et al. 2016). The surface ocean resolution of ACCESS-S1 is ~ 25 km, which is higher than that of many other seasonal models currently in operation.

The development of skilful early warning tools is becoming increasingly important for helping marine park managers mitigate impacts on the health and coverage of coral reefs, as coral bleaching is predicted to become more frequent and severe in a changing climate. This paper assesses the skill of ACCESS-S1 in forecasting SST over the GBR and the Coral Sea for the purpose of evaluating bleaching risk. Model skill was compared with persistence, which assumes that existing SST conditions will continue unchanged, and with the previous seasonal model, POAMA 2.4 (Schiller et al. 2004). Several historical GBR coral bleaching events were also investigated individually to assess the ability of ACCESS-S1 to predict high SST anomalies in known scenarios.

Method

Site description

The GBR Marine Park boundary stretches 2300 km from just north of Bundaberg to the far north tip of Queensland covering an area of 344,400 square kilometres. There are almost 3000 reefs in the GBR that account for approximately 10% of the world’s coral reefs. The reef ecosystems are rich in biodiversity and support a vast variety of marine plants, fish, invertebrates, sharks, rays, marine turtles, and sea snakes (Great Barrier Reef Marine Park Authority 2014b). The GBR Marine Park is divided into four management zones: Far Northern, Cairns and Cooktown, Townsville/Whitsunday, and Mackay/Capricorn (Fig. 1; Great Barrier Reef Marine Park Authority (2004)).

Fig. 1

Locality map, topography, and management zones of the Great Barrier Reef Marine Park

The Coral Sea is dominated by three major surface currents: the South Equatorial Current (SEC), the Hiri Current, and the East Australian Current (EAC). The SEC flows into the Coral Sea between the Solomon Islands and New Caledonia, travelling over the Mellish Plateau (north of the shallow reefs on the Chesterfield Plateau) towards the Queensland Plateau. This plateau strongly influences the SEC, resulting in high variability between the plateau and the continental shelf (Choukroun et al. 2010). The SEC bifurcates west of the Queensland Plateau and flows both north and south along the continental shelf. The northward branch, called the Hiri Current, passes through the Queensland Trough and forms a gyre in the Gulf of Papua. The southward branch is known as the EAC and passes through the Townsville Trough, which separates the Queensland and Marion Plateaus. From there, the EAC flows down the Cato Trough, forming a clockwise eddy (Griffin et al. 1987). The Marion Plateau experiences large daily variability due to significant tidal processes. The Cato Trough is highly dynamic, primarily because an EAC gyre interacts with northward-flowing sub-Antarctic waters to create upwelling (Burrage 1993).

Model description

Seasonal forecasts from dynamical coupled models provide outlooks at timescales of weeks to months to seasons, based on coupled interactions between the atmosphere, ocean, and land surface. Because the coupled system is chaotic, forecast uncertainty is represented by an ensemble of outcomes. Ensembles are created by running the model multiple times with slightly different initial conditions that are representative of their respective observational errors. A single 'deterministic' forecast can then be derived from the ensemble by taking the arithmetic average, referred to as the ensemble mean. Each forecast targets a particular period, with the first forecast period called lead time zero and each subsequent period incrementing the lead time. For example, in a monthly forecast, lead time zero is the first full calendar month beginning on or after the start date, lead time one is the month after that, and so on. The start date of a forecast, also called the initialisation date, is the date on which the model is run and initialised with recent observations. Retrospective forecasts (hindcasts) over a specified period are created for bias correction and model assessment.

ACCESS-S

The first implementation of ACCESS-S, referred to as ACCESS-S1 (Lim et al. 2016), is based on version 5 of the UK Met Office (UKMO) global seasonal prediction system, GloSea5 (Maclachlan et al. 2015), which includes the latest atmospheric Global Coupled (GC2) model (Williams et al. 2015) coupled with the latest Nucleus for European Modelling of the Ocean (NEMO) community model (Madec and NEMO team 2011). The NEMO ocean model operates on the tripolar ORCA2 grid, which avoids singularities over the ocean by placing its two northern hemisphere poles over land masses (the third pole remains over Antarctica). Singularities arise where meridians converge to a point; the shrinking grid spacing near such a point severely restricts the allowable time step in numerical ocean models (Madec and Imbard 1996). The ORCA2 grid thus removes these singular points from the computational domain. ACCESS-S1 operates at approximately 25 km horizontal ocean resolution in the Australian region, with 75 depth levels and a 1 m top layer. In comparison, the previous operational model, POAMA 2.4, has a horizontal resolution of ~ 250 km with 25 depth levels and a 15 m top layer. The horizontal resolution of ACCESS-S1 is high enough to resolve large eddies and mesoscale currents, and is described as 'eddy permitting' (Marsh et al. 2009). ACCESS-S1 uses ERA-Interim (Dee et al. 2011) for the hindcast atmospheric initial conditions and the Bureau of Meteorology's operational global NWP analyses from ACCESS-G for the real-time forecasts (Hudson et al. 2017). Ocean data assimilation for both hindcasts and real-time forecasts uses Forecast Ocean Assimilation Model (FOAM) analyses (Blockley et al. 2014), based on the NEMOVAR project developed specifically for the NEMO ocean model (Mogensen et al. 2012).
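The time-step restriction near a converging pole can be illustrated with a short worked example. It assumes a regular latitude–longitude grid with 0.25° spacing (roughly the ~ 25 km resolution quoted above) and an explicit scheme limited by a CFL condition for a fast wave speed of 200 m/s. Both values are assumptions chosen for illustration, not ACCESS-S1 settings, and the function names are hypothetical. The zonal grid spacing shrinks with the cosine of latitude, and the maximum stable time step shrinks with it:

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius


def zonal_spacing_m(lat_deg, dlon_deg=0.25):
    """East-west grid spacing (m) of a regular lat-lon grid at a given latitude."""
    return EARTH_RADIUS_M * math.cos(math.radians(lat_deg)) * math.radians(dlon_deg)


def cfl_max_timestep_s(lat_deg, wave_speed=200.0, dlon_deg=0.25):
    """Largest explicit time step (s) with CFL number 1 for the assumed wave speed."""
    return zonal_spacing_m(lat_deg, dlon_deg) / wave_speed


for lat in (0.0, 60.0, 89.0, 89.9):
    print(f"{lat:5.1f} deg  dx = {zonal_spacing_m(lat) / 1000:8.3f} km  "
          f"dt_max = {cfl_max_timestep_s(lat):8.2f} s")
```

At 89.9° the allowable time step is several hundred times smaller than at the equator, which is the motivation for moving the northern grid poles onto land.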

The model was run retrospectively for 1990–2012 to generate hindcasts for both multiweek (fortnightly) and monthly datasets. Each dataset has its own initial condition perturbation scheme, producing 11 ensemble members (Hudson et al. 2017). There were four model start dates each month (1st, 9th, 17th, and 25th), giving a total of 48 start dates per year. The monthly and multiweek hindcasts extend 6 months and 6 weeks, respectively, into the future. The multiweek hindcasts were averaged into three fortnights, referred to as f1 (weeks 1 and 2), f2 (weeks 3 and 4), and f3 (weeks 5 and 6). Fortnightly and monthly climatologies were computed from the respective hindcast datasets as a function of start date and lead time. These climatologies were subtracted from the model's top-layer ocean temperatures to give model SST anomalies, i.e. deviations from the long-term average.
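The fortnightly averaging and the start-date- and lead-dependent climatology described above can be sketched as follows. This is a minimal illustration that assumes each hindcast value is already an ensemble-mean top-layer temperature keyed by (year, start date, lead). The data layout and function names are hypothetical, and the actual processing operates on full gridded fields.

```python
from collections import defaultdict
from statistics import mean


def weekly_to_fortnights(weekly):
    """Average six weekly values into f1 (weeks 1-2), f2 (weeks 3-4) and f3 (weeks 5-6)."""
    if len(weekly) != 6:
        raise ValueError("expected six weekly values")
    return [mean(weekly[i:i + 2]) for i in range(0, 6, 2)]


def lead_dependent_anomalies(hindcasts):
    """Subtract a climatology computed separately for each (start date, lead time).

    hindcasts maps (year, start, lead) -> temperature. The climatology for a given
    start date and lead is the mean over all hindcast years, so lead-dependent
    model drift is removed together with the seasonal cycle.
    """
    groups = defaultdict(list)
    for (_, start, lead), value in hindcasts.items():
        groups[(start, lead)].append(value)
    climatology = {key: mean(values) for key, values in groups.items()}
    return {
        (year, start, lead): value - climatology[(start, lead)]
        for (year, start, lead), value in hindcasts.items()
    }


# Hypothetical two-year hindcast for a 1st October start date at leads 0 and 1.
example = {
    (1990, "10-01", 0): 27.0, (1991, "10-01", 0): 28.0,
    (1990, "10-01", 1): 28.5, (1991, "10-01", 1): 27.5,
}
print(lead_dependent_anomalies(example))
```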

Operational real-time seasonal SST forecasts from ACCESS-S1 commenced in August 2018. In the real-time system, both monthly and fortnightly forecasts comprise 99 ensemble members. Expanding the ensemble size improves the model's ability to capture uncertainty (Tracton and Kalnay 1993). For monthly forecasts, 11 ensemble members are produced daily via initial condition perturbation and accumulated over the previous 9 d to give a total of 99 members. For fortnightly forecasts, 33 members are produced daily and accumulated over the previous 3 d of forecast runs (Hudson et al. 2017).

The benefits of the real-time time-lagged approach could not be fully investigated because the hindcast has only four start dates per month; reproducing the full 99-member operational ensemble would require a daily hindcast. However, for the probabilistic assessments, the monthly hindcast ensemble was doubled from 11 to 22 members by combining the retrospective forecasts initialised on the 25th of the preceding month and on the 1st of each month.
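The time-lagged ensembles described above amount to pooling the members of several recent forecast runs. The sketch below is a simplified illustration that assumes each run is stored as a list of member forecasts keyed by its start date; the function names and data layout are hypothetical. It reproduces the member counts quoted above: 11 members × 9 d and 33 members × 3 d in real time, and the 22-member hindcast formed from the 25th and 1st starts.

```python
from datetime import date, timedelta


def recent_start_dates(issue_date, n_days):
    """The n_days daily start dates ending on (and including) issue_date."""
    return [issue_date - timedelta(days=k) for k in range(n_days)]


def pool_members(runs, start_dates):
    """Concatenate the ensemble members of every run listed in start_dates."""
    pooled = []
    for start in start_dates:
        if start not in runs:
            raise KeyError(f"no forecast run for {start}")
        pooled.extend(runs[start])
    return pooled


issue = date(2018, 9, 1)
# Real-time monthly system: 11 members per daily run, pooled over 9 d.
monthly_runs = {d: [0.0] * 11 for d in recent_start_dates(issue, 9)}
# Real-time fortnightly system: 33 members per daily run, pooled over 3 d.
fortnight_runs = {d: [0.0] * 33 for d in recent_start_dates(issue, 3)}
# Hindcast: only four starts per month, so pool the 25th and the 1st.
hindcast_starts = [date(1997, 12, 25), date(1998, 1, 1)]
hindcast_runs = {d: [0.0] * 11 for d in hindcast_starts}

print(len(pool_members(monthly_runs, recent_start_dates(issue, 9))))    # 99
print(len(pool_members(fortnight_runs, recent_start_dates(issue, 3))))  # 99
print(len(pool_members(hindcast_runs, hindcast_starts)))                # 22
```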

POAMA 2.4

Coupled ocean–atmosphere seasonal forecasting at the Bureau of Meteorology has been operational using the POAMA model since 2002 (Alves et al. 2003; Wang et al. 2008; Hudson et al. 2011). The most recent version, POAMA 2.4, was released in 2011 and included an improved ocean data assimilation scheme over a longer hindcast period, and a larger ensemble obtained by adopting a multi-model approach with three different configurations (Wang et al. 2011).

The atmospheric component of POAMA 2.4 is based on the Bureau of Meteorology's Atmospheric Model version 3.0, with a horizontal spectral resolution of approximately 250 km and 17 vertical levels (Colman and McAvaney 1995; Colman 2002). The ocean model is the CMAR Australian Community Ocean Model version 2 (ACOM2), with grid spacing of 2° in the zonal direction and 0.5° to 1.5° in the meridional direction (Schiller et al. 2004), 25 vertical levels, and a 15 m top layer. For each POAMA forecast, the ocean (Smith 1991), atmosphere, and land (Hudson et al. 2011) are initialised from observed states. Ensemble members are generated by perturbing the initial conditions to give 11 states, combined with a multi-model ensemble of three atmospheric models (Hudson et al. 2013), providing a total of 33 ensemble members. For the skill comparison between POAMA and ACCESS-S1, the POAMA ensemble mean was calculated from 11 members randomly sub-sampled from the full set of 33, to match the ensemble size of ACCESS-S1. This procedure was repeated 20 times. The number of iterations was chosen by increasing it until the spread of outcomes was relatively stable, and 20 iterations were found sufficient to represent the full spread of possible outcomes.
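The sub-sampling procedure can be sketched as below. This is an illustrative sketch only: it assumes each member is a time series of SST anomalies and uses a hand-written Pearson correlation, and the function names, fixed random seed, and toy data are assumptions. Each draw selects 11 of the 33 members without replacement, forms their ensemble mean, and correlates it with observations, so the 20 draws give a range of skill values against which a single ACCESS-S1 score can be compared.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def subsampled_skill(members, observed, subset_size=11, n_draws=20, seed=0):
    """Correlation skill of n_draws ensemble means, each built from a random
    subset of subset_size members drawn without replacement."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_draws):
        subset = rng.sample(members, subset_size)
        ens_mean = [sum(values) / subset_size for values in zip(*subset)]
        scores.append(pearson_r(ens_mean, observed))
    return scores


# Toy data: 33 members over 23 "years", each = observed signal + member noise.
data_rng = random.Random(1)
observed = [data_rng.gauss(0.0, 0.5) for _ in range(23)]
members = [[o + data_rng.gauss(0.0, 0.5) for o in observed] for _ in range(33)]

scores = subsampled_skill(members, observed)
print(f"sub-sampled skill range: {min(scores):.2f} to {max(scores):.2f}")
```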

Verification data description

Satellite-derived SST observations were used to assess model skill. The Optimum Interpolation Sea Surface Temperature v2 (OISSTv2) produced by the National Oceanic and Atmospheric Administration (NOAA) is a global product with a resolution similar to that of ACCESS-S1 over the GBR and Coral Sea, and spans the entire ACCESS-S1 hindcast period (Reynolds et al. 2002). To prepare the data for comparison with the model, monthly climatologies for 1990–2012 were calculated and subtracted from the observations to create multiweek and monthly SST anomalies. The observed SST dataset was regridded from its rectilinear 0.25° Mercator grid to the tripolar ORCA2 grid used by the NEMO ocean model.

Model skill assessment

The accuracy of the ACCESS-S1 hindcasts was assessed against observations, a reference forecast (persistence), and POAMA 2.4 over the entire hindcast period to quantify model skill in the Great Barrier Reef Marine Park and the greater Coral Sea region. A persistence forecast uses recent observed anomalies as the forecast, assuming that they will remain unchanged over the coming weeks or months. In this assessment, persistence was defined as the previous month's SST anomaly, persisted for the duration of the forecast. Persistence was chosen over climatology as the reference forecast because it is more difficult to beat, owing to the slowly evolving nature of the ocean, especially at shorter lead times (Troccoli et al. 2005).
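Persistence as defined here is simple to state in code. A minimal sketch, assuming a list of observed monthly SST anomalies indexed by month and a forecast initialised at the start of month `init_index`, so that the last complete observed month is `init_index - 1`; the function name is hypothetical.

```python
def persistence_forecast(observed_ssta, init_index, n_leads=6):
    """Repeat the most recent complete month's observed anomaly for every lead time.

    observed_ssta[i] is the observed SST anomaly for month i; a forecast initialised
    at the start of month init_index can only use months before init_index.
    """
    if not 1 <= init_index <= len(observed_ssta):
        raise ValueError("need at least one observed month before initialisation")
    return [observed_ssta[init_index - 1]] * n_leads


# Observed anomalies (°C) for Aug, Sep, Oct, Nov; a 1st October forecast persists September.
obs = [0.2, 0.4, -0.1, 0.3]
print(persistence_forecast(obs, init_index=2, n_leads=3))  # [0.4, 0.4, 0.4]
```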

Monthly skill assessments used only forecasts initialised on the 1st of each month, so that each month has only one forecast per lead time. Monthly model skill is available out to lead time 5 (six forecast periods in total). For example, a model run started on 1st October produces monthly forecasts for October (lead 0), November, December, January, February, and March (lead 5). Multiweek skill assessments use all start dates and evaluate the first two fortnights.
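The mapping from a 1st-of-month start date and lead time to the target month can be written as below. The function name is hypothetical, and it assumes (as in the example above) that lead 0 is the month of initialisation.

```python
import calendar


def target_month(init_year, init_month, lead):
    """(year, month) targeted by a monthly forecast initialised on the 1st of
    init_month; lead 0 is the initialisation month itself."""
    index = (init_month - 1) + lead
    return init_year + index // 12, index % 12 + 1


# A 1st October 1997 start covers October 1997 (lead 0) to March 1998 (lead 5).
for lead in range(6):
    year, month = target_month(1997, 10, lead)
    print(lead, calendar.month_abbr[month], year)
```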

The initial skill assessment for ACCESS-S1 correlated the hindcast ensemble-mean SST anomalies (the equally weighted average of all ensemble members) with observed SST anomalies in the Coral Sea, focusing on austral summer (January–February–March) over the 1990–2012 hindcast period. The skill of POAMA 2.4 was assessed with the same method for each of the 20 random 11-member sub-samples. The greater Coral Sea region was used for the comparison between ACCESS-S1 and POAMA because POAMA's coarser ocean grid has fewer grid cells within the marine park itself.

Ensemble spread was examined by areally averaging the ensemble mean and ensemble range within the GBR Marine Park and plotting them as a time series against observations for the hindcast period. In addition, the distribution of observations relative to the ensemble was evaluated with a rank histogram: at each grid point in the GBR Marine Park and for each month of the 23 yr hindcast, the ensemble members were sorted to define bins, and the bin into which the observation fell was recorded. A perfectly calibrated ensemble would produce a flat histogram, indicating that the observation is equally likely to fall in any bin (Hamill 2001).
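A rank histogram of the kind described above can be computed in a few lines. This sketch assumes the ensemble members and observation for each grid point and month are available as plain lists; the function names are hypothetical, and ties between the observation and a member are simply assigned to the lower rank (verification studies often break ties at random instead).

```python
def observation_rank(members, observation):
    """Rank of the observation within the ensemble: the number of members below it.
    With n members the rank lies in 0..n, giving n + 1 bins."""
    return sum(1 for m in members if m < observation)


def rank_histogram(ensembles, observations):
    """Count how often the observation falls into each of the n + 1 rank bins."""
    n_bins = len(ensembles[0]) + 1
    counts = [0] * n_bins
    for members, observation in zip(ensembles, observations):
        counts[observation_rank(members, observation)] += 1
    return counts


# Under-dispersed toy ensemble: observations often fall outside the member range,
# which piles counts into the first and last bins (a U-shaped histogram).
ensembles = [[0.1, 0.2, 0.3]] * 6
observations = [-0.5, 0.9, 0.25, -0.2, 1.1, 0.15]
print(rank_histogram(ensembles, observations))  # [2, 1, 1, 2]
```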

The skill of forecasts issued before summer in predicting monthly warming was assessed in terms of the root mean square error (RMSE) between the model ensemble mean and observed SST anomalies, and compared with that of persistence over the hindcast period. The RMSE indicates the typical magnitude of model deviations from observed values. Forecasts initialised on 1st October, 1st November, and 1st December were selected, targeting the summer months of January, February, and March, which historically coincide with bleaching events.
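The RMSE comparison against persistence can be sketched as below. The anomaly values are invented for illustration and the function name is hypothetical; in the assessment itself the inputs would be areally averaged ensemble-mean and persistence anomalies for each target month over the hindcast years.

```python
import math


def rmse(forecast, observed):
    """Root mean square error between two equal-length sequences."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast))


# Toy summer anomalies (°C) for three years.
observed = [0.6, 0.3, 0.7]
model = [0.5, 0.2, 0.9]
persistence = [0.1, 0.1, 0.1]
print(f"model RMSE {rmse(model, observed):.2f}, "
      f"persistence RMSE {rmse(persistence, observed):.2f}")
```

A model RMSE below the persistence RMSE, as in this toy case, is the sense in which the model "beats" persistence.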

For assessments of the individual GBR management zones, for both monthly and multiweek forecasts, ACCESS-S1 accuracy was assessed using Pearson correlation coefficients (r) between the model ensemble mean and observed SST anomalies. The Pearson correlation coefficient is the covariance of two samples divided by the product of their standard deviations; a coefficient of one indicates a perfect linear relationship between model and observations. For the available hindcast period, where N = 23 (number of years), correlation coefficients above 0.413 are statistically significant at the 95% confidence level with 21 degrees of freedom (df = N − 2) for a two-sided t test (Whitlock 2007). Areal averaging was used to summarise accuracy over the four management zones (shown in Fig. 1): both multiweek and monthly SST anomalies from the model and observations were averaged over each management zone before the correlation was calculated.
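The 0.413 significance threshold follows from the t distribution. For df degrees of freedom, r is significant when t = r√df / √(1 − r²) exceeds the critical t value, i.e. when r exceeds t_c / √(t_c² + df). The sketch below takes the two-sided 5% critical value for 21 df (about 2.0796) from standard t tables rather than computing it, and uses a hand-written Pearson correlation; the function and constant names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.
    (The 1/n factors cancel, so plain sums of products are used.)"""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def critical_r(t_crit, df):
    """Smallest |r| that is significant for a given critical t and degrees of freedom."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


N_YEARS = 23
T_CRIT_TWO_SIDED_5PC_DF21 = 2.0796  # from standard t tables
r_crit = critical_r(T_CRIT_TWO_SIDED_5PC_DF21, N_YEARS - 2)
print(round(r_crit, 3))  # 0.413
```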

For the SST anomaly (SSTA) forecasts relevant here, the forecast probability of an event is the fraction of ensemble members above a defined SSTA threshold. Probabilistic skill was assessed by determining the model's ability to forecast events defined by several SSTA thresholds: positive anomalies of 0 °C and 0.6 °C, and the upper tercile, defined as the 66th percentile of all ensemble members for each month over the entire hindcast period. The 66th percentile is calculated with cross-validation: the current month's anomaly value is excluded from the compiled dataset, leaving the remaining 22 hindcast years for the tercile calculation. Cross-validation prevents the value for the month being assessed from influencing the percentile calculation.
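The event definitions above can be sketched as follows. An event probability is the fraction of ensemble members exceeding the threshold, and the upper-tercile threshold for a given year is the 66th percentile of the pooled ensemble members from the other 22 hindcast years. The percentile here uses linear interpolation between order statistics (the same convention as NumPy's default); this convention, the data layout, and the function names are assumptions, since the exact choices are not specified above.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def cross_validated_upper_tercile(members_by_year, target_year, q=66.0):
    """66th percentile of all ensemble members for one calendar month, pooling
    every hindcast year except target_year (leave-one-year-out)."""
    pooled = [m for year, members in members_by_year.items()
              if year != target_year for m in members]
    return percentile(pooled, q)


def event_probability(members, threshold):
    """Forecast probability = fraction of ensemble members above the threshold."""
    return sum(1 for m in members if m > threshold) / len(members)


# Toy example: one value per year for 1990-2012, equal to the year's offset from 1990.
members_by_year = {year: [float(year - 1990)] for year in range(1990, 2013)}
print(cross_validated_upper_tercile(members_by_year, 2012))
print(event_probability([0.1, 0.7, 0.9, 0.2], threshold=0.6))  # 0.5
```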

Probabilistic skill assessment of rare events requires a large number of samples (Bradley et al. 2008). To maximise sample size, the probabilistic skill was calculated on a pixel-by-pixel basis across the entire marine park, for all monthly forecasts initialised on the 1st day of all months over the 23 yr hindcast. Therefore, the total number of samples is 503 grid cells × 12 model initialisations × 23 yr = 138,828 samples for each lead time.

Two attributes of probabilistic forecasts, 'reliability' and 'resolution', were assessed using reliability diagrams (Wilks 1995) and relative operating characteristic (ROC) curves (Mason 1982), respectively. Reliability refers to how well the forecast probabilities match the observed frequencies. Resolution compares the proportion of event 'hits' with the proportion of event 'false alarms', and measures the ability of the model to discriminate between two dichotomous outcomes (i.e. events and non-events). The use of ROC diagrams as an ensemble verification metric was outlined by WMO (1992). The area under the ROC curve (AUC) is used as a measure of resolution, with 1.0 being a perfect score (100% hits and correct negatives, 0% misses and false alarms) and 0.5 corresponding to a random forecast. Hosmer and Lemeshow (2000) provided general rules for classifying discrimination: an AUC of 0.7 or greater is acceptable, more than 0.8 is excellent, and more than 0.9 is outstanding, though rare.
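The two diagnostics can be sketched as below. The ROC curve is traced by sweeping a probability threshold, issuing a "warning" whenever the forecast probability meets it, and recording the hit rate against the false-alarm rate; the AUC is then integrated with the trapezoidal rule. The reliability table bins forecast probabilities and compares each bin's mean probability with the observed event frequency. The function names, the ten equal-width bins, and the trapezoidal integration are illustrative choices, not necessarily those used in this study.

```python
def roc_points(probs, outcomes):
    """(false-alarm rate, hit rate) pairs for every distinct probability threshold."""
    n_events = sum(outcomes)
    n_non_events = len(outcomes) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need both events and non-events")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(probs), reverse=True):
        hits = sum(1 for p, o in zip(probs, outcomes) if p >= threshold and o == 1)
        false_alarms = sum(1 for p, o in zip(probs, outcomes) if p >= threshold and o == 0)
        points.append((false_alarms / n_non_events, hits / n_events))
    return points  # the lowest threshold flags every case, so the curve ends at (1, 1)


def roc_area(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def reliability_table(probs, outcomes, n_bins=10):
    """(mean forecast probability, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    return [
        (sum(p for p, _ in b) / len(b), sum(o for _, o in b) / len(b), len(b))
        for b in bins if b
    ]


# Toy 22-member probabilities (k/22) for six cases, with observed outcomes (1 = event).
probs = [20 / 22, 15 / 22, 12 / 22, 6 / 22, 3 / 22, 0 / 22]
outcomes = [1, 1, 0, 1, 0, 0]
print(round(roc_area(roc_points(probs, outcomes)), 3))
print(reliability_table(probs, outcomes))
```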

These probabilistic assessments were used in conjunction with a Brier score (BS) to give a measure of overall skill against observations (Brier 1950):

$$ {\text{BS}} = \frac{1}{N}\sum\limits_{i = 1}^{N} {(p_{i} - o_{i} )^{2} } $$

where $p_i$ is the forecast probability for the $i$th forecast, $o_i$ is the observed occurrence (1 if the event occurred, 0 otherwise), and $N$ is the number of forecast–observation pairs. BS values range from 0 to 1, with 0 being a perfect score. In addition to the Brier score, the Brier skill score (BSS) was calculated to indicate the skill of the probabilistic forecast relative to the reference forecast (persistence):

$$ {\text{BSS}} = \, 1 - \frac{\text{BS}}{{{\text{BS}}_{\text{pers}} }} $$

where $\text{BS}_{\text{pers}}$ is the Brier score of persistence. A value of 0 or less indicates no skill relative to persistence, and 1 is a perfect score.
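The two scores can be computed directly from the definitions above. In this sketch (function names hypothetical, numbers invented), persistence enters as a deterministic reference, i.e. a probability of 1 if the event was observed in the month before initialisation and 0 otherwise. That encoding is an assumption about how the persistence Brier score is formed.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, reference_probs):
    """BSS = 1 - BS / BS_ref; values <= 0 mean no improvement on the reference."""
    bs_ref = brier_score(reference_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect; BSS is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


outcomes = [1, 0, 1, 0]             # observed occurrence of the event
model = [0.8, 0.3, 0.6, 0.1]        # fraction of members above the threshold
persistence = [1.0, 1.0, 0.0, 0.0]  # deterministic persistence forecast
print(brier_score(model, outcomes))                     # ≈ 0.075
print(brier_skill_score(model, outcomes, persistence))  # ≈ 0.85
```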

Case studies

As detailed previously, three significant bleaching events in the GBR occurred during the ACCESS-S1 hindcast period (1990–2012), and two occurred outside it, in 2016 and 2017. For each event, the forecasts leading up to it were assessed for their ability to predict the extent of the SST anomaly during the critical month of bleaching. For the 2016 and 2017 events, which fall outside the hindcast period, operational GloSea5 forecasts were used in place of ACCESS-S1 to indicate the model's forecast skill, although GloSea5 and ACCESS-S1 forecasts are likely to differ owing to differences in data assimilation schemes, ensemble generation, and perturbation techniques. The 1998 and 2016 case studies are classified as El Niño driven because the Niño3.4 anomaly in the preceding year was well above the 0.8 °C threshold. Figure 2 shows the warming that occurs in the GBR in the year following the formation of an El Niño, based on ERSSTv5 data (Huang et al. 2017). The 2002, 2006, and 2017 case studies were classified as ENSO neutral.

Fig. 2

SST Anomalies in the GBR during 1997/1998 and 2015/2016 El Niño events. Data from ERSSTv5 (1961–1990 climatology)

Results

Correlation coefficients between ACCESS-S1 forecasts and observations were either higher than, or within the range of, those of the POAMA 2.4 sub-samples for all start dates (Fig. 3). In particular, ACCESS-S1 showed improved forecast skill over the GBR in January for the 1st October, 1st November, 1st December, and 1st January start dates. ACCESS-S1 also exhibited higher skill in January, February, and March for the 1st December start date, and in March for the 1st November and 1st January start dates.

Fig. 3

Comparison of model skill (Pearson correlation coefficient cf. observations) for ACCESS-S1 versus POAMA for January, February, and March 1990–2012, for forecasts issued on a 1st October, b 1st November, c 1st December, d 1st January

Forecast ensemble-mean SST anomalies for lead times of 0 to 3 months were compared with observed SST anomalies for the 1990–2012 hindcast period (Fig. 4), using areal averages over the marine park to create a time series representative of the entire GBR. At lead time 0, the model SST anomalies follow the observations closely, with a correlation coefficient of 0.86. From lead times 1 to 3 months, the correlation decreases from 0.70 to 0.58 to 0.48, all above the 0.413 threshold for statistical significance. High SSTs caused by the 1998 El Niño were predicted at all four lead times, especially towards austral winter. The January and April peaks of this event were underpredicted at lead times 1–3; however, the remainder of 1998 was predicted very well. The warming leading into the 2010–2012 La Niña event was also well predicted in magnitude and timing at all lead times. The ensemble mean underestimated the magnitude of the largest anomaly of the hindcast period (1.4 °C in September 2010) but still predicted anomalies of 0.6–1.0 °C at lead times 1 to 3, and the ensemble spread encompassed this event at all lead times. Both the ensemble mean and the ensemble spread missed the second-highest anomaly of the hindcast period, in December 2004, at lead times 2 and 3.

Fig. 4

Hindcast time series: monthly ACCESS-S1 ensemble mean and range of 11 ensemble members of SST anomalies from January 1990 to December 2012 for lead times a 0, b 1, c 2, d 3 months, compared with Reynolds SST anomalies for the entire Great Barrier Reef Marine Park. Initialisation dates are the 1st of the month only. Grey area shows the ensemble spread

The narrower ensemble spread at lead time 0 (grey shading in Fig. 4a) compared with lead times 1 to 3 (Fig. 4b–d) indicates greater forecast confidence. However, at all lead times some observed values fell outside the ensemble spread. A rank histogram was plotted to evaluate how well the 11-member ensemble spread captured observations at different lead times (Fig. 5). At lead time 0, the first and last bins of the histogram have high peaks, indicating that the ensemble spread is too narrow, so that observations frequently fall below the coldest member or above the warmest member. Lead times 1, 2, and 3 show much improvement, with a flatter histogram.

Fig. 5

Rank histogram for GBR marine park region showing lead times 0, 1, and 2. For each pixel observation, the ensembles are ordered from lowest to highest to form a series of bins. The observation is allocated to the bin it falls within and counts as one occurrence. The process is repeated for all observations and lead times to produce a histogram of rank

Forecast error during the summer months was lower than that of persistence across the Coral Sea for all lead times and forecast start dates, particularly in February (Fig. 6). The summer RMSE of areally averaged SST anomalies within the marine park was 0.37 °C for forecasts from October (persistence = 0.48 °C), 0.34 °C from November (persistence = 0.39 °C), and 0.35 °C from December (persistence = 0.39 °C).

Fig. 6

Root mean square (RMS) error of SSTA forecasts for January, February, and March 1990–2012 at lead times 0–5 months compared with persistence SSTA forecasts

Figure 7 shows the standard deviation of observed SST over the hindcast period for January, February, and March. The highest variability in the Coral Sea occurs over the Mellish Plateau in January and February, the Cato Trough in February, and the northern Marion Plateau and the Queensland Plateau in March. These patterns correspond with areas of higher RMSE (poorer skill) in Fig. 6.

Fig. 7

Standard deviation of observed SST for January, February, and March

Model skill was also evaluated for each of the four management zones shown in Fig. 1: Far Northern (120 grid cells), Cairns/Cooktown (52 grid cells), Townsville/Whitsunday (112 grid cells), and Mackay/Capricorn (219 grid cells). Pearson correlation coefficients were calculated between multiweek and monthly ensemble-mean SST anomalies averaged over each zone and the corresponding satellite observations (Fig. 8).

Fig. 8

Skill in terms of correlation coefficient for each management zone: a Far Northern, b Cairns/Cooktown, c Townsville/Whitsunday, d Mackay/Capricorn, showing the ensemble mean for both multiweek and monthly forecasts. Skill is shaded only where statistically significant (i.e. r ≥ 0.413)
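The significance threshold in the caption is consistent with a two-tailed test at the 5% level with n = 23. That corresponds to one value per year of the 1990–2012 hindcast and 21 degrees of freedom. The caption does not state which test was used, so this is an inference. The standard-library-only sketch below recovers the threshold in two steps. It integrates the null distribution of the sample correlation, f(r) proportional to (1 - r^2)^((n - 4)/2). It then bisects for the value whose two-tailed p equals alpha. The function names are hypothetical.

```python
import math


def two_tailed_p(r, n, steps=2000):
    """P(|R| >= r) for the sample correlation of n independent normal pairs.

    Integrates the null density f(x) = (1 - x^2)^((n - 4) / 2) / B(1/2, (n - 2) / 2)
    over [r, 1] with Simpson's rule (steps must be even) and doubles the result.
    """
    if n < 4:
        raise ValueError("n must be at least 4")
    a = (n - 4) / 2.0
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2.0) - math.lgamma((n - 1) / 2.0)
    norm = math.exp(-log_beta)

    def density(x):
        return norm * max(0.0, 1.0 - x * x) ** a

    h = (1.0 - r) / steps
    total = density(r) + density(1.0)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * density(r + k * h)
    return 2.0 * total * h / 3.0


def critical_r(n, alpha=0.05):
    """Smallest |r| significant at level alpha (two-tailed), found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if two_tailed_p(mid, n) > alpha:
            lo = mid  # not yet significant: the threshold lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(critical_r(23), 3))  # 0.413 for 23 years of hindcasts
```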

Typically, the model has the highest skill for the first fortnight after the start date (f1; weeks 1 and 2) and for the first month (lead time 0) across all start dates, with correlation coefficients of 0.7 or higher. The second fortnight (f2; weeks 3 and 4) had much lower skill for start dates within the summer period (November to February), and its skill occasionally dropped below the statistical significance threshold. Skill was highest overall in the Far Northern management zone. In the southern GBR, forecasts of November were particularly poor at all lead times, as were forecasts beyond lead time 1 from the December start date. In the southern GBR (Townsville/Whitsunday to Mackay/Capricorn), forecasts of the critical bleaching months of February and March had higher skill from 1st November than from 1st December. The 1st December forecasts had no significant skill in this region from lead time 2 onwards.

Probabilistic skill

Metrics of resolution and reliability were used to assess probabilistic skill. They were calculated with both 11- and 22-member ensembles to evaluate the impact of time-lagged ensemble generation on skill. Events were defined by three exceedance thresholds, 0 °C, 0.6 °C, and tercile 3 (the upper tercile), for the first three lead times (0, 1, and 2). ROC curves and reliability diagrams for the 22-member ensemble forecasts are shown in Fig. 9. (Resolution, Brier skill scores, and reliability were all marginally lower for 11-member ensemble forecasts; not shown.) The most noticeable improvement from increasing the ensemble size was at the upper end of the reliability diagram, because additional ensemble members increased the likelihood of the model capturing extreme events. A similar improvement in reliability with increased ensemble size was found in experiments with POAMA2 (Wang et al. 2011), where it was likewise marginal. The ROC plots show model resolution, that is, the ability of the model to discriminate between the occurrence and non-occurrence of an event. Resolution was positive throughout. Skill was acceptable (AUC > 0.7) at lead times 1 and 2 and excellent (AUC > 0.8) at lead time 0 for all thresholds tested.
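For readers unfamiliar with the ROC area, a minimal sketch is given below. It assumes that the forecast probability of an event is the fraction of ensemble members exceeding the threshold. This is a common choice that the text does not confirm. The sketch computes the area under the empirical ROC curve through its Mann–Whitney equivalent: the probability that a randomly chosen observed event received a higher forecast probability than a randomly chosen non-event, with ties counting one half. An AUC of 0.5 indicates no skill and 1.0 indicates perfect discrimination. The function names and example values are hypothetical.

```python
def event_probability(members, threshold):
    """Fraction of ensemble members whose SST anomaly exceeds the threshold (°C)."""
    return sum(m > threshold for m in members) / len(members)


def roc_auc(probabilities, outcomes):
    """Area under the ROC curve via the Mann-Whitney statistic.

    outcomes: 1 where the event was observed, 0 otherwise.
    O(n_events * n_non_events): fine for a sketch, slow for large grids.
    """
    events = [p for p, o in zip(probabilities, outcomes) if o]
    non_events = [p for p, o in zip(probabilities, outcomes) if not o]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p in events:
        for q in non_events:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5  # ties count one half
    return score / (len(events) * len(non_events))


# Hypothetical 0.6 °C exceedance: four 4-member forecasts, two observed events
ensembles = ([0.9, 0.8, 0.7, 0.2], [0.7, 0.5, 0.1, 0.0],
             [0.3, 0.2, 0.1, 0.0], [0.65, 0.1, 0.0, -0.1])
probs = [event_probability(m, 0.6) for m in ensembles]
print(probs)                          # [0.75, 0.25, 0.0, 0.25]
print(roc_auc(probs, [1, 1, 0, 0]))   # 0.875
```

With 11 members, the forecast probability can take only 12 distinct values (0, 1/11, ..., 1). With 22 members it can take 23 values, which gives a finer set of probabilities.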

Fig. 9

Relative operating characteristic (ROC) and reliability diagrams for thresholds of 0 °C, 0.6 °C, and tercile 3 at lead times 0, 1, and 2 for all monthly forecasts from 22-member ensembles

Brier skill scores were all positive (Fig. 9, inset legend text, right-column panels), indicating that the model had value over persistence. The reliability diagrams show a good 1:1 relationship between forecast probability and observed frequency for the 0 °C and 0.6 °C thresholds. Tercile 3 forecasts fall below the 1:1 line, indicating that the model is under-dispersive and over-forecasts. Forecast probabilities are consistently higher than observed frequencies, i.e. the model is overconfident.
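The Brier skill score and the reliability diagram can be sketched as follows. The sketch assumes binary outcomes (1 if the threshold was exceeded) and forecast probabilities such as ensemble exceedance fractions. BSS = 1 - BS/BS_ref, where BS is the mean squared difference between forecast probability and outcome. The text uses persistence as the reference, but it does not state how persistence probabilities were formed, so the reference is left as an input here. Each reliability point pairs the mean forecast probability in a probability bin with the observed frequency in that bin. Points below the 1:1 line indicate over-forecasting. The bin count and function names are assumptions.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, ref_probs, outcomes):
    """BSS = 1 - BS / BS_ref; positive values mean the forecast beats the reference."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(ref_probs, outcomes)


def reliability_points(probs, outcomes, n_bins=10):
    """(mean forecast probability, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the top bin
        bins[idx].append((p, o))
    points = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(o for _, o in members) / len(members)
            points.append((mean_p, freq, len(members)))
    return points


# Hypothetical forecasts: 0.9 issued twice, but the event occurred only once
probs = [0.9, 0.9, 0.1, 0.1]
outcomes = [1, 0, 0, 0]
reference = [0.5, 0.5, 0.5, 0.5]
print(round(brier_skill_score(probs, reference, outcomes), 2))  # 0.16
print(reliability_points(probs, outcomes))  # [(0.1, 0.0, 2), (0.9, 0.5, 2)]
```

In the example, the 0.9 bin has an observed frequency of only 0.5. It therefore lies below the 1:1 line, which is the over-forecasting signature described for tercile 3.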

Case studies

The ability of ACCESS-S1 to predict the warmest observed SST anomalies during summers with mass bleaching was assessed. Case studies of GBR conditions during El Niño events (1998 and 2016) and ENSO-neutral conditions (2002, 2006, and 2017) were considered separately.

The highest anomaly in the southern GBR during the 1998 bleaching event occurred in February. The top row of Fig. 10 shows ensemble mean SST anomaly forecasts initialised on 1st December, 1st January, and 1st February (lead times 2, 1, and 0), along with the observed (Reynolds) SST anomaly. The model consistently forecast warm conditions (0.2 °C to 0.6 °C) in the southern GBR, where bleaching was experienced. At lead time 0, the model forecast the critical anomaly level (> 0.6 °C) in the southern GBR. In 2016 (bottom row of Fig. 10), the highest anomalies occurred during March in the northern half of the GBR. Warm conditions were predicted from the 1st January start date. They cooled in the 1st February forecast, although conditions remained warm offshore. Lead time 0 gave a very good representation of the extreme marine heatwave in terms of both spatial pattern and magnitude. The highest observed anomaly in the north was 1.5 °C to 2.0 °C, whereas the model did not forecast anomalies greater than 1.5 °C.

Fig. 10

Seasonal SST anomaly forecasts for peak bleaching months during two El Niño events, showing lead times 2, 1, 0, and the observations in the final column

In 2002 (categorised as an ENSO-neutral year), bleaching was first observed at the end of January. From the 1st December start date, the model forecast (top row of Fig. 11) showed weak warm anomalies (0.2 °C to 0.4 °C) in the Cairns/Cooktown management zone. Anomalies above 0.6 °C were not forecast in the GBR until lead time 0. Bleaching was observed mainly offshore, where forecast anomalies at lead time 0 were between 0.4 °C and 0.6 °C. In January 2006 (ENSO neutral), most bleaching was confined to a fairly localised area around the Keppel Islands towards the southern end of the GBR. Patches of warm anomalies were present in the southern GBR at lead times 2 and 1 (middle row of Fig. 11). At lead time 0, the forecast showed a well-defined region of warming of 1 °C to 1.5 °C in the southernmost management zone, where observed anomalies were 1.5 °C to 2.0 °C. For the March 2017 event (bottom row of Fig. 11), forecasts from the 1st January start date showed warm anomalies of 0.4 °C to 0.8 °C over most of the GBR. These anomalies receded significantly at lead time 1. At lead time 0, warming escalated again, with forecast anomalies of 0.8 °C to 1.5 °C in the northern half of the GBR, a range similar to the observed anomalies.

Fig. 11

Seasonal SST anomaly forecasts for peak bleaching months during three ENSO-neutral events, showing lead times 2, 1, 0, and the observations in the final column

Figures 12 and 13 show the ensemble mean forecasts for the management zones where bleaching occurred during five bleaching events. For each event, forecasts are shown from start dates beginning 5 months before the event. Box colours correspond to the forecast start dates as labelled.

Fig. 12

Bleaching during El Niño events: a summer of 1998 in the southern half of the marine park, mostly inshore; b summer of 2016 in the northern half, most significant in the ‘Far Northern’ management zone

Fig. 13

Other significant bleaching events during ENSO-neutral periods since the start of the hindcast period (1990). a 2002 event in the southern half, affecting offshore reefs. b 2006 event, with the main impacts at the Keppel Islands. c 2017 event affecting the central third of the marine park

In 1998, the model indicated median warm anomalies of at least 0.5 °C for February when initialised on 1st November. The ensemble mean did not predict the magnitude of the February anomaly at any lead time, including the lead time 0 forecast from 1st February. However, the upper quartile range of the ensemble spread did capture the February anomaly for all forecasts, including those initialised on 1st October. A similar conclusion was reached using POAMAv1.5 for the 2010/2011 warm anomaly case study (Spillman 2011b). In 2016 (Fig. 12, right column), the model predicted median March warming of at least 0.5 °C from January. However, it predicted an extreme median anomaly for March (> 1.0 °C) in the far north only in the 1st March forecast.
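The box-plot reading used in this paragraph compares the ensemble median with the position of the observed anomaly in the ensemble spread. It can be sketched as follows. The quantile method, the reading of "upper quartile range" as the span between the third quartile and the ensemble maximum, and all names are assumptions. The figures may use different whisker conventions.

```python
import statistics


def box_summary(members):
    """Box-plot statistics of an ensemble of SST anomaly forecasts (°C)."""
    q1, median, q3 = statistics.quantiles(members, n=4, method="inclusive")
    return {"min": min(members), "q1": q1, "median": median, "q3": q3, "max": max(members)}


def locate_observation(members, observed):
    """Describe where an observed anomaly falls within the ensemble distribution."""
    s = box_summary(members)
    if observed < s["min"] or observed > s["max"]:
        return "outside ensemble spread"
    if observed > s["q3"]:
        return "upper quartile range"
    if observed >= s["q1"]:
        return "interquartile range"
    return "lower quartile range"


# Hypothetical 11-member forecast for one zone and target month
members = [i / 10 for i in range(11)]    # 0.0 to 1.0 °C
print(box_summary(members)["median"])    # 0.5
print(locate_observation(members, 0.9))  # upper quartile range
print(locate_observation(members, 1.4))  # outside ensemble spread
```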

The 2002 and 2006 bleaching events were both attributed to local weather patterns. Both showed ongoing seasonal heat accumulation over several months from November onwards (Fig. 13). After each initialisation in the austral summer months, the model forecasts consistently cooled SSTs, trending back towards neutral in subsequent months. In 2017, forecasts predicted warming of approximately 0.4 °C in January and February. However, the ensemble median did not capture the observed anomaly magnitude in January and March (more than 1 °C), even at short lead times. This is similar to the El Niño 1997/1998 and 2015/2016 forecasts. All forecasts for the 2017 austral summer (Fig. 13c) include the observed anomaly within the ensemble spread.

ACCESS-S1 was able to forecast the anomalously warm conditions associated with El Niño, although peak SST magnitudes were not captured. The model struggled to forecast the atmosphere-driven bleaching events of 2002 and 2006. It often damped initial SST anomalies towards the climatological mean state as the forecast progressed.

Discussion

The vulnerability of the GBR to a changing climate was highlighted by the consecutive mass bleaching events in the summers of 2016 and 2017. Bleaching events are predicted to occur on average twice a decade by 2035 under the RCP8.5 scenario (Heron et al. 2017). Management interventions that enhance reef resilience, such as improving water quality and effective crown-of-thorns starfish control, are critical for reducing vulnerability and maintaining coral-dominated ecosystems in the lead-up to and during bleaching events (Hoegh-Guldberg et al. 2007; Wolff et al. 2018). Marine park managers continue to rely on skilful sub-seasonal to seasonal forecast models to inform decisions on coral reef management. The Bureau of Meteorology’s new seasonal forecast model, ACCESS-S1, is well placed for integration into marine park managers’ risk management systems. Its benefits include high ocean resolution, daily updates, and probabilistic forecasts from a 99-member ensemble. The ensemble includes both perturbed initial conditions and time-lagged members from previous days’ forecasts.

Over the hindcast period 1990–2012, ACCESS-S1 was most successful in forecasting larger warm anomalies in the GBR associated with climate drivers that persisted over many months. Examples include warm anomalies during ENSO events such as El Niño 1997/1998 and La Niña 2010/2011. Skill was highest before monsoon onset, when the correlation between northern Australian SST and rainfall is high, and lower after monsoon onset (typically late December). The model consistently outperformed persistence over the critical summer period and was marginally better than POAMA. Zhou et al. (2015) found that GloSea5, the UK Met Office system on which ACCESS-S1 is based, had good skill in predicting the SST magnitude of the 1997/1998 El Niño. Together with the results here, this indicates that ACCESS-S1 represents the ENSO teleconnections to the GBR well, although the ensemble mean did not capture peak magnitudes in this area.

The model was less successful in forecasting short-term events driven by regional weather patterns, such as the significant warming that caused bleaching in summer 2002 and 2006. Air–sea fluxes are difficult to determine in global climate models (Valdivieso et al. 2017), especially in and near the western warm pool (Song and Yu 2013). Atmospheric predictability and air–sea interaction are reduced after monsoon onset. The regime then shifts to one with no correlation between rainfall and SST (Hendon et al. 2012), or to atmospheric forcing of SST variability (Wu and Kirtman 2007). Correlation coefficients dropped noticeably from before to after monsoon onset at all lead times (Fig. 8), similar to findings for POAMA1.5 (Hendon et al. 2012). Forecasting SST in the Coral Sea for January, February, and March is also inherently difficult because these months coincide with the peak of the tropical cyclone season. ACCESS-S1 cannot forecast cyclones on seasonal timescales. As a result, it cannot forecast their impacts on coral bleaching risk, usually a local reduction in SST due to vertical mixing, and this significantly degrades forecast accuracy. Once SST variations due to cyclone activity are captured in the model initial conditions, forecasts can change significantly. Research into forecasting tropical cyclones on seasonal timescales is ongoing (Gregory et al. 2019; Camp et al. 2018).

Forecasts in the northern GBR often exhibited the highest skill and exceeded persistence by the largest margins (Figs. 6, 8). In the southern GBR, ACCESS-S1 skill also exceeded that of persistence most of the time, except for forecasts initialised on 1st December. Skill in the southern GBR is lower than in the north. Even so, the model successfully predicted SST anomalies associated with the February peak of the East Australian Current, where the current straddles the marine park boundary in the southern GBR. This is evident in the RMSE maps for January and February forecasts initialised on 1st November (Fig. 6, third row from top, cf. fourth row from top).

The ability of the model to discriminate between events and non-events (whether or not a threshold is exceeded) ranged from excellent at lead time 0 to reasonable at lead times 1 and 2 (Fig. 9). Increasing the ensemble size with additional time-lagged members improved probabilistic skill for warm anomaly events. Forecast probabilities matched the observed frequency of warm anomaly events well, and the match was better with 22 ensemble members than with 11. The exception was tercile 3 exceedance events, for which the model is under-dispersive (overconfident), as shown by the reliability line falling below the 1:1 relationship. This assessment was limited to a 22-member ensemble. The operational system will have 99 ensemble members, which are expected to improve probabilistic skill and ensemble spread. With more ensemble members, the ensemble spread should better encompass unlikely extreme events, especially at lead time 0. In principle, this would flatten the rank histogram (Fig. 5), reflecting a better match between model variability and observed variability. The probabilistic skill assessment matched model and observations grid cell by grid cell, which is a demanding skill measure. If an observed feature such as an eddy is predicted in the wrong location, even by a single grid cell, skill is penalised.

The results presented here demonstrate that ACCESS-S1 can provide skilful SST forecasts to support coral reef management on sub-seasonal to seasonal timescales. Operational seasonal SST forecasts from ACCESS-S1 for the GBR and greater Coral Sea region commenced in August 2018. They are available on the Bureau of Meteorology’s website (http://www.bom.gov.au/oceanography/oceantemp/sst-outlook-map.shtml) and are updated every 3 d. ACCESS-S1 will be followed by ACCESS-S2, which will also be released as an operational product. ACCESS-S2 will use a different data assimilation scheme, in which the land surface is initialised with realistic initial conditions (Hudson et al. 2017). It will also include a locally developed coupled assimilation–initialisation strategy, which is expected to further enhance model skill. In addition, ACCESS-S2 will include ocean initial-state perturbations, which were absent in ACCESS-S1. These perturbations contribute directly to SST variability, especially at short lead times (< 3 months) and in multiweek forecasts (Vialard et al. 2005). Future work will include operational thermal stress forecasts based on the duration and magnitude of stress above a critically defined climatology, and will extend to other regions around Australia.