Application of Hydrological Forecast Verification Information

  • Kevin Werner
  • Jan S. Verkade
  • Thomas C. Pagano
Reference work entry


Verification studies and systems often focus solely on the exercise of verifying forecasts and not on the application of verification information. This chapter discusses the potential for application of hydrological forecast verification information to improve decision-making in and around the forecast process. Decision-makers include model developers and system designers, forecasters, forecast consumers, and forecast administrators. Each of these has an important role in decisions about forecasts and/or the application of forecasts that may be improved through use of forecast verification. For each, we describe the role, the actions that could be taken to improve forecasts or their application, the context and constraints of those actions, and needs for verification information. Consistent with other studies and assessments on forecast verification, we identify the need for a routine forecast verification system to archive data, plan for operations, measure forecast performance, and group forecasts according to application. Further, we call on forecast agencies and forecast consumers to use forecast verification as a routine part of their operations in order to continually improve services and to engage others to use forecast verification to improve decision-making.


Keywords: Hydrological forecasting; Forecast verification; Decision-making

1 Introduction

Good forecast verification only acquires value through its ability to improve the effectiveness of the forecasting systems and users’ decisions (Stanski et al. 1989). The present chapter explores the link between hydrological forecast verification, i.e., the process of quality-assessing hydrological forecasts, and improving the forecasting systems and users’ decisions.

Quality assessment evaluates the fitness for use of the forecasts by affected stakeholders, such as natural resource managers and people at risk. Verification focuses on the accuracy-related aspects of quality. The reasons for verification fall broadly into three categories: administrative, economic, and scientific (Brier and Allen 1951; Stanski et al. 1989; Welles et al. 2007). Administrative reasons include the justification of the cost of implementation of or improvement of a forecasting system to whoever bears the costs of that system (often ultimately the taxpayer). Economic reasons are the expected benefits accrued to a stakeholder, through the use of the forecasts. Scientific verification includes the identification of strengths and weaknesses of a forecast product in order to define research and development that will lead to improvements in the forecasts. This can impact the systems for modeling physical processes (e.g., rainfall to runoff, hydrodynamic routing), the forecasters operating these systems, or both.

Once made aware of verification and its application, forecast users maintain a strong appetite for verification information (Hartmann et al. 1999). Increased reporting of past forecast performance was ranked as the highest of 23 development priorities in surveys of users – members of European Union member states’ national and subnational hydrological forecasting agencies – of the European Flood Awareness System (Wetterhall et al. 2013). Considering that the cost and complexity of routine verification are much lower than those of traditional investments, such as improving physical model representations (Pagano et al. 2014), it is remarkable how frequently this investment is not made.

Despite these benefits, the culture of forecast verification in operational hydrology remains, we believe, relatively underdeveloped, though it is growing (Welles et al. 2007; Welles and Sorooshian 2009), and there are notable exceptions that have developed since (Bureau of Meteorology 2015; Demargne et al. 2009). For example, it took 80 years of seasonal water supply forecasting in the Western USA before the first scientific verification of those operational forecasts was published (Pagano et al. 2004).

The lack of hydrologic forecast verification is not for a lack of verification methods. See, for example, Jolliffe and Stephenson (2012), Murphy (1993), Wilks (2011), WWRP/WGNE Joint Working Group on Forecast Verification Research (2015) and the references therein. The literature is also rich on specific analyses that can be performed including, for example, measures-oriented and distributions-oriented verification (e.g., Bradley et al. 2004; Murphy 1997) and conditional verification (Bradley and Schwartz 2011). In addition, there are software packages available to facilitate the computation of these measures (see Pocernich (2012) for a recent discussion thereof).

Verification metrics must be aligned with the goals of the verification activity – ultimately, the improved effectiveness of the forecasts. The present chapter explores the link between computing the verification metrics and improving the forecast process and products, on which the scientific literature is nearly silent. The chapter is structured accordingly. In Sect. 2, the general attributes of a good verification are introduced. The effectiveness of the verification depends on the intended audience and so Sect. 3 contains an overview of verification users. Each category of user and their roles in forecasting, their relevant decisions, and their verification needs are then addressed in subsequent sections. The chapter ends with some general conclusions as well as calls for action.

2 What Constitutes Good Forecast Verification?

The qualities of good forecast verifications are largely similar to the qualities of good forecasts (World Meteorological Organization 2013) and, more generally, good information (Wang and Strong 1996). These include aspects such as production, credibility, accuracy, transmission, and messaging:
  • Production pertains to the act of producing verification information: is the verification easy to generate and is it produced routinely?

  • Credibility refers to how the verification is perceived by users: is it honest, impartial, and unprejudiced?

  • Accuracy pertains to how correct the verification is, technically speaking: have the verification measures been calculated correctly, and has their uncertainty/statistical significance been accurately quantified?

  • Transmission refers to how the verification gets to the users: is the verification available soon after the forecast is made? Are evaluations intended for the public distributed freely and easily accessible?

  • Finally, messaging pertains to how the verification is framed for the user: is the verification clear and easy to understand? Is the verification in meaningful units and is it expressed in users’ terms? Is it relevant and specific to a given user? Is it meaningful to those using it?

The primary theme of these aspects is “know and serve your users effectively.” Who is the verification for and what is that person’s motivation? What information do they need and what are their limitations? These issues are explored in the remainder of this chapter.

3 Who Can Use Verification Information?

A typology of users of verification information starts with a conceptualization of how verification information is used to improve the effectiveness of forecast products. As a starting point, we use the “forecast – decision – response” model. Here, “forecast” is modeled as a single process containing multiple subprocesses including model development and real-time forecasting. The outcome of that process – a forecast – is communicated to a user so as to inform a forecast-sensitive decision. Depending on the purpose of the forecasting system, the response can vary from setting the height of a water gate to warning an at-risk community against an impending flood to making a conscious choice to do nothing. In some – but not all – cases, the decision affects the observed outcome (e.g., attempts to prevent the flood succeed).

Figure 1 depicts the role of verification in the “forecast – decision – response” process as well as the adjacent “model development” and “observation” processes. Verification is often done by comparing model outcomes, forecasts, or forecast-sensitive decisions with observations. The information about the quality of these is fed back to model developers, forecast-decision-response system designers, forecasters, and/or decision-makers. Ultimately the goal is for these actors to improve the forecast and its application based on verification. This is described in subsequent sections.
Fig. 1 Role of verification in the forecast process and forecast application information flow

4 Verification for Model Developers and System Designers

4.1 Role in the Forecast Process

Developers are those who contribute to the creation and improvement of systems used to produce forecasts (Pappenberger et al. 2015a). This is distinct from forecasters (described in the next section) who are the intended users of the forecast production systems. Developers may include research scientists and trained personnel within forecasting agencies but also those in the wider community, such as academics and consultants.

A typical forecasting system may include one or more sources of guidance, such as statistical or dynamical models of runoff. Models are articulations of the scientific community’s views about how natural systems behave (Pagano et al. 2014). Dynamical hydrological models typically contain generalized laws relating precipitation, snow melt, and evaporation to runoff but also contain observed parameters derived from catchment characteristics (e.g., catchment area) and conceptual parameters that are tuned to local observed data.

Forecasting systems may include chains of models, such as one dynamical model predicting future rainfall forcing a different dynamical model predicting runoff, followed by a statistical model to reduce forecast biases or quantify uncertainty.

4.2 Actions Stakeholders Can Take to Improve the Forecast Process

There are a variety of ways in which modelers can improve models, such as algorithms, description of physical processes, and use of data and interfaces. For example, improved algorithms for solving the model equations can lead to faster model execution and/or greater numerical stability of the results (Kavetski and Clark 2011). New regression methodologies may also improve statistical models.

Many operational forecasters use simple “bucket style” rainfall-runoff models (Hartmann et al. 2002), but since the 1990s there has been increased development of fully spatially distributed and physically based models, including many more physical processes than were previously modeled. Modelers can also build better simulations of processes that are the result of human activity, such as floodplain obstructions and diversions. Rather than adding processes and complexity, others have taken the approach to reduce existing models to their simplest yet most effective forms, testing the results on data from thousands of catchments (e.g., Perrin et al. 2003).

Dynamical models typically accept time series data of forcing variables, but also simulate intermediary catchment processes such as soil moisture. Data assimilation (e.g., Liu et al. 2012) is the process of incorporating observations into the model. This may include, for example, comparing recent simulations and observed discharge so as to update simulated soil moisture to bring the simulated discharge closer to the observed. Researchers can develop improved data streams and methods for using available data.

Modelers can also study the interfaces between models. For example, precipitation forecasts from weather models have historically had substantial biases and coarse spatial resolution that should be addressed so as to improve hydrologic forecast accuracy (Cuo et al. 2011). The weather model may provide a single deterministic forecast, and the modelers may develop a process to convert this into a probabilistic or ensemble forecast (see, for example Robertson et al. 2013; Weerts et al. 2011).

While the scientific community can improve models in a general sense, in-house developers (and consultants) play a role in refining and tuning systems in any forecasting agency. For example, an agency may have adopted an approach for calibrating dynamical model parameters, but alternative approaches may be evaluated and compared for specific forecasting challenges (i.e., climate, landscapes, data availability). The in-house developers may also decide that the parameters of a model of a certain catchment could be improved and may want to compare model performance with the old and new parameters. Finally, in-house developers may be responsible for the interfaces for visualizing and interacting with the model guidance and may find that adjustments may lead to more reliable interpretation of the information available to the hydrologists creating the official forecasts (Demargne et al. 2009).

4.3 Context and Constraints

Research scientists have developed and improved rainfall-runoff models for decades. Given that these models are not solely used for operational forecasting (e.g., they may be used to estimate historical water availability when observations are incomplete, or to design structures in the floodplain to withstand a given level of risk), there is often only an indirect link between forecaster needs and modelers’ efforts. Similarly, the weather models that generate rainfall forecasts are also designed to predict many other variables such as air pressure, temperature, winds, and humidity.

The technical sophistication of research scientists is commonly very high, and therefore this audience likely requires little additional training to make effective use of formalized and complex verification approaches. However, each discipline may have its own terminology, and therefore there is the potential for miscommunication and misinterpretation. For example, terms like optimization, parameterization, calibration, tuning, and postprocessing have different meanings in the weather forecasting, hydrology, and water management communities. Additionally, independent researchers may give different names to the same verification measure. Further, research scientists may ultimately develop complex verification approaches that are inaccessible or incomprehensible to other audiences because of their sophistication, terminology, and specialized purposes.

4.4 Verification Needs

This audience is likely to ask questions such as “How can the system create better forecasts?” and “Does the accuracy of the forecasts match our expectations, and, if not, why not and what can be done to improve them?”

In diagnosing problems or identifying ways to improve the forecasts, these users would like to control as many factors as possible. They may not even study the final official forecasts received by users but may instead generate retrospective forecasts from the system with some operational realism but without confounding factors, such as poor-quality or delayed observational data. For example, the Australian Bureau of Meteorology sponsored research to set up and evaluate six hydrologic models for several hundred Australian catchments (Pagano et al. 2009). Each model was calibrated using the same method and forced by identical historical time series data. Skill was evaluated using a variety of measures (Nash-Sutcliffe, Nash-Sutcliffe of log-transformed flows, correlation, bias, and four other diagnostic scores). The overall best-performing model was selected for implementation as part of the Bureau’s new operational short-term river-forecasting service.
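The multi-metric evaluation described above can be sketched in a few lines. The function names and the small offset guarding the log transform are illustrative choices, not the Bureau's actual implementation:

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches a mean-flow benchmark."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def percent_bias(obs, sim):
    """Relative volume error: 0 is unbiased, positive means over-forecasting."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * (sim.sum() - obs.sum()) / obs.sum()

def evaluate(obs, sim, eps=0.01):
    """Score table in the spirit of the multi-metric comparison described above."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return {
        "NSE": nash_sutcliffe(obs, sim),
        # Log-transformed flows emphasize low-flow performance; eps avoids log(0)
        "NSE_log": nash_sutcliffe(np.log(obs + eps), np.log(sim + eps)),
        "correlation": float(np.corrcoef(obs, sim)[0, 1]),
        "pct_bias": percent_bias(obs, sim),
    }
```

A table of such scores, computed identically for each candidate model over the same forcing data, supports the kind of side-by-side comparison used to select the best-performing model.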

In the model-selection experiment, the models did not generate forecasts but rather simulations, driven by observed rainfall rather than forecast rainfall. Later research used one rainfall-runoff model but combined it with a variety of forecast rainfall sources, again testing on many catchments, to determine the most suitable forecast rainfall source (Bennett et al. 2013). Further research isolated other factors, such as statistical postprocessing.

Finally, upon implementation of the system, Bureau hydrologists monitored the forecasts and observations daily during a preoperational trial, documenting unexpected or unusual behavior. The historical verification datasets were used to assess how unusual the error in any specific forecast was. Some unusual behavior was evident from visual inspection of the hydrograph time series (e.g., the duration of the flood event, rates of recession). Hydrologists created algorithms to detect such behavior and then computed the measures across the past year of forecasts at all catchments to determine how widespread and frequent any particular problem was.
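A minimal sketch of flagging unusual errors against a historical archive follows; the z-score approach and the three-standard-deviation threshold are illustrative choices, not the Bureau's documented method:

```python
import numpy as np

def error_zscore(error, historical_errors):
    """How unusual is a forecast error relative to the historical error archive?"""
    mu = np.mean(historical_errors)
    sigma = np.std(historical_errors)
    return (error - mu) / sigma

def flag_unusual(errors, historical_errors, threshold=3.0):
    """Return the errors lying more than `threshold` standard deviations
    from the historical norm -- candidates for manual review."""
    return [e for e in errors
            if abs(error_zscore(e, historical_errors)) > threshold]
```

Running such a check across a year of forecasts at all catchments indicates whether a detected behavior is an isolated incident or a widespread problem.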

There are many examples of targeted research studies to improve specific model processes, some of which have an operational focus. For example, Bryant and Painter (2009) correlated surface radiative forcing with errors in the calibration dataset of the US National Weather Service’s operational model in four mountainous catchments. This identified that a currently unmodeled process (dust on snow changing the surface albedo, leading to earlier snowmelt) was likely responsible for some part of the operational forecast error. The agency then knew it could either improve the model itself or warn operational forecasters of this effect so that they could make adjustments during dust events.

5 Verification for Operational Forecasters

5.1 Role in the Forecast Process

Operational forecasters are responsible for the routine production of streamflow forecasts. Forecasters typically operate in a time- and data-constrained environment where they must produce a forecast by certain times of day based on the best data available at that time. Forecasters are frequently involved in some or all of the following activities: reviewing data, performing data quality control, running and adjusting models, interpreting model output, assessing forecast confidence, interacting with other forecast producers (e.g., meteorologists), communicating forecasts, coordinating with water managers whose actions both depend on and affect river flow, translating model output into the decision-maker’s context, and responding to user requests (Pagano et al. 2014). Depending on the context of the organization and forecast conditions, forecasters can engage in some or all of these activities multiple times per day. Some operational forecasters also serve as system developers and/or administrators during nonoperational periods.

5.2 Actions Stakeholders Can Take to Improve the Forecast Process

Being central to the forecast process, forecasters are well positioned to improve it. Forecasters routinely assess the quality of forecast model runs and seek to improve them. They also routinely apply expert judgement based on experience to improve forecasts. For example, in the US NWS, hydrologists may run a dynamic simulation model several times, making modifications to the model states or input data, as part of the process of achieving the best possible forecast.

Experienced forecasters often have very good mental models of how nature behaves and have first-hand experience with the performance of the guidance available to them. They often can readily recall situations in which the models have failed or the outcome was not as expected. Forecasters may provide system developers with lines of investigation on how to improve the models.

While forecasters often successfully employ intuition based on experience and conduct “sanity checks” using heuristics, they also have their own biases. Forecasters may, for example, exhibit overconfidence, thinking that there is an 80% chance of a typically rare event happening when it is only 20% likely (Nicholls 1999). This highlights the value that adequate systems of feedback can bring in helping forecasters identify and limit these biases and test the effectiveness of their heuristics.

5.3 Context and Constraints

Forecasters have a range of statistical sophistication. Most forecasters have an academic background in science and/or engineering and therefore have at least a basic level of training in statistics. Given the range of approaches to improving the forecast, and the possibly limited exposure of the forecaster to verification practices, an individual forecaster’s experience with applying statistical techniques common to forecast verification can be variable. Some forecasters and their organizations are well versed in verification and have incorporated it into the forecast production cycle. Others have never routinely incorporated forecast verification and are typically much less familiar with the use of statistical techniques.

Forecasters, and the forecast verification systems available to them, frequently emphasize recent forecasts. Particularly during a long-duration flood or drought event, when hydrologic conditions persist over time, forecasters (and people more generally) tend to examine and utilize recent forecast performance as a proxy for current forecast accuracy (Muir and Moray 1996). While this practice may add value to the current forecast by identifying and correcting forecast biases and problems in real time, it may also lead to an overweighting of recent forecast skill at the expense of a longer verification analysis.

Forecasters are often constrained by time and may not have access to systems to configure their own verification measures. They may informally compare their forecasts to observations and have subjective impressions of performance, which may be difficult to quantitatively articulate. Forecasters may also have concerns about the consequences of verification on the human aspects of the forecasting system (e.g., if they are shown to be underperforming relative to their peers, will there be professional ramifications for them?).

5.4 Verification Needs

This audience is likely to ask questions such as “Should I trust a particular source of guidance?” “What are the likely errors in today’s forecast?” “How can I best blend together multiple sources of guidance, along with my own understanding of the situation that may not already be captured by the models?” “Do my operational rules-of-thumb have a scientific basis?” and “Am I adding value to the forecast process?”

Verification can be used to ground forecasters’ understanding of the performance of the models. Impressions of overall system reliability that are derived subjectively often do not match the actual system performance (Skitka et al. 1999). People particularly struggle to know how much to trust a model when its quality is not consistent (Parasuraman and Riley 1997), in part because trust is conditioned on the worst behaviors of the system, i.e., the largest errors in recent memory (Muir 1994; Muir and Moray 1996).

One of the best ways to let a forecaster know whether particular model guidance should be trusted is to integrate measures of uncertainty into the real-time products themselves. For example, seasonal climate forecasters produce long-lead outlooks of precipitation and temperature, and in many locations, seasons, and lead times, individual tools may have no appreciable skill. On the forecast maps, these cases are displayed with gray shading, indicating that skill is below a threshold value, to warn the forecaster against putting too much trust in that product (Fig. 2).
Fig. 2 Coupled Forecast System model version 2 seasonal precipitation forecast with skill mask. Forecasts expected to have low skill, based on the historical performance of the model for that region, season, lead time, and element, are censored by a gray mask
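The skill masking shown in Fig. 2 can be mimicked with a simple array operation; the skill values and the 0.3 threshold below are hypothetical, chosen only to illustrate the idea:

```python
import numpy as np

# Hypothetical historical skill per grid cell (e.g., a correlation score)
skill = np.array([[0.60, 0.10],
                  [0.05, 0.40]])
# Hypothetical real-time forecast anomalies on the same grid
forecast = np.array([[1.2, -0.3],
                     [0.7,  0.9]])

# Cells whose historical skill falls below the threshold are blanked out,
# analogous to the gray shading on the forecast maps
threshold = 0.3
masked = np.where(skill >= threshold, forecast, np.nan)
```

Displaying `masked` instead of `forecast` ensures that forecast values are only shown where the historical record suggests the tool has usable skill.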

Naturally, the reliability of such uncertainty quantifications is its own verification issue. For example, many ensemble streamflow predictions only account for uncertainty in future rainfall, whereas there is also uncertainty in the model structure, parameters, and other factors. It is common for forecasters to view time series charts of current and recent forecast hydrographs along with recent observations, so as to visualize the errors of past forecasts and place the current forecast in context. This assumes that there is persistence in errors and biases. Here, the forecaster is using verification information combined with expertise and judgment to correct and blend forecasts, a process that is done formally and objectively with techniques such as Bayesian Model Averaging (Duan et al. 2007).
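Full Bayesian Model Averaging is beyond a short example, but a simple inverse-error weighting conveys the underlying idea of letting verification statistics set the blend. The guidance values and historical mean squared errors below are hypothetical:

```python
import numpy as np

def skill_weights(historical_mse):
    """Inverse-MSE weights: historically better guidance receives more weight."""
    inv = 1.0 / np.asarray(historical_mse, float)
    return inv / inv.sum()

def blend(forecasts, historical_mse):
    """Weighted combination of several guidance sources for the same valid time."""
    w = skill_weights(historical_mse)
    return float(np.dot(w, forecasts))
```

For instance, `blend([100.0, 120.0], [4.0, 16.0])` weights the first source four times as heavily as the second, yielding a blended value of 104, because its historical errors were four times smaller in the mean-square sense.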

While valuable, forecaster judgment and intuition require training, so that they can be communicated effectively, and feedback, so that they improve. Rapid, relevant, and unambiguous feedback is the key to improving intuitive expertise (Kahneman and Klein 2009). If the final official product contains expert input, it is good practice to also keep a separate record of unadjusted models and/or objectively blended guidance to compare against. The forecaster can then see whether his/her adjustments had a positive or negative impact on product accuracy. Such feedback should be given rapidly after the event, while the situation is still fresh in the forecaster’s mind.

Finally, an avenue for expertise development is formal training, for example, in relation to the testing of judgements. Some elements of the natural system cannot be modeled (because of insufficient models, data or a variety of other factors), but the forecaster may have some awareness of this and attempt to incorporate this into the forecasts manually. The previous section gave the example of dust on snow affecting hydrograph simulations. Prior to the research study, the forecaster may have developed a heuristic rule-of-thumb (e.g., when dust happens, the forecasts should be 30% lower than otherwise). Verification can be used in training to recognize, question, and improve those heuristics. For example, the historical error of the model may have been 20%, and therefore a 30% change would be an overadjustment. Such training should include case studies to introduce the issue to forecasters but should also include broad sets of forecasts to ensure the generalizability of the training. Ultimately, forecasters should drive this training not only to improve their own forecasts but also to identify opportunities for improvement for model developers and administrators as well.

6 Verification for Forecast Users

6.1 Role in the Forecast Process

As the intended recipients of the forecasts, users play a critical role in verification and the forecast process. Just as the hydrologist may consider multiple sources of guidance while determining the final official forecast, the user may consolidate forecasts from multiple sources and place them in context of their situational awareness of other factors, natural and societal. The users interpret the forecast and assess their confidence in the product. The forecast then informs a decision-making process that is largely specific to a given user, and is affected by the user’s objectives, culture, resources, and other factors.

There is a diverse array of user communities, including citizens in danger of being flooded, reservoir operators, irrigators, financial traders (e.g., commodities, hydropower), insurance agencies, emergency responders, disaster relief agencies, and the media. The forecasts may affect decisions to evacuate a community, store or release water from a reservoir, plant a certain type of drought-resistant crop, and so on.

6.2 Actions Stakeholders Can Take to Improve the Forecast Process

There are frequent cases of interested stakeholders contacting forecast providers for clarifications or more details on a particular forecast, or communicating more generally with forecast providers about their needs (e.g., where they would like forecasts, how often updates are needed). If forecast products are not meeting users’ needs, users may decide not to use them, ask to have the products changed, change to another source of forecasts, or even change to another forecast provider who packages the same forecasts in a more relevant and accessible form.

Users can also highlight the most important forecasts, such as those for certain locations or situations (e.g., forecasts crossing above flood stage are more important to emergency response agencies than forecasts during low flows). Users can specify the minimum level of accuracy required, so as to help prioritize areas for improvement. They can communicate their vulnerability to errors in the forecast, which is tied to their risk tolerance. For example, when making a deterministic forecast, the emergency manager may prefer the forecaster to be conservative, adjusting the forecast towards the worst-case scenario, because overpreparing for a disaster is not as dangerous as underpreparing.

Users can ensure that they have a clear understanding of how to interpret forecast products and are familiar with the forecast uncertainties. They can be aware of biases in the forecast and in certain cases develop in-house systems for adding value to the forecasts (e.g., relate a forecast river level to an inundation extent). Verification can help manage expectations around the capabilities of the forecasting systems. Given that users’ decisions affect outcomes, they can feed back information to the forecasters about that process, for example, when planned reservoir releases may contribute significantly to downstream flow (and these releases may depend on the forecasts themselves).

6.3 Context and Constraints

Users encompass the entire spectrum of statistical literacy, from highly sophisticated users who may run their own modeling systems in-house and perform their own verifications, to those with no background at all in statistics. The latter category of users may not understand verification concepts, techniques, or measures and may even have issues interpreting the forecasts themselves. Some users may analyze the forecasts on a daily basis, but others may use the forecasts only very infrequently, such as every few years when floods occur. Just as each scientific field has its own terminology, jargon, and definition of terms, users may have their own language in discussing and evaluating forecasts. Further, because users are often external to forecasting agencies, extra effort may be needed to communicate verification information with them.

Users typically have other inputs to their decision process that are unrelated to forecasts. For example, a reservoir operator may need to adjust reservoir releases to account for dynamic requirements of the ecosystem below the reservoir, the water supply requirements of downstream users, or political considerations to address competing interests. Each of these inputs to users’ decision-making processes requires knowledge of diverse areas that may not be related to forecasts at all.

6.4 Verification Needs

This audience is likely to have questions such as “Should I trust this forecast?” “How, if at all, should I use this forecast?” “To what extent is forecast quality conditional on attributes of streamflow?” “Can I trust a forecast for an extreme value as much as a routine value?” and “Is the forecast uncertainty compatible with my risk tolerance?” If using probabilistic forecasts, users may have additional questions such as “Are the forecasts probabilistically reliable? (e.g., when they say 30% probability of flooding, does flooding indeed happen 30% of the time?)” Many members of this audience are also likely not to know how to express the verification questions they have and may need verification architects to act as translators.
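The reliability question above can be checked directly from a forecast archive: collect all forecasts that stated roughly the same probability of flooding and compare that stated probability with the observed flood frequency. The binning tolerance below is an illustrative choice:

```python
import numpy as np

def observed_frequency(prob_forecasts, flood_occurred, stated_prob, tol=0.05):
    """For archived forecasts that stated roughly `stated_prob` probability of
    flooding, return the fraction of those occasions when flooding occurred.
    Returns None if no forecasts fall in the bin."""
    p = np.asarray(prob_forecasts, float)
    o = np.asarray(flood_occurred, bool)
    mask = np.abs(p - stated_prob) <= tol
    if mask.sum() == 0:
        return None
    return float(o[mask].mean())
```

A forecast system is reliable at the 30% level if `observed_frequency(probs, occurred, 0.30)` is close to 0.30; repeating the check across probability bins yields the familiar reliability diagram.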

Users need to know when forecasts are sufficiently reliable for their purposes (Sarewitz et al. 2000). Consistent communication of forecast uncertainty and historical performance can increase forecast credibility (O’Grady and Shabman 1990). Without this credibility, forecasts may not be used (Rayner et al. 2005). Further, the costly consequences of bad outcomes from the use of a particular forecast can devastate user confidence in subsequent forecasts (Glantz 1982).

Given that forecasters face a similar challenge of converting guidance into official warnings as users do converting official forecasts into effective responses, some of the verification information requirements for users are similar to those of forecasters. They may want to give a quantitative basis to their subjective impressions of the forecast skill. They may want a better understanding of the uncertainty in a specific forecast, informed partly by the recent performance of similar forecasts. In that regard, many of the same approaches, such as integrating verification/uncertainty information into the products themselves, are useful. A critical distinction between forecast producers and users is that the users may be interested in a subset of “high impact” forecasts only. Also, users’ risk tolerance may not be the same as that of forecasters.

For example, forecast producers may look at national maps of forecast skill to get an overall sense of the performance of the forecast system. In contrast, water managers in the Upper Colorado River basin may only be interested in forecasts for their catchment, issued December to April, for flows from January to September (Hartmann et al. 2002). Emergency managers on the Mekong River may be most interested in forecasts indicating a future flood while the current observation is still below flood level, a situation that occurs on fewer than 1% of days (Pagano 2014). Not all users are interested in floods; when the Rio Grande in the USA is about to dry up, fish biologists must prepare to scoop endangered species out of isolated pools (Paskus 2003). User-oriented verifications often segregate forecasts of interest by location, lead time, season, and magnitude. To ensure users’ needs are met, users must be involved in deciding how to subset the forecasts.
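A minimal sketch of such user-oriented subsetting follows; the record structure, field names, and values are hypothetical assumptions, chosen only to mirror the Upper Colorado example above:

```python
# Hypothetical sketch: select only the forecast/observation records relevant
# to a particular decision context before computing any verification scores.

forecasts = [
    {"site": "upper_colorado", "issue_month": 1, "lead_days": 90, "fcst": 410.0, "obs": 395.0},
    {"site": "upper_colorado", "issue_month": 7, "lead_days": 30, "fcst": 120.0, "obs": 118.0},
    {"site": "mekong",         "issue_month": 8, "lead_days": 3,  "fcst": 11.8,  "obs": 12.1},
]

def subset(records, site=None, issue_months=None, max_lead_days=None):
    """Keep only records matching the user's location, season, and lead-time window."""
    out = []
    for r in records:
        if site is not None and r["site"] != site:
            continue
        if issue_months is not None and r["issue_month"] not in issue_months:
            continue
        if max_lead_days is not None and r["lead_days"] > max_lead_days:
            continue
        out.append(r)
    return out

# A water manager's view: Upper Colorado forecasts issued December to April only.
relevant = subset(forecasts, site="upper_colorado", issue_months={12, 1, 2, 3, 4})
print(len(relevant), "forecast(s) match this user's context")
```

Scores computed on `relevant` then describe performance in the user's own decision context rather than the system-wide average.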

Additionally, each user has a risk tolerance in the sense that he or she may be more vulnerable to forecasts that are too low than too high, or vice versa. For example, the damage from underpreparing for a flood may be much larger than the cost of overpreparing. Standard forecast evaluation measures such as the Nash–Sutcliffe efficiency or correlation do not reflect this asymmetry. Ideally, the forecast should be probabilistic; the user could then assess their own risk tolerance (perhaps with the assistance of forecasters, researchers, extension agents, or other specialists) and decide that, for example, it is most appropriate to base their planning on the 80% nonexceedance level of the forecast. This process may involve using a collection of historical forecasts in a formal framework (e.g., as input to a reservoir optimization tool) or in a tabletop exercise (Baldwin et al. 2006). The user may even be able to use verification to quantify the financial benefit of using the forecasts (Faber and Stedinger 2001; Hamlet et al. 2002; Verkade and Werner 2011).
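As an illustration of planning around a nonexceedance level, a nearest-rank empirical quantile of an ensemble forecast can be computed as below. The ensemble values and the 80% choice are illustrative; operational systems typically use more refined quantile estimators:

```python
import math

def nonexceedance_level(ensemble, p):
    """Nearest-rank empirical quantile: the smallest member value with at least
    a fraction p of ensemble members at or below it."""
    members = sorted(ensemble)
    k = max(1, math.ceil(p * len(members)))
    return members[k - 1]

# Illustrative ensemble of forecast peak flows (m3/s).
ensemble = [210, 250, 265, 280, 300, 320, 350, 390, 450, 520]

# A flood-averse user (underpreparation is costly) plans around a high quantile.
planning_flow = nonexceedance_level(ensemble, 0.80)
print(planning_flow)  # prints 390
```

A user with the opposite asymmetry, say a water supplier for whom overcommitting is costly, might instead plan around a low quantile of the same ensemble.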

Regrettably, probabilistic forecasts are underutilized, and users commonly prefer deterministic, single-valued forecasts. The forecaster may attempt to compensate for this lack of risk management by adjusting the products themselves. For example, an agency that issues deterministic forecasts (e.g., a 3.4 m peak river height by Thursday) may informally advise its hydrologists that it is “better to forecast too high and too early than too low and too late.” Here, the forecasts are purposefully biased because the costs of overpreparedness are much less than those of a disaster (and of the associated damage to the forecasters’ reputation). Combined with an understanding of user decision-making, verification can be used to determine whether the forecasters’ estimate of the users’ risk tolerance serves actual user needs.

A key element of verification for decision-makers is to use language that is meaningful and relevant. For example, knowing that a forecast’s root-mean-squared error is 0.6 m or that its Brier skill score is 0.3 may not be useful. In contrast, a user may want to know that in 6 out of 10 cases the forecasts were too high, that half of the time the error exceeded 0.4 m, or that the worst error in the past 5 years was 1.5 m. Analogues are easy to understand and visualize (e.g., “in the 1983 flood we predicted 7 m and the observed peak was 6.7 m”), although care should be taken to emphasize that individual forecasts may not be representative of overall performance.
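Statements of this kind can be generated directly from an archive of forecast errors. A small sketch, with invented error values (forecast minus observed, in metres):

```python
import statistics

# Illustrative archive of forecast errors in metres (forecast minus observed).
errors = [0.5, -0.2, 0.8, -0.1, 0.3, -0.6, 1.5, 0.2, -0.4, 0.7]

too_high = sum(1 for e in errors if e > 0) / len(errors)   # fraction of over-forecasts
median_abs = statistics.median(abs(e) for e in errors)     # typical error magnitude
worst = max(abs(e) for e in errors)                        # largest error on record

print(f"In {too_high:.0%} of cases the forecast was too high.")
print(f"Half of the time the error exceeded {median_abs:.2f} m.")
print(f"The worst error on record was {worst:.1f} m.")
```

The point is the translation step: the same archive that yields an RMSE also yields sentences a non-statistician can act on.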

7 Verification for System Administrators

7.1 Role in the Forecast Process

System administrators – sometimes called program managers – are responsible for developing and evaluating business cases, investment decisions, and financial analyses related to the maintenance and development of forecasting systems. While operational forecasters are responsible for the forecasts, system administrators are accountable for their quality. For example, system administrators may be brought to account for the forecasts of high-profile events, such as major floods that had widespread impacts that were poorly forecast. Such activities are important in maintaining the credibility of, and support for, the forecasting agency. If a particular poor-quality forecast gains the public’s attention, it is useful for the agency to provide quantitative evidence that most other forecasts have been quite accurate or that the forecast system overall has a certain level of skill. System administrators may facilitate and/or monitor the dialogues among modelers, forecasters, and users. Verification may also be part of the reporting of key performance indicators to government ministers and key stakeholders.

7.2 Actions Stakeholder Can Take to Improve the Forecast Process

While modelers may investigate improvements into a particular aspect of a model, system administrators direct resources (people, money, and assets) to make such an investigation possible and a priority. System administrators may also determine the policy related to forecasting procedures and may direct investment in creating new or modifying existing forecast products. They may choose to centralize or decentralize forecasting and may recruit and train additional forecasters. They may invest in improvements in the data networks, forecaster workflow management software, and forecast center facilities and computing or display equipment.

7.3 Context and Constraints

Similar to forecasters, system administrators are typically time limited, and verification may only be a minor aspect of their overall activities. Administrators must also concentrate on all aspects of forecast quality, such as timeliness, accessibility, and system reliability. In this bigger picture, investments in, for example, forecast communication or digital delivery mechanisms may be a higher priority than improving core aspects of forecast accuracy. Typically verification will be a component (along with user-based assessment) of a program evaluating the overall effectiveness of the service. Administrators’ familiarity with statistical concepts and verification terminology varies, with some administrators being former researchers and/or forecasters and others coming from a business management background or other fields entirely.

7.4 Verification Needs

This audience is likely to ask questions such as “What is the overall quality of service I am providing?” “How is the agency performing compared to its peers?” “Where should I invest in improvements in the system?” “Has the agency realized the benefits of past investments?” and “How was the skill during particularly high-profile events?” System administrators generally have three types of verification information needs: “headline scores,” evaluations of system upgrades, and event-specific analyses.

“Headline scores” provide an overall health check of the entire forecasting enterprise and often involve distilling important aspects of quality across many forecasts into a few measures. For example, Pagano et al. (2004) developed an index of performance (the Nash–Sutcliffe efficiency of each site within a 20-year moving window, averaged across 29 spatially and climatically representative sites across the western USA) that later became a key performance indicator for the agency. The index was updated annually, along with other measures such as the number of forecasts issued, and reported publicly and to other government agencies. The agency set short-range performance targets. Regrettably, these targets proved difficult to meet because the score, while relatively easy to calculate and communicate, was often influenced by factors outside the agency’s control, such as climatic variability and extreme events.
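An index of this kind can be sketched as follows. The data, window length, and two-site list are invented for illustration; the aggregation mirrors the spirit of the published index, not its exact implementation:

```python
# Sketch of a "headline score": Nash-Sutcliffe efficiency per site over a
# moving window of recent years, averaged across sites. Data are illustrative.

def nash_sutcliffe(forecast, observed):
    """NSE = 1 - sum((f - o)^2) / sum((o - mean(o))^2); 1.0 is a perfect forecast."""
    mean_obs = sum(observed) / len(observed)
    num = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

def headline_index(sites, window):
    """Average the NSE of each site's most recent `window` forecast years."""
    scores = [nash_sutcliffe(s["fcst"][-window:], s["obs"][-window:]) for s in sites]
    return sum(scores) / len(scores)

# Two hypothetical sites with five years of forecast/observed annual volumes.
sites = [
    {"fcst": [100, 140, 90, 120, 110], "obs": [95, 150, 85, 118, 120]},
    {"fcst": [40, 55, 35, 60, 48],     "obs": [42, 50, 30, 65, 45]},
]
print(round(headline_index(sites, window=5), 2))  # prints 0.88
```

The caveat in the text applies here too: a wet or dry spell can move this number regardless of any change in forecasting skill, which is why headline targets based on it are hard to guarantee.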

When making decisions about future investments in the forecasting system, system administrators may seek a clear value proposition, ideally supported by evidence that past investments have yielded the expected benefits. For example, several authors have quantified the operational river-forecasting benefits of accurate rainfall forecasts relative to other sources of information (Pappenberger et al. 2011, 2015b; Rossa et al. 2011; Welles 2005; Welles and Sorooshian 2009; Zappa et al. 2010). If evidence shows that a given investment (e.g., $1 million) in a denser rain gauge network yields much less operational forecast improvement than the same investment in better rainfall forecasts, then the administrative choice is clear. Part of the challenge lies in defining “forecast improvement”: Which forecasts? Improved how? Does the change make some forecasts better and others worse? Currently, operational hydrologic forecasting agencies often lack the resources and frameworks for conducting such quantitative, structured experiments and instead rely on the informed impressions of subject matter experts.

Finally, system administrators may need to investigate the performance of individual forecasts that have captured the public interest. For example, in January 2011, large floods in Queensland, Australia, inundated a major metropolitan area. The Commission of Inquiry that followed lasted several years, called hundreds of witnesses, and received hundreds of written submissions. Of critical interest was the quality of the river and rainfall forecasts, in particular how well the magnitude of the event was predicted. Administrators in the Bureau of Meteorology provided detailed information about the forecasts, their accuracy, the historical precedents for the flood’s magnitude, and overall forecast quality. The agency also documented how the forecasts were produced to show that standard operating procedures had been followed. Similar investigations followed the 1997 Red River floods in the USA (Pielke 1999), the 1983 Colorado River floods (Rhodes et al. 1984), the Yakima River drought (Glantz 1982), and the earthquake predictions associated with the 2009 disaster in L’Aquila, Italy (American Geophysical Union 2010).

8 Recommendations and Conclusions

Several documents provide guidelines on the performance assessment of public forecasting services (Gordon and Shaykewich 2000; World Meteorological Organization 2013). Based on this study and the recommendations of those other studies, several features of a good verification system can be offered:
  • Archival: Systematically preserve historical operational forecasts, together with the corresponding observations, in a consistent machine-readable format to facilitate processing. Organize the archive so that historical forecasts can be easily retrieved later. It is essential to archive official products, but also consider archiving original model inputs, outputs, and parameters, as well as forecaster interventions. Keeping the original data allows scores to be recomputed over time if the methods or scores ever change.

  • Planning: Have a verification plan. Know why you want to verify, what questions you are attempting to answer, and what new information you want to discover. Recognize the diversity of verification needs, but do not try to satisfy every need imaginable. At least initially, keep the verifications simple; start small and grow over time to suit your needs.

  • Measures: Choose elements and scores that are relevant to the needs of the verification audience. Recognize the differences between accuracy and skill. Weigh the merits (simplicity, cost, relevancy, reproducibility, and others) of subjective versus objective verification methods, favoring objective measures where possible. Use multiple measures to evaluate various aspects of forecast accuracy – rarely can “one number” paint the entire picture.

  • Grouping: Group similar forecasts so as to identify systematic errors and accumulate enough examples to calculate results with statistical significance. However, beware of overaggregation, lumping together disparate climates, lead times, events, and so on, because this can conceal useful information. Furthermore, if performing user-oriented verification, attempt to include only forecasts that are relevant to that decision-maker’s context. Forecast verification should be stratified to focus on “high impact” and/or difficult forecasts and be done in a way that informs system improvement.

  • Use: Do not simply verify the forecasts and file the results away in a report. Be prepared to act on the results of the verification, be it by adjusting the forecasting system, investing in system improvements, changing forecaster training, and so on.

  • Engagement: Share the results of the verification in a timely manner, especially providing rapid feedback to operational forecasters on the quality of their performance. Keep stakeholders updated regularly, and do not simply deliver numerical scores but also include an interpretation of the results. Communicate the results in an easy-to-understand way and make them easily accessible. Seek feedback from users to confirm that the verification is meaningful, effective, and achieving its intended purpose. Have a verification communication plan.

Investments in forecast verification capacities that incorporate these aspects will pay dividends for forecast agencies and their stakeholders. Unfortunately, such verification capacities are today the exception rather than the rule. This has created an environment in which the various users described in this chapter are forced to make decisions uninformed by forecast skill, often leading to suboptimal forecast application by stakeholders, suboptimal forecast development by developers and administrators, or suboptimal forecast production by forecasters.

Instead, we propose that forecast agencies routinely invest in the development and operation of forecast verification capabilities that support data-driven decisions for all stakeholders in and around the hydrologic prediction enterprise. We believe that continual improvements to forecasts would then occur as a matter of course, by focusing forecaster, developer, and administrator efforts on reducing forecast error, and that better-informed forecast application would lead to more resilient decision-making by forecast consumers.


References

  1. American Geophysical Union, AGU statement: investigation of scientists and officials in L’Aquila, Italy, is unfounded. EOS Trans. Am. Geophys. Union 91(28), 248 (2010)
  2. C. Baldwin, M. Waage, R. Steger, J. Garbrecht, T. Piechota et al., Acclimatizing water managers to climate forecasts through decision experiments, in Climate Variations, Climate Change, and Water Resources Engineering (ASCE, Reston, 2006), pp. 115–131
  3. N.D. Bennett, B.F.W. Croke, G. Guariso, J.H.A. Guillaume, S.H. Hamilton, A.J. Jakeman, S. Marsili-Libelli, L.T.H. Newham, J.P. Norton, C. Perrin, S.A. Pierce, B. Robson, R. Seppelt, A.A. Voinov, B.D. Fath, V. Andreassian, Characterising performance of environmental models. Environ. Model. Softw. 40, 1–20 (2013)
  4. A.A. Bradley, S.S. Schwartz, Summary verification measures and their interpretation for ensemble forecasts. Mon. Weather Rev. 139(9), 3075–3089 (2011)
  5. A.A. Bradley, S.S. Schwartz, T. Hashino, Distributions-oriented verification of ensemble streamflow predictions. J. Hydrometeorol. 5(3), 532–545 (2004)
  6. G.W. Brier, R.A. Allen, Verification of weather forecasts, in Compendium of Meteorology (1951), pp. 841–848
  7. A.C. Bryant, T.H. Painter, Radiative forcing by desert dust in the Colorado River Basin from 2000 to 2009 inferred from MODIS data, in AGU Fall Meeting Abstracts, vol. 1 (2009), p. 0501
  8. Bureau of Meteorology, Verification in the Bureau. Framework Report (Bureau of Meteorology, Melbourne, 2015)
  9. L. Cuo, T.C. Pagano, Q.J. Wang, A review of quantitative precipitation forecasts and their use in short- to medium-range streamflow forecasting. J. Hydrometeorol. 12(5), 713–728 (2011)
  10. J. Demargne, M. Mullusky, K. Werner, T. Adams, S. Lindsey, N. Schwein, W. Marosi, E. Welles, Application of forecast verification science to operational river forecasting in the US National Weather Service. Bull. Am. Meteorol. Soc. 90(6), 779–784 (2009)
  11. Q. Duan, N.K. Ajami, X. Gao, S. Sorooshian, Multi-model ensemble hydrologic prediction using Bayesian model averaging. Adv. Water Resour. 30(5), 1371–1386 (2007)
  12. B.A. Faber, J.R. Stedinger, Reservoir optimization using sampling SDP with ensemble streamflow prediction (ESP) forecasts. J. Hydrol. 249(1–4), 113–133 (2001)
  13. M.H. Glantz, Consequences and responsibilities in drought forecasting: the case of Yakima, 1977. Water Resour. Res. 18(1), 3–13 (1982)
  14. N. Gordon, J. Shaykewich, Guidelines on Performance Assessment of Public Weather Services (World Meteorological Organization, Geneva, 2000)
  15. A. Hamlet, D. Huppert, D. Lettenmaier, Economic value of long-lead streamflow forecasts for Columbia River hydropower. J. Water Resour. Plan. Manag. 128(2), 91–101 (2002)
  16. H. Hartmann, R. Bales, S. Sorooshian, Weather, Climate, and Hydrologic Forecasting for the Southwest U.S. (The University of Arizona, Tucson, 1999)
  17. H.C. Hartmann, R. Bales, S. Sorooshian, Weather, climate, and hydrologic forecasting for the US Southwest: a survey. Clim. Res. 21(3), 239–258 (2002)
  18. I.T. Jolliffe, D.B. Stephenson, Forecast Verification: A Practitioner’s Guide in Atmospheric Science (Wiley, 2012)
  19. D. Kahneman, G. Klein, Conditions for intuitive expertise: a failure to disagree. Am. Psychol. 64(6), 515 (2009)
  20. D. Kavetski, M.P. Clark, Numerical troubles in conceptual hydrology: approximations, absurdities and impact on hypothesis testing. Hydrol. Process. 25(4), 661–670 (2011)
  21. Y. Liu, A.H. Weerts, M. Clark, H.-J. Hendricks Franssen, S. Kumar, H. Moradkhani, D.-J. Seo, D. Schwanenberg, P. Smith, A. Van Dijk et al., Advancing data assimilation in operational hydrologic forecasting: progresses, challenges, and emerging opportunities. Hydrol. Earth Syst. Sci. 16(10), 3863–3887 (2012)
  22. B.M. Muir, Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics 37(11), 1905–1922 (1994)
  23. B.M. Muir, N. Moray, Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics 39(3), 429–460 (1996)
  24. A.H. Murphy, What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather Forecast. 8(2), 281–293 (1993)
  25. A.H. Murphy, Forecast verification, in Economic Value of Weather and Climate Forecasts (Cambridge University Press, Cambridge, UK/New York/Melbourne, 1997)
  26. N. Nicholls, Cognitive illusions, heuristics, and climate prediction. Bull. Am. Meteorol. Soc. 80(7), 1385–1397 (1999)
  27. K. O’Grady, L. Shabman, Communicating the probability of Great Lakes water levels and storms, in Proceedings of Great Lakes Water Level Forecast and Statistics Symposium, Windsor (1990), pp. 197–204
  28. T.C. Pagano, Evaluation of Mekong River Commission operational flood forecasts, 2000–2012. Hydrol. Earth Syst. Sci. 18(7), 2645–2656 (2014)
  29. T.C. Pagano, D. Garen, S. Sorooshian, Evaluation of official western US seasonal water supply outlooks, 1922–2002. J. Hydrometeorol. 5(5), 896–909 (2004)
  30. T.C. Pagano, H. Hapuarachchi, Q.J. Wang, Continuous Soil Moisture Accounting and Routing Modelling to Support Short Lead-Time Streamflow Forecasting (CSIRO Water for a Healthy Country National Research Flagship, Melbourne, 2009)
  31. T.C. Pagano, A.W. Wood, M.-H. Ramos, H.L. Cloke, F. Pappenberger, M.P. Clark, M. Cranston, D. Kavetski, T. Mathevet, S. Sorooshian, J.S. Verkade, Challenges of operational river forecasting. J. Hydrometeorol. (2014)
  32. F. Pappenberger, J. Thielen, M. Del Medico, The impact of weather forecast improvements on large scale hydrology: analysing a decade of forecasts of the European Flood Alert System. Hydrol. Process. 25(7), 1091–1113 (2011)
  33. F. Pappenberger, M.H. Ramos, H.L. Cloke, F. Wetterhall, L. Alfieri, K. Bogner, A. Mueller, P. Salamon, How do I know if my forecasts are better? Using benchmarks in hydrological ensemble prediction. J. Hydrol. 522, 697–713 (2015a)
  34. F. Pappenberger, H.L. Cloke, D.J. Parker, F. Wetterhall, D.S. Richardson, J. Thielen, The monetary benefit of early flood warnings in Europe. Environ. Sci. Pol. 51, 278–291 (2015b)
  35. R. Parasuraman, V. Riley, Humans and automation: use, misuse, disuse, abuse. Hum. Factors J. Hum. Factors Ergon. Soc. 39(2), 230–253 (1997)
  36. L. Paskus, Why the Silvery Minnow Matters, AlterNet (2003)
  37. C. Perrin, C. Michel, V. Andréassian, Improvement of a parsimonious model for streamflow simulation. J. Hydrol. 279(1), 275–289 (2003)
  38. R.A. Pielke Jr., Who decides? Forecasts and responsibilities in the 1997 Red River flood. Appl. Behav. Sci. Rev. 7(2), 83–101 (1999)
  39. M. Pocernich, Appendix: verification software, in Forecast Verification: A Practitioner’s Guide in Atmospheric Science, 2nd edn. (Wiley, Chichester, 2012), pp. 231–240
  40. S. Rayner, D. Lach, H. Ingram, Weather forecasts are for wimps: why water resource managers do not use climate forecasts. Clim. Chang. 69(2–3), 197–227 (2005)
  41. S.L. Rhodes, D. Ely, J.A. Dracup, Climate and the Colorado River: the limits of management. Bull. Am. Meteorol. Soc. 65(7), 682–691 (1984)
  42. D.E. Robertson, D.L. Shrestha, Q.J. Wang, Post processing rainfall forecasts from numerical weather prediction models for short term streamflow forecasting. Hydrol. Earth Syst. Sci. Discuss. 10(5), 6765–6806 (2013)
  43. A. Rossa, K. Liechti, M. Zappa, M. Bruen, U. Germann, G. Haase, C. Keil, P. Krahe, The COST 731 action: a review on uncertainty propagation in advanced hydro-meteorological forecast systems. Atmos. Res. 100(2–3), 150–167 (2011)
  44. D. Sarewitz, R.A. Pielke, R. Byerly, Prediction: Science, Decision Making, and the Future of Nature (Island Press, 2000)
  45. L.J. Skitka, K.L. Mosier, M. Burdick, Does automation bias decision-making? Int. J. Hum. Comput. Stud. 51(5), 991–1006 (1999)
  46. H.R. Stanski, L.J. Wilson, W.R. Burrows, Survey of Common Verification Methods in Meteorology (World Meteorological Organization, Geneva, 1989)
  47. J.S. Verkade, M.G.F. Werner, Estimating the benefits of single value and probability forecasting for flood warning. Hydrol. Earth Syst. Sci. 15(12), 3751–3765 (2011)
  48. R.Y. Wang, D.M. Strong, Beyond accuracy: what data quality means to data consumers. J. Manag. Inf. Syst. 12, 5–33 (1996)
  49. A.H. Weerts, H.C. Winsemius, J.S. Verkade, Estimation of predictive hydrological uncertainty using quantile regression: examples from the National Flood Forecasting System (England and Wales). Hydrol. Earth Syst. Sci. 15(1), 255–265 (2011)
  50. E. Welles, Verification of River Stage Forecasts (2005)
  51. E. Welles, S. Sorooshian, Scientific verification of deterministic river stage forecasts. J. Hydrometeorol. 10(2), 507–520 (2009)
  52. E. Welles, S. Sorooshian, G. Carter, B. Olsen, Hydrologic verification: a call for action and collaboration. Bull. Am. Meteorol. Soc. 88(4), 503–511 (2007)
  53. F. Wetterhall, F. Pappenberger, H.L. Cloke, J. Thielen-del Pozo, S. Balabanova, J. Daňhelka, A. Vogelbacher, P. Salamon, I. Carrasco, A.J. Cabrera-Tordera et al., Forecasters priorities for improving probabilistic flood forecasts. Hydrol. Earth Syst. Sci. Discuss. 10(2), 2215–2242 (2013)
  54. D.S. Wilks, Statistical Methods in the Atmospheric Sciences (Academic, 2011)
  55. World Meteorological Organization, Guide to the Implementation of a Quality Management System for National Meteorological and Hydrological Services (World Meteorological Organization, Geneva, 2013)
  56. WWRP/WGNE Joint Working Group on Forecast Verification Research, Forecast Verification: Issues, Method and FAQ (2015)
  57. M. Zappa, K.J. Beven, M. Bruen, A.S. Cofino, K. Kok, E. Martin, P. Nurmi, B. Orfila, E. Roulin, K. Schröter et al., Propagation of uncertainty from observing systems and NWP into hydrological models: COST-731 Working Group 2. Atmos. Sci. Lett. 11(2), 83–91 (2010)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. National Weather Service, National Oceanic and Atmospheric Administration, Salt Lake City, USA
  2. Deltares, Delft, The Netherlands
  3. Ministry of Infrastructure and the Environment, Water Management Centre of the Netherlands, River Forecasting Service, Lelystad, The Netherlands
  4. Delft University of Technology, Delft, The Netherlands
  5. Bureau of Meteorology, Melbourne, Australia