Many artificial intelligence algorithms or models are ultimately designed for prediction. A prediction algorithm, wherever it may reside—in a computer, or in a forecaster's head—is subject to a set of tests aimed at assessing its goodness. The specific choice of tests is contingent on many factors, including the nature of the problem and the particular facet of goodness being assessed. This chapter discusses some of these tests. For a more in-depth exposition, the reader is directed to the references, and to two books: Wilks (1995) and Jolliffe and Stephenson (2003). The body of knowledge aimed at assessing the goodness of predictions is referred to as performance assessment in most fields; in atmospheric circles, though, it is generally called verification. In this chapter, I consider only a few of the numerous performance measures proposed in the literature, but my emphasis is on ways of assessing their uncertainty (i.e., statistical significance).
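As a concrete illustration of what assessing the uncertainty of a performance measure can look like in practice, here is a minimal sketch, assuming only NumPy and synthetic forecast data (all variable names are illustrative), of a bootstrap confidence interval for the Brier score, a common verification measure for probabilistic forecasts, in the spirit of the resampling methods of Efron and Tibshirani (1993):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic verification data: probabilistic forecasts p in [0, 1] and
# binary observations o in {0, 1}, generated so the forecasts are calibrated.
n = 500
forecasts = rng.uniform(0.0, 1.0, size=n)
observations = (rng.uniform(size=n) < forecasts).astype(float)

def brier_score(p, o):
    """Mean squared difference between forecast probability and outcome."""
    return np.mean((p - o) ** 2)

# Bootstrap: resample forecast/observation pairs with replacement and
# recompute the score, building up its sampling distribution.
n_boot = 2000
scores = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    scores[b] = brier_score(forecasts[idx], observations[idx])

lo, hi = np.percentile(scores, [2.5, 97.5])
print(f"Brier score = {brier_score(forecasts, observations):.3f}, "
      f"95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```

The same resampling recipe applies to any scalar measure (hit rate, ROC area, and so on), with the caveat that serially correlated data call for variants such as the block bootstrap.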

Here, prediction (or forecast) does not necessarily refer to the prediction of the future state of some variable. It refers to the estimation of the state of one variable from information on another variable; the two variables may or may not be contemporaneous. What is required, however, is that the data on which the performance of the algorithm is assessed be as independent as possible from the data on which the algorithm is developed or fine-tuned; otherwise, the performance estimate will be optimistically biased, and that is not a good thing (see Section 2.6 in Chapter 2).
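To make the independence requirement concrete, here is a minimal sketch, again assuming only NumPy and synthetic data, of the standard remedy: split the data at random, develop the model on one half, and assess it only on the withheld half. The gap between the two error estimates is precisely the optimistic bias in question.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: a noisy nonlinear relationship between x and y.
n = 100
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=n)

# Split at random: develop the model on one half, assess on the other.
idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2:]

# A deliberately flexible model (degree-10 polynomial) fit to the
# training half only.
coeffs = np.polyfit(x[train], y[train], deg=10)

def mse(xs, ys):
    """Mean squared error of the fitted polynomial on (xs, ys)."""
    return np.mean((ys - np.polyval(coeffs, xs)) ** 2)

# The training error understates the true error; the error on the
# withheld (independent) data is the honest performance estimate.
print(f"MSE on training data:         {mse(x[train], y[train]):.3f}")
print(f"MSE on independent test data: {mse(x[test], y[test]):.3f}")
```

Cross-validation generalizes this single split by rotating the withheld portion and averaging the resulting estimates.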


References

  • Baldwin, M. E., Lakshmivarahan, S., & Kain, J. S. (2002). Development of an “events-oriented” approach to forecast verification. 15th Conference on Numerical Weather Prediction, San Antonio, TX, August 12–16, 2002. Available at http://www.nssl.noaa.gov/mag/pubs/nwp15verf.pdf

  • Brown, B. G., Bullock, R., Davis, C. A., Gotway, J. H., Chapman, M., Takacs, A., Gilleland, E., Mahoney, J. L., & Manning, K. (2004). New verification approaches for convective weather forecasts. Preprints, 11th Conference on Aviation, Range, and Aerospace, Hyannis, MA, October 3–8.

  • Casati, B., Ross, G., & Stephenson, D. B. (2004). A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteorological Applications, 11, 141–154.

  • Devore, J., & Farnum, N. (2005). Applied statistics for engineers and scientists. Belmont, CA: Thomson Learning.

  • Doswell, C. A., III, Davies-Jones, R., & Keller, D. (1990). On summary measures of skill in rare event forecasting based on contingency tables. Weather and Forecasting, 5, 576–585.

  • Du, J., & Mullen, S. L. (2000). Removal of distortion error from an ensemble forecast. Monthly Weather Review, 128, 3347–3351.

  • Ebert, E. E., & McBride, J. L. (2000). Verification of precipitation in weather systems: Determination of systematic errors. Journal of Hydrology, 239, 179–202.

  • Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. London: Chapman & Hall.

  • Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874.

  • Ferro, C. (2007). Comparing probabilistic forecasting systems with the Brier score. Weather and Forecasting, 22, 1076–1088.

  • Gandin, L. S., & Murphy, A. (1992). Equitable skill scores for categorical forecasts. Monthly Weather Review, 120, 361–370.

  • Gerrity, J. P., Jr. (1992). A note on Gandin and Murphy's equitable skill score. Monthly Weather Review, 120, 2707–2712.

  • Glahn, H. R., & Lowry, D. A. (1972). The use of Model Output Statistics (MOS) in objective weather forecasting. Journal of Applied Meteorology, 11, 1203–1211.

  • Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378.

  • Gneiting, T., Raftery, A. E., Westveld, A. H., & Goldman, T. (2005). Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133, 1098–1118.

  • Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2), 243–268.

  • Good, P. I. (2005a). Permutation, parametric and bootstrap tests of hypotheses (3rd ed.). New York: Springer.

  • Good, P. I. (2005b). Introduction to statistics through resampling methods and R/S-PLUS. Hoboken, NJ: Wiley.

  • Hamill, T. M. (1997). Reliability diagrams for multicategory probabilistic forecasts. Weather and Forecasting, 12, 736–741.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Springer Series in Statistics. New York: Springer.

  • Heidke, P. (1926). Berechnung des Erfolges und der Güte der Windstärkevorhersagen im Sturmwarnungsdienst. Geografiska Annaler, 8, 301–349.

  • Jolliffe, I. T. (2007). Uncertainty and inference for verification measures. Weather and Forecasting, 22, 633–646.

  • Jolliffe, I. T., & Stephenson, D. B. (2003). Forecast verification: A practitioner's guide in atmospheric science. Chichester: Wiley.

  • Livezey, R. E. (2003). Chapter 4: Categorical events. In I. T. Jolliffe & D. B. Stephenson (Eds.), Forecast verification: A practitioner's guide in atmospheric science. Chichester: Wiley.

  • Macskassy, S. A., & Provost, F. (2004). Confidence bands for ROC curves: Methods and an empirical study. First Workshop on ROC Analysis in AI, ECAI-2004, Spain.

  • Marzban, C. (1998). Scalar measures of performance in rare-event situations. Weather and Forecasting, 13, 753–763.

  • Marzban, C. (2004). The ROC curve and the area under it as a performance measure. Weather and Forecasting, 19(6), 1106–1114.

  • Marzban, C., & Lakshmanan, V. (1999). On the uniqueness of Gandin and Murphy's equitable performance measures. Monthly Weather Review, 127(6), 1134–1136.

  • Marzban, C., & Sandgathe, S. (2006). Cluster analysis for verification of precipitation fields. Weather and Forecasting, 21(5), 824–838.

  • Marzban, C., & Sandgathe, S. (2008). Cluster analysis for object-oriented verification of fields: A variation. Monthly Weather Review, 136, 1013–1025.

  • Marzban, C., & Stumpf, G. J. (1998). A neural network for damaging wind prediction. Weather and Forecasting, 13, 151–163.

  • Marzban, C., & Witt, A. (2001). A Bayesian neural network for hail size prediction. Weather and Forecasting, 16(5), 600–610.

  • Marzban, C., Sandgathe, S., & Lyons, H. (2008). An object-oriented verification of three NWP model formulations via cluster analysis: An objective and a subjective analysis. Monthly Weather Review, 136, 3392–3407.

  • Murphy, A. H. (1991). Forecast verification: Its complexity and dimensionality. Monthly Weather Review, 119, 1590–1601.

  • Murphy, A. H. (1993). What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8, 281–293.

  • Murphy, A. H., & Epstein, E. S. (1967). A note on probabilistic forecasts and “hedging”. Journal of Applied Meteorology, 6, 1002–1004.

  • Murphy, A. H., & Winkler, R. L. (1987). A general framework for forecast verification. Monthly Weather Review, 115, 1330–1338.

  • Murphy, A. H., & Winkler, R. L. (1992). Diagnostic verification of probability forecasts. International Journal of Forecasting, 7, 435–455.

  • Nachamkin, J. E. (2004). Mesoscale verification using meteorological composites. Monthly Weather Review, 132, 941–955.

  • Raftery, A. E., Gneiting, T., Balabdaoui, F., & Polakowski, M. (2005). Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133, 1155–1174.

  • Richardson, D. S. (2000). Skill and relative economic value of the ECMWF ensemble prediction system. Quarterly Journal of the Royal Meteorological Society, 126, 649–667.

  • Roebber, P. J., & Bosart, L. F. (1996). The complex relationship between forecast skill and forecast value: A real-world analysis. Weather and Forecasting, 11, 544–559.

  • Roulston, M. S., & Smith, L. A. (2002). Evaluating probabilistic forecasts using information theory. Monthly Weather Review, 130, 1653–1660.

  • Seaman, R., Mason, I., & Woodcock, F. (1996). Confidence intervals for some performance measures of Yes-No forecasts. Australian Meteorological Magazine, 45, 49–53.

  • Stephenson, D. B., Casati, B., & Wilson, C. (2004). Verification of rare extreme events. WMO Verification Workshop, Montreal, September 13–17.

  • Venugopal, V., Basu, S., & Foufoula-Georgiou, E. (2005). A new metric for comparing precipitation patterns with an application to ensemble forecasts. Journal of Geophysical Research, 110, D08111. doi:10.1029/2004JD005395.

  • Wilks, D. S. (1995). Statistical methods in the atmospheric sciences. San Diego, CA: Academic Press.

  • Wilks, D. S. (2001). A skill score based on economic value for probability forecasts. Meteorological Applications, 8, 209–219.

  • Wilson, L. J., Burrows, W. R., & Lanzinger, A. (1999). A strategy for verification of weather element forecasts from an ensemble prediction system. Monthly Weather Review, 127, 956–970.

Author information

Correspondence to Caren Marzban.

Copyright information

© 2009 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Marzban, C. (2009). Performance Measures and Uncertainty. In: Haupt, S.E., Pasini, A., Marzban, C. (eds) Artificial Intelligence Methods in the Environmental Sciences. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-9119-3_3
