Relative model score: a scoring rule for evaluating ensemble simulations with application to microbial soil respiration modeling
This paper defines a new scoring rule, the relative model score (RMS), for evaluating ensemble simulations of environmental models. RMS implicitly incorporates measures of ensemble mean accuracy, prediction interval precision, and prediction interval reliability to evaluate overall model predictive performance. RMS is evaluated numerically from the probability density functions of ensemble simulations given by individual models or by several models via model averaging. We demonstrate the advantages of RMS through an example of soil respiration modeling. The example considers two alternative models of different fidelity, and for each model Bayesian inverse modeling is conducted using two different likelihood functions, which gives four single-model ensembles of model simulations. For each likelihood function, Bayesian model averaging is applied to the ensemble simulations of the two models, resulting in two multi-model prediction ensembles. The predictive performance of these ensembles is evaluated using various scoring rules. Results show that RMS outperforms the commonly used scoring rules of log-score, pseudo Bayes factor based on Bayesian model evidence (BME), and continuous ranked probability score (CRPS). RMS avoids the rounding-error problem specific to log-score. Because it is applicable to any likelihood function, RMS has broader applicability than BME, which can be used to compare multiple models only when they share the same likelihood function. By directly considering the relative score of candidate models at each cross-validation datum, RMS yields a more plausible model ranking than CRPS. RMS is therefore considered a robust scoring rule for evaluating the predictive performance of single-model and multi-model prediction ensembles.
Keywords: Scoring rule; Continuous ranked probability score; Bayes factor; Log-score; Dispersion; Reliability
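The abstract compares RMS against the continuous ranked probability score (CRPS). For readers unfamiliar with that baseline, the following is a minimal sketch of the standard empirical CRPS estimator for a single observation and a finite ensemble, CRPS ≈ mean|x_i − y| − ½ · mean|x_i − x_j|. It is not the paper's implementation and does not compute RMS, whose definition is not given in this summary. The function name `ensemble_crps` is hypothetical.

```python
def ensemble_crps(ensemble, observation):
    """Empirical CRPS of an ensemble against one observation (lower is better).

    Illustrative sketch only: this is the textbook ensemble estimator,
    not necessarily the computation used in the paper.
    """
    members = [float(v) for v in ensemble]
    n = len(members)
    if n == 0:
        raise ValueError("ensemble must contain at least one member")

    # Accuracy term: mean absolute error of the members against the observation.
    accuracy = sum(abs(m - observation) for m in members) / n

    # Spread term: half the mean absolute difference over all member pairs.
    spread = sum(abs(a - b) for a in members for b in members) / (2 * n * n)

    return accuracy - spread


if __name__ == "__main__":
    # A three-member ensemble centred on the observation.
    print(ensemble_crps([1.0, 2.0, 3.0], 2.0))
    # A single-member ensemble reduces to the absolute error.
    print(ensemble_crps([5.0], 3.0))
```

For a one-member ensemble the spread term vanishes and the score reduces to the absolute error, which is why CRPS is often described as a generalization of mean absolute error to probabilistic predictions.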
This work was supported by the Department of Energy Early Career Award DE-SC0008272 and NSF-EAR Grant 1552329.