Emulation Engines: Choice and Quantification of Uncertainty for Complex Hydrological Models

  • Daniel W. Gladish
  • Daniel E. Pagendam
  • Luk J. M. Peeters
  • Petra M. Kuhnert
  • Jai Vaze


Complex, mechanistic hydrological models can be computationally expensive, have large numbers of input parameters, and generate multivariate output. Model emulators can be constructed to approximate these complex models with substantial computational savings, making activities such as sensitivity analysis, calibration and uncertainty analysis feasible. A successful emulator must make accurate and precise predictions of the model output. However, it is often unclear which type of emulation approach will be suitable. We present a comparison of reduced-rank, multivariate emulators built upon different ‘emulation engines’ and apply them to the Australian Water Resource Assessment System model. We examine first-order and second-order approaches, which focus on specifying the mean and the covariance, respectively. We also introduce a nonparametric approach for quantifying the uncertainty associated with the emulated prediction where this has bounded support. Our results demonstrate that emulation engines based on second-order approaches, such as Gaussian processes, can be computationally burdensome and may perform no better than computationally efficient first-order methods such as random forests. Supplementary materials accompanying this paper appear online.
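As a rough illustration of the reduced-rank idea described above, the following Python sketch projects multivariate simulator output onto a few leading principal components and fits a regression "engine" to the component scores. Everything in it is an assumption made for illustration: `toy_simulator` is a hypothetical stand-in for an expensive model (it is not AWRA), and a quadratic least-squares fit is swapped in as a simple first-order engine in place of the random forests or Gaussian processes compared in the paper.

```python
import numpy as np

def toy_simulator(theta, t):
    """Hypothetical cheap stand-in for an expensive model: one output curve per input."""
    a, b = theta
    return a * np.sin(2 * np.pi * t) + b**2 * t**2

def quadratic_features(X):
    """Design matrix with intercept, linear and pairwise quadratic terms."""
    n, p = X.shape
    cols = [np.ones(n)] + [X[:, j] for j in range(p)]
    cols += [X[:, j] * X[:, k] for j in range(p) for k in range(j, p)]
    return np.column_stack(cols)

def fit_emulator(X, Y, rank):
    """Reduce the output to `rank` principal components, then regress scores on inputs."""
    mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    basis = Vt[:rank]                   # (rank, n_outputs) orthonormal basis vectors
    scores = (Y - mean) @ basis.T       # (n_runs, rank) low-dimensional targets
    coef, *_ = np.linalg.lstsq(quadratic_features(X), scores, rcond=None)
    return {"mean": mean, "basis": basis, "coef": coef}

def predict_emulator(em, X_new):
    """Predict component scores, then map them back to the full output space."""
    scores = quadratic_features(X_new) @ em["coef"]
    return em["mean"] + scores @ em["basis"]

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)

# Training design: 200 simulator runs over a two-parameter input space.
X_train = rng.uniform(-1.0, 1.0, size=(200, 2))
Y_train = np.array([toy_simulator(x, t) for x in X_train])
emulator = fit_emulator(X_train, Y_train, rank=2)

# Held-out runs to check emulator accuracy.
X_test = rng.uniform(-1.0, 1.0, size=(20, 2))
Y_test = np.array([toy_simulator(x, t) for x in X_test])
Y_hat = predict_emulator(emulator, X_test)
rmse = float(np.sqrt(np.mean((Y_hat - Y_test) ** 2)))
print(f"held-out RMSE: {rmse:.2e}")
```

The toy output is deliberately low-rank, so two components reproduce it almost exactly; for a real model the retained rank and the choice of engine would have to be checked against held-out simulator runs.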


Surrogate model; Meta-model; Random forests; Gaussian processes; AWRA; Reduced-rank multivariate statistical emulator

Supplementary material

Supplementary material 1: 13253_2017_308_MOESM1_ESM.r (R, 8 KB)
Supplementary material 2: 13253_2017_308_MOESM2_ESM.pdf (PDF, 48 KB)



Copyright information

© International Biometric Society 2017

Authors and Affiliations

  • Daniel W. Gladish (1)
  • Daniel E. Pagendam (1)
  • Luk J. M. Peeters (2)
  • Petra M. Kuhnert (3)
  • Jai Vaze (4)

  1. CSIRO Data61, Brisbane, Australia
  2. CSIRO Land and Water, P.M.B. 2, Glen Osmond, Australia
  3. CSIRO Data61, Canberra, Australia
  4. CSIRO Land and Water, Canberra, Australia
