Uncertainty Quantification Using the Nearest Neighbor Gaussian Process

New Advances in Statistics and Data Science

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

Gaussian processes are widely used in areas such as geostatistics and uncertainty quantification because they offer a parsimonious yet flexible representation of a stochastic process. However, analyzing a large data set with a Gaussian process can be challenging because of its O(n^3) computational complexity, where n denotes the size of the data set. The recently proposed Nearest Neighbor Gaussian Process (NNGP) approximates a Gaussian process with a target covariance function by using a series of conditional distributions and then exploiting the resulting sparse precision matrices. We demonstrate that NNGP has the potential to be used for uncertainty quantification. We find that when NNGP is used to approximate a Gaussian process with strong smoothness, e.g., one with the squared-exponential covariance function, Bayesian inference needs to be carried out carefully, marginalizing over the random effects in NNGP. Using simulated and real data, we empirically investigate how well NNGP approximates the squared-exponential covariance function, as well as its ability to handle the change-of-support effect, a common phenomenon in geostatistics and uncertainty quantification that arises when only data aggregated over space are available.
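
The construction summarized in the abstract (a series of conditional distributions that yields a sparse precision matrix) can be illustrated with a minimal sketch. This is not the chapter's implementation: the function names `sq_exp_cov` and `nngp_precision`, the default length scale, the small nugget added for numerical stability, and the use of dense arrays are all assumptions made for readability. Each location is conditioned on at most m of its nearest previously ordered locations, so every per-location solve costs O(m^3) rather than O(n^3).

```python
import numpy as np

def sq_exp_cov(X, Y, variance=1.0, length_scale=0.3):
    """Squared-exponential covariance between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def nngp_precision(coords, m, cov_fn=sq_exp_cov, nugget=1e-2):
    """Nearest-neighbor approximation to the precision matrix of cov_fn + nugget * I.

    Locations are taken in the order given. Location i is conditioned on at
    most m nearest neighbors among locations 0..i-1. The nugget is an
    assumption of this sketch, added to keep the linear solves well conditioned.
    """
    n = coords.shape[0]
    A = np.zeros((n, n))  # strictly lower triangular, at most m nonzeros per row
    d = np.zeros(n)       # conditional variances
    d[0] = cov_fn(coords[:1], coords[:1])[0, 0] + nugget
    for i in range(1, n):
        # Indices of the (at most m) nearest previously ordered locations.
        dist = np.linalg.norm(coords[:i] - coords[i], axis=1)
        nb = np.argsort(dist)[:m]
        C_nn = cov_fn(coords[nb], coords[nb]) + nugget * np.eye(len(nb))
        c_in = cov_fn(coords[i:i + 1], coords[nb])[0]
        b = np.linalg.solve(C_nn, c_in)  # kriging weights on the neighbors
        A[i, nb] = b
        d[i] = cov_fn(coords[i:i + 1], coords[i:i + 1])[0, 0] + nugget - c_in @ b
    I_minus_A = np.eye(n) - A
    # Precision of the approximating process: (I - A)^T D^{-1} (I - A).
    return I_minus_A.T @ (I_minus_A / d[:, None])
```

When m is at least n - 1, every location conditions on all of its predecessors and the result equals the exact inverse of the target covariance matrix; smaller m trades accuracy for sparsity. A practical implementation would store A in a sparse format rather than as a dense array.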

References

  • Arendt, P. D., Apley, D. W., & Chen, W. (2012). Quantification of model uncertainty: Calibration, model discrepancy, and identifiability. Journal of Mechanical Design, 134, 100908-100908-12.

  • Banerjee, S., Carlin, B. P., & Gelfand, A. E. (2014). Hierarchical modeling and analysis for spatial data. Boca Raton: CRC Press.

  • Banerjee, S., Gelfand, A. E., Finley, A. O., & Sang, H. (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society B, 70, 825–848.

  • Berrocal, V. J., Gelfand, A. E., & Holland, D. M. (2010). A spatio-temporal downscaler for output from numerical models. Journal of Agricultural, Biological, and Environmental Statistics, 15, 176–197.

  • Bush, A., Gibson, R., & Thomas, T. (1975). The elastic contact of a rough surface. Wear, 35, 87–111.

  • Craig, P. S., Goldstein, M., Rougier, J. C., & Seheult, A. H. (2001). Bayesian forecasting for complex systems using computer simulators. Journal of the American Statistical Association, 96, 717–729.

  • Cressie, N. (1993). Statistics for spatial data, revised ed. New York: Wiley.

  • Cressie, N. (1996). Change of support and the modifiable areal unit problem. Geographical Systems, 3, 159–180.

  • Cressie, N., & Johannesson, G. (2008). Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 209–226.

  • Cressie, N., Shi, T., & Kang, E. K. (2010). Fixed rank filtering for spatio-temporal data. Journal of Computational and Graphical Statistics, 19, 724–745.

  • Crevillen-Garcia, D., Wilkinson, R. D., Shah, A. A., & Power, H. (2017). Gaussian process modelling for uncertainty quantification in convectively-enhanced dissolution processes in porous media. Advances in Water Resources, 99, 1–14.

  • Currin, C., Mitchell, T., Morris, M., & Ylvisaker, D. (1988). A Bayesian approach to the design and analysis of computer experiments. Technical Report, ORNL498, Oak Ridge Laboratory.

  • Datta, A., Banerjee, S., Finley, A. O., & Gelfand, A. E. (2016). Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111, 800–812.

  • Emery, X. (2009). The kriging update equations and their application to the selection of neighboring data. Computational Geosciences, 13, 269–280.

  • Furrer, R., Genton, M. G., & Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15, 502–523.

  • Gneiting, T., Kleiber, W., & Schlather, M. (2010). Matérn cross-covariance functions for multivariate random fields. Journal of the American Statistical Association, 105, 1167–1177.

  • Goulard, M., & Voltz, M. (1992). Linear coregionalization model: Tools for estimation and choice of cross-variogram matrix. Mathematical Geology, 24, 269–286.

  • Gramacy, R. B., & Apley, D. W. (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24, 561–578.

  • Gramacy, R. B., & Lee, H. K. H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103, 1119–1130.

  • Greenwood, J. A., & Williamson, J. B. P. (1966). Contact of nominally flat surfaces. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences. The Royal Society (Vol. 295, pp. 300–319).

  • Guttorp, P., & Gneiting, T. (2006). Studies in the history of probability and statistics XLIX: On the Matérn correlation family. Biometrika, 93, 989–995.

  • Higdon, D., Nakhleh, C., Gattiker, J., & Williams, B. (2008). A Bayesian calibration approach to the thermal problem. Computer Methods in Applied Mechanics and Engineering, 197, 2431–2441.

  • Kaufman, C. G., & Shaby, B. A. (2013). The role of the range parameter for estimation and prediction in geostatistics. Biometrika, 100, 473–484.

  • Kennedy, M. C., & O’Hagan, A. (2000). Predicting the output from a complex computer code when fast approximations are available. Biometrika, 87, 1–13.

  • Kennedy, M. C., & O’Hagan, A. (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 425–464.

  • Konomi, B., Sang, H., & Mallick, B. (2014). Adaptive Bayesian nonstationary modeling for large spatial datasets using covariance approximations. Journal of Computational and Graphical Statistics, 23, 802–829.

  • Liu, F., Bayarri, M. J., & Berger, J. O. (2009). Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, 4, 119–150.

  • Nguyen, H., Cressie, N., & Braverman, A. (2012). Spatial statistical data fusion for remote sensing applications. Journal of the American Statistical Association, 107, 1004–1018.

  • Ohio Supercomputer Center (OSC). (1987). Columbus OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/f5s1ph73

  • Peng, C. Y., & Wu, J. (2014). On the choice of nugget in kriging modeling for deterministic computer experiments. Journal of Computational and Graphical Statistics, 23, 151–168.

  • Perdikaris, P., Venturi, D., Royset, J. O., & Karniadakis, G. E. (2015). Multi-fidelity modelling via recursive co-kriging and Gaussian Markov random fields. Proceedings of the Royal Society of London A, 471, 20150018.

  • Qian, P. Z. G., Wu, H., & Wu, C. F. J. (2008). Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics, 50, 383–396.

  • Rue, H., & Held, L. (2005). Gaussian Markov random fields: Theory and applications. Boca Raton: Chapman and Hall.

  • Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4, 409–423.

  • Santner, T. J., Williams, B. J., & Notz, W. I. (2013). The design and analysis of computer experiments. New York: Springer Science & Business Media.

  • Sista, B., & Vemaganti, K. (2014). Estimation of statistical parameters of rough surfaces suitable for developing micro-asperity friction models. Wear, 316, 6–18.

  • Stein, M. L. (1999). Interpolation of spatial data: Some theory for kriging. New York: Springer.

  • Tworzydlo, W. W., Cecot, W., Oden, J. T., & Yew, C. H. (1998). Computational micro- and macroscopic models of contact and friction: Formulation, approach and applications. Wear, 220, 113–140.

  • Wackernagel, H. (2003). Multivariate geostatistics: An introduction with applications, 3rd ed. Berlin: Springer.

  • Zaytsev, V., Biver, P., Wackernagel, H., & Allard, D. (2016). Change-of-support models on irregular grids for geostatistical simulation. Mathematical Geosciences, 48, 353–369.

  • Zhou, Q., Qian, P. Z. G., & Zhou, S. (2011). A simple approach to emulation for computer models with qualitative and quantitative factors. Technometrics, 53, 266–273.

Acknowledgements

This work was supported in part by an allocation of computing time from the Ohio Supercomputer Center (OSC 1987). Shi’s research was supported by the Taft Research Center at the University of Cincinnati. Kang’s research was partially supported by the Simons Foundation’s Collaboration Award (#317298) and the Taft Research Center at the University of Cincinnati. Vemaganti’s work was partially supported by the University of Cincinnati Simulation Center.

Author information

Corresponding author

Correspondence to Emily L. Kang.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Shi, H., Kang, E.L., Konomi, B.A., Vemaganti, K., Madireddy, S. (2017). Uncertainty Quantification Using the Nearest Neighbor Gaussian Process. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_6
