Abstract
The Gaussian process has been widely used in areas including geostatistics and uncertainty quantification because of its parsimonious yet flexible representation of a stochastic process. However, analyzing a large data set with a Gaussian process can be challenging because of its O(n^3) computational complexity, where n denotes the size of the data set. The recently proposed Nearest Neighbor Gaussian Process (NNGP) aims to approximate a Gaussian process with a target covariance function by a series of conditional distributions, each conditioned on a small set of nearest neighbors, and then exploits the resulting sparse precision matrix. We demonstrate that NNGP has the potential to be used for uncertainty quantification. We find that when NNGP is used to approximate a Gaussian process with strong smoothness, e.g., one with the squared-exponential covariance function, Bayesian inference must be carried out carefully, by marginalizing over the random effects in NNGP. Using simulated and real data, we investigate empirically how well NNGP approximates the squared-exponential covariance function, as well as its ability to handle the change-of-support effect, a common phenomenon in geostatistics and uncertainty quantification when only data aggregated over space are available.
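The nearest-neighbor idea described above can be illustrated with a small sketch. It assumes 1-D locations, a zero mean, a squared-exponential covariance of the hypothetical form sigma2 * exp(-phi * d^2) plus a small nugget, and neighbor sets made of the (at most) m closest previously ordered points. The helpers `cov`, `solve`, `nngp_loglik` and `exact_loglik` and the parameter values are illustrative assumptions, not the authors' implementation. Each factor p(y_i | y_N(i)) only needs an m-by-m solve, so the cost is roughly O(n m^3) rather than the O(n^3) of the dense Cholesky factorization used for comparison.

```python
import math


def cov(locs, i, j, sigma2=1.0, phi=2.0, nugget=1e-2):
    """Assumed squared-exponential covariance sigma2*exp(-phi*d^2); nugget added on the diagonal."""
    d = locs[i] - locs[j]
    c = sigma2 * math.exp(-phi * d * d)
    return c + nugget if i == j else c


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def nngp_loglik(y, locs, m):
    """Log-density under the factorization prod_i p(y_i | y_N(i)),
    where N(i) is the set of at most m earlier points closest to location i."""
    ll = 0.0
    for i in range(len(y)):
        # Neighbor set: up to m previously ordered points, nearest first.
        nbrs = sorted(range(i), key=lambda j: abs(locs[i] - locs[j]))[:m]
        c_in = [cov(locs, i, j) for j in nbrs]
        if nbrs:
            C_nn = [[cov(locs, a, b) for b in nbrs] for a in nbrs]
            w = solve(C_nn, c_in)  # kriging weights C_NN^{-1} C_Ni
            mean = sum(wk * y[j] for wk, j in zip(w, nbrs))
            var = cov(locs, i, i) - sum(wk * ck for wk, ck in zip(w, c_in))
        else:
            mean, var = 0.0, cov(locs, i, i)
        ll += -0.5 * (math.log(2 * math.pi * var) + (y[i] - mean) ** 2 / var)
    return ll


def exact_loglik(y, locs):
    """Exact zero-mean Gaussian log-density via a dense Cholesky factor (O(n^3))."""
    n = len(y)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov(locs, i, j) - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution z = L^{-1} y, so that y' C^{-1} y = z'z.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + logdet + sum(v * v for v in z))


if __name__ == "__main__":
    locs = [0.1 * k for k in range(8)]
    y = [math.sin(3 * s) for s in locs]
    print(exact_loglik(y, locs))
    print(nngp_loglik(y, locs, m=7))  # m = n - 1: the chain rule, so equal to the exact value
    print(nngp_loglik(y, locs, m=2))  # sparse nearest-neighbor approximation
```

With m = n - 1 the factorization is just the chain rule, so the two log-densities agree up to rounding; with small m the value is an approximation whose quality for a very smooth covariance such as the squared exponential depends on the neighbor sets and the nugget.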
References
Arendt, P. D., Apley, D. W., & Chen, W. (2012). Quantification of model uncertainty: Calibration, model discrepancy, and identifiability. Journal of Mechanical Design, 134, 100908-100908-12.
Banerjee, S., Carlin, B. P., & Gelfand, A. E. (2014). Hierarchical modeling and analysis for spatial data. Boca Raton: CRC Press.
Banerjee, S., Gelfand, A. E., Finley, A. O., & Sang, H. (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 825–848.
Berrocal, V. J., Gelfand, A. E., & Holland, D. M. (2010). A spatio-temporal downscaler for output from numerical models. Journal of Agricultural, Biological, and Environmental Statistics, 15, 176–197.
Bush, A., Gibson, R., & Thomas, T. (1975). The elastic contact of a rough surface. Wear, 35, 87–111.
Craig, P. S., Goldstein, M., Rougier, J. C., & Seheult, A. H. (2001). Bayesian forecasting for complex systems using computer simulators. Journal of the American Statistical Association, 96, 717–729.
Cressie, N. (1993). Statistics for spatial data, revised ed. New York: Wiley.
Cressie, N. (1996). Change of support and the modifiable areal unit problem. Geographical Systems, 3, 159–180.
Cressie, N., & Johannesson, G. (2008). Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 209–226.
Cressie, N., Shi, T., & Kang, E. L. (2010). Fixed rank filtering for spatio-temporal data. Journal of Computational and Graphical Statistics, 19, 724–745.
Crevillen-Garcia, D., Wilkinson, R. D., Shah, A. A., & Power, H. (2017). Gaussian process modelling for uncertainty quantification in convectively-enhanced dissolution processes in porous media. Advances in Water Resources, 99, 1–14.
Currin, C., Mitchell, T., Morris, M., & Ylvisaker, D. (1988). A Bayesian approach to the design and analysis of computer experiments. Technical Report, ORNL498, Oak Ridge National Laboratory.
Datta, A., Banerjee, S., Finley, A. O., & Gelfand, A. E. (2016). Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111, 800–812.
Emery, X. (2009). The kriging update equations and their application to the selection of neighboring data. Computational Geosciences, 13, 269–280.
Furrer, R., Genton, M. G., & Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15, 502–523.
Gneiting, T., Kleiber, W., & Schlather, M. (2010). Matérn cross-covariance functions for multivariate random fields. Journal of the American Statistical Association, 105, 1167–1177.
Goulard, M., & Voltz, M. (1992). Linear coregionalization model: Tools for estimation and choice of cross-variogram matrix. Mathematical Geology, 24, 269–286.
Gramacy, R. B., & Apley, D. W. (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24, 561–578.
Gramacy, R. B., & Lee, H. K. H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103, 1119–1130.
Greenwood, J. A., & Williamson, J. B. P. (1966). Contact of nominally flat surfaces. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences. The Royal Society (Vol. 295, pp. 300–319).
Guttorp, P., & Gneiting, T. (2006). Studies in the history of probability and statistics XLIX: On the Matérn correlation family. Biometrika, 93, 989–995.
Higdon, D., Nakhleh, C., Gattiker, J., & Williams, B. (2008). A Bayesian calibration approach to the thermal problem. Computer Methods in Applied Mechanics and Engineering, 197, 2431–2441.
Kaufman, C. G., & Shaby, B. A. (2013). The role of the range parameter for estimation and prediction in geostatistics. Biometrika, 100, 473–484.
Kennedy, M. C., & O’Hagan, A. (2000). Predicting the output from a complex computer code when fast approximations are available. Biometrika, 87, 1–13.
Kennedy, M. C., & O’Hagan, A. (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 425–464.
Konomi, B., Sang, H., & Mallick, B. (2014). Adaptive Bayesian nonstationary modeling for large spatial datasets using covariance approximations. Journal of Computational and Graphical Statistics, 23, 802–829.
Liu, F., Bayarri, M. J., & Berger, J. O. (2009). Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, 4, 119–150.
Nguyen, H., Cressie, N., & Braverman, A. (2012). Spatial statistical data fusion for remote sensing applications. Journal of the American Statistical Association, 107, 1004–1018.
Ohio Supercomputer Center (OSC). (1987). Columbus, OH: Ohio Supercomputer Center. http://osc.edu/ark:/19495/f5s1ph73
Peng, C. Y., & Wu, C. F. J. (2014). On the choice of nugget in kriging modeling for deterministic computer experiments. Journal of Computational and Graphical Statistics, 23, 151–168.
Perdikaris, P., Venturi, D., Royset, J. O., & Karniadakis, G. E. (2015). Multi-fidelity modelling via recursive co-kriging and Gaussian Markov random fields. Proceedings of the Royal Society of London A, 471, 20150018.
Qian, P. Z. G., Wu, H., & Wu, C. F. J. (2008). Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics, 50, 383–396.
Rue, H., & Held, L. (2005). Gaussian Markov random fields: Theory and applications. Boca Raton: Chapman and Hall.
Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4, 409–423.
Santner, T. J., Williams, B. J., & Notz, W. I. (2013). The design and analysis of computer experiments. New York: Springer Science & Business Media.
Sista, B., & Vemaganti, K. (2014). Estimation of statistical parameters of rough surfaces suitable for developing micro-asperity friction models. Wear, 316, 6–18.
Stein, M. L. (1999). Interpolation of spatial data: Some theory for kriging. New York: Springer.
Tworzydlo, W. W., Cecot, W., Oden, J. T., & Yew, C. H. (1998). Computational micro- and macroscopic models of contact and friction: Formulation, approach and applications. Wear, 220, 113–140.
Wackernagel, H. (2003). Multivariate geostatistics: An introduction with applications, 3rd ed. Berlin: Springer.
Zaytsev, V., Biver, P., Wackernagel, H., & Allard, D. (2016). Change-of-support models on irregular grids for geostatistical simulation. Mathematical Geosciences, 48, 353–369.
Zhou, Q., Qian, P. Z. G., & Zhou, S. (2011). A simple approach to emulation for computer models with qualitative and quantitative factors. Technometrics, 53, 266–273.
Acknowledgements
This work was supported in part by an allocation of computing time from the Ohio Supercomputer Center (OSC 1987). Shi’s research was supported by the Taft Research Center at the University of Cincinnati. Kang’s research was partially supported by the Simons Foundation’s Collaboration Award (#317298) and the Taft Research Center at the University of Cincinnati. Vemaganti’s work was partially supported by the University of Cincinnati Simulation Center.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Shi, H., Kang, E.L., Konomi, B.A., Vemaganti, K., Madireddy, S. (2017). Uncertainty Quantification Using the Nearest Neighbor Gaussian Process. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_6
DOI: https://doi.org/10.1007/978-3-319-69416-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69415-3
Online ISBN: 978-3-319-69416-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)