Abstract
The Gaussian process is a common model in a wide variety of applications, such as environmental modeling, computer experiments, and geology. Two major challenges often arise. First, assuming that the process of interest is stationary over the entire domain often proves untenable. Second, the traditional Gaussian process formulation is computationally inefficient for large datasets, because evaluating the likelihood requires factorizing an n × n covariance matrix at O(n³) cost. In this paper, we propose a new Gaussian process model that addresses both problems by convolving a smoothing kernel with a partitioned latent process. Nonstationarity is modeled by allowing a separate latent process in each partition, which approximates a regional clustering structure. The partitioning follows a binary tree generating process similar to that of Classification and Regression Trees (CART). A Bayesian approach is used to estimate the partitioning structure and the model parameters simultaneously. Our motivating dataset consists of 11,918 precipitation anomalies. Results show that our model has promising predictive performance and is computationally efficient for large datasets.
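The core construction can be illustrated with a minimal NumPy sketch of a discrete process convolution over a tree-partitioned latent process. Every specific here is an assumption made for illustration, not the authors' implementation: a 2-D unit-square domain, a fixed hand-written tree with three leaves (the paper instead estimates the tree with a Bayesian, CART-like generating process), a regular grid of latent knots, a Gaussian smoothing kernel, and independent N(0, σ_r²) latent values within each partition r. The names `TREE`, `leaf_of`, `knot_grid`, and the `sigmas` values are hypothetical.

```python
import numpy as np

# Hypothetical binary tree: internal nodes are (dim, threshold, left, right);
# leaves are integer partition labels. Split on x at 0.5, then split the
# right half on y at 0.5, giving three partitions (0, 1, 2).
TREE = (0, 0.5, 0, (1, 0.5, 1, 2))


def leaf_of(point, node=TREE):
    """Walk the tree and return the partition label containing a 2-D point."""
    while not isinstance(node, int):
        dim, threshold, left, right = node
        node = left if point[dim] < threshold else right
    return node


def knot_grid(m_per_side):
    """Regular grid of latent knot locations in the unit square, shape (m, 2)."""
    g = (np.arange(m_per_side) + 0.5) / m_per_side
    gx, gy = np.meshgrid(g, g)
    return np.column_stack([gx.ravel(), gy.ravel()])


def gaussian_kernel(s, knots, bandwidth):
    """Smoothing-kernel matrix K[i, j] = k(s_i - u_j), shape (n, m)."""
    d2 = ((s[:, None, :] - knots[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / bandwidth**2)


def simulate(s, knots, sigmas, bandwidth, rng):
    """Draw z(s) = sum_j k(s - u_j) x_j, where x_j has its partition's scale."""
    leaves = np.array([leaf_of(u) for u in knots])
    # Separate latent process per partition: iid N(0, sigma_r^2) at its knots.
    x = rng.normal(size=len(knots)) * np.asarray(sigmas)[leaves]
    K = gaussian_kernel(s, knots, bandwidth)
    return K @ x, K, leaves


def implied_covariance(K, leaves, sigmas):
    """Cov(z) = K diag(sigma_leaf^2) K^T, whose rank is at most m."""
    var = np.asarray(sigmas)[leaves] ** 2
    return (K * var) @ K.T


rng = np.random.default_rng(0)
knots = knot_grid(6)                 # 36 latent knots
s = rng.uniform(size=(200, 2))       # 200 observation sites
sigmas = [0.5, 1.0, 3.0]             # hypothetical per-partition latent scales
z, K, leaves = simulate(s, knots, sigmas, bandwidth=0.15, rng=rng)
C = implied_covariance(K, leaves, sigmas)
print(z.shape, np.linalg.matrix_rank(C))  # rank is at most 36
```

Because the field depends only on the 36 knot values, the implied 200 × 200 covariance has rank at most 36, which illustrates why process convolutions reduce the cost of working with large n. The kernel still smooths across partition boundaries, while the latent scale changes from one partition to the next, producing nonstationary variance. Posterior estimation of the tree and parameters is not shown.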
Acknowledgements
This research was partially supported by National Science Foundation Grant DMS-0906720.
About this article
Cite this article
Liang, W.W.J., Lee, H.K.H. Bayesian nonstationary Gaussian process models via treed process convolutions. Adv Data Anal Classif 13, 797–818 (2019). https://doi.org/10.1007/s11634-018-0341-2
DOI: https://doi.org/10.1007/s11634-018-0341-2
Keywords
- Spatial statistics
- Stochastic modeling
- Classification and Regression Trees
- Reduced-rank approximation
- Heteroscedasticity