Spatio-temporal additive regression model selection for urban water demand

  • Hunter R. Merrill
  • Xueying Tang
  • Nikolay Bliznyuk (corresponding author)
Original Paper


Understanding the factors influencing urban water use is critical for meeting demand and conserving resources. To analyze the relationships between urban household-level water demand and potential drivers, we develop a method for Bayesian variable selection in partially linear additive regression models, particularly suited for high-dimensional spatio-temporally dependent data. Our approach combines a spike-and-slab prior distribution with a modified version of the Bayesian group lasso to simultaneously perform selection of null, linear, and nonlinear models and to penalize regression splines to prevent overfitting. We investigate the effectiveness of the proposed method through a simulation study and provide comparisons with existing methods. We illustrate the methodology on a case study to estimate and quantify uncertainty of the associations between several environmental and demographic predictors and spatio-temporally varying household-level urban water demand in Tampa, FL.
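As a rough illustration of the null / linear / nonlinear selection idea described in the abstract, the following toy sketch draws one predictor's coefficients from a generic spike-and-slab prior. It is not the authors' model: the function name `draw_effect`, the inclusion probabilities, the Gaussian slab, and the independence of the two inclusion indicators are all assumptions made for illustration, and the group lasso penalty and the spatio-temporal dependence are omitted entirely.

```python
import random


def draw_effect(rng, n_basis=4, p_linear=0.5, p_nonlinear=0.5, slab_sd=1.0):
    """Draw one predictor's coefficients from a toy spike-and-slab prior.

    Hypothetical helper for illustration only. Returns (label, linear, spline),
    where label is "null", "linear", or "nonlinear".
    """
    # Spike: the coefficient is exactly zero. Slab: the coefficient is Gaussian.
    linear_in = rng.random() < p_linear
    spline_in = rng.random() < p_nonlinear

    linear = rng.gauss(0.0, slab_sd) if linear_in else 0.0
    # The spline coefficients enter or leave the model together, as one group.
    spline = [rng.gauss(0.0, slab_sd) if spline_in else 0.0 for _ in range(n_basis)]

    if spline_in:
        label = "nonlinear"
    elif linear_in:
        label = "linear"
    else:
        label = "null"
    return label, linear, spline


# Tally how often each model type is drawn under the assumed prior.
rng = random.Random(1)
draws = [draw_effect(rng) for _ in range(1000)]
counts = {k: sum(d[0] == k for d in draws) for k in ("null", "linear", "nonlinear")}
print(counts)
```

In a full Bayesian analysis the inclusion indicators would be updated from data, and their posterior frequencies would indicate whether a predictor is best treated as absent, linear, or nonlinear.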


Keywords: Bayesian group lasso · Geoadditive model · High dimensional data · Sparsity





Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Agricultural and Biological Engineering, University of Florida, Gainesville, USA
  2. Department of Statistics, Columbia University, New York, USA
  3. Departments of Agricultural and Biological Engineering, Biostatistics and Statistics, University of Florida, Gainesville, USA
