Abstract
In many research fields, scientific questions are investigated by analyzing data collected over space and time, usually at fixed spatial locations and time steps, resulting in geo-referenced time series. In this context, it is of interest to identify potential partitions of the space and study their evolution over time. A finite space-time mixture model is proposed to identify level-based clusters in spatio-temporal data and to study their temporal evolution. We account for space-time dependence by introducing spatio-temporally varying mixing weights, so that observations at nearby locations and consecutive time points are allocated with similar cluster membership probabilities. As a result, a clustering that varies over time and space is obtained. Conditionally on cluster membership, a state-space model describes the temporal evolution of the sites belonging to each group. Full posterior inference is provided under a Bayesian framework through Markov chain Monte Carlo algorithms. A strategy for selecting a suitable number of clusters, based on the posterior temporal patterns of the clusters, is also offered. We evaluate our approach through simulation experiments and illustrate it using air quality data collected across Europe from 2001 to 2012, showing the benefit of borrowing strength across space and time.
Notes
For notational simplicity, the same symbols are re-used.
Acknowledgements
The authors thank Gianluca Mastrantonio and the air quality service at ARPAE Emilia-Romagna for helpful discussions. We also thank the anonymous reviewers for their comments, which have improved the paper.
Additional information
The research was partially funded by a FIRB2012 grant (project no. RBFR12URQJ) provided by the Italian Ministry of Education, Universities and Research.
Appendix
The full conditional distribution of the variances \(\lambda ^2_k\), for \(k=2,\dots ,K\), is
The full conditional distribution of the variances \(\tau ^2_k\), for \(k=1,\dots ,K\), is
The full conditional distribution of the error variance \(\sigma ^2\) is given by
The full conditional distribution of \(\rho _k\), \(k=2,\dots ,K\), is a univariate normal distribution \(\mathcal {N}(vd,v)\) restricted to the interval \(-1<\rho _k<1\), where
The full conditional distribution of \(g_k\), \(k=1,\dots , K\), is a univariate normal distribution \(\mathcal {N}(vd,v)\) (see Notes) truncated to the interval \(-1<g_k<1\), where
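Drawing from a normal distribution \(\mathcal {N}(vd,v)\) truncated to \((-1,1)\), as needed for \(\rho _k\) and \(g_k\), can be sketched with inverse-CDF sampling. The following is a minimal illustration and not necessarily the sampler used by the authors; the function name and the numerical values of `v` and `d` are hypothetical.

```python
import random
from statistics import NormalDist


def sample_truncated_normal(mean, var, lower=-1.0, upper=1.0, rng=random):
    """Draw from N(mean, var) restricted to (lower, upper) by inverse-CDF sampling."""
    dist = NormalDist(mu=mean, sigma=var ** 0.5)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    if p_hi <= p_lo:
        # Nearly all mass lies outside the interval; this simple method breaks down.
        raise ValueError("truncation interval has negligible probability mass")
    u = p_lo + (p_hi - p_lo) * rng.random()
    u = min(max(u, 1e-16), 1.0 - 1e-16)  # inv_cdf requires 0 < u < 1
    x = dist.inv_cdf(u)
    return min(max(x, lower), upper)  # guard against rounding at the edges


# Hypothetical values of v and d, chosen only for illustration.
v, d = 0.04, 20.0
rng = random.Random(0)
draws = [sample_truncated_normal(v * d, v, rng=rng) for _ in range(1000)]
```

Here the untruncated mean is \(vd=0.8\), so the upper bound removes part of the right tail and the draws average somewhat below 0.8. When the interval carries almost no probability mass, a rejection or tail-specific sampler would be a safer choice than this sketch.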
The full conditional distribution of the allocation variables \(w_t(\mathbf {s})\) is given by
where the posterior probabilities are
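As a hedged illustration of how an allocation variable could be updated, the sketch below assumes the usual finite-mixture form, in which the posterior probability of cluster \(k\) is proportional to the mixing weight times a normal density centred at the cluster level with variance \(\sigma ^2\). This form, the function names, and the example values are assumptions made for illustration only.

```python
import math
import random


def allocation_probabilities(y, weights, levels, sigma2):
    """Posterior membership probabilities for one observation y.

    ASSUMPTION: the probability of cluster k is proportional to
    weights[k] * Normal(y | levels[k], sigma2). All weights must be positive.
    """
    # The normalising constant of the normal density is common to all k and cancels.
    log_p = [
        math.log(w) - 0.5 * (y - z) ** 2 / sigma2
        for w, z in zip(weights, levels)
    ]
    m = max(log_p)  # shift before exponentiating for numerical stability
    unnorm = [math.exp(lp - m) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_allocation(probs, rng=random):
    """Draw a cluster label in 1..K from the categorical distribution probs."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


# Hypothetical inputs: one observation, K = 3 clusters.
probs = allocation_probabilities(y=1.9, weights=[0.5, 0.3, 0.2],
                                 levels=[0.0, 2.0, 5.0], sigma2=1.0)
label = sample_allocation(probs, rng=random.Random(1))
```

Working on the log scale avoids underflow when an observation lies far from every cluster level.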
The full conditional distribution of the latent states \(\mathbf {z}_{t}\) is a K-dimensional multivariate normal distribution \(\mathcal {N}_K(VD,V)\), where
- \(t=1\)
$$\begin{aligned}&V^{-1} = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {H}_t + \mathbf {G}^\prime \varSigma _\eta ^{-1} \mathbf {G} + 10^{-4}I_K \\&D = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {y}_t + \mathbf {G}^\prime \varSigma _\eta ^{-1} \mathbf {z}_{t+1} \end{aligned}$$
- \(t=2,\dots ,T-1\)
$$\begin{aligned}&V^{-1} = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {H}_t + \mathbf {G}^\prime \varSigma _\eta ^{-1} \mathbf {G} + \varSigma _\eta ^{-1} \\&D = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {y}_t + \mathbf {G}^\prime \varSigma _\eta ^{-1} \mathbf {z}_{t+1} + \varSigma _\eta ^{-1}\mathbf {G} \mathbf {z}_{t-1} \end{aligned}$$
- \(t=T\)
$$\begin{aligned}&V^{-1} = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {H}_t + \varSigma _\eta ^{-1} \\&D = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {y}_t + \varSigma _\eta ^{-1}\mathbf {G} \mathbf {z}_{t-1}. \end{aligned}$$
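The three cases for the latent states can be sketched in code under simplifying assumptions that the appendix does not state: \(\mathbf {H}_t\) is taken to be a 0/1 allocation matrix, and \(\mathbf {G}\) and \(\varSigma _\eta \) are taken to be diagonal with entries \(g_k\) and \(\tau ^2_k\). Under these assumptions \(V\) is diagonal, so each component of \(\mathbf {z}_t\) has its own univariate normal full conditional. Function names and example values are hypothetical.

```python
import random


def state_full_conditional(t, T, counts, sums, g, tau2, sigma2,
                           z_prev=None, z_next=None):
    """Return per-cluster (mean, variance) of the full conditional of z_t.

    ASSUMPTIONS: H_t' H_t = diag(counts) and H_t' y_t = sums (0/1 allocation
    matrix); G = diag(g); Sigma_eta = diag(tau2).
    """
    moments = []
    for k in range(len(g)):
        prec = counts[k] / sigma2            # (1/sigma^2) H_t' H_t
        d = sums[k] / sigma2                 # (1/sigma^2) H_t' y_t
        if t < T:                            # terms linking z_t to z_{t+1}
            prec += g[k] ** 2 / tau2[k]
            d += g[k] * z_next[k] / tau2[k]
        if t == 1:
            prec += 1e-4                     # the 10^{-4} I_K term
        else:                                # terms linking z_t to z_{t-1}
            prec += 1.0 / tau2[k]
            d += g[k] * z_prev[k] / tau2[k]
        v = 1.0 / prec
        moments.append((v * d, v))           # mean V D, variance V
    return moments


def sample_state(moments, rng=random):
    """Draw z_t component by component from the normal full conditionals."""
    return [rng.gauss(m, v ** 0.5) for m, v in moments]


# Hypothetical example with K = 2 clusters and T = 3 time points.
counts, sums = [3, 2], [3.0, 8.4]            # sites per cluster, sums of y_t per cluster
g, tau2, sigma2 = [0.9, 0.8], [0.5, 0.5], 1.0
z_prev, z_next = [1.0, 4.0], [1.1, 4.2]
moments = state_full_conditional(2, 3, counts, sums, g, tau2, sigma2, z_prev, z_next)
z_t = sample_state(moments, rng=random.Random(2))
```

With non-diagonal \(\mathbf {G}\) or \(\varSigma _\eta \), the same expressions would have to be assembled as full matrices and the draw made from a \(K\)-dimensional normal, for example through a Cholesky factor of \(V^{-1}\).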
Cite this article
Paci, L., Finazzi, F. Dynamic model-based clustering for spatio-temporal data. Stat Comput 28, 359–374 (2018). https://doi.org/10.1007/s11222-017-9735-9
DOI: https://doi.org/10.1007/s11222-017-9735-9