Dynamic model-based clustering for spatio-temporal data

Paci, Lucia; Finazzi, Francesco

doi:10.1007/s11222-017-9735-9

Published: 20 February 2017

Statistics and Computing, Volume 28, pages 359–374 (2018)

Abstract

In many research fields, scientific questions are investigated by analyzing data collected over space and time, usually at fixed spatial locations and time steps, resulting in geo-referenced time series. In this context, it is of interest to identify potential partitions of the space and study their evolution over time. A finite space-time mixture model is proposed to identify level-based clusters in spatio-temporal data and to study their evolution over the time frame. Space-time dependence is accounted for by introducing spatio-temporally varying mixing weights, which assign similar cluster membership probabilities to observations at nearby locations and consecutive time points. As a result, the clustering varies over both time and space. Conditionally on cluster membership, a state-space model describes the temporal evolution of the sites belonging to each group. Full posterior inference is carried out in a Bayesian framework through Markov chain Monte Carlo algorithms. A strategy for selecting a suitable number of clusters, based on the posterior temporal patterns of the clusters, is also offered. We evaluate the approach through simulation experiments and illustrate it using air quality data collected across Europe from 2001 to 2012, showing the benefit of borrowing strength across space and time.

Notes

For notational simplicity, the symbols $v$ and $d$ introduced for the full conditional of $\rho_k$ are reused for $g_k$.

Acknowledgements

The authors thank Gianluca Mastrantonio and the air quality service at ARPAE Emilia-Romagna for helpful discussions. We also thank the anonymous reviewers for their comments, which have improved the paper.

Author information

Authors and Affiliations

Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
Lucia Paci
Department of Management, Information and Production Engineering, University of Bergamo, Dalmine, Italy
Francesco Finazzi

Corresponding author

Correspondence to Lucia Paci.

Additional information

The research was partially funded by a FIRB2012 grant (project no. RBFR12URQJ) provided by the Italian Ministry of Education, Universities and Research.

Electronic supplementary material

Supplementary material 1 (pdf 1544 KB)

Appendix

The full conditional distribution of the variances $\lambda ^2_k$, for $k=2,\dots ,K$, is

$$\begin{aligned} \lambda ^2_k&\mid \text {rest} \sim \mathcal {IG} \Biggl (a+\frac{Tn}{2}, b+\frac{1}{2}\sum _{t=1}^T\left( \boldsymbol{\phi }_{t,k}-\rho _k \boldsymbol{\phi }_{t-1,k}\right) ^\prime \\&\times C(\theta )^{-1} \left( \boldsymbol{\phi }_{t,k}-\rho _k \boldsymbol{\phi }_{t-1,k}\right) \Biggr ). \end{aligned}$$
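A minimal NumPy sketch of this Gibbs step for one cluster $k$ follows. It assumes that the spatial fields $\boldsymbol{\phi}_{0,k},\dots,\boldsymbol{\phi}_{T,k}$ are stored as the rows of a $(T+1)\times n$ array and that $C(\theta)^{-1}$ has already been computed; the function names are hypothetical, and this illustrates the formula above rather than reproducing the authors' code. The draw uses the fact that if $X \sim \text{Gamma}(\alpha, \text{rate}=\beta)$ then $1/X \sim \mathcal{IG}(\alpha,\beta)$.

```python
import numpy as np


def lambda2_posterior_params(phi, rho, C_inv, a, b):
    """Shape and rate of the inverse-gamma full conditional of lambda_k^2.

    phi   : (T+1, n) array whose rows are phi_{0,k}, ..., phi_{T,k} (assumed layout)
    rho   : autoregressive coefficient rho_k
    C_inv : (n, n) inverse spatial correlation matrix C(theta)^{-1}
    a, b  : inverse-gamma prior hyperparameters
    """
    T, n = phi.shape[0] - 1, phi.shape[1]
    resid = phi[1:] - rho * phi[:-1]  # phi_{t,k} - rho_k phi_{t-1,k}, shape (T, n)
    # Sum over t of resid_t' C^{-1} resid_t
    quad = np.einsum("ti,ij,tj->", resid, C_inv, resid)
    return a + T * n / 2.0, b + 0.5 * quad


def sample_lambda2(phi, rho, C_inv, a, b, rng, size=None):
    shape, rate = lambda2_posterior_params(phi, rho, C_inv, a, b)
    # 1 / Gamma(shape, scale=1/rate) is IG(shape, rate)
    return 1.0 / rng.gamma(shape, 1.0 / rate, size=size)
```

Computing the shape and rate separately from the draw keeps the algebra easy to check against the formula.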

The full conditional distribution of the variances $\tau ^2_k$, for $k=1,\dots ,K$, is

$$\begin{aligned} \tau ^2_k \mid \text {rest} \sim \mathcal {IG} \left( a+\frac{T}{2}, b+ \frac{1}{2} \sum _{t=1}^T{\left( z_{t,k}-g_k z_{t-1,k}\right) ^2} \right) . \end{aligned}$$
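The $\tau^2_k$ step has the same inverse-gamma form, with the scalar innovations $z_{t,k}-g_k z_{t-1,k}$ of the cluster-level autoregressive process in place of the spatial quadratic form. A sketch, assuming $z_{0,k},\dots,z_{T,k}$ are stored in a 1-D array (function names hypothetical):

```python
import numpy as np


def tau2_posterior_params(z, g, a, b):
    """Shape and rate of the inverse-gamma full conditional of tau_k^2.

    z    : 1-D array z_{0,k}, ..., z_{T,k} for one cluster (assumed layout)
    g    : autoregressive coefficient g_k
    a, b : inverse-gamma prior hyperparameters
    """
    T = len(z) - 1
    resid = z[1:] - g * z[:-1]  # z_{t,k} - g_k z_{t-1,k}, t = 1..T
    return a + T / 2.0, b + 0.5 * np.sum(resid ** 2)


def sample_tau2(z, g, a, b, rng, size=None):
    shape, rate = tau2_posterior_params(z, g, a, b)
    # 1 / Gamma(shape, scale=1/rate) is IG(shape, rate)
    return 1.0 / rng.gamma(shape, 1.0 / rate, size=size)
```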

The full conditional distribution of the error variance $\sigma ^2$ is given by

$$\begin{aligned} \sigma ^2 \mid \text {rest}\! \sim \! \mathcal {IG}\left( a\!+\!\frac{Tn}{2}, b\!+\!\frac{1}{2} \sum _{t=1}^T{\left( \mathbf {y}_t\!-\!\mathbf {H}_t\mathbf {z}_t\right) ^\prime \left( \mathbf {y}_t\!-\!\mathbf {H}_t\mathbf {z}_t\right) } \right) . \end{aligned}$$
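Here $\mathbf{H}_t$ maps the $K$ cluster states to the $n$ sites. The sketch below assumes it is the $n\times K$ 0/1 matrix with a single 1 in row $i$, in the column of the cluster allocated to site $\mathbf{s}_i$; this reading is consistent with the allocation step of this appendix but is an assumption, not a definition quoted from the paper. Cluster labels are 0-based in the code, and all names are hypothetical.

```python
import numpy as np


def allocation_matrix(labels, K):
    """Hypothetical helper: n x K matrix with H[i, k] = 1 iff site i is in cluster k.

    labels : length-n integer array of 0-based cluster labels
             (cluster k in the text corresponds to label k - 1 here)
    """
    labels = np.asarray(labels)
    H = np.zeros((labels.size, K))
    H[np.arange(labels.size), labels] = 1.0
    return H


def sigma2_posterior_params(Y, Z, W, a, b):
    """Y: (T, n) observations y_1..y_T; Z: (T, K) states z_1..z_T;
    W: (T, n) 0-based allocations; a, b: inverse-gamma prior hyperparameters."""
    T, n = Y.shape
    K = Z.shape[1]
    ss = 0.0
    for t in range(T):
        r = Y[t] - allocation_matrix(W[t], K) @ Z[t]  # y_t - H_t z_t
        ss += r @ r
    return a + T * n / 2.0, b + 0.5 * ss


def sample_sigma2(Y, Z, W, a, b, rng):
    shape, rate = sigma2_posterior_params(Y, Z, W, a, b)
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```

With this choice of $\mathbf{H}_t$, the product $\mathbf{H}_t\mathbf{z}_t$ simply copies each site's cluster level, so `Z[t][W[t]]` would give the same vector without forming the matrix.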

The full conditional distribution of $\rho _k$, $k=2,\dots ,K$, is a univariate normal distribution $\mathcal {N}(vd,v)$ truncated to the interval $(-1,1)$, where

$$\begin{aligned}&v^{-1} = \frac{1}{\lambda _k^2}\sum _{t=1}^T\boldsymbol{\phi }_{t-1,k}^\prime C(\theta )^{-1} \boldsymbol{\phi }_{t-1,k} + 10^{-4}\\&d = \frac{1}{\lambda _k^2}\sum _{t=1}^T\boldsymbol{\phi }_{t-1,k}^\prime C(\theta )^{-1} \boldsymbol{\phi }_{t,k} . \end{aligned}$$
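A sketch of the $\rho_k$ update follows. It sums the quadratic forms over $t=1,\dots,T$, because every transition $\boldsymbol{\phi}_{t-1,k}\to\boldsymbol{\phi}_{t,k}$ enters the likelihood of $\rho_k$ (as in the $\lambda^2_k$ step). The truncated normal is sampled by inverting the CDF with Python's `statistics.NormalDist` (Python 3.8+); this is one standard choice, not necessarily the authors', and it can fail numerically when $(-1,1)$ lies far in a tail of $\mathcal{N}(vd,v)$. The array layout and function names are assumptions.

```python
import numpy as np
from statistics import NormalDist


def rho_posterior_params(phi, C_inv, lam2, prior_prec=1e-4):
    """Mean vd and variance v of the normal full conditional of rho_k (before truncation).

    phi        : (T+1, n) array with rows phi_{0,k}, ..., phi_{T,k} (assumed layout)
    C_inv      : (n, n) matrix C(theta)^{-1}
    lam2       : lambda_k^2
    prior_prec : prior precision of rho_k (the 10^{-4} term)
    """
    prev, curr = phi[:-1], phi[1:]
    v_inv = np.einsum("ti,ij,tj->", prev, C_inv, prev) / lam2 + prior_prec
    d = np.einsum("ti,ij,tj->", prev, C_inv, curr) / lam2
    v = 1.0 / v_inv
    return v * d, v


def sample_truncated_normal(mean, var, rng, lo=-1.0, hi=1.0):
    """Inverse-CDF draw from N(mean, var) restricted to (lo, hi)."""
    dist = NormalDist(float(mean), float(np.sqrt(var)))
    u = rng.uniform(dist.cdf(lo), dist.cdf(hi))
    # inv_cdf raises StatisticsError if u is exactly 0 or 1 (extreme tails)
    return dist.inv_cdf(u)
```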

The full conditional distribution of $g_k$, $k=1,\dots , K$, is a univariate normal distribution $\mathcal {N}(vd,v)$ (see Notes) truncated to the interval $(-1,1)$, where

$$\begin{aligned}&v^{-1} = \frac{1}{\tau ^2_k}\sum _{t=1}^T{z_{t-1,k}^2} + 10^{-4}\\&d = \frac{1}{\tau ^2_k} \sum _{t=1}^T{z_{t-1,k}z_{t,k}} . \end{aligned}$$
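The $g_k$ update has the same structure with scalar states. As an alternative to inverse-CDF sampling, the sketch below uses plain rejection: draw from $\mathcal{N}(vd,v)$ until the value falls in $(-1,1)$. This is exact and simple, but efficient only when $\mathcal{N}(vd,v)$ puts non-negligible mass on $(-1,1)$, hence the `max_tries` guard. The names and array layout are hypothetical.

```python
import numpy as np


def g_posterior_params(z, tau2, prior_prec=1e-4):
    """Mean vd and variance v of the normal full conditional of g_k (before truncation).

    z          : 1-D array z_{0,k}, ..., z_{T,k} (assumed layout)
    tau2       : tau_k^2
    prior_prec : prior precision of g_k (the 10^{-4} term)
    """
    prev, curr = z[:-1], z[1:]
    v = 1.0 / (np.sum(prev ** 2) / tau2 + prior_prec)
    d = np.sum(prev * curr) / tau2
    return v * d, v


def sample_g(z, tau2, rng, max_tries=10_000):
    """Rejection sampler for N(vd, v) truncated to (-1, 1)."""
    mean, var = g_posterior_params(z, tau2)
    sd = np.sqrt(var)
    for _ in range(max_tries):
        draw = rng.normal(mean, sd)
        if -1.0 < draw < 1.0:
            return draw
    raise RuntimeError("too little posterior mass in (-1, 1) for rejection sampling")
```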

The full conditional distribution of the allocation variables $w_t(\mathbf {s})$ is given by

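$$\begin{aligned} w_t(\mathbf {s}) \mid \text {rest} \sim \text {Multinomial}\left( 1;\ \pi ^\star _{t,1}(\mathbf {s}), \dots , \pi ^\star _{t,K}(\mathbf {s}) \right) \end{aligned}$$

where the posterior probabilities are

$$\begin{aligned} \pi ^\star _{t,k}(\mathbf {s})=\frac{\pi _{t,k}(\mathbf {s})\, \mathcal {N}\left( y_t(\mathbf {s});\, z_{t,k}, \sigma ^2\right) }{\sum _{l=1}^K{\pi _{t,l}(\mathbf {s})\, \mathcal {N}\left( y_t(\mathbf {s});\, z_{t,l}, \sigma ^2\right) }} , \end{aligned}$$

with $\mathcal {N}(y;\mu ,\sigma ^2)$ denoting the normal density with mean $\mu$ and variance $\sigma ^2$ evaluated at $y$.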
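In practice these probabilities are best computed on the log scale. Because $\sigma^2$ is shared by all clusters, the normalizing constant of the normal density, evaluated at the observation $y_t(\mathbf{s})$, cancels in the ratio, so only the squared distance $(y_t(\mathbf{s})-z_{t,k})^2$ matters. A minimal sketch for a single site and time, with hypothetical names and 0-based cluster labels:

```python
import numpy as np


def allocation_probs(y_s, pi_s, z_t, sigma2):
    """Posterior allocation probabilities for one site s at time t.

    y_s    : observation y_t(s)
    pi_s   : (K,) mixing weights pi_{t,k}(s)
    z_t    : (K,) cluster levels z_{t,k}
    sigma2 : error variance sigma^2
    """
    # Normal log-density up to a constant common to all k
    log_lik = -0.5 * (y_s - z_t) ** 2 / sigma2
    log_w = np.log(pi_s) + log_lik
    log_w -= log_w.max()  # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


def sample_allocation(y_s, pi_s, z_t, sigma2, rng):
    """Single-trial multinomial (categorical) draw of the 0-based cluster label."""
    probs = allocation_probs(y_s, pi_s, z_t, sigma2)
    return rng.choice(len(probs), p=probs)
```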

The full conditional distribution of the latent states $\mathbf {z}_{t}$ is a $K$-dimensional multivariate normal distribution $\mathcal {N}_K(VD,V)$, where

for $t=1$:
$$\begin{aligned}&V^{-1} = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {H}_t + \mathbf {G}^\prime \varSigma _\eta ^{-1} \mathbf {G} + 10^{-4}I_K \\&D = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {y}_t + \mathbf {G}^\prime \varSigma _\eta ^{-1} \mathbf {z}_{t+1} \end{aligned}$$
for $t=2,\dots ,T-1$:
$$\begin{aligned}&V^{-1} = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {H}_t + \mathbf {G}^\prime \varSigma _\eta ^{-1} \mathbf {G} + \varSigma _\eta ^{-1} \\&D = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {y}_t + \mathbf {G}^\prime \varSigma _\eta ^{-1} \mathbf {z}_{t+1} + \varSigma _\eta ^{-1}\mathbf {G} \mathbf {z}_{t-1} \end{aligned}$$
for $t=T$:
$$\begin{aligned}&V^{-1} = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {H}_t + \varSigma _\eta ^{-1} \\&D = \frac{1}{\sigma ^2}\mathbf {H}_t^\prime \mathbf {y}_t + \varSigma _\eta ^{-1}\mathbf {G} \mathbf {z}_{t-1}. \end{aligned}$$
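The three cases differ only in which neighbouring states contribute to the precision, so one function can cover them all. The sketch below assumes cluster-level dynamics $\mathbf{z}_t=\mathbf{G}\mathbf{z}_{t-1}+\boldsymbol{\eta}_t$ with $\mathbf{G}=\mathrm{diag}(g_1,\dots,g_K)$ and $\boldsymbol{\eta}_t\sim\mathcal{N}_K(\mathbf{0},\varSigma_\eta)$, $\varSigma_\eta=\mathrm{diag}(\tau^2_1,\dots,\tau^2_K)$, which is what the $g_k$ and $\tau^2_k$ updates suggest; the storage layout and function names are hypothetical.

```python
import numpy as np


def z_full_conditional(t, Y, H, Z, G, Sigma_eta_inv, sigma2, prior_prec=1e-4):
    """Mean VD and covariance V of the full conditional of z_t, for t in 1..T.

    Y : (T, n) with Y[t-1] = y_t      H : (T, n, K) with H[t-1] = H_t
    Z : (T, K) with Z[t-1] = z_t      G, Sigma_eta_inv : (K, K)
    """
    T, K = Z.shape
    H_t, y_t = H[t - 1], Y[t - 1]
    V_inv = H_t.T @ H_t / sigma2
    D = H_t.T @ y_t / sigma2
    if t < T:  # z_{t+1} = G z_t + eta_{t+1} involves z_t
        V_inv = V_inv + G.T @ Sigma_eta_inv @ G
        D = D + G.T @ Sigma_eta_inv @ Z[t]  # Z[t] holds z_{t+1}
    if t == 1:  # the 10^{-4} I_K term of the t = 1 case
        V_inv = V_inv + prior_prec * np.eye(K)
    else:  # z_t = G z_{t-1} + eta_t
        V_inv = V_inv + Sigma_eta_inv
        D = D + Sigma_eta_inv @ G @ Z[t - 2]  # Z[t-2] holds z_{t-1}
    V = np.linalg.inv(V_inv)
    return V @ D, V


def sample_z(t, Y, H, Z, G, Sigma_eta_inv, sigma2, rng):
    mean, V = z_full_conditional(t, Y, H, Z, G, Sigma_eta_inv, sigma2)
    L = np.linalg.cholesky(V)  # V = L L'
    return mean + L @ rng.standard_normal(mean.size)
```

Sweeping `t` from 1 to $T$ and overwriting `Z[t-1]` after each draw gives a single-site Gibbs pass over the states; a forward-filtering backward-sampling scheme would be a natural alternative for blocking all $T$ states together.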

About this article

Cite this article

Paci, L., Finazzi, F. Dynamic model-based clustering for spatio-temporal data. Stat Comput 28, 359–374 (2018). https://doi.org/10.1007/s11222-017-9735-9

Received: 10 November 2016
Accepted: 07 February 2017
Published: 20 February 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s11222-017-9735-9