Abstract
Interval time series (ITS) arise when real-valued intervals of a variable of interest are recorded as an ordered sequence over time. We address the problem of clustering ITS and propose several approaches. First, clustering is performed based on point-to-point comparisons. Alternatively, time-domain and wavelet features serve as clustering variables. Furthermore, autocorrelation matrix functions, which gather the autocorrelation and cross-correlation functions of the ITS upper and lower bounds, may be compared using adequate distances (e.g. the Frobenius distance) and used for clustering ITS. An improved procedure to determine the autocorrelation function of ITS is proposed, which also serves as a basis for clustering. The alternative approaches are explored and their performances compared on ITS simulated under different setups. An application to daily sea level ranges, observed at different locations in Australia, illustrates the proposed methods.
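As a rough illustration of the autocorrelation-matrix idea described above, the following minimal sketch computes, for each lag, a 2x2 matrix of sample auto- and cross-correlations between the lower and upper bounds of an ITS, and compares two ITS through the Frobenius distance between their stacked matrices. It uses the ordinary sample cross-correlation estimator, not the improved procedure proposed in the paper; the function names (`acf_matrices`, `frobenius_distance`, `simulate_its`) and the AR(1) simulation setup are assumptions made for this example only.

```python
import numpy as np


def acf_matrices(lower, upper, max_lag):
    """Stack of 2x2 sample correlation matrices for lags 1..max_lag.

    Entry (i, j) of the lag-k matrix is the sample correlation between
    bound i at time t and bound j at time t + k (0 = lower, 1 = upper).
    This is the ordinary estimator, used here as an assumption.
    """
    x = np.column_stack([lower, upper]).astype(float)
    x = x - x.mean(axis=0)              # centre each bound series
    n = x.shape[0]
    gamma0 = x.T @ x / n                # lag-0 covariance matrix
    sd = np.sqrt(np.diag(gamma0))
    scale = np.outer(sd, sd)            # product of standard deviations
    mats = []
    for k in range(1, max_lag + 1):
        gamma_k = x[:-k].T @ x[k:] / n  # lag-k cross-covariance matrix
        mats.append(gamma_k / scale)
    return np.array(mats)


def frobenius_distance(a, b):
    """Frobenius distance between two stacks of correlation matrices."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def simulate_its(phi, n, rng):
    """Hypothetical ITS: AR(1) centre plus a random positive half-range."""
    centre = np.zeros(n)
    for t in range(1, n):
        centre[t] = phi * centre[t - 1] + rng.normal()
    half_range = rng.uniform(0.5, 1.5, size=n)
    return centre - half_range, centre + half_range


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two series with positive dependence, two with negative dependence.
    series = [simulate_its(phi, 300, rng) for phi in (0.8, 0.8, -0.8, -0.8)]
    feats = [acf_matrices(lo, up, max_lag=5) for lo, up in series]
    # Pairwise dissimilarity matrix, usable by any distance-based clustering.
    dist = np.array([[frobenius_distance(a, b) for b in feats] for a in feats])
    print(np.round(dist, 2))
```

The resulting dissimilarity matrix can be passed to any distance-based clustering method, for instance hierarchical clustering; which method and which number of lags are appropriate is a modelling choice not settled by this sketch.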
Keywords
Interval autocorrelation; Interval data; Interval time series; Time series clustering
Acknowledgements
The work of P. Teles and P. Brito is financed by the ERDF (European Regional Development Fund) through the Operational Programme for Competitiveness and Internationalisation (COMPETE 2020 Programme) within project “POCI-01-0145-FEDER-006961”, and by National Funds through the FCT, Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology), as part of project UID/EEA/50014/2013. We thank the associate editor and reviewers for their helpful comments and suggestions.