Clustering of interval time series

  • Elizabeth Ann Maharaj
  • Paulo Teles
  • Paula Brito

Abstract

Interval time series occur when real intervals of some variable of interest are registered as an ordered sequence along time. We address the problem of clustering interval time series (ITS), for which different approaches are proposed. First, clustering is performed based on point-to-point comparisons. Time-domain and wavelet features also serve as clustering variables in alternative approaches. Furthermore, autocorrelation matrix functions, gathering the autocorrelation and cross-correlation functions of the ITS upper and lower bounds, may be compared using adequate distances (e.g. the Frobenius distance) and used for clustering ITS. An improved procedure to determine the autocorrelation function of ITS is proposed, which also serves as a basis for clustering. The different alternative approaches are explored and their performances compared for ITS simulated under different setups. An application to sea level daily ranges, observed at different locations in Australia, illustrates the proposed methods.
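To make the autocorrelation-based approach concrete, the following is a minimal sketch, not the authors' implementation. It treats each ITS as a bivariate series of lower and upper bounds, computes 2×2 sample auto- and cross-correlation matrices at lags 0 to `max_lag`, and compares two ITS by the Frobenius norm of the stacked differences. The function names, the AR(1) simulation helper, the choice of `max_lag`, and the aggregation across lags are all assumptions made for illustration; the improved interval autocorrelation procedure mentioned in the abstract is not reproduced here.

```python
import numpy as np


def lagged_corr_matrices(lower, upper, max_lag):
    """Sample correlation matrices of the (lower, upper) bounds at lags 0..max_lag.

    Returns an array of shape (max_lag + 1, 2, 2). Entry [k, i, j] is the sample
    correlation between component i at time t and component j at time t + k.
    """
    x = np.column_stack([lower, upper]).astype(float)
    n = x.shape[0]
    x = x - x.mean(axis=0)
    scale = np.sqrt((x ** 2).sum(axis=0))  # per-component normalising constant
    out = np.empty((max_lag + 1, 2, 2))
    for k in range(max_lag + 1):
        cross = x[: n - k].T @ x[k:]  # lag-k cross-products, shape (2, 2)
        out[k] = cross / np.outer(scale, scale)
    return out


def acf_matrix_distance(its_a, its_b, max_lag=5):
    """Frobenius-type distance between the lagged correlation matrices of two ITS.

    Assumption: squared Frobenius norms are summed over lags before the root.
    """
    ra = lagged_corr_matrices(*its_a, max_lag)
    rb = lagged_corr_matrices(*its_b, max_lag)
    return float(np.sqrt(((ra - rb) ** 2).sum()))


def simulate_its(phi, n, rng):
    """Hypothetical generator: AR(1) centre plus a random positive half-range."""
    noise = rng.normal(size=n)
    centre = np.zeros(n)
    for t in range(1, n):
        centre[t] = phi * centre[t - 1] + noise[t]
    half_range = rng.uniform(0.1, 0.5, size=n)
    return centre - half_range, centre + half_range


rng = np.random.default_rng(0)
a = simulate_its(0.8, 500, rng)   # same dynamics as b
b = simulate_its(0.8, 500, rng)
c = simulate_its(-0.8, 500, rng)  # different dynamics

print("d(a, b) =", acf_matrix_distance(a, b))
print("d(a, c) =", acf_matrix_distance(a, c))
```

A pairwise matrix of such distances could then be passed to any standard clustering routine that accepts precomputed dissimilarities, for example a hierarchical method.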

Keywords

Interval autocorrelation; Interval data; Interval time series; Time series clustering

Notes

Acknowledgements

The work of P. Teles and P. Brito is financed by the ERDF (European Regional Development Fund) through the Operational Programme for Competitiveness and Internationalisation (COMPETE 2020 Programme) within project “POCI-01-0145-FEDER-006961”, and by National Funds through the FCT, Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology), as part of project UID/EEA/50014/2013. We thank the associate editor and reviewers for their helpful comments and suggestions.

Supplementary material

Supplementary material 1: 11222_2018_9851_MOESM1_ESM.zip (zip, 14198 KB)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Monash University, Melbourne, Australia
  2. Faculdade de Economia and LIAAD INESC TEC, Universidade do Porto, Porto, Portugal
