Time Series Clustering from High Dimensional Data

  • Carlo Drago
  • Germana Scepi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7627)

Abstract

Technological advances make it possible to collect datasets of growing size and dimension. Standard techniques, however, do not handle high dimensional data easily, and new techniques are needed to obtain useful results. Another relevant problem is the information loss caused by aggregation in large data sets. The richness of information present in the data, which can be hidden in the data visualization process, needs to be taken into account. Our proposal, which contributes to the literature on temporal data mining, is to use a new type of time series, the beanplot time series, in order to avoid aggregation and to cluster the original high dimensional time series effectively. In particular, we consider the case of high dimensional time series and a clustering approach based on the statistical features of the beanplot time series.
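As a rough illustration of feature-based clustering without collapsing each interval to a single value, the following Python sketch summarises every window of a series by a few distributional statistics and then groups the series by those summaries. It is not the authors' procedure: a beanplot combines a kernel density trace with the individual observations, whereas this sketch substitutes simple per-window statistics (mean, standard deviation, minimum, maximum), and it uses a basic k-means with a deterministic farthest-point start rather than the self-organizing maps named in the keywords. All function names, the window length, and the synthetic data are assumptions made for the example.

```python
import random
import statistics


def window_features(series, window):
    """Summarise each non-overlapping window by mean, spread, min, and max."""
    feats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        feats.append((statistics.fmean(chunk), statistics.pstdev(chunk),
                      min(chunk), max(chunk)))
    return feats


def series_descriptor(series, window):
    """Average the per-window features into one fixed-length vector."""
    feats = window_features(series, window)
    return [statistics.fmean(col) for col in zip(*feats)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start: first point, then the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def demo_labels(seed=0, n_points=500, window=50):
    """Cluster three low-volatility and three high-volatility synthetic series."""
    rng = random.Random(seed)
    sigmas = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]  # assumed, purely synthetic
    series_set = [[rng.gauss(0.0, s) for _ in range(n_points)] for s in sigmas]
    descriptors = [series_descriptor(s, window) for s in series_set]
    return kmeans(descriptors, k=2)


labels = demo_labels()
print(labels)
```

The point of the sketch is that the six synthetic series share the same mean, so a clustering based only on aggregated averages would not distinguish them, while the within-window spread and range separate the two groups.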

Keywords

Beanplots; High dimensional data; Clustering; Self-organizing maps

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Università degli Studi “Niccolò Cusano”, Rome, Italy
  2. Department of Economics and Statistics, Università degli Studi di Napoli, Naples, Italy
