
Stock Data Clustering and Multiscale Trend Detection

  • Andreea B. Dragut
Article

Abstract

Trend detection algorithms for data streams generally require expert assistance in some form. We present an unsupervised multiscale data stream algorithm that detects trends in evolving time series based on a driver data stream. The clustering algorithm for the raw stream data is incremental and space-dilating, and has linear time complexity. The evolving stream is explored incrementally over a number of windows. Whenever a change occurs, we switch the analysis on this driver data stream in order to detect the new aggregated patterns and the new best selection of window widths from an exponential base set. The window widths are detected using a slightly modified version of an incremental SVD procedure. We apply this clustering algorithm to a free public NYSE financial data feed, incrementally investigating the trends that developed during the financial crisis of 2007 to 2009. The algorithm detected the changing widths of the trends in the market as well as the trends themselves.
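As a rough illustration of choosing window widths from an exponential base set with an SVD, the sketch below scores each candidate width by how much of a series' energy its leading singular vector captures. It is not the article's procedure: it uses a plain batch SVD from NumPy rather than the modified incremental SVD, and the function names, the base of 2, the exponent range, and the energy-fraction score are all assumptions made for the example.

```python
import numpy as np


def window_matrix(series, width):
    """Stack consecutive non-overlapping windows of `width` samples as rows."""
    values = np.asarray(series, dtype=float)
    usable = (len(values) // width) * width  # drop the incomplete tail window
    return values[:usable].reshape(-1, width)


def top_energy_fraction(series, width):
    """Share of squared energy captured by the leading singular vector."""
    singular_values = np.linalg.svd(window_matrix(series, width), compute_uv=False)
    total = float(np.sum(singular_values ** 2))
    return float(singular_values[0] ** 2) / total if total > 0 else 0.0


def rank_widths(series, base=2, min_exp=2, max_exp=6):
    """Score every width base**k (hypothetical defaults) and sort best first."""
    widths = [base ** k for k in range(min_exp, max_exp + 1)]
    # Keep only widths that yield at least two full windows.
    widths = [w for w in widths if len(series) >= 2 * w]
    scores = {w: top_energy_fraction(series, w) for w in widths}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Synthetic series with period 16, standing in for a price-derived signal.
    t = np.arange(512)
    signal = np.sin(2 * np.pi * t / 16)
    for width, score in rank_widths(signal):
        print(width, round(score, 3))
```

A width whose windows all look alike, up to scaling, scores close to 1, which is one simple way to read "the pattern repeats at this scale". A streaming version would update the decomposition as new windows arrive instead of recomputing it.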

Keywords

Financial time series; Linear time clustering algorithm; Space dilating measure; Monotonic algorithm; Multiscale trend detection

AMS 2000 Subject Classifications

62H30 62H20 37M10 06A06 



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Lab. Informatique Fondamentale (LIF), Univ. Aix-Marseille II, Cedex 1, France
