Abstract
Generally, trend detection algorithms over data streams require expert assistance in some form. We present an unsupervised multiscale data stream algorithm that detects trends in evolving time series based on a driver data stream. The clustering algorithm for the raw stream data is incremental and space-dilating, and has linear time complexity. The evolving stream is explored incrementally over a number of windows. Whenever a change occurs, we switch the analysis to this driver data stream in order to detect the new aggregated patterns and the new best selection of window widths from an exponential base set. The window widths are detected using a slightly modified version of an incremental SVD procedure. We apply this clustering algorithm to a free public financial data feed from the NYSE, incrementally investigating the trends that developed in crisis-period data from 2007 to 2009. The algorithm detected the trends in the market as well as their changing widths.
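The abstract does not state how a window width is scored, so the following is only a minimal sketch of choosing a width from an exponential base set with an SVD. It rests on several assumptions that are not taken from the paper: a batch SVD from NumPy stands in for the modified incremental SVD, the series is cut into non-overlapping windows, and the hypothetical criterion picks the smallest width whose windows are well described by a single singular vector. The names `rank1_residual` and `select_width` are illustrative.

```python
import numpy as np

def rank1_residual(series, width):
    """Fraction of energy not captured by the leading singular vector
    when the series is cut into non-overlapping windows of `width`."""
    n_windows = len(series) // width
    windows = np.asarray(series[: n_windows * width], dtype=float)
    windows = windows.reshape(n_windows, width)
    # Batch SVD (assumption): the paper uses an incremental variant instead.
    s = np.linalg.svd(windows, compute_uv=False)
    total = np.sum(s ** 2)
    return 0.0 if total == 0 else float(np.sum(s[1:] ** 2) / total)

def select_width(series, max_exponent=4, tol=1e-9):
    """Return the smallest width in {2, 4, ..., 2**max_exponent} whose
    rank-1 residual is below `tol` (hypothetical criterion), falling back
    to the width with the lowest residual."""
    widths = [2 ** k for k in range(1, max_exponent + 1)]
    residuals = {w: rank1_residual(series, w) for w in widths}
    for w in widths:
        if residuals[w] <= tol:
            return w, residuals
    return min(residuals, key=residuals.get), residuals

# Toy stream: a sawtooth that repeats every 8 samples.
series = np.arange(64) % 8
best, residuals = select_width(series)
print(best, residuals)
```

On this toy sawtooth, widths shorter than the period split the pattern into windows that are not proportional to each other, so their rank-1 residual stays clearly above zero, while windows aligned with the period are identical and leave essentially no residual.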
About this article
Cite this article
Dragut, A.B. Stock Data Clustering and Multiscale Trend Detection. Methodol Comput Appl Probab 14, 87–105 (2012). https://doi.org/10.1007/s11009-010-9186-7
DOI: https://doi.org/10.1007/s11009-010-9186-7
Keywords
- Financial time series
- Linear time clustering algorithm
- Space dilating measure
- Monotonic algorithm
- Multiscale trend detection