Abstract
Clustering of data streams has become a task of great interest in recent years as such data formats are becoming increasingly ubiquitous. In many cases, these data are also high dimensional and, as a result, more complex to cluster. As such, there is a growing need for algorithms that can be applied to streaming data and at the same time cope with high dimensionality. To this end, we design a streaming clustering approach by extending a recently proposed high dimensional clustering algorithm.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tasoulis, S.K., Tasoulis, D.K., Plagianakos, V.P. (2012). Clustering of High Dimensional Data Streams. In: Maglogiannis, I., Plagianakos, V., Vlahavas, I. (eds) Artificial Intelligence: Theories and Applications. SETN 2012. Lecture Notes in Computer Science, vol 7297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30448-4_28
DOI: https://doi.org/10.1007/978-3-642-30448-4_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30447-7
Online ISBN: 978-3-642-30448-4
eBook Packages: Computer Science, Computer Science (R0)