Abstract
This paper proposes a new stream clustering algorithm for text streams. The algorithm combines concepts from stream clustering and text analysis to incrementally maintain a set of text droplets, each representing a topic within the stream. It adapts to topic changes over time and handles noise and outliers gracefully by decaying the importance of irrelevant clusters. We evaluate our approach on more than one million real-world chat messages from the video streaming platform Twitch.tv.
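The abstract names three ingredients: incrementally maintained topic summaries, assignment of each incoming text to a summary, and time-based decay that lets stale clusters fade away as noise. The following is a minimal sketch of that general idea, not the authors' algorithm. It assumes whitespace tokenization, raw term counts, cosine similarity, and exponential decay of the form 2^(-lam * dt). The names `MicroCluster`, `TextStreamClusterer`, `lam`, `sim_threshold`, and `min_weight` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


class MicroCluster:
    """Hypothetical decaying term-frequency summary of one topic."""

    def __init__(self, tokens, t):
        self.tf = Counter(tokens)  # (decayed) term counts of absorbed messages
        self.weight = 1.0          # (decayed) number of absorbed messages
        self.last_update = t       # timestamp of the last fade or insert

    def fade(self, t, lam):
        # Assumed exponential decay: importance halves every 1/lam time units.
        factor = 2 ** (-lam * (t - self.last_update))
        for term in self.tf:
            self.tf[term] *= factor
        self.weight *= factor
        self.last_update = t

    def absorb(self, tokens):
        # Add the new message's terms and count it once.
        self.tf.update(tokens)
        self.weight += 1.0


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TextStreamClusterer:
    """Sketch only: parameters and assignment rule are illustrative assumptions."""

    def __init__(self, lam=0.01, sim_threshold=0.3, min_weight=0.1):
        self.lam = lam                      # decay rate
        self.sim_threshold = sim_threshold  # minimum similarity to merge
        self.min_weight = min_weight        # clusters below this are dropped as noise
        self.clusters = []

    def insert(self, text, t):
        tokens = text.lower().split()  # naive tokenization (assumption)
        query = Counter(tokens)
        best, best_sim = None, 0.0
        for mc in self.clusters:
            mc.fade(t, self.lam)  # bring every cluster up to the current time
            sim = cosine(query, mc.tf)
            if sim > best_sim:
                best, best_sim = mc, sim
        if best is not None and best_sim >= self.sim_threshold:
            best.absorb(tokens)  # message continues an existing topic
        else:
            self.clusters.append(MicroCluster(tokens, t))  # new topic
        # Remove clusters whose decayed weight marks them as irrelevant.
        self.clusters = [mc for mc in self.clusters if mc.weight >= self.min_weight]
```

For example, repeated chat messages such as "PogChamp" collapse into a single cluster, while a cluster that receives no new messages for a long time decays below `min_weight` and is discarded. Fading every cluster on each insert keeps the sketch short; a practical implementation would more likely fade lazily or in periodic cleanup steps.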
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Carnein, M., Assenmacher, D., Trautmann, H. (2017). Stream Clustering of Chat Messages with Applications to Twitch Streams. In: de Cesare, S., Frank, U. (eds) Advances in Conceptual Modeling. ER 2017. Lecture Notes in Computer Science, vol 10651. Springer, Cham. https://doi.org/10.1007/978-3-319-70625-2_8
DOI: https://doi.org/10.1007/978-3-319-70625-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70624-5
Online ISBN: 978-3-319-70625-2
eBook Packages: Computer Science, Computer Science (R0)