Abstract
Microblog posts are short, structurally complex, and prone to word deformation. This paper proposes a two-stage clustering algorithm that combines probabilistic latent semantic analysis (pLSA) with K-means clustering. It also defines topic popularity and presents a mechanism for sorting topics. Experiments show that the method clusters topics effectively and can be applied to microblog hot topic detection.
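The abstract does not say how the two stages are combined, so the following is only a minimal sketch under one plausible reading: pLSA first maps each post to a topic distribution P(z|d), and K-means then groups posts by those distributions. The toy posts, the function names (`plsa`, `kmeans`), the iteration counts, and the farthest-point initialization are all illustrative assumptions, not the authors' implementation. The paper's popularity definition is not given here and is therefore not sketched.

```python
import random

def normalize(row):
    total = sum(row)
    return [x / total for x in row]

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Stage 1 (assumed): fit pLSA with EM and return P(z|d) per document."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        # Accumulators start at a tiny value so no probability becomes exactly 0.
        acc_zd = [[1e-12] * n_topics for _ in range(n_docs)]
        acc_wz = [[1e-12] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z|d,w), proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                for z in range(n_topics):
                    r = counts[d][w] * post[z] / total
                    acc_zd[d][z] += r
                    acc_wz[z][w] += r
        # M-step: renormalize the expected counts.
        p_z_d = [normalize(row) for row in acc_zd]
        p_w_z = [normalize(row) for row in acc_wz]
    return p_z_d

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20):
    """Stage 2 (assumed): Lloyd's K-means with farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Made-up toy posts with two obvious themes (illustrative data only).
posts = [
    "earthquake tokyo shake earthquake",
    "tokyo earthquake shake",
    "shake earthquake tokyo tokyo",
    "football goal match football",
    "match football goal",
    "goal match football goal",
]
vocab = sorted({w for post in posts for w in post.split()})
counts = [[post.split().count(w) for w in vocab] for post in posts]

topic_mix = plsa(counts, n_topics=2)   # one P(z|d) vector per post
labels = kmeans(topic_mix, k=2)        # cluster id per post
print(labels)
```

Clustering in the low-dimensional topic space rather than on raw word counts is one way to cope with the sparsity of short posts; whether the paper orders the stages this way cannot be confirmed from the abstract alone.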
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Sun, Y., Ma, H., Jia, M., Peiqing, W. (2014). An Efficient Microblog Hot Topic Detection Algorithm Based on Two Stage Clustering. In: Shi, Z., Wu, Z., Leake, D., Sattler, U. (eds) Intelligent Information Processing VII. IIP 2014. IFIP Advances in Information and Communication Technology, vol 432. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44980-6_10
DOI: https://doi.org/10.1007/978-3-662-44980-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44979-0
Online ISBN: 978-3-662-44980-6
eBook Packages: Computer Science (R0)