Abstract
With the rapid growth of information on microblog services, dynamic summarization of evolving information has become an important task. However, existing approaches to continuous microblog stream summarization work poorly because the streams contain a great deal of noise and redundancy. We tackle this problem with a two-step process. First, we cluster the online microblog stream and maintain cluster feature vectors. Second, we generate dynamic summaries over arbitrary time durations from these cluster features, which helps users find worthwhile interpretations of the stream. In both steps, we use the features to calculate the importance of similar sentences within each cluster. Our approach integrates this cluster information with an unsupervised topic evolution detection model, and we show that using latent topics to capture feature dependencies yields better-performing summaries. Finally, experimental results on real microblogs demonstrate that our summarization framework significantly improves performance and is comparable to state-of-the-art summarization approaches.
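The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the cluster feature layout, the similarity measure, or the ranking rule, so the `ClusterFeature` fields, the cosine similarity over raw term counts, the `threshold` value, and the "post closest to the cluster centroid" selection are all assumptions made for illustration. The topic evolution model is omitted entirely.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ClusterFeature:
    """Additive per-cluster statistics (hypothetical layout)."""
    term_sum: Counter = field(default_factory=Counter)  # summed term counts
    count: int = 0                                      # posts absorbed so far
    posts: list = field(default_factory=list)           # (timestamp, text) pairs

    def add(self, ts, text, vec):
        self.term_sum.update(vec)
        self.count += 1
        self.posts.append((ts, text))


def cluster_stream(stream, threshold=0.3):
    """Step 1: assign each post to its most similar cluster, or open a new one."""
    clusters = []
    for ts, text in stream:
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c.term_sum)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            best = ClusterFeature()
            clusters.append(best)
        best.add(ts, text, vec)
    return clusters


def summarize(clusters, start, end, k=2):
    """Step 2: for the k clusters with the most posts in [start, end],
    return the in-window post most similar to the cluster's term sum."""
    windowed = []
    for c in clusters:
        posts = [text for ts, text in c.posts if start <= ts <= end]
        if posts:
            windowed.append((c, posts))
    windowed.sort(key=lambda pair: len(pair[1]), reverse=True)
    return [
        max(posts, key=lambda t: cosine(Counter(tokenize(t)), c.term_sum))
        for c, posts in windowed[:k]
    ]
```

Calling `cluster_stream` on a list of `(timestamp, text)` pairs and then `summarize(clusters, start, end)` with different windows returns one representative post per selected cluster. Because the cluster statistics are additive, the clustering pass runs once over the stream while summaries for different time durations can be requested afterwards.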
Acknowledgments
The authors thank the anonymous reviewers for their insightful and constructive comments. This work is supported by the National Natural Science Foundation of China, “Research on High-order Collaboration, Real-time and Temporal Characteristics in Automatic Test of Safety-critical Systems” (No. 61300007).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wu, Q., Lv, J., Ma, S. (2015). Continuous Summarization for Microblog Streams Based on Clustering. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9490. Springer, Cham. https://doi.org/10.1007/978-3-319-26535-3_43
DOI: https://doi.org/10.1007/978-3-319-26535-3_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26534-6
Online ISBN: 978-3-319-26535-3
eBook Packages: Computer Science, Computer Science (R0)