Continuous Summarization for Microblog Streams Based on Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9490)

Abstract

With the rapid growth of information on microblog services, dynamic summarization of evolving information has become an important task. However, existing work on continuous microblog stream summarization is not effective because of the large amount of noise and redundancy in these streams. We tackle this problem with a two-step process. First, we cluster online microblog streams and maintain cluster feature vectors. Then, dynamic summaries of arbitrary time durations are generated from the microblog cluster features. This helps users find worthwhile interpretations of online microblog streams. In both steps, we use features to calculate the importance of similar sentences in each cluster. Our approach integrates this cluster information with an unsupervised topic evolution detection model, and we illustrate that using latent topics to capture the feature dependencies yields summaries with better performance. Finally, experimental results on real microblogs demonstrate that our summarization framework significantly improves performance and is comparable to state-of-the-art summarization approaches.
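The two-step process described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ClusterFeature` and `StreamSummarizer`, the whitespace tokenizer, the cosine threshold of 0.3, and the choice to rank clusters by in-window post count are all assumptions made for illustration. The sketch also keeps raw posts inside each cluster for simplicity, and it omits the topic evolution detection model entirely.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class ClusterFeature:
    # Hypothetical cluster feature vector: summed term counts,
    # post count, and the time span covered by the cluster.
    term_sum: Counter = field(default_factory=Counter)
    count: int = 0
    first_ts: float = math.inf
    last_ts: float = -math.inf
    posts: list = field(default_factory=list)  # (timestamp, text) pairs

    def add(self, ts, text, vec):
        self.term_sum.update(vec)
        self.count += 1
        self.first_ts = min(self.first_ts, ts)
        self.last_ts = max(self.last_ts, ts)
        self.posts.append((ts, text))


class StreamSummarizer:
    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cut-off
        self.clusters = []

    def process(self, ts, text):
        # Step 1: assign the post to its most similar cluster,
        # or open a new cluster if nothing is similar enough.
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for cluster in self.clusters:
            sim = cosine(vec, cluster.term_sum)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None or best_sim < self.threshold:
            best = ClusterFeature()
            self.clusters.append(best)
        best.add(ts, text, vec)

    def summarize(self, start, end, k=2):
        # Step 2: for the k clusters most active in [start, end],
        # pick the in-window post closest to the cluster's term vector.
        windows = []
        for cluster in self.clusters:
            hits = [(ts, t) for ts, t in cluster.posts if start <= ts <= end]
            if hits:
                windows.append((cluster, hits))
        windows.sort(key=lambda pair: len(pair[1]), reverse=True)
        summary = []
        for cluster, hits in windows[:k]:
            rep = max(
                hits,
                key=lambda p: cosine(Counter(tokenize(p[1])), cluster.term_sum),
            )
            summary.append(rep[1])
        return summary


if __name__ == "__main__":
    s = StreamSummarizer()
    for ts, text in [
        (1, "earthquake hits city center"),
        (2, "strong earthquake hits the city"),
        (3, "new phone launch today"),
    ]:
        s.process(ts, text)
    print(s.summarize(1, 3, k=2))
```

Because summaries are built from per-cluster state rather than by rescanning the stream, a summary for any time window can be requested after the fact, which is the property the abstract refers to as "arbitrary time durations".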

Acknowledgments

The authors thank the anonymous reviewers for their insightful and constructive comments. This work is supported by the National Natural Science Foundation of China project “Research on High-order Collaboration, Real-time and Temporal Characteristics in Automatic Test of Safety-critical Systems” (No. 61300007).

Author information

Corresponding author

Correspondence to Qunhui Wu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wu, Q., Lv, J., Ma, S. (2015). Continuous Summarization for Microblog Streams Based on Clustering. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9490. Springer, Cham. https://doi.org/10.1007/978-3-319-26535-3_43

  • DOI: https://doi.org/10.1007/978-3-319-26535-3_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26534-6

  • Online ISBN: 978-3-319-26535-3

  • eBook Packages: Computer Science (R0)