SoRTESum: A Social Context Framework for Single-Document Summarization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Abstract

The combination of a web document's content, its sentences, and users' comments from social networks provides a viewpoint on the document with respect to a specific event. This paper proposes a framework named SoRTESum that takes advantage of information from Twitter, namely diversity and reflection of document content, to generate high-quality summaries using a novel sentence similarity measure. The framework first formulates sentences and tweets by recognizing textual entailment (RTE) relations to incorporate social information. Next, they are modeled in a Dual Wing Entailment Graph, which captures the entailment relation to calculate sentence similarity based on mutual reinforcement information. Finally, important sentences and representative tweets are selected by a ranking algorithm. By incorporating social information, SoRTESum obtained ROUGE-1 improvements of 0.51%–8.8% over state-of-the-art unsupervised baselines (e.g., Random, SentenceLead, LexRank) and results comparable to strong supervised methods (e.g., L2R and CrossL2R trained with RankBoost) for single-document summarization.
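The pipeline described in the abstract (score sentences against tweets and tweets against sentences, then rank each side) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the RTE features or the graph weighting, so `overlap_score` below is a hypothetical stand-in (plain Jaccard word overlap), and the function names `rank_by_other_wing` and `summarize` are invented for illustration only.

```python
import re


def tokens(text):
    """Lowercase word tokens; a stand-in for real preprocessing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(a, b):
    """Jaccard word overlap between two texts.

    ASSUMPTION: a placeholder for the RTE-feature-based similarity,
    which the abstract names but does not define.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_by_other_wing(items, other_wing):
    """Score each item by its average similarity to the opposite wing.

    Sentences are scored against tweets and tweets against sentences,
    which is a simplified reading of "mutual reinforcement".
    """
    scores = []
    for item in items:
        total = sum(overlap_score(item, other) for other in other_wing)
        scores.append(total / len(other_wing) if other_wing else 0.0)
    order = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    return [(items[i], scores[i]) for i in order]


def summarize(sentences, tweets, n_sentences=1, n_tweets=1):
    """Return the top-ranked sentences and tweets as two lists."""
    top_s = rank_by_other_wing(sentences, tweets)[:n_sentences]
    top_t = rank_by_other_wing(tweets, sentences)[:n_tweets]
    return [s for s, _ in top_s], [t for t, _ in top_t]


if __name__ == "__main__":
    doc = [
        "The storm hit the coast on Monday.",
        "Officials opened three shelters.",
        "The weather is often discussed.",
    ]
    social = [
        "Storm hit the coast, stay safe",
        "Shelters opened by officials for storm victims",
    ]
    print(summarize(doc, social))
```

In this toy input, the sentence sharing the most words with the tweets is ranked first, and likewise for tweets against sentences. A faithful version would replace `overlap_score` with the entailment-based measure from the paper.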

Notes

  1. http://twitter.com - a microblogging system.

  2. The RTE term was kept instead of the similarity because all features were derived from RTE task.

  3. http://www1.se.cuhk.edu.hk/~zywei/data/hilightextraction.zip.

  4. http://edition.cnn.com.

  5. http://www.usatoday.com.

  6. http://snowball.tartarus.org/algorithms/porter/stemmer.html.

  7. https://pypi.python.org/pypi/sumy/0.3.0.

  8. https://people.cs.umass.edu/~vdang/ranklib.html.

  9. https://github.com/klb3713/sentence2vec/blob/master/demo.py.

  10. https://meta.wikimedia.org/wiki/Data_dump_torrents.

  11. http://kavita-ganesan.com/content/rouge-2.0-documentation.

  12. http://150.65.242.101:9293.

References

  1. Dagan, I., Dolan, B., Magnini, B., Roth, D.: Recognizing textual entailment: rational, evaluation and approaches - erratum. Nat. Lang. Eng. 16(1), 105–105 (2010)

  2. Erkan, G., Radev, D.R.: Lexrank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, 457–479 (2004)

  3. Gao, W., Li, P., Darwish, K.: Joint topic modeling for event summarization across news, social media streams. In: CIKM, pp. 1173–1182 (2012)

  4. Hu, M., Sun, A., Lim, E.-P.: Comments-oriented blog summarization by sentence extraction. In: CIKM, pp. 901–904 (2007)

  5. Hu, M., Sun, A., Lim, E.-P.: Comments-oriented document summarization: understanding document with readers’ feedback. In: SIGIR, pp. 291–298 (2008)

  6. Hu, P., Sun, C., Wu, L., Ji, D.-H., Teng, C.: Social summarization via automatically discovered social context. In: IJCNLP, pp. 483–490 (2011)

  7. Huang, L., Li, H., Huang, L.: Comments-oriented document summarization based on multi-aspect co-feedback ranking. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds.) WAIM 2013. LNCS, vol. 7923, pp. 363–374. Springer, Heidelberg (2013)

  8. Lin, C.-Y., Hovy, E.H.: Automatic evaluation of summaries using n-gram co-occurrence statistics. In: HLT-NAACL, pp. 71–78 (2003)

  9. Lu, Y., Zhai, C.X., Sundaresan, N.: Rated aspect summarization of short comments. In: WWW, pp. 131–140 (2009)

  10. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)

  11. Nenkova, A.: Automatic text summarization of newswire: lessons learned from the document understanding conference. In: AAAI, pp. 1436–1441 (2005)

  12. Nguyen, M.-T., Ha, Q.-T., Nguyen, T.-D., Nguyen, T.-T., Nguyen, L.-M.: Recognizing textual entailment in Vietnamese text: an experimental study. In: KSE (2015). doi:10.1109/KSE.2015.23

  13. Nguyen, M.-T., Kitamoto, A., Nguyen, T.-T.: TSum4act: a framework for retrieving and summarizing actionable tweets during a disaster for reaction. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS, vol. 9078, pp. 64–75. Springer, Heidelberg (2015)

  14. Porter, M.F.: Snowball: a language for stemming algorithms (2011)

  15. Wan, X., Yang, J.: Multi-document summarization using cluster-based link analysis. In: SIGIR, pp. 299–306 (2008)

  16. Wei, Z., Gao, W.: Utilizing microblogs for automatic news highlights extraction. In: COLING, pp. 872–883 (2014)

  17. Wei, Z., Gao, W.: Gibberish, assistant, or master? Using tweets linking to news for extractive single-document summarization. In: SIGIR, pp. 1003–1006 (2015)

  18. Yang, Z., Cai, K., Tang, J., Zhang, L., Zhong, S., Li, J.: Social context summarization. In: SIGIR, pp. 255–264 (2011)

Acknowledgment

We would like to thank Preslav Nakov and Wei Gao for useful discussions and insightful comments on earlier drafts, and Chien-Xuan Tran for building the web interface. We also thank the anonymous reviewers for their detailed comments, which helped improve our paper. This work was partly supported by JSPS KAKENHI Grant number 3050941.

Author information

Corresponding author

Correspondence to Minh-Tien Nguyen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, MT., Nguyen, ML. (2016). SoRTESum: A Social Context Framework for Single-Document Summarization. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_1

  • DOI: https://doi.org/10.1007/978-3-319-30671-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30670-4

  • Online ISBN: 978-3-319-30671-1

  • eBook Packages: Computer Science, Computer Science (R0)