SoRTESum: A Social Context Framework for Single-Document Summarization

Nguyen, Minh-Tien; Nguyen, Minh-Le

doi:10.1007/978-3-319-30671-1_1

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Abstract

The combination of a web document’s content, its sentences, and users’ comments from social networks provides a viewpoint of the document towards a particular event. This paper proposes a framework named SoRTESum that takes advantage of information from Twitter, viz. its diversity and its reflection of document content, to generate high-quality summaries using a novel sentence similarity measurement. The framework first formulates sentences and tweets through the recognizing textual entailment (RTE) relation to incorporate social information. Next, sentences and tweets are modeled in a Dual Wing Entailment Graph, which captures the entailment relation to calculate sentence similarity based on mutual reinforcement information. Finally, important sentences and representative tweets are selected by a ranking algorithm. By incorporating social information, SoRTESum improves on state-of-the-art unsupervised baselines (e.g., Random, SentenceLead, LexRank) by 0.51 %–8.8 % in ROUGE-1 and obtains results comparable to strong supervised methods (e.g., L2R and CrossL2R trained by RankBoost) for single-document summarization.
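
The pipeline described in the abstract (score each sentence and tweet by how strongly it is supported within its own wing and by the other wing, then rank) can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything concrete here is an assumption made for illustration: a plain Jaccard word overlap stands in for the RTE-based similarity, the score is taken to be a weighted average of intra-wing and inter-wing similarity, and the names `overlap_sim`, `score_wing`, `top_m` and the weight `delta` are hypothetical. The sample sentences and tweets are invented.

```python
from typing import List


def overlap_sim(a: str, b: str) -> float:
    """Jaccard word overlap. ASSUMPTION: a simple stand-in for the
    RTE-feature-based similarity used by the framework."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def mean_sim(item: str, others: List[str]) -> float:
    """Average similarity between one item and a list of other items."""
    if not others:
        return 0.0
    return sum(overlap_sim(item, o) for o in others) / len(others)


def score_wing(wing: List[str], other_wing: List[str], delta: float = 0.5) -> List[float]:
    """Score every item in `wing`.

    ASSUMPTION: score = delta * (support from its own wing)
                      + (1 - delta) * (support from the other wing).
    """
    scores = []
    for i, item in enumerate(wing):
        peers = wing[:i] + wing[i + 1:]  # all items in the same wing except itself
        scores.append(delta * mean_sim(item, peers)
                      + (1 - delta) * mean_sim(item, other_wing))
    return scores


def top_m(items: List[str], scores: List[float], m: int) -> List[str]:
    """Return the m highest-scoring items, best first."""
    ranked = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    return [items[i] for i in ranked[:m]]


# Invented toy data: one "document wing" and one "tweet wing".
sentences = [
    "the storm hit the coast on monday",
    "officials opened shelters for residents",
    "the mayor likes jazz music",
]
tweets = [
    "storm hit the coast hard",
    "shelters opened for residents after the storm",
]

sentence_scores = score_wing(sentences, tweets)
tweet_scores = score_wing(tweets, sentences)

summary_sentences = top_m(sentences, sentence_scores, 2)
representative_tweets = top_m(tweets, tweet_scores, 1)
print(summary_sentences)
print(representative_tweets)
```

In this toy run the off-topic third sentence receives little support from either wing and is left out of the two-sentence summary, which is the intuition behind using tweets as social context. Replacing `overlap_sim` with a richer similarity leaves the rest of the sketch unchanged.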

Notes

1. http://twitter.com - a microblogging system.
2. The term RTE was kept instead of similarity because all features were derived from the RTE task.
3. http://www1.se.cuhk.edu.hk/~zywei/data/hilightextraction.zip.
4. http://edition.cnn.com.
5. http://www.usatoday.com.
6. http://snowball.tartarus.org/algorithms/porter/stemmer.html.
7. https://pypi.python.org/pypi/sumy/0.3.0.
8. https://people.cs.umass.edu/~vdang/ranklib.html.
9. https://github.com/klb3713/sentence2vec/blob/master/demo.py.
10. https://meta.wikimedia.org/wiki/Data_dump_torrents.
11. http://kavita-ganesan.com/content/rouge-2.0-documentation.
12. http://150.65.242.101:9293.

References

Dagan, I., Dolan, B., Magnini, B., Roth, D.: Recognizing textual entailment: rational, evaluation and approaches - erratum. Nat. Lang. Eng. 16(1), 105–105 (2010)
Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, 457–479 (2004)
Gao, W., Li, P., Darwish, K.: Joint topic modeling for event summarization across news, social media streams. In: CIKM, pp. 1173–1182 (2012)
Hu, M., Sun, A., Lim, E.-P.: Comments-oriented blog summarization by sentence extraction. In: CIKM, pp. 901–904 (2007)
Hu, M., Sun, A., Lim, E.-P.: Comments-oriented document summarization: understanding document with readers’ feedback. In: SIGIR, pp. 291–298 (2008)
Hu, P., Sun, C., Wu, L., Ji, D.-H., Teng, C.: Social summarization via automatically discovered social context. In: IJCNLP, pp. 483–490 (2011)
Huang, L., Li, H., Huang, L.: Comments-oriented document summarization based on multi-aspect co-feedback ranking. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds.) WAIM 2013. LNCS, vol. 7923, pp. 363–374. Springer, Heidelberg (2013)
Lin, C.-Y., Hovy, E.H.: Automatic evaluation of summaries using n-gram co-occurrence statistics. In: HLT-NAACL, pp. 71–78 (2003)
Lu, Y., Zhai, C.X., Sundaresan, N.: Rated aspect summarization of short comments. In: WWW, pp. 131–140 (2009)
Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)
Nenkova, A.: Automatic text summarization of newswire: lessons learned from the document understanding conference. In: AAAI, pp. 1436–1441 (2005)
Nguyen, M.-T., Ha, Q.-T., Nguyen, T.-D., Nguyen, T.-T., Nguyen, L.-M.: Recognizing textual entailment in Vietnamese text: an experimental study. In: KSE (2015). doi:10.1109/KSE.2015.23
Nguyen, M.-T., Kitamoto, A., Nguyen, T.-T.: TSum4act: a framework for retrieving and summarizing actionable tweets during a disaster for reaction. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS, vol. 9078, pp. 64–75. Springer, Heidelberg (2015)
Porter, M.F.: Snowball: a language for stemming algorithms (2011)
Wan, X., Yang, J.: Multi-document summarization using cluster-based link analysis. In: SIGIR, pp. 299–306 (2008)
Wei, Z., Gao, W.: Utilizing microblogs for automatic news highlights extraction. In: COLING, pp. 872–883 (2014)
Wei, Z., Gao, W.: Gibberish, assistant, or master? Using tweets linking to news for extractive single-document summarization. In: SIGIR, pp. 1003–1006 (2015)
Yang, Z., Cai, K., Tang, J., Zhang, L., Zhong, S., Li, J.: Social context summarization. In: SIGIR, pp. 255–264 (2011)

Acknowledgment

We would like to thank Preslav Nakov and Wei Gao for useful discussions and insightful comments on earlier drafts, and Chien-Xuan Tran for building the web interface. We also thank the anonymous reviewers for their detailed comments, which helped improve our paper. This work was partly supported by JSPS KAKENHI Grant number 3050941.

Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan
Minh-Tien Nguyen & Minh-Le Nguyen
Hung Yen University of Technology and Education (UTEHY), Hung Yen, Vietnam
Minh-Tien Nguyen

Corresponding author

Correspondence to Minh-Tien Nguyen.

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Padova, Italy
Nicola Ferro
Faculty of Informatics, University of Lugano (USI), Lugano, Switzerland
Fabio Crestani
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Systèmes d’informations, Big Data et Recherche d’Information, Institut de Recherche en Informatique de Toulouse IRIT/équipe SIG, Toulouse Cedex 04, France
Josiane Mothe
Yahoo! Labs London, London, UK
Fabrizio Silvestri
Department of Information Engineering, University of Padua, Padova, Italy
Giorgio Maria Di Nunzio
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Department of Information Engineering, University of Padua, Padova, Italy
Gianmaria Silvello

About this paper

Cite this paper

Nguyen, MT., Nguyen, ML. (2016). SoRTESum: A Social Context Framework for Single-Document Summarization. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_1

DOI: https://doi.org/10.1007/978-3-319-30671-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science, Computer Science (R0)
