Event phase oriented news summarization

World Wide Web

Abstract

Event summarization is the task of generating a single, concise textual representation of an event. As usually formulated, the task does not consider the multiple development phases of an event. However, news articles about long and complicated events often span several phases, so traditional event summarization approaches have difficulty capturing these phases effectively. In this paper, we define the task of Event Phase Oriented News Summarization (EPONS). In this approach, we assume that a summary contains multiple timelines, each corresponding to an event phase. We model the semantic relations among news articles with a graph model called the Temporal Content Coherence Graph. A structural clustering algorithm, EPCluster, is designed to separate news articles into groups corresponding to event phases. We then apply a vertex-reinforced random walk to rank the news articles, and the ranking results are used to create timelines. Extensive experiments on multiple datasets show the effectiveness of our approach.
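The abstract names a vertex-reinforced random walk as the ranking step, but this page does not give its update rule. As a rough illustration only, the sketch below implements a simplified, DivRank-style reinforced walk (in the spirit of [29] and [32]) over a small weighted graph. The function name `vertex_reinforced_rank`, the `damping` parameter, the uniform prior, and the pointwise approximation of visit counts by current scores are all assumptions made for this example, not details taken from the paper.

```python
def vertex_reinforced_rank(weights, damping=0.9, iterations=100, tol=1e-9):
    """Rank nodes with a simplified vertex-reinforced random walk.

    weights: square list of lists of non-negative edge weights (hypothetical input).
    Returns one score per node; the scores sum to 1.
    """
    n = len(weights)
    # Row-normalize the weights into base transition probabilities.
    # A row without outgoing weight falls back to a uniform distribution.
    base = []
    for row in weights:
        total = sum(row)
        base.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    prior = [1.0 / n] * n   # assumed uniform teleport distribution
    scores = [1.0 / n] * n  # current visiting probabilities

    for _ in range(iterations):
        new_scores = [0.0] * n
        for u in range(n):
            # Reinforcement: moving to v becomes more likely the higher v
            # already scores (scores stand in for accumulated visit counts).
            reinforced = [base[u][v] * scores[v] for v in range(n)]
            norm = sum(reinforced)
            for v in range(n):
                step = reinforced[v] / norm if norm > 0 else prior[v]
                new_scores[v] += scores[u] * (
                    (1 - damping) * prior[v] + damping * step
                )
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Toy star graph: node 0 is linked to nodes 1, 2 and 3.
    star = [
        [0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
    ]
    for node, score in enumerate(vertex_reinforced_rank(star)):
        print(node, round(score, 4))
```

Because already well-visited nodes attract more of the walk, highly ranked nodes tend to absorb the score of their neighbours, which is why this family of walks is often used when a ranking should favour both prominence and diversity.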

Notes

  1. See background info at: https://en.wikipedia.org/wiki/Egyptian_Revolution_of_2011.

  2. Because our dataset is relatively large and each cluster corresponding to an event phase contains more than k news articles, we set a uniform parameter k for all event phases. The definition can also be modified so that k varies across event phases without changing our algorithm.

  3. In the implementation, we set one day as a time slot and compute w_t(⋅) based on the difference between publication dates. See Figure 2a and b.

  4. By definition, each news article d_i corresponds one-to-one to a node v_i. In the following, where no ambiguity arises, we use d_i to denote both a node and a news article.

  5. Many other methods focus on timeline generation. However, the summaries we generate are headlines and dates, making it difficult to compare our method with them. We will investigate how to modify these algorithms for our task in the future.
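Note 3 states that one day is used as a time slot and that w_t(⋅) depends on the difference between publication dates, but the form of w_t(⋅) is not given on this page. The sketch below shows only the day-granular slot difference, followed by an exponential decay that is purely a placeholder assumption. The names `slot_difference`, `temporal_weight`, and `decay` are hypothetical.

```python
import math
from datetime import date


def slot_difference(pub_a: date, pub_b: date) -> int:
    """Number of one-day time slots between two publication dates."""
    return abs((pub_a - pub_b).days)


def temporal_weight(pub_a: date, pub_b: date, decay: float = 0.1) -> float:
    """Placeholder temporal weight that shrinks as dates move apart.

    ASSUMPTION: exponential decay is used here only for illustration;
    the actual definition of w_t is not reproduced in this text.
    """
    return math.exp(-decay * slot_difference(pub_a, pub_b))


if __name__ == "__main__":
    first = date(2011, 1, 25)
    second = date(2011, 2, 11)
    print(slot_difference(first, second))   # slots between the two dates
    print(temporal_weight(first, second))   # weight under the assumed decay
```

Whatever form w_t(⋅) takes, computing it from whole-day differences means two articles published on the same day always receive the same temporal weight, regardless of the time of day.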

References

  1. Bansal, T., Kanti Das, M., Bhattacharyya, C.: Content driven user profiling for comment-worthy recommendations of news and blog articles. In: Proceedings of the 9th ACM Conference on Recommender Systems, pp. 195–202 (2015)

  2. Bauer, S., Teufel, S.: Unsupervised timeline generation for Wikipedia history articles. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2343–2349 (2016)

  3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  4. Brin, S., Page, L.: The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. 30(1–7), 107–117 (1998)

  5. Cao, Z., Wei, F., Li, S., Li, W., Zhou, M., Wang, H.: Learning summary prior representation for extractive summarization. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, pp. 829–833 (2015)

  6. Chang, L., Li, W., Lin, X., Qin, L., Zhang, W.: pSCAN: Fast and exact structural graph clustering. In: Proceedings of the 32nd IEEE International Conference on Data Engineering, pp. 253–264 (2016)

  7. Chen, C.C., Chen, Y.-T., Sun, Y.S., Chen, M.C.: Life cycle modeling of news events using aging theory. In: Proceedings of the 14th European Conference on Machine Learning, pp. 47–59 (2003)

  8. Chen, J., Niu, Z., Fu, H.: A multi-news timeline summarization algorithm based on aging theory. In: Web Technologies and Applications - 17th Asia-Pacific Web Conference, pp. 449–460 (2015)

  9. Chieu, H.L., Lee, Y.K.: Query based event extraction along a timeline. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 425–432 (2004)

  10. Chopra, S., Auli, M., Rush, A.M.: Abstractive sentence summarization with attentive recurrent neural networks. In: Human Language Technologies: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, pp. 93–98 (2016)

  11. Conroy, J.M., O’Leary, D.P.: Text summarization via hidden Markov models. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 406–407 (2001)

  12. Davis, J.V., Dhillon, I.S.: Estimating the global PageRank of Web communities. In: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 116–125 (2006)

  13. de Kretser, O., Moffat, A.: Effective document presentation with a locality-based similarity heuristic. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 113–120 (1999)

  14. Diao, Q., Shan, J.: A new Web page summarization method. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 639–640 (2006)

  15. Dolby, J., Fokoue, A., Kalyanpur, A., Kershenbaum, A., Schonberg, E., Srinivas, K., Ma, L.: Scalable semantic retrieval through summarization and refinement. In: Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, pp. 299–304 (2007)

  16. Erkan, G., Radev, D.R.: LexRank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. (JAIR) 22, 457–479 (2004)

  17. Gong, Y., Liu, X.: Generic text summarization using relevance measure and latent semantic analysis. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 19–25 (2001)

  18. Gu, Y., Yang, Z., Xu, G., Nakano, M., Toyoda, M., Kitsuregawa, M.: Exploration on efficient similar sentences extraction. World Wide Web 17(4), 595–626 (2014)

  19. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc.: Ser. C: Appl. Stat. 28(1), 100–108 (1979)

  20. He, Z., Chen, C., Bu, J., Wang, C., Zhang, L., Cai, D., He, X.: Document summarization based on data reconstruction. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (2012)

  21. Hong, K., Nenkova, A.: Improving the estimation of word importance for news multi-document summarization. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp. 712–721 (2014)

  22. Jiang, L., Luo, P., Wang, J., Xiong, Y., Lin, B., Wang, M., An, N.: GRIAS: an entity-relation graph based framework for discovering entity aliases. In: Proceedings of the 2013 IEEE 13th International Conference on Data Mining, pp. 310–319 (2013)

  23. Kessler, R., Tannier, X., Hagége, C., Moriceau, V., Bittar, A.: Finding salient years for building thematic timelines. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp. 730–739 (2012)

  24. Khuller, S., Moss, A., Naor, J.: The budgeted maximum coverage problem. Inf. Process. Lett. 70(1), 39–45 (1999)

  25. Knights, D., Mozer, M.C., Nicolov, N.: Detecting topic drift with compound topic models. In: Proceedings of the Third International Conference on Weblogs and Social Media (2009)

  26. Li, J., Li, S.: Evolutionary hierarchical Dirichlet process for timeline summarization. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 556–560 (2013)

  27. Li, W., He, L., Zhuge, H.: Abstractive news summarization based on event semantic link network. In: Proceedings of the 26th International Conference on Computational Linguistics, pp. 236–246 (2016)

  28. Lin, C.-Y., Hovy, E.H.: Automatic evaluation of summaries using n-gram co-occurrence statistics. In: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (2003)

  29. Mei, Q., Guo, J., Radev, D.R.: DivRank: the interplay of prestige and diversity in information networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1009–1018 (2010)

  30. Ng, J.-P., Chen, Y., Kan, M.-Y., Li, Z.: Exploiting timelines to enhance multi-document summarization. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 923–933 (2014)

  31. Parveen, D., Ramsl, H.-M., Strube, M.: Topical coherence for graph-based extractive summarization. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1949–1954 (2015)

  32. Pemantle, R.: Vertex-reinforced random walk. Probab. Theory Relat. Fields 92(1), 117–136 (1992)

  33. Peng, M., Zhu, J., Li, X., Huang, J., Wang, H., Zhang, Y.: Central topic model for event-oriented topics mining in microblog stream. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pp. 1611–1620 (2015)

  34. Qian, X., Liu, Y.: Fast joint compression and summarization via graph cuts. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1492–1502 (2013)

  35. Ren, P., Wei, F., Chen, Z., Ma, J., Zhou, M.: A redundancy-aware sentence regression framework for extractive summarization. In: Proceedings of the 26th International Conference on Computational Linguistics, pp. 33–43 (2016)

  36. Seeland, M., Berger, S.A., Stamatakis, A., Kramer, S.: Parallel structural graph clustering. In: Proceedings of the 2011 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 256–272 (2011)

  37. Shen, W., Wang, J., Luo, P., Wang, M.: A hybrid framework for semantic relation extraction over enterprise data. Int. J. Semantic Web Inf. Syst. 11(3), 1–24 (2015)

  38. Tran, G.B., Alrifai, M., Herder, E.: Timeline summarization from relevant headlines. In: Advances in Information Retrieval - 37th European Conference on IR Research, pp. 245–256 (2015)

  39. Tran, G.B., Herder, E., Markert, K.: Joint graphical models for year selection in timeline summarization. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, pp. 1598–1607 (2015)

  40. Unankard, S., Li, X., Sharaf, M.A.: Emerging event detection in social networks with location sensitivity. World Wide Web 18(5), 1393–1417 (2015)

  41. Wan, X., Yang, J.: Multi-document summarization using cluster-based link analysis. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 299–306 (2008)

  42. Wan, X., Zhang, J.: CTSUM: extracting more certain summaries for news articles. In: Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 787–796 (2014)

  43. Wang, C., Zhang, R., He, X., Zhou, A.: Nerank: Ranking named entities in document collections. In: Proceedings of the 25th International Conference on World Wide Web, pp. 123–124 (2016)

  44. Wang, C., Zhang, R., He, X., Zhou, G., Zhou, A.: Event phase extraction and summarization. In: Proceedings of the 17th International Conference on Web Information Systems Engineering, pp. 473–488 (2016)

  45. Wang, C., Zhang, R., He, X., Zhou, G., Zhou, A.: Nerank: Bringing order to named entities from texts. In: Web Technologies and Applications - Proceedings of the 18th Asia-Pacific Web Conference, pp. 15–27 (2016)

  46. Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.J.: SCAN: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 824–833 (2007)

  47. Yan, R., Kong, L., Huang, C., Wan, X., Li, X., Zhang, Y.: Timeline generation through evolutionary trans-temporal summarization. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 433–443 (2011)

  48. Yan, J., Cheng, W., Wang, C., Liu, J., Gao, M., Zhou, A.: Optimizing word set coverage for multi-event summarization. J. Comb. Optim. 30(4), 996–1015 (2015)

  49. Yu, H., Hatzivassiloglou, V.: Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (2003)

  50. Zhao, W.X., Guo, Y., Yan, R., He, Y., Li, X.: Timeline generation with social attention. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1061–1064 (2013)

  51. Zhou, E., Zhong, N., Li, Y.: Extracting news blog hot topics based on the W2T methodology. World Wide Web 17(3), 377–404 (2014)

Acknowledgments

This work is partially supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000904. Chengyu Wang is partially supported by the Outstanding Doctoral Dissertation Cultivation Plan of Action under Grant No. YB2016040. This manuscript is an extended version of the paper “Event Phase Extraction and Summarization” presented at WISE 2016 [44].

Author information

Corresponding author

Correspondence to Xiaofeng He.

Additional information

This article belongs to the Topical Collection: Special Issue on Web Information Systems Engineering

Guest Editors: Wojciech Cellary, Hua Wang, and Yanchun Zhang

About this article

Cite this article

Wang, C., He, X. & Zhou, A. Event phase oriented news summarization. World Wide Web 21, 1069–1092 (2018). https://doi.org/10.1007/s11280-017-0501-x

  • DOI: https://doi.org/10.1007/s11280-017-0501-x