Advertisement

Extractive Summarization via Overlap-Based Optimized Picking

  • Gaokun DaiEmail author
  • Zhendong Niu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10569)

Abstract

Optimization-based methods regard summarization as a combinatorial optimization problem and formulate it as weighted linear combination of criteria metrics. However due to inconsistent criteria metrics, it is hard to set proper weights. Subjectivity problem also arises since most of them summarize original texts. In this paper, we propose overlap based greedy picking (OGP) algorithm for citation-based extractive summarization. In the algorithm, overlap is defined as a sentence containing several topics. Since including overlaps into summaires indirectly impacts on salience, summary size and content redundancy, OGP effectively avoids the problem of inconsistent metric while dynamically involving criteria into optimization. Despite of greedy method, OGP proves above \((1-1/e)\) of optimal solution. Since citation context is composed of objective evaluations, OGP also solves subjectivity problem. Our experiment results show that OGP outperforms other baseline methods. And various criteria proves effectively involved under the control of single parameter \(\beta \).

Keywords

Overlap-based optimization Non-decreasing submodular objective function Citation-based extractive summarization 

References

  1. 1.
    Ahn, Y.Y., Bagrow, J.P., Lehmann, S.: Link communities reveal multiscale complexity in networks. Nature, pp. 761–764 (2010)CrossRefGoogle Scholar
  2. 2.
    Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and development in information retrieval, pp. 335–336 (1998)Google Scholar
  3. 3.
    Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70, 1–6 (2004)CrossRefGoogle Scholar
  4. 4.
    Erkan, G., Radev, D.R.: Lexpagerank: Prestige in multi-document text summarization. In: Conference on Empirical Methods in Natural Language Processing, pp. 365–371 (2004)Google Scholar
  5. 5.
    Erkan, G., Radev, D.R.: Lexrank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, 457–479 (2004)CrossRefGoogle Scholar
  6. 6.
    Filatova, E., Hatzivassiloglou, V.: A formal model for information selection in multi-sentence text extraction. In: Proceedings of the 20th International Conference on Computational Linguistics, pp. 397–403 (2004)Google Scholar
  7. 7.
    Fung, P., Ngai, G., Cheung, C.S.: Combining optimal clustering and hidden Markov models for extractive summarization. In: Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering, pp. 21–28 (2003)Google Scholar
  8. 8.
    Harabagiu, S., Lacatusu, F.: Topic themes for multi-document summarization. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 202–209 (2005)Google Scholar
  9. 9.
    Hardy, H., Shimizu, N., Strzalkowski, T., Ting, L., Zhang, X., Wise, G.B.: Cross-document summarization by concept classification. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 121–128 (2002)Google Scholar
  10. 10.
    Hirao, T., Yoshida, Y., Nishino, M., Yasuda, N., Nagata, M.: Single-document summarization as a tree knapsack problem. In: Conference on Empirical Methods in Natural Language Processing, pp. 1515–1520 (2013)Google Scholar
  11. 11.
    Jaccard, P.: Etude comparative de la distribution florale dans une portion des Alpes et du Jura. Impr. Corbaz (1901)Google Scholar
  12. 12.
    Kaplan, D., Iida, R., Tokunaga, T.: Automatic extraction of citation contexts for research paper summarization: a coreference-chain based approach. In: Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, pp. 88–95 (2009)Google Scholar
  13. 13.
    Lin, C.Y.: Rouge: a package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pp. 74–81 (2004)Google Scholar
  14. 14.
    Lin, H., Bilmes, J., Xie, S.: Graph-based submodular selection for extractive summarization. In: IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2009, pp. 381–386 (2009)Google Scholar
  15. 15.
    Lin, H., Bilmes, J.: Multi-document summarization via budgeted maximization of submodular functions. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 912–920 (2010)Google Scholar
  16. 16.
    Lin, H., Bilmes, J.: A class of submodular functions for document summarization. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 510–520 (2011)Google Scholar
  17. 17.
    Lin, H., Bilmes, J.: Learning mixtures of submodular shells with application to document summarization. In: Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, pp. 479–490 (2012)Google Scholar
  18. 18.
    McDonald, R.: A study of global inference algorithms in multi-document summarization. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECIR 2007. LNCS, vol. 4425, pp. 557–564. Springer, Heidelberg (2007). doi: 10.1007/978-3-540-71496-5_51CrossRefGoogle Scholar
  19. 19.
    McKeown, K., Klavans, J., Hatzivassiloglou, V., Barzilay, R., Eskin, E.: Towards multidocument summarization by reformulation: progress and prospects. In: Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence, pp. 453–460 (1999)Google Scholar
  20. 20.
    Mei, Q., Guo, J., Radev, D.: Divrank: the interplay of prestige and diversity in information networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1009–1018 (2010)Google Scholar
  21. 21.
    Mei, Q., Zhai, C.: Generating impact-based summaries for scientific literature. In: Proceedings of the Meeting of the Association for Computational Linguistics, pp. 816–824 (2008)Google Scholar
  22. 22.
    Mohammad, S., Dorr, B., Egan, M., Hassan, A., Muthukrishan, P., Qazvinian, V., Radev, D., Zajic, D.: Using citations to generate surveys of scientific paradigms. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 584–592 (2009)Google Scholar
  23. 23.
    Morita, H., Sasano, R., Takamura, H., Okumura, M.: Subtree extractive summarization via submodular maximization. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 1023–1032 (2013)Google Scholar
  24. 24.
    Nakov, P.I., Schwartz, A.S., Hearst, M.: Citances: citation sentences for semantic analysis of bioscience text. In: Proceedings of the SIGIR 2004 workshop on Search and Discovery in Bioinformatics, pp. 81–88 (2004)Google Scholar
  25. 25.
    Nanba, H., Okumura, M.: Towards multi-paper summarization using reference information. In: International Joint Conference on Artificial Intelligence, pp. 926–931 (1999)CrossRefGoogle Scholar
  26. 26.
    Nishikawa, H., Hirao, T., Makino, T., Matsuo, Y.: Text summarization model based on redundancy-constrained knapsack problem. In: Proceedings of COLING 2012: Posters, pp. 893–902 (2012)Google Scholar
  27. 27.
    Parveen, D., Mesgar, M., Strube, M.: Generating coherent summaries of scientific articles using coherence patterns. In: Conference on Empirical Methods in Natural Language Processing, pp. 772–783 (2016)Google Scholar
  28. 28.
    Parveen, D., Ramsl, H.M., Strube, M.: Topical coherence for graph-based extractive summarization, pp. 1949–1954 (2015)Google Scholar
  29. 29.
    Qazvinian, V., Radev, D.R.: Scientific paper summarization using citation summary networks. In: Proceedings of the 22nd International Conference on Computational Linguistics, pp. 689–696 (2008)Google Scholar
  30. 30.
    Qazvinian, V., Radev, D.R.: Identifying non-explicit citing sentences for citation-based summarization. In: Proceedings of the 48th annual meeting of the association for computational linguistics, pp. 555–564 (2010)Google Scholar
  31. 31.
    Qian, X., Liu, Y.: Fast joint compression and summarization via graph cuts. In: Conference on Empirical Methods in Natural Language Processing, pp. 1492–1502 (2013)Google Scholar
  32. 32.
    Radev, D., Allison, T., Blair-Goldensohn, S., Blitzer, J., Celebi, A., Dimitrov, S., Drabek, E., Hakim, A., Lam, W., Liu, D., et al.: Mead-a platform for multidocument multilingual text summarization (2004)Google Scholar
  33. 33.
    Radev, D.R., Muthukrishnan, P., Qazvinian, V., Abu-Jbara, A.: The ACL anthology network corpus. Lang. Resour. Eval. 47, 919–944 (2013)CrossRefGoogle Scholar
  34. 34.
    Shen, C., Li, T.: Multi-document summarization via the minimum dominating set. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 984–992 (2010)Google Scholar
  35. 35.
    Siddharthan, A., Nenkova, A., McKeown, K.: Syntactic simplification for improving content selection in multi-document summarization. In: Proceedings of the 20th international conference on Computational Linguistics, pp. 896–902 (2004)Google Scholar
  36. 36.
    Skabar, A., Abdalgader, K.: Clustering sentence-level text using a novel fuzzy relational clustering algorithm. IEEE Trans. Knowl. Data Eng. 25, 62–75 (2013)CrossRefGoogle Scholar
  37. 37.
    Takamura, H., Okumura, M.: Text summarization model based on maximum coverage problem and its variant. In: Conference of the European Chapter of the Association for Computational Linguistics, pp. 505–513 (2009)Google Scholar
  38. 38.
    Vigneshwaran, L.J.K.P.M., Sharma, M.V.V.D.M.: Non-decreasing sub-modular function for comprehensible summarization. In: Proceedings of NAACL-HLT, pp. 94–101 (2016)Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.Beijing Institute of TechnologyBeijingChina

Personalised recommendations