Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs

  • Feiyue Ye (叶飞跃)
  • Xinchen Xu (徐欣辰)


As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a sentence-word two-layer graph algorithm combined with keyword density to generate multi-document summaries, called Graph & Keywordρ. Traditional graph-based methods for multi-document summarization consider the influence of sentences and words only across the whole document collection, not within individual documents. We therefore construct multiple word graphs and extract appropriate keywords from each document, and use them to modify the sentence graph and to improve the significance and richness of the summary. In addition, because words differ in importance across documents, we use keyword density so that the summaries provide rich content with a small number of words. The experimental results show that the Graph & Keywordρ method outperforms state-of-the-art systems when tested on the DUC 2004 data set.
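To make the idea of keyword density concrete, the following is a minimal sketch under stated assumptions, not the method of the paper. The abstract does not define the density formula or the word-graph ranking, so this sketch assumes density is the share of a sentence's tokens that are keywords, and it substitutes plain term frequency for the per-document word graph. All function names (`tokenize`, `top_keywords`, `keyword_density`, `rank_by_density`) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(document, k=3, stopwords=frozenset()):
    """Pick the k most frequent non-stopword tokens of ONE document.

    Assumption: term frequency stands in for the per-document word
    graph ranking, which the abstract does not specify.
    """
    counts = Counter(w for w in tokenize(document) if w not in stopwords)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}


def keyword_density(sentence, keywords):
    """Assumed definition: keyword tokens divided by all tokens."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(1 for w in words if w in keywords) / len(words)


def rank_by_density(documents, k=3, stopwords=frozenset()):
    """Order all sentences by density against per-document keywords."""
    keywords = set()
    for doc in documents:
        # Keywords are extracted document by document, then pooled.
        keywords |= top_keywords(doc, k, stopwords)
    sentences = [
        s.strip()
        for doc in documents
        for s in re.split(r"(?<=[.!?])\s+", doc)
        if s.strip()
    ]
    return sorted(
        sentences, key=lambda s: keyword_density(s, keywords), reverse=True
    )


if __name__ == "__main__":
    docs = [
        "Floods hit the city. Floods damaged many homes in the city.",
        "Rescue teams reached the city. Rescue work continued overnight.",
    ]
    stop = frozenset({"the", "in", "many"})
    for sentence in rank_by_density(docs, k=2, stopwords=stop):
        print(sentence)
```

A short sentence made up mostly of keywords scores higher than a long sentence that mentions them once, which matches the stated goal of rich content in few words. A full system would combine such a score with the sentence graph ranking rather than use it alone.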

Key words

multi-document; graph algorithm; keyword density; Graph & Keywordρ; DUC 2004

CLC number

TP 391 

Document code





Copyright information

© Shanghai Jiaotong University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Engineering and Science, Shanghai University, Shanghai, China
