A New Automatic Multi-document Text Summarization using Topic Modeling

  • Rajendra Kumar Roul
  • Samarth Mehrotra
  • Yash Pungaliya
  • Jajati Keshari Sahoo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11319)


This paper proposes a novel methodology for generating an extractive summary from a corpus of documents. Unlike most existing methods, our approach is designed so that the final summary covers all the important topics in the corpus. We propose a heuristic method that uses Latent Dirichlet Allocation to identify the optimum number of independent topics present in the corpus. For each independent topic, the important sentences are identified using a set of word-level and sentence-level features. To ensure that the final summary is coherent, we suggest a novel technique that reorders the sentences based on sentence similarity. The use of topic modeling ensures that all the important content from the corpus is captured in the extracted summary, which in turn strengthens the summary. Experimental results show that the proposed approach is promising.
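The reordering step can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure or ordering procedure the authors use, so everything here is an assumption made for illustration: bag-of-words cosine similarity, a greedy chain that starts from the first selected sentence, and the hypothetical function names `bag_of_words`, `cosine`, and `reorder_by_similarity`. It is not the authors' technique.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts for one sentence (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reorder_by_similarity(sentences):
    """Greedy chain (an assumption, not the paper's procedure):
    keep the first sentence, then repeatedly append the unused
    sentence most similar to the one just placed."""
    if not sentences:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    order = [0]
    remaining = list(range(1, len(sentences)))
    while remaining:
        last = vectors[order[-1]]
        # max() returns the first maximal candidate, so ties keep input order.
        best = max(remaining, key=lambda i: cosine(last, vectors[i]))
        order.append(best)
        remaining.remove(best)
    return [sentences[i] for i in order]
```

Called on the sentences selected for the summary, `reorder_by_similarity` returns them with lexically related sentences placed next to each other. A greedy chain is only one possible design; it is cheap to compute but depends on which sentence is placed first.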


Extractive, Multi-document, ROUGE, Summarization, Topic modeling


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Rajendra Kumar Roul (1, corresponding author)
  • Samarth Mehrotra (2)
  • Yash Pungaliya (2)
  • Jajati Keshari Sahoo (3)
  1. Department of Computer Science, Thapar Institute of Engineering and Technology, Patiala, India
  2. Department of Computer Science, BITS-Pilani, Pilani, India
  3. Department of Mathematics, BITS-Pilani, Pilani, India
