Extractive Document Summarization using Non-negative Matrix Factorization

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11707)

Abstract

The effectiveness of Non-negative Matrix Factorization (NMF) in mining the latent semantic structure of text has motivated its use for single-document summarization. The initial promise shown by the method invites further research to advance the state of the art.

In this paper, we propose two methods to improve the performance of NMF-based document summarization, which mines important sentences from the text to construct a summary. We use the Non-negative Double Singular Value Decomposition (NNDSVD) method to initialize the NMF factor matrices, which yields stable summaries and improves their quality. Next, we propose two novel sentence scoring methods that use the parts-based representation of text obtained after NMF decomposition. Both variations exploit information contained in the feature and coefficient matrices to improve summary quality. The quality of the summaries produced by the proposed methods is evaluated on four public data-sets using the standard ROUGE tool.

The proposed method is unsupervised, agnostic to the language of the document, and does not use external knowledge. It is also generic, independent of domain and collection. These features of NMF-based summarization, along with the additional advantage of speed, make our method a strong candidate for an online extractive summarization tool.
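
The pipeline outlined in the abstract can be sketched in a few lines of Python. What follows is a minimal illustration, not the authors' implementation. It assumes NumPy, builds initial factors following the NNDSVD scheme of Boutsidis and Gallopoulos [6], refines them with the well-known multiplicative update rules for NMF, and ranks sentences with a simple placeholder score. The function names (`nndsvd`, `nmf`, `rank_sentences`), the toy matrix, the iteration count, and the scoring rule are all assumptions made for illustration; the two scoring methods proposed in the paper are not specified in this abstract.

```python
import numpy as np


def nndsvd(A, k):
    """Deterministic NNDSVD-style initial factors W (m x k) and H (k x n)."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    m, n = A.shape
    W = np.zeros((m, k))
    H = np.zeros((k, n))
    # Leading singular pair of a non-negative matrix: use absolute values.
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])
    for j in range(1, k):
        x, y = U[:, j], Vt[j, :]
        xp, xn = np.maximum(x, 0), np.maximum(-x, 0)
        yp, yn = np.maximum(y, 0), np.maximum(-y, 0)
        mp = np.linalg.norm(xp) * np.linalg.norm(yp)
        mn = np.linalg.norm(xn) * np.linalg.norm(yn)
        # Keep whichever sign section (positive or negative) is larger.
        u, v, sigma = (xp, yp, mp) if mp >= mn else (xn, yn, mn)
        if sigma > 0:
            scale = np.sqrt(S[j] * sigma)
            W[:, j] = scale * u / np.linalg.norm(u)
            H[j, :] = scale * v / np.linalg.norm(v)
    return W, H


def nmf(A, k, n_iter=200, eps=1e-9):
    """Factor A ~ W @ H with multiplicative updates (Frobenius objective)."""
    W, H = nndsvd(A, k)
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def rank_sentences(H):
    """Placeholder score (assumption): topic-weighted sum of coefficients."""
    topic_weight = H.sum(axis=1) / H.sum()
    scores = topic_weight @ H
    return np.argsort(-scores, kind="stable")


# Toy binary term-sentence matrix: 6 terms (rows) x 4 sentences (columns).
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
], dtype=float)

W, H = nmf(A, k=2)
order = rank_sentences(H)
print("Sentence indices, best first:", order.tolist())
```

Because the initial factors come from an SVD rather than from random draws, repeated runs on the same matrix return the same factors and therefore the same ranking, which is the stability property the abstract attributes to NNDSVD initialization.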


Notes

  1. We apologize for the forward reference to the data-set overview and the metric description in Sect. 5.

  2. We create a binary incidence term-sentence matrix (A) for NMF decomposition.

  3. http://duc.nist.gov.

  4. The CNN and DailyMail corpora contain news articles and were originally constructed by [16] for the task of passage-based question answering, and later re-purposed for the task of document summarization.

  5. However, we experimented on 533 unique documents from the data-set.
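
The binary incidence term-sentence matrix mentioned in Note 2 can be illustrated with a short sketch. This is only one plausible construction under stated assumptions: the helper name `binary_term_sentence_matrix`, the lowercase alphanumeric tokenizer, and the example sentences are hypothetical and are not taken from the paper, which may tokenize and filter terms differently.

```python
import re


def binary_term_sentence_matrix(sentences):
    """Return (terms, A) where A[i][j] is 1 if term i occurs in sentence j."""
    # Assumed tokenizer: lowercase runs of ASCII letters and digits.
    tokenized = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    terms = sorted(set().union(*tokenized))
    # Rows are terms, columns are sentences; entries record presence only.
    A = [[1 if term in tokens else 0 for tokens in tokenized] for term in terms]
    return terms, A


sentences = [
    "NMF finds latent topics.",
    "Topics guide sentence selection.",
    "NMF is fast.",
]
terms, A = binary_term_sentence_matrix(sentences)
print(A[terms.index("nmf")])     # [1, 0, 1]
print(A[terms.index("topics")])  # [1, 1, 0]
```

Each entry records only whether a term is present in a sentence, not how often it occurs, so repeated words within a sentence do not change the matrix.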

References

  1. Al-Sabahi, K., Zuping, Z., Nadher, M.: A hierarchical structured self-attentive model for extractive document summarization (HSSAS). IEEE Access 6, 24205–24212 (2018)

  2. Alguliev, R.M., Aliguliyev, R.M., Hajirahimova, M.S., Mehdiyev, C.A.: MCMR: maximum coverage and minimum redundant text summarization model. Expert Syst. Appl. 38(12), 14514–14522 (2011)

  3. Alguliyev, R.M., Aliguliyev, R.M., Isazade, N.R., Abdi, A., Idris, N.: COSUM: text summarization based on clustering and optimization. Expert Syst. 36(1), e12340 (2019)

  4. Aliguliyev, R.M.: A new sentence similarity measure and sentence based extractive technique for automatic text summarization. Expert Syst. Appl. 36(4), 7764–7772 (2009)

  5. Belford, M., Mac Namee, B., Greene, D.: Stability of topic modeling via matrix factorization. Expert Syst. Appl. 91, 159–169 (2018)

  6. Boutsidis, C., Gallopoulos, E.: SVD based initialization: a head start for nonnegative matrix factorization. Pattern Recogn. 41(4), 1350–1362 (2008)

  7. Cheng, J., Lapata, M.: Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252 (2016)

  8. Conroy, J.M., O’Leary, D.P.: Text summarization via hidden Markov models. In: 24th ACM SIGIR, pp. 406–407. ACM (2001)

  9. Dong, Y., Shen, Y., Crawford, E., van Hoof, H., Cheung, J.C.K.: BanditSum: extractive summarization as a contextual bandit. arXiv:1809.09672 (2018)

  10. Edmundson, H.P.: New methods in automatic extracting. J. ACM (JACM) 16(2), 264–285 (1969)

  11. Fang, C., Mu, D., Deng, Z., Wu, Z.: Word-sentence co-ranking for automatic extractive text summarization. Expert Syst. Appl. 72, 189–195 (2017)

  12. Fattah, M.A., Ren, F.: GA, MR, FFNN, PNN and GMM based models for automatic text summarization. Comput. Speech Lang. 23(1), 126–144 (2009)

  13. Genest, P.E., Lapalme, G.: Framework for abstractive summarization using text-to-text generation. In: Proceedings of the Workshop on Monolingual Text-To-Text Generation, pp. 64–73. Association for Computational Linguistics (2011)

  14. Gong, Y., Liu, X.: Generic text summarization using relevance measure and latent semantic analysis. In: 24th ACM SIGIR, pp. 19–25. ACM (2001)

  15. He, Z., et al.: Document summarization based on data reconstruction. In: AAAI (2012)

  16. Hermann, K.M., et al.: Teaching machines to read and comprehend. In: Advances in Neural Information Processing Systems, pp. 1693–1701 (2015)

  17. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788 (1999)

  18. Lee, J.H., Park, S., Ahn, C.M., Kim, D.: Automatic generic document summarization based on non-negative matrix factorization. Inform. Process. Manage. 45(1), 20–34 (2009)

  19. Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. Text Summarization Branches Out (2004)

  20. Lloret, E., Romá-Ferri, M.T., Palomar, M.: COMPENDIUM: a text summarization system for generating abstracts of research papers. Data Knowl. Eng. 88, 164–175 (2013)

  21. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)

  22. Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (2004)

  23. Moawad, I.F., Aref, M.: Semantic graph reduction approach for abstractive text summarization. In: ICCES 2012, pp. 132–138. IEEE (2012)

  24. Nallapati, R., Zhai, F., Zhou, B.: SummaRuNNer: a recurrent neural network based sequence model for extractive summarization of documents. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)

  25. Nallapati, R., Zhou, B., Ma, M.: Classify or select: neural architectures for extractive document summarization. arXiv:1611.04244 (2016)

  26. Narayan, S., Cohen, S.B., Lapata, M.: Ranking sentences for extractive summarization with reinforcement learning. arXiv preprint arXiv:1802.08636 (2018)

  27. Parveen, D., Ramsl, H.M., Strube, M.: Topical coherence for graph-based extractive summarization. In: Proceedings of the 2015 EMNLP, pp. 1949–1954 (2015)

  28. Qiang, J., Li, Y., Yuan, Y., Liu, W.: Snapshot ensembles of non-negative matrix factorization for stability of topic modeling. Appl. Intell. 48, 1–13 (2018)

  29. Shen, D., Sun, J.T., Li, H., Yang, Q., Chen, Z.: Document summarization using conditional random fields. In: IJCAI, vol. 7, pp. 2862–2867 (2007)

  30. Steinberger, J., Ježek, K.: Text summarization and singular value decomposition. In: Yakhno, T. (ed.) ADVIS 2004. LNCS, vol. 3261, pp. 245–254. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30198-1_25

  31. Vikas, O., Meshram, A.K., Meena, G., Gupta, A.: Multiple document summarization using principal component analysis incorporating semantic vector space model. IJCLCLP 13(2), 141–156 (2008)

  32. Wan, X.: Towards a unified approach to simultaneous single-document and multi-document summarizations. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 1137–1145. ACL (2010)

  33. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: 26th ACM SIGIR, pp. 267–273. ACM (2003)

  34. Yang, K., Al-Sabahi, K., Xiang, Y., Zhang, Z.: An integrated graph model for document summarization. Information 9(9), 232 (2018)

  35. Yao, K., Zhang, L., Luo, T., Wu, Y.: Deep reinforcement learning for extractive document summarization. Neurocomputing 284, 52–62 (2018)

  36. Zhou, Q., Yang, N., Wei, F., Huang, S., Zhou, M., Zhao, T.: Neural document summarization by jointly learning to score and select sentences. arXiv:1807.02305 (2018)

Author information

Corresponding author

Correspondence to Alka Khurana.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Khurana, A., Bhatnagar, V. (2019). Extractive Document Summarization using Non-negative Matrix Factorization. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11707. Springer, Cham. https://doi.org/10.1007/978-3-030-27618-8_6

  • DOI: https://doi.org/10.1007/978-3-030-27618-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27617-1

  • Online ISBN: 978-3-030-27618-8

  • eBook Packages: Computer Science, Computer Science (R0)
