Extractive Document Summarization using Non-negative Matrix Factorization

Khurana, Alka; Bhatnagar, Vasudha

doi:10.1007/978-3-030-27618-8_6

Conference paper
First Online: 06 August 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11707)

Abstract

The effectiveness of Non-negative Matrix Factorization (NMF) in mining the latent semantic structure of text has motivated its use for single-document summarization. The initial promise shown by the method motivates further research to advance the state of the art.

In this paper, we propose two methods to improve the performance of the NMF-based document summarization method, which mines important sentences from the text to construct a summary. We use the Non-negative Double Singular Value Decomposition (NNDSVD) method to initialize the NMF factor matrices, which yields stable summaries and improves their quality. Next, we propose two novel sentence scoring methods that use the parts-based representation of text obtained after NMF decomposition. Both variations exploit the information contained in the feature and coefficient matrices to improve summary quality. The quality of the summaries produced by the proposed methods is evaluated on four public data-sets using the standard ROUGE tool.
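The pipeline outlined above (NNDSVD initialization, NMF decomposition, sentence scoring from the factor matrices) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `nndsvd_init`, `nmf` and `score_sentences`, the toy matrix, the rank `k = 2` and the iteration count are all hypothetical. The solver uses the standard multiplicative updates of Lee and Seung, which the abstract does not confirm as the one used in the paper. The scoring rule is a simple stand-in and is not one of the two scoring methods the paper proposes.

```python
import numpy as np

def nndsvd_init(A, k):
    """SVD-based, deterministic initialization in the spirit of NNDSVD."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    W = np.zeros((A.shape[0], k))
    H = np.zeros((k, A.shape[1]))
    # For a non-negative matrix, the leading singular vectors can be taken non-negative.
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])
    for j in range(1, k):
        x, y = U[:, j], Vt[j, :]
        # Split each singular vector into its positive and negative parts.
        xp, yp = np.maximum(x, 0), np.maximum(y, 0)
        xn, yn = np.maximum(-x, 0), np.maximum(-y, 0)
        mp = np.linalg.norm(xp) * np.linalg.norm(yp)
        mn = np.linalg.norm(xn) * np.linalg.norm(yn)
        # Keep whichever pair of parts carries more of the rank-one term.
        u, v, sigma = (xp, yp, mp) if mp >= mn else (xn, yn, mn)
        if sigma > 0:
            scale = np.sqrt(S[j] * sigma)
            W[:, j] = scale * u / np.linalg.norm(u)
            H[j, :] = scale * v / np.linalg.norm(v)
    return W, H

def nmf(A, k, n_iter=200, eps=1e-9):
    """Factor A (terms x sentences) as W @ H using multiplicative updates."""
    W, H = nndsvd_init(A, k)
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def score_sentences(H):
    """Illustrative score (an assumption): weight each latent feature by its
    share of the total mass in H, then sum the weighted coefficients per sentence."""
    feature_weight = H.sum(axis=1) / H.sum()
    return feature_weight @ H

# Toy binary term-sentence matrix: 5 terms (rows) x 4 sentences (columns).
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

W, H = nmf(A, k=2)
scores = score_sentences(H)
ranking = np.argsort(-scores)  # sentence indices, highest score first
```

Because the initialization has no random component, repeated runs on the same matrix return the same factors, which is the kind of stability the abstract attributes to NNDSVD. Random initialization would generally not have this property.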

The proposed method is unsupervised, agnostic to the language of the document, and does not use external knowledge. It is also generic, independent of domain and collection. These features of NMF-based summarization, along with the additional advantage of speed, make our method a strong candidate for an online extractive summarization tool.

Notes

1. We apologize for the forward reference to the data-set overview and the metric description in Sect. 5.
2. We create a binary incidence term-sentence matrix (A) for NMF decomposition.
3. http://duc.nist.gov.
4. CNN and DailyMail corpora contain news articles and were originally constructed by [16] for the task of passage-based question answering, and later re-purposed for the task of document summarization.
5. However, we experimented on 533 unique documents from the data-set.
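
The binary incidence term-sentence matrix mentioned in Note 2 could be built along the following lines. This is a sketch under assumptions: the helper name `term_sentence_matrix`, the regex-based sentence splitting and tokenization, and the absence of stop-word removal or stemming are all illustrative choices, since the notes do not describe the preprocessing used in the paper.

```python
import re

def term_sentence_matrix(text):
    """Return (A, terms, sentences), where A[i][j] is 1 if term i occurs in sentence j."""
    # Naive sentence split on whitespace that follows ., ! or ? (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Lowercased alphanumeric tokens; a set per sentence, since only presence matters.
    tokenized = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    terms = sorted(set().union(*tokenized))
    A = [[1 if term in tokens else 0 for tokens in tokenized] for term in terms]
    return A, terms, sentences

text = "NMF factors a matrix. The matrix holds terms. Terms occur in sentences."
A, terms, sentences = term_sentence_matrix(text)
```

Rows correspond to terms and columns to sentences, so each column is the binary term profile of one sentence.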

References

1. Al-Sabahi, K., Zuping, Z., Nadher, M.: A hierarchical structured self-attentive model for extractive document summarization (HSSAS). IEEE Access 6, 24205–24212 (2018)
2. Alguliev, R.M., Aliguliyev, R.M., Hajirahimova, M.S., Mehdiyev, C.A.: MCMR: maximum coverage and minimum redundant text summarization model. Expert Syst. Appl. 38(12), 14514–14522 (2011)
3. Alguliyev, R.M., Aliguliyev, R.M., Isazade, N.R., Abdi, A., Idris, N.: COSUM: text summarization based on clustering and optimization. Expert Syst. 36(1), e12340 (2019)
4. Aliguliyev, R.M.: A new sentence similarity measure and sentence based extractive technique for automatic text summarization. Expert Syst. Appl. 36(4), 7764–7772 (2009)
5. Belford, M., Mac Namee, B., Greene, D.: Stability of topic modeling via matrix factorization. Expert Syst. Appl. 91, 159–169 (2018)
6. Boutsidis, C., Gallopoulos, E.: SVD based initialization: a head start for nonnegative matrix factorization. Pattern Recogn. 41(4), 1350–1362 (2008)
7. Cheng, J., Lapata, M.: Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252 (2016)
8. Conroy, J.M., O’Leary, D.P.: Text summarization via hidden Markov models. In: 24th ACM SIGIR, pp. 406–407. ACM (2001)
9. Dong, Y., Shen, Y., Crawford, E., van Hoof, H., Cheung, J.C.K.: BanditSum: extractive summarization as a contextual bandit. arXiv:1809.09672 (2018)
10. Edmundson, H.P.: New methods in automatic extracting. J. ACM (JACM) 16(2), 264–285 (1969)
11. Fang, C., Mu, D., Deng, Z., Wu, Z.: Word-sentence co-ranking for automatic extractive text summarization. Expert Syst. Appl. 72, 189–195 (2017)
12. Fattah, M.A., Ren, F.: GA, MR, FFNN, PNN and GMM based models for automatic text summarization. Comput. Speech Lang. 23(1), 126–144 (2009)
13. Genest, P.E., Lapalme, G.: Framework for abstractive summarization using text-to-text generation. In: Proceedings of the Workshop on Monolingual Text-To-Text Generation, pp. 64–73. Association for Computational Linguistics (2011)
14. Gong, Y., Liu, X.: Generic text summarization using relevance measure and latent semantic analysis. In: 24th ACM SIGIR, pp. 19–25. ACM (2001)
15. He, Z., et al.: Document summarization based on data reconstruction. In: AAAI (2012)
16. Hermann, K.M., et al.: Teaching machines to read and comprehend. In: Advances in Neural Information Processing Systems, pp. 1693–1701 (2015)
17. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788 (1999)
18. Lee, J.H., Park, S., Ahn, C.M., Kim, D.: Automatic generic document summarization based on non-negative matrix factorization. Inform. Process. Manage. 45(1), 20–34 (2009)
19. Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. Text Summarization Branches Out (2004)
20. Lloret, E., Romá-Ferri, M.T., Palomar, M.: COMPENDIUM: a text summarization system for generating abstracts of research papers. Data Knowl. Eng. 88, 164–175 (2013)
21. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)
22. Mihalcea, R., Tarau, P.: Textrank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (2004)
23. Moawad, I.F., Aref, M.: Semantic graph reduction approach for abstractive text summarization. In: ICCES 2012, pp. 132–138. IEEE (2012)
24. Nallapati, R., Zhai, F., Zhou, B.: SummaRuNNer: a recurrent neural network based sequence model for extractive summarization of documents. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)
25. Nallapati, R., Zhou, B., Ma, M.: Classify or select: neural architectures for extractive document summarization. arXiv:1611.04244 (2016)
26. Narayan, S., Cohen, S.B., Lapata, M.: Ranking sentences for extractive summarization with reinforcement learning. arXiv preprint arXiv:1802.08636 (2018)
27. Parveen, D., Ramsl, H.M., Strube, M.: Topical coherence for graph-based extractive summarization. In: Proceedings of the 2015 EMNLP, pp. 1949–1954 (2015)
28. Qiang, J., Li, Y., Yuan, Y., Liu, W.: Snapshot ensembles of non-negative matrix factorization for stability of topic modeling. Appl. Intell. 48, 1–13 (2018)
29. Shen, D., Sun, J.T., Li, H., Yang, Q., Chen, Z.: Document summarization using conditional random fields. In: IJCAI, vol. 7, pp. 2862–2867 (2007)
30. Steinberger, J., Ježek, K.: Text summarization and singular value decomposition. In: Yakhno, T. (ed.) ADVIS 2004. LNCS, vol. 3261, pp. 245–254. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30198-1_25
31. Vikas, O., Meshram, A.K., Meena, G., Gupta, A.: Multiple document summarization using principal component analysis incorporating semantic vector space model. IJCLCLP 13(2), 141–156 (2008)
32. Wan, X.: Towards a unified approach to simultaneous single-document and multi-document summarizations. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 1137–1145. ACL (2010)
33. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: 26th ACM SIGIR, pp. 267–273. ACM (2003)
34. Yang, K., Al-Sabahi, K., Xiang, Y., Zhang, Z.: An integrated graph model for document summarization. Information 9(9), 232 (2018)
35. Yao, K., Zhang, L., Luo, T., Wu, Y.: Deep reinforcement learning for extractive document summarization. Neurocomputing 284, 52–62 (2018)
36. Zhou, Q., Yang, N., Wei, F., Huang, S., Zhou, M., Zhao, T.: Neural document summarization by jointly learning to score and select sentences. arXiv:1807.02305 (2018)

Author information

Authors and Affiliations

Department of Computer Science, University of Delhi, Delhi, India
Alka Khurana & Vasudha Bhatnagar

Authors

Alka Khurana
Vasudha Bhatnagar

Corresponding author

Correspondence to Alka Khurana.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
The University of Texas at Arlington, Arlington, TX, USA
Sharma Chakravarthy
Johannes Kepler University of Linz, Linz, Austria
Gabriele Anderst-Kotsis
Software Competence Center Hagenberg, Hagenberg im Mühlkreis, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Khurana, A., Bhatnagar, V. (2019). Extractive Document Summarization using Non-negative Matrix Factorization. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11707. Springer, Cham. https://doi.org/10.1007/978-3-030-27618-8_6

DOI: https://doi.org/10.1007/978-3-030-27618-8_6
Published: 06 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27617-1
Online ISBN: 978-3-030-27618-8
eBook Packages: Computer Science, Computer Science (R0)