Multiple Document Summarization Using Text-Based Keyword Extraction

Deepak Motwani; Saxena, A. S.

doi:10.1007/978-981-10-0448-3_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 436)

Abstract

This paper compares the proposed methodology, keyword-based text extraction that uses threading and synchronization to process multiple input files as a batch, with previously used techniques for extracting text from research papers. A keyword-based summary is formed by selecting important sentences from the original text. Text summarization produces a condensed form of a document of any type, whether a pdf, doc, or txt file, and this condensed form should preserve the complete information and remain meaningful whether it is built from a single input file or from multiple input files. Maintaining summaries of a large number of documents is not an easy task for a human being. The paper explains various text summarization and text extraction techniques. The proposed technique creates the summary by extracting sentences from the original document using the font type and PDF font or a keyword extractor.
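
As an illustration only, the general idea in the abstract (select the sentences that carry the most keywords, and process several input files as a batch with threads) might be sketched as follows. This is not the authors' implementation: it assumes the documents are already available as plain text, uses simple term frequency as a stand-in for the keyword extractor, ignores the font-based cues the abstract mentions, and all names (`STOPWORDS`, `summarize`, `summarize_batch`) are hypothetical.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for",
             "it", "on", "are", "by", "from", "with", "this", "that"}


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercase alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_keywords=5, num_sentences=2):
    sentences = split_sentences(text)
    # Keywords here are simply the most frequent non-stopword terms.
    keywords = {w for w, _ in Counter(tokenize(text)).most_common(num_keywords)}
    # Score each sentence by the number of keyword occurrences it contains.
    scored = [(sum(1 for w in tokenize(s) if w in keywords), i)
              for i, s in enumerate(sentences)]
    # Keep the highest-scoring sentences; earlier sentences win ties.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:num_sentences]
    # Emit the selected sentences in their original document order.
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]


def summarize_batch(documents, max_workers=4):
    # documents maps a file name to its plain-text content.
    # Each document is summarized independently in a worker thread;
    # pool.map returns results in input order, so no extra locking is needed.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        summaries = list(pool.map(summarize, documents.values()))
    return dict(zip(documents.keys(), summaries))
```

Because each document is summarized independently, the only coordination required in this sketch is collecting the results, which `ThreadPoolExecutor.map` handles by returning them in input order. A real pipeline over pdf or doc files would need a text-extraction step before `summarize`, which is outside the scope of this sketch.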

Author information

Authors and Affiliations

Department of CSE, Mewar University, Chittorgarh, Rajasthan, India
Deepak Motwani
Faculty of Engineering & Technology, Mewar University, Chittorgarh, Rajasthan, India
A. S. Saxena

Corresponding author

Correspondence to Deepak Motwani.

Editor information

Editors and Affiliations

Dept of Applied Sci & Eng, Indian Instit of Tech Roorkee, Roorkee, India
Millie Pant
Department of Mathematics, Indian Inst of Tech Roorkee, Roorkee, India
Kusum Deep
Chankyapuri, Rm 327, South Asian Univ, Akbar Bhawan, New Delhi, India
Jagdish Chand Bansal
Department of Mathematics and Comp Sci, Liverpool Hope University, Liverpool, United Kingdom
Atulya Nagar
Department of Mathematics, National Inst of Tech Silchar, Silchar, Assam, India
Kedar Nath Das

Cite this paper

Motwani, D., Saxena, A.S. (2016). Multiple Document Summarization Using Text-Based Keyword Extraction. In: Pant, M., Deep, K., Bansal, J., Nagar, A., Das, K. (eds) Proceedings of Fifth International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 436. Springer, Singapore. https://doi.org/10.1007/978-981-10-0448-3_15

DOI: https://doi.org/10.1007/978-981-10-0448-3_15
Published: 15 March 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0447-6
Online ISBN: 978-981-10-0448-3
eBook Packages: Engineering, Engineering (R0)
