Abstract
This paper compares the proposed methodology, keyword-based text extraction that uses threading and synchronization to process multiple input files as a batch, with techniques previously used for text extraction from research papers. A keyword-based summary is built by selecting the important sentences from the original text. Text summarization produces a condensed form of a document of any type (PDF, DOC, or TXT), and this condensed form should preserve the complete information and remain meaningful whether the input is a single file or multiple files. Maintaining summaries of a large number of documents by hand is not an easy task. The paper reviews various text summarization and text extraction techniques. The proposed technique creates the summary by extracting sentences from the original document using the font type, the PDF font, or a keyword extractor.
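As a rough illustration of the idea described in the abstract, the following Python sketch scores sentences by how many frequent keywords they contain and summarizes several documents in parallel with a thread pool. It is not the authors' implementation: the function names, the stopword list, the frequency-based keyword choice, and the parameters `k`, `n_sentences`, and `max_workers` are all assumptions made for this example. The input is assumed to be plain text that has already been extracted from the PDF, DOC, or TXT files, and the font-based selection mentioned in the abstract is not modeled.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = frozenset({
    "the", "a", "an", "of", "and", "is", "in", "to", "for", "on",
    "it", "this", "that", "with", "as", "by", "are", "be",
})


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def top_keywords(text, k=5):
    """Return the k most frequent non-stopword tokens as a set."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return {word for word, _ in Counter(words).most_common(k)}


def summarize(text, k=5, n_sentences=2):
    """Keep the n_sentences sentences with the most keyword hits, in original order."""
    keywords = top_keywords(text, k)
    sentences = split_sentences(text)
    scored = [
        (sum(1 for w in re.findall(r"[a-z]+", s.lower()) if w in keywords), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:n_sentences]
    return " ".join(sentences[i] for i in sorted(i for _, i in best))


def summarize_batch(documents, max_workers=4):
    """Summarize a {name: text} mapping, one document per worker thread."""
    names = list(documents)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so no explicit locking is needed here.
        results = pool.map(summarize, (documents[name] for name in names))
        return dict(zip(names, results))
```

Each worker in this sketch handles one whole document and shares no mutable state with the others, so the executor's ordered result collection is the only synchronization involved. A design that had threads write into a shared structure would need an explicit lock.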
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Motwani, D., Saxena, A.S. (2016). Multiple Document Summarization Using Text-Based Keyword Extraction. In: Pant, M., Deep, K., Bansal, J., Nagar, A., Das, K. (eds) Proceedings of Fifth International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 436. Springer, Singapore. https://doi.org/10.1007/978-981-10-0448-3_15
DOI: https://doi.org/10.1007/978-981-10-0448-3_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0447-6
Online ISBN: 978-981-10-0448-3
eBook Packages: Engineering, Engineering (R0)