Summary-Based Document Classification

Assainar Hafnan, P. P.; Mohan, Anuraj

doi:10.1007/978-981-10-8633-5_16

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 709)

Abstract

Document classification is one of the major NLP tasks that facilitate text mining and the retrieval of relevant information. Most existing work uses pre-computed features to build the classification model. Large-scale document classification depends on how efficient and appropriate the feature selection used for document representation is. The proposed system uses text summarization as automated feature selection for building the classification model. This work treats feature selection as a sentence extraction task, which can be performed with extractive text summarization. The method has the advantage of a reduced feature space, since the classifier is trained on a summary that is shorter than the original document. In addition, deep learning-based summarization generates the most relevant features, which improves the efficiency and accuracy of the classifier. Experiments showed that classification based on features generated using deep learning provides better classification accuracy.
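
The pipeline described in the abstract (summarize each document by sentence extraction, then train the classifier on the summaries) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: a simple word-frequency sentence scorer stands in for the deep learning-based summarizer, a small multinomial naive Bayes model stands in for the classifier, and the function names, stopword list, and toy documents are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not from the chapter).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "for", "on", "was", "with", "by"}


def tokenize(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(document, k=2):
    """Keep the k sentences with the highest average word frequency,
    in their original order. This frequency-based scorer is a stand-in
    for the learned summarizer the abstract refers to."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(tokenize(document))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        words = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)
            logp += sum(math.log((counts[w] + 1) / denom) for w in words)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Toy corpus (hypothetical); the last sentence of each document is off-topic.
train_docs = [
    ("The team won the match. The striker scored two goals in the match. "
     "Fans left early because of rain.", "sports"),
    ("The coach praised the team. The team trained hard before the final match. "
     "Tickets sold out.", "sports"),
    ("The bank raised interest rates. Higher rates slow lending by the bank. "
     "The lobby was repainted.", "finance"),
    ("Investors sold shares as the market fell. The market fell after rates rose. "
     "Lunch was served.", "finance"),
]

# Train on summaries rather than full documents, so the feature space is smaller.
summaries = [summarize(doc) for doc, _ in train_docs]
labels = [label for _, label in train_docs]
model = NaiveBayes().fit(summaries, labels)

test_doc = ("The goalkeeper saved the team in the match. "
            "The team celebrated after the match. Traffic was heavy.")
test_summary = summarize(test_doc)
prediction = model.predict(test_summary)
print(test_summary)
print(prediction)
```

In this sketch the summarizer drops the off-topic sentence from each document before training, which is the sense in which summarization acts as feature selection; a learned summarizer would replace `summarize` without changing the rest of the pipeline.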

Author information

Authors and Affiliations

Department of Computer Science and Engineering, NSS Engineering College, Palakkad, India
P. P. Assainar Hafnan & Anuraj Mohan

Corresponding author

Correspondence to P. P. Assainar Hafnan.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Pankaj Kumar Sa
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Sambit Bakshi
Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
Ioannis K. Hatzilygeroudis
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Manmath Narayan Sahoo

About this paper

Cite this paper

Assainar Hafnan, P.P., Mohan, A. (2018). Summary-Based Document Classification. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-10-8633-5_16

DOI: https://doi.org/10.1007/978-981-10-8633-5_16
Published: 04 November 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8632-8
Online ISBN: 978-981-10-8633-5
eBook Packages: Engineering, Engineering (R0)
