Abstract
Document classification is one of the major NLP tasks that facilitate the mining of text data and the retrieval of relevant information. Most existing work builds the classification model from pre-computed features. Large-scale document classification depends on how efficiently and appropriately features are selected for document representation. The proposed system uses text summarization as automated feature selection for building the classification model. This work treats feature selection as a sentence extraction task, which can be performed using extractive text summarization. The method has the advantage of a reduced feature space, since the classifier is trained on a summary that is shorter than the original document. In addition, deep learning-based summarization generates the most relevant features, which improves the efficiency and accuracy of the classifier. Experiments showed that classification based on features generated using deep learning provides better classification accuracy.
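The idea of summarizing first and classifying second can be illustrated with a minimal sketch. This is not the chapter's implementation: a simple word-frequency sentence scorer stands in for the deep learning-based summarizer, and a small multinomial naive Bayes model stands in for the classifier. The names `summarize`, `NaiveBayes`, `STOPWORDS`, and the toy training documents are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stopword list, not taken from the chapter.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on", "for"}


def tokenize(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(document, k=2):
    """Extractive summary: keep the k sentences with the highest average word frequency.

    Assumption: frequency scoring replaces the deep learning sentence ranker.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(tokenize(document))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))  # restore document order


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing, trained on summaries."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n / total)
            for w in tokenize(text):
                if w in self.vocab:  # words unseen in training are ignored
                    logp += math.log((counts[w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Toy data (hypothetical): each document is reduced to a summary before training.
train = [
    ("The team won the match. The striker scored a late goal. "
     "Fans left the stadium happy. The coach praised the team.", "sport"),
    ("The bank raised interest rates. Markets fell after the bank statement. "
     "Investors sold shares. Analysts expect rates to rise again.", "finance"),
]
summaries = [summarize(doc) for doc, _ in train]
clf = NaiveBayes().fit(summaries, [label for _, label in train])
print(clf.predict(summarize("The striker scored a goal. The team celebrated.")))
```

Because the classifier only ever sees the extracted sentences, its vocabulary is limited to words that survive summarization, which is the reduced feature space the abstract refers to.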
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Assainar Hafnan, P.P., Mohan, A. (2018). Summary-Based Document Classification. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-10-8633-5_16
DOI: https://doi.org/10.1007/978-981-10-8633-5_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8632-8
Online ISBN: 978-981-10-8633-5
eBook Packages: Engineering, Engineering (R0)