Abstract
The volume of information on the web keeps growing, and its varying dimensionality makes it difficult for users to filter and analyse efficiently, because it contains many redundant and irrelevant terms. Managing, filtering and organizing such large datasets requires text documents to be classified. Text classification is the process of assigning text documents to predefined categories based on their content. The aim of this paper is to explore Cuckoo search optimization (CSO), which is derived from the behaviour of cuckoo birds, and to modify it for the selection of relevant features. The revised algorithm, named the modified Cuckoo search (MCS) optimization algorithm, can prove useful for developing an efficient text classification system. The proposed method combines the ability of MCS with the sharpness of the Naive Bayes Multinomial (NBM) algorithm to generate a suitable feature set, which increases the success rate. The approach is tested on 9000 text documents covering eight different domains, fetched from several web sources, and obtains encouraging results. A comparison with other well-known text classification approaches shows the effectiveness of the proposed approach as an automatic Bangla text classification system.
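The abstract does not spell out how MCS differs from standard cuckoo search, so the following is only a minimal sketch of the general idea: a plain cuckoo search with Levy flights, wrapped around a small multinomial Naive Bayes classifier, used to pick a subset of terms. Everything here is an assumption made for illustration and not the authors' method: the function names (`train_nb`, `predict_nb`, `fitness`, `levy_step`, `cuckoo_select`), the parameter values, the rule that a term is kept when its nest coordinate is positive, and the toy English corpus (which is scored on its own training data purely to keep the example short).

```python
import math
import random
from collections import Counter


def train_nb(docs, labels, features):
    """Fit a multinomial Naive Bayes model restricted to `features`."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(t for t in doc if t in features)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(features)  # Laplace smoothing
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in features}
    return prior, loglik


def predict_nb(model, doc):
    """Return the class with the highest log-posterior for one document."""
    prior, loglik = model
    scores = {c: prior[c] + sum(loglik[c][t] for t in doc if t in loglik[c])
              for c in prior}
    return max(scores, key=scores.get)


def fitness(mask, vocab, train, test):
    """Accuracy of Naive Bayes when only the terms selected by `mask` are used."""
    features = {t for t, keep in zip(vocab, mask) if keep}
    if not features:
        return 0.0
    model = train_nb(train[0], train[1], features)
    hits = sum(predict_nb(model, d) == y for d, y in zip(*test))
    return hits / len(test[1])


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.gauss(0, sigma), rng.gauss(0, 1)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_select(vocab, train, test, n_nests=8, n_iter=30, pa=0.25, seed=0):
    """Search for a term subset; a term is kept when its coordinate is positive."""
    rng = random.Random(seed)

    def to_mask(pos):
        return [p > 0 for p in pos]

    def score(pos):
        return fitness(to_mask(pos), vocab, train, test)

    def random_nest():
        return [rng.uniform(-1, 1) for _ in vocab]

    nests = [random_nest() for _ in range(n_nests)]
    scores = [score(n) for n in nests]
    for _ in range(n_iter):
        best = nests[scores.index(max(scores))]
        for i in range(n_nests):
            # Lay a new egg by a Levy flight scaled by the distance to the best nest.
            egg = [x + 0.5 * levy_step(rng) * (x - b) for x, b in zip(nests[i], best)]
            s = score(egg)
            j = rng.randrange(n_nests)  # compare against a randomly chosen nest
            if s > scores[j]:
                nests[j], scores[j] = egg, s
        # Abandon a fraction `pa` of the worst nests and rebuild them at random.
        worst_first = sorted(range(n_nests), key=scores.__getitem__)
        for j in worst_first[:int(pa * n_nests)]:
            nests[j] = random_nest()
            scores[j] = score(nests[j])
    k = scores.index(max(scores))
    return [t for t, keep in zip(vocab, to_mask(nests[k])) if keep], scores[k]


# Toy corpus (hypothetical): two classes, two discriminative terms each, two noise terms.
docs = [["ball", "goal", "the"], ["goal", "ball", "and"],
        ["code", "chip", "the"], ["chip", "code", "and"]]
labels = ["sports", "sports", "tech", "tech"]
vocab = ["ball", "goal", "code", "chip", "the", "and"]
data = (docs, labels)
selected, acc = cuckoo_select(vocab, data, data)
print(selected, acc)
```

In a realistic setting the fitness would be measured on held-out documents rather than on the training set, and the vocabulary would come from the tokenized corpus after preprocessing.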
References
Al-Radaideh, Q.A., Al-Khateeb, S.S.: An associative rule-based classifier for Arabic medical text. Int. J. Knowl. Eng. Data Min. 03, 255–273 (2015)
Aly, W., Kelleny, H.A.: Adaptation of Cuckoo search for documents clustering. Int. J. Comput. Appl. Technol. 86, 4–10 (2014)
ArunaDevi, K., Saveeth, R.: A novel approach on Tamil text classification using C-Feature. Int. J. Sci. Res. Dev. 2, 343–345 (2014)
Bolaj, P., Govilkar, S.: Text classification for Marathi documents using supervised learning methods. Int. J. Comput. Appl. 155, 6–10 (2016)
Bouguelia, M.R., Nowaczyk, S., Santosh, K.C., Verikas, A.: Agreeing to disagree: active learning with noisy labels without crowdsourcing. Int. J. Mach. Learn. Cybern. 9, 1307–1319 (2018)
DeySarkar, S., Goswami, S., Agarwal, A., Akhtar, J.: A novel feature selection technique for text classification using Naive Bayes. Int. Sch. Res. Not. 2014, 10 (2014)
Dhar, A., Dash, N.S., Roy, K.: Categorization of Bangla web text documents based on TF-IDF-ICF text analysis scheme. In: Mandal, J.K., Sinha, D. (eds.) CSI 2018. CCIS, vol. 836, pp. 477–484. Springer, Singapore (2018). https://doi.org/10.1007/978-981-13-1343-1_39
Gupta, N., Gupta, V.: Punjabi text classification using Naive Bayes, centroid and hybrid approach. In: Proceedings of the 3rd Workshop on South and South East Asian Natural Language Processing, pp. 109–122 (2012)
Guru, D.S., Suhil, M.: A novel term_class relevance measure for text categorization. In: Proceedings of International Conference on Advanced Computing Technologies and Applications, pp. 13–22 (2015)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11, 10–18 (2009)
Islam, Md.S., Jubayer, F.E.Md., Ahmed, S.I.: A support vector machine mixed with TF-IDF algorithm to categorize Bengali document. In: Proceedings of International Conference on Electrical, Computer and Communication Engineering, pp. 191–196 (2017)
Jin, P., Zhang, Y., Chen, X., Xia, Y.: Bag-of-embeddings for text classification. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 2824–2830 (2016)
Kabir, F., Siddique, S., Kotwal, M.R.A., Huda, M.N.: Bangla text document categorization using stochastic gradient descent (SGD) classifier. In: Proceedings of International Conference on Cognitive Computing and Information Processing, pp. 1–4 (2015)
Kim, S., Han, K., Rim, H., Myaeng, S.: Some effective techniques for Naive Bayes text classification. IEEE Trans. Knowl. Data Eng. 18, 1457–1466 (2006)
Mandal, A.K., Sen, R.: Supervised learning methods for Bangla web document categorization. Int. J. Artif. Intell. Appl. 05, 93–105 (2014)
Mansur, M., UzZaman, N., Khan, M.: Analysis of N-gram based text categorization for Bangla in a Newspaper Corpus. In: Proceedings of International Conference on Computer and Information Technology, p. 08 (2006)
Rautray, R., Balabantaray, R.C.: CSTS: cuckoo search based model for text summarization. In: Dash, S.S., Vijayakumar, K., Panigrahi, B.K., Das, S. (eds.) Artificial Intelligence and Evolutionary Computations in Engineering Systems. AISC, vol. 517, pp. 141–150. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-3174-8_13
Redmond, M., Salesi, S., Cosma, G.: A novel approach based on an extended cuckoo search algorithm for the classification of tweets which contain Emoticon and Emoji. In: Proceedings of IEEE International Conference on Knowledge Engineering and Applications, pp. 13–19 (2017)
Sujana, T.S., Rao, N.M.S., Reddy, R.S.: An efficient feature selection using parallel cuckoo search and Naive Bayes classifier. In: Proceedings of IEEE International Conference on Networks & Advances in Computational Technologies, pp. 167–172 (2017)
Vajda, S., Santosh, K.C.: A fast k-nearest neighbor classifier using unsupervised clustering. In: Santosh, K.C., Hangarge, M., Bevilacqua, V., Negi, A. (eds.) RTIP2R 2016. CCIS, vol. 709, pp. 185–193. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-4859-3_17
Wang, D., Zhang, H., Liu, R., Lv, W.: Feature selection based on term frequency and T-Test for text categorization. In: Proceedings of the ACM International Conference on Information and Knowledge Management, pp. 1482–1486 (2012)
Wilbur, W.J., Kim, W.: The ineffectiveness of within-document term frequency in text classification. Inf. Retrieval 12, 509–525 (2009)
Yang, X.S., Deb, S.: Cuckoo search via Levy flights. World Congress on Nature & Biologically Inspired Computing, pp. 210–214 (2009)
Acknowledgement
One of the authors thanks DST for the INSPIRE fellowship and also thanks the various links provided in [7], from which the data were collected.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dhar, A., Dash, N.S., Roy, K. (2019). Efficient Feature Selection Based on Modified Cuckoo Search Optimization Problem for Classifying Web Text Documents. In: Santosh, K., Hegadi, R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1037. Springer, Singapore. https://doi.org/10.1007/978-981-13-9187-3_57
DOI: https://doi.org/10.1007/978-981-13-9187-3_57
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9186-6
Online ISBN: 978-981-13-9187-3
eBook Packages: Computer Science, Computer Science (R0)