A Filter Based Feature Selection for Imbalanced Text Classification
This work addresses text classification on imbalanced data using a filter-type feature selection method. The model first clusters the documents of each class by hierarchical clustering, thereby producing balanced or nearly balanced classes. A filter-type feature selection method is then used to choose the most discriminative features for text classification. The documents are subsequently represented as interval-valued data, and a suitable symbolic classifier is used for classification. Experiments are conducted on two standard benchmark datasets, Reuters-21578 and TDT2. The proposed model achieves a better F-measure than the existing models against which it is compared.
Keywords: Imbalance text, Clustering, Feature selection, Symbolic representation, Text classification
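The abstract does not spell out how the interval-valued representation or the symbolic classifier is built, so the following is only a minimal sketch under stated assumptions. It assumes each cluster is summarised by one interval per feature, taken as mean ± standard deviation, and that a test document is assigned to the class of the cluster whose intervals contain the most of its feature values. The function names (`interval_representation`, `acceptance_count`, `classify`) and the toy feature vectors are hypothetical and are not taken from the paper; the clustering and feature selection steps are assumed to have already been applied.

```python
from statistics import mean, pstdev


def interval_representation(cluster_docs):
    """Summarise one cluster as a list of (low, high) intervals, one per feature.

    Assumption: interval = mean +/- population standard deviation.
    """
    intervals = []
    for values in zip(*cluster_docs):  # iterate feature by feature
        mu, sigma = mean(values), pstdev(values)
        intervals.append((mu - sigma, mu + sigma))
    return intervals


def acceptance_count(doc, intervals):
    """Count the features of `doc` that fall inside the matching interval."""
    return sum(low <= x <= high for x, (low, high) in zip(doc, intervals))


def classify(doc, references):
    """Return the label of the cluster whose intervals accept the most features.

    `references` is a list of (class_label, intervals), one entry per cluster.
    """
    best_label, _ = max(references, key=lambda ref: acceptance_count(doc, ref[1]))
    return best_label


# Toy data (hypothetical): the larger class is split into two clusters,
# the smaller class keeps a single cluster.
clusters = {
    "earn": [
        [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
        [[0.5, 0.5, 0.0], [0.6, 0.4, 0.1]],
    ],
    "grain": [
        [[0.1, 0.0, 0.9], [0.0, 0.1, 0.8]],
    ],
}

references = [
    (label, interval_representation(cluster))
    for label, class_clusters in clusters.items()
    for cluster in class_clusters
]

print(classify([0.85, 0.15, 0.05], references))
print(classify([0.05, 0.05, 0.85], references))
```

Storing one interval vector per cluster, rather than one per document, is what lets a large class be represented by several compact references while a small class keeps a single one, which is the balancing effect the abstract describes.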
The author N. Vinay Kumar acknowledges the Department of Science and Technology, Govt. of India, for the financial support rendered through the DST-INSPIRE fellowship.