A Filter Based Feature Selection for Imbalanced Text Classification

  • K. Swarnalatha
  • D. S. Guru
  • Basavaraj S. Anami
  • N. Vinay Kumar
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1037)


This work addresses text classification on imbalanced data through filter-type feature selection. The model first clusters the documents of each class using hierarchical clustering, thereby producing balanced or nearly balanced classes. A filter-type feature selection method is then applied to choose the most discriminative features for text classification. The documents are subsequently stored in the form of interval-valued data, and a suitable symbolic classifier is recommended for classification. Experiments are conducted on two standard benchmark datasets, Reuters-21578 and TDT2. The proposed model achieves a better F-measure than the available models it is compared against.
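The interval-valued representation and symbolic classification steps can be illustrated with a minimal sketch. The abstract does not state how the intervals are built or which symbolic classifier is used, so everything below is an assumption made for illustration: each feature is summarised as mean ± standard deviation, a document is assigned to the label whose intervals contain the most of its feature values, and the function names and toy term weights are hypothetical. For brevity the sketch builds one interval vector per class, whereas the model described above would build them per cluster within a class.

```python
from statistics import mean, pstdev


def interval_representation(vectors):
    """Summarise a group of feature vectors as one (low, high) interval per feature.

    Assumption: the interval is mean +/- population standard deviation;
    the abstract does not specify the construction.
    """
    columns = list(zip(*vectors))
    return [(mean(col) - pstdev(col), mean(col) + pstdev(col)) for col in columns]


def acceptance_count(vector, intervals):
    """Count the features of `vector` that fall inside their interval."""
    return sum(low <= x <= high for x, (low, high) in zip(vector, intervals))


def classify(vector, class_intervals):
    """Return the label whose interval representation accepts the most features."""
    return max(
        class_intervals,
        key=lambda label: acceptance_count(vector, class_intervals[label]),
    )


# Hypothetical term-weight vectors over three already-selected features.
training = {
    "grain": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]],
    "trade": [[0.1, 0.7, 0.6], [0.0, 0.9, 0.5], [0.2, 0.8, 0.7]],
}

class_intervals = {
    label: interval_representation(docs) for label, docs in training.items()
}
predicted = classify([0.85, 0.15, 0.05], class_intervals)
print(predicted)
```

Storing one interval per feature keeps the representation compact: a group of documents is reduced to a single vector of intervals regardless of how many documents it contains.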


Imbalance text; Clustering; Feature selection; Symbolic representation; Text classification



The author N. Vinay Kumar acknowledges the Department of Science and Technology, Govt. of India, for the financial support rendered through the DST-INSPIRE fellowship.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • K. Swarnalatha (1), corresponding author
  • D. S. Guru (2)
  • Basavaraj S. Anami (3)
  • N. Vinay Kumar (2)
  1. Department of Information Science and Engineering, Maharaja Institute of Technology Thandavapura, Mysuru, India
  2. Department of Studies in Computer Science, University of Mysore, Mysuru, India
  3. KLE Institute of Technology, Hubli, India
