
A Filter Based Feature Selection for Imbalanced Text Classification

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1037)

Abstract

This work addresses text classification on imbalanced data using a filter-type feature selection method. The model first clusters the documents of each class through hierarchical clustering, thereby producing balanced or nearly balanced classes. A filter-type feature selection step then chooses the most discriminative features for text classification. The documents are subsequently stored in the form of interval-valued data, and a suitable symbolic classifier is recommended for classification. Experiments are conducted on two standard benchmark datasets, Reuters-21578 and TDT2. In terms of F-measure, the results obtained from the proposed model are better than those of the available models it is compared against.
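The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration, since the abstract does not give these details: the toy term-weight vectors, the cluster assignments (taken as already produced by hierarchical clustering), the filter score (spread of per-cluster means), the interval construction [mean - std, mean + std], and the helper names `filter_select`, `build_intervals`, `similarity` and `classify`. It is not the authors' implementation.

```python
from statistics import mean, pstdev

# Toy term-weight vectors (hypothetical data). The majority class "earn" is
# assumed to have already been split into two clusters by hierarchical
# clustering, so each cluster is of comparable size to the minority class.
clusters = {
    ("earn", 0): [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    ("earn", 1): [[0.5, 0.6, 0.0], [0.4, 0.7, 0.1]],
    ("grain", 0): [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]],
}

def filter_select(clusters, k):
    """Filter-type selection: score each feature without consulting a classifier
    and keep the k best. The score (spread of per-cluster means) is a stand-in
    criterion, not the one proposed in the paper."""
    means = [[mean(col) for col in zip(*docs)] for docs in clusters.values()]
    scores = [max(col) - min(col) for col in zip(*means)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def build_intervals(vectors):
    """Summarize a cluster as one [mean - std, mean + std] interval per feature."""
    intervals = []
    for col in zip(*vectors):
        m, s = mean(col), pstdev(col)
        intervals.append((m - s, m + s))
    return intervals

def similarity(vector, intervals):
    """Fraction of feature values that fall inside the matching interval."""
    hits = sum(1 for x, (lo, hi) in zip(vector, intervals) if lo <= x <= hi)
    return hits / len(intervals)

def classify(vector, model):
    """Return the class label of the most similar interval-valued cluster."""
    return max(model, key=lambda entry: similarity(vector, entry[1]))[0]

keep = filter_select(clusters, k=2)      # indices of the retained features

def project(vector):
    """Keep only the selected features of a document vector."""
    return [vector[j] for j in keep]

# One interval-valued representative per cluster, labelled with its class.
model = [
    (label, build_intervals([project(d) for d in docs]))
    for (label, _), docs in clusters.items()
]

print(classify(project([0.85, 0.15, 0.05]), model))  # earn
```

Because each cluster is reduced to a single vector of intervals, a large class contributes several compact representatives rather than many individual documents, which is one way to read the balancing effect the abstract describes.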

Acknowledgement

The author N. Vinay Kumar acknowledges the Department of Science and Technology, Govt. of India, for the financial support provided through the DST-INSPIRE fellowship.

Author information

Corresponding author

Correspondence to K. Swarnalatha.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Swarnalatha, K., Guru, D.S., Anami, B.S., Kumar, N.V. (2019). A Filter Based Feature Selection for Imbalanced Text Classification. In: Santosh, K., Hegadi, R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1037. Springer, Singapore. https://doi.org/10.1007/978-981-13-9187-3_18

  • DOI: https://doi.org/10.1007/978-981-13-9187-3_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-9186-6

  • Online ISBN: 978-981-13-9187-3

  • eBook Packages: Computer Science, Computer Science (R0)
