An Innovative Method of Feature Extraction for Text Classification Using PART Classifier

  • Ankita Dhar
  • Niladri Sekhar Dash
  • Kaushik Roy
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 835)

Abstract

With the advance of technology and the drive to create machines with human-like intelligence, the frontiers of machine learning are being pushed continually. As the volume of digital information grows, text classification has gained importance as a machine learning task, one that requires texts to be studied thoroughly to succeed. It is the task of assigning an arbitrary document to one of a set of predefined text classes. This paper presents a methodology for developing an automatic system that classifies Bangla text documents into their respective text categories. It introduces a hybrid approach (i.e., PART) for classifying Bangla text documents, with ‘term association’ and ‘term aggregation’ as the baseline feature extraction methods. A comparison with other classification algorithms shows that this approach can yield better results than the existing methods.

Keywords

Text classification; Feature; Term association; Term aggregation; PART; Classifier

Notes

Acknowledgement

One of the authors thanks DST for their support.

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science, West Bengal State University, Kolkata, India
  2. Linguistic Research Unit, Indian Statistical Institute, Kolkata, India
