Improved Term Weighting Factors for Keyword Extraction in Hierarchical Category Structure and Thai Text Classification

  • Boonthida Chiraratanasopha
  • Thanaruk Theeramunkong
  • Salin Boonbrahm
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 807)

Abstract

Keyword extraction for complex hierarchical categories is a challenge in text mining because classification methods commonly used for flat categories yield low accuracy. This paper presents a method to improve keyword extraction from hierarchical categories by treating the occurrence of terms in related categories of the hierarchy as additional factors in term weighting. The method is an enhancement of the basic TF-IDF calculation, so it can readily be used for keyword extraction and classification. By taking the term frequency and inverse document frequency of the categories hierarchically related to a focused category, we can determine how important terms are within their family of categories. In this work, the hierarchical relations used in the calculation are sub-categories, super-categories and sibling categories. Experimental results show that the proposed method improved accuracy by about 40% over a baseline in a classification task.
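The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: a plain TF-IDF score for a term in a focused category, scaled by factors taken from its super-, sub- and sibling categories. The toy hierarchy, the documents, the function names, the weights `w_super`, `w_sub` and `w_sibling`, and in particular the choice to boost on super/sub-category frequency and penalise on sibling frequency are all assumptions made for illustration, not the authors' method.

```python
import math

# Hypothetical toy hierarchy: category -> parent (None for a top-level category).
PARENT = {"economy": None, "tax": "economy", "trade": "economy"}

# Hypothetical tokenized documents per category.
DOCS = {
    "economy": [["budget", "growth"], ["growth", "policy"]],
    "tax": [["tax", "income", "budget"], ["tax", "rate"]],
    "trade": [["export", "tariff"], ["export", "growth"]],
}


def relatives(cat):
    """Return the super-, sub- and sibling categories of `cat`."""
    parent = PARENT[cat]
    return {
        "super": [parent] if parent is not None else [],
        "sub": [c for c, p in PARENT.items() if p == cat],
        "sibling": [c for c, p in PARENT.items()
                    if parent is not None and p == parent and c != cat],
    }


def tf(term, cat):
    """Relative frequency of `term` among all tokens of one category."""
    tokens = [t for doc in DOCS[cat] for t in doc]
    return tokens.count(term) / len(tokens) if tokens else 0.0


def idf(term):
    """Smoothed inverse document frequency over all documents."""
    n_docs = sum(len(docs) for docs in DOCS.values())
    df = sum(term in doc for docs in DOCS.values() for doc in docs)
    return math.log((1 + n_docs) / (1 + df)) + 1.0


def mean_tf(term, cats):
    """Average term frequency over a list of categories (0.0 if empty)."""
    return sum(tf(term, c) for c in cats) / len(cats) if cats else 0.0


def hierarchical_weight(term, cat, w_super=0.5, w_sub=0.5, w_sibling=0.5):
    """Base TF-IDF scaled by an assumed, illustrative hierarchy factor."""
    rel = relatives(cat)
    base = tf(term, cat) * idf(term)
    factor = (1.0
              + w_super * mean_tf(term, rel["super"])
              + w_sub * mean_tf(term, rel["sub"])
              - w_sibling * mean_tf(term, rel["sibling"]))
    return base * factor


if __name__ == "__main__":
    # Rank the terms of the "tax" category by the illustrative weight.
    vocab = sorted({t for doc in DOCS["tax"] for t in doc})
    for term in sorted(vocab, key=lambda t: -hierarchical_weight(t, "tax")):
        print(term, round(hierarchical_weight(term, "tax"), 3))
```

Setting all three weights to zero reduces the score to the plain TF-IDF baseline, which makes it easy to compare a flat weighting against a hierarchy-aware one on the same data.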

Keywords

  • Keyword extraction
  • Term weighting
  • Hierarchical categories
  • Term frequency-inverse document frequency (TF-IDF)

Notes

Acknowledgement

The authors would like to thank the National Reform Council for providing comment data from the Thai Reform website.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Boonthida Chiraratanasopha (1)
  • Thanaruk Theeramunkong (2, 3)
  • Salin Boonbrahm (1)

  1. School of Informatics, Walailak University, Nakhon Si Thammarat, Thailand
  2. School of ICT, Sirindhorn International Institute of Technology, Thammasat University, Bangkok, Thailand
  3. The Royal Society of Thailand, Bangkok, Thailand
