Improved Term Weighting Factors for Keyword Extraction in Hierarchical Category Structure and Thai Text Classification

  • Boonthida Chiraratanasopha
  • Thanaruk Theeramunkong
  • Salin Boonbrahm
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 807)


Keyword extraction for complex hierarchical categories is a challenge in text mining, since classification methods commonly used for flat categories yield low accuracy on such structures. This paper presents a method to improve keyword extraction from hierarchical categories by treating the occurrence of terms in hierarchically related categories as additional term-weighting factors. The method is an enhancement of the basic TF-IDF calculation, so it can readily be used for both keyword extraction and classification. By taking into account the term frequency and inverse document frequency of categories hierarchically related to a focused category, we can determine how important terms are within their family of categories. The hierarchical relations used in the calculation are subcategories, supercategories, and sibling categories. Experimental results show that the proposed method improves accuracy by about 40% over a baseline in a classification task.
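The abstract describes the weighting only in prose, so the following Python sketch shows one way such a scheme could be wired up. It is an illustration under stated assumptions, not the authors' formula. It assumes the text is already word-segmented (Thai is written without spaces between words), that each category's comments are pooled into one bag of tokens for a flat category-level TF-IDF baseline, and that the hierarchical factors enter through a hypothetical linear combination: the averaged TF-IDF of a term in the category's subcategories (direct children only), supercategories (all ancestors), and sibling categories is added with weights `lam`. The function names, the weights, and the toy data are all invented for illustration.

```python
import math
from collections import Counter


def category_tfidf(cat_terms):
    """Flat TF-IDF where each category (its documents' tokens pooled) is one 'document'."""
    n_cats = len(cat_terms)
    counts = {c: Counter(toks) for c, toks in cat_terms.items()}
    cf = Counter()  # number of categories in which each term occurs
    for cnt in counts.values():
        cf.update(cnt.keys())
    weights = {}
    for c, cnt in counts.items():
        total = sum(cnt.values()) or 1
        weights[c] = {t: (f / total) * math.log(n_cats / cf[t]) for t, f in cnt.items()}
    return weights


def relatives(cat, parent):
    """Return (subcategories, supercategories, siblings) of cat.

    `parent` maps child -> parent and is assumed to describe a tree (no cycles).
    Subcategories are direct children only; supercategories are all ancestors.
    """
    subs = [c for c, p in parent.items() if p == cat]
    sups = []
    p = parent.get(cat)
    while p is not None:
        sups.append(p)
        p = parent.get(p)
    par = parent.get(cat)
    sibs = [c for c, p in parent.items() if par is not None and p == par and c != cat]
    return subs, sups, sibs


def hierarchical_weight(term, cat, tfidf, parent, lam=(0.5, 0.5, 0.5)):
    """Flat TF-IDF of `term` in `cat` plus weighted family averages.

    HYPOTHETICAL combination rule: the linear form and the lam weights
    (subcategories, supercategories, siblings) are illustrative only.
    """
    def avg(cats):
        if not cats:
            return 0.0
        return sum(tfidf.get(c, {}).get(term, 0.0) for c in cats) / len(cats)

    subs, sups, sibs = relatives(cat, parent)
    base = tfidf.get(cat, {}).get(term, 0.0)
    return base + lam[0] * avg(subs) + lam[1] * avg(sups) + lam[2] * avg(sibs)


def top_keywords(cat, tfidf, parent, k=5, lam=(0.5, 0.5, 0.5)):
    """Rank terms of `cat` and its subcategories by hierarchical weight (ties alphabetical)."""
    subs, _, _ = relatives(cat, parent)
    vocab = set()
    for c in [cat, *subs]:
        vocab.update(tfidf.get(c, {}))
    scores = {t: hierarchical_weight(t, cat, tfidf, parent, lam) for t in vocab}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy data invented for illustration; not taken from the paper's corpus.
    cat_terms = {
        "economy": ["budget", "growth", "budget"],
        "tax": ["tax", "vat", "budget"],
        "trade": ["export", "tariff", "growth"],
        "politics": ["vote", "election", "party"],
    }
    parent = {"tax": "economy", "trade": "economy"}  # child -> parent
    tfidf = category_tfidf(cat_terms)
    print(top_keywords("economy", tfidf, parent, k=2))  # ['budget', 'growth']
```

Setting all three weights to zero recovers the flat TF-IDF baseline, which makes a side-by-side comparison straightforward. Whether a family category should raise or lower a term's weight is left open here through the sign and size of each entry in `lam`.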


Keywords: Keyword extraction · Term weighting · Hierarchical categories · Term frequency-inverse document frequency (TF-IDF)



The authors would like to thank the National Reform Council for providing comment data from the Thai Reform website.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Boonthida Chiraratanasopha (1)
  • Thanaruk Theeramunkong (2, 3)
  • Salin Boonbrahm (1)

  1. School of Informatics, Walailak University, Nakhon Si Thammarat, Thailand
  2. School of ICT, Sirindhorn International Institute of Technology, Thammasat University, Bangkok, Thailand
  3. The Royal Society of Thailand, Bangkok, Thailand
