Abstract
Keyword extraction for complex hierarchical categories is a challenge in text mining because classification methods designed for flat categories yield low accuracy. This paper presents a method that improves keyword extraction for hierarchical categories by treating the occurrence of terms in hierarchically related categories as additional factors in term weighting. The method extends the basic TF-IDF calculation, so it can readily be used for both keyword extraction and classification. By taking the term frequency and inverse document frequency of categories hierarchically related to a focused category, we can determine how important terms are within their family of categories. In this work, the hierarchical relations used in the calculation are sub-categories, super-categories, and sibling categories. In our experiments, the proposed method achieved accuracy about 40% higher than a baseline in a classification task.
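As a rough illustration of the idea, the sketch below computes plain TF-IDF with each category treated as one document and then adjusts a focused category's term scores using the scores of its super-, sub-, and sibling categories. The abstract does not give the actual weighting formula, so the linear combination, the `WEIGHTS` values (including their signs), the function names, and the toy data are all assumptions made for illustration and should not be read as the authors' method.

```python
import math
from collections import Counter


def tf_idf(term_counts_by_cat):
    """Plain TF-IDF, treating each category as a single document."""
    n_cats = len(term_counts_by_cat)
    doc_freq = Counter()
    for counts in term_counts_by_cat.values():
        doc_freq.update(counts.keys())  # count categories containing each term
    scores = {}
    for cat, counts in term_counts_by_cat.items():
        total = sum(counts.values())
        scores[cat] = {
            term: (c / total) * math.log(n_cats / doc_freq[term])
            for term, c in counts.items()
        }
    return scores


def family(cat, parent):
    """Return the super-, sub-, and sibling categories of `cat`.

    `parent` maps each child category to its direct parent (hypothetical format).
    """
    supers = [parent[cat]] if cat in parent else []
    subs = [c for c, p in parent.items() if p == cat]
    siblings = [
        c for c, p in parent.items()
        if cat in parent and p == parent[cat] and c != cat
    ]
    return {"super": supers, "sub": subs, "sibling": siblings}


def hierarchical_scores(cat, base, parent, weights):
    """Adjust base TF-IDF scores of `cat` with its family categories.

    ASSUMPTION: a simple linear combination of mean family scores;
    the paper's real formula is not stated in the abstract.
    """
    related = family(cat, parent)
    adjusted = {}
    for term, score in base[cat].items():
        for kind, cats in related.items():
            if cats:
                mean = sum(base[c].get(term, 0.0) for c in cats) / len(cats)
                score += weights[kind] * mean
        adjusted[term] = score
    return adjusted


# Placeholder coefficients (assumed, not taken from the paper).
WEIGHTS = {"super": 0.5, "sub": 0.5, "sibling": -0.5}

# Toy term counts and hierarchy, invented for this example.
counts = {
    "sports": Counter({"match": 4, "team": 4, "ball": 2}),
    "football": Counter({"goal": 5, "team": 3, "ball": 2}),
    "tennis": Counter({"racket": 6, "ball": 2, "serve": 2}),
}
parent = {"football": "sports", "tennis": "sports"}

base = tf_idf(counts)
scores = hierarchical_scores("football", base, parent, WEIGHTS)
for term in sorted(scores, key=scores.get, reverse=True):
    print(term, round(scores[term], 3))
```

In this toy setup, a term shared with the super-category is boosted in the focused category, while a term that appears in every category keeps a score of zero because its inverse document frequency is zero. Different coefficients would change that behavior, so the values here are only a starting point for experimentation.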
Acknowledgement
The authors would like to thank the National Reform Council for providing comment data from the Thai Reform website.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Chiraratanasopha, B., Theeramunkong, T., Boonbrahm, S. (2019). Improved Term Weighting Factors for Keyword Extraction in Hierarchical Category Structure and Thai Text Classification. In: Theeramunkong, T., et al. Advances in Intelligent Informatics, Smart Technology and Natural Language Processing. iSAI-NLP 2017. Advances in Intelligent Systems and Computing, vol 807. Springer, Cham. https://doi.org/10.1007/978-3-319-94703-7_6
DOI: https://doi.org/10.1007/978-3-319-94703-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94702-0
Online ISBN: 978-3-319-94703-7
eBook Packages: Intelligent Technologies and Robotics (R0)