Abstract
Most supervised term weighting (STW) schemes apply only to binary text classification tasks such as sentiment analysis (SA), not to text classification with more than two categories. In this paper, we propose a new supervised term weighting scheme for multi-class text categorization. The scheme, called inverse term entropy (ite), measures how each term is distributed across all categories, following the definition of entropy in information theory. We present experimental results on the 20NewsGroup dataset with a popular classifier learning method, the support vector machine (SVM). Our weighting scheme ite achieved the best classification accuracy among the existing methods compared, and its performance was also the most stable as the number of training samples was reduced. Furthermore, our method has a built-in property that prevents over-weighting in STW. Over-weighting is a concept, specific to supervised term weighting, that we proposed in our earlier work and re-introduce here. Caused by improper handling of singular terms and by excessively large ratios between term weights, over-weighting can degrade the performance of text classification.
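The abstract does not state the ite formula, so the following is only a minimal sketch of the underlying idea: compute the Shannon entropy of a term's distribution over categories and turn low entropy (a term concentrated in few categories) into a high weight. The function names and the specific weight form `1 + log2(k) - H` are assumptions made for illustration, not the authors' definition. The form was chosen because it is bounded, which loosely mirrors the stated goal of avoiding very large ratios between term weights.

```python
import math


def term_entropy(class_counts):
    """Shannon entropy (in bits) of a term's occurrence counts per category."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing to the entropy, so they are skipped.
    probs = [c / total for c in class_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def illustrative_inverse_entropy_weight(class_counts):
    """Hypothetical weight, NOT the paper's ite formula.

    Returns 1 + log2(k) - H, where k is the number of categories and H is
    the term's entropy. The value lies in [1, 1 + log2(k)].
    """
    k = len(class_counts)
    max_entropy = math.log2(k)
    return 1.0 + max_entropy - term_entropy(class_counts)


# A term seen only in one of four categories versus one spread evenly.
concentrated = [10, 0, 0, 0]
uniform = [5, 5, 5, 5]
print(illustrative_inverse_entropy_weight(concentrated))
print(illustrative_inverse_entropy_weight(uniform))
```

Under this assumed form, the concentrated term receives the largest possible weight and the evenly spread term the smallest, so the ratio between any two term weights cannot exceed `1 + log2(k)`.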
References
Batal, I., Hauskrecht, M.: Boosting KNN text classification accuracy by using supervised term weighting schemes. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 2041–2044. ACM, November 2009
Croft, W.B.: Experiments with representation in a document-retrieval system. Inf. Technol.-Res. Dev. Appl. 2(1), 1–21 (1983)
Debole, F., Sebastiani, F.: Supervised term weighting for automated text categorization. In: Sirmakessis, S. (ed.) Text Mining and Its Applications. Springer, Heidelberg, pp. 81–97 (2004)
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9(1), 1871–1874 (2008)
Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)
Jones, K.S., Walker, S., Robertson, S.E.: A probabilistic model of information retrieval: development and comparative experiments. Inf. Process. Manag. 36(6), 779–808 (2000)
Lang, K.: Newsweeder: learning to filter netnews. In: Proceedings of the 12th International Conference on Machine Learning, pp. 331–339, July 1995
Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 142–150. Association for Computational Linguistics, June 2011
Martineau, J., Finin, T.: Delta TFIDF: an improved feature space for sentiment analysis. ICWSM 9, 106 (2009)
Paltoglou, G., Thelwall, M.: A study of information retrieval weighting schemes for sentiment analysis. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1386–1395. Association for Computational Linguistics, July 2010
Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, p. 271. Association for Computational Linguistics, July 2004
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
Soucy, P., Mineau, G.W.: Beyond TFIDF weighting for text categorization in the vector space model. IJCAI 5, 1130–1135 (2005)
Wu, H., Gu, X., Gu, Y.: Balancing between over-weighting and under-weighting in supervised term weighting. Inf. Process. Manag. 53(2), 547–557 (2017). doi:10.1016/j.ipm.2016.10.003
Wu, H., Salton, G.: A comparison of search term weighting: term relevance vs. inverse document frequency. In: ACM SIGIR Forum, vol. 16, no. 1, pp. 30–39. ACM, May 1981
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under grant 61371148.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Gu, Y., Gu, X. (2017). A Supervised Term Weighting Scheme for Multi-class Text Categorization. In: Huang, DS., Hussain, A., Han, K., Gromiha, M. (eds) Intelligent Computing Methodologies. ICIC 2017. Lecture Notes in Computer Science, vol 10363. Springer, Cham. https://doi.org/10.1007/978-3-319-63315-2_38
DOI: https://doi.org/10.1007/978-3-319-63315-2_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63314-5
Online ISBN: 978-3-319-63315-2
eBook Packages: Computer Science (R0)