A Supervised Term Weighting Scheme for Multi-class Text Categorization

  • Conference paper
Intelligent Computing Methodologies (ICIC 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10363)

Abstract

Most supervised term weighting (STW) schemes apply only to binary text classification tasks such as sentiment analysis (SA), not to text classification with more than two categories. In this paper, we propose a new supervised term weighting scheme for multi-class text categorization. The scheme, called inverse term entropy (ite), measures how each term is distributed across all the categories, following the definition of entropy in information theory. We present experimental results obtained on the 20NewsGroup dataset with a popular classifier learning method, the support vector machine (SVM). Our weighting scheme ite achieved the best classification accuracy among the existing methods compared, and its performance was also the most stable as the number of training samples was reduced. Furthermore, our method has a built-in property that prevents over-weighting in STW. Over-weighting is a concept specific to supervised term weighting that we proposed in our earlier work and re-introduce here. Caused by the improper handling of singular terms and by excessively large ratios between term weights, over-weighting can degrade the performance of text classification tasks.
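
The abstract does not state the ite formula, so the following is only a minimal sketch of the general idea it describes: compute, for each term, the Shannon entropy of its distribution over categories, then turn low entropy into a high weight. The function names, the use of raw per-class document counts (with no correction for class size), and the `floor + (max_entropy - entropy)` mapping are all assumptions made for illustration and should not be read as the authors' definition.

```python
import math
from collections import Counter, defaultdict


def term_class_entropy(docs, labels):
    """Entropy (in bits) of each term's document-frequency distribution over classes.

    docs   : list of token lists, one per training document
    labels : list of class labels, aligned with docs
    """
    # term -> class -> number of documents of that class containing the term
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            counts[term][label] += 1

    entropy = {}
    for term, per_class in counts.items():
        total = sum(per_class.values())
        h = 0.0
        for n in per_class.values():
            p = n / total
            h -= p * math.log2(p)
        entropy[term] = h
    return entropy


def illustrative_inverse_entropy_weight(entropy, n_classes, floor=1.0):
    """Hypothetical mapping from entropy to weight (NOT the paper's formula).

    A term concentrated in one class (entropy 0) gets floor + log2(n_classes);
    a term spread evenly over all classes gets floor.
    """
    max_h = math.log2(n_classes)
    return {term: floor + (max_h - h) for term, h in entropy.items()}


if __name__ == "__main__":
    docs = [["ball", "the"], ["vote", "the"], ["cpu", "the"]]
    labels = ["sport", "politics", "tech"]
    h = term_class_entropy(docs, labels)
    w = illustrative_inverse_entropy_weight(h, n_classes=3)
    for term in sorted(w):
        print(term, round(h[term], 3), round(w[term], 3))
```

In this sketch every weight lies between `floor` and `floor + log2(n_classes)`, so the ratio between any two term weights is bounded. That is one simple way to picture how an entropy-based weight could avoid the excessively large ratios the abstract associates with over-weighting; whether ite achieves this in the same way is not stated in the abstract.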

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under grant 61371148.

Author information

Corresponding author

Correspondence to Xiaodong Gu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gu, Y., Gu, X. (2017). A Supervised Term Weighting Scheme for Multi-class Text Categorization. In: Huang, DS., Hussain, A., Han, K., Gromiha, M. (eds) Intelligent Computing Methodologies. ICIC 2017. Lecture Notes in Computer Science, vol 10363. Springer, Cham. https://doi.org/10.1007/978-3-319-63315-2_38

  • DOI: https://doi.org/10.1007/978-3-319-63315-2_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63314-5

  • Online ISBN: 978-3-319-63315-2

  • eBook Packages: Computer Science, Computer Science (R0)
