A Term Weighting Approach for Text Categorization

Lee, Kyung-Chan; Kang, Seung-Shik; Hahn, Kwang-Soo

doi:10.1007/11562382_66

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Representative words in a document are commonly identified and discriminated by the statistical distribution of their frequencies. We assume that evaluating a confidence measure for terms through content-based document analysis leads to better performance than the parametric assumptions of the standard frequency-based method. In this paper, we propose a new term weighting method that replaces the frequency-based probabilistic methods. Experiments with Naïve Bayesian classifiers showed that our approach improved on the frequency-based method at every evaluation point.
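
The abstract does not spell out the weighting scheme itself, so the following is only a minimal sketch of the general idea: a multinomial Naïve Bayes classifier in which raw term counts are replaced by per-occurrence weights. The function names (train_weighted_nb, classify, uniform_weight, lead_weight) and the position-based weighting rule are hypothetical illustrations, not the authors' method. Setting every weight to 1.0 recovers the standard frequency-based model.

```python
import math
from collections import defaultdict


def uniform_weight(term, position, tokens):
    """Baseline: every occurrence counts as 1.0 (plain term frequency)."""
    return 1.0


def lead_weight(term, position, tokens):
    """Hypothetical content-based weight: occurrences among the first two
    tokens count double. This rule is an assumption made for illustration."""
    return 2.0 if position < 2 else 1.0


def train_weighted_nb(docs, labels, weight_fn, alpha=1.0):
    """Train multinomial Naive Bayes on weighted term mass instead of counts."""
    vocab = set()
    class_docs = defaultdict(int)
    term_mass = defaultdict(lambda: defaultdict(float))
    for tokens, label in zip(docs, labels):
        class_docs[label] += 1
        for i, term in enumerate(tokens):
            vocab.add(term)
            term_mass[label][term] += weight_fn(term, i, tokens)

    model = {"vocab": vocab, "prior": {}, "cond": {}}
    n_docs = len(docs)
    for label, count in class_docs.items():
        model["prior"][label] = math.log(count / n_docs)
        # Laplace-style smoothing applied to the weighted mass.
        total = sum(term_mass[label].values()) + alpha * len(vocab)
        model["cond"][label] = {
            t: math.log((term_mass[label].get(t, 0.0) + alpha) / total)
            for t in vocab
        }
    return model


def classify(model, tokens, weight_fn):
    """Return the label with the highest weighted log-posterior score."""
    best_label, best_score = None, -math.inf
    for label, prior in model["prior"].items():
        score = prior
        for i, term in enumerate(tokens):
            if term in model["vocab"]:  # ignore terms unseen in training
                score += weight_fn(term, i, tokens) * model["cond"][label][term]
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Passing uniform_weight and then a content-based function such as lead_weight to the same training and classification code gives a like-for-like comparison of the two weighting strategies on one classifier.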

Author information

Authors and Affiliations

School of Computer Science, Kookmin University & AITrc, Seoul, 136-702, Korea
Kyung-Chan Lee, Seung-Shik Kang & Kwang-Soo Hahn

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, 790-784, Pohang, Korea
Gary Geunbae Lee
Computer and Communication Media Research, NEC Corp., Miyazaki 4-1-1, Miyamae-ku, 216-8555, Kawasaki, Japan
Akio Yamada
Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
Helen Meng
School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, 305-732, Daejeon, Korea
Sung Hyon Myaeng

About this paper

Cite this paper

Lee, KC., Kang, SS., Hahn, KS. (2005). A Term Weighting Approach for Text Categorization. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_66

DOI: https://doi.org/10.1007/11562382_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)
