Estimation of Class Membership Probabilities in the Document Classification

Takahashi, Kazuko; Takamura, Hiroya; Okumura, Manabu

doi:10.1007/978-3-540-71701-0_29

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4426)

Abstract

We propose a method for estimating the class membership probability of a predicted class in document classification, using the classification scores not only of the predicted class but also of the other classes. Class membership probabilities are important in many document classification applications, where multiclass classification is often used. In the proposed method, we first build an accuracy table by counting the correctly classified training samples in each range or cell of classification scores. We then smooth the accuracy table with methods such as a moving average method with coverage. To determine the class membership probability of an unknown sample, we calculate its classification scores, find the range or cell that corresponds to those scores, and output the value associated with that range or cell in the accuracy table. Through experiments on two different datasets with both Support Vector Machine and Naive Bayes classifiers, we show empirically that using multiple classification scores is effective for estimating class membership probabilities, and that the proposed smoothing methods for the accuracy table work well. We also show that the class membership probabilities estimated by the proposed method are useful for detecting misclassified samples.
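
The accuracy-table idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of two scores per sample (predicted class and runner-up), the bin edges, the function names, and the smoothing rule are all assumptions made here for illustration. In particular, the smoothing below is a plain windowed pooling of neighbouring cells, not the paper's "moving average method with coverage", whose exact definition is not given in this abstract.

```python
import numpy as np

def _cell_index(values, edges):
    """Map score(s) to bin indices, clamping out-of-range scores to the edge bins."""
    n_bins = len(edges) - 1
    return np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)

def build_accuracy_table(scores, correct, edges):
    """Count samples and correct predictions in each (top score, second score) cell.

    scores  : array of shape (n_samples, 2); column 0 is the score of the
              predicted class, column 1 the score of the runner-up class
              (an assumed layout, chosen for this sketch).
    correct : 0/1 array, 1 where the training sample was classified correctly.
    edges   : increasing bin edges shared by both score axes (assumed).
    """
    n_bins = len(edges) - 1
    hits = np.zeros((n_bins, n_bins))
    totals = np.zeros((n_bins, n_bins))
    i = _cell_index(scores[:, 0], edges)
    j = _cell_index(scores[:, 1], edges)
    # np.add.at accumulates correctly when several samples fall in one cell.
    np.add.at(hits, (i, j), np.asarray(correct, dtype=float))
    np.add.at(totals, (i, j), 1.0)
    return hits, totals

def smooth_table(hits, totals, radius=1):
    """Pool counts over a (2*radius+1)^2 window and return per-cell accuracy.

    Cells whose window contains no training samples are left as NaN.
    radius=0 gives the unsmoothed accuracy table.
    """
    n_bins = hits.shape[0]
    table = np.full((n_bins, n_bins), np.nan)
    for a in range(n_bins):
        for b in range(n_bins):
            rows = slice(max(0, a - radius), min(n_bins, a + radius + 1))
            cols = slice(max(0, b - radius), min(n_bins, b + radius + 1))
            window_total = totals[rows, cols].sum()
            if window_total > 0:
                table[a, b] = hits[rows, cols].sum() / window_total
    return table

def estimate(table, edges, top_score, second_score):
    """Look up the estimated class membership probability for one new sample."""
    i = int(_cell_index(top_score, edges))
    j = int(_cell_index(second_score, edges))
    return table[i, j]
```

Under these assumptions, a sample whose estimated probability falls below a chosen threshold could be flagged as a likely misclassification, which is the use case mentioned at the end of the abstract. Pooling raw counts rather than averaging per-cell accuracies keeps sparsely populated cells from dominating the smoothed value.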

Author information

Authors and Affiliations

Keiai University, Faculty of International Studies, 1-9 Sanno, Sakura, Japan
Kazuko Takahashi
Tokyo Institute of Technology, Precision and Intelligence Laboratory, 4259 Nagatsuta-cho Midori-ku, Yokohama, Japan
Hiroya Takamura & Manabu Okumura

Editor information

Zhi-Hua Zhou, Hang Li, Qiang Yang

About this paper

Cite this paper

Takahashi, K., Takamura, H., Okumura, M. (2007). Estimation of Class Membership Probabilities in the Document Classification. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_29

DOI: https://doi.org/10.1007/978-3-540-71701-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science, Computer Science (R0)
