A Classifier Design Based on Combining Multiple Components by Maximum Entropy Principle

Fujino, Akinori; Ueda, Naonori; Saito, Kazumi

doi:10.1007/11562382_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Designing high-performance classifiers for structured data consisting of multiple components is an important and challenging research issue in the field of machine learning. Although the main component of structured data plays an important role when designing classifiers, additional components may contain beneficial information for classification. This paper focuses on a probabilistic classifier design for multiclass classification based on the combination of main and additional components. Our formulation separately considers component generative models and constructs the classifier by combining these trained models based on the maximum entropy principle. We use naive Bayes models as the component generative models for text and link components so that we can apply our classifier design to document and web page classification problems. Our experimental results for three test collections confirmed that the proposed method effectively combined the main and additional components to improve classification performance.
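The combination step described in the abstract can be illustrated with a small Python sketch. It assumes, rather than reproduces from the paper, that each trained component model contributes one feature equal to its class-conditional log-likelihood log p(x_j | y), plus one bias per class. Constraining the expected values of such features under the maximum entropy principle yields a log-linear model, P(y | x) ∝ exp(μ_y + Σ_j λ_j log p(x_j | y)). The function names, the add-one smoothing, the Gaussian prior variance `sigma2`, and the toy data are all hypothetical, and plain gradient ascent stands in for whatever optimizer the authors actually used.

```python
import math

# Hypothetical sketch: per-component multinomial naive Bayes models combined
# by a log-linear (maximum-entropy-form) model with weights lam[j] and biases mu[y].


def train_naive_bayes(vectors, labels, vocab_size, alpha=1.0):
    """Train one component model and return {class: [log P(feature | class)]}.

    vectors: count vectors (length vocab_size) for this component only.
    alpha: add-alpha (Laplace) smoothing constant.
    """
    log_theta = {}
    for c in sorted(set(labels)):
        totals = [alpha] * vocab_size
        for x, y in zip(vectors, labels):
            if y == c:
                for w, n in enumerate(x):
                    totals[w] += n
        z = sum(totals)
        log_theta[c] = [math.log(t / z) for t in totals]
    return log_theta


def component_loglik(log_theta, x, c):
    """log p(x | c) for one component, dropping the multinomial coefficient
    (it is class-independent, so it cancels in the posterior)."""
    return sum(n * lt for n, lt in zip(x, log_theta[c]))


def posterior(models, lam, mu, xs):
    """P(y | x) proportional to exp(mu[y] + sum_j lam[j] * log p(x_j | y)).

    xs holds one count vector per component, aligned with models and lam.
    """
    scores = {
        c: mu[c] + sum(l * component_loglik(m, x, c)
                       for l, m, x in zip(lam, models, xs))
        for c in mu
    }
    top = max(scores.values())  # subtract the max for numerical stability
    e = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(e.values())
    return {c: v / z for c, v in e.items()}


def fit_weights(models, data, labels, sigma2=1.0, lr=0.01, steps=500):
    """Fit lam and mu by gradient ascent on the conditional log-likelihood
    plus a Gaussian prior (variance sigma2) on every weight."""
    classes = sorted(set(labels))
    lam = [1.0] * len(models)  # start from the plain product of components
    mu = {c: 0.0 for c in classes}
    for _ in range(steps):
        g_lam = [-l / sigma2 for l in lam]  # gradient of the log prior
        g_mu = {c: -mu[c] / sigma2 for c in classes}
        for xs, y in zip(data, labels):
            p = posterior(models, lam, mu, xs)
            for j, (m, x) in enumerate(zip(models, xs)):
                f = {c: component_loglik(m, x, c) for c in classes}
                # observed feature value minus its expectation under the model
                g_lam[j] += f[y] - sum(p[c] * f[c] for c in classes)
            for c in classes:
                g_mu[c] += (1.0 if c == y else 0.0) - p[c]
        lam = [l + lr * g for l, g in zip(lam, g_lam)]
        mu = {c: mu[c] + lr * g_mu[c] for c in classes}
    return lam, mu


# Toy usage: component 0 stands in for text words, component 1 for links.
data = [([3, 0], [2, 0]), ([2, 1], [1, 0]), ([0, 3], [0, 2]), ([1, 2], [0, 1])]
labels = ["a", "a", "b", "b"]
models = [train_naive_bayes([xs[j] for xs in data], labels, vocab_size=2)
          for j in range(2)]
lam, mu = fit_weights(models, data, labels)
print(posterior(models, lam, mu, ([2, 0], [1, 0])))
```

Fitting λ and μ on the same examples that trained the naive Bayes models, as this toy does for brevity, tends to overweight components that fit the training data too closely; estimating the combination weights on separate data is one common remedy.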

Author information

Authors and Affiliations

NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237, Japan
Akinori Fujino, Naonori Ueda & Kazumi Saito

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, 790-784, Pohang, Korea
Gary Geunbae Lee
Computer and Communication Media Research, NEC Corp., Miyazaki 4-1-1, Miyamae-ku, 216-8555, Kawasaki, Japan
Akio Yamada
Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
Helen Meng
School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, 305-732, Daejeon, Korea
Sung Hyon Myaeng

About this paper

Cite this paper

Fujino, A., Ueda, N., Saito, K. (2005). A Classifier Design Based on Combining Multiple Components by Maximum Entropy Principle. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_33

DOI: https://doi.org/10.1007/11562382_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)
