Abstract
With a large amount of textual information available on the Internet and corporate intranets, it has become necessary to categorize large documents. Text classification is a representative multiclass problem. This paper describes a framework that we call Strong-to-Weak-to-Strong (SWS). It transforms a “strong” learning algorithm into a “weak” one by reducing the number of optimization iterations while preserving its other characteristics, such as its geometric properties. It then uses the kernel trick so that the “weak” algorithm can work in high-dimensional spaces, and finally improves text classification performance. We analyze the particular properties of learning with text and identify why this approach is appropriate for this task. Empirical results show that our approach is competitive with other methods.
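The idea of weakening a learner by capping its optimization iterations, then applying it through a kernel, can be illustrated with a minimal sketch. This is not the authors' implementation: the kernel perceptron, the RBF kernel, the one-vs-rest reduction, the toy data, and every function name here are assumptions chosen only for illustration. The final step that recombines weak learners into a strong one is not shown, because the abstract does not specify it.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_kernel_perceptron(X, y, kernel, max_epochs):
    """Binary kernel perceptron with labels +1/-1 (illustrative stand-in).

    max_epochs caps the passes over the data, so a small value stops the
    optimization early and yields a deliberately "weak" learner.
    """
    n = len(X)
    alpha = [0] * n  # mistake count per training example
    gram = [[kernel(a, b) for b in X] for a in X]
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * gram[j][i] for j in range(n))
            if y[i] * score <= 0:  # misclassified or on the boundary
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # converged before hitting the cap
            break
    return alpha

def decision_value(alpha, X, y, kernel, x):
    """Signed score of point x under one trained binary model."""
    return sum(a * t * kernel(p, x) for a, t, p in zip(alpha, y, X) if a)

def train_one_vs_rest(X, labels, kernel, max_epochs):
    """One binary model per class (assumed multiclass reduction)."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = (train_kernel_perceptron(X, y, kernel, max_epochs), y)
    return models

def predict(models, X, kernel, x):
    """Pick the class whose binary model gives the largest score."""
    return max(
        models,
        key=lambda c: decision_value(models[c][0], X, models[c][1], kernel, x),
    )

# Hypothetical toy data: three well-separated clusters, two points each.
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (2.0, 0.0), (2.1, 0.1)]
labels = ["a", "a", "b", "b", "c", "c"]

models = train_one_vs_rest(X, labels, rbf_kernel, max_epochs=2)
predictions = [predict(models, X, rbf_kernel, x) for x in X]
print(predictions)
```

Lowering `max_epochs` is the only knob that weakens the learner here; the kernel and the decision rule are left unchanged, which mirrors the abstract's description of reducing iterations while preserving other characteristics.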
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qiang, Q., He, Q. (2006). A Multiclass Classification Framework for Document Categorization. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_42
DOI: https://doi.org/10.1007/11669487_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)