Abstract
A number of powerful kernel-based learning machines, such as support vector machines (SVMs) and kernel Fisher discriminant analysis (KFDA), have been proposed and achieve competitive performance. However, applying existing kernel approaches directly to the text classification (TC) task suffers from a lack of semantic information and incurs large computational costs. This hinders their practical use in the many large-scale and real-time applications that require fast testing. To tackle this problem, this paper proposes a novel semantic kernel-based framework for efficient TC that offers a sparse representation of the final optimal prediction function while preserving semantic information in the approximate kernel subspace. Experiments on the 20 Newsgroups dataset demonstrate that, compared with SVM and KNN (k-nearest neighbor), the proposed method significantly reduces the computational cost of the prediction phase while maintaining considerable classification accuracy.
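The abstract rests on two ideas: a kernel that captures semantic relatedness between documents, and a sparse prediction function whose test-time cost depends on a small set of basis documents rather than the whole training set. The following is a minimal illustrative sketch of both ideas under stated assumptions. It is not the authors' matching pursuit KFDA. It assumes a latent-semantic-style kernel k(d, d') = d^T S S^T d', where S is a hypothetical term-by-concept matrix, and a decision function f(x) = sum_i alpha_i k(b_i, x) + bias over m basis documents b_i. The function names, the toy matrix S, the basis documents, and the weights are all illustrative. How the paper selects basis documents and weights is not reproduced here.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def project(doc: Sequence[float], S: Matrix) -> Vector:
    """Map a term-frequency vector d to concept space, i.e. compute S^T d.

    S is a hypothetical term-by-concept matrix: one row per vocabulary term,
    one column per latent concept.
    """
    n_terms, n_concepts = len(S), len(S[0])
    return [sum(doc[t] * S[t][c] for t in range(n_terms)) for c in range(n_concepts)]


def semantic_kernel(d1: Sequence[float], d2: Sequence[float], S: Matrix) -> float:
    """k(d1, d2) = (S^T d1) . (S^T d2) = d1^T S S^T d2."""
    p1, p2 = project(d1, S), project(d2, S)
    return sum(a * b for a, b in zip(p1, p2))


def sparse_decision(x: Sequence[float], basis: Sequence[Sequence[float]],
                    alphas: Sequence[float], bias: float, S: Matrix) -> float:
    """f(x) = sum_i alpha_i * k(b_i, x) + bias over the m basis documents only.

    Test-time cost grows with len(basis), not with the training-set size.
    """
    return sum(a * semantic_kernel(b, x, S) for a, b in zip(alphas, basis)) + bias


# Toy vocabulary: ["car", "automobile", "flower"]; two hypothetical concepts.
S = [[1.0, 0.0],   # "car"        -> vehicle concept
     [1.0, 0.0],   # "automobile" -> vehicle concept
     [0.0, 1.0]]   # "flower"     -> plant concept

doc_car = [1, 0, 0]
doc_auto = [0, 1, 0]
doc_flower = [0, 0, 1]

# A plain bag-of-words dot product sees no shared terms; the semantic kernel does.
bow = sum(a * b for a, b in zip(doc_car, doc_auto))
sem = semantic_kernel(doc_car, doc_auto, S)

# Two-document sparse expansion; the weights are hand-picked, not learned.
score = sparse_decision([0, 2, 0], basis=[doc_car, doc_flower],
                        alphas=[1.0, -1.0], bias=0.0, S=S)
print(bow, sem, score)  # 0 1.0 2.0
```

The toy case shows the intended effect. "car" and "automobile" share no terms, so their bag-of-words similarity is 0, yet the semantic kernel scores them as related. In a real system S would be estimated from data, for example by a truncated SVD of the term-document matrix as in latent semantic indexing. That choice is an assumption here, not the paper's stated construction.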
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Q., Li, J., Zhang, Z. (2011). Efficient Semantic Kernel-Based Text Classification Using Matching Pursuit KFDA. In: Lu, BL., Zhang, L., Kwok, J. (eds) Neural Information Processing. ICONIP 2011. Lecture Notes in Computer Science, vol 7063. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24958-7_45
DOI: https://doi.org/10.1007/978-3-642-24958-7_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24957-0
Online ISBN: 978-3-642-24958-7
eBook Packages: Computer Science, Computer Science (R0)