Abstract
We present a generative-model-based approach to transductive learning for text classification. Our approach combines three methodological ingredients: learning from background corpora, latent variable models that decompose the topic-word space into topic-concept and concept-word spaces, and explicit knowledge models with named concepts (light-weight ontologies and thesauri such as WordNet) for populating the latent variables. These ingredients are complementary, and their combination can perform better than each does on its own. This paper presents the theoretical model and extensive experimental results on three data collections. Our experiments show improved classification results over state-of-the-art techniques such as the Spectral Graph Transducer and Transductive Support Vector Machines, particularly when training data is sparse.
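As an illustrative sketch only, with notation assumed here rather than taken from the paper, decomposing the topic-word space through a layer of latent concepts can be written as a mixture. Here $t$ denotes a topic, $c$ a concept from a concept set $C$ (for example, a WordNet sense), and $w$ a word. The sketch assumes that a word is conditionally independent of the topic given the concept:

```latex
P(w \mid t) \;=\; \sum_{c \in C} P(w \mid c)\, P(c \mid t)
```

Under this reading, the explicit knowledge model would supply the named concepts in $C$, while $P(w \mid c)$ and $P(c \mid t)$ correspond to the concept-word and topic-concept spaces mentioned above.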
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ifrim, G., Weikum, G. (2006). Transductive Learning for Text Classification Using Explicit Knowledge Models. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_24
DOI: https://doi.org/10.1007/11871637_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0