Abstract
This paper proposes a semi-supervised latent Dirichlet allocation (ssLDA) method, which differs from existing supervised topic models for multi-label classification in two main respects. First, ssLDA trains its model on both labeled and unlabeled data, which is important for reducing the cost of manual labeling, especially when a fully labeled dataset is difficult to obtain. Second, ssLDA provides a more flexible training scheme that supports two kinds of label assignment, whereas existing topic-model-based methods usually support only one of them: (1) document-level assignment of labels to a document; (2) word-level correspondences imposed between words and labels within a document. Our experimental results indicate that ssLDA offers greater implementation flexibility than other methods and can outperform them in multi-label classification performance.
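The abstract gives no algorithmic detail, so the following is only a minimal sketch of how the two kinds of supervision could enter a topic model, not the authors' ssLDA inference. It assumes a one-to-one mapping between topics and labels and a standard collapsed Gibbs sampler whose candidate topics are restricted per token, in the spirit of Labeled LDA. All names (`allowed_topics`, `gibbs_sample`, `alpha`, `beta`) are hypothetical.

```python
import random


def allowed_topics(num_topics, doc_labels=None, word_label=None):
    """Topic ids a token may take (assumption: one topic per label)."""
    if word_label is not None:
        # Word-level supervision: the token is pinned to a single label.
        return [word_label]
    if doc_labels:
        # Document-level supervision: restrict to the document's labels.
        return sorted(doc_labels)
    # Unlabeled document: every topic is a candidate.
    return list(range(num_topics))


def gibbs_sample(docs, doc_labels, word_labels, num_topics, vocab_size,
                 alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with per-token topic constraints.

    docs:        list of documents, each a list of word ids
    doc_labels:  per document, a set of label ids or None (unlabeled)
    word_labels: per document, a list with a label id or None per token
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []

    # Random initialisation that already respects the constraints.
    for d, doc in enumerate(docs):
        z_d = []
        for i, w in enumerate(doc):
            cand = allowed_topics(num_topics, doc_labels[d], word_labels[d][i])
            k = rng.choice(cand)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Resample among the allowed topics only.
                cand = allowed_topics(num_topics, doc_labels[d], word_labels[d][i])
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in cand
                ]
                k = rng.choices(cand, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk


# Toy corpus: one fully labeled, one unlabeled, one with a word-level label.
docs = [[0, 1, 0], [2, 3, 2], [0, 2, 1]]
doc_labels = [{0}, None, {0, 1}]
word_labels = [[None, None, None], [None, None, None], [None, 1, None]]
z, n_dk = gibbs_sample(docs, doc_labels, word_labels, num_topics=2, vocab_size=4)
```

In this sketch the unlabeled document still contributes to the shared topic-word counts, which is one simple way unlabeled data can inform the model. How ssLDA itself combines the two sources of supervision is described in the full paper, not in the abstract.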
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, Y., Okada, S., Nitta, K. (2013). Semi-supervised Latent Dirichlet Allocation for Multi-label Text Classification. In: Ali, M., Bosse, T., Hindriks, K.V., Hoogendoorn, M., Jonker, C.M., Treur, J. (eds) Recent Trends in Applied Artificial Intelligence. IEA/AIE 2013. Lecture Notes in Computer Science, vol 7906. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38577-3_36
DOI: https://doi.org/10.1007/978-3-642-38577-3_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38576-6
Online ISBN: 978-3-642-38577-3
eBook Packages: Computer Science, Computer Science (R0)