Semi-supervised Latent Dirichlet Allocation for Multi-label Text Classification

Lu, Youwei; Okada, Shogo; Nitta, Katsumi

doi:10.1007/978-3-642-38577-3_36

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7906)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

This paper proposes a semi-supervised latent Dirichlet allocation (ssLDA) method, which differs from existing supervised topic models for multi-label classification in two main respects. First, ssLDA trains on both labeled and unlabeled data, which is important for reducing the cost of manual labeling, especially when a fully labeled dataset is difficult to obtain. Second, ssLDA provides a more flexible training scheme that allows two kinds of label assignment, whereas existing topic-model-based methods usually focus on only one of them: (1) document-level assignment of labels to a document; (2) word-level correspondences imposed between words and labels within a document. Our experimental results indicate that ssLDA offers greater implementation flexibility than other methods and can outperform them in multi-label classification performance.
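
The two kinds of label assignment, together with unlabeled documents, can be pictured with a small sketch. It is not the authors' algorithm: it assumes, in the style of Labeled LDA, that each label corresponds to exactly one topic id, and the function name `allowed_topics` and its parameters are hypothetical. It only shows how each source of supervision could narrow the set of topics a sampler may assign to a token.

```python
from typing import List, Optional, Sequence, FrozenSet


def allowed_topics(
    n_tokens: int,
    n_topics: int,
    doc_labels: Optional[Sequence[int]] = None,
    word_labels: Optional[Sequence[Optional[int]]] = None,
) -> List[FrozenSet[int]]:
    """Return, for each token, the set of topic ids it may be assigned.

    Assumption (not from the paper): label k is tied to topic k.
    - word-level label present  -> the token is pinned to that one topic
    - document-level labels only -> the token may use any of the document's labels
    - no labels (unlabeled doc)  -> the token may use any topic
    """
    all_topics = frozenset(range(n_topics))
    doc_set = frozenset(doc_labels) if doc_labels is not None else all_topics

    constraints = []
    for i in range(n_tokens):
        word_label = word_labels[i] if word_labels is not None else None
        if word_label is not None:
            constraints.append(frozenset({word_label}))
        else:
            constraints.append(doc_set)
    return constraints


# A labeled document with labels {0, 2}; its second token carries a word-level label.
labeled = allowed_topics(3, 4, doc_labels=[0, 2], word_labels=[None, 2, None])

# An unlabeled document: every token is free to take any of the 4 topics.
unlabeled = allowed_topics(2, 4)
```

In a sampler built on this idea, the topic of each token would be drawn only from its allowed set, so unlabeled documents still contribute to the topic-word statistics without any labeling cost.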

Author information

Authors and Affiliations

Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Tokyo, Japan
Youwei Lu, Shogo Okada & Katsumi Nitta

Editor information

Editors and Affiliations

Department of Computer Science, Texas State University, 78666, San Marcos, TX, USA
Moonis Ali
Agent Systems Research Group, Department of Computer Science, Faculty of Sciences, VU University Amsterdam, De Boelelaan 1081, 1081 HV, Amsterdam, The Netherlands
Tibor Bosse
Interactive Intelligence Group, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Koen V. Hindriks & Catholijn M. Jonker
Computational Intelligence Group, Department of Computer Science, Faculty of Sciences, VU University Amsterdam, De Boelelaan 1081, 1081 HV, Amsterdam, The Netherlands
Mark Hoogendoorn
Agent Systems Research Group, Department of Computer Science, Faculty of Sciences, VU University Amsterdam, De Boelelaan 1081, 1081 HV, Amsterdam, The Netherlands
Jan Treur

About this paper

Cite this paper

Lu, Y., Okada, S., Nitta, K. (2013). Semi-supervised Latent Dirichlet Allocation for Multi-label Text Classification. In: Ali, M., Bosse, T., Hindriks, K.V., Hoogendoorn, M., Jonker, C.M., Treur, J. (eds) Recent Trends in Applied Artificial Intelligence. IEA/AIE 2013. Lecture Notes in Computer Science, vol 7906. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38577-3_36

DOI: https://doi.org/10.1007/978-3-642-38577-3_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38576-6
Online ISBN: 978-3-642-38577-3
eBook Packages: Computer Science, Computer Science (R0)
