Semi-supervised Latent Dirichlet Allocation for Multi-label Text Classification

  • Conference paper
Recent Trends in Applied Artificial Intelligence (IEA/AIE 2013)

Abstract

This paper proposes a semi-supervised latent Dirichlet allocation (ssLDA) method that differs from existing supervised topic models for multi-label classification in two main respects. First, ssLDA trains on both labeled and unlabeled data, which reduces the cost of manual labeling and matters most when a fully labeled dataset is difficult to obtain. Second, ssLDA provides a more flexible training scheme that supports two kinds of label assignment, whereas existing topic-model-based methods usually focus on only one of them: (1) document-level assignment of labels to a document; (2) word-level correspondences between words and labels within a document. Our experimental results indicate that ssLDA offers greater implementation flexibility than other methods and can outperform them in multi-label classification performance.
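
The two kinds of label assignment named in the abstract can be illustrated with a small sketch. It is not the paper's inference procedure, which this preview does not describe. It is a generic collapsed Gibbs sampler for LDA, under the assumption (borrowed from Labeled LDA) that each label corresponds to one topic. The function names `allowed_topics` and `gibbs_sample`, the hyperparameter values, and the toy corpus are all hypothetical.

```python
import random

def allowed_topics(num_topics, doc_labels=None, word_label=None):
    """Return the topic ids a token may take under the given supervision.

    Assumption (not from the paper): one topic per label.
    """
    if word_label is not None:       # word-level correspondence: topic is fixed
        return [word_label]
    if doc_labels:                   # document-level labels: restrict to the label set
        return sorted(doc_labels)
    return list(range(num_topics))   # unlabeled document: every topic is allowed

def gibbs_sample(docs, doc_labels, word_labels, num_topics, vocab_size,
                 alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with per-token topic constraints."""
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of every token

    # Initialise each token with a random topic from its allowed set.
    for d, doc in enumerate(docs):
        z_d = []
        for i, w in enumerate(doc):
            cand = allowed_topics(num_topics, doc_labels[d], word_labels[d][i])
            k = rng.choice(cand)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Resample from the LDA conditional, restricted to allowed topics.
                cand = allowed_topics(num_topics, doc_labels[d], word_labels[d][i])
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in cand]
                k = rng.choices(cand, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z

# Toy corpus of word ids: a document with one label, a document with two
# labels plus one word-level label, and an unlabeled document.
docs = [[0, 1, 2, 0], [3, 4, 3], [0, 3, 1, 4]]
doc_labels = [{0}, {1, 2}, None]
word_labels = [[None] * 4, [None, 2, None], [None] * 4]
z = gibbs_sample(docs, doc_labels, word_labels, num_topics=3, vocab_size=5)
```

In this sketch the unlabeled document still contributes to the shared topic-word counts, which is one simple way unlabeled data can inform the topics that labeled documents use. How ssLDA itself combines the two sources is specified in the full paper, not here.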

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, Y., Okada, S., Nitta, K. (2013). Semi-supervised Latent Dirichlet Allocation for Multi-label Text Classification. In: Ali, M., Bosse, T., Hindriks, K.V., Hoogendoorn, M., Jonker, C.M., Treur, J. (eds) Recent Trends in Applied Artificial Intelligence. IEA/AIE 2013. Lecture Notes in Computer Science, vol 7906. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38577-3_36

  • DOI: https://doi.org/10.1007/978-3-642-38577-3_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38576-6

  • Online ISBN: 978-3-642-38577-3

  • eBook Packages: Computer Science, Computer Science (R0)
