
Supervising Latent Topic Model for Maximum-Margin Text Classification and Regression

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6118)

Abstract

In this paper, we investigate text classification and regression. Given a training corpus of text documents, each with a response label, the task is to train a predictor that predicts the response of any given document. Many researchers have decomposed this task into two separate steps. They first use a generative latent topic model to learn low-dimensional semantic representations of the documents, and then train a max-margin predictor using these representations as features. In this work we demonstrate that it is beneficial to combine the two steps, learning low-dimensional representations and training a predictor, into a single step that minimizes one learning objective. We present a novel step-wise convex optimization algorithm that optimizes this objective via a tight variational upper bound. We conduct an extensive experimental study on the publicly available movie review and 20 Newsgroups datasets. Experimental results show that, compared with state-of-the-art results in the literature, our one-step approach trains noticeably better predictors and discovers much lower-dimensional representations. In the classification task on the 20 Newsgroups dataset, it achieves a 2% relative improvement in accuracy and a 95% relative reduction in the number of dimensions. In the regression task on the movie review dataset, it achieves a 5.7% relative improvement in predictive R² and a 55% relative reduction in the number of dimensions.
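The abstract evaluates max-margin predictors trained on low-dimensional document representations and reports its gains as relative changes in accuracy, predictive R² and dimensionality. The following is a minimal sketch of those quantities, assuming standard textbook definitions. These are the binary hinge loss for SVM classification, the ε-insensitive loss for support vector regression, and predictive R² computed as pR² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)². It does not reproduce the paper's joint objective or its variational bound. The vector `x` stands in for a document's topic representation, and the function names and the `eps` default are hypothetical, chosen for illustration only.

```python
from typing import Sequence


def _linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear predictor w.x + b over a document's (hypothetical) topic features x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Binary SVM hinge loss max(0, 1 - y * (w.x + b)) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * _linear_score(w, b, x))


def eps_insensitive_loss(
    w: Sequence[float], b: float, x: Sequence[float], y: float, eps: float = 0.1
) -> float:
    """SVR loss max(0, |y - (w.x + b)| - eps): errors smaller than eps cost nothing."""
    return max(0.0, abs(y - _linear_score(w, b, x)) - eps)


def predictive_r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Predictive R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline: float, new: float) -> float:
    """(new - baseline) / baseline; 0.02 is a 2% relative gain, -0.95 a 95% reduction."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from the paper.
    print(hinge_loss([1.0, 0.0], 0.0, [0.5, 3.0], 1))  # margin 0.5 < 1, so loss 0.5
    print(predictive_r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # predicting the mean gives 0.0
    print(relative_change(100, 5))  # shrinking 100 dimensions to 5
```

With these hypothetical dimensions, shrinking a 100-dimensional representation to 5 dimensions gives `relative_change(100, 5) == -0.95`, which is the sense in which a "95% relative reduction in the number of dimensions" can be read. Always predicting the sample mean yields a predictive R² of 0, so any positive relative improvement in pR² is measured against a baseline that already beats that trivial predictor.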




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, W. (2010). Supervising Latent Topic Model for Maximum-Margin Text Classification and Regression. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. Lecture Notes in Computer Science, vol 6118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13657-3_44


  • DOI: https://doi.org/10.1007/978-3-642-13657-3_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13656-6

  • Online ISBN: 978-3-642-13657-3

  • eBook Packages: Computer Science, Computer Science (R0)
