Sparse Topical Coding with Sparse Groups

  • Conference paper
Web-Age Information Management (WAIM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9658)

Abstract

Learning latent semantic representations from large collections of short texts is of considerable practical significance in research and engineering. However, standard topic models are difficult to apply in microblogging environments: microblogs are short, numerous, noisy, and irregular in form, which prevents topic models from exploiting the full information they contain. In this paper, we propose a novel non-probabilistic topic model, sparse topical coding with sparse groups (STCSG), which discovers sparse latent semantic representations of large short-text corpora. STCSG relaxes the normalization constraint on the inferred representations and uses the sparse group lasso, a sparsity-inducing regularizer, which makes it convenient to control the sparsity of document, topic, and word codes directly. Furthermore, the relaxed non-probabilistic STCSG can be learned effectively with the alternating direction method of multipliers (ADMM). Our experimental results on a Twitter dataset demonstrate that STCSG performs well in finding meaningful latent representations of short documents and can therefore substantially improve the accuracy and efficiency of document classification.
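To make the role of the sparse group lasso concrete, the following is a minimal sketch of its proximal operator for a single group of coefficients, the kind of shrinkage step that ADMM-style solvers typically apply. It is not taken from the paper: the function names, the parameters `lam_l1` and `lam_group`, and the treatment of one code vector as one group are illustrative assumptions, and any additional constraints STCSG may impose are not modeled here.

```python
import math


def soft_threshold(v, t):
    """Elementwise soft-thresholding: proximal operator of t * ||v||_1."""
    out = []
    for x in v:
        if x > t:
            out.append(x - t)
        elif x < -t:
            out.append(x + t)
        else:
            out.append(0.0)
    return out


def group_shrink(v, t):
    """Group shrinkage: proximal operator of t * ||v||_2 for one group."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        # The whole group is switched off at once.
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def prox_sparse_group_lasso(v, lam_l1, lam_group):
    """Proximal operator of lam_l1 * ||v||_1 + lam_group * ||v||_2.

    For a single group this is soft-thresholding followed by group
    shrinkage. Parameter names are hypothetical, not the paper's notation.
    """
    return group_shrink(soft_threshold(v, lam_l1), lam_group)


if __name__ == "__main__":
    # Small entries vanish individually; a weak group vanishes entirely.
    print(prox_sparse_group_lasso([3.0, -0.5, 0.2], 1.0, 1.0))
    print(prox_sparse_group_lasso([0.3, -0.4], 0.1, 1.0))
```

The two penalties act at different levels: the l1 term zeroes individual entries, while the group term can zero an entire code vector, which is the sense in which a single regularizer can control sparsity at more than one granularity.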

This research is supported by the Natural Science Foundation of China (No. 61472291) and the Natural Science Foundation of Hubei Province (No. ZRY2014000901).

Notes

  1. http://sc.whu.edu.cn/

  2. http://www.cs.princeton.edu/blei/lda-c/

  3. http://bigml.cs.tsinghua.edu.cn/~jun/stc.shtml/

  4. https://cran.r-project.org/web/packages/e1071/

Author information

Corresponding author

Correspondence to Qianqian Xie.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Peng, M. et al. (2016). Sparse Topical Coding with Sparse Groups. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9658. Springer, Cham. https://doi.org/10.1007/978-3-319-39937-9_32

  • DOI: https://doi.org/10.1007/978-3-319-39937-9_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39936-2

  • Online ISBN: 978-3-319-39937-9

  • eBook Packages: Computer Science, Computer Science (R0)