Abstract
Learning latent semantic representations from large short-text corpora is of profound practical significance in both research and engineering. However, standard topic models are difficult to apply in microblogging environments. Microblogs are short, numerous, noisy, and irregular in modality, and these characteristics prevent topic models from making full use of the information they contain. In this paper, we propose a novel non-probabilistic topic model called sparse topical coding with sparse groups (STCSG), which is capable of discovering sparse latent semantic representations of large short-text corpora. STCSG relaxes the normalization constraint on the inferred representations by using the sparse group lasso, a sparsity-inducing regularizer that makes it convenient to directly control the sparsity of document, topic, and word codes. Furthermore, the relaxed non-probabilistic STCSG can be learned effectively with the alternating direction method of multipliers (ADMM). Our experimental results on a Twitter dataset demonstrate that STCSG performs well in finding meaningful latent representations of short documents, and that it can therefore substantially improve the accuracy and efficiency of document classification.
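The abstract names two ingredients: the sparse group lasso regularizer and ADMM-based learning. The sketch below is not the STCSG algorithm, whose objective over word, document, and topic codes is not given here. Instead, it illustrates both ingredients on a plain least-squares problem, which is an assumption made for illustration. The regularizer is lam1 * ||x||_1 + lam2 * sum_g ||x_g||_2. Its proximal operator applies elementwise soft-thresholding and then shrinks each group as a whole. Standard scaled-form ADMM alternates a ridge-like x-update, that proximal step, and a dual update. All function names and the parameters `lam1`, `lam2`, `rho`, and `groups` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_sparse_group_lasso(v, groups, lam1, lam2, t=1.0):
    """Proximal operator of t * (lam1 * ||z||_1 + lam2 * sum_g ||z_g||_2).

    First soft-threshold every entry (within-group sparsity), then shrink
    each group's norm by t * lam2, zeroing whole groups whose norm falls
    below that threshold (group sparsity). `groups` is a list of index
    arrays that partition v (hypothetical grouping, e.g. one group per code).
    """
    s = soft_threshold(v, t * lam1)
    z = np.zeros_like(s)
    for g in groups:
        norm = np.linalg.norm(s[g])
        if norm > t * lam2:
            z[g] = (1.0 - t * lam2 / norm) * s[g]
    return z


def admm_sparse_group_lasso(A, b, groups, lam1, lam2, rho=1.0, n_iter=500):
    """Scaled-form ADMM for the illustrative problem

        min_x 0.5 * ||A x - b||^2 + lam1 * ||x||_1 + lam2 * sum_g ||x_g||_2,

    split as f(x) + R(z) subject to x = z. Returns the sparse iterate z.
    """
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # The x-update solves (A^T A + rho I) x = A^T b + rho (z - u) every
    # iteration, so factor the matrix once with a Cholesky decomposition.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: proximal step of the regularizer with step size 1 / rho.
        z = prox_sparse_group_lasso(x + u, groups, lam1, lam2, t=1.0 / rho)
        # Dual update accumulates the constraint violation x - z.
        u = u + x - z
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 6))
    x_true = np.array([1.5, -2.0, 0.0, 0.0, 0.0, 0.0])
    b = A @ x_true
    groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
    print(admm_sparse_group_lasso(A, b, groups, lam1=0.1, lam2=0.1))
```

Factoring A^T A + rho I once keeps each iteration cheap, and the two penalty weights control element-level and group-level sparsity separately. This mirrors, in a simplified setting, the kind of direct sparsity control the abstract attributes to the sparse group lasso.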
This research is supported by the Natural Science Foundation of China No. 61472291, and the Natural Science Foundation of Hubei Province No. ZRY2014000901.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Peng, M. et al. (2016). Sparse Topical Coding with Sparse Groups. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol. 9658. Springer, Cham. https://doi.org/10.1007/978-3-319-39937-9_32
DOI: https://doi.org/10.1007/978-3-319-39937-9_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39936-2
Online ISBN: 978-3-319-39937-9
eBook Packages: Computer Science, Computer Science (R0)