Topic modeling for large-scale text data



This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which uses the results of previous iterations to smooth out noisy natural gradients. We analyze the convergence properties of the proposed algorithm and conduct a set of experiments on two large-scale collections containing millions of documents. Experimental results indicate that, compared with stochastic variational inference (SVI) and stochastic gradient Riemannian Langevin dynamics (SGRLD), our algorithm converges faster and performs better.
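The abstract says only that MASVI reuses results from previous iterations to smooth noisy natural gradients. The Python sketch below shows one plausible form of that idea, and its assumptions should be read as such. It treats the global variational parameter λ as a flat list. Each iteration receives a noisy intermediate estimate λ̂ computed from a minibatch; here the caller supplies it, and the LDA-specific local step is omitted. The sketch assumes the smoothing averages the last L such estimates, so the step direction becomes mean(λ̂) − λ instead of the plain SVI natural gradient λ̂ − λ. The class name `MovingAverageSVI`, the window size, and the step-size constants `tau0` and `kappa` are hypothetical choices, not taken from the paper.

```python
import random
from collections import deque
from typing import Deque, List, Sequence


def step_size(t: int, tau0: float = 1.0, kappa: float = 0.7) -> float:
    """Robbins-Monro schedule rho_t = (tau0 + t) ** (-kappa); constants are assumed."""
    return (tau0 + t) ** (-kappa)


class MovingAverageSVI:
    """Hypothetical sketch of an SVI global update smoothed by a moving average.

    Assumption: the smoothing averages the intermediate estimates lam_hat from the
    last `window` iterations; the paper's exact rule may differ.
    """

    def __init__(self, lam0: Sequence[float], window: int = 5,
                 tau0: float = 1.0, kappa: float = 0.7) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.lam: List[float] = list(lam0)
        # Holds the last `window` lam_hat vectors; older ones are dropped automatically.
        self.recent: Deque[List[float]] = deque(maxlen=window)
        self.tau0, self.kappa = tau0, kappa
        self.t = 0

    def step(self, lam_hat: Sequence[float]) -> List[float]:
        """Take one update given a noisy intermediate estimate lam_hat."""
        if len(lam_hat) != len(self.lam):
            raise ValueError("lam_hat has the wrong dimension")
        self.recent.append(list(lam_hat))
        n = len(self.recent)
        # Component-wise moving average of the stored intermediate estimates.
        avg_hat = [sum(col) / n for col in zip(*self.recent)]
        rho = step_size(self.t, self.tau0, self.kappa)
        # Smoothed natural-gradient step: lam += rho * (avg_hat - lam).
        # With window=1 this is the plain SVI update (1 - rho) * lam + rho * lam_hat.
        self.lam = [l + rho * (a - l) for l, a in zip(self.lam, avg_hat)]
        self.t += 1
        return list(self.lam)


if __name__ == "__main__":
    # Toy check: noisy estimates scattered around a fixed target vector.
    rng = random.Random(0)
    target = [2.0, 3.0]
    opt = MovingAverageSVI([0.0, 0.0], window=5)
    lam = opt.lam
    for _ in range(3000):
        lam = opt.step([x + rng.gauss(0.0, 0.5) for x in target])
    print([round(v, 2) for v in lam])
```

With `window=1` the update reduces to the standard SVI step λ ← (1 − ρ_t)λ + ρ_t λ̂. A larger window gives smoother steps, but the averaged estimates were computed under older values of λ, so it also adds lag; the window size is therefore a bias/variance trade-off.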

Key words

Latent Dirichlet allocation (LDA); Topic modeling; Online learning; Moving average







Copyright information

© Journal of Zhejiang University Science Editorial Office and Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. College of Computer Science and Technology, Jilin University, Changchun, China
  2. MOE Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun, China
