Accelerating Topic Model Training on a Single Machine

  • Mian Lu
  • Ge Bai
  • Qiong Luo
  • Jie Tang
  • Jiuxin Zhao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7808)


We present the design and implementation of GLDA, a library that uses the GPU (Graphics Processing Unit) to perform Gibbs sampling for Latent Dirichlet Allocation (LDA) on a single machine. LDA is an effective topic model used in many applications, e.g., classification, feature selection, and information retrieval. However, training an LDA model on large data sets takes hours or even days because of the heavy computation and intensive memory access involved. We therefore explore the use of the GPU to accelerate LDA training on a single machine. Specifically, we propose three memory-efficient techniques for handling large data sets on the GPU: (1) generating document-topic counts as needed instead of storing all of them, (2) adopting a compact storage scheme for sparse matrices, and (3) partitioning word tokens. With these techniques, LDA training that would originally require 10 GB of memory can be performed on a commodity GPU card with only 1 GB of GPU memory. Furthermore, GLDA achieves a speedup of 15x over the original CPU-based LDA for large data sets.
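To make technique (1) concrete, the following is a minimal CPU-only sketch in Python of one collapsed Gibbs sampling sweep in which a document's topic counts are rebuilt from its token assignments when the document is visited, rather than kept in a full documents-by-topics matrix. It is not GLDA's GPU implementation: the function names `init_state` and `gibbs_sweep`, the list-based data layout, and the hyperparameter values are assumptions made for illustration, and the sampling rule is the standard collapsed Gibbs conditional for LDA.

```python
import random


def init_state(docs, K, V, rng):
    """Randomly assign a topic to every token and build the global counts."""
    z = [[rng.randrange(K) for _ in words] for words in docs]
    word_topic = [[0] * K for _ in range(V)]  # V x K word-topic counts
    topic_total = [0] * K                     # tokens assigned to each topic
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            word_topic[w][z[d][i]] += 1
            topic_total[z[d][i]] += 1
    return z, word_topic, topic_total


def gibbs_sweep(docs, z, word_topic, topic_total, K, V, alpha, beta, rng):
    """One collapsed Gibbs sweep over all tokens (illustrative sketch only)."""
    for d, words in enumerate(docs):
        # Assumed reading of technique (1): derive this document's topic
        # counts from its assignments instead of storing a D x K matrix.
        doc_topic = [0] * K
        for k in z[d]:
            doc_topic[k] += 1
        for i, w in enumerate(words):
            old = z[d][i]
            # Remove the token's current assignment from every count.
            doc_topic[old] -= 1
            word_topic[w][old] -= 1
            topic_total[old] -= 1
            # Unnormalized conditional probability of each topic.
            weights = [
                (doc_topic[k] + alpha) * (word_topic[w][k] + beta)
                / (topic_total[k] + V * beta)
                for k in range(K)
            ]
            new = rng.choices(range(K), weights=weights)[0]
            # Record the new assignment.
            z[d][i] = new
            doc_topic[new] += 1
            word_topic[w][new] += 1
            topic_total[new] += 1


if __name__ == "__main__":
    rng = random.Random(0)
    docs = [[0, 1, 2, 1], [2, 3, 3], [0, 3, 1, 2, 2]]  # word ids per document
    K, V = 2, 4
    z, word_topic, topic_total = init_state(docs, K, V, rng)
    for _ in range(10):
        gibbs_sweep(docs, z, word_topic, topic_total, K, V, 0.1, 0.01, rng)
    print(topic_total)
```

In this sketch the only per-document state held between sweeps is the assignment list `z[d]`, so the memory for document-topic counts is bounded by the number of topics rather than by documents times topics; the cost is recounting each document's assignments once per sweep.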


Keywords (machine-generated, not supplied by the authors): Graphic Processing Unit, Single Machine, Latent Dirichlet Allocation, Thread Block, Latent Dirichlet Allocation Model





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Mian Lu (1)
  • Ge Bai (2)
  • Qiong Luo (2)
  • Jie Tang (3)
  • Jiuxin Zhao (2)

  1. A*STAR Institute of High Performance Computing, Singapore
  2. Hong Kong University of Science and Technology, Hong Kong
  3. Tsinghua University, China
