Larger Residuals, Less Work: Active Document Scheduling for Latent Dirichlet Allocation

  • Mirwaes Wahabzada
  • Kristian Kersting
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)


Recently, there have been considerable advances in fast inference for latent Dirichlet allocation (LDA). In particular, stochastic optimization of the variational Bayes (VB) objective function with a natural gradient step was shown to converge and can process massive document collections. To reduce noise in the gradient estimate, it considers multiple documents chosen uniformly at random. While it is widely recognized that the scheduling of documents in stochastic optimization can have significant consequences, this issue remains largely unexplored. In this work, we address it. Specifically, we propose residual LDA, a novel, easy-to-implement LDA approach that schedules documents in an informed way. Intuitively, in each iteration, residual LDA actively selects documents that exert a disproportionately large influence on the current residual to compute the next update. On several real-world datasets, including 3M articles from Wikipedia, we demonstrate that residual LDA can handily analyze massive document collections and finds topic models as good as or better than those found with batch VB and randomly scheduled VB, and does so significantly faster.
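The scheduling idea described above can be sketched as residual-weighted mini-batch sampling: instead of drawing documents uniformly at random, draw them with probability proportional to their current residual magnitude. The following is a minimal, hypothetical illustration of that idea only; the function name `residual_schedule` and the toy residual values are assumptions, not the paper's implementation.

```python
import numpy as np

def residual_schedule(residuals, batch_size, rng):
    """Sample a mini-batch of document indices with probability
    proportional to each document's current residual magnitude.
    Hypothetical sketch of informed scheduling, not the paper's code."""
    p = np.abs(residuals).astype(float)
    p = p / p.sum()  # normalize residual magnitudes into a distribution
    return rng.choice(len(residuals), size=batch_size, replace=False, p=p)

# Toy example: documents 2 and 4 have much larger residuals, so they
# are far more likely to be selected for the next update.
rng = np.random.default_rng(0)
residuals = np.array([0.1, 0.1, 5.0, 0.1, 4.0, 0.1])
batch = residual_schedule(residuals, batch_size=2, rng=rng)
```

Uniform scheduling, by contrast, would give every document the same selection probability regardless of how much it could change the model.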


Keywords: Markov chain Monte Carlo · Document collection · Topic model · Latent Dirichlet allocation · Batch size



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mirwaes Wahabzada (1)
  • Kristian Kersting (1)

  1. Knowledge Discovery Department, Fraunhofer IAIS, Sankt Augustin, Germany
