\(\mathtt{SpectralLeader}\): Online Spectral Learning for Single Topic Models

  • Tong Yu (corresponding author)
  • Branislav Kveton
  • Zheng Wen
  • Hung Bui
  • Ole J. Mengshoel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)

Abstract

We study the problem of learning a latent variable model online from a stream of data. Latent variable models are popular because they can explain observed data through unobserved concepts. These models have traditionally been studied in the offline setting. In the online setting, online expectation maximization (EM) is arguably the most popular approach for learning latent variable models. Although online EM is computationally efficient, it typically converges to a local optimum. In this work, we develop a new online learning algorithm for latent variable models, which we call \(\mathtt{SpectralLeader}\). \(\mathtt{SpectralLeader}\) converges to the global optimum, and we derive a sublinear upper bound on its n-step regret in a single topic model. In both synthetic and real-world experiments, we show that \(\mathtt{SpectralLeader}\) performs similarly to or better than online EM with tuned hyper-parameters.
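The spectral approach sketched in the abstract rests on the method-of-moments view of the single topic model: each document draws one hidden topic and then emits words i.i.d. from that topic's word distribution, so the expected word co-occurrence matrix factors through the topic parameters. As a minimal illustration (not the paper's \(\mathtt{SpectralLeader}\) algorithm itself; all names such as `p`, `mu`, `K`, `V` are assumptions for the toy setup), the snippet below checks empirically that the second-order moment \(M_2 = \mathbb{E}[e_{x_1} e_{x_2}^\top]\) over word pairs from the same document approaches \(\sum_i p_i \mu_i \mu_i^\top\), the quantity spectral methods decompose to recover the topics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single topic model (illustrative parameters, not from the paper):
# a document draws one topic h ~ p, then emits words i.i.d. from mu[h].
K, V = 2, 5                                # number of topics, vocabulary size
p = np.array([0.6, 0.4])                   # topic prior
mu = rng.dirichlet(np.ones(V), size=K)     # K x V topic-word distributions

def sample_word_pair():
    """Draw one document of two words; both share the same hidden topic."""
    h = rng.choice(K, p=p)
    return rng.choice(V, size=2, p=mu[h])

# Empirical second-order moment M2[a, b] ~= P(x1 = a, x2 = b).
n = 100_000
M2 = np.zeros((V, V))
for _ in range(n):
    x1, x2 = sample_word_pair()
    M2[x1, x2] += 1.0
M2 /= n

# Population moment: M2 = sum_i p_i * mu_i mu_i^T (outer products).
M2_true = (p[:, None, None] * (mu[:, :, None] * mu[:, None, :])).sum(axis=0)
print(np.abs(M2 - M2_true).max())  # shrinks as n grows
```

In the offline spectral algorithms cited by the paper, this moment (together with a third-order analogue) is decomposed to recover \(p\) and the \(\mu_i\) globally, which is the property \(\mathtt{SpectralLeader}\) carries over to the online setting.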

Keywords

Online learning · Spectral method · Topic models

Acknowledgments

This work is supported, in part, by funding from Adobe and Intel to CMU.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Tong Yu (1, corresponding author)
  • Branislav Kveton (2)
  • Zheng Wen (3)
  • Hung Bui (4)
  • Ole J. Mengshoel (1)

  1. Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
  2. Google Research, Mountain View, USA
  3. Adobe Research, San Jose, USA
  4. DeepMind, Mountain View, USA