A Fast and Effective Framework for Lifelong Topic Model with Self-learning Knowledge

  • Kang Xu (email author)
  • Feng Liu
  • Tianxing Wu
  • Sheng Bi
  • Guilin Qi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10565)

Abstract

To discover semantically coherent topics, knowledge-based topic models have been proposed to incorporate prior knowledge into topic models. Moreover, lifelong topic models (LTM) have been proposed to mine prior knowledge from topics generated from multi-domain corpora without human intervention. LTM incorporates the knowledge learned from multi-domain corpora into topic models by introducing the Generalized Polya Urn (GPU) model into Gibbs sampling. However, the GPU model is nonexchangeable, so topic inference for LTM is computationally expensive. Variational inference is an alternative to Gibbs sampling and tends to be faster. It is also flexible enough to infer topic models with knowledge, i.e., regularized topic models. In this paper, we propose a fast and effective framework for lifelong topic modeling, called Regularized Lifelong Topic Model with Self-learning Knowledge (RLTM-SK), in which lexical knowledge is automatically learnt from previous topic extraction, and we design a variational inference method to estimate the posterior distributions of the hidden variables of RLTM-SK. We compare our method with 5 state-of-the-art baselines on a dataset of product reviews from 50 domains. Results show that the performance of our method is comparable to LTM and other knowledge-based topic models. Moreover, our model is consistently faster than the best baseline method, LTM.
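
To make the GPU idea in the abstract concrete, the following is a minimal sketch of a generalized Polya urn count update, not the authors' implementation. In a generalized urn, drawing one word also adds fractional counts for related words, so the resulting counts depend on the order of the draws. The function name `gpu_update`, the `related` word pairs, and the promotion weight of 0.3 are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def gpu_update(topic_word_counts, topic, word, related, promotion=0.3):
    """Record one draw of `word` under `topic` in a generalized Polya urn.

    The drawn word gains a full count; every word listed as related to it
    gains a fractional count (`promotion`, an illustrative value).
    """
    counts = topic_word_counts[topic]
    counts[word] += 1.0
    for other in related.get(word, ()):
        counts[other] += promotion
    return counts


# Hypothetical lexical knowledge: "price" and "cost" tend to share a topic.
related = {"price": ["cost"], "cost": ["price"]}

# counts[topic][word] holds the (possibly fractional) urn counts.
counts = defaultdict(lambda: defaultdict(float))
gpu_update(counts, 0, "price", related)
gpu_update(counts, 0, "battery", related)
```

After these two draws, topic 0 holds a full count for "price" and "battery" and a fractional count for "cost", even though "cost" was never drawn. A simple Polya urn would be the special case where `related` is empty.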

Keywords

Variational inference; Lifelong topic model; Knowledge-based topic model

Notes

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61672153, the 863 Program under Grant No. 2015AA015406 and the Fundamental Research Funds for the Central Universities and the Research Innovation Program for College Graduates of Jiangsu Province under Grant No. KYLX16_0295.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Kang Xu (1), email author
  • Feng Liu (1)
  • Tianxing Wu (1)
  • Sheng Bi (1)
  • Guilin Qi (1)
  1. School of Computer Science and Engineering, Southeast University, Nanjing, China