Abstract
To discover semantically coherent topics, knowledge-based topic models have been proposed that incorporate prior knowledge into topic modeling. Moreover, some researchers have proposed lifelong topic models (LTM), which mine prior knowledge from the topics generated over a multi-domain corpus without human intervention. LTM incorporates the knowledge learned from the multi-domain corpus into topic models by introducing the Generalized Polya Urn (GPU) model into Gibbs sampling. However, the GPU model is nonexchangeable, so topic inference for LTM is computationally expensive. Meanwhile, variational inference is an alternative to Gibbs sampling and tends to be faster; it is also flexible enough to infer topic models with knowledge, i.e., regularized topic models. In this paper, we propose a fast and effective framework for lifelong topic modeling, called the Regularized Lifelong Topic Model with Self-learning Knowledge (RLTM-SK), whose lexical knowledge is learned automatically from previous topic extractions, and we design a variational inference method to estimate the posterior distributions of its hidden variables. We compare our method with five state-of-the-art baselines on a dataset of product reviews from 50 domains. Results show that the performance of our method is comparable to LTM and other knowledge-based topic models, and that our model is consistently faster than the best baseline, LTM.
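To make the nonexchangeability concrete, the following is a minimal sketch of a GPU-style count update of the kind LTM introduces into Gibbs sampling. The names (gpu_increment, knowledge, sigma) and the particular promotion scheme are illustrative assumptions, not the authors' implementation: assigning a word to a topic also promotes knowledge-linked words by a fractional count, so the resulting counts depend on the order of assignments.

```python
import numpy as np

# Illustrative sketch of a Generalized Polya Urn (GPU) count update.
# In a simple Polya urn, assigning word w to topic k increments
# counts[k, w] by 1. The GPU additionally "promotes" words that prior
# (self-learned) knowledge links to w, which breaks exchangeability.
# All identifiers below are hypothetical, not from the paper's code.

def gpu_increment(topic_word_counts, k, w, knowledge, sigma=0.2):
    """Increment counts for word w under topic k, promoting linked words.

    topic_word_counts : (K, V) array of topic-word counts
    knowledge         : dict mapping a word id to the set of word ids that
                        self-learned knowledge associates with it
    sigma             : fractional promotion weight for linked words (< 1)
    """
    topic_word_counts[k, w] += 1.0
    for w_rel in knowledge.get(w, ()):
        topic_word_counts[k, w_rel] += sigma  # promote related words

# Toy usage: word 3 is knowledge-linked to words 7 and 42, so assigning
# word 3 to topic 0 also adds sigma to the counts of words 7 and 42.
K, V = 10, 1000
counts = np.zeros((K, V))
knowledge = {3: {7, 42}}
gpu_increment(counts, k=0, w=3, knowledge=knowledge)
```

Because every assignment may touch knowledge-linked entries, the conditional distributions no longer depend on the counts alone but on the assignment order, which is one reason GPU-based Gibbs inference is expensive and why a variational alternative is attractive.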
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61672153, the 863 Program under Grant No. 2015AA015406, the Fundamental Research Funds for the Central Universities, and the Research Innovation Program for College Graduates of Jiangsu Province under Grant No. KYLX16_0295.