Multimedia Tools and Applications, Volume 77, Issue 3, pp 3455–3471

Conceptualization topic modeling

  • Yi-Kun Tang
  • Xian-Ling Mao (corresponding author)
  • Heyan Huang
  • Xuewen Shi
  • Guihua Wen


Topic modeling has recently been widely used to discover abstract topics in the multimedia field. Most existing topic models assume a three-layer hierarchical Bayesian structure: each document is modeled as a probability distribution over topics, and each topic as a probability distribution over words. However, this assumption is not optimal. Intuitively, it is more reasonable to assume that each topic is a probability distribution over concepts and each concept is a probability distribution over words, that is, to add a latent concept layer between the topic layer and the word layer of the traditional three-layer assumption. In this paper, we verify the proposed assumption by incorporating it into two representative topic models, which yields two novel topic models. Extensive experiments comparing the proposed models with their corresponding baselines show that the proposed models significantly outperform the baselines in both case studies and perplexity, which suggests that the new assumption is more reasonable than the traditional one.
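The layered assumption described above can be illustrated with a minimal generative sketch. It is not the inference procedure or the exact models of the paper. It only shows how a word would be drawn when a concept layer sits between topics and words. The sizes `K`, `C`, `V`, the symmetric Dirichlet draws, and the names `theta`, `phi`, `psi`, `sample_document`, and `topic_word_marginal` are all assumptions made for illustration.

```python
import numpy as np


def sample_document(theta, phi, psi, n_words, rng):
    """Draw one document under the document -> topic -> concept -> word assumption.

    theta: (K,)   document-topic distribution (illustrative name)
    phi:   (K, C) one topic-concept distribution per row
    psi:   (C, V) one concept-word distribution per row
    Returns a list of (topic, concept, word) index triples.
    """
    tokens = []
    for _ in range(n_words):
        z = rng.choice(theta.shape[0], p=theta)   # pick a topic for this token
        c = rng.choice(phi.shape[1], p=phi[z])    # pick a concept given the topic
        w = rng.choice(psi.shape[1], p=psi[c])    # pick a word given the concept
        tokens.append((int(z), int(c), int(w)))
    return tokens


def topic_word_marginal(phi, psi):
    """Sum out the concept layer: p(w | z) = sum_c p(c | z) * p(w | c)."""
    return phi @ psi


# Toy sizes, chosen arbitrarily for the sketch.
K, C, V = 3, 5, 20
rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(K))          # one document's topic mixture
phi = rng.dirichlet(np.ones(C), size=K)    # topic -> concept
psi = rng.dirichlet(np.ones(V), size=C)    # concept -> word
doc = sample_document(theta, phi, psi, n_words=50, rng=rng)
```

Summing out the concept layer with `topic_word_marginal` recovers an ordinary topic-word distribution, so the traditional three-layer structure corresponds to the case where this marginal is modeled directly, without the intermediate concepts.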


Keywords: Conceptualization topic modeling; Hierarchical Bayesian structure; Conceptualization latent Dirichlet allocation; Conceptualization labeled latent Dirichlet allocation



This work was supported by the 863 Program (2015AA015404), the China National Science Foundation (61402036, 60973083, 61273363), the Beijing Technology Project (Z151100001615029), the Science and Technology Planning Project of Guangdong Province (2014A010103009, 2015A020217002), the Guangzhou Science and Technology Planning Project (201604020179), and the Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201738).



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Yi-Kun Tang (1, 2)
  • Xian-Ling Mao (1, corresponding author)
  • Heyan Huang (1)
  • Xuewen Shi (1)
  • Guihua Wen (3)

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  2. Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, China
  3. Department of Computer Science and Technology, South China University of Technology, Guangzhou Shi, China
