
Conceptualization topic modeling

Multimedia Tools and Applications

Abstract

Topic modeling has recently been widely used to discover abstract topics in the multimedia field. Most existing topic models assume a three-layer hierarchical Bayesian structure: each document is modeled as a probability distribution over topics, and each topic as a probability distribution over words. However, this assumption is not optimal. Intuitively, it is more reasonable to assume that each topic is a probability distribution over concepts and each concept is a probability distribution over words; that is, a latent concept layer is added between the topic layer and the word layer of the traditional three-layer structure. In this paper, we verify the proposed assumption by incorporating it into two representative topic models, which yields two novel topic models. Extensive experiments comparing the proposed models with their corresponding baselines show that the proposed models significantly outperform the baselines in both case studies and perplexity, indicating that the new assumption is more reasonable than the traditional one.
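The abstract contrasts two generative assumptions. The following is a minimal Python sketch of both, under stated assumptions: `theta` is a document's topic mixture, `phi` a topic-word matrix, `psi` a topic-concept matrix and `eta` a concept-word matrix. These names, and all values, are hypothetical choices for illustration, not the paper's notation. The distributions are fixed by hand rather than inferred, so the sketch illustrates the assumption only; it does not reproduce the authors' two models, their priors, or their inference procedure. Perplexity is computed with the standard definition (exponential of the negative mean per-token log-likelihood); the paper's exact evaluation protocol is not given in this excerpt.

```python
import math
import random
from typing import List, Sequence

# Hypothetical type alias: a row-stochastic matrix stored as nested lists.
Matrix = List[List[float]]


def sample_categorical(probs: Sequence[float], rng: random.Random) -> int:
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_three_layer(theta: Sequence[float], phi: Matrix,
                         n_words: int, rng: random.Random) -> List[int]:
    """Traditional document -> topic -> word process (LDA-style, parameters given)."""
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)             # choose a topic
        words.append(sample_categorical(phi[z], rng))  # choose a word from that topic
    return words


def generate_four_layer(theta: Sequence[float], psi: Matrix, eta: Matrix,
                        n_words: int, rng: random.Random) -> List[int]:
    """Assumed document -> topic -> concept -> word process (parameters given)."""
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)             # choose a topic
        c = sample_categorical(psi[z], rng)            # choose a concept within the topic
        words.append(sample_categorical(eta[c], rng))  # choose a word within the concept
    return words


def effective_topic_word(psi: Matrix, eta: Matrix) -> Matrix:
    """Marginalize out the concept layer: p(w | k) = sum_c psi[k][c] * eta[c][w]."""
    n_concepts, vocab = len(eta), len(eta[0])
    return [[sum(psi_k[c] * eta[c][w] for c in range(n_concepts)) for w in range(vocab)]
            for psi_k in psi]


def perplexity(docs: List[List[int]], thetas: List[Sequence[float]],
               topic_word: Matrix) -> float:
    """exp(-total log-likelihood / token count), with
    p(w | d) = sum_k theta_d[k] * topic_word[k][w]."""
    log_lik, n_tokens = 0.0, 0
    for words, theta in zip(docs, thetas):
        for w in words:
            p = sum(theta[k] * topic_word[k][w] for k in range(len(theta)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.7, 0.3]                                  # 2 topics (toy values)
    psi = [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]]            # 2 topics x 3 concepts
    eta = [[0.9, 0.1, 0.0, 0.0],
           [0.0, 0.5, 0.5, 0.0],
           [0.0, 0.0, 0.1, 0.9]]                        # 3 concepts x 4 words
    doc = generate_four_layer(theta, psi, eta, 20, rng)
    phi = effective_topic_word(psi, eta)
    print(doc, perplexity([doc], [theta], phi))
```

Marginalizing the concept layer turns `psi` and `eta` back into an ordinary topic-word matrix, which is why the same perplexity function can score both model families. One way to read the four-layer assumption is that every topic's word distribution becomes a mixture over a shared set of concept-word distributions, rather than an unconstrained distribution over the vocabulary.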


Fig. 1, Fig. 2, Fig. 3, Fig. 4


Notes

  1. Source code can be found at https://github.com/anonymity01/CLDA


Acknowledgements

This work was supported by the 863 Program (2015AA015404), China National Science Foundation (61402036, 60973083, 61273363), Beijing Technology Project (Z151100001615029), Science and Technology Planning Project of Guangdong Province (2014A010103009, 2015A020217002), Guangzhou Science and Technology Planning Project (201604020179), and the Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201738).

Author information


Correspondence to Xian-Ling Mao.


About this article


Cite this article

Tang, YK., Mao, XL., Huang, H. et al. Conceptualization topic modeling. Multimed Tools Appl 77, 3455–3471 (2018). https://doi.org/10.1007/s11042-017-5145-4


  • DOI: https://doi.org/10.1007/s11042-017-5145-4
