Coherent Topic Hierarchy: A Strategy for Topic Evolutionary Analysis on Microblog Feeds

  • Jiahui Zhu
  • Xuhui Li
  • Min Peng
  • Jiajia Huang
  • Tieyun Qian
  • Jimin Huang
  • Jiping Liu
  • Ri Hong
  • Pinglan Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9098)


Topic evolutionary analysis on microblog feeds can help reveal users’ interests and public concerns from a global perspective. However, capturing the evolutionary patterns is not easy, because semantic coherence is usually difficult to express and the timeline structure is difficult to organize. In this paper, we propose a novel strategy in which a coherent topic hierarchy is designed to deal with these challenges. First, we incorporate the sparse biterm topic model to extract coherent topics from microblog feeds. Then, the topology of these topics is constructed by the basic Bayesian rose tree combined with topic similarity. Finally, we devise a cross-tree random walk with restart model to bond each pair of sequential trees into a timeline hierarchy. Experimental results on microblog datasets demonstrate that the coherent topic hierarchy provides meaningful topic interpretations, achieves high clustering performance, and presents motivated patterns for topic evolutionary analysis.
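The final step above relies on a random walk with restart. The sketch below illustrates only the generic technique, not the cross-tree model proposed in the paper: a walker moves along weighted edges and, at every step, jumps back to a seed node with a fixed probability. The function name, the `restart` value, the node layout (two topics at time t, two at time t+1), and the similarity weights are all hypothetical and chosen purely for illustration.

```python
def random_walk_with_restart(weights, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probabilities of a walk restarting at `seed`.

    weights[i][j] is a non-negative edge weight from node i to node j.
    """
    n = len(weights)

    # Row-normalise the weights into transition probabilities. A node with
    # no outgoing weight sends the walker straight back to the seed.
    transition = []
    for row in weights:
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 if j == seed else 0.0 for j in range(n)])

    # Start with all probability mass on the seed node.
    p = [1.0 if i == seed else 0.0 for i in range(n)]

    for _ in range(max_iter):
        nxt = [0.0] * n
        # With probability (1 - restart) follow an outgoing edge ...
        for i in range(n):
            for j in range(n):
                nxt[j] += (1.0 - restart) * p[i] * transition[i][j]
        # ... and with probability `restart` jump back to the seed.
        nxt[seed] += restart

        # Stop once the distribution no longer changes (L1 distance).
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical example: nodes 0 and 1 are topics at time t, nodes 2 and 3
# are topics at time t+1. Edges carry made-up topic similarities and only
# connect topics from different time steps.
weights = [
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.2, 0.8],
    [0.9, 0.2, 0.0, 0.0],
    [0.1, 0.8, 0.0, 0.0],
]

scores = random_walk_with_restart(weights, seed=0)
# Among the time t+1 topics, pick the one most strongly related to topic 0.
best_successor = max((2, 3), key=lambda node: scores[node])
print(best_successor, scores)
```

In this toy setting the score of each later topic can be read as its relatedness to the seed topic, which is one plausible way to decide which topics in sequential trees should be linked. How the paper defines the graph and uses the scores is not stated in the abstract.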


Keywords: Coherent topic hierarchy; Topic evolution; Microblog feed; Bayesian rose tree





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Jiahui Zhu (1)
  • Xuhui Li (2, 3)
  • Min Peng (2, 4), corresponding author
  • Jiajia Huang (2)
  • Tieyun Qian (2)
  • Jimin Huang (2)
  • Jiping Liu (2)
  • Ri Hong (2)
  • Pinglan Liu (2)

  1. State Key Lab of Software Engineering, School of Computer, Wuhan University, Wuhan, China
  2. School of Computer, Wuhan University, Wuhan, China
  3. School of Information Management, Wuhan University, Wuhan, China
  4. Shenzhen Research Institute, Wuhan University, Wuhan, China
