Author Tree-Structured Hierarchical Dirichlet Process

  • Md Hijbul Alam
  • Jaakko Peltonen
  • Jyrki Nummenmaa
  • Kalervo Järvelin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11198)


Three key aspects of online discussion venues are the multitude of participants, the underlying trends of content, and the structure of the venue. However, most models cannot account for all three. In hierarchically organized message forums, authors may participate differently at multiple levels of sections, with different interests and contributions across the hierarchy. Well-designed probabilistic models of online discussion are applicable to many tasks, such as prediction of future content or authorship attribution. However, traditional models such as Hierarchical Dirichlet Processes (HDPs) do not fully account for authors, nor for deep hierarchical venues where documents can arise at all tree nodes. We introduce the Author Tree-structured Hierarchical Dirichlet Process (ATHDP), which allows Dirichlet-process-based topic modeling of both text content and authors over a given tree structure of arbitrary size and height. Experiments on six hierarchical discussion data sets show that ATHDP outperforms traditional HDP-based alternatives in terms of perplexity and authorship attribution accuracy.


Hierarchical Dirichlet Processes, Topic Modeling, Message Forum



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of Tampere, Tampere, Finland
  2. Aalto University, Espoo, Finland
