EHLLDA: A Supervised Hierarchical Topic Model

Mao, Xian-Ling; Xiao, Yixuan; Zhou, Qiang; Wang, Jun; Huang, Heyan

doi:10.1007/978-3-319-25816-4_18


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9427)


Abstract

In this paper, we consider the problem of modeling hierarchically labeled data, such as Web pages and their placement in hierarchical directories. The state-of-the-art model, hierarchical Labeled LDA (hLLDA), assumes that every child of a non-leaf label is equally important and that a document in the corpus cannot be located at a non-leaf node. In most cases, however, these assumptions do not hold in practice. We therefore introduce a supervised hierarchical topic model, Extended Hierarchical Labeled Latent Dirichlet Allocation (EHLLDA), which aims to relax the assumptions of hLLDA by incorporating prior information about labels into hLLDA. The experimental results show that EHLLDA achieves better perplexity than LLDA and hLLDA on all four datasets, and that it is also superior to hLLDA in terms of p@n.
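
The abstract reports two evaluation measures, perplexity and p@n. As a minimal sketch, assuming the standard textbook definitions (perplexity as the exponentiated negative mean per-token log-likelihood on held-out text, and p@n as the fraction of relevant items among the top n of a ranked list), they could be computed as follows. The function names and the toy inputs are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
import math
from typing import Sequence, Set


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Exponentiated negative mean of per-token natural-log likelihoods.

    Lower is better: the model is less "surprised" by the held-out tokens.
    """
    if not token_log_probs:
        raise ValueError("need at least one held-out token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def precision_at_n(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """Fraction of the top-n ranked items that appear in the relevant set."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / n


if __name__ == "__main__":
    # Toy held-out document: the model assigns probability 0.25 to each token,
    # so the perplexity is about 4 (a uniform guess over four words).
    print(perplexity([math.log(0.25)] * 4))

    # Toy ranking of labels for one document; two labels are truly relevant.
    print(precision_at_n(["sports", "arts", "news", "games"], {"sports", "news"}, 2))
```

Under these definitions, a lower perplexity means that the model assigns higher probability to unseen text, while a higher p@n means that more of the top-ranked items are correct.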


Notes

1. http://dmoz.org/


Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61402036), the 863 Program of China (No. 2015AA015404), and the 973 Program (No. 2013CB329605).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Xian-Ling Mao, Yixuan Xiao, Qiang Zhou & Heyan Huang
Institute of Biz Big Data, Sogou Inc., Beijing, China
Jun Wang


Corresponding author

Correspondence to Xian-Ling Mao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Tsinghua University, Beijing, China
Zhiyuan Liu
Soochow University, Suzhou, Jiangsu, China
Min Zhang
Tsinghua University, Beijing, China
Yang Liu

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


About this paper

Cite this paper

Mao, XL., Xiao, Y., Zhou, Q., Wang, J., Huang, H. (2015). EHLLDA: A Supervised Hierarchical Topic Model. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_18


DOI: https://doi.org/10.1007/978-3-319-25816-4_18
Published: 08 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25815-7
Online ISBN: 978-3-319-25816-4
eBook Packages: Computer Science, Computer Science (R0)
