A Model for Discovering Unpopular Research Interests

Feng, Shanshan; Cao, Jian; Chen, Yuwen; Qi, Jing

doi:10.1007/978-3-319-25159-2_35

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9403)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

Traditional topic models represent documents as mixtures of topic probability distributions, which provides a simple way to discover authors' research interests from a collection of documents. However, existing topic models such as the simple author model [1] and the author-topic model (ATM) [2] mainly detect popular research topics and largely neglect unpopular ones. In these models, general topical words are grouped into the shared word distribution of each topic, but the author-specific words that best describe an individual author's research interests are not captured. We therefore propose a novel author-topic model for discovering unpopular research interests (URI-ATM), which introduces a control variable k whose value depends on the type of research topic a word belongs to. In the model, each topic is associated with two classes of word distributions: a popular class shared among all authors, and an author-specific class for the author from whom the document comes. After describing URI-ATM, we report a variety of qualitative and quantitative evaluations. The results demonstrate the advantage of our approach over the compared models.
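
The abstract gives only a high-level description of URI-ATM, so the following is a minimal sketch of one possible reading of its generative process, not the authors' implementation. It assumes that the control variable k is a per-word binary switch drawn from a Bernoulli distribution, that k = 0 selects the topic's shared (popular) word distribution, and that k = 1 selects the author-specific word distribution for the same topic. The function names, the parameter `p_specific`, the Dirichlet priors, and all sizes are hypothetical and chosen only for illustration.

```python
import random

def dirichlet(rng, alpha, size):
    """Draw a symmetric Dirichlet sample by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(rng, doc_authors, theta, phi_shared, phi_author,
                      p_specific, n_words):
    """Generate (author, topic, k, word) tuples for one document.

    theta[a]         : topic mixture of author a
    phi_shared[z]    : word distribution of topic z shared by all authors
    phi_author[a][z] : word distribution of topic z specific to author a
    p_specific       : assumed probability that k = 1 (hypothetical parameter)
    """
    topics = range(len(phi_shared))
    vocab = range(len(phi_shared[0]))
    tokens = []
    for _ in range(n_words):
        a = rng.choice(doc_authors)                   # author of this word
        z = rng.choices(topics, weights=theta[a])[0]  # topic from author's mixture
        k = 1 if rng.random() < p_specific else 0     # assumed Bernoulli switch
        dist = phi_author[a][z] if k == 1 else phi_shared[z]
        w = rng.choices(vocab, weights=dist)[0]       # word from chosen class
        tokens.append((a, z, k, w))
    return tokens

# Toy setup: 3 authors, 2 topics, a 6-word vocabulary (all sizes are illustrative).
rng = random.Random(0)
n_authors, n_topics, n_vocab = 3, 2, 6
theta = [dirichlet(rng, 0.5, n_topics) for _ in range(n_authors)]
phi_shared = [dirichlet(rng, 0.5, n_vocab) for _ in range(n_topics)]
phi_author = [[dirichlet(rng, 0.5, n_vocab) for _ in range(n_topics)]
              for _ in range(n_authors)]

tokens = generate_document(rng, doc_authors=[0, 2], theta=theta,
                           phi_shared=phi_shared, phi_author=phi_author,
                           p_specific=0.3, n_words=20)
```

Under these assumptions, words generated with k = 1 are the ones that would surface an author's less popular interests, because they come from a distribution that is not shared with other authors. How the paper actually parameterizes and infers k is not stated in the abstract.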

References

1. McCallum, A.: Multi-label text classification with a mixture model trained by EM. In: AAAI 1999 Workshop on Text Learning, pp. 1–7 (1999)
2. Rosen-Zvi, M., Chemudugunta, C., Griffiths, T., Smyth, P., Steyvers, M.: Learning author-topic models from text corpora. ACM Transactions on Information Systems (TOIS) 28(1), 4 (2010)
3. Zhai, C., Velivelli, A., Yu, B.: A cross-collection mixture model for comparative text mining. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 743–748. ACM (2004)
4. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57. ACM (1999)
5. Grimmer, J.: A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis 18(1), 1–35 (2010)
6. Bao, Y., Zhang, H.F.J.: TopicMF: simultaneously exploiting ratings and reviews for recommendation (2014)
7. Mitchell, T.M.: Machine learning and data mining. Communications of the ACM 42(11), 30–36 (1999)
8. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. The Journal of Machine Learning Research 3, 993–1022 (2003)
9. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 1–38 (1977)
10. Griffiths, T.L., Steyvers, M., Tenenbaum, J.B.: Topics in semantic representation. Psychological Review 114(2), 211 (2007)
11. Wang, X., McCallum, A., Wei, X.: Topical n-grams: phrase and topic discovery, with an application to information retrieval. In: Seventh IEEE International Conference on Data Mining, ICDM 2007, pp. 697–702 (2007)
12. Mimno, D., Li, W., McCallum, A.: Mixtures of hierarchical topics with pachinko allocation. In: Proceedings of the 24th International Conference on Machine Learning, pp. 633–640. ACM (2007)
13. Yano, T., Cohen, W.W., Smith, N.A.: Predicting response to political blog posts with topic models. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 477–485. Association for Computational Linguistics (2009)
14. McCallum, A., Wang, X., Corrada-Emmanuel, A.: Topic and role discovery in social networks with experiments on Enron and academic email. Journal of Artificial Intelligence Research, 249–272 (2007)
15. McCallum, A., Wang, X., Mohanty, N.: Joint group and topic discovery from relations and text. In: Airoldi, E.M., Xing, E.P., Blei, D.M., Goldenberg, A., Zheng, A.X., Fienberg, S.E. (eds.) ICML 2006. LNCS, vol. 4503, pp. 28–44. Springer, Heidelberg (2007)
16. Hall, D., Jurafsky, D., Manning, C.D.: Studying the history of ideas using topic models. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 363–371. Association for Computational Linguistics (2008)
17. Paul, M.J., Girju, R.: Topic modeling of research fields: an interdisciplinary perspective. In: RANLP, pp. 337–342 (2009)
18. Wang, C., Thiesson, B., Meek, C., Blei, D.M.: Markov topic models. In: International Conference on Artificial Intelligence and Statistics, pp. 583–590 (2009)
19. Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proceedings of the National Academy of Sciences 101(suppl 1), 5228–5235 (2004)
20. Gilks, W.R., Richardson, S., Spiegelhalter, D.J.: Introducing Markov chain Monte Carlo. In: Markov Chain Monte Carlo in Practice, vol. 1, p. 19 (1996)
21. Heinrich, G.: Parameter estimation for text analysis. Technical report (2005)

Author information

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, 200240, China
Shanshan Feng, Jian Cao, Yuwen Chen & Jing Qi

Corresponding author

Correspondence to Shanshan Feng.

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Beijing, China
Songmao Zhang
Ludwig-Maximilians-Universität München, Munich, Germany
Martin Wirsing
Southwest University, Chongqing, China
Zili Zhang

About this paper

Cite this paper

Feng, S., Cao, J., Chen, Y., Qi, J. (2015). A Model for Discovering Unpopular Research Interests. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science, vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_35

DOI: https://doi.org/10.1007/978-3-319-25159-2_35
Published: 03 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25158-5
Online ISBN: 978-3-319-25159-2
eBook Packages: Computer Science, Computer Science (R0)