A Model for Discovering Unpopular Research Interests

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9403)

Abstract

Traditional topic models show how documents can be modeled as mixtures of topic probability distributions, which provides a simple method for discovering authors’ research interests from a collection of documents. However, existing topic models such as the simple author model [1] and the author-topic model (ATM) [2] mainly detect popular research topics and largely neglect unpopular ones. In these models, general topical words are grouped into the shared word distribution of each topic, but the author-specific words that best describe each author’s research interests are left out. We therefore propose a novel author-topic model for discovering unpopular research interests (URI-ATM), which incorporates a new control variable k that takes on different values when the words belong to different types of research topics. In the model, each topic is associated with two classes of word distributions: a popular class shared among all authors, and an author-specific class for the author from whom the document comes. After the URI-ATM is explained, a variety of qualitative and quantitative evaluations are performed. The results demonstrate the advantage of our approach over the compared models.
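
The two-class structure described in the abstract can be illustrated with a small sampling sketch. This is only a rough illustration under stated assumptions, not the paper's actual generative process: the uniform choice of author, the binary coding of the control variable k (0 for the shared class, 1 for the author-specific class), the fixed `switch_prob`, and every name and toy parameter below are hypothetical.

```python
import random

def sample_document(authors, theta, phi_shared, phi_author, switch_prob, n_words, rng):
    """Sample (author, topic, k, word) tuples for one document.

    Assumed process (not taken from the paper): pick an author uniformly,
    draw a topic from that author's topic mixture, draw the control
    variable k, then draw the word from the topic's shared distribution
    (k = 0) or from its author-specific distribution (k = 1).
    """
    tokens = []
    for _ in range(n_words):
        x = rng.choice(authors)
        topics = list(theta[x])
        z = rng.choices(topics, weights=[theta[x][t] for t in topics])[0]
        k = 1 if rng.random() < switch_prob else 0
        dist = phi_author[(z, x)] if k == 1 else phi_shared[z]
        words = list(dist)
        w = rng.choices(words, weights=[dist[v] for v in words])[0]
        tokens.append((x, z, k, w))
    return tokens

# Toy parameters, all hypothetical.
theta = {"alice": {0: 0.7, 1: 0.3}, "bob": {0: 0.2, 1: 0.8}}
phi_shared = {0: {"model": 0.6, "topic": 0.4}, 1: {"graph": 0.5, "network": 0.5}}
phi_author = {
    (0, "alice"): {"unpopular": 1.0},
    (1, "alice"): {"niche": 1.0},
    (0, "bob"): {"rare": 1.0},
    (1, "bob"): {"obscure": 1.0},
}

tokens = sample_document(["alice", "bob"], theta, phi_shared, phi_author,
                         switch_prob=0.3, n_words=20, rng=random.Random(0))
for x, z, k, w in tokens[:5]:
    print(x, z, k, w)
```

Words sampled with k = 1 come only from the author-specific distribution of the chosen topic, which is the kind of vocabulary a purely shared word distribution would not capture.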

References

  1. McCallum, A.: Multi-label text classification with a mixture model trained by EM. In: AAAI 1999 Workshop on Text Learning, pp. 1–7 (1999)

  2. Rosen-Zvi, M., Chemudugunta, C., Griffiths, T., Smyth, P., Steyvers, M.: Learning author-topic models from text corpora. ACM Transactions on Information Systems (TOIS) 28(1), 4 (2010)

  3. Zhai, C., Velivelli, A., Yu, B.: A cross-collection mixture model for comparative text mining. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 743–748. ACM (2004)

  4. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57. ACM (1999)

  5. Grimmer, J.: A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis 18(1), 1–35 (2010)

  6. Bao, Y., Fang, H., Zhang, J.: TopicMF: simultaneously exploiting ratings and reviews for recommendation (2014)

  7. Mitchell, T.M.: Machine learning and data mining. Communications of the ACM 42(11), 30–36 (1999)

  8. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. The Journal of Machine Learning Research 3, 993–1022 (2003)

  9. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 1–38 (1977)

  10. Griffiths, T.L., Steyvers, M., Tenenbaum, J.B.: Topics in semantic representation. Psychological Review 114(2), 211 (2007)

  11. Wang, X., McCallum, A., Wei, X.: Topical n-grams: phrase and topic discovery, with an application to information retrieval. In: Seventh IEEE International Conference on Data Mining, ICDM 2007, pp. 697–702 (2007)

  12. Mimno, D., Li, W., McCallum, A.: Mixtures of hierarchical topics with pachinko allocation. In: Proceedings of the 24th International Conference on Machine Learning, pp. 633–640. ACM (2007)

  13. Yano, T., Cohen, W.W., Smith, N.A.: Predicting response to political blog posts with topic models. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 477–485. Association for Computational Linguistics (2009)

  14. McCallum, A., Wang, X., Corrada-Emmanuel, A.: Topic and role discovery in social networks with experiments on Enron and academic email. Journal of Artificial Intelligence Research, 249–272 (2007)

  15. McCallum, A., Wang, X., Mohanty, N.: Joint group and topic discovery from relations and text. In: Airoldi, E.M., Xing, E.P., Blei, D.M., Goldenberg, A., Zheng, A.X., Fienberg, S.E. (eds.) ICML 2006. LNCS, vol. 4503, pp. 28–44. Springer, Heidelberg (2007)

  16. Hall, D., Jurafsky, D., Manning, C.D.: Studying the history of ideas using topic models. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 363–371. Association for Computational Linguistics (2008)

  17. Paul, M.J., Girju, R.: Topic modeling of research fields: an interdisciplinary perspective. In: RANLP, pp. 337–342 (2009)

  18. Wang, C., Thiesson, B., Meek, C., Blei, D.M.: Markov topic models. In: International Conference on Artificial Intelligence and Statistics, pp. 583–590 (2009)

  19. Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proceedings of the National Academy of Sciences 101(suppl 1), 5228–5235 (2004)

  20. Gilks, W.R., Richardson, S., Spiegelhalter, D.J.: Introducing Markov chain Monte Carlo. In: Markov Chain Monte Carlo in Practice, vol. 1, p. 19 (1996)

  21. Heinrich, G.: Parameter estimation for text analysis. Technical report (2005)

Author information

Corresponding author

Correspondence to Shanshan Feng.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Feng, S., Cao, J., Chen, Y., Qi, J. (2015). A Model for Discovering Unpopular Research Interests. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science, vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_35

  • DOI: https://doi.org/10.1007/978-3-319-25159-2_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25158-5

  • Online ISBN: 978-3-319-25159-2

  • eBook Packages: Computer Science, Computer Science (R0)
