Polynomial Topic Distribution with Topic Modeling for Generic Labeling

  • Syeda Sumbul Hossain (email author)
  • Md. Rezwan Ul-Hassan
  • Shadikur Rahman
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1046)

Abstract

Topic models typically present each generated topic as a list of words. To reduce the cognitive overhead of interpreting these topics for end-users, we propose labeling each topic with a noun phrase that summarizes its theme or idea. Drawing candidate labels from the WordNet lexical database, we estimate a natural labeling for documents from their words and select the most relevant labels for each topic. Compared with a topic labeling system based on WUP similarity, our methodology is simpler, more effective, and obtains better topic labels.
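
For orientation, the kind of WUP (Wu-Palmer) similarity labeling that the abstract uses as its point of comparison can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method or their baseline implementation: the small `PARENT` taxonomy is a hypothetical stand-in for WordNet's hypernym hierarchy, and the helper names (`path_to_root`, `wup_similarity`, `label_topic`) as well as the choice of averaging similarity over the topic words are introduced here purely for illustration.

```python
# Toy hypernym taxonomy standing in for WordNet (hypothetical data).
# Each entry maps a concept to its parent; "entity" is the root.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "horse": "animal",
    "car": "vehicle",
    "truck": "vehicle",
}


def path_to_root(concept):
    """Return the concepts from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wup_similarity(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # The first shared ancestor on b's path is the deepest common subsumer.
    lcs = next(c for c in path_to_root(b) if c in ancestors_of_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def label_topic(top_words, candidates):
    """Pick the candidate with the highest mean similarity to the topic words
    (an assumed scoring rule, chosen for illustration)."""
    def mean_sim(label):
        return sum(wup_similarity(label, w) for w in top_words) / len(top_words)
    return max(candidates, key=mean_sim)


# Hypothetical top words of one topic and two candidate labels.
topic_words = ["dog", "cat", "horse"]
label = label_topic(topic_words, ["animal", "vehicle"])
print(label)
```

With real data, the topic words would come from a fitted topic model such as LDA and the taxonomy from WordNet itself; the toy tree only keeps the sketch self-contained.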

Keywords

Text mining, Topic model, Topic label, LDA, WordNet

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Syeda Sumbul Hossain (1, email author)
  • Md. Rezwan Ul-Hassan (1)
  • Shadikur Rahman (1)
  1. Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh
