Word Embeddings Versus LDA for Topic Assignment in Documents

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10449)


Topic assignment for a corpus of documents is a natural language processing (NLP) task. One of the best-known and most thoroughly studied methods is Latent Dirichlet Allocation (LDA), which takes a statistical approach. On the other hand, the deep-learning paradigm has proved useful for many NLP tasks, such as classification [3], sentiment analysis [8], and text summarization [11]. This paper compares the results of the LDA method with those obtained using the word representations provided by Word2Vec [5], which makes use of the deep-learning paradigm.
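As a rough illustration of what embedding-based topic assignment can look like, the sketch below averages word vectors into a document vector and picks the topic whose seed words are closest by cosine similarity. It is not the procedure evaluated in the paper: the three-dimensional vectors, the seed-word lists, and the names `TOY_VECTORS`, `TOPIC_SEEDS`, and `assign_topic` are all hypothetical, and real Word2Vec vectors would be learned from a corpus rather than written by hand.

```python
import math

# Hypothetical hand-written vectors standing in for learned Word2Vec embeddings.
TOY_VECTORS = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "team": [0.7, 0.3, 0.0],
    "vote": [0.1, 0.9, 0.1],
    "election": [0.0, 0.8, 0.2],
    "party": [0.2, 0.7, 0.1],
}

# Hypothetical seed words that describe each topic.
TOPIC_SEEDS = {
    "sport": ["goal", "match"],
    "politics": ["vote", "election"],
}


def mean_vector(tokens, vectors):
    """Average the vectors of the tokens that have an embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_topic(tokens, topic_seeds, vectors):
    """Return the best-matching topic and the similarity score of every topic."""
    doc_vec = mean_vector(tokens, vectors)
    scores = {
        topic: cosine(doc_vec, mean_vector(seeds, vectors))
        for topic, seeds in topic_seeds.items()
    }
    return max(scores, key=scores.get), scores


# Tokens without an embedding ("unknown") are simply skipped.
topic, scores = assign_topic(["team", "match", "unknown"], TOPIC_SEEDS, TOY_VECTORS)
print(topic, scores)
```

LDA, by contrast, infers a distribution over topics for each document from word co-occurrence counts, so it needs no pre-trained vectors or seed words.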


Keywords: NLP - topic assignment; Deep learning; LDA


  1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).
  2.
  3. Enríquez, F., Troyano, J.A., López-Solaz, T.: An approach to the use of word embeddings in an opinion classification task. Expert Syst. Appl. 66, 1–6 (2016).
  4. Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, ETMTNLP 2002, vol. 1, pp. 63–70. Association for Computational Linguistics, Stroudsburg (2002).
  5. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Proceedings of a Meeting, 5–8 December 2013, Lake Tahoe, Nevada, USA, pp. 3111–3119 (2013).
  6. Nallapati, R., Cohen, W.W., Lafferty, J.D.: Parallelized variational EM for latent Dirichlet allocation: an experimental evaluation of speed and scalability. In: Workshops Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), 28–31 October 2007, Omaha, Nebraska, USA, pp. 349–354. IEEE Computer Society (2007).
  7. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA, Valletta, May 2010.
  8. Sakenovich, N.S., Zharmagambetov, A.S.: On one approach of solving sentiment analysis task for Kazakh and Russian languages using deep learning. In: Nguyen, N.-T., Manolopoulos, Y., Iliadis, L., Trawiński, B. (eds.) ICCCI 2016. LNCS (LNAI), vol. 9876, pp. 537–545. Springer, Cham (2016). doi: 10.1007/978-3-319-45246-3_51
  9. Skfuzzy: Fuzzy logic toolkit in Python (2016).
  10. Topicmodels: Package for R (2016).
  11. Yousefi-Azar, M., Hamey, L.: Text summarization using unsupervised deep learning. Expert Syst. Appl. 68, 93–105 (2017).
  12. Zhang, W., Wang, J.: Prior-based dual additive latent Dirichlet allocation for user-item connected documents. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1405–1411. AAAI Press (2015).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Mathematics, Physics and Informatics, Institute of Informatics, University of Gdansk, Gdansk, Poland
