Word Embeddings Versus LDA for Topic Assignment in Documents

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10449)

Abstract

Topic assignment for a corpus of documents is a natural language processing (NLP) task. One of the best-known and most thoroughly studied methods is Latent Dirichlet Allocation (LDA), which applies statistical methods. On the other hand, the deep-learning paradigm has proved useful for many NLP tasks, such as classification [3], sentiment analysis [8], and text summarization [11]. This paper compares the results of the LDA method with those obtained using representations provided by Word2Vec [5], which makes use of the deep-learning paradigm.

Keywords

NLP - topic assignment, Deep learning, LDA

References

  1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003). http://www.jmlr.org/papers/v3/blei03a.html
  2.
  3. Enríquez, F., Troyano, J.A., López-Solaz, T.: An approach to the use of word embeddings in an opinion classification task. Expert Syst. Appl. 66, 1–6 (2016). http://dx.doi.org/10.1016/j.eswa.2016.09.005
  4. Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, ETMTNLP 2002, vol. 1, pp. 63–70. Association for Computational Linguistics, Stroudsburg (2002). http://dx.doi.org/10.3115/1118108.1118117
  5. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Proceedings of a Meeting, 5–8 December 2013, Lake Tahoe, Nevada, USA, pp. 3111–3119 (2013). http://papers.nips.cc/book/advances-in-neural-information-processing-systems-26-2013
  6. Nallapati, R., Cohen, W.W., Lafferty, J.D.: Parallelized variational EM for latent Dirichlet allocation: an experimental evaluation of speed and scalability. In: Workshops Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), 28–31 October 2007, Omaha, Nebraska, USA, pp. 349–354. IEEE Computer Society (2007). http://dx.doi.org/10.1109/ICDMW.2007.33
  7. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA, Valletta, May 2010. http://is.muni.cz/publication/884893/en
  8. Sakenovich, N.S., Zharmagambetov, A.S.: On one approach of solving sentiment analysis task for Kazakh and Russian languages using deep learning. In: Nguyen, N.-T., Manolopoulos, Y., Iliadis, L., Trawiński, B. (eds.) ICCCI 2016. LNCS (LNAI), vol. 9876, pp. 537–545. Springer, Cham (2016). doi: 10.1007/978-3-319-45246-3_51
  9. Skfuzzy: Fuzzy logic toolkit in Python (2016). http://pythonhosted.org/scikit-fuzzy/
  10. Topicmodels: Package for R (2016). https://cran.r-project.org/web/packages/topicmodels/
  11. Yousefi-Azar, M., Hamey, L.: Text summarization using unsupervised deep learning. Expert Syst. Appl. 68, 93–105 (2017). http://dx.doi.org/10.1016/j.eswa.2016.10.017
  12. Zhang, W., Wang, J.: Prior-based dual additive latent Dirichlet allocation for user-item connected documents. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1405–1411. AAAI Press (2015). http://dl.acm.org/citation.cfm?id=2832415.2832445

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Mathematics, Physics and Informatics, Institute of Informatics, University of Gdansk, Gdansk, Poland
