Aggregating Neural Word Embeddings for Document Representation

  • Ruqing Zhang
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10772)


Recent advances in natural language processing (NLP) have shown that semantically meaningful representations of words can be efficiently acquired by distributed models. A text document can then be viewed as a bag-of-word-embeddings (BoWE), and the remaining question is how to obtain a fixed-length vector representation of the document for efficient document processing. Beyond heuristic aggregation methods, recent work has shown that one can leverage the Fisher kernel (FK) framework to generate document representations based on BoWE in a principled way. In that work, words are embedded into a Euclidean space by latent semantic indexing (LSI), and a Gaussian Mixture Model (GMM) is employed as the generative model for nonlinear FK-based aggregation. In this work, we propose an alternative FK-based aggregation method for document representation based on neural word embeddings. Neural embedding models have been shown to perform significantly better than LSI in word representation, and semantic relations between neural word embeddings are typically measured by cosine similarity rather than Euclidean distance. Therefore, we introduce a mixture of von Mises-Fisher distributions (moVMF) as the generative model of neural word embeddings, and derive a new FK-based aggregation method for document representation based on BoWE. We report document classification, clustering and retrieval experiments and demonstrate that our model achieves state-of-the-art performance compared with existing baseline methods.
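
To make the aggregation idea concrete, the following is a minimal sketch, not the derivation from the paper. It assumes a mixture of von Mises-Fisher components with a single shared concentration `kappa` (so the normalizing constant cancels in the posterior), takes the gradient of the log-likelihood with respect to each mean direction while ignoring the unit-norm constraint, and omits the Fisher information normalization. The function names `normalize`, `responsibilities` and `aggregate`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def responsibilities(x, means, weights, kappa):
    """Posterior p(k | x) under a vMF mixture with one shared kappa.

    With a shared concentration the vMF normalizing constant is the same
    for every component and cancels, leaving log(w_k) + kappa * mu_k^T x.
    """
    logits = [math.log(w) + kappa * dot(m, x) for m, w in zip(means, weights)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(word_vectors, means, weights, kappa):
    """Map a bag of word embeddings to one fixed-length document vector.

    For each component k, accumulates gamma_k(x) * kappa * x, which is the
    gradient of the vMF log-density w.r.t. mu_k (constraint ignored),
    averages over the words, concatenates, and L2-normalizes the result.
    """
    if not word_vectors:
        raise ValueError("document has no word vectors")
    num_components, dim = len(means), len(means[0])
    acc = [[0.0] * dim for _ in range(num_components)]
    for raw in word_vectors:
        x = normalize(raw)  # vMF is defined on the unit hypersphere
        gamma = responsibilities(x, means, weights, kappa)
        for k in range(num_components):
            for j in range(dim):
                acc[k][j] += gamma[k] * kappa * x[j]
    count = len(word_vectors)
    flat = [value / count for row in acc for value in row]
    return normalize(flat)

# Toy usage: two components in two dimensions, a two-word document.
means = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.5, 0.5]
doc = [[2.0, 0.0], [0.0, 3.0]]
fv = aggregate(doc, means, weights, kappa=10.0)
print(len(fv), fv)
```

The output length is the number of components times the embedding dimension, regardless of how many words the document contains, which is what makes the representation fixed-length. Because inputs are projected onto the unit sphere, similarity to each component depends only on direction, in line with the cosine-similarity motivation above.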



This work was funded by the 973 Program of China under Grant No. 2014CB340401, the National Natural Science Foundation of China (NSFC) under Grants No. 61232010, 61433014, 61425016, 61472401, 61203298 and 61722211, the Youth Innovation Promotion Association CAS under Grants No. 20144310 and 2016102, and the National Key R&D Program of China under Grant No. 2016QY02D0405.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Ruqing Zhang (1, 2)
  • Jiafeng Guo (1, 2)
  • Yanyan Lan (1, 2)
  • Jun Xu (1, 2)
  • Xueqi Cheng (1, 2)

  1. University of Chinese Academy of Sciences, Beijing, China
  2. CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Beijing, China
