Rehabilitation of Count-Based Models for Word Vector Representations

  • Rémi Lebret
  • Ronan Collobert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9041)


Recent work on word representations mostly relies on predictive models: distributed word representations (also known as word embeddings) are trained to optimally predict the contexts in which the corresponding words tend to appear. Such models have succeeded in capturing word similarities as well as semantic and syntactic regularities. In contrast, we aim to revive interest in a model based on counts. We present a systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora. We show that this distance gives good performance on word similarity and analogy tasks, given a proper type and size of context and a dimensionality reduction based on a stochastic low-rank approximation. Besides being simple and intuitive, this method also provides an encoding function which can be used to infer representations for unseen words or phrases. This is a clear advantage over predictive models, which must be retrained to handle such new words.
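As a rough illustration of the pipeline the abstract describes (count co-occurrences, map rows through the Hellinger square-root transform, then apply a low-rank reduction), the following is a minimal sketch. It is not the authors' implementation: the function names and the toy data are invented here, and it uses a plain SVD where the paper relies on a stochastic low-rank approximation for large matrices.

```python
import numpy as np

def hellinger_pca(counts, k):
    """Reduce a word-context co-occurrence count matrix to k dimensions.

    Rows are target words, columns are context words. Each row is turned
    into a probability distribution and square-rooted, so that Euclidean
    distance between rows equals the Hellinger distance up to a constant
    factor; a rank-k SVD then yields the low-dimensional word vectors.
    """
    probs = counts / counts.sum(axis=1, keepdims=True)  # P(context | word)
    sqrt_probs = np.sqrt(probs)                         # Hellinger map
    U, S, Vt = np.linalg.svd(sqrt_probs, full_matrices=False)
    vectors = U[:, :k] * S[:k]   # k-dimensional word vectors
    encoder = Vt[:k].T           # fixed projection, reusable for new words
    return vectors, encoder

def encode(new_counts, encoder):
    """Infer a vector for an unseen word or phrase from its context counts,
    without retraining: normalize, square-root, project."""
    p = new_counts / new_counts.sum()
    return np.sqrt(p) @ encoder

# Toy example: 4 words observed against 5 context words.
rng = np.random.default_rng(0)
counts = rng.integers(1, 20, size=(4, 5)).astype(float)
vectors, encoder = hellinger_pca(counts, k=2)
unseen = encode(np.array([3.0, 1.0, 0.0, 7.0, 2.0]), encoder)
print(vectors.shape, unseen.shape)  # (4, 2) (2,)
```

The `encode` step is what gives count-based models the advantage mentioned above: a new word only needs its co-occurrence counts to be projected through the already-learned `encoder`, whereas a predictive model would need additional training.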







Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Idiap Research Institute, Martigny, Switzerland
  2. Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
