
A survey on training and evaluation of word embeddings

  • Review
  • Published: 2021

International Journal of Data Science and Analytics

Abstract

Word embeddings have proven effective for many natural language processing tasks by providing word representations that integrate prior knowledge. In this article, we focus on the algorithms and models used to compute these representations and on their evaluation methods. Many new techniques have been developed in a short amount of time, and no unified terminology exists to contrast the strengths and weaknesses of these methods. Based on the state of the art, we propose a thorough terminology to help classify these various models and their evaluations. We also compare these algorithms and methods, highlighting open problems and research directions, and compile popular evaluation metrics and datasets. This survey provides: (1) an exhaustive description and terminology of currently investigated word embeddings, (2) a clear segmentation of evaluation methods and their associated datasets, and (3) high-level properties indicating the pros and cons of each solution.
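
As a concrete illustration of the two themes surveyed here, training word representations and evaluating them, the sketch below is ours and not from the article: it trains a skip-gram model (Mikolov et al. [38, 39]) with gensim (assuming gensim >= 4; the corpus and hyperparameters are toy values) and probes word similarity, the operation that benchmarks such as WordSim-353 [16] and SimLex-999 [22] apply at scale against human judgments.

    # Illustrative only: a tiny skip-gram training run and a similarity probe.
    from gensim.models import Word2Vec

    # Toy corpus; a realistic setup streams a large tokenized corpus instead.
    sentences = [["the", "king", "rules", "the", "kingdom"],
                 ["the", "queen", "rules", "the", "kingdom"]]

    # sg=1 selects the skip-gram objective; vector_size, window and
    # min_count are placeholders, not recommended settings.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

    # Intrinsic evaluation in miniature: similarity benchmarks correlate
    # such cosine similarities with human judgments (Spearman's rank).
    print(model.wv.similarity("king", "queen"))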


Notes

  1. Using the FNV algorithm (Bojanowski et al. [8]); a sketch of this hashing step follows these notes.

  2. Source code for the tetrahedron by Ignasi: https://tex.stackexchange.com/questions/174317/creating-a-labeled-tetrahedron-with-tikzpicture.

  3. http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.

  4. https://github.com/facebookresearch/fastText.
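
Note 1 refers to the hash fastText uses to map character n-grams onto a fixed number of embedding buckets. The sketch below is ours, not fastText's source: the 32-bit FNV-1a constants are the standard ones, the 2,000,000-bucket default follows the fastText documentation, and the function names are illustrative.

    # Sketch of 32-bit FNV-1a hashing of subword n-grams (note 1).
    def fnv1a_32(data: bytes) -> int:
        """Standard 32-bit FNV-1a hash of a byte string."""
        h = 2166136261                       # FNV offset basis
        for byte in data:
            h ^= byte
            h = (h * 16777619) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
        return h

    def ngram_bucket(ngram: str, num_buckets: int = 2_000_000) -> int:
        """Map a character n-gram to one of num_buckets embedding rows."""
        return fnv1a_32(ngram.encode("utf-8")) % num_buckets

    # The subword "<whe" of "<where>" always lands in the same bucket, so
    # collisions are possible but the embedding table stays bounded in size.
    print(ngram_bucket("<whe"))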

References

  1. Akbik, A., Blythe, D., Vollgraf, R.: Contextual string embeddings for sequence labeling. In: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 1638–1649. ACL (2018)

  2. Almuhareb, A.: Attributes in lexical acquisition. PhD thesis, University of Essex (2006)

  3. Amsaleg, L., Chelly, O., Furon, T., Girard, S., Houle, M.E., Kawarabayashi, K.-I., Nett, M.: Estimating local intrinsic dimensionality. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, New York, NY, USA, pp. 29–38. ACM (2015)

  4. Bakarov, A.: A survey of word embeddings evaluation methods. CoRR (2018). arXiv:1801.09536

  5. Baroni, M., Dinu, G., Kruszewski, G.: Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, Maryland, pp. 238–247. ACL (2014)

  6. Baroni, M., Lenci, A.: How we BLESSed distributional semantic evaluation. In: Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, Edinburgh, UK, pp. 1–10. Association for Computational Linguistics (2011)

  7. Baroni, M., Murphy, B., Barbu, E., Poesio, M.: Strudel: a corpus-based semantic model based on properties and types. Cogn. Sci. 34, 222–254 (2010)

  8. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Ling. 5, 135–146 (2017)

  9. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Proceedings of NeurIPS, Advances in Neural Information Processing Systems 26, pp. 2787–2795. Curran Associates, Inc (2013)

  10. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. OpenAI publications (2020)

  11. Bruni, E., Tran, N.K., Baroni, M.: Multimodal distributional semantics. J. Artif. Intell. Res. 49(1), 1–47 (2014)

  12. Claveau, V., Kijak, E.: Direct vs. indirect evaluation of distributional thesauri. In: International Conference on Computational Linguistics, COLING, Osaka, Japan (2016)

  13. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)

  14. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT 2019, pp. 4171–4186 (2019). arXiv:1810.04805

  15. Houle, M.E., Kashima, H., Nett, M.: Generalized expansion dimension. In: Proceedings—12th IEEE International Conference on Data Mining Workshops, ICDMW 2012, pp. 587–594 (2012)

  16. Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., Ruppin, E.: Placing search in context: the concept revisited. ACM Trans. Inf. Syst. 20(1), 116–131 (2002)

  17. Ganea, O., Bécigneul, G., Hofmann, T.: Hyperbolic neural networks. CoRR (2018). arXiv:1805.09112

  18. Gerz, D., Vulić, I., Hill, F., Reichart, R., Korhonen, A.: SimVerb-3500: a large-scale evaluation set of verb similarity. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016). arXiv:1608.00869

  19. Grave, E., Bojanowski, P., Gupta, P., Joulin, A., Mikolov, T.: Learning word vectors for 157 languages. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7–12, 2018 (2018)

  20. Gu, J., Bradbury, J., Xiong, C., Li, V.O.K., Socher, R.: Non-autoregressive neural machine translation. In: International Conference on Learning Representations, ICLR (2018). arXiv:1711.02281

  21. Harris, Z.: Distributional structure. Word 10, 146–162 (1954)

  22. Hill, F., Reichart, R., Korhonen, A.: SimLex-999: evaluating semantic models with (genuine) similarity estimation. Am. J. Comput. Ling. 41(4), 665–695 (2015)

  23. Iacobacci, I., Pilehvar, M.T., Navigli, R.: SensEmbed: Learning sense embeddings for word and relational similarity. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 95–105. Association for Computational Linguistics (2015)

  24. Józefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., Wu, Y.: Exploring the limits of language modeling. CoRR (2016). arXiv:1602.02410

  25. Karypis, G.: CLUTO: a clustering toolkit. Technical Report 02-017, University of Minnesota (Department of Computer Science) (2003)

  26. Kochurov, M., Kozlukov, S., Karimov, R., Yanush, V.: Geoopt: adaptive Riemannian optimization in PyTorch (2019). https://github.com/geoopt/geoopt

  27. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, pp. 260–270. Association for Computational Linguistics (2016)

  28. Laub, J., Müller, K.-R.: Feature discovery in non-metric pairwise data. J. Mach. Learn. Res. 5, 801–818 (2004)

  29. Leimeister, M., Wilson, B.J.: Skip-gram word embeddings in hyperbolic space. CoRR (2018). arXiv:1809.01498

  30. Levy, O., Goldberg, Y.: Neural word embedding as implicit matrix factorization. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2177–2185. Curran Associates, Inc (2014)

  31. Levy, O., Goldberg, Y., Dagan, I.: Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Comput. Ling. 3, 211–225 (2015)

  32. Ling, W., Tsvetkov, Y., Amir, S., Fermandez, R., Dyer, C., Black, A.W., Trancoso, I., Lin, C.-C.: Not all contexts are created equal: better word representations with variable attention. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1367–1372. Association for Computational Linguistics (2015)

  33. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: a robustly optimized BERT pretraining approach (2019). arXiv:1907.11692

  34. Lu, W., Zhang, Y., Wang, S., Huang, H., Liu, Q., Luo, S.: Concept representation by learning explicit and implicit concept couplings. IEEE Intell. Syst. (2020)

  35. Luong, T., Socher, R., Manning, C.: Better word representations with recursive neural networks for morphology. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning, Sofia, Bulgaria, pp. 104–113. Association for Computational Linguistics (2013)

  36. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)

  37. McCann, B., Bradbury, J., Xiong, C., Socher, R.: Learned in translation: contextualized word vectors (2017). arXiv:1708.00107

  38. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2–4, 2013, Workshop Track Proceedings (2013)

  39. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 3111–3119. Curran Associates, Inc (2013)

  40. Mikolov, T., Yih, W.-T., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, pp. 746–751. Association for Computational Linguistics (2013)

  41. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)

  42. Nickel, M., Kiela, D.: Poincaré embeddings for learning hierarchical representations. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 6338–6347. Curran Associates, Inc (2017)

  43. Nickel, M., Kiela, D.: Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In: Proceedings of the International Conference on Machine Learning, ICML (2018)

  44. Niven, T., Kao, H.: Probing neural network comprehension of natural language arguments. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019)

  45. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

  46. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of NAACL (2018)

  47. Poling, B., Lerman, G.: A new approach to two-view motion segmentation using global dimension minimization. Int. J. Comput. Vis. 108(3), 165–185 (2014)

  48. Radford, A.: Improving language understanding by generative pre-training. OpenAI publications (2018)

  49. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI publications (2019)

  50. Roy, O., Vetterli, M.: The effective rank: a measure of effective dimensionality. In: 2007 15th European Signal Processing Conference, pp. 606–610 (2007)

  51. Rubenstein, H., Goodenough, J.B.: Contextual correlates of synonymy. Commun. ACM 8(10), 627–633 (1965)

  52. Sang, E.F.T.K., Meulder, F.D.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 (2003)

  53. Schakel, A.M.J., Wilson, B.J.: Measuring word significance using distributed representations of words. CoRR (2015). arXiv:1508.02297

  54. Schnabel, T., Labutov, I., Mimno, D., Joachims, T.: Evaluation methods for unsupervised word embeddings. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 298–307. Association for Computational Linguistics (2015)

  55. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp. 1631–1642. Association for Computational Linguistics (2013)

  56. Sun, C., Yan, H., Qiu, X., Huang, X.: Gaussian word embedding with a Wasserstein distance loss. CoRR (2018). arXiv:1808.07016

  57. Sun, K., Wang, J., Kalousis, A., Marchand-Maillet, S.: Space-time local embeddings. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 100–108. Curran Associates, Inc (2015)

  58. Tifrea, A., Becigneul, G., Ganea, O.-E.: Poincaré GloVe: hyperbolic word embeddings. In: International Conference on Learning Representations (ICLR 2019) (2019)

  59. Torregrossa, F., Claveau, V., Kooli, N., Gravier, G., Allesiardo, R.: On the correlation of word embedding evaluation metrics. In: Proceedings of The 12th Language Resources and Evaluation Conference, Marseille, France, pp. 4789–4797. European Language Resources Association (2020)

  60. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 5998–6008. Curran Associates, Inc (2017)

  61. Vilnis, L., McCallum, A.: Word representations via Gaussian embedding. In: International Conference on Learning Representations, ICLR 2015 (2015)

  62. Vulić, I., Gerz, D., Kiela, D., Hill, F., Korhonen, A.: HyperLex: a large-scale evaluation of graded lexical entailment. Am. J. Comput. Ling. 43(4), 781–835 (2017)

  63. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.: GLUE: a multi-task benchmark and analysis platform for natural language understanding. In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, pp. 353–355. Association for Computational Linguistics (2018)

  64. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T.L., Gugger, S., Drame, M., Lhoest, Q., Rush, A.M.: HuggingFace's Transformers: state-of-the-art natural language processing (2019). arXiv:1910.03771

  65. Yang, Z., Dai, Z., Yang, Y., Carbonell, J.G., Salakhutdinov, R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. CoRR (2019). arXiv:1906.08237

  66. You, Y., Li, J., Hseu, J., Song, X., Demmel, J., Hsieh, C.: Reducing BERT pre-training time from 3 days to 76 minutes. CoRR (2019). arXiv:1904.00962

  67. Zhai, X., Oliver, A., Kolesnikov, A., Beyer, L.: S4L: self-supervised semi-supervised learning. In: Proceedings of the International Conference on Computer Vision (ICCV) (2019)

  68. Zhou, P., Qi, Z., Zheng, S., Xu, J., Bao, H., Xu, B.: Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics (2016)

Author information

Corresponding author

Correspondence to François Torregrossa.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Nihel Kooli and Robin Allesiardo have moved to another company/organization.

About this article

Cite this article

Torregrossa, F., Allesiardo, R., Claveau, V. et al. A survey on training and evaluation of word embeddings. Int J Data Sci Anal 11, 85–103 (2021). https://doi.org/10.1007/s41060-021-00242-8
