Evaluating Distributional Features for Multiword Expression Recognition

  • Natalia Loukachevitch
  • Ekaterina Parkhomenko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

In this paper we consider the task of extracting multiword expressions (MWEs) for the Russian thesaurus RuThes, which contains various types of phrases, including non-compositional phrases, multiword terms and their variants, light verb constructions, and others. We study several embedding-based features for phrases and their components and estimate their contribution to finding multiword expressions of different types, comparing them with traditional association and context measures. We found that one of the distributional features achieves relatively high MWE extraction results even when used alone. Combining it in different ways with other features (phrase frequency, association measures) improves on both of the initial orderings.

Keywords

Thesaurus, Multiword expression, Embedding

Notes

Acknowledgments

This work was partially supported by the Russian Science Foundation, grant N16-18-02074.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Lomonosov Moscow State University, Moscow, Russia
  2. Tatarstan Academy of Sciences, Kazan, Russia
