Evaluating Distributional Features for Multiword Expression Recognition
In this paper we consider the task of extracting multiword expressions (MWEs) for the Russian thesaurus RuThes, which contains various types of phrases, including non-compositional phrases, multiword terms and their variants, light verb constructions, and others. We study several embedding-based features for phrases and their components and estimate their contribution to finding multiword expressions of different types, comparing them with traditional association and context measures. We found that one of the distributional features achieves relatively high MWE extraction quality even when used alone. Combining it in different ways with other features (phrase frequency, association measures) improves on both initial orderings.
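The kind of comparison described above can be illustrated with a minimal sketch. The abstract does not say which distributional feature or which combination scheme performed best, so everything below is an assumption made for illustration: `distributional_score` (one minus the cosine between a phrase vector and the sum of its component vectors), `npmi` as a representative association measure, and `combined_score` (feature value multiplied by log frequency) are hypothetical choices, not the authors' method.

```python
import math
import numpy as np


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0


def distributional_score(phrase_vec, component_vecs):
    """Hypothetical feature: how far the phrase embedding lies from the
    sum of its component embeddings (higher = less compositional)."""
    composed = np.sum(component_vecs, axis=0)
    return 1.0 - cosine(phrase_vec, composed)


def npmi(pair_count, w1_count, w2_count, total):
    """Normalized pointwise mutual information of a two-word phrase.
    Assumes 0 < pair_count < total, so both logarithms are defined."""
    p_xy = pair_count / total
    p_x = w1_count / total
    p_y = w2_count / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def combined_score(dist_score, frequency):
    """Hypothetical combination: weight the distributional feature
    by the log of the phrase frequency."""
    return dist_score * math.log(1 + frequency)


def rank_candidates(candidates):
    """Order candidate phrases by combined score, best first.
    Each candidate is a dict with keys: phrase, phrase_vec,
    component_vecs, frequency (names are illustrative)."""
    scored = [
        (c["phrase"],
         combined_score(
             distributional_score(c["phrase_vec"], c["component_vecs"]),
             c["frequency"]))
        for c in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In practice the phrase and word vectors would come from an embedding model trained on a corpus in which candidate phrases are treated as single tokens; that preprocessing step is also an assumption here.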
Keywords: Thesaurus, Multiword expression, Embedding
This work was partially supported by the Russian Science Foundation, grant N16-18-02074.