Abstract
Identifying Multi-Word Expressions (MWEs) is central to resolving the ambiguity of phrases. Recent work shows that deep learning methods outperform statistical and lexical approaches. These deep learning approaches mostly use word2vec embeddings; this paper compares word2vec, GloVe, and a combination of the two for identifying MWEs. GloVe and the combination of word2vec and GloVe were marginally better than word2vec alone in terms of F-score, the number of unique words identified, and the identification of words not seen in the training data. GloVe was also marginally better at identifying Verbal Multi-Word Expressions (VMWEs), which tend to be the hardest group of MWEs because they can be gappy: words that belong to the MWE are interleaved with words that do not. The main purpose of the paper is to compare different word embeddings for identifying MWEs, not to suggest improvements to the state of the art. Future work with different word embedding dimensions and with fastText is suggested.
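The notion of a gappy MWE described in the abstract can be illustrated with a minimal sketch. The sentence, the token indices, and the helper names `is_gappy` and `gap_tokens` are hypothetical and chosen only for illustration; they are not taken from the paper or its data.

```python
from typing import List, Set


def is_gappy(mwe_indices: Set[int]) -> bool:
    """Return True if the MWE's token positions are not contiguous."""
    ordered = sorted(mwe_indices)
    # A gap exists when two consecutive MWE tokens are more than one position apart.
    return any(b - a > 1 for a, b in zip(ordered, ordered[1:]))


def gap_tokens(tokens: List[str], mwe_indices: Set[int]) -> List[str]:
    """Return the tokens inside the MWE's span that are not part of the MWE."""
    if not mwe_indices:
        return []
    first, last = min(mwe_indices), max(mwe_indices)
    return [tokens[i] for i in range(first, last + 1) if i not in mwe_indices]


# Hypothetical example: the verbal MWE "made ... up" is interrupted by other words.
tokens = ["She", "made", "the", "whole", "story", "up"]
made_up = {1, 5}  # positions of "made" and "up"

print(is_gappy(made_up))            # the MWE tokens are not adjacent
print(gap_tokens(tokens, made_up))  # words interleaved with the MWE
```

A tagger that assumes MWEs are contiguous spans would miss "made ... up" here, which is one way to see why gappy VMWEs are harder to identify.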
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ashok, A., Elmasri, R., Natarajan, G. (2019). Comparing Different Word Embeddings for Multiword Expression Identification. In: Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M. (eds) Natural Language Processing and Information Systems. NLDB 2019. Lecture Notes in Computer Science, vol 11608. Springer, Cham. https://doi.org/10.1007/978-3-030-23281-8_24
DOI: https://doi.org/10.1007/978-3-030-23281-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23280-1
Online ISBN: 978-3-030-23281-8
eBook Packages: Computer Science, Computer Science (R0)