Comparing Different Word Embeddings for Multiword Expression Identification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11608)

Abstract

Identifying Multi-Word Expressions (MWEs) is central to resolving the ambiguity of phrases. Recent work shows that deep learning methods outperform statistical and lexical approaches. Most of these deep learning approaches use word2vec embeddings; this paper compares word2vec, GloVe, and a combination of the two for identifying MWEs. GloVe and the combination of word2vec and GloVe were marginally better in terms of F-score, in the number of unique words identified, and in identifying words not seen in the training data. GloVe was also marginally better at identifying Verbal Multi-Word Expressions (VMWEs), which tend to be the hardest group of MWEs because they can be gappy: words that are part of the MWE are interleaved with words that are not. The main purpose of the paper is to compare different word embeddings for MWE identification, not to suggest improvements to the state of the art. Future work could vary the dimensionality of the word embedding vectors and evaluate fastText.
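
The notion of a gappy MWE can be made concrete with a minimal Python sketch. It is only an illustration of the definition given in the abstract, not the method used in the paper: the sentence, the token positions, and the helper names `is_gappy` and `gap_tokens` are all hypothetical.

```python
from typing import List, Sequence


def is_gappy(mwe_positions: Sequence[int]) -> bool:
    """Return True if the token positions of a single MWE are not contiguous."""
    positions = sorted(mwe_positions)
    return any(b - a > 1 for a, b in zip(positions, positions[1:]))


def gap_tokens(tokens: List[str], mwe_positions: Sequence[int]) -> List[str]:
    """Return the tokens that lie inside the MWE span but are not part of the MWE."""
    positions = set(mwe_positions)
    first, last = min(positions), max(positions)
    return [tokens[i] for i in range(first, last + 1) if i not in positions]


# Hypothetical example: the verbal MWE "looked ... up" is interrupted by "the word".
tokens = ["She", "looked", "the", "word", "up", "yesterday"]
look_up = [1, 4]  # assumed annotation: indices of "looked" and "up"

print(is_gappy(look_up))            # the MWE is not contiguous
print(gap_tokens(tokens, look_up))  # the interleaved non-MWE tokens
```

In this example the words "the" and "word" sit between the two parts of the expression, which is the kind of interleaving that makes VMWEs harder to identify than contiguous expressions.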

Author information

Corresponding author

Correspondence to Aishwarya Ashok.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ashok, A., Elmasri, R., Natarajan, G. (2019). Comparing Different Word Embeddings for Multiword Expression Identification. In: Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M. (eds) Natural Language Processing and Information Systems. NLDB 2019. Lecture Notes in Computer Science, vol 11608. Springer, Cham. https://doi.org/10.1007/978-3-030-23281-8_24

  • DOI: https://doi.org/10.1007/978-3-030-23281-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-23280-1

  • Online ISBN: 978-3-030-23281-8

  • eBook Packages: Computer Science, Computer Science (R0)
