A Convolutional Approach to Multiword Expression Detection Based on Unsupervised Distributed Word Representations and Task-Driven Embedding of Lexical Features

Boros, Tiberiu; Dumitrescu, Stefan Daniel

doi:10.1007/978-3-319-65172-9_13

Tiberiu Boros & Stefan Daniel Dumitrescu

Conference paper
First Online: 02 August 2017

Part of the book series: Communications in Computer and Information Science (CCIS, volume 744)

Abstract

We introduce a convolutional network architecture for token-level processing in natural language applications. We tune this architecture for a specific task, multiword expression detection, and compare our results to state-of-the-art systems on the same datasets. The approach is multilingual and relies on word embeddings automatically extracted from Wikipedia dumps. We also show that task-driven embeddings of lexical features increase the speed and robustness of the system compared with sparse encodings.
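The contrast between sparse encodings and embedded lexical features can be illustrated with a minimal sketch. This is not the authors' implementation: the tag inventory `TAGS`, the table `EMBEDDINGS`, the kernel values, and the helper names are all hypothetical, and the embedding values are fixed by hand here, whereas task-driven embeddings would be learned during training. The sketch only shows that projecting a one-hot vector through a weight matrix gives the same result as a direct row lookup (which skips the multiplication), and that a convolution over a token window can yield one output per token.

```python
# Hypothetical lexical feature inventory (here: a few part-of-speech tags).
TAGS = ["NOUN", "VERB", "ADP", "DET"]
TAG_INDEX = {tag: i for i, tag in enumerate(TAGS)}

# Hypothetical 4 x 3 embedding table. In a trained system these values
# would be learned with the network rather than written by hand.
EMBEDDINGS = [
    [0.2, -0.1, 0.5],
    [-0.3, 0.8, 0.1],
    [0.7, 0.0, -0.4],
    [0.1, 0.6, -0.2],
]


def one_hot(index, size):
    """Sparse encoding: a vector of zeros with a single 1.0."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def sparse_project(vec, weights):
    """Multiply a one-hot row vector by a (features x dim) weight matrix."""
    dim = len(weights[0])
    out = [0.0] * dim
    for i, x in enumerate(vec):
        for j in range(dim):
            out[j] += x * weights[i][j]
    return out


def embed_lookup(index, weights):
    """Dense encoding: read the row directly, with no multiplication."""
    return list(weights[index])


def conv1d_tokens(token_vectors, kernel, width):
    """Apply one filter over a sliding window of `width` tokens (odd width).

    The sequence is zero-padded so that every token gets one output value.
    """
    dim = len(token_vectors[0])
    pad = width // 2
    zero = [0.0] * dim
    padded = [zero] * pad + token_vectors + [zero] * pad
    outputs = []
    for t in range(len(token_vectors)):
        window = [x for vec in padded[t:t + width] for x in vec]
        outputs.append(sum(w * x for w, x in zip(kernel, window)))
    return outputs


sentence_tags = ["DET", "NOUN", "VERB", "ADP", "NOUN"]
dense = [embed_lookup(TAG_INDEX[t], EMBEDDINGS) for t in sentence_tags]
sparse = [
    sparse_project(one_hot(TAG_INDEX[t], len(TAGS)), EMBEDDINGS)
    for t in sentence_tags
]
kernel = [0.1] * (3 * 3)  # one filter spanning 3 tokens x 3 dimensions
scores = conv1d_tokens(dense, kernel, width=3)

print(dense == sparse)  # both encodings yield the same vectors
print(len(scores) == len(sentence_tags))  # one output per token
```

The lookup touches a single row per feature, while the one-hot projection visits every row of the table, which is one plausible source of the speed difference mentioned in the abstract.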

Notes

1. http://typo.uni-konstanz.de/parseme/index.php/2-general/184-parseme-shared-task-format-of-the-final-annotation (last accessed 2017-02-15).
2. http://universaldependencies.org/format.html (last accessed 2017-02-15).
3. https://github.com/dav/word2vec (last accessed 2017-04-10).
4. During our experiments we observed that doing so speeds up convergence of the algorithm, with little impact on the computation time required by each training iteration.

Acknowledgements

This work was supported by UEFISCDI, under grant PN-II-PT-PCCA-2013-4-0789, project “Assistive Natural-language, Voice-controlled System for Intelligent Buildings” (2013–2017).

Author information

Authors and Affiliations

Research Center for Artificial Intelligence, Romanian Academy, Calea 13 Septembrie, 050711, Bucharest, Romania
Tiberiu Boros & Stefan Daniel Dumitrescu

Corresponding author

Correspondence to Tiberiu Boros.

Editor information

Editors and Affiliations

Politecnico di Milano, Milan, Italy
Giacomo Boracchi
Democritus University of Thrace, University Campus, Xanthi, Greece
Lazaros Iliadis
School of Computing Science and Digital Media, Robert Gordon University, Aberdeen, United Kingdom
Chrisina Jayne
University of Ioannina, Ioannina, Greece
Aristidis Likas

Cite this paper

Boros, T., Dumitrescu, S.D. (2017). A Convolutional Approach to Multiword Expression Detection Based on Unsupervised Distributed Word Representations and Task-Driven Embedding of Lexical Features. In: Boracchi, G., Iliadis, L., Jayne, C., Likas, A. (eds) Engineering Applications of Neural Networks. EANN 2017. Communications in Computer and Information Science, vol 744. Springer, Cham. https://doi.org/10.1007/978-3-319-65172-9_13

DOI: https://doi.org/10.1007/978-3-319-65172-9_13
Published: 02 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65171-2
Online ISBN: 978-3-319-65172-9
eBook Packages: Computer Science, Computer Science (R0)