On Multilingual Training of Neural Dependency Parsers

  • Michał Zapotoczny
  • Paweł Rychlikowski
  • Jan Chorowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10415)


We show that a recently proposed neural dependency parser can be improved by joint training on multiple languages from the same family. The parser is implemented as a deep neural network whose only input is orthographic representations of words. In order to successfully parse, the network has to discover how linguistically relevant concepts can be inferred from word spellings. We analyze the representations of characters and words that are learned by the network to establish which properties of languages were accounted for. In particular we show that the parser has approximately learned to associate Latin characters with their Cyrillic counterparts and that it can group Polish and Russian words that have a similar grammatical function. Finally, we evaluate the parser on selected languages from the Universal Dependencies dataset and show that it is competitive with other recently proposed state-of-the-art methods, while having a simple structure.
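The cross-script association described above (Latin characters mapping onto their Cyrillic counterparts) can be probed with nearest-neighbour queries in the learned character-embedding space. The sketch below is a toy illustration of that kind of analysis, not the paper's code: the embedding table is synthetic (the paper's embeddings are learned end-to-end by the parser), and the character set and helper names (`cosine`, `nearest`) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: Latin 'a', 'r', 's' and Cyrillic 'а', 'р', 'с'.
# In the paper these vectors are learned by the parser; here they are synthetic.
chars = ['a', 'r', 's', 'а', 'р', 'с']
emb = {c: rng.normal(size=8) for c in chars}

# Simulate the learned cross-script association by placing each Cyrillic
# vector close to its Latin counterpart.
for lat, cyr in [('a', 'а'), ('r', 'р'), ('s', 'с')]:
    emb[cyr] = emb[lat] + 0.05 * rng.normal(size=8)

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(c):
    """Most similar other character under cosine similarity."""
    return max((o for o in chars if o != c), key=lambda o: cosine(emb[c], emb[o]))

# With embeddings like these, Latin 'r' retrieves Cyrillic 'р' and vice versa.
print(nearest('r'), nearest('а'))
```

Run on a trained parser's actual embedding table, queries of this form are what reveal whether the network has discovered script correspondences on its own.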


Keywords: Dependency parsing · Recurrent neural networks · Multitask training



The experiments used Theano [6], Blocks and Fuel [22] libraries. The authors would like to acknowledge the support of the following agencies for research funding and computing support: National Science Center (Poland) grant Sonata 8 2014/15/D/ST6/04402, National Center for Research and Development (Poland) grant Audioscope (Applied Research Program, 3rd contest, submission no. 245755).


  1. Alberti, C., et al.: SyntaxNet models for the CoNLL 2017 shared task. arXiv:1703.04929, March 2017
  2. Ammar, W., et al.: Many languages, one parser. Trans. Assoc. Comput. Linguist. 4, 431–444 (2016)
  3. Andor, D., Alberti, C., Weiss, D., Severyn, A., Presta, A., Ganchev, K., Petrov, S., Collins, M.: Globally normalized transition-based neural networks. arXiv:1603.06042 [cs], March 2016
  4. Ballesteros, M., Dyer, C., Smith, N.A.: Improved transition-based parsing by modeling characters instead of words with LSTMs. arXiv:1508.00657 (2015)
  5. Bender, E.M.: On achieving and evaluating language-independence in NLP. Linguist. Issues Lang. Technol. 6(3), 1–26 (2011)
  6. Bergstra, J., et al.: Theano: a CPU and GPU math expression compiler. In: Proceedings of SciPy (2010)
  7. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
  8. Chen, D., Manning, C.D.: A fast and accurate dependency parser using neural networks. In: EMNLP, pp. 740–750 (2014)
  9. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR abs/1406.1078 (2014)
  10. Chorowski, J., Bahdanau, D., Cho, K., Bengio, Y.: End-to-end continuous speech recognition using attention-based recurrent NN: first results. arXiv:1412.1602 [cs, stat], December 2014
  11. Chorowski, J., Zapotoczny, M., Rychlikowski, P.: Read, tag, and parse all at once, or fully-neural dependency parsing. CoRR abs/1609.03441 (2016)
  12. Dozat, T., Manning, C.D.: Deep biaffine attention for neural dependency parsing. CoRR abs/1611.01734 (2016)
  13. Duong, L., Cohn, T., Bird, S., Cook, P.: A neural network model for low-resource universal dependency parsing. In: EMNLP, pp. 339–348 (2015)
  14. Dyer, C., Ballesteros, M., Ling, W., Matthews, A., Smith, N.A.: Transition-based dependency parsing with stack long short-term memory. arXiv:1505.08075 (2015)
  15. Edmonds, J.: Optimum branchings. J. Res. Natl. Bur. Stand. B 71B(4), 233–240 (1966)
  16. Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: ICML, pp. 1319–1327 (2013)
  17. Guo, J., Che, W., Yarowsky, D., Wang, H., Liu, T.: Cross-lingual dependency parsing based on distributed representations. In: ACL, vol. 1, pp. 1234–1244 (2015)
  18. Hinton, G.E., McClelland, J.L., Rumelhart, D.E.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations, vol. 1. MIT Press/Bradford Books, Cambridge (1986)
  19. Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., Wu, Y.: Exploring the limits of language modeling. arXiv:1602.02410 [cs], February 2016
  20. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. arXiv:1508.06615 (2015)
  21. Kiperwasser, E., Goldberg, Y.: Simple and accurate dependency parsing using bidirectional LSTM feature representations. arXiv:1603.04351 [cs], March 2016
  22. van Merriënboer, B., et al.: Blocks and fuel: frameworks for deep learning. arXiv:1506.00619 [cs, stat], June 2015
  23. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
  24. Mikolov, T., Karafiát, M., Burget, L., Cernocky, J., Khudanpur, S.: Recurrent neural network based language model. In: Interspeech, Makuhari, Chiba, Japan, September 2010
  25. Nivre, J.: Algorithms for deterministic incremental dependency parsing. Comput. Linguist. 34(4), 513–553 (2008)
  26. Nivre, J., et al.: MaltParser: a language-independent system for data-driven dependency parsing. Nat. Lang. Eng. (2005)
  27. Nivre, J., et al.: Universal dependencies 1.2
  28. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
  29. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. JMLR 15, 1929–1958 (2014)
  30. Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. arXiv:1505.00387 [cs], May 2015
  31. Titov, I., Henderson, J.: A latent variable model for generative dependency parsing. In: Proceedings of IWPT (2007)
  32. Vinyals, O., Kaiser, L., Koo, T., Petrov, S., Sutskever, I., Hinton, G.: Grammar as a foreign language. arXiv:1412.7449 [cs, stat], December 2014
  33. Wu, Y., et al.: Google's neural machine translation system: bridging the gap between human and machine translation. arXiv:1609.08144, September 2016
  34. Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv:1212.5701 (2012)
  35. Zhang, X., Cheng, J., Lapata, M.: Dependency parsing as head selection. CoRR abs/1606.01280 (2016)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Michał Zapotoczny (1)
  • Paweł Rychlikowski (1)
  • Jan Chorowski (1), corresponding author

  1. Institute of Computer Science, University of Wrocław, Wrocław, Poland
