On Multilingual Training of Neural Dependency Parsers

  • Michał Zapotoczny
  • Paweł Rychlikowski
  • Jan Chorowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10415)


We show that a recently proposed neural dependency parser can be improved by joint training on multiple languages from the same family. The parser is implemented as a deep neural network whose only input is orthographic representations of words. In order to successfully parse, the network has to discover how linguistically relevant concepts can be inferred from word spellings. We analyze the representations of characters and words that are learned by the network to establish which properties of languages were accounted for. In particular we show that the parser has approximately learned to associate Latin characters with their Cyrillic counterparts and that it can group Polish and Russian words that have a similar grammatical function. Finally, we evaluate the parser on selected languages from the Universal Dependencies dataset and show that it is competitive with other recently proposed state-of-the-art methods, while having a simple structure.
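The cross-script association described above (Latin characters mapping onto their Cyrillic counterparts) can be probed with nearest-neighbour queries in the learned character-embedding space. The sketch below is a toy illustration of that kind of analysis, not the paper's code: the embedding table is synthetic (the paper's embeddings are learned end-to-end by the parser), and the character set and helper names (`cosine`, `nearest`) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: Latin 'a', 'r', 's' and Cyrillic 'а', 'р', 'с'.
# In the paper these vectors are learned by the parser; here they are synthetic.
chars = ['a', 'r', 's', 'а', 'р', 'с']
emb = {c: rng.normal(size=8) for c in chars}

# Simulate the learned cross-script association by placing each Cyrillic
# vector close to its Latin counterpart.
for lat, cyr in [('a', 'а'), ('r', 'р'), ('s', 'с')]:
    emb[cyr] = emb[lat] + 0.05 * rng.normal(size=8)

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(c):
    """Most similar other character under cosine similarity."""
    return max((o for o in chars if o != c), key=lambda o: cosine(emb[c], emb[o]))

# With embeddings like these, Latin 'r' retrieves Cyrillic 'р' and vice versa.
print(nearest('r'), nearest('а'))
```

Run on a trained parser's actual embedding table, queries of this form are what reveal whether the network has discovered script correspondences on its own.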


Keywords: Dependency parsing · Recurrent neural networks · Multitask training



The experiments used Theano [6], Blocks and Fuel [22] libraries. The authors would like to acknowledge the support of the following agencies for research funding and computing support: National Science Center (Poland) grant Sonata 8 2014/15/D/ST6/04402, National Center for Research and Development (Poland) grant Audioscope (Applied Research Program, 3rd contest, submission no. 245755).


  1. Alberti, C., et al.: SyntaxNet models for the CoNLL 2017 shared task. arXiv:1703.04929, March 2017
  2. Ammar, W., et al.: Many languages, one parser. Trans. Assoc. Comput. Linguist. 4, 431–444 (2016)
  3. Andor, D., Alberti, C., Weiss, D., Severyn, A., Presta, A., Ganchev, K., Petrov, S., Collins, M.: Globally normalized transition-based neural networks. arXiv:1603.06042 [cs], March 2016
  4. Ballesteros, M., Dyer, C., Smith, N.A.: Improved transition-based parsing by modeling characters instead of words with LSTMs. arXiv:1508.00657 (2015)
  5. Bender, E.M.: On achieving and evaluating language-independence in NLP. Linguist. Issues Lang. Technol. 6(3), 1–26 (2011)
  6. Bergstra, J., et al.: Theano: a CPU and GPU math expression compiler. In: Proceedings of SciPy (2010)
  7. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
  8. Chen, D., Manning, C.D.: A fast and accurate dependency parser using neural networks. In: EMNLP, pp. 740–750 (2014)
  9. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR abs/1406.1078 (2014)
  10. Chorowski, J., Bahdanau, D., Cho, K., Bengio, Y.: End-to-end continuous speech recognition using attention-based recurrent NN: first results. arXiv:1412.1602 [cs, stat], December 2014
  11. Chorowski, J., Zapotoczny, M., Rychlikowski, P.: Read, tag, and parse all at once, or fully-neural dependency parsing. CoRR abs/1609.03441 (2016)
  12. Dozat, T., Manning, C.D.: Deep biaffine attention for neural dependency parsing. CoRR abs/1611.01734 (2016)
  13. Duong, L., Cohn, T., Bird, S., Cook, P.: A neural network model for low-resource universal dependency parsing. In: EMNLP, pp. 339–348 (2015)
  14. Dyer, C., Ballesteros, M., Ling, W., Matthews, A., Smith, N.A.: Transition-based dependency parsing with stack long short-term memory. arXiv:1505.08075 (2015)
  15. Edmonds, J.: Optimum branchings. J. Res. Natl. Bur. Stand. B 71B(4), 233–240 (1966)
  16. Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: ICML, pp. 1319–1327 (2013)
  17. Guo, J., Che, W., Yarowsky, D., Wang, H., Liu, T.: Cross-lingual dependency parsing based on distributed representations. In: ACL, vol. 1, pp. 1234–1244 (2015)
  18. Hinton, G.E., McClelland, J.L., Rumelhart, D.E.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations, vol. 1. MIT Press/Bradford Books, Cambridge (1986)
  19. Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., Wu, Y.: Exploring the limits of language modeling. arXiv:1602.02410 [cs], February 2016
  20. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. arXiv:1508.06615 (2015)
  21. Kiperwasser, E., Goldberg, Y.: Simple and accurate dependency parsing using bidirectional LSTM feature representations. arXiv:1603.04351 [cs], March 2016
  22. van Merriënboer, B., et al.: Blocks and fuel: frameworks for deep learning. arXiv:1506.00619 [cs, stat], June 2015
  23. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
  24. Mikolov, T., Karafiát, M., Burget, L., Cernocky, J., Khudanpur, S.: Recurrent neural network based language model. In: Interspeech, Makuhari, Chiba, Japan, September 2010
  25. Nivre, J.: Algorithms for deterministic incremental dependency parsing. Comput. Linguist. 34(4), 513–553 (2008)
  26. Nivre, J., et al.: MaltParser: a language-independent system for data-driven dependency parsing. Nat. Lang. Eng. (2005)
  27. Nivre, J., et al.: Universal dependencies 1.2
  28. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
  29. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. JMLR 15, 1929–1958 (2014)
  30. Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. arXiv:1505.00387 [cs], May 2015
  31. Titov, I., Henderson, J.: A latent variable model for generative dependency parsing. In: Proceedings of IWPT (2007)
  32. Vinyals, O., Kaiser, L., Koo, T., Petrov, S., Sutskever, I., Hinton, G.: Grammar as a foreign language. arXiv:1412.7449 [cs, stat], December 2014
  33. Wu, Y., et al.: Google's neural machine translation system: bridging the gap between human and machine translation. arXiv:1609.08144, September 2016
  34. Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv:1212.5701 (2012)
  35. Zhang, X., Cheng, J., Lapata, M.: Dependency parsing as head selection. CoRR abs/1606.01280 (2016)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Michał Zapotoczny (1)
  • Paweł Rychlikowski (1)
  • Jan Chorowski (1), corresponding author

  1. Institute of Computer Science, University of Wrocław, Wrocław, Poland
