
On Multilingual Training of Neural Dependency Parsers

  • Conference paper
  • In: Text, Speech, and Dialogue (TSD 2017)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Abstract

We show that a recently proposed neural dependency parser can be improved by joint training on multiple languages from the same family. The parser is implemented as a deep neural network whose only input is the orthographic representation of words. To parse successfully, the network has to discover how linguistically relevant concepts can be inferred from word spellings. We analyze the representations of characters and words learned by the network to establish which properties of the languages were accounted for. In particular, we show that the parser approximately learns to associate Latin characters with their Cyrillic counterparts and that it can group Polish and Russian words that have a similar grammatical function. Finally, we evaluate the parser on selected languages from the Universal Dependencies dataset and show that it is competitive with other recently proposed state-of-the-art methods while having a simple structure.
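The spelling-only setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' network: the vector size, the tanh recurrence, and the shared character inventory are assumptions chosen only to show how one character-level encoder can serve words from both Latin and Cyrillic scripts.

```python
import numpy as np

# Illustrative sketch only, not the paper's architecture.
rng = np.random.default_rng(0)
DIM = 8

# One shared embedding table covers characters from both scripts, so a
# jointly trained model can relate Latin and Cyrillic letters.
ALPHABET = "abcdefghijklmnopqrstuvwxyzабвгдежзийклмнопрстуфхцчшщъыьэюя"
CHAR_EMB = {c: rng.normal(size=DIM) for c in ALPHABET}
W = rng.normal(size=(DIM, DIM)) * 0.1

def encode_word(word: str) -> np.ndarray:
    """Fold a word's spelling into one vector with a simple tanh recurrence."""
    h = np.zeros(DIM)
    for ch in word.lower():
        x = CHAR_EMB.get(ch, np.zeros(DIM))  # unknown characters map to zeros
        h = np.tanh(W @ h + x)
    return h

# The same encoder handles both scripts: Polish and Russian words are
# composed from the one shared character table.
v_pl = encode_word("kot")   # Polish "cat", Latin script
v_ru = encode_word("кот")   # Russian "cat", Cyrillic script
print(v_pl.shape, v_ru.shape)
```

In the paper's setting it is joint multilingual training that pushes such an encoder to align characters and words with similar grammatical function across languages; the sketch only shows the shared input path.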


Notes

  1. However, the experiments use the UD 1.3 dataset, which does not include Belarusian and Ukrainian.

  2. Conveniently, Unicode assigns separate code points to Latin and Cyrillic letters.
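The distinction footnote 2 relies on is easy to verify, e.g. in Python: the Latin and Cyrillic lowercase "a" render identically but are different code points, so a character-level model receives them as distinct input symbols.

```python
# Visually identical letters from the two scripts have distinct code points.
latin_a = "a"          # U+0061, LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # U+0430, CYRILLIC SMALL LETTER A
print(ord(latin_a), ord(cyrillic_a))  # 97 1072
assert latin_a != cyrillic_a
```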

References

  1. Alberti, C., et al.: SyntaxNet models for the CoNLL 2017 shared task. arXiv:1703.04929, March 2017

  2. Ammar, W., et al.: Many languages, one parser. Trans. Assoc. Comput. Linguist. 4(0), 431–444 (2016)


  3. Andor, D., Alberti, C., Weiss, D., Severyn, A., Presta, A., Ganchev, K., Petrov, S., Collins, M.: Globally normalized transition-based neural networks. arXiv:1603.06042 [cs], March 2016

  4. Ballesteros, M., Dyer, C., Smith, N.A.: Improved transition-based parsing by modeling characters instead of words with LSTMs. arXiv preprint arXiv:1508.00657 (2015)

  5. Bender, E.M.: On achieving and evaluating language-independence in NLP. Linguist. Issues Lang. Technol. 6(3), 1–26 (2011)


  6. Bergstra, J., et al.: Theano: a CPU and GPU math expression compiler. In: Proceedings of SciPy (2010)


  7. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)


  8. Chen, D., Manning, C.D.: A fast and accurate dependency parser using neural networks. In: EMNLP, pp. 740–750 (2014)


  9. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR abs/1406.1078 (2014)


  10. Chorowski, J., Bahdanau, D., Cho, K., Bengio, Y.: End-to-end continuous speech recognition using attention-based recurrent NN: first results. arXiv:1412.1602 [cs stat], December 2014

  11. Chorowski, J., Zapotoczny, M., Rychlikowski, P.: Read, tag, and parse all at once, or fully-neural dependency parsing. CoRR abs/1609.03441 (2016)


  12. Dozat, T., Manning, C.D.: Deep biaffine attention for neural dependency parsing. CoRR abs/1611.01734 (2016)


  13. Duong, L., Cohn, T., Bird, S., Cook, P.: A neural network model for low-resource universal dependency parsing. In: EMNLP, pp. 339–348. Citeseer (2015)


  14. Dyer, C., Ballesteros, M., Ling, W., Matthews, A., Smith, N.A.: Transition-based dependency parsing with stack long short-term memory. arXiv preprint arXiv:1505.08075 (2015)

  15. Edmonds, J.: Optimum branchings. J. Res. Natl. Bur. Stand. B 71B(4), 233–240 (1966)

  16. Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: ICML, pp. 1319–1327 (2013)


  17. Guo, J., Che, W., Yarowsky, D., Wang, H., Liu, T.: Cross-lingual dependency parsing based on distributed representations. In: ACL, vol. 1, pp. 1234–1244 (2015)


  18. Hinton, G.E., McClelland, J.L., Rumelhart, D.E.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations, vol. 1. MIT Press/Bradford Books, Cambridge (1986)

  19. Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., Wu, Y.: Exploring the limits of language modeling. arXiv:1602.02410 [cs], February 2016

  20. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. arXiv preprint arXiv:1508.06615 (2015)

  21. Kiperwasser, E., Goldberg, Y.: Simple and accurate dependency parsing using bidirectional LSTM feature representations. arXiv:1603.04351 [cs], March 2016

  22. van Merriënboer, B., et al.: Blocks and fuel: frameworks for deep learning. arXiv:1506.00619 [cs stat], June 2015

  23. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)


  24. Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S.: Recurrent neural network based language model. In: Proceedings of Interspeech, Makuhari, Chiba, Japan, September 2010

  25. Nivre, J.: Algorithms for deterministic incremental dependency parsing. Comput. Linguist. 34(4), 513–553 (2008)


  26. Nivre, J., et al.: MaltParser: a language-independent system for data-driven dependency parsing. Nat. Lang. Eng., 1 (2005)


  27. Nivre, J., et al.: Universal dependencies 1.2. http://universaldependencies.github.io/docs/

  28. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)


  29. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. JMLR 15, 1929–1958 (2014)


  30. Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. arXiv:1505.00387 [cs], May 2015

  31. Titov, I., Henderson, J.: A latent variable model for generative dependency parsing. In: Proceedings of IWPT (2007)


  32. Vinyals, O., Kaiser, L., Koo, T., Petrov, S., Sutskever, I., Hinton, G.: Grammar as a Foreign language. arXiv:1412.7449 [cs stat], December 2014

  33. Wu, Y., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv:1609.08144, September 2016

  34. Zeiler, M.D.: Adadelta: an adaptive learning rate method. arXiv:1212.5701 (2012)

  35. Zhang, X., Cheng, J., Lapata, M.: Dependency parsing as head selection. CoRR abs/1606.01280 (2016)



Acknowledgments

The experiments used the Theano [6] and the Blocks and Fuel [22] libraries. The authors would like to acknowledge the support of the following agencies for research funding and computing support: National Science Center (Poland), grant Sonata 8 2014/15/D/ST6/04402, and the National Center for Research and Development (Poland), grant Audioscope (Applied Research Program, 3rd contest, submission no. 245755).

Author information

Correspondence to Jan Chorowski.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zapotoczny, M., Rychlikowski, P., Chorowski, J. (2017). On Multilingual Training of Neural Dependency Parsers. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_37


  • DOI: https://doi.org/10.1007/978-3-319-64206-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

