Named Entity Recognition in Russian with Word Representation Learned by a Bidirectional Language Model

  • Conference paper: Artificial Intelligence and Natural Language (AINL 2018)
  • Part of the book series: Communications in Computer and Information Science (CCIS, volume 930)

Abstract

Named entity recognition is one of the most popular tasks in natural language processing. Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for natural language processing tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In addition, the Russian language poses many processing difficulties of its own. In this paper, we present a semi-supervised approach that adds deep contextualized word representations modeling both complex characteristics of word usage (e.g., syntax and semantics) and how these usages vary across linguistic contexts (i.e., polysemy). Here, word vectors are learned functions of the internal states of a deep bidirectional language model pretrained on a large text corpus. We show that these representations can be easily added to existing models and combined with other word representation features. We evaluate our model on the FactRuEval-2016 dataset for named entity recognition in Russian and achieve state-of-the-art results.
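The mechanism the abstract describes follows the ELMo recipe (Peters et al., 2018; the bilm-tf repository linked in the Notes is its reference implementation): the hidden states of a frozen, pretrained bidirectional language model are collapsed into one vector per token using softmax-normalized scalar weights and a global scale learned with the task, and the result is concatenated with conventional word embeddings before the sequence tagger. The PyTorch snippet below is a minimal illustrative sketch of that layer combination, not the authors' code; the class name, tensor shapes, and stand-in random tensors are all assumptions.

```python
import torch
import torch.nn as nn

class ElmoLayerCombination(nn.Module):
    """Collapse the layers of a pretrained biLM into one vector per token.

    Computes ELMo_k = gamma * sum_j softmax(s)_j * h_{k,j}, where h_{k,j}
    is the frozen biLM state for token k at layer j; s and gamma are the
    only parameters and are learned jointly with the NER task.
    """

    def __init__(self, num_layers: int):
        super().__init__()
        self.scalars = nn.Parameter(torch.zeros(num_layers))  # softmax-normalized per-layer weights
        self.gamma = nn.Parameter(torch.ones(1))              # global scale

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, seq_len, dim), precomputed by the frozen biLM.
        weights = torch.softmax(self.scalars, dim=0)
        combined = (weights.view(-1, 1, 1, 1) * layer_states).sum(dim=0)
        return self.gamma * combined

# Hypothetical usage: random tensors stand in for real biLM states and
# pre-trained word embeddings (e.g., fastText vectors for Russian).
num_layers, batch, seq_len, dim = 3, 2, 10, 1024
states = torch.randn(num_layers, batch, seq_len, dim)
word_emb = torch.randn(batch, seq_len, 300)

elmo = ElmoLayerCombination(num_layers)
enhanced = torch.cat([word_emb, elmo(states)], dim=-1)  # (2, 10, 1324), fed to the tagger
```

Keeping the biLM frozen and learning only the scalar mixture is what makes the representations cheap to add to an existing model, as the abstract claims: the task network sees a fixed-size enhanced embedding and needs no architectural changes.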

Notes

  1. https://github.com/allenai/bilm-tf
  2. https://github.com/ctlab/ML/BiLSTM_for_NER
  3. https://github.com/dialogue-evaluation/factRuEval-2016
  4. http://www.chaskor.ru/
  5. https://ru.wikinews.org
  6. http://universaldependencies.org/conll17/

Acknowledgements

The authors would like to thank Ivan Smetannikov for helpful conversations and the anonymous AINL reviewers for useful comments.

The research was supported by the Government of the Russian Federation (Grant 08-08).

Author information

Correspondence to Andrey Filchenkov.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Konoplich, G., Putin, E., Filchenkov, A., Rybka, R. (2018). Named Entity Recognition in Russian with Word Representation Learned by a Bidirectional Language Model. In: Ustalov, D., Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2018. Communications in Computer and Information Science, vol 930. Springer, Cham. https://doi.org/10.1007/978-3-030-01204-5_5

  • DOI: https://doi.org/10.1007/978-3-030-01204-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01203-8

  • Online ISBN: 978-3-030-01204-5

  • eBook Packages: Computer Science, Computer Science (R0)
