A Neural Network Model to Include Textual Dependency Tree Structure in Gender Classification of Russian Text Author

Conference paper in Advanced Technologies in Robotics and Intelligent Systems

Part of the book series: Mechanisms and Machine Science, volume 80

Abstract

This research proposes neural network methods for including the structure of textual dependency trees in classification tasks on Russian texts. The models were tested on an author profiling task, gender identification, using two corpora: one collected via crowdsourcing and one collected through in-person polling. The first approach is based on long short-term memory (LSTM) layers and a graph embedding algorithm developed in this work. The second is based on a graph convolutional network combined with LSTM. Two syntactic parsers were used to obtain dependency trees from the texts. Input data were represented in several forms: binary morphological vectors, FastText vectors, and their combination. The results of the developed models were compared with the state of the art, a neural network model based on convolutional and LSTM layers. We demonstrate that including the dependency tree structure in the input feature space improves the F1-score of gender classification, on average, by 4% for the RusPersonality dataset and by 7% for the crowdsourced dataset. The developed models reach F1-scores of 84% and 83%, respectively.
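The abstract does not specify the layer definitions, so the following is only a minimal sketch of how a dependency tree can enter a graph convolutional model, assuming the standard graph convolution of Kipf and Welling (symmetric normalization of the adjacency matrix with self-loops). Head indices follow the HEAD column convention of the CoNLL-U format used by Universal Dependencies parsers. The sentence, the feature sizes (8 morphological bits plus 16 dense dimensions standing in for FastText vectors), and the helper names `heads_to_adjacency` and `gcn_layer` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def heads_to_adjacency(heads):
    """Build a symmetric adjacency matrix from a dependency tree.

    heads[i] is the 1-based index of the head of token i+1, with 0 marking
    the root, as in the HEAD column of CoNLL-U (hypothetical helper).
    """
    n = len(heads)
    adj = np.zeros((n, n))
    for dep, head in enumerate(heads):
        if head > 0:  # the root token has no head arc
            adj[dep, head - 1] = 1.0
            adj[head - 1, dep] = 1.0  # treat arcs as undirected edges
    return adj


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1)  # node degrees, always >= 1
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(norm_adj @ features @ weight, 0.0)


rng = np.random.default_rng(0)

# Toy 4-token sentence: token 2 is the root, tokens 1 and 3 depend on it,
# and token 4 depends on token 3.
heads = [2, 0, 2, 3]
adj = heads_to_adjacency(heads)

# Assumed per-token input: a binary morphological vector (8 dims)
# concatenated with a dense word vector (16 dims) standing in for FastText.
morph = rng.integers(0, 2, size=(4, 8)).astype(float)
word_vecs = rng.normal(size=(4, 16))
x = np.concatenate([morph, word_vecs], axis=1)  # shape (4, 24)

w = rng.normal(scale=0.1, size=(24, 32))  # untrained weights
h = gcn_layer(adj, x, w)  # one tree-aware vector per token
print(h.shape)  # (4, 32)
```

In a graph-convolution-plus-LSTM model of the kind the abstract describes, the rows of `h`, kept in token order, could then be passed to an LSTM and pooled into a document vector for the gender classifier; how the authors actually connect the two layers is not stated here.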

Acknowledgements

The reported study was funded by the Russian Foundation for Basic Research (RFBR) under research project №18-29-10084 and was carried out using the computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/.

Author information

Correspondence to A. Sboev.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sboev, A., Selivanov, A., Rybka, R., Moloshnikov, I., Bogachev, D. (2020). A Neural Network Model to Include Textual Dependency Tree Structure in Gender Classification of Russian Text Author. In: Misyurin, S., Arakelian, V., Avetisyan, A. (eds) Advanced Technologies in Robotics and Intelligent Systems. Mechanisms and Machine Science, vol 80. Springer, Cham. https://doi.org/10.1007/978-3-030-33491-8_48
