Abstract
This research proposes neural network methods that incorporate textual dependency tree structure into the classification of Russian texts. The author profiling task of gender identification was chosen to test the models, and two corpora were used in the experiments: one collected by crowdsourcing and one collected by in-person polling. The first approach is based on long short-term memory (LSTM) layers and a graph embedding algorithm developed in this work. The second is based on a graph convolutional network and an LSTM. Two syntactic parsers were used to obtain dependency trees from the texts. Input data was represented in different forms: morphological binary vectors, FastText vectors, and their combination. The developed models were compared with the state of the art, a neural network model based on convolutional and LSTM layers. Finally, we demonstrate that adding textual dependency tree structure to the input feature space improves the F1-score of gender classification, on average, by 4% for the RusPersonality dataset and by 7% for the crowdsourced dataset. The resulting F1-scores of the developed models are 84% and 83%, respectively.
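The abstract does not specify how the dependency tree enters the graph convolutional network, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a generic propagation rule of the form ReLU(Â X W), where Â is the symmetrically normalized adjacency matrix of the dependency tree with self-loops. The function names, the toy sentence, the head indices, and the feature values are all hypothetical; in the setting described above, the per-token outputs would presumably be passed on to an LSTM layer, which is not shown here.

```python
import math


def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix with self-loops.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    """
    n = len(heads)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization: D^(-1/2) A D^(-1/2)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution step: ReLU(adj_norm @ features @ weights)."""
    n = len(features)
    d_in, d_out = len(weights), len(weights[0])
    # Aggregate each token's features with those of its tree neighbours.
    agg = [[sum(adj_norm[i][k] * features[k][f] for k in range(n))
            for f in range(d_in)] for i in range(n)]
    # Apply the linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]


# Hypothetical three-token sentence whose second token is the root
# and the head of the other two tokens.
heads = [1, -1, 1]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dimensional token vectors
weights = [[1.0, 0.0], [0.0, 1.0]]               # identity projection, for illustration

adj = dependency_adjacency(heads)
hidden = gcn_layer(normalize(adj), features, weights)
```

Each row of `hidden` is a token representation that mixes the token's own features with those of its syntactic head and dependents, which is one plausible way for tree structure to reach a downstream sequence model.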
Acknowledgements
The reported study was funded by RFBR, research project No. 18-29-10084, and was carried out using the computing resources of the federal collective usage center "Complex for Simulation and Data Processing for Mega-science Facilities" at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sboev, A., Selivanov, A., Rybka, R., Moloshnikov, I., Bogachev, D. (2020). A Neural Network Model to Include Textual Dependency Tree Structure in Gender Classification of Russian Text Author. In: Misyurin, S., Arakelian, V., Avetisyan, A. (eds) Advanced Technologies in Robotics and Intelligent Systems. Mechanisms and Machine Science, vol 80. Springer, Cham. https://doi.org/10.1007/978-3-030-33491-8_48
DOI: https://doi.org/10.1007/978-3-030-33491-8_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33490-1
Online ISBN: 978-3-030-33491-8
eBook Packages: Intelligent Technologies and Robotics (R0)