Abstract
This paper addresses the automatic extraction of facts from Russian-language texts. The facts in question are the intentions of social network users to purchase certain goods or use certain services. The approach is machine learning on expert-annotated data. The training set consists of messages from the “VKontakte” social network, selected through the LeadScanner API. The proposed system of semantic tags distinguishes between different intentional blocks: objects, their various properties, and emphatic constructions. Pre-processing of the training set includes lemmatization and grammatical tagging with PyMorphy2. A directed graph is then constructed from the training set. Each node of the graph corresponds to an intentional block and stores its expert-assigned intentional tag together with the grammatical and/or lexical properties of its main word. The edges connect intentional blocks that occur in adjacent positions in any message of the training set. Intention objects and their properties are extracted by analyzing the test set against the constructed graph. The test set also includes messages that contain non-consumer intentions or no intentions at all. The macro-averaged precision and recall of intention extraction are 82% and 74%, respectively.
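The graph construction and matching outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The tag names (`INTENT`, `PROPERTY`, `OBJECT`), the `START` node, the choice of which tags are keyed by lemma rather than by part of speech, and the requirement that a whole message match a path are all assumptions made for illustration. Tokens are assumed to be already lemmatized and tagged (in the paper this is done with PyMorphy2), so the sketch uses only the standard library.

```python
from collections import defaultdict
from typing import NamedTuple


class Block(NamedTuple):
    """One annotated intentional block (hypothetical structure)."""
    tag: str    # expert-assigned tag, e.g. "INTENT", "PROPERTY", "OBJECT"
    lemma: str  # lemma of the main word
    pos: str    # part of speech of the main word


START = ("START", "")      # artificial entry node (assumption of this sketch)
LEXICAL_TAGS = {"INTENT"}  # tags whose nodes are keyed by lemma (assumption)


def node_key(block):
    """A node stores the tag plus a lexical or a grammatical property."""
    value = block.lemma if block.tag in LEXICAL_TAGS else block.pos
    return (block.tag, value)


def build_graph(annotated_messages):
    """Add an edge for every pair of blocks adjacent in some training message."""
    edges = defaultdict(set)
    for blocks in annotated_messages:
        previous = START
        for block in blocks:
            current = node_key(block)
            edges[previous].add(current)
            previous = current
    return edges


def match(edges, tokens, node=START):
    """Return one tag per token along a path from `node`, or None if no path fits."""
    if not tokens:
        return []
    lemma, pos = tokens[0]
    for successor in sorted(edges.get(node, ())):
        tag, value = successor
        if value in (lemma, pos):
            rest = match(edges, tokens[1:], successor)
            if rest is not None:
                return [tag] + rest
    return None


def extract_objects(edges, tokens):
    """Return the lemmas tagged as OBJECT, or an empty list if nothing matches."""
    tags = match(edges, tokens)
    if tags is None:
        return []
    return [lemma for (lemma, _), tag in zip(tokens, tags) if tag == "OBJECT"]


# Two toy training messages: "want to buy a bicycle", "looking for a cheap laptop".
training = [
    [Block("INTENT", "хотеть", "VERB"), Block("INTENT", "купить", "INFN"),
     Block("OBJECT", "велосипед", "NOUN")],
    [Block("INTENT", "искать", "VERB"), Block("PROPERTY", "недорогой", "ADJF"),
     Block("OBJECT", "ноутбук", "NOUN")],
]
graph = build_graph(training)

unseen = [("хотеть", "VERB"), ("купить", "INFN"), ("телефон", "NOUN")]
no_intention = [("погода", "NOUN"), ("хороший", "ADJF")]
```

Under these assumptions, `extract_objects(graph, unseen)` yields the unseen noun "телефон" as the intention object, because object nodes are keyed by part of speech rather than by lemma, while `extract_objects(graph, no_intention)` yields an empty list because no path in the graph fits the message.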
References
Pivovarova, L.M.: Faktograficheskij analiz teksta v sisteme podderzhki prinyatiya reshenij. Vestnik SPbGU, Ser. 9, Vyp. 4, pp. 190–197 (2010)
Cunningham, H., Maynard, D., Tablan, V.: JAPE: a Java Annotation Patterns Engine. Technical report CS–00–10, University of Sheffield, Department of Computer Science (2000)
Tomita-parser. http://tech.yandex.ru/tomita. Accessed 26 Mar 2018
Bol’shakova, E., Baeva, N., Bordachenkova, E.: Leksiko-sintaksicheskie shablony v zadachah avtomaticheskoj obrabotki tekstov. In: Komp’yuternaya lingvistika i intellektual’nye tekhnologii (Dialog 2007), vol. 2, pp. 70–75. RGGU, Moskva (2007)
Tang, J., Hong, M., Zhang, D., Liang, B., Li, J.: Information extraction: methodologies and applications. In: Emerging Technologies of Text Mining: Techniques and Applications, chap. 1, pp. 1–33 (2008). http://keg.cs.tsinghua.edu.cn/jietang/publications/Tang-et-al-Information_Extraction.pdf. Accessed 21 Mar 2018
Nguyen, T., Grishman, R.: Event detection and domain adaptation with convolutional neural networks. In: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings, vol. 2, pp. 365–371 (2015)
Lukashevich, N.: Iteracionnoe izvlechenie shablonov opisaniya sobytij po novostnym klasteram. In: RCDL 2012, pp. 353–359. Pereslavl’-Zalesskij (2012)
Navigli, R., Velardi, P.: Learning Word-Class Lattices for Definition and Hypernym Extraction. http://www.aclweb.org/anthology/P10-1134. Accessed 23 Mar 2017
Pymorphy2. http://pymorphy2.readthedocs.io/en/latest. Accessed 26 Mar 2018
Leadscanner. https://leadscanner.ru. Accessed 24 Feb 2018
NCRL. http://www.ruscorpora.ru/search-main.html. Accessed 11 Feb 2018
Acknowledgements
The study was supported by the Russian Academy of Sciences (the Program of Basic Research, project 0314-2016-0015).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Pimenov, I., Salomatina, N. (2018). Extraction of Explicit Consumer Intentions from Social Network Messages. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2018. Lecture Notes in Computer Science, vol 11179. Springer, Cham. https://doi.org/10.1007/978-3-030-11027-7_13
DOI: https://doi.org/10.1007/978-3-030-11027-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11026-0
Online ISBN: 978-3-030-11027-7
eBook Packages: Computer Science, Computer Science (R0)