Extraction of Explicit Consumer Intentions from Social Network Messages
In this paper we address the problem of automatically extracting facts from Russian texts. The facts under examination are the intentions of social network users to purchase certain goods or use certain services. The approach used is machine learning with annotation. The training set for expert annotation consists of messages from the “VKontakte” social network, selected through the LeadScanner API. A purpose-built system of semantic tags distinguishes between various intentional blocks: objects, their different properties, and emphatic constructions. Pre-processing of the training set includes lemmatization and grammatical tagging with PyMorphy2. A directed graph is then constructed from the training set. Each node in this graph corresponds to an intentional block and carries its expert-assigned intentional tag together with the grammatical and/or lexical properties of its main word. The edges of the graph connect intentional blocks that occur in adjacent positions in the messages of the training set. Intention objects and their properties are extracted by analyzing the test set against the constructed graph. The test set also includes messages that contain non-consumer intentions or no intentions at all. The macro-averaged precision and recall of intention extraction are 82% and 74%, respectively.
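The abstract describes the graph only in outline, so the following is a minimal Python sketch under our own assumptions, not the authors' implementation. It assumes messages are already lemmatized and POS-tagged (for instance with PyMorphy2, whose OpenCorpora labels `VERB`, `INFN`, `ADJF`, `NOUN` appear below), that each expert-annotated block is reduced to a pair (tag, feature of its main word), and that extraction means finding the longest run of adjacent tokens that follows graph edges and contains an intention marker. The tag names `MARKER`, `ACTION`, `PROPERTY`, `OBJECT`, the `lex:`/`pos:` feature encoding, and all function names are hypothetical. The last function computes one common form of macro averaging: per-tag precision and recall, averaged over tags.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical representation (not the authors' format):
# a node is (intentional tag, feature of the block's main word), where the
# feature is lexical ("lex:<lemma>") or grammatical ("pos:<POS tag>").
Node = Tuple[str, str]
Token = Tuple[str, str]  # (lemma, POS), e.g. from a morphological analyser
Graph = Dict[Node, Set[Node]]


def build_graph(messages: List[List[Node]]) -> Tuple[Set[Node], Graph]:
    """Collect all nodes; add an edge a -> b whenever block b directly follows a."""
    nodes: Set[Node] = set()
    edges: Graph = defaultdict(set)
    for blocks in messages:
        nodes.update(blocks)
        for a, b in zip(blocks, blocks[1:]):
            edges[a].add(b)
    return nodes, edges


def matches(node: Node, token: Token) -> bool:
    """A token fits a node if its lemma or its POS equals the node's feature."""
    lemma, pos = token
    return node[1] in (f"lex:{lemma}", f"pos:{pos}")


def longest_chain(tokens: List[Token], start: int, node: Node, edges: Graph) -> List[Node]:
    """Longest graph path from `node` that matches tokens[start], tokens[start + 1], ..."""
    best = [node]
    nxt = start + 1
    if nxt < len(tokens):
        for succ in sorted(edges.get(node, ())):  # sorted for deterministic tie-breaking
            if matches(succ, tokens[nxt]):
                chain = [node] + longest_chain(tokens, nxt, succ, edges)
                if len(chain) > len(best):
                    best = chain
    return best


def extract(tokens: List[Token], nodes: Set[Node], edges: Graph,
            marker_tag: str = "MARKER") -> List[Tuple[str, str]]:
    """Return (tag, lemma) pairs for the longest matching chain that contains an
    intention marker; an empty list means no intention was recognised."""
    best: List[Node] = []
    best_start = 0
    for i, tok in enumerate(tokens):
        for node in sorted(nodes):
            if matches(node, tok):
                chain = longest_chain(tokens, i, node, edges)
                has_marker = any(tag == marker_tag for tag, _ in chain)
                if has_marker and len(chain) > len(best):
                    best, best_start = chain, i
    return [(tag, tokens[best_start + k][0]) for k, (tag, _) in enumerate(best)]


def macro_precision_recall(gold: Set[Tuple[int, str, str]],
                           pred: Set[Tuple[int, str, str]]) -> Tuple[float, float]:
    """Per-tag precision and recall over (message id, tag, lemma) items, averaged across tags."""
    tags = sorted({tag for _, tag, _ in gold | pred})
    if not tags:
        return 0.0, 0.0
    precisions, recalls = [], []
    for tag in tags:
        g = {item for item in gold if item[1] == tag}
        p = {item for item in pred if item[1] == tag}
        tp = len(g & p)
        precisions.append(tp / len(p) if p else 0.0)
        recalls.append(tp / len(g) if g else 0.0)
    return sum(precisions) / len(tags), sum(recalls) / len(tags)


if __name__ == "__main__":
    train = [
        [("MARKER", "lex:хотеть"), ("ACTION", "lex:купить"), ("OBJECT", "pos:NOUN")],
        [("MARKER", "lex:хотеть"), ("ACTION", "lex:купить"),
         ("PROPERTY", "pos:ADJF"), ("OBJECT", "pos:NOUN")],
    ]
    nodes, edges = build_graph(train)
    msg = [("хотеть", "VERB"), ("купить", "INFN"), ("недорогой", "ADJF"), ("ноутбук", "NOUN")]
    print(extract(msg, nodes, edges))
    # [('MARKER', 'хотеть'), ('ACTION', 'купить'), ('PROPERTY', 'недорогой'), ('OBJECT', 'ноутбук')]
    print(extract([("хороший", "ADJF"), ("погода", "NOUN")], nodes, edges))  # []
```

Requiring the chosen chain to contain a marker node is one simple way to make a message with no intention, such as the second example, yield nothing even though its adjective-noun pair matches a path in the graph; the paper may resolve competing paths differently.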
Keywords: Intention; Intention marker; Machine learning with annotation; Directed graph; Fact extraction
The study was supported by the Russian Academy of Sciences (the Program of Basic Research, project 0314-2016-0015).