Understanding Russian Information Operations Using Unsupervised Multilingual Topic Modeling
What does this or that population think about a given issue? Which topics ‘go viral’ and why? How does disinformation spread? How do populations view issues in light of national ‘master narratives’? These are all questions which automated approaches to analyzing social media promise to help answer.
We have adapted a technique for multilingual topic modeling to examine differences between what is discussed in Russian and what is discussed in English. This approach has several benefits. It turns the data’s multilinguality from an impediment into an advantage. Most importantly, it plays to the strengths of unsupervised machine learning: its ability to detect large-scale trends, anomalies, similarities and differences in a highly general way.
Applying this approach to different Twitter datasets, we were able to draw out several interesting and non-obvious insights about Russian cyberspace and how it differs from its English counterpart. We show how these insights reveal aspects of how master narratives are instantiated, and how sentiment plays out on a large scale, in Russian discourse relating to NATO.
Keywords: Information operations, Topic modeling, Multilingual, Russia