Understanding Russian Information Operations Using Unsupervised Multilingual Topic Modeling

  • Peter A. Chew
  • Jessica G. Turnley
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10354)


What does this or that population think about a given issue? Which topics ‘go viral’ and why? How does disinformation spread? How do populations view issues in light of national ‘master narratives’? These are all questions which automated approaches to analyzing social media promise to help answer.

We have adapted a technique for multilingual topic modeling to examine differences between what is discussed in Russian and what is discussed in English. This kills several birds with one stone. We turn the data’s multilinguality from an impediment into an advantage we can exploit. But most importantly, we play to the strengths of unsupervised machine learning: its ability to detect large-scale trends, anomalies, similarities and differences in a highly general way.

Applying this approach to different Twitter datasets, we were able to draw out several interesting and non-obvious insights about Russian cyberspace and how it differs from its English counterpart. We show how these insights reveal aspects of how master narratives are instantiated, and how sentiment plays out on a large scale, in Russian discourse relating to NATO.
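The abstract does not spell out how the Russian/English comparison is computed. As a rough illustration only, the following sketch assumes that some multilingual topic model (not shown here, and not necessarily the authors' technique) has already placed every Russian and English tweet in one shared topic space as a vector of topic proportions. It averages those vectors per language and ranks topics by a smoothed log ratio, so topics with positive scores are discussed relatively more in Russian and topics with negative scores relatively more in English. The function names `topic_prevalence` and `language_skew`, the `eps` smoothing constant, and the toy tweet data are all hypothetical.

```python
import math
from collections import defaultdict


def topic_prevalence(docs):
    """Return {lang: [mean weight of topic k over that language's docs]}.

    docs: iterable of (lang, weights) pairs, where weights is one tweet's
    topic proportions in a shared cross-lingual topic space (assumed to come
    from a multilingual topic model that is not shown here).
    """
    sums = {}
    counts = defaultdict(int)
    for lang, weights in docs:
        if lang not in sums:
            sums[lang] = [0.0] * len(weights)
        for k, w in enumerate(weights):
            sums[lang][k] += w
        counts[lang] += 1
    return {lang: [s / counts[lang] for s in totals] for lang, totals in sums.items()}


def language_skew(docs, lang_a="ru", lang_b="en", eps=1e-3):
    """Rank topics by log((prevalence in lang_a + eps) / (prevalence in lang_b + eps)).

    Positive scores mark topics discussed relatively more in lang_a; negative
    scores mark topics discussed relatively more in lang_b. eps is an arbitrary
    (hypothetical) smoothing constant that avoids division by zero.
    """
    prev = topic_prevalence(docs)
    a, b = prev[lang_a], prev[lang_b]
    scores = [(k, math.log((a[k] + eps) / (b[k] + eps))) for k in range(len(a))]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


# Toy data (invented for illustration): 3 topics, per-tweet topic proportions.
tweets = [
    ("ru", [0.7, 0.2, 0.1]),
    ("ru", [0.6, 0.3, 0.1]),
    ("en", [0.1, 0.3, 0.6]),
    ("en", [0.2, 0.3, 0.5]),
]

for topic, score in language_skew(tweets):
    print(f"topic {topic}: {score:+.2f}")
```

On the toy data, topic 0 ranks first with a positive score (more prominent in the Russian tweets) and topic 2 ranks last with a negative score (more prominent in the English tweets). The smoothed log ratio is just one simple way to surface such differences; a difference in proportions or a significance test would be reasonable alternatives, especially when one language has few tweets.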


Keywords: Information operations · Topic modeling · Multilingual · Russia


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Galisteo Consulting Group, Inc., Albuquerque, USA
