Programming and Computer Software, Volume 45, Issue 3, pp 99–115

Machine Learning Methods for Detecting and Monitoring Extremist Information on the Internet

  • I. V. Mashechkin
  • M. I. Petrovskiy
  • D. V. Tsarev
  • M. N. Chikunov


In this paper, we apply machine learning methods to the problem of countering terrorism and extremism using information from the Internet. This problem involves retrieving electronic messages, documents, and web resources that potentially contain information of a terrorist or extremist nature; identifying the structure of the user groups and online communities that disseminate this information; monitoring and modeling information flows in these communities; and assessing threats and predicting risks based on the monitoring results. We propose original language-independent algorithms for pattern-based information retrieval, thematic modeling, and prediction of message flow characteristics. We also propose algorithms for assessing and predicting the potential risk posed by members of online communities. These algorithms use data on the structure of relations in the communities, which makes it possible to detect potentially dangerous users even without full access to the content they distribute, e.g., through private channels and chat rooms.



This work was supported by the Russian Foundation for Basic Research, project no. 16-29-09555 ofi_m.



Copyright information

© Pleiades Publishing, Ltd. 2019

Authors and Affiliations

  1. Faculty of Computational Mathematics and Cybernetics, Moscow State University, Moscow, Russia
