Abstract
The Internet is becoming one of the most dangerous threats to personal, public and state information security, so detecting and counteracting inappropriate information in digital network content has become a task of national importance. This paper proposes a new approach to building an intelligent system for detecting and counteracting inappropriate information on the Internet, based on machine learning methods and big data processing, and describes the architecture of such a system. One of the most important system components, the component for multidimensional evaluation and categorization of information objects, was evaluated experimentally in single-threaded and multi-threaded modes. The evaluation showed that various classifiers from the Python Scikit-learn and Spark MLlib libraries solve the problem with high efficiency.
Acknowledgements
This research is supported by RSF grant #18-11-00302 at SPIIRAS.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Vitkova, L., Saenko, I., Tushkanova, O. (2020). An Approach to Creating an Intelligent System for Detecting and Countering Inappropriate Information on the Internet. In: Kotenko, I., Badica, C., Desnitsky, V., El Baz, D., Ivanovic, M. (eds) Intelligent Distributed Computing XIII. IDC 2019. Studies in Computational Intelligence, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-32258-8_29
DOI: https://doi.org/10.1007/978-3-030-32258-8_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32257-1
Online ISBN: 978-3-030-32258-8
eBook Packages: Intelligent Technologies and Robotics (R0)