Toxic Comment Classification Service in Social Network

Dolgushin, Mikhail; Ismakova, Dayana; Bidulya, Yuliya; Krupkin, Igor; Barskaya, Galina; Lesiv, Anastasiya

doi:10.1007/978-3-030-87802-3_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This article describes the development of an online tool for moderating content in social network groups. The core of the system is a machine learning classifier. Each message is represented by a feature set that combines content features extracted from the text with word embedding vectors. The authors ran a series of experiments to find the best combination of vector representation, content features, and classification method. Tests on a dataset of 11 thousand Russian-language messages yielded 87% accuracy. The article also proposes an architecture for a group moderator's web application that can automatically apply classification results to manage users and control which posts are displayed.
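The idea of combining content features with word embeddings can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the content features or the embedding model used, so the three features below (token count, exclamation marks, share of uppercase letters), the toy three-dimensional embedding table, and all function names are assumptions chosen for illustration. English tokens are used for readability, although the paper works with Russian messages.

```python
from typing import Dict, List

# Hypothetical toy embedding table; a real system would load trained vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "you": [0.1, 0.0, 0.2],
    "are": [0.0, 0.1, 0.0],
    "stupid": [0.9, 0.8, 0.1],
    "great": [-0.7, 0.2, 0.5],
}
EMBEDDING_DIM = 3


def tokenize(text: str) -> List[str]:
    """Split on whitespace, strip basic punctuation, and lowercase."""
    stripped = (token.strip(".,!?") for token in text.split())
    return [token.lower() for token in stripped if token]


def content_features(text: str) -> List[float]:
    """Illustrative surface features (assumed, not taken from the paper)."""
    letters = [ch for ch in text if ch.isalpha()]
    upper_ratio = sum(ch.isupper() for ch in letters) / max(len(letters), 1)
    return [
        float(len(text.split())),  # number of whitespace-separated tokens
        float(text.count("!")),    # number of exclamation marks
        upper_ratio,               # share of uppercase letters
    ]


def mean_embedding(tokens: List[str]) -> List[float]:
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not known:
        return [0.0] * EMBEDDING_DIM
    return [sum(column) / len(known) for column in zip(*known)]


def message_features(text: str) -> List[float]:
    """Concatenate content features with the averaged word embedding."""
    return content_features(text) + mean_embedding(tokenize(text))


if __name__ == "__main__":
    print(message_features("You are STUPID!!!"))
```

The resulting vector could then be passed to any standard classifier. Concatenation keeps the two feature groups independent, so different embedding models and content feature sets can be swapped in and compared, in the spirit of the experiments the abstract describes.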

Author information

Authors and Affiliations

University of Tyumen, Tyumen, Russia
Mikhail Dolgushin, Dayana Ismakova, Yuliya Bidulya, Igor Krupkin, Galina Barskaya & Anastasiya Lesiv

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

Cite this paper

Dolgushin, M., Ismakova, D., Bidulya, Y., Krupkin, I., Barskaya, G., Lesiv, A. (2021). Toxic Comment Classification Service in Social Network. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_15

DOI: https://doi.org/10.1007/978-3-030-87802-3_15
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)
