Abstract
The automatic detection of hate speech is a blooming field in the natural language processing community. In recent years there have been efforts in detecting hate speech in multiple languages, using models trained on multiple languages at the same time. Furthermore, there is special interest in the capabilities of language agnostic features to represent text in hate speech detection. This is because models can be trained in multiple languages, and then the capabilities of the model and representation can be tested on a unseen language.
In this work we focused on detecting hate speech in mono-lingual, multi-lingual and cross-lingual settings. For this we used a pre-trained language model called Language Agnostic BERT Sentence Embeddings (LabSE), both for feature extraction and as an end to end classification model. We tested different models such as Support Vector Machines and Tree-based models, and representations in particular bag of words, bag of characters, and sentence embeddings extracted from Multi-lingual BERT. The dataset used was the SemEval 2019 task 5 data set, which covers hate speech against immigrants and women in English and Spanish. The results show that the usage of LabSE as feature extraction improves the performance on both languages in a mono-lingual setting, and in a cross-lingual setting. Moreover, LabSE as an end to end classification model performs better than the reported by the authors of SemEval 2019 task 5 data set for the Spanish language.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Aluru, S.S., Mathew, B., Saha, P., Mukherjee, A.: Deep learning models for multilingual hate speech detection. arXiv preprint arXiv:2004.06465 (2020)
Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in tweets. In: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 759–760 (2017)
Basile, V., et al.: Semeval-2019 task 5: multilingual detection of hate speech against immigrants and women in twitter. In: 13th International Workshop on Semantic Evaluation, pp. 54–63. Association for Computational Linguistics (2019)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Feng, F., Yang, Y., Cer, D., Arivazhagan, N., Wang, W.: Language-agnostic bert sentence embedding. arXiv preprint arXiv:2007.01852 (2020)
Gambäck, B., Sikdar, U.K.: Using convolutional neural networks to classify hate-speech. In: Proceedings of the First Workshop on Abusive Language Online, pp. 85–90 (2017)
Ghosh Roy, S., Narayan, U., Raha, T., Abid, Z., Varma, V.: Leveraging multilingual transformers for hate speech detection. arXiv e-prints pp. arXiv-2101 (2021)
Gitari, N.D., Zuping, Z., Damien, H., Long, J.: A lexicon-based approach for hate speech detection. Int. J. Multimedia Ubiquitous Eng. 10(4), 215–230 (2015)
Glavaš, G., Karan, M., Vulić, I.: Xhate-999: analyzing and detecting abusive language across domains and languages. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 6350–6365 (2020)
Kwok, I., Wang, Y.: Locate the hate: detecting tweets against blacks. In: Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, pp. 1621–1622 (2013)
Mishra, S., Mishra, S.: 3idiots at hasoc 2019: fine-tuning transformer neural networks for hate speech identification in Indo-European languages. In: FIRE (Working Notes), pp. 208–213 (2019)
Mozafari, M., Farahbakhsh, R., Crespi, N.: A BERT-based transfer learning approach for hate speech detection in online social media. In: Cherifi, H., Gaito, S., Mendes, J.F., Moro, E., Rocha, L.M. (eds.) COMPLEX NETWORKS 2019. SCI, vol. 881, pp. 928–940. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-36687-2_77
Ousidhoum, N., Lin, Z., Zhang, H., Song, Y., Yeung, D.Y.: Multilingual and multi-aspect hate speech analysis. arXiv preprint arXiv:1908.11049 (2019)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Polignano, M., Basile, P., De Gemmis, M., Semeraro, G.: Hate speech detection through Alberto Italian language understanding model. In: NL4AI@ AI* IA (2019)
Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709 (2015)
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909 (2015)
Sohn, H., Lee, H.: Mc-bert4hate: hate speech detection using multi-channel bert for different languages and translations. In: 2019 International Conference on Data Mining Workshops (ICDMW), pp. 551–559. IEEE (2019)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Stappen, L., Brunn, F., Schuller, B.: Cross-lingual zero-and few-shot hate speech detection utilising frozen transformer language models and axel. arXiv preprint arXiv:2004.13850 (2020)
Warner, W., Hirschberg, J.: Detecting hate speech on the world wide web. In: Proceedings of the Second Workshop on Language in Social Media, pp. 19–26 (2012)
Wolf, T., et al.: Huggingface’s transformers: state-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019)
Zimmerman, S., Kruschwitz, U., Fox, C.: Improving hate speech detection with deep learning ensembles. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)
Acknowledgment
This work was supported in part by Basal Project AFB 1800082, in part by Project DGIIP-UTFSM PI-LIR-2020-17. Héctor Allende-Cid work is supported by PUCV VRIEA.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rodríguez, S.E., Allende-Cid, H., Allende, H. (2021). Detecting Hate Speech in Cross-Lingual and Multi-lingual Settings Using Language Agnostic Representations. In: Tavares, J.M.R.S., Papa, J.P., González Hidalgo, M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2021. Lecture Notes in Computer Science(), vol 12702. Springer, Cham. https://doi.org/10.1007/978-3-030-93420-0_8
Download citation
DOI: https://doi.org/10.1007/978-3-030-93420-0_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93419-4
Online ISBN: 978-3-030-93420-0
eBook Packages: Computer ScienceComputer Science (R0)