Detecting Hate Speech in Cross-Lingual and Multi-lingual Settings Using Language Agnostic Representations

Rodríguez, Sebastián E.; Allende-Cid, Héctor; Allende, Héctor

doi:10.1007/978-3-030-93420-0_8

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12702)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition

Abstract

The automatic detection of hate speech is a rapidly growing field in the natural language processing community. In recent years there have been efforts to detect hate speech in multiple languages, using models trained on several languages at the same time. There is also special interest in how well language-agnostic features represent text for hate speech detection, because a model can be trained on multiple languages and then both the model and the representation can be tested on an unseen language.

In this work we focused on detecting hate speech in mono-lingual, multi-lingual and cross-lingual settings. For this we used a pre-trained language model, Language-Agnostic BERT Sentence Embedding (LaBSE), both for feature extraction and as an end-to-end classification model. We tested different models, such as Support Vector Machines and tree-based models, and different representations, in particular bag of words, bag of characters, and sentence embeddings extracted from Multi-lingual BERT. The dataset used was the SemEval 2019 Task 5 dataset, which covers hate speech against immigrants and women in English and Spanish. The results show that using LaBSE for feature extraction improves performance on both languages in the mono-lingual setting and in the cross-lingual setting. Moreover, LaBSE as an end-to-end classification model performs better than the results reported by the authors of the SemEval 2019 Task 5 dataset for Spanish.
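The three evaluation settings named above can be made concrete with a small sketch. It is not taken from the paper or its repository. It assumes that mono-lingual means training and testing on the same language, multi-lingual means training on all languages together and testing on each one, and cross-lingual means training on the other language(s) and testing on a held-out one. The `Example` class, the `select` and `build_settings` helpers, the label encoding, and the placeholder texts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Example:
    text: str
    lang: str   # language code, e.g. "en" or "es"
    label: int  # assumed encoding: 1 = hateful, 0 = not hateful
    split: str  # "train" or "test"


def select(data: List[Example], langs: Set[str], split: str) -> List[Example]:
    """Keep the examples of the given split whose language is in `langs`."""
    return [ex for ex in data if ex.lang in langs and ex.split == split]


def build_settings(
    data: List[Example], langs: List[str]
) -> Dict[str, Tuple[List[Example], List[Example]]]:
    """Map each setting name to a (train, test) pair (assumed definitions)."""
    settings = {}
    for lang in langs:
        test = select(data, {lang}, "test")
        # Mono-lingual: train and test on the same language.
        settings[f"mono:{lang}"] = (select(data, {lang}, "train"), test)
        # Multi-lingual: train on every language, test on one.
        settings[f"multi:{lang}"] = (select(data, set(langs), "train"), test)
        # Cross-lingual: train on the other languages, test on the unseen one.
        others = set(langs) - {lang}
        settings[f"cross:{lang}"] = (select(data, others, "train"), test)
    return settings


# Neutral placeholder data standing in for the English and Spanish tweets.
toy = [
    Example("english text a", "en", 0, "train"),
    Example("english text b", "en", 1, "train"),
    Example("english text c", "en", 0, "test"),
    Example("texto en español a", "es", 1, "train"),
    Example("texto en español b", "es", 0, "train"),
    Example("texto en español c", "es", 1, "test"),
]

settings = build_settings(toy, ["en", "es"])
for name, (train, test) in settings.items():
    print(name, "train:", len(train), "test:", len(test))
```

Any representation (bag of words, bag of characters, or sentence embeddings) and any classifier could then be fitted on the `train` part of a setting and scored on its `test` part, so the same loop covers all three settings.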

Notes

1. https://github.com/capkuro/DetectingHatespeechLabse

Acknowledgment

This work was supported in part by Basal Project AFB 1800082 and in part by Project DGIIP-UTFSM PI-LIR-2020-17. Héctor Allende-Cid's work is supported by PUCV VRIEA.

Author information

Authors and Affiliations

Departamento de Informática, Universidad Técnica Federico Santa María, Valparaíso, Chile
Sebastián E. Rodríguez & Héctor Allende
Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
Héctor Allende-Cid

Corresponding author

Correspondence to Sebastián E. Rodríguez.

Editor information

Editors and Affiliations

Universidade do Porto, Porto, Portugal
João Manuel R. S. Tavares
Universidade Estadual Paulista, São Paulo, Brazil
João Paulo Papa
University of the Balearic Islands, Palma de Mallorca, Spain
Manuel González Hidalgo

About this paper

Cite this paper

Rodríguez, S.E., Allende-Cid, H., Allende, H. (2021). Detecting Hate Speech in Cross-Lingual and Multi-lingual Settings Using Language Agnostic Representations. In: Tavares, J.M.R.S., Papa, J.P., González Hidalgo, M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2021. Lecture Notes in Computer Science, vol 12702. Springer, Cham. https://doi.org/10.1007/978-3-030-93420-0_8

DOI: https://doi.org/10.1007/978-3-030-93420-0_8
Published: 13 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93419-4
Online ISBN: 978-3-030-93420-0
eBook Packages: Computer Science, Computer Science (R0)
