Detecting Hate Speech in Cross-Lingual and Multi-lingual Settings Using Language Agnostic Representations

  • Conference paper
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12702)

Abstract

The automatic detection of hate speech is a rapidly growing field in the natural language processing community. In recent years there have been efforts to detect hate speech in multiple languages using models trained on several languages at the same time. There is also particular interest in how well language-agnostic features represent text for hate speech detection, because a model can be trained on multiple languages and then both the model and the representation can be tested on an unseen language.

In this work we focused on detecting hate speech in mono-lingual, multi-lingual and cross-lingual settings. For this we used a pre-trained language model called Language Agnostic BERT Sentence Embeddings (LabSE), both for feature extraction and as an end-to-end classification model. We tested different models, such as Support Vector Machines and tree-based models, and different representations, in particular bag of words, bag of characters, and sentence embeddings extracted from Multi-lingual BERT. The dataset used was the SemEval 2019 Task 5 dataset, which covers hate speech against immigrants and women in English and Spanish. The results show that using LabSE for feature extraction improves performance on both languages in the mono-lingual setting, and also in the cross-lingual setting. Moreover, LabSE as an end-to-end classification model performs better than the results reported by the authors of the SemEval 2019 Task 5 dataset for the Spanish language.
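The three evaluation settings named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. To stay dependency-free it makes several assumptions: a bag-of-characters encoder stands in for LabSE or Multi-lingual BERT embeddings, a nearest-centroid classifier stands in for the Support Vector Machine and tree-based models, and accuracy is used as the metric. The function names (`bag_of_chars`, `run_settings`, and so on) and the toy sentences with their arbitrary 0/1 labels are hypothetical placeholders, not data from SemEval 2019 Task 5.

```python
from collections import Counter
from math import sqrt


def bag_of_chars(text):
    """Stand-in encoder (assumption): L2-normalised letter counts."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    norm = sqrt(sum(v * v for v in counts.values())) or 1.0
    return {ch: v / norm for ch, v in counts.items()}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def fit(texts, labels, encode):
    """Nearest-centroid stand-in classifier: one mean vector per label."""
    grouped = {}
    for text, label in zip(texts, labels):
        grouped.setdefault(label, []).append(encode(text))
    model = {}
    for label, vectors in grouped.items():
        total = Counter()
        for vec in vectors:
            for key, value in vec.items():
                total[key] += value
        model[label] = {k: v / len(vectors) for k, v in total.items()}
    return model


def predict(model, texts, encode):
    """Assign each text the label whose centroid is most similar."""
    return [max(model, key=lambda y: dot(encode(t), model[y])) for t in texts]


def evaluate(train, test, encode):
    """Train on one (texts, labels) pair, return accuracy on another."""
    model = fit(train[0], train[1], encode)
    predicted = predict(model, test[0], encode)
    return sum(g == p for g, p in zip(test[1], predicted)) / len(test[1])


def run_settings(data, encode=bag_of_chars):
    """data maps language -> {"train": (texts, labels), "test": (texts, labels)}."""
    langs = sorted(data)
    results = {}
    for src in langs:
        for tgt in langs:
            # mono-lingual: same language; cross-lingual: unseen target language
            setting = "mono" if src == tgt else "cross"
            results[(setting, src, tgt)] = evaluate(
                data[src]["train"], data[tgt]["test"], encode
            )
    # multi-lingual: train on the union of all languages
    union = (
        [t for lang in langs for t in data[lang]["train"][0]],
        [y for lang in langs for y in data[lang]["train"][1]],
    )
    for tgt in langs:
        results[("multi", "+".join(langs), tgt)] = evaluate(
            union, data[tgt]["test"], encode
        )
    return results


# Hypothetical toy data with arbitrary labels, only to exercise the code paths.
toy = {
    "en": {
        "train": (["good morning friends", "go away now"], [0, 1]),
        "test": (["good evening friends", "go away"], [0, 1]),
    },
    "es": {
        "train": (["buenos dias amigos", "vete de aqui"], [0, 1]),
        "test": (["buenas tardes amigos", "vete ya"], [0, 1]),
    },
}
results = run_settings(toy)
for key in sorted(results):
    print(key, round(results[key], 2))
```

Because the encoder is passed in as a function, a language-agnostic sentence encoder could be substituted for `bag_of_chars` without changing the evaluation loop, provided it returns vectors in the same sparse-dict form.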

Notes

  1. https://github.com/capkuro/DetectingHatespeechLabse

Acknowledgment

This work was supported in part by Basal Project AFB 1800082 and in part by Project DGIIP-UTFSM PI-LIR-2020-17. Héctor Allende-Cid's work is supported by PUCV VRIEA.

Author information

Corresponding author

Correspondence to Sebastián E. Rodríguez.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rodríguez, S.E., Allende-Cid, H., Allende, H. (2021). Detecting Hate Speech in Cross-Lingual and Multi-lingual Settings Using Language Agnostic Representations. In: Tavares, J.M.R.S., Papa, J.P., González Hidalgo, M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2021. Lecture Notes in Computer Science, vol 12702. Springer, Cham. https://doi.org/10.1007/978-3-030-93420-0_8

  • DOI: https://doi.org/10.1007/978-3-030-93420-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93419-4

  • Online ISBN: 978-3-030-93420-0

  • eBook Packages: Computer Science, Computer Science (R0)
