Arabic Offensive Language Classification on Twitter

  • Hamdy Mubarak
  • Kareem Darwish
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11864)

Abstract

Social media users often employ offensive language in their communication. Detecting offensive language on Twitter has many applications, ranging from detecting or predicting conflict to measuring polarization. In this paper, we focus on building an effective offensive tweet detector. We show that a training set can be built rapidly using a seed list of offensive words. Using this automatically created dataset, we trained a character n-gram based deep learning classifier that classifies tweets with an F1 score of 90%. We also show that the offensive word list can be expanded by contrasting offensive and non-offensive tweets.
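The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the whitespace tokenization, the n-gram range of 2 to 4, the smoothed log-ratio score, and the `min_count` threshold are all assumptions made for illustration, and the classifier itself is omitted. The sketch only shows how seed-word matching could produce weak labels, how character n-grams could be extracted as classifier features, and how contrasting the two classes could surface candidate words for the list.

```python
import math
from collections import Counter


def label_with_seeds(tweets, seed_words):
    """Weakly label each tweet: 1 if it contains any seed word, else 0.

    Whitespace tokenization is an assumption of this sketch.
    """
    seeds = set(seed_words)
    return [(t, int(any(tok in seeds for tok in t.split()))) for t in tweets]


def char_ngrams(text, n_min=2, n_max=4):
    """Return the character n-grams of a space-padded text.

    The range n_min..n_max is a hypothetical choice, not taken from the paper.
    """
    padded = f" {text} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def contrast_terms(labeled, seed_words, min_count=2):
    """Rank non-seed words by how strongly they lean toward offensive tweets.

    Uses a smoothed log ratio of per-class document frequencies; this is one
    possible contrast measure, chosen here as an assumption.
    """
    seeds = set(seed_words)
    off, non = Counter(), Counter()
    for text, label in labeled:
        # Count each word once per tweet (document frequency).
        (off if label else non).update(set(text.split()))
    n_off = sum(1 for _, label in labeled if label)
    n_non = len(labeled) - n_off
    scores = {}
    for word, count in off.items():
        if word in seeds or count < min_count:
            continue
        p_off = (count + 1) / (n_off + 2)
        p_non = (non[word] + 1) / (n_non + 2)
        scores[word] = math.log(p_off / p_non)
    return sorted(scores, key=scores.get, reverse=True)
```

In such a setup, the labeled tweets and their character n-grams would feed a text classifier, and the top-ranked words from `contrast_terms` would be reviewed manually before being added to the seed list.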

Keywords

Offensive language, Obscenities, Text classification

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Qatar Computing Research Institute, HBKU, Doha, Qatar