Abstract
Social media users often use offensive language in their communication. Detecting offensive language on Twitter has many applications, ranging from detecting or predicting conflict to measuring polarization. In this paper, we focus on building effective detection of Arabic offensive tweets. We show that a training set can be built rapidly using a seed list of offensive words. Using this automatically created dataset, we trained a deep learning classifier based on character n-grams that classifies tweets with an F1 score of 90%. We also show that the offensive word list can be expanded by contrasting offensive and non-offensive tweets.
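The abstract outlines three steps: labeling tweets automatically with a seed list, classifying them with character n-gram features, and expanding the word list by contrasting offensive and non-offensive tweets. The following is a minimal sketch of those ideas under stated assumptions, not the authors' implementation. The seed word and tweets are hypothetical toy data. A tweet is assumed offensive if it contains any seed word. The contrast score is a Laplace-smoothed log ratio chosen for illustration, not necessarily the paper's measure. The deep learning classifier is not reproduced here; the character n-grams stand in for its input features (for example, in a fastText-style model).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # \w is Unicode-aware for str patterns, so Arabic letters count as word characters.
    return re.findall(r"\w+", text.lower())


def weak_label(tweets: list[str], seed_words: set[str]) -> list[tuple[str, int]]:
    # Assumption: a tweet is labeled offensive (1) if any token is a seed word, else 0.
    return [(t, int(any(tok in seed_words for tok in tokenize(t)))) for t in tweets]


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> list[str]:
    # Character n-grams per token, with '<' and '>' marking word boundaries, so shared
    # prefixes and suffixes become features (useful for Arabic, where clitics attach to words).
    grams = []
    for tok in tokenize(text):
        padded = f"<{tok}>"
        for n in range(n_min, n_max + 1):
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def expansion_candidates(labeled, seed_words, min_count=2, top_k=10):
    # Count how many offensive / non-offensive tweets contain each word.
    off, non = Counter(), Counter()
    n_off = n_non = 0
    for text, label in labeled:
        toks = set(tokenize(text))
        if label:
            off.update(toks)
            n_off += 1
        else:
            non.update(toks)
            n_non += 1
    scored = []
    for word, count in off.items():
        if word in seed_words or count < min_count:
            continue
        # Assumed contrast measure: Laplace-smoothed log ratio of the share of offensive
        # vs. non-offensive tweets containing the word; > 0 means skewed toward offensive.
        score = math.log((count + 1) / (n_off + 2)) - math.log((non[word] + 1) / (n_non + 2))
        if score > 0:
            scored.append((word, score))
    return sorted(scored, key=lambda ws: ws[1], reverse=True)[:top_k]


# Toy usage with a hypothetical seed word "slur".
tweets = ["you slur jerk", "slur jerk again", "nice day", "nice weather"]
labeled = weak_label(tweets, {"slur"})
print([y for _, y in labeled])  # [1, 1, 0, 0]
print([w for w, _ in expansion_candidates(labeled, {"slur"})])  # ['jerk']
```

Here "jerk" is proposed as a new offensive word because it co-occurs with the seed word and never appears in tweets labeled non-offensive. On real data, candidates ranked this way would still need manual review before being added to the list.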
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Mubarak, H., Darwish, K. (2019). Arabic Offensive Language Classification on Twitter. In: Weber, I., et al. Social Informatics. SocInfo 2019. Lecture Notes in Computer Science(), vol 11864. Springer, Cham. https://doi.org/10.1007/978-3-030-34971-4_18
DOI: https://doi.org/10.1007/978-3-030-34971-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34970-7
Online ISBN: 978-3-030-34971-4
eBook Packages: Computer Science, Computer Science (R0)