Abstract
The number of social media users is ever-increasing. Unfortunately, this growth has been accompanied by a massive rise in uncensored online hate against vulnerable communities such as immigrants, LGBT people, and women. Current work on the automatic detection of various forms of hate speech (HS) typically employs supervised learning, which requires manually annotated data. The highly polarizing nature of the topics involved raises concerns about the quality of the annotations these systems rely on, because not all annotators are equally sensitive to different kinds of hate speech. We propose an approach that leverages the fine-grained knowledge expressed by individual annotators before their subjectivity is averaged out by the gold-standard creation process, helping us refine the quality of training sets for hate speech detection. We introduce a measure of polarization at the level of single instances in the data, and use it to manipulate the training set and reduce the impact of the most polarizing texts on the learning process.
We test our approach on three datasets, in English and Italian, annotated by experts and by workers hired on a crowdsourcing platform. We classify instances of sexist, racist, and homophobic hate speech in tweets and show how our approach improves the prediction performance of a supervised classifier. Moreover, the proposed polarization measure supports the manual exploration of individual tweets in our datasets.
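The idea of scoring each training instance by how strongly its annotators disagree, then filtering out the most polarizing instances, can be sketched as follows. This is an illustrative sketch only, not the authors' actual P-index definition (which is given in the full paper): here the hypothetical `polarization` score simply measures how evenly a pool of annotators splits on a binary hate-speech label, and the threshold value is a free parameter (the paper derives its thresholds from observed P-index values in the data).

```python
def polarization(labels):
    """Score in [0, 1]: 0 when annotators are unanimous, 1 on an even split.

    `labels` is a list of binary (0/1) annotations for one instance.
    """
    n = len(labels)
    share = sum(labels) / n
    # 2 * min(share, 1 - share): 0 for unanimity, 1 for a 50/50 split
    return 2 * min(share, 1 - share)

def filter_training_set(instances, threshold):
    """Keep only instances whose polarization score is below the threshold.

    `instances` is a list of (text, annotations) pairs; names and the
    threshold are illustrative, not taken from the paper.
    """
    return [(text, annos) for text, annos in instances
            if polarization(annos) < threshold]

data = [
    ("tweet A", [1, 1, 1, 1, 1]),  # unanimous: score 0.0
    ("tweet B", [1, 1, 0, 0, 0]),  # 2-3 split among 5 annotators: score 0.8
    ("tweet C", [1, 0, 1, 0, 1]),  # 3-2 split: score 0.8
]
kept = filter_training_set(data, threshold=0.8)
print([text for text, _ in kept])  # only the unanimous instance survives
```

Note the uneven group sizes with five annotators (cf. note 5): a binary vote can never split evenly, so with this toy score the maximum observable value for five annotators is 0.8 rather than 1.0.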
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
This difference is due to having five annotators in total, which results in uneven group sizes.
- 6.
The threshold values come from the observation of actual P-index values in the data.
- 7.
In Italian, the English word gender is used as a borrowing only to refer to modern gender theories.
Acknowledgments
The work of Valerio Basile and Viviana Patti is partially funded by Progetto di Ateneo/CSP 2016 (S1618_L2_BOSC_01, Immigrants, Hate and Prejudice in Social Media).
© 2019 Springer Nature Switzerland AG
Cite this paper
Akhtar, S., Basile, V., Patti, V. (2019). A New Measure of Polarization in the Annotation of Hate Speech. In: Alviano, M., Greco, G., Scarcello, F. (eds) AI*IA 2019 – Advances in Artificial Intelligence. AI*IA 2019. Lecture Notes in Computer Science(), vol 11946. Springer, Cham. https://doi.org/10.1007/978-3-030-35166-3_41
Print ISBN: 978-3-030-35165-6
Online ISBN: 978-3-030-35166-3