A New Measure of Polarization in the Annotation of Hate Speech

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11946)

Abstract

The number of social media users is ever-increasing. Unfortunately, this growth has been accompanied by a massive rise in uncensored online hate against vulnerable communities such as immigrants, LGBT people, and women. Current work on the automatic detection of various forms of hate speech (HS) typically employs supervised learning, which requires manually annotated data. The highly polarizing nature of the topics involved raises concerns about the quality of the annotations these systems rely on, because not all annotators are equally sensitive to different kinds of hate speech. We propose an approach that leverages the fine-grained knowledge expressed by individual annotators before their subjectivity is averaged out by the gold standard creation process, which helps us refine the quality of training sets for hate speech detection. We introduce a measure of polarization at the level of single instances in the data and use it to manipulate the training set and reduce the impact of the most polarizing texts on the learning process.

We test our approach on three datasets, in English and Italian, annotated by experts and by workers hired on a crowdsourcing platform. We classify instances of sexist, racist, and homophobic hate speech in tweets and show that our approach improves the prediction performance of a supervised classifier. Moreover, the proposed polarization measure supports the manual exploration of individual tweets in our datasets.

Notes

  1. https://github.com/ZeerakW/hatespeech.

  2. https://www.figure-eight.com/.

  3. http://accept.arcigay.it/.

  4. https://www.arcigay.it/en/.

  5. This difference is due to having five annotators in total, therefore uneven group sizes.

  6. The threshold values come from the observation of actual P-index values in the data.

  7. In Italian, the English word gender is used as a borrowing only to refer to the modern gender theories.

Acknowledgments

The work of Valerio Basile and Viviana Patti is partially funded by Progetto di Ateneo/CSP 2016 (S1618_L2_BOSC_01, Immigrants, Hate and Prejudice in Social Media).

Author information

Corresponding author

Correspondence to Valerio Basile.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Akhtar, S., Basile, V., Patti, V. (2019). A New Measure of Polarization in the Annotation of Hate Speech. In: Alviano, M., Greco, G., Scarcello, F. (eds) AI*IA 2019 – Advances in Artificial Intelligence. AI*IA 2019. Lecture Notes in Computer Science, vol 11946. Springer, Cham. https://doi.org/10.1007/978-3-030-35166-3_41

  • DOI: https://doi.org/10.1007/978-3-030-35166-3_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35165-6

  • Online ISBN: 978-3-030-35166-3

  • eBook Packages: Computer Science, Computer Science (R0)
