The Thin Line Between Hate and Profanity

  • Conference paper
AI 2019: Advances in Artificial Intelligence (AI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11919)

Abstract

Hate speech can be defined as language used to demean people within a specific group. Hate speech often contains explicitly profane words; however, the presence of these words does not always mean that a text instance is hateful. In some cases, text instances with profane words are merely offensive language: they do not target any specific group and so cannot be classified as hate speech. In this work, we build on existing studies to find a better demarcation between hate speech and offensive language. Our main contribution is to introduce typed dependencies as new features in our feature set. These features enable us to consider the relationship between long-distance words in a text instance, thereby providing more identifying information than single-word features. We evaluate our approach using a dataset with three classes: hate, offensive, and neither. Compared with existing studies, our feature set is much smaller, yet we achieve better accuracy and show comparable results in further analysis. Our detailed analysis also shows instances missed by the lexical features that were correctly predicted by the proposed feature set.
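As an illustration of the idea in the abstract, the following is a minimal sketch of how typed dependencies could be turned into features. It assumes a dependency parse is already available; the names `Token` and `typed_dependency_features`, the `label(governor, dependent)` string encoding, and the hand-annotated example parse are all hypothetical and are not taken from the paper, whose actual feature encoding may differ.

```python
from collections import Counter
from typing import List, NamedTuple


class Token(NamedTuple):
    """One parsed token (hypothetical structure, not the paper's)."""
    text: str  # surface form of the word
    head: int  # index of the governing token (the root points to itself)
    dep: str   # dependency label linking this token to its governor


def typed_dependency_features(tokens: List[Token]) -> Counter:
    """Count label(governor, dependent) triples for one text instance."""
    feats = Counter()
    for tok in tokens:
        if tok.dep == "ROOT":
            continue  # the root has no governor, so it yields no triple
        governor = tokens[tok.head].text.lower()
        feats[f"{tok.dep}({governor}, {tok.text.lower()})"] += 1
    return feats


# Hand-annotated parse of "they really hate those noisy people"
# (labels chosen for illustration; a real parser may label differently).
example = [
    Token("they", 2, "nsubj"),
    Token("really", 2, "advmod"),
    Token("hate", 2, "ROOT"),
    Token("those", 5, "det"),
    Token("noisy", 5, "amod"),
    Token("people", 2, "dobj"),
]

features = typed_dependency_features(example)
```

In this sketch the verb "hate" and its object "people" are three tokens apart, yet they appear together in a single feature, which is the kind of long-distance relationship that single-word features cannot capture.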

Notes

  1. http://alt.qcri.org/semeval2019/.

  2. https://competitions.codalab.org/competitions/20011.

  3. https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy.

  4. https://www.facebook.com/communitystandards/hate_speech.

  5. The authors have added ‘*’ for public viewing. These were not part of the original tweet.

  6. https://spacy.io/.

  7. https://spacy.io/api/annotation#dependency-parsing.

  8. https://github.com/t-davidson/hate-speech-and-offensive-language.

  9. https://hatebase.org/.

  10. https://www.figure-eight.com/.

  11. https://www.nltk.org/_modules/nltk/stem/wordnet.html.

Author information

Corresponding author

Correspondence to Kosisochukwu Judith Madukwe.

Ethics declarations

This work contains examples of hateful and offensive instances. All examples were obtained from the dataset and do not represent the principles of the authors.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Madukwe, K.J., Gao, X. (2019). The Thin Line Between Hate and Profanity. In: Liu, J., Bailey, J. (eds) AI 2019: Advances in Artificial Intelligence. AI 2019. Lecture Notes in Computer Science, vol 11919. Springer, Cham. https://doi.org/10.1007/978-3-030-35288-2_28

  • DOI: https://doi.org/10.1007/978-3-030-35288-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35287-5

  • Online ISBN: 978-3-030-35288-2

  • eBook Packages: Computer Science, Computer Science (R0)
