The Thin Line Between Hate and Profanity

Madukwe, Kosisochukwu Judith; Gao, Xiaoying

doi:10.1007/978-3-030-35288-2_28


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11919)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence


Abstract

Hate speech can be defined as language used to demean people within a specific group. Hate speech often contains explicitly profane words; however, the presence of these words does not always mean that a text instance is hateful. In some cases, text instances with profane words are merely offensive language that does not target any specific group, and so cannot be classified as hate speech. In this work, we build on existing studies to find a better demarcation between hate speech and offensive language. Our main contribution is to introduce typed dependencies as new features in our feature set. These features let us consider the relationship between words that are far apart in a text instance, thereby providing more identifying information than single-word features. We evaluate our approach on a dataset with three classes: hate, offensive and neither. Compared with existing studies, our feature set is much smaller, yet we achieve better accuracy and show comparable results in further analysis. Our detailed analysis also shows instances missed by the lexical features that were correctly predicted by the proposed feature set.
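To make the idea of typed dependency features concrete, the following is a minimal sketch and not the authors' implementation. It assumes each text instance has already been parsed into (head, relation, dependent) triples; the triples below are hand-written for a harmless example sentence, and the function names (`typed_dependency_features`, `build_vocabulary`, `vectorize`) are hypothetical. In practice a dependency parser would supply the triples.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A typed dependency is a (head, relation, dependent) triple.
Triple = Tuple[str, str, str]

# ASSUMPTION: hand-written parse of "I really hate those noisy neighbours".
# A real pipeline would obtain these triples from a dependency parser.
example: List[Triple] = [
    ("hate", "nsubj", "I"),
    ("hate", "advmod", "really"),
    ("hate", "dobj", "neighbours"),   # links two words that are three tokens apart
    ("neighbours", "det", "those"),
    ("neighbours", "amod", "noisy"),
]

def typed_dependency_features(triples: List[Triple]) -> List[str]:
    """Render each triple as a single string feature, e.g. 'dobj(hate,neighbours)'."""
    return [f"{rel}({head},{dep})" for head, rel, dep in triples]

def build_vocabulary(docs: List[List[Triple]]) -> Dict[str, int]:
    """Assign a column index to every distinct feature seen in the corpus."""
    vocab: Dict[str, int] = {}
    for triples in docs:
        for feat in typed_dependency_features(triples):
            vocab.setdefault(feat, len(vocab))
    return vocab

def vectorize(triples: List[Triple], vocab: Dict[str, int]) -> List[int]:
    """Count-vectorize one instance; features outside the vocabulary are ignored."""
    counts = Counter(typed_dependency_features(triples))
    vec = [0] * len(vocab)
    for feat, n in counts.items():
        if feat in vocab:
            vec[vocab[feat]] = n
    return vec

if __name__ == "__main__":
    vocab = build_vocabulary([example])
    print(typed_dependency_features(example))
    print(vectorize(example, vocab))
```

The feature `dobj(hate,neighbours)` ties the verb to its target even though other words sit between them, which is the kind of relationship a single-word feature cannot express. The resulting count vectors could be passed to any standard classifier.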


Notes

1. http://alt.qcri.org/semeval2019/
2. https://competitions.codalab.org/competitions/20011
3. https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
4. https://www.facebook.com/communitystandards/hate_speech
5. The authors have added ‘*’ for public viewing. These were not part of the original tweet.
6. https://spacy.io/
7. https://spacy.io/api/annotation#dependency-parsing
8. https://github.com/t-davidson/hate-speech-and-offensive-language
9. https://hatebase.org/
10. https://www.figure-eight.com/
11. https://www.nltk.org/_modules/nltk/stem/wordnet.html


Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
Kosisochukwu Judith Madukwe & Xiaoying Gao


Corresponding author

Correspondence to Kosisochukwu Judith Madukwe.

Editor information

Editors and Affiliations

University of South Australia, Adelaide, SA, Australia
Jixue Liu
The University of Melbourne, Melbourne, VIC, Australia
James Bailey

Ethics declarations

This work contains examples of hateful and offensive instances. All examples were obtained from the dataset and do not represent the principles of the authors.


About this paper

Cite this paper

Madukwe, K.J., Gao, X. (2019). The Thin Line Between Hate and Profanity. In: Liu, J., Bailey, J. (eds) AI 2019: Advances in Artificial Intelligence. AI 2019. Lecture Notes in Computer Science, vol 11919. Springer, Cham. https://doi.org/10.1007/978-3-030-35288-2_28


DOI: https://doi.org/10.1007/978-3-030-35288-2_28
Published: 25 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35287-5
Online ISBN: 978-3-030-35288-2
eBook Packages: Computer Science, Computer Science (R0)
