Abstract
Hate speech can be defined as language used to demean people within a specific group. Hate speech often contains explicitly profane words; however, the presence of these words does not always mean that a text instance is hateful. In some cases, text instances with profane words are merely offensive language that does not target any specific group, and so cannot be classified as hate speech. In this work, we build on existing studies to find a better demarcation between hate speech and offensive language. Our main contribution is to introduce typed dependencies as new features in our feature set. These features enable us to consider the relationship between words that are far apart in a text instance, thereby providing more identifying information than single-word features. We evaluate our approach using a dataset with three classes: hate, offensive and neither. Compared with existing studies, our feature set is much smaller, but we achieve better accuracy and show comparable results in further analysis. Our detailed analysis also shows instances missed by the lexical features that were correctly predicted by the proposed feature set.
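To make the idea of typed dependency features concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes each typed dependency is available as a relation(head, dependent) triple and turns every triple into one string feature. The `Dependency` type, the `dependency_features` helper and the example parse are hypothetical: the parse was written by hand for a neutral sentence, whereas a real pipeline would obtain the triples from a dependency parser.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Dependency(NamedTuple):
    """One typed dependency, read as relation(head, dependent)."""
    relation: str
    head: str
    dependent: str


def dependency_features(deps: Iterable[Dependency]) -> Counter:
    """Turn typed dependencies into a bag of string features.

    Each triple becomes one feature such as "nsubj(annoying,people)", so a
    head and its dependent stay paired even when other words sit between
    them in the sentence.
    """
    return Counter(
        f"{d.relation}({d.head.lower()},{d.dependent.lower()})" for d in deps
    )


# Hand-written (assumed) parse of:
#   "The people in that queue are really annoying"
# These triples are illustrative and were not produced by a parser.
example_parse = [
    Dependency("det", "people", "The"),
    Dependency("nsubj", "annoying", "people"),
    Dependency("case", "queue", "in"),
    Dependency("det", "queue", "that"),
    Dependency("nmod", "people", "queue"),
    Dependency("cop", "annoying", "are"),
    Dependency("advmod", "annoying", "really"),
]

features = dependency_features(example_parse)
print(sorted(features))
```

In this example the feature `nsubj(annoying,people)` pairs two words that are separated by five other words in the sentence, a relationship that a single-word feature cannot express.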
Notes
- 5. The authors have added ‘*’ for public viewing. These were not part of the original tweet.
Ethics declarations
This work contains examples of hateful and offensive instances. All examples were obtained from the dataset and do not represent the principles of the authors.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Madukwe, K.J., Gao, X. (2019). The Thin Line Between Hate and Profanity. In: Liu, J., Bailey, J. (eds) AI 2019: Advances in Artificial Intelligence. AI 2019. Lecture Notes in Computer Science, vol 11919. Springer, Cham. https://doi.org/10.1007/978-3-030-35288-2_28
DOI: https://doi.org/10.1007/978-3-030-35288-2_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35287-5
Online ISBN: 978-3-030-35288-2
eBook Packages: Computer Science, Computer Science (R0)