Abstract
Hate speech may take different forms in online social media. Most investigations in the literature focus on detecting abusive language in discussions about ethnicity, religion, gender identity and sexual orientation. In this paper, we address the problem of automatic detection and categorization of misogynous language in online social media. The main contribution of this paper is two-fold: (1) a corpus of misogynous tweets, labelled from different perspectives, and (2) an exploratory investigation of NLP features and ML models for detecting and classifying misogynistic language.
Notes
- 1.
- 2. The dataset has been made available for the IberEval-2018 (https://amiibereval2018.wordpress.com/) and the EvalIta-2018 (https://amievalita2018.wordpress.com/) challenges.
- 3.
- 4. We employed the machine learning package scikit-learn: http://scikit-learn.org/stable/supervised_learning.html.
- 5. When training the considered classifiers, we didn’t apply any feature filtering or parameter tuning.
- 6. Results obtained with All Features are statistically significant (Student t-test with p-value equal to 0.05).
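The significance check mentioned in the last note can be illustrated with a minimal sketch. The notes do not say whether the t-test was paired, how many folds or runs were compared, or which scores were used, so everything below is an assumption: a paired Student t-test over ten per-fold accuracies, with invented placeholder numbers that are not results from the paper. The helper name paired_t_statistic is hypothetical.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return the paired Student t statistic and its degrees of freedom."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems need one score per fold")
    # Work on the per-fold differences between the two systems.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Placeholder per-fold accuracies, invented for illustration only
# (ASSUMPTION: ten folds; these are NOT the paper's results).
all_features = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.82, 0.81, 0.79, 0.83]
single_group = [0.76, 0.79, 0.77, 0.78, 0.80, 0.77, 0.78, 0.79, 0.76, 0.80]

t_stat, dof = paired_t_statistic(all_features, single_group)

# Two-tailed critical value of Student's t for alpha = 0.05 and 9 degrees of freedom.
T_CRITICAL_DF9 = 2.262
significant = dof == 9 and abs(t_stat) > T_CRITICAL_DF9
print(f"t = {t_stat:.3f}, dof = {dof}, significant at 0.05: {significant}")
```

Comparing the statistic against a tabulated critical value keeps the sketch within the standard library. With a different number of folds the critical value changes, and an unpaired test would need a different statistic altogether.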
Acknowledgements
The work of the third author was partially funded by the Spanish MINECO under the research project SomEMBED (TIN2015-71147-C2-1-P).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Anzovino, M., Fersini, E., Rosso, P. (2018). Automatic Identification and Classification of Misogynistic Language on Twitter. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_6
DOI: https://doi.org/10.1007/978-3-319-91947-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science (R0)