Transfer Learning for Detecting Hateful Sentiments in Code Switched Language

Rajput, Kshitij; Kapoor, Raghav; Mathur, Puneet; Hitkul; Kumaraguru, Ponnurangam; Shah, Rajiv Ratn

doi:10.1007/978-981-15-1216-2_7


Part of the book series: Algorithms for Intelligent Systems (AIS)


Abstract

With the phenomenal increase in the penetration of social media in linguistically diverse regions, online conversations have become more casual and multilingual. The rise of informal, code-switched multilingual language makes it difficult for automated systems to monitor hate speech, which is further disguised through spelling variations, code-mixing, homophones, homonyms, and the absence of formal grammar rules. Machine transliteration can convert code-switched text into a single script, but it risks a semantic breakdown of the text. To overcome this drawback, this chapter investigates transfer learning. CNN-based neural models are first trained on a large dataset of hateful tweets in a chosen primary language and then retrained on a small transliterated dataset in the same language. Transfer learning can be an effective strategy for reusing already learned features on a specialized task through cross-domain knowledge transfer. Hate speech classification on a large English corpus can therefore serve as the source task for obtaining pre-trained deep learning classifiers for the target task: classifying tweets from other code-switched languages that have been translated into English. The effects of different popular word embeddings and of additional supervised inputs, such as LIWC features, the presence of profanities, and sentiment, are studied to derive the combination of input settings that can help achieve state-of-the-art hate speech detection on code-switched multilingual short texts from Twitter.
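The two-stage recipe described above (train on a large source-language corpus, then continue training on a small transliterated target set) can be illustrated with a minimal sketch. It is not the chapter's method: it swaps the CNN for a plain logistic-regression classifier over hashed bag-of-words features so that it runs with the Python standard library alone. The profanity lexicon `PROFANITY`, the feature size `DIM`, the learning rates, the epoch counts, and the toy tweets are all hypothetical, and a single profanity flag stands in for the auxiliary inputs (LIWC, profanity, sentiment) the chapter studies.

```python
import math
import zlib
from typing import List, Optional, Tuple

# Hypothetical profanity lexicon (English + romanized Hindi); illustrative only.
PROFANITY = {"idiot", "stupid", "moron", "pagal", "bewakoof"}
DIM = 1024  # number of hashed bag-of-words buckets (an assumption)

Example = Tuple[List[float], int]


def featurize(text: str) -> List[float]:
    """Hashed bag of words plus one auxiliary 'contains profanity' input."""
    tokens = text.lower().split()
    vec = [0.0] * (DIM + 1)
    for tok in tokens:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    vec[DIM] = 1.0 if any(tok in PROFANITY for tok in tokens) else 0.0
    return vec


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability that x is hateful under weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(data: List[Example], w: Optional[List[float]] = None, b: float = 0.0,
          lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Plain SGD on log-loss; starting from a given (w, b) amounts to fine-tuning."""
    w = list(w) if w is not None else [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            grad = predict(w, b, x) - y  # derivative of log-loss w.r.t. the logit
            for i, xi in enumerate(x):
                if xi:
                    w[i] -= lr * grad * xi
            b -= lr * grad
    return w, b


# Hypothetical toy data: a larger English source set and a small target set of
# romanized (transliterated) Hindi-English tweets. Labels: 1 = hateful, 0 = not.
SOURCE = [
    ("you are an idiot and should leave", 1),
    ("what a stupid moron", 1),
    ("get lost you idiot", 1),
    ("have a great day everyone", 0),
    ("the match was fun to watch", 0),
    ("thanks for sharing this article", 0),
]
TARGET = [
    ("tum pagal ho", 1),
    ("kitna bewakoof aadmi hai", 1),
    ("aaj mausam accha hai", 0),
    ("match dekh ke maza aaya", 0),
]

src = [(featurize(t), y) for t, y in SOURCE]
tgt = [(featurize(t), y) for t, y in TARGET]

# Stage 1: pretrain on the large source-language corpus.
w_src, b_src = train(src)
# Stage 2: fine-tune on the small target set, starting from the source weights,
# with a smaller learning rate so transferred weights are adjusted rather than reset.
w_ft, b_ft = train(tgt, w=w_src, b=b_src, lr=0.1, epochs=200)

for text, _ in TARGET:
    print(f"{predict(w_ft, b_ft, featurize(text)):.2f}  {text}")
```

In this toy setup, fine-tuning starts from the source weights, so the profanity flag and any shared tokens (here `match`) carry what was learned on the English data into the target task. Using a lower learning rate in the second stage is one common way to keep those transferred weights from being overwritten too quickly.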


Author information

Authors and Affiliations

MIDAS Lab, IIIT Delhi, New Delhi, India
Kshitij Rajput, Raghav Kapoor, Puneet Mathur, Hitkul, Ponnurangam Kumaraguru & Rajiv Ratn Shah
NSIT, New Delhi, India
Kshitij Rajput & Raghav Kapoor

Corresponding author

Correspondence to Rajiv Ratn Shah.

Editor information

Editors and Affiliations

Indian Institute of Information Technology Kota (IIIT-Kota), Jaipur, Rajasthan, India
Basant Agarwal
Faculty of Science and Engineering, School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia
Richi Nayak
Department of Computer Science and Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India
Namita Mittal
Department of Computer Science and Engineering, SOA University, Bhubaneswar, Odisha, India
Srikanta Patnaik

About this chapter

Cite this chapter

Rajput, K., Kapoor, R., Mathur, P., Hitkul, Kumaraguru, P., Shah, R.R. (2020). Transfer Learning for Detecting Hateful Sentiments in Code Switched Language. In: Agarwal, B., Nayak, R., Mittal, N., Patnaik, S. (eds) Deep Learning-Based Approaches for Sentiment Analysis. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-1216-2_7
DOI: https://doi.org/10.1007/978-981-15-1216-2_7
Published: 25 January 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1215-5
Online ISBN: 978-981-15-1216-2
eBook Packages: Engineering, Engineering (R0)