Text Processing of Telugu–English Code Mixed Languages

Padmaja, S.; Bandu, Sasidhar; Fatima, S. Sameen

doi:10.1007/978-3-030-24322-7_19

Conference paper
First Online: 13 July 2019

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 3)

Abstract

Code-mixed data on social media has increased, producing an enormous volume of noisy and inadequate multilingual content. Automated processing of noisy social media text is an ongoing research area. This work focuses on extracting sentiment from movie-related, code-mixed Telugu–English bilingual data written in Roman script. A raw dataset of 11,250 tweets was extracted using the Twitter API. The data was first cleaned, and the annotated data was then used for sentiment extraction through two approaches: lexicon-based and machine-learning-based. In the lexicon-based approach, the language of each word was identified so that words could be back-transliterated and their sentiment extracted. In the machine-learning-based approach, sentiment classification was performed with a support vector machine (SVM) classifier using unigram, bigram, and skip-gram features. The machine-learning approach with skip-gram features achieved an accuracy of 76.33%, outperforming the lexicon-based approach, which achieved 66.82%.
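The unigram, bigram, and skip-gram feature sets named in the abstract can be illustrated with a short Python sketch. It assumes whitespace tokenization of already-cleaned, lowercased text and takes "skip-gram" to mean k-skip-bigrams (ordered word pairs with at most k intervening tokens). The abstract specifies neither the tokenizer nor the skip distance, so `k` and the helper names `ngrams`, `skip_bigrams`, and `extract_features` are hypothetical, not the authors' implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams: n=1 gives unigrams, n=2 gives bigrams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, k):
    """Hypothetical k-skip-bigrams: ordered pairs (w_i, w_j) with i < j
    and at most k tokens skipped between them. k=0 gives plain bigrams."""
    pairs = []
    for i in range(len(tokens)):
        # j runs from i+1 up to i+k+1 inclusive, capped at the sentence end
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs


def extract_features(text, k=2):
    """Build a sparse bag of prefixed unigram, bigram and skip-bigram counts.
    Assumes the text is already cleaned; splits on whitespace only."""
    tokens = text.lower().split()
    feats = Counter()
    feats.update("uni=" + t for (t,) in ngrams(tokens, 1))
    feats.update("bi=" + " ".join(g) for g in ngrams(tokens, 2))
    feats.update("skip=" + " ".join(p) for p in skip_bigrams(tokens, k))
    return feats


if __name__ == "__main__":
    # Invented romanized Telugu-English sample, for illustration only
    print(extract_features("Movie chala bagundi", k=1))
```

With `k=1`, the invented input "Movie chala bagundi" yields the non-contiguous pair "movie bagundi" as a skip-bigram in addition to the two contiguous bigrams, which is how skip-grams can capture sentiment cues separated by an intensifier. The resulting sparse dictionaries could then be vectorized (for instance with scikit-learn's `DictVectorizer`) and passed to a linear SVM; that toolchain is an illustrative assumption, not the setup reported in the paper.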

Author information

Authors and Affiliations

Keshav Memorial Institute of Technology, Hyderabad, India
S. Padmaja
Prince Sattam Bin Abdul Aziz University, Al-Kharj, Saudi Arabia
Sasidhar Bandu
Osmania University, Hyderabad, India
S. Sameen Fatima

Corresponding authors

Correspondence to S. Padmaja, Sasidhar Bandu or S. Sameen Fatima.

Editor information

Editors and Affiliations

School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT) Deemed to be University, Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of CSE, CMR Technical Campus, Hyderabad, Telangana, India
K. Srujan Raju
Department of CSE, University College of Engineering, Osmania University, Hyderabad, Telangana, India
K. Shyamala
Department of ECE, University College of Engineering, Osmania University, Hyderabad, Telangana, India
D. Rama Krishna
Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, Krasnoyarsk, Russia
Margarita N. Favorskaya

About this paper

Cite this paper

Padmaja, S., Bandu, S., Fatima, S.S. (2020). Text Processing of Telugu–English Code Mixed Languages. In: Satapathy, S.C., Raju, K.S., Shyamala, K., Krishna, D.R., Favorskaya, M.N. (eds) Advances in Decision Sciences, Image Processing, Security and Computer Vision. ICETE 2019. Learning and Analytics in Intelligent Systems, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-030-24322-7_19

DOI: https://doi.org/10.1007/978-3-030-24322-7_19
Published: 13 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24321-0
Online ISBN: 978-3-030-24322-7
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)