Text Processing of Telugu–English Code Mixed Languages

  • Conference paper

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 3)

Abstract

Code-mixed data on social media has increased, producing a large volume of noisy and inadequate multilingual content. Automated processing of noisy social media text is an active research area. This work focuses on extracting sentiment from movie-related, code-mixed Telugu–English bilingual data written in Roman script. A raw dataset of 11,250 tweets was extracted using the Twitter API. The data was first cleaned, and the annotated data was then used for sentiment extraction through two approaches: lexicon based and machine learning based. In the lexicon-based approach, the language of each word was identified so that the word could be back-transliterated and its sentiment extracted. In the machine-learning-based approach, sentiment classification was performed with a support vector machine classifier using uni-gram, bi-gram and skip-gram features. The machine learning approach with skip-gram features achieved an accuracy of 76.33%, outperforming the lexicon-based approach, which reached 66.82%.
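To make the three feature types named in the abstract concrete, the following Python sketch extracts uni-grams, bi-grams and skip-grams from a tokenized tweet. It is an illustration only: the function names, the whitespace tokenization, the skip distance `k`, and the romanized example sentence are assumptions and are not taken from the paper, which may define skip-grams differently.

```python
def ngrams(tokens, n):
    """Return contiguous n-grams as tuples (n=1 gives uni-grams, n=2 bi-grams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, k):
    """Return ordered word pairs with at most k tokens skipped between them.

    With k=0 this reduces to ordinary bi-grams. This is one common
    definition of skip-grams; the paper's exact definition is not known.
    """
    pairs = []
    for i, left in enumerate(tokens):
        # Partner positions run from i+1 up to i+k+1, clipped to the sentence end.
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


# Hypothetical romanized Telugu-English tweet, invented for illustration.
tokens = "movie chala bagundi".lower().split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
skipgrams = skip_bigrams(tokens, k=1)

print(unigrams)
print(bigrams)
print(skipgrams)
```

Counts of such features could then be passed to a support vector machine classifier; that step is omitted here because the abstract gives no details of the classifier configuration.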

Author information

Correspondence to S. Padmaja, Sasidhar Bandu or S. Sameen Fatima.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Padmaja, S., Bandu, S., Fatima, S.S. (2020). Text Processing of Telugu–English Code Mixed Languages. In: Satapathy, S.C., Raju, K.S., Shyamala, K., Krishna, D.R., Favorskaya, M.N. (eds) Advances in Decision Sciences, Image Processing, Security and Computer Vision. ICETE 2019. Learning and Analytics in Intelligent Systems, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-030-24322-7_19
