Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10440)

Abstract

Sentiment analysis is one of the most active research areas in natural language processing and an extensively studied problem in data mining, web mining, and text mining for the English language. With the proliferation of social media, the volume of data in regional languages is growing rapidly alongside English. Telugu is one such regional language with abundant data available on social media, but labeled training sets are hard to find because human annotation is time-consuming and costly. To address this issue, this paper investigates the practicality of active learning for Telugu sentiment analysis. We built a hybrid approach that combines different query selection strategy frameworks to obtain more accurate training instances from limited labeled data. Using classifiers such as SVM, XGBoost, and Gradient Boosted Trees (GBT), we achieved promising results with a low error rate.
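The abstract does not spell out how the query selection strategies are combined; the title points to a multi-armed bandit. The following is a minimal Python sketch under stated assumptions, not the authors' method: each bandit arm is one standard uncertainty-sampling strategy (least confidence, smallest margin, maximum entropy), the arm is chosen with UCB1, and the caller supplies the reward (for example, the change in validation accuracy after the queried instance is labeled). The names `least_confidence`, `smallest_margin`, `max_entropy`, and `UCB1StrategySelector` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One row per unlabeled instance: a classifier's predicted class probabilities
# (e.g., negative/neutral/positive). Assumes at least two classes per row.
Probs = Sequence[float]
Strategy = Callable[[Sequence[Probs]], int]


def least_confidence(pool: Sequence[Probs]) -> int:
    """Index of the instance whose most likely label has the lowest probability."""
    return min(range(len(pool)), key=lambda i: max(pool[i]))


def smallest_margin(pool: Sequence[Probs]) -> int:
    """Index of the instance with the smallest gap between its top two labels."""
    def margin(p: Probs) -> float:
        top = sorted(p, reverse=True)
        return top[0] - top[1]
    return min(range(len(pool)), key=lambda i: margin(pool[i]))


def max_entropy(pool: Sequence[Probs]) -> int:
    """Index of the instance whose predicted distribution has the highest entropy."""
    def entropy(p: Probs) -> float:
        return -sum(x * math.log(x) for x in p if x > 0)
    return max(range(len(pool)), key=lambda i: entropy(pool[i]))


class UCB1StrategySelector:
    """Hypothetical UCB1 bandit whose arms are query selection strategies."""

    def __init__(self, strategies: List[Strategy]) -> None:
        self.strategies = strategies
        self.counts = [0] * len(strategies)    # times each arm was played
        self.values = [0.0] * len(strategies)  # running mean reward of each arm

    def select_arm(self) -> int:
        # Play every arm once before trusting the confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)

        # UCB1 score: mean reward plus an exploration bonus that shrinks with plays.
        def score(a: int) -> float:
            return self.values[a] + math.sqrt(2.0 * math.log(total) / self.counts[a])

        return max(range(len(self.strategies)), key=score)

    def update(self, arm: int, reward: float) -> None:
        # Incremental update of the chosen arm's mean reward.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    def query(self, pool: Sequence[Probs]) -> Tuple[int, int]:
        """Pick a strategy, then return (arm index, pool index to send for labeling)."""
        arm = self.select_arm()
        return arm, self.strategies[arm](pool)


if __name__ == "__main__":
    pool = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
    selector = UCB1StrategySelector([least_confidence, smallest_margin, max_entropy])
    arm, idx = selector.query(pool)
    print(arm, idx)  # first call tries arm 0 (least confidence), which picks instance 1
    # Hypothetical reward, e.g., the change in validation accuracy after labeling.
    selector.update(arm, reward=0.02)
```

In a full active learning loop, a classifier such as an SVM or XGBoost model would be retrained after each newly labeled instance, and its predicted probabilities on the remaining pool would be passed to `query`. A bandit is one way to let the data decide which strategy is most useful at each stage instead of fixing a single strategy up front.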

Author information

Correspondence to Sandeep Sricharan Mukku or Subba Reddy Oota.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Mukku, S.S., Oota, S.R., Mamidi, R. (2017). Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_26

  • DOI: https://doi.org/10.1007/978-3-319-64283-3_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64282-6

  • Online ISBN: 978-3-319-64283-3

  • eBook Packages: Computer Science, Computer Science (R0)