Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis

Mukku, Sandeep Sricharan; Oota, Subba Reddy; Mamidi, Radhika

doi:10.1007/978-3-319-64283-3_26


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10440)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery


Abstract

Sentiment analysis is one of the most active research areas in natural language processing and an extensively studied problem in data mining, web mining and text mining, particularly for English. With the proliferation of social media, the volume of content in regional languages is growing alongside English. Telugu is one such regional language, with abundant data available on social media, but labeled training sets are hard to find because human annotation is time-consuming and costly. To address this issue, this paper investigates the practicality of active learning for Telugu sentiment analysis. We built a hybrid approach that combines different query selection strategy frameworks to obtain more accurately labeled training instances from limited labeled data. Using a set of classifiers including SVM, XGBoost, and Gradient Boosted Trees (GBT), we achieved promising results with a low error rate.
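The abstract describes a pool-based active learning loop in which several query selection strategies are combined. The sketch below shows two standard strategies, least-confidence uncertainty sampling and query by committee with vote entropy, and combines them with a UCB1 multi-armed bandit. The bandit is an assumption suggested by the title's "Multi-arm": the abstract does not say how the strategies are combined, and the names `least_confidence`, `vote_entropy`, `UCB1`, `STRATEGIES` and `choose_query` are hypothetical. Classifier training is left out; `probs` and `votes` stand in for the outputs of models such as the SVM, XGBoost and GBT classifiers mentioned above, and the labels `pos`, `neg`, `neu` are illustrative.

```python
import math
from collections import Counter


def least_confidence(probs):
    """Uncertainty sampling: pick the pool item whose most likely class
    has the lowest predicted probability. probs[i] holds the class
    probabilities the current classifier assigns to pool item i."""
    return min(range(len(probs)), key=lambda i: max(probs[i]))


def vote_entropy(votes):
    """Query by committee: pick the pool item on which the committee
    disagrees most. votes[i] lists the label each member predicts for item i."""
    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())
    return max(range(len(votes)), key=lambda i: entropy(votes[i]))


class UCB1:
    """UCB1 bandit whose arms are query strategies (an assumed way of
    combining them; the abstract does not specify the combination rule)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # times each strategy was used
        self.values = [0.0] * n_arms  # running mean reward per strategy

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda a: self.values[a]
            + math.sqrt(2 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Incremental mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical strategy table: each entry maps (probs, votes) to a pool index.
STRATEGIES = [
    lambda probs, votes: least_confidence(probs),
    lambda probs, votes: vote_entropy(votes),
]


def choose_query(bandit, probs, votes):
    """Let the bandit pick a strategy, then let that strategy pick an item."""
    arm = bandit.select()
    return arm, STRATEGIES[arm](probs, votes)


if __name__ == "__main__":
    bandit = UCB1(len(STRATEGIES))
    probs = [[0.8, 0.1, 0.1], [0.4, 0.35, 0.25], [0.1, 0.7, 0.2]]
    votes = [["pos", "pos", "pos"], ["neg", "neg", "pos"], ["pos", "neg", "neu"]]
    print(choose_query(bandit, probs, votes))  # (0, 1): least confidence picks item 1
    bandit.update(0, 0.02)  # hypothetical reward, e.g. gain in held-out accuracy
    print(choose_query(bandit, probs, votes))  # (1, 2): vote entropy picks item 2
```

After the annotator labels the chosen item and the classifier is retrained, `bandit.update(arm, reward)` feeds a reward back to the strategy that was used. Taking the change in held-out accuracy as that reward is one reasonable choice, not necessarily the paper's; UCB1 is used here because its standard form has no tuning parameter.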



Author information

Authors and Affiliations

LTRC, KCIS, IIIT Hyderabad, Hyderabad, India
Sandeep Sricharan Mukku & Radhika Mamidi
Teradata, Hyderabad, India
Subba Reddy Oota


Corresponding authors

Correspondence to Sandeep Sricharan Mukku or Subba Reddy Oota.

Editor information

Editors and Affiliations

LIAS/ISAE-ENSMA, Chasseneuil, France
Ladjel Bellatreche
University of Texas at Arlington, Arlington, Texas, USA
Sharma Chakravarthy


About this paper

Cite this paper

Mukku, S.S., Oota, S.R., Mamidi, R. (2017). Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_26


DOI: https://doi.org/10.1007/978-3-319-64283-3_26
Published: 03 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64282-6
Online ISBN: 978-3-319-64283-3
eBook Packages: Computer Science, Computer Science (R0)
