Abstract
Cross-lingual sentiment classification aims to predict the sentiment orientation of a text in one language (the target language) with the help of resources from another language (the source language). However, current cross-lingual performance is usually far from satisfactory because of large differences in linguistic expression and social culture. In this paper, we propose performing active learning for cross-lingual sentiment classification, in which only a small number of samples are actively selected and manually annotated, so that reasonable performance is reached for the target language in a short time. The challenge is that there are normally far more labeled samples in the source language than in the target language. As a result, the few labeled target-language samples are swamped by the abundance of labeled source-language samples, which greatly reduces their impact on cross-lingual sentiment classification. To address this issue, we propose a data quality control approach that selects high-quality samples from the source language. Specifically, we propose two kinds of data quality measurements, intra- and extra-quality measurements, from the certainty and similarity perspectives. Empirical studies verify the appropriateness of our active learning approach to cross-lingual sentiment classification.
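The abstract does not define the two quality measurements, so the following is only a minimal sketch of how such a selection step might look, not the authors' method. It assumes that certainty is the distance of a classifier's positive-class probability from 0.5, that similarity is the cosine between a source sample and the centroid of the target-language samples, and that the two are combined by a simple product. All function names, the scoring formula, and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def certainty(p_pos):
    """Assumed certainty score: distance of P(positive) from 0.5, scaled to [0, 1]."""
    return abs(p_pos - 0.5) * 2


def select_source_samples(source_vecs, source_probs, target_vecs, k):
    """Return indices of the k source samples with the highest assumed quality.

    Quality here is certainty * similarity-to-target-centroid; this
    combination is an illustrative assumption, not taken from the paper.
    """
    target_center = centroid(target_vecs)
    scores = [
        certainty(p) * cosine(v, target_center)
        for v, p in zip(source_vecs, source_probs)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def select_queries(target_probs, k):
    """Uncertainty sampling: pick the k unlabeled target samples the
    classifier is least certain about, to be sent for manual annotation."""
    ranked = sorted(range(len(target_probs)), key=lambda i: certainty(target_probs[i]))
    return ranked[:k]


# Toy data (hypothetical): 2-dimensional feature vectors and P(positive) values.
source_vecs = [[1, 0], [0, 1], [1, 1]]
source_probs = [0.9, 0.95, 0.55]
target_vecs = [[1, 0], [2, 0]]

kept = select_source_samples(source_vecs, source_probs, target_vecs, k=2)
queries = select_queries([0.9, 0.52, 0.3, 0.49], k=2)
```

In this toy run the second source sample is confidently classified but orthogonal to the target data, so it is filtered out, while the query step picks the two target samples whose predictions lie closest to 0.5.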
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, S., Wang, R., Liu, H., Huang, CR. (2013). Active Learning for Cross-Lingual Sentiment Classification. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_22
DOI: https://doi.org/10.1007/978-3-642-41644-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41643-9
Online ISBN: 978-3-642-41644-6
eBook Packages: Computer Science (R0)