Active Learning on Sentiment Classification by Selecting Both Words and Documents

Ju, Shengfeng; Li, Shoushan

doi:10.1007/978-3-642-36337-5_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7717)

Included in the following conference series:

Workshop on Chinese Lexical Semantics

Abstract

Sentiment analysis has become a hot research topic in natural language processing (NLP) because it is highly valuable for many real applications. One basic task in sentiment analysis is sentiment classification, which aims to predict the sentiment orientation (positive or negative) of a document. Current approaches to this problem are mainly based on supervised machine learning. The main drawback of such approaches is their need for large amounts of labeled data, so reducing the annotation cost has become an important issue in sentiment classification. In this study, we propose a novel active learning approach that selects both "informative" words and "informative" documents for annotation. Experimental results show that our approach clearly outperforms both random selection and uncertainty sampling on documents.
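
For reference, the two document-level baselines named in the abstract, random selection and uncertainty sampling, can be sketched as follows. This is a minimal illustration and not the authors' method: the function names, the pool of classifier probabilities, and the choice of "closest to 0.5" as the uncertainty criterion for a binary classifier are all assumptions made for the example.

```python
import random


def uncertainty_sample(probabilities, k):
    """Return indices of the k documents whose P(positive) is closest to 0.5.

    For a binary (positive/negative) classifier, a posterior near 0.5 means
    the model is least certain, so those documents are queried first.
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


def random_sample(n_documents, k, seed=0):
    """Return k distinct document indices chosen uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(n_documents), k)


# Hypothetical classifier outputs P(positive | document) for an unlabeled pool.
pool_probabilities = [0.95, 0.52, 0.10, 0.47, 0.80, 0.30]

# Documents 1 and 3 have posteriors nearest 0.5, so they are selected.
selected = uncertainty_sample(pool_probabilities, k=2)
print(selected)  # [1, 3]
```

In an active learning loop, the selected documents would be sent to an annotator, added to the labeled set, and the classifier retrained before the next round of selection.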

Author information

Authors and Affiliations

Natural Language Processing Lab, Soochow University, 1 Shizi Street, Suzhou, Jiangsu, China, 215006
Shengfeng Ju & Shoushan Li

Editor information

Editors and Affiliations

Computer School, Wuhan University, 430072, Wuhan, China
Donghong Ji
College of Chinese Language and Literature, Wuhan University, 430072, Wuhan, China
Guozheng Xiao

About this paper

Cite this paper

Ju, S., Li, S. (2013). Active Learning on Sentiment Classification by Selecting Both Words and Documents. In: Ji, D., Xiao, G. (eds) Chinese Lexical Semantics. CLSW 2012. Lecture Notes in Computer Science, vol 7717. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36337-5_6

DOI: https://doi.org/10.1007/978-3-642-36337-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36336-8
Online ISBN: 978-3-642-36337-5
eBook Packages: Computer Science, Computer Science (R0)
