An Ensemble Method Based on Confidence Probability for Multi-domain Sentiment Classification
Multi-domain sentiment classification methods based on ensemble decisions are attracting increasing attention. These methods avoid collecting a large amount of new training data in the target domain and broaden the range of settings in which source-domain systems can be deployed. However, they face two important issues: the number of incorrectly pre-labeled examples remains high, and fixed weights limit the accuracy of the ensemble classifier. We therefore propose a novel method, named CEC, which integrates the ideas of self-training and co-training into multi-domain sentiment classification. Classification confidence is used to pre-label the data in the target domain. Meanwhile, CEC combines the base classifiers according to their classification confidence probabilities when voting on a prediction. Experiments show that the proposed algorithm is considerably more accurate than the baseline algorithms.
Keywords: ensemble, multi-domain sentiment classification, co-training
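
The two ideas in the abstract, confidence-based pre-labeling and confidence-weighted voting, can be illustrated with a minimal sketch. The abstract does not give CEC's exact combination rule, so this example assumes one plausible rule: sum each base classifier's class probabilities and treat the averaged winning score as the ensemble confidence. The function names `confidence_vote` and `pre_label`, and the threshold of 0.8, are hypothetical and not taken from the paper.

```python
from typing import Dict, Optional, Sequence, Tuple

# One base classifier's output for one document: label -> probability.
Probs = Dict[str, float]


def confidence_vote(per_classifier_probs: Sequence[Probs]) -> Tuple[str, float]:
    """Combine base classifiers by summing their class probabilities.

    Assumed rule (not from the paper): a more confident classifier adds more
    mass to its preferred label, so the effective weights vary per document
    instead of being fixed in advance.
    """
    totals: Dict[str, float] = {}
    for probs in per_classifier_probs:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    best = max(totals, key=lambda label: totals[label])
    # Average the winning score so the confidence stays in [0, 1].
    return best, totals[best] / len(per_classifier_probs)


def pre_label(per_classifier_probs: Sequence[Probs],
              threshold: float = 0.8) -> Optional[str]:
    """Pre-label a target-domain document only if the ensemble is confident.

    Returns None for low-confidence documents, which stay unlabeled. The
    threshold value is an illustrative assumption.
    """
    label, confidence = confidence_vote(per_classifier_probs)
    return label if confidence >= threshold else None


# Two of three classifiers lean "neg", but the single "pos" vote is far more
# confident, so the confidence-weighted vote picks "pos".
mixed = [
    {"pos": 0.90, "neg": 0.10},
    {"pos": 0.40, "neg": 0.60},
    {"pos": 0.45, "neg": 0.55},
]
agreed = [
    {"pos": 0.95, "neg": 0.05},
    {"pos": 0.85, "neg": 0.15},
]
```

With these inputs, `confidence_vote(mixed)` selects "pos" even though a simple majority vote would select "neg", and `pre_label(mixed)` leaves the document unlabeled because the averaged confidence falls below the threshold. `pre_label(agreed)` assigns "pos". Filtering by confidence in this way is one route to reducing the number of incorrectly pre-labeled target-domain examples.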