Service Oriented Computing and Applications

Volume 13, Issue 2, pp 155–167

Improving generalization ability of instance transfer-based imbalanced sentiment classification of turn-level interactive Chinese texts

  • Feng Tian
  • Fan Wu (corresponding author)
  • Xiang Fei
  • Nazaraf Shah
  • Qinghua Zheng
  • Yuanyuan Wang
Original Research Paper


Generally, a classification model with better generalization ability performs better on future incoming data rather than on the historical dataset. Improving the generalization ability of multi-domain, imbalanced multi-class emotion classification of turn-level interactive Chinese texts is challenging because the feature space is high-dimensional and its feature values are sparse. Moreover, differences in feature spaces and data distributions across the domains of the target dataset (T) and the source dataset (S) make multi-class, multi-domain instance transfer difficult. To address these challenges, we propose a data-level sampling approach for multi-class and multi-domain instance transfer, inspired by transfer learning. To verify the validity of the proposed method, an imbalanced dataset is taken as the target dataset, and three datasets serve as source datasets: one collected from the Bulletin Board System of Xi’an Jiaotong University and two collected from the Chinese microblog platform Weibo. The experimental results show that the proposed approach outperforms classic algorithms by effectively alleviating the imbalance problem in interactive texts. Moreover, a classification model trained on the immigrated datasets produced by the proposed method achieves the best generalization ability.
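As an illustration only, the general idea of data-level instance transfer for an imbalanced target dataset can be sketched as below. This is not the authors' algorithm, which the abstract does not specify: the cosine-similarity ranking, the class-centroid criterion, the rule of topping up every class to the majority size, and the names `cosine`, `centroid`, and `immigrate` are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def immigrate(target, source):
    """Hypothetical instance-immigration step (an assumption, not the paper's method).

    `target` and `source` are lists of (sparse_vector, label) pairs.
    Each class in `target` that is smaller than the majority class is
    topped up with the same-class source instances that are most similar
    to that class's centroid in the target dataset.
    """
    counts = Counter(label for _, label in target)
    majority = max(counts.values())
    balanced = list(target)
    for label, n in counts.items():
        need = majority - n
        if need == 0:
            continue
        center = centroid([vec for vec, lab in target if lab == label])
        candidates = [vec for vec, lab in source if lab == label]
        # Rank source instances by similarity to the target class centroid.
        candidates.sort(key=lambda vec: cosine(vec, center), reverse=True)
        balanced.extend((vec, label) for vec in candidates[:need])
    return balanced
```

In this sketch, ranking by similarity to the target class centroid is one simple way to limit the effect of differing data distributions between S and T; any other selection rule could be substituted without changing the surrounding structure.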


Keywords: Imbalanced sentiment classification; Multi-class; Multi-domain; Interactive Chinese texts; Instance immigration-based sampling; Generalization ability



This work is supported by the National Key Research and Development Program of China (2018YFB1004500), the National Natural Science Foundation of China (61877048, 61472315), the Innovative Research Group of the National Natural Science Foundation of China (61721002), the Innovation Research Team of the Ministry of Education (IRT_17R86), the Project of China Knowledge Center for Engineering Science and Technology, and the Project of the Chinese Academy of Engineering “The Online and Offline Mixed Educational Service System for ‘The Belt and Road’ Training in MOOC China.”

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

11761_2019_264_MOESM1_ESM.docx (21 kb)
Supplementary material 1 (DOCX 20 kb)



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Systems Engineering Institute, Xi’an Jiaotong University, Xi’an, China
  2. Faculty of Engineering and Computing, Coventry University, Coventry, UK
  3. Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
