
Information Systems and e-Business Management, Volume 16, Issue 4, pp 721–742

FCE-SVM: a new cluster based ensemble method for opinion mining from social media

  • Gang Wang
  • Daqing Zheng
  • Shanlin Yang
  • Jian Ma
Original Article

Abstract

Opinion mining, which aims to automatically detect subjective information, has attracted growing interest from both academia and industry in recent years. To improve the performance of opinion mining, several ensemble methods have been investigated and shown to be effective both theoretically and empirically. However, cluster-based ensemble methods have received less attention in opinion mining. In this paper, a new cluster-based ensemble method, FCE-SVM, is proposed for opinion mining from social media. Following the divide-and-conquer principle, FCE-SVM first uses a fuzzy clustering module to generate different training subsets. In the second stage, base learners are trained on these different training subsets. Finally, a fusion module combines the outputs of the base learners. Multi-domain opinion datasets were used to verify the effectiveness of the proposed method. Empirical results reveal that FCE-SVM achieves the best performance by reducing bias and variance simultaneously. These results indicate that FCE-SVM is a viable method for opinion mining.
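The three stages named in the abstract (fuzzy clustering into training subsets, one base learner per subset, a fusion module) can be illustrated with a small, dependency-free sketch. It is not the authors' implementation, and several choices are assumptions not taken from the paper: standard fuzzy c-means with fuzzifier m = 2, a hypothetical membership `threshold=0.2` for building overlapping subsets, a linear SVM trained with the Pegasos sub-gradient method standing in for whatever SVM solver and kernel the authors used, and membership-weighted score averaging as the fusion rule. All function names are hypothetical.

```python
import math
import random


def memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each centre (values sum to 1)."""
    dist = [max(math.dist(x, ck), 1e-12) for ck in centers]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dl) ** p for dl in dist) for dk in dist]


def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: alternate centre and membership updates."""
    rng = random.Random(seed)
    U = []
    for _ in X:  # random initial memberships, normalised per sample
        row = [rng.random() + 1e-9 for _ in range(c)]
        U.append([u / sum(row) for u in row])
    d = len(X[0])
    centers = []
    for _ in range(iters):
        centers = []
        for k in range(c):  # centre k = u^m-weighted mean of the samples
            w = [row[k] ** m for row in U]
            tot = sum(w)
            centers.append([sum(wi * x[j] for wi, x in zip(w, X)) / tot
                            for j in range(d)])
        U = [memberships(x, centers, m) for x in X]
    return centers, U


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM via Pegasos sub-gradient steps (labels +1/-1).
    Assumption: features are centred on the subset mean, and a constant 1
    is appended so the last weight acts as a (regularised) bias."""
    d = len(X[0])
    mu = [sum(x[j] for x in X) / len(X) for j in range(d)]
    Z = [[xj - mj for xj, mj in zip(x, mu)] + [1.0] for x in X]
    w = [0.0] * (d + 1)
    rng = random.Random(seed)
    order = list(range(len(Z)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(a * b for a, b in zip(w, Z[i]))
            w = [(1.0 - eta * lam) * a for a in w]  # step on the L2 term
            if margin < 1.0:  # hinge loss is active for this sample
                w = [a + eta * y[i] * b for a, b in zip(w, Z[i])]
    return mu, w


def svm_score(model, x):
    """Signed decision value of one base learner for point x."""
    mu, w = model
    z = [xj - mj for xj, mj in zip(x, mu)] + [1.0]
    return sum(a * b for a, b in zip(w, z))


def fit_fce_svm(X, y, c=2, threshold=0.2):
    """Stage 1: fuzzy clustering -> overlapping training subsets.
    Stage 2: one SVM base learner per subset."""
    centers, U = fuzzy_c_means(X, c)
    learners = []
    for k in range(c):
        # Hypothetical rule: keep samples whose membership passes the
        # threshold, plus every sample whose strongest cluster is k.
        idx = [i for i, row in enumerate(U)
               if row[k] >= threshold or row[k] == max(row)]
        if len({y[i] for i in idx}) < 2:  # single-class subset: use all data
            idx = list(range(len(X)))
        learners.append(train_linear_svm([X[i] for i in idx],
                                         [y[i] for i in idx]))
    return centers, learners


def predict_fce_svm(model, x):
    """Stage 3 (assumed fusion rule): membership-weighted sum of scores."""
    centers, learners = model
    u = memberships(x, centers)
    score = sum(uk * svm_score(h, x) for uk, h in zip(u, learners))
    return 1 if score >= 0 else -1
```

Calling `fit_fce_svm(X, y)` on a list of feature vectors `X` with labels in {+1, -1} returns the cluster centres and base learners, and `predict_fce_svm(model, x)` labels a new vector. Reusing the fuzzy memberships both to build the subsets and to weight the fused scores keeps each base learner responsible mainly for its own region of the input space, which is one plausible reading of the divide-and-conquer idea described above.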

Keywords

Opinion mining · Ensemble learning · Cluster · SVM · Social media

Notes

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Nos. 71101042, 71471054), the National Program on Key Basic Research Project (973 Program) (No. 2013CB329603), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20110111120014), and the China Postdoctoral Science Foundation (Nos. 2011M501041, 2013T60611).

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Gang Wang (1, 2, 3)
  • Daqing Zheng (4, 5)
  • Shanlin Yang (1, 2)
  • Jian Ma (3)

  1. School of Management, Hefei University of Technology, Hefei, People’s Republic of China
  2. Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei, People’s Republic of China
  3. Department of Information Systems, City University of Hong Kong, Kowloon, Hong Kong
  4. Room 208, School of Information Management and Engineering, SUFE, Shanghai, People’s Republic of China
  5. Shanghai Key Laboratory of Financial Information Technology, SUFE, Shanghai, People’s Republic of China
