Developing a supervised learning-based social media business sentiment index

  • Hyeonseo Lee
  • Nakyeong Lee
  • Harim Seo
  • Min Song


The rapid growth of digital data generation has led to the era of big data, in which social media play a central role: approximately 70% of the data collected in the world comes from social media. Investigating online social network services is therefore of paramount importance. In this paper, we use sentiment analysis, which detects the attitudes and emotions toward social issues expressed in social media posts, to understand the actual economic situation. To this end, we propose two steps. In the first step, we train sentiment classifiers on several large social media datasets using three types of feature sets: a feature vector, a sequence vector, and a combination of dictionary-based features and sequence vectors. We then assess the performance of six classifiers: MaxEnt-L1, C4.5 decision tree, SVM-kernel, Ada-boost, Naïve Bayes and MaxEnt. In the second step, we collect datasets relevant to several economic terms that the public uses to express its opinions explicitly. Finally, we use vector autoregression (VAR) analysis to test our hypothesis. The results show a statistically significant relationship between public sentiment and economic performance: “depression” and “unemployment” lead the KOSPI, and keywords extracted by the sentiment analysis, such as “price,” “year-end-tax” and “budget deficit,” cause changes in exchange rates.
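The first step, classifying posts and turning the labels into an index, can be illustrated with Naïve Bayes, one of the six classifiers listed above. The sketch below is a minimal multinomial Naïve Bayes over bag-of-words counts, which is one possible reading of the "feature vector" representation. It is followed by a daily index defined as (positive − negative) / classified posts. The tokenizer, the toy training sentences, the `pos`/`neg` labels, the helper names `tokenize` and `sentiment_index`, and the index formula are all hypothetical illustrations, not the authors' pipeline.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase ASCII words only (an illustrative simplification).
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes on bag-of-words counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class, used for priors
        self.word_counts = defaultdict(Counter)    # class -> token frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_score(self, doc, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior P(label)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest log posterior (up to a shared constant).
        return max(self.class_counts, key=lambda c: self.log_score(doc, c))


def sentiment_index(posts_by_day, model):
    # Hypothetical index: (positive - negative) / classified posts per day, in [-1, 1].
    index = {}
    for day in sorted(posts_by_day):
        labels = [model.predict(post) for post in posts_by_day[day]]
        pos, neg = labels.count("pos"), labels.count("neg")
        index[day] = (pos - neg) / (pos + neg) if pos + neg else 0.0
    return index


# Toy labelled data, invented for illustration only.
train_docs = [
    "prices are rising and the economy is strong",
    "good budget news and strong growth",
    "unemployment and depression are getting worse",
    "budget deficit and falling exports are bad news",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = MultinomialNB().fit(train_docs, train_labels)
daily = sentiment_index({"day1": ["strong growth", "good news"],
                         "day2": ["budget deficit worse"]}, model)
print(daily)  # {'day1': 1.0, 'day2': -1.0}
```

The second step, relating such a daily series to an economic indicator like the KOSPI, is not shown here. In Python, one option is the `VAR` class in `statsmodels.tsa.api`, whose fitted results expose a `test_causality` method.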


Keywords: Sentiment analysis; Social media; Machine learning; Supervised learning



This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2015S1A3A2046711).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Library and Information Science, Yonsei University, Seoul, South Korea
