Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter

  • Stefano Cagnoni
  • Paolo Fornacciari
  • Juxhino Kavaja
  • Monica Mordonini
  • Agostino Poggi
  • Alex Solimeo
  • Michele Tomaiuolo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10710)


Within the field of sentiment analysis and emotion detection applied to tweets, one of the main problems in building an automatic classifier is the lack of suitable training sets. Considering the tediousness of manually annotating a training set, and the noise present in data collected directly from the social web, in this paper we propose an iterative learning approach that combines distant supervision with a dataset pruning technique. In particular, following the “eat your own dog food” idea, we applied a classifier, trained on raw data obtained from different Twitter channels, to the same original dataset, in order to remove the most dubious instances automatically. This approach has been used to obtain a more polished training set for emotion classification, based on Parrott’s model of six basic emotions. On the basis of the achieved results, we argue that the automatic filtering of training sets can make the distant supervision approach more effective in many use cases.
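
The pruning loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the classifier nor the filtering criterion, so the Naive Bayes model, the `threshold` and `rounds` parameters, the function names and the toy tweets are all assumptions made for illustration. Each tweet carries the label of the channel it was collected from (distant supervision); a classifier trained on that raw data is applied back to the same data, and an instance is kept only if the classifier confirms its label with enough confidence.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> token counts
    label_counts = Counter()            # label -> number of instances
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_proba(model, text):
    """Return {label: posterior probability}, using Laplace smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in text.lower().split():
            score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    norm = sum(exp.values())
    return {lab: v / norm for lab, v in exp.items()}


def prune(samples, threshold=0.6, rounds=1):
    """Keep only instances whose distant label the classifier confirms.

    threshold and rounds are hypothetical knobs, not values from the paper.
    """
    kept = list(samples)
    for _ in range(rounds):
        model = train_nb(kept)  # train on the current (noisy) set
        filtered = []
        for text, label in kept:
            probs = predict_proba(model, text)  # apply to the same data
            best = max(probs, key=probs.get)
            if best == label and probs[best] >= threshold:
                filtered.append((text, label))
        if len(filtered) == len(kept):  # nothing removed: stop early
            break
        kept = filtered
    return kept


# Toy data: the label stands for the channel a tweet was collected from.
raw = [
    ("so happy and excited today", "joy"),
    ("happy happy great day", "joy"),
    ("great news so excited", "joy"),
    ("feeling sad and lonely", "sadness"),
    ("sad lonely crying tonight", "sadness"),
    ("crying so sad today", "sadness"),
    ("happy great excited day", "sadness"),  # noisy instance
]
clean = prune(raw)
```

On this toy set, `clean` retains the first six tweets and drops the last one, whose vocabulary contradicts its channel label. A real pipeline would need proper tweet tokenization, all six emotion classes and a filtering criterion tuned on held-out data.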


Social media · Emotion detection · Distant supervision · Machine learning



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Stefano Cagnoni (1)
  • Paolo Fornacciari (1)
  • Juxhino Kavaja (1)
  • Monica Mordonini (1)
  • Agostino Poggi (1)
  • Alex Solimeo (1)
  • Michele Tomaiuolo (1)
  1. Dipartimento di Ingegneria e Architettura, Università di Parma, Parma, Italy
