Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter

  • Conference paper
Machine Learning, Optimization, and Big Data (MOD 2017)

Abstract

Within the field of sentiment analysis and emotion detection applied to tweets, one of the main problems in building an automatic classifier is the lack of suitable training sets. Considering the tediousness of manually annotating a training set, and the noise present in data collected directly from the social web, in this paper we propose an iterative learning approach that combines distant supervision with dataset pruning techniques. In particular, following the “eat your own dog food” idea, we have applied a classifier, trained on raw data obtained from different Twitter channels, to the same original dataset, in order to remove the most dubious instances automatically. This approach has been used to obtain a more polished training set for emotion classification, based on Parrott’s model of six basic emotions. On the basis of the achieved results, we argue that the automatic filtering of training sets can make the distant supervision approach more effective in many use cases.
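The self-filtering loop described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, the confidence threshold, or the number of iterations, so the sketch uses a small hand-written multinomial Naive Bayes, a hypothetical threshold of 0.6, at most two rounds, and a toy dataset invented for the example. The function names `train_nb`, `predict_proba`, and `prune` are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_proba(model, text):
    """Return {label: posterior probability}, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in text.lower().split():
            if tok in vocab:
                score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    exp = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    z = sum(exp.values())
    return {lab: v / z for lab, v in exp.items()}


def prune(samples, threshold=0.6, rounds=2):
    """Retrain on the current set and drop samples whose own (distant)
    label receives a posterior below `threshold` (assumed value)."""
    kept = list(samples)
    for _ in range(rounds):
        model = train_nb(kept)
        filtered = [(text, label) for text, label in kept
                    if predict_proba(model, text)[label] >= threshold]
        if len(filtered) == len(kept):
            break  # nothing removed: the set is stable
        kept = filtered
    return kept


# Toy "raw" data: labels stand in for the channel a tweet was collected from.
raw = [
    ("happy great love", "joy"),
    ("love happy day", "joy"),
    ("great happy smile", "joy"),
    ("sad cry alone", "joy"),        # noisy distant label
    ("sad cry tears", "sadness"),
    ("alone sad cry", "sadness"),
    ("tears sad lonely", "sadness"),
]

polished = prune(raw)
print(len(raw), "->", len(polished))
```

The classifier is trained on the raw set and then applied to that same set; an instance survives only if the model agrees with its distant label with enough confidence. In practice the threshold and the stopping rule would have to be tuned on held-out, manually annotated data.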

Author information

Corresponding author

Correspondence to Michele Tomaiuolo.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Cagnoni, S. et al. (2018). Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R. (eds) Machine Learning, Optimization, and Big Data. MOD 2017. Lecture Notes in Computer Science, vol 10710. Springer, Cham. https://doi.org/10.1007/978-3-319-72926-8_13

  • DOI: https://doi.org/10.1007/978-3-319-72926-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72925-1

  • Online ISBN: 978-3-319-72926-8

  • eBook Packages: Computer Science, Computer Science (R0)
