Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter

Cagnoni, Stefano; Fornacciari, Paolo; Kavaja, Juxhino; Mordonini, Monica; Poggi, Agostino; Solimeo, Alex; Tomaiuolo, Michele

doi:10.1007/978-3-319-72926-8_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10710)

Included in the following conference series:

International Workshop on Machine Learning, Optimization, and Big Data

Abstract

Within the field of sentiment analysis and emotion detection applied to tweets, one of the main problems in building an automatic classifier is the lack of suitable training sets. Considering the tediousness of manually annotating a training set, and the noise present in data collected directly from the social web, in this paper we propose an iterative learning approach that combines distant supervision with a dataset pruning technique. In particular, following the “eat your own dog food” idea, we have applied a classifier, trained on raw data obtained from different Twitter channels, to the same original dataset, in order to remove the most dubious instances automatically. This approach has been used to obtain a more polished training set for emotion classification, based on Parrott’s model of six basic emotions. On the basis of the achieved results, we argue that the automatic filtering of training sets can make the application of the distant supervision approach more effective in many use cases.
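
The pruning idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the classifier, the filtering criterion, or the number of iterations, so the Naive Bayes model, the `threshold` parameter, the function names (`train_nb`, `predict_proba`, `prune`), and the toy data are all assumptions made for the example. The sketch trains a classifier on noisily labelled text, applies it back to the same data, and drops the instances whose assigned label the classifier finds least plausible.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict_proba(model, text):
    """Return {label: posterior probability}, using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in text.lower().split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    # Normalise in log space to avoid underflow on longer texts.
    top = max(log_scores.values())
    exp = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    z = sum(exp.values())
    return {lab: v / z for lab, v in exp.items()}


def prune(samples, threshold=0.5, rounds=3):
    """Repeatedly retrain on the kept data and drop doubtful instances.

    An instance is kept only if the model assigns its own (noisy) label
    a probability of at least `threshold` (an assumed criterion).
    """
    kept = list(samples)
    for _ in range(rounds):
        model = train_nb(kept)
        filtered = [(t, y) for t, y in kept
                    if predict_proba(model, t).get(y, 0.0) >= threshold]
        if not filtered or len(filtered) == len(kept):
            break  # nothing left to remove, or everything would be removed
        kept = filtered
    return kept


# Toy data standing in for channel-labelled tweets; the last label is noise.
samples = [
    ("happy happy great", "joy"),
    ("great happy smile", "joy"),
    ("happy smile great", "joy"),
    ("sad cry awful", "sadness"),
    ("awful sad cry", "sadness"),
    ("cry sad awful", "sadness"),
    ("sad cry awful", "joy"),
]
cleaned = prune(samples)
print(len(samples), "->", len(cleaned))
```

In this sketch the threshold controls the trade-off between removing noise and discarding hard but correctly labelled examples; a real setup would tune it on held-out, manually annotated data.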

Author information

Authors and Affiliations

Dipartimento di Ingegneria e Architettura, Università di Parma, Parco Area delle Scienze 181/A, 43124, Parma, Italy
Stefano Cagnoni, Paolo Fornacciari, Juxhino Kavaja, Monica Mordonini, Agostino Poggi, Alex Solimeo & Michele Tomaiuolo

Corresponding author

Correspondence to Michele Tomaiuolo.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy
Giuseppe Nicosia
University of Florida, Gainesville, FL, USA
Panos Pardalos
University of Catania, Catania, Italy
Giovanni Giuffrida
Harvard University, Cambridge, MA, USA
Renato Umeton

About this paper

Cite this paper

Cagnoni, S. et al. (2018). Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R. (eds) Machine Learning, Optimization, and Big Data. MOD 2017. Lecture Notes in Computer Science, vol 10710. Springer, Cham. https://doi.org/10.1007/978-3-319-72926-8_13

DOI: https://doi.org/10.1007/978-3-319-72926-8_13
Published: 21 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72925-1
Online ISBN: 978-3-319-72926-8
eBook Packages: Computer Science, Computer Science (R0)
