Abstract
Within the field of sentiment analysis and emotion detection applied to tweets, one of the main problems in building an automatic classifier is the lack of suitable training sets. Considering the tediousness of manually annotating a training set, and the noise present in data collected directly from the social web, in this paper we propose an iterative learning approach that combines distant supervision with a dataset pruning technique. In particular, following the “eat your own dogfood” idea, we have applied a classifier, trained on raw data obtained from different Twitter channels, to the same original dataset, in order to remove the most dubious instances automatically. This approach has been used to obtain a more polished training set for emotion classification, based on Parrott’s model of six basic emotions. On the basis of the achieved results, we argue that the automatic filtering of training sets can make the application of the distant supervision approach more effective in many use cases.
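The train-then-self-filter loop described above can be illustrated with a minimal sketch. The abstract does not state which classifier or pruning criterion is used, so everything below is an assumption made for illustration: a small multinomial Naive Bayes model with add-one smoothing, a confidence threshold of 0.6 on the distantly assigned label, a cap of three rounds, and the hypothetical helper names `train_nb`, `predict_proba` and `prune_training_set`. The toy tweets are invented, not taken from the paper's data.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Fit a multinomial Naive Bayes model on (tokens, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_proba(model, tokens):
    """Return {label: posterior probability} using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp.values())
    return {label: v / norm for label, v in exp.items()}


def prune_training_set(samples, threshold=0.6, max_rounds=3):
    """Train on the data, re-classify the same data, drop dubious instances.

    An instance is treated as dubious (an assumption of this sketch) when
    the model gives its distant label a probability below `threshold`.
    """
    current = list(samples)
    for _ in range(max_rounds):
        model = train_nb(current)
        kept = [(tokens, label) for tokens, label in current
                if predict_proba(model, tokens)[label] >= threshold]
        if not kept or len(kept) == len(current):
            break  # nothing left to keep, or nothing more to remove
        current = kept
    return current


# Toy data: labels come from the (hypothetical) channel a tweet was taken from.
raw = [
    ("so happy today love it".split(), "joy"),
    ("love this happy day".split(), "joy"),
    ("happy happy love".split(), "joy"),
    ("so sad and lonely".split(), "sadness"),
    ("sad lonely night crying".split(), "sadness"),
    ("crying sad lonely".split(), "sadness"),
    ("sad lonely crying".split(), "joy"),  # noisy distant label
]
cleaned = prune_training_set(raw)
print(len(raw), len(cleaned))  # 7 6
```

In this toy run the only instance removed is the last one, whose channel-derived label disagrees with what the rest of the data suggests. A real pipeline would need a stronger classifier, tweet-specific preprocessing, and a threshold tuned on held-out data.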
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Cagnoni, S. et al. (2018). Automatic Creation of a Large and Polished Training Set for Sentiment Analysis on Twitter. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R. (eds) Machine Learning, Optimization, and Big Data. MOD 2017. Lecture Notes in Computer Science, vol 10710. Springer, Cham. https://doi.org/10.1007/978-3-319-72926-8_13
DOI: https://doi.org/10.1007/978-3-319-72926-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72925-1
Online ISBN: 978-3-319-72926-8
eBook Packages: Computer Science (R0)