Feature Selection for Twitter Sentiment Analysis: An Experimental Study

Mansour, Riham; Hady, Mohamed Farouk Abdel; Hosam, Eman; Amr, Hani; Ashour, Ahmed

doi:10.1007/978-3-319-18117-2_7

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)

Abstract

Feature selection is an important problem for any pattern classification task. In this paper, we develop an ensemble of two Maximum Entropy classifiers for Twitter sentiment analysis: one for subjectivity classification and the other for polarity classification. Our ensemble employs surface-form, semantic and sentiment features. The classification complexity of this ensemble of linear models is linear with respect to the number of features. Our goal is to select a compact feature subset from the exhaustive list of extracted features in order to reduce the computational complexity without sacrificing classification accuracy. We evaluate performance on two benchmark datasets, CrowdScale and SemEval. In subjectivity classification, our selected 20K features achieve results very similar to those of the NRC state-of-the-art system, which uses 4 million features and ranked first in the 2013 SemEval competition. In ensemble classification, our selected features also show a relative performance gain over the baseline of uni-gram and bi-gram features of 9.9% on CrowdScale and 11.9% on SemEval.
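
The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the two classifiers are combined as a cascade (subjectivity first, then polarity), which the abstract does not state. It uses toy unigram weights that are purely hypothetical. It also substitutes a simple top-k-by-weight-magnitude pruning for the paper's feature selection method, which the abstract does not describe. All function and variable names are illustrative.

```python
import math


def maxent_prob(weights, bias, features):
    """Binary MaxEnt (logistic) probability for a sparse feature dict.

    The loop visits each active feature once, so scoring cost grows
    linearly with the number of features.
    """
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def classify(features, subj_model, pol_model, threshold=0.5):
    """Assumed cascade: subjectivity first, polarity only if subjective."""
    subj_weights, subj_bias = subj_model
    if maxent_prob(subj_weights, subj_bias, features) < threshold:
        return "neutral"
    pol_weights, pol_bias = pol_model
    p_positive = maxent_prob(pol_weights, pol_bias, features)
    return "positive" if p_positive >= threshold else "negative"


def keep_top_k(weights, k):
    """Stand-in feature selection: keep the k largest-magnitude weights."""
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:k])


def tweet_features(text):
    """Unigram presence features (surface-form features only)."""
    return {token: 1.0 for token in text.lower().split()}


# Hypothetical toy models as (weights, bias); not values from the paper.
SUBJ_MODEL = ({"love": 2.0, "hate": 2.0, "the": 0.01}, -1.0)
POL_MODEL = ({"love": 3.0, "hate": -3.0, "the": 0.02}, 0.0)

if __name__ == "__main__":
    for tweet in ("I love the phone", "I hate the phone", "the phone"):
        print(tweet, "->", classify(tweet_features(tweet), SUBJ_MODEL, POL_MODEL))
```

In this toy setting, pruning each model to its two strongest weights with `keep_top_k` leaves the three predictions unchanged, which mirrors the abstract's goal of shrinking the feature set without losing accuracy.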

Author information

Authors and Affiliations

Microsoft Research, Advanced Technology Lab, Cairo, Egypt
Riham Mansour, Eman Hosam, Hani Amr & Ahmed Ashour
Microsoft, Redmond, WA, USA
Mohamed Farouk Abdel Hady

Corresponding author

Correspondence to Riham Mansour.

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Mansour, R., Hady, M.F.A., Hosam, E., Amr, H., Ashour, A. (2015). Feature Selection for Twitter Sentiment Analysis: An Experimental Study. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_7

DOI: https://doi.org/10.1007/978-3-319-18117-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science, Computer Science (R0)
