Abstract
This work examines a novel method of developing features for machine learning of sentiment analysis and related tasks. These tasks are frequently approached using a “Bag of Words” representation (one feature for each word encountered in the training data), which can easily involve thousands of features. This paper describes a set of compact features developed by learning scores for words, dividing the range of possible scores into a number of bins, and then generating features based on the distribution of a document's scored words over the bins. This allows for effective learning of sentiment and related tasks with 25 features; in fact, performance was very often slightly better with these features than with a simple bag of words baseline. This vast reduction in the number of features considerably reduces training time on large datasets and allows the use of much larger datasets than previously attempted with bag of words approaches, which improves performance.
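The pipeline described in the abstract (score words, bin the score range, describe each document by how its scored words fall across the bins) can be illustrated with a minimal Python sketch. The abstract does not say how word scores are learned or how the 25 features map to bins, so everything specific here is an assumption made for illustration: the scoring rule (normalised difference of positive and negative counts), the score range [-1, 1], the use of 25 equal-width bins, and the hypothetical names `learn_word_scores` and `bin_features`.

```python
from collections import Counter


def learn_word_scores(docs, labels, min_count=2):
    """Score each word in [-1, 1] from labelled training documents.

    Assumed rule (not taken from the paper): (pos - neg) / (pos + neg),
    where pos/neg count occurrences in positive/negative documents.
    Words seen fewer than min_count times receive no score.
    """
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (pos if label == 1 else neg).update(tokens)
    scores = {}
    for word in set(pos) | set(neg):
        total = pos[word] + neg[word]
        if total >= min_count:
            scores[word] = (pos[word] - neg[word]) / total
    return scores


def bin_features(tokens, scores, n_bins=25, lo=-1.0, hi=1.0):
    """Return n_bins features: the fraction of a document's scored words
    whose score falls into each equal-width bin of [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for tok in tokens:
        if tok in scores:
            # Clamp so that a score equal to hi lands in the last bin.
            idx = min(int((scores[tok] - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Toy training data: 1 = positive, 0 = negative.
docs = [["great", "fun", "plot"], ["great", "acting"],
        ["dull", "plot"], ["dull", "boring"]]
labels = [1, 1, 0, 0]

scores = learn_word_scores(docs, labels)
feats = bin_features(["great", "plot", "dull", "unseen"], scores)
```

Whatever the vocabulary size, each document is reduced to a fixed-length vector (25 values in this sketch), which is the source of the training-time savings the abstract reports relative to a bag of words representation.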
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gaudette, L., Japkowicz, N. (2011). Compact Features for Sentiment Analysis. In: Butz, C., Lingras, P. (eds) Advances in Artificial Intelligence. Canadian AI 2011. Lecture Notes in Computer Science, vol 6657. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21043-3_18
DOI: https://doi.org/10.1007/978-3-642-21043-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21042-6
Online ISBN: 978-3-642-21043-3
eBook Packages: Computer Science, Computer Science (R0)