Compact Features for Sentiment Analysis

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6657)

Abstract

This work examines a novel method of developing features to use for machine learning of sentiment analysis and related tasks. This task is frequently approached using a “Bag of Words” representation – one feature for each word encountered in the training data – which can easily involve thousands of features. This paper describes a set of compact features developed by learning scores for words, dividing the range of possible scores into a number of bins, and then generating features based on the distribution of scored words in the document over the bins. This allows for effective learning of sentiment and related tasks with 25 features; in fact, performance was very often slightly better with these features than with a simple bag of words baseline. This vast reduction in the number of features reduces training time considerably on large datasets, and allows for using much larger datasets than previously attempted with bag of words approaches, improving performance.
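
The feature construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the scoring rule in `learn_word_scores`, the score range [-1, 1], the equal-width bins, and all function names are assumptions made for the example. The abstract does not specify how word scores are learned or how the 25 features map onto bins.

```python
from collections import Counter


def learn_word_scores(docs, labels):
    """Score each training word in [-1, 1].

    Assumed scoring rule (not taken from the paper):
    score = (pos - neg) / (pos + neg), where pos and neg count the
    positive and negative documents that contain the word.
    """
    pos, neg = Counter(), Counter()
    for text, label in zip(docs, labels):
        target = pos if label == 1 else neg
        target.update(set(text.lower().split()))
    vocab = set(pos) | set(neg)
    return {w: (pos[w] - neg[w]) / (pos[w] + neg[w]) for w in vocab}


def binned_features(text, scores, n_bins=5, lo=-1.0, hi=1.0):
    """Return the fraction of the document's scored words in each bin."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for word in text.lower().split():
        if word not in scores:
            continue  # words unseen in training carry no score
        # Clamp so that a score equal to `hi` lands in the last bin.
        idx = min(int((scores[word] - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Toy data, invented for illustration only.
train_docs = ["good great film", "bad awful film"]
train_labels = [1, 0]
word_scores = learn_word_scores(train_docs, train_labels)
features = binned_features("good film bad", word_scores, n_bins=5)
```

The resulting fixed-length vector (here 5 values, regardless of vocabulary size) would then be passed to any standard classifier in place of a bag-of-words vector, which is where the reduction in feature count comes from.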

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gaudette, L., Japkowicz, N. (2011). Compact Features for Sentiment Analysis. In: Butz, C., Lingras, P. (eds) Advances in Artificial Intelligence. Canadian AI 2011. Lecture Notes in Computer Science, vol 6657. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21043-3_18

  • DOI: https://doi.org/10.1007/978-3-642-21043-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21042-6

  • Online ISBN: 978-3-642-21043-3

  • eBook Packages: Computer Science, Computer Science (R0)
