Towards a Robust Metric of Polarity

  • Kamal Nigam
  • Matthew Hurst
Part of The Information Retrieval Series book series (INRE, volume 20)


This chapter describes an automated system for detecting polar expressions about a specified topic. The approach has two elementary components: a shallow NLP system that extracts polar language and a machine-learning-based topic classifier. The components are combined through a simple but accurate collocation assumption: if a topical sentence contains polar language, the polarity is associated with the topic. We evaluate the system, its components, and the assumption on a corpus of online consumer messages.
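The collocation assumption can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny word lists stand in for the shallow NLP polar language extractor, the keyword match stands in for the learned topic classifier, and the names `polarity`, `is_topical`, and `topical_polarity` are hypothetical rather than taken from the chapter.

```python
import re

# Hypothetical toy lexicon; a stand-in for the shallow NLP polar language extractor.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "poor"}


def tokens(sentence):
    """Lowercase a sentence and split it into a set of word tokens."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def polarity(sentence):
    """Return '+', '-', or None depending on which lexicon words appear."""
    words = tokens(sentence)
    has_pos = bool(words & POSITIVE)
    has_neg = bool(words & NEGATIVE)
    if has_pos and not has_neg:
        return "+"
    if has_neg and not has_pos:
        return "-"
    return None  # no polar language, or mixed signals


def is_topical(sentence, topic_terms):
    """Hypothetical keyword match; a stand-in for the learned topic classifier."""
    return bool(tokens(sentence) & topic_terms)


def topical_polarity(sentences, topic_terms):
    """Collocation assumption: polar language in a topical sentence
    is taken to be about the topic."""
    results = []
    for sentence in sentences:
        label = polarity(sentence)
        if label is not None and is_topical(sentence, topic_terms):
            results.append((sentence, label))
    return results


messages = [
    "I love the battery on this camera.",
    "The weather was terrible yesterday.",
    "The camera arrived on Tuesday.",
]
found = topical_polarity(messages, {"camera"})
```

Only the first message is both topical and polar, so it is the only one attributed to the topic. The second is polar but off topic, and the third is topical but neutral.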

Based on these components, we discuss how to measure the overall sentiment about a particular topic as expressed in online messages written by many different people. We propose using the fundamentals of Bayesian statistics to form an aggregate authorial opinion metric. This metric would propagate the uncertainties introduced by the polarity and topic modules, so that opinion can be compared across multiple topics in a statistically valid way.
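As a rough illustration of what a Bayesian aggregate could look like, the sketch below places a Beta posterior over the fraction of positive expressions for a topic and compares two topics by sampling. This is an assumed, simplified model and not the chapter's metric: it uses a uniform prior, treats every detected expression as correct, and therefore does not model the errors of the polarity and topic modules. The function names and the example counts are hypothetical.

```python
import random


def posterior(pos, neg, prior=(1.0, 1.0)):
    """Beta posterior parameters for the positive fraction,
    given counts of positive and negative expressions (uniform prior by default)."""
    return prior[0] + pos, prior[1] + neg


def posterior_mean(pos, neg):
    """Posterior mean of the positive fraction."""
    a, b = posterior(pos, neg)
    return a / (a + b)


def prob_more_positive(counts_a, counts_b, draws=10000, seed=0):
    """Monte Carlo estimate of P(topic A has a higher positive fraction than topic B)."""
    rng = random.Random(seed)
    a1, b1 = posterior(*counts_a)
    a2, b2 = posterior(*counts_b)
    wins = sum(
        rng.betavariate(a1, b1) > rng.betavariate(a2, b2) for _ in range(draws)
    )
    return wins / draws


# Hypothetical counts of (positive, negative) expressions for two topics.
topic_a = (80, 20)
topic_b = (50, 50)
mean_a = posterior_mean(*topic_a)
p_a_over_b = prob_more_positive(topic_a, topic_b)
```

Working with a full posterior rather than a raw ratio means a topic with few observed expressions gets a wide distribution, which keeps comparisons between sparsely and heavily discussed topics from being overconfident.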


Keywords: natural language processing, text classification, sentiment analysis, text mining, metrics




Copyright information

© Springer 2006

Authors and Affiliations

  • Kamal Nigam (1)
  • Matthew Hurst (1)
  1. Intelliseek Applied Research Center, Pittsburgh, USA
