Unsupervised Learning Chinese Sentiment Lexicon from Massive Microblog Data

  • Conference paper
Advanced Data Mining and Applications (ADMA 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7713)

Abstract

Analyzing people's feelings and emotions in social media has become a major concern for both academic researchers and commercial companies. The sentiment lexicon plays a crucial role in most sentiment analysis applications. However, existing thesaurus-based lexicon building methods suffer from coverage problems when faced with the new words and new meanings that appear in social media. Nowadays, millions of users share their opinions on different aspects of life every day in microblogs. In this paper, a novel method based on occurrence probability with emoticons is presented to learn candidate sentiment words from massive microblog data, and the accuracy of the learned lexicon is further improved by using the whole microblog space as the corpus. Extensive experiments were conducted on real-world datasets covering different topics. The results show that the proposed method is able to extract emerging words, and that the learned lexicon outperforms two well-known Chinese lexicons in classifying the sentiments in microblogs.
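The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of scoring words by how often they occur alongside positive versus negative emoticons. The emoticon sets, the function name `polarity_scores`, the add-one smoothing, and the log-ratio score are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical emoticon seed sets; the paper's actual inventory is not stated here.
POSITIVE_EMOTICONS = {":)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":'("}


def polarity_scores(posts, smoothing=1.0):
    """Score each word by the log-ratio of its smoothed occurrence
    probability in positive-emoticon posts versus negative-emoticon posts.

    posts: iterable of (tokens, emoticons) pairs, both lists of strings.
    Returns a dict mapping word -> score (> 0 leans positive, < 0 negative).
    """
    pos_counts, neg_counts = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, emoticons in posts:
        has_pos = any(e in POSITIVE_EMOTICONS for e in emoticons)
        has_neg = any(e in NEGATIVE_EMOTICONS for e in emoticons)
        if has_pos == has_neg:
            continue  # no emoticon signal, or a mixed one: skip the post
        words = set(tokens)  # count each word at most once per post
        if has_pos:
            n_pos += 1
            pos_counts.update(words)
        else:
            n_neg += 1
            neg_counts.update(words)

    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        p_pos = (pos_counts[word] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_counts[word] + smoothing) / (n_neg + 2 * smoothing)
        scores[word] = math.log(p_pos / p_neg)
    return scores


# Toy, made-up posts purely for illustration.
demo_posts = [
    (["great", "movie"], [":)"]),
    (["great", "day"], [":D"]),
    (["awful", "movie"], [":("]),
    (["hello"], []),
]
demo_scores = polarity_scores(demo_posts)
```

Words whose score lies far from zero would be the candidate sentiment words in this sketch. A real pipeline would also need Chinese word segmentation and a frequency threshold, neither of which is shown here.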

Project supported by the State Key Development Program for Basic Research of China (Grant No. 2011CB302200-G), State Key Program of National Natural Science of China (Grant No. 61033007), National Natural Science Foundation of China (Grant No. 61100026, 60973019), and the Fundamental Research Funds for the Central Universities (N100704001).

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Feng, S., Wang, L., Xu, W., Wang, D., Yu, G. (2012). Unsupervised Learning Chinese Sentiment Lexicon from Massive Microblog Data. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_3

  • DOI: https://doi.org/10.1007/978-3-642-35527-1_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35526-4

  • Online ISBN: 978-3-642-35527-1

  • eBook Packages: Computer Science, Computer Science (R0)
