Blog Data Mining: The Predictive Power of Sentiments

  • Yang Liu
  • Xiaohui Yu
  • Xiangji Huang
  • Aijun An

In this chapter, we study the problem of mining sentiment information from online resources and investigate how such information can be used to predict product sales performance. In particular, we conduct an empirical study on using sentiment information mined from blogs to predict movie box office performance. We propose Sentiment PLSA (S-PLSA), in which a blog entry is viewed as a document generated by a number of hidden sentiment factors. Training an S-PLSA model on the blog data yields a succinct summary of the sentiment information embedded in the blogs. We then present ARSA, an autoregressive sentiment-aware model that uses the sentiment information captured by S-PLSA to predict product sales performance. Extensive experiments on a movie data set confirm the effectiveness and superiority of the proposed approach.
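The idea of an autoregressive, sentiment-aware predictor can be illustrated with a minimal sketch. It is not the chapter's exact ARSA formulation, which the abstract does not spell out. It assumes that daily sales are regressed by ordinary least squares on their own past values and on lagged sentiment-factor vectors (such as those an S-PLSA model might produce), and that accuracy is scored with mean absolute percentage error. The function names, the lag orders `p` and `q`, and the use of `numpy.linalg.lstsq` are all illustrative assumptions.

```python
import numpy as np

def build_design(sales, sentiment, p, q):
    """Build rows [1, y_{t-1..t-p}, s_{t-1..t-q}] and targets y_t.

    sales:     length-T sequence of (possibly preprocessed) sales figures.
    sentiment: array of shape (T, K), one K-dimensional sentiment summary per day.
    p, q:      hypothetical lag orders for sales and sentiment (both >= 1).
    """
    if p < 1 or q < 1:
        raise ValueError("p and q must be at least 1")
    sales = np.asarray(sales, dtype=float)
    sentiment = np.asarray(sentiment, dtype=float)
    rows, targets = [], []
    for t in range(max(p, q), len(sales)):
        lagged_sales = sales[t - p:t][::-1]              # y_{t-1}, ..., y_{t-p}
        lagged_sent = sentiment[t - q:t][::-1].ravel()   # s_{t-1}, ..., s_{t-q}
        rows.append(np.concatenate(([1.0], lagged_sales, lagged_sent)))
        targets.append(sales[t])
    return np.array(rows), np.array(targets)

def fit_arsa_sketch(sales, sentiment, p=1, q=1):
    """Least-squares fit of the assumed autoregressive, sentiment-aware model."""
    X, y = build_design(sales, sentiment, p, q)
    # If each sentiment row sums to 1, those columns are collinear with the
    # intercept; lstsq then returns the minimum-norm least-squares solution.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(sales, sentiment, coef, p=1, q=1):
    """One-step-ahead prediction from the most recent p sales and q sentiment rows."""
    if p < 1 or q < 1:
        raise ValueError("p and q must be at least 1")
    sales = np.asarray(sales, dtype=float)
    sentiment = np.asarray(sentiment, dtype=float)
    x = np.concatenate(([1.0], sales[-p:][::-1], sentiment[-q:][::-1].ravel()))
    return float(x @ coef)

def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)))
```

In this sketch, `fit_arsa_sketch(sales, sentiment, p, q)` returns one intercept, `p` sales coefficients, and `q * K` sentiment coefficients, in that order. Dropping the sentiment columns would reduce it to a plain autoregressive baseline, which is one way to check whether sentiment adds predictive value.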


Keywords: Mean Absolute Percentage Error, Sentiment Analysis, Sales Performance, Probabilistic Latent Semantic Analysis, Probabilistic Latent Semantic Analysis Model





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Computer Science and Engineering, York University, Toronto, Canada
  2. School of Information Technology, York University, Toronto, Canada
