Sentiment-Preserving Reduction for Social Media Analysis

  • Sergio Hernández
  • Philip Sallis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7042)


In this paper, we address the problem of opinion analysis with a probabilistic approach that models the underlying structure of the different opinions or sentiments expressed about a given object. In our approach, the content of an opinion is partitioned according to whether it is directly relevant to a latent topic or sentiment. Opinions are then expressed as a mixture of sentiment-related parameters, and the remaining noise is regarded as data stream errors or spam. We propose an entropy-based approach that uses a value-weighted matrix for word relevance matching; the same matrix is also used to compute document scores. By using a bootstrap technique with sampling proportions given by the word scores, we show that a lower-dimensional matrix can be obtained. The resulting noise-reduced data is regarded as a sentiment-preserving reduction layer, in which terms of direct relevance to the initial parameter values are stored.
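The abstract does not specify how the word scores or the bootstrap are computed, so the following Python sketch only illustrates the general idea under stated assumptions. It substitutes the standard log-entropy global weight (one minus the normalised entropy of a word's distribution over documents) for the authors' value-weighted matrix, and it resamples the vocabulary with replacement using probabilities proportional to those scores. The function names `entropy_word_scores`, `bootstrap_vocabulary`, and `reduce_documents`, and the toy documents, are hypothetical and do not come from the paper.

```python
import math
import random


def entropy_word_scores(docs):
    """Score each word by 1 - (normalised entropy over documents).

    `docs` is a list of dicts mapping word -> count.
    ASSUMPTION: this is the standard log-entropy global weight,
    used here as a stand-in for the paper's value-weighted matrix.
    A word spread evenly over all documents scores close to 0;
    a word concentrated in a single document scores 1.
    """
    n_docs = len(docs)
    totals = {}
    for doc in docs:
        for word, count in doc.items():
            totals[word] = totals.get(word, 0) + count

    scores = {}
    for word, total in totals.items():
        entropy = 0.0
        for doc in docs:
            count = doc.get(word, 0)
            if count > 0:
                p = count / total
                entropy -= p * math.log(p)
        normalised = entropy / math.log(n_docs) if n_docs > 1 else 0.0
        # Clamp to [0, 1] to guard against floating-point rounding.
        scores[word] = min(1.0, max(0.0, 1.0 - normalised))
    return scores


def bootstrap_vocabulary(scores, n_draws, seed=0):
    """Draw words with replacement, with probability proportional to score.

    Returns the set of distinct words drawn, i.e. the reduced vocabulary.
    ASSUMPTION: the paper's bootstrap may differ in what is resampled.
    """
    words = sorted(scores)
    weights = [scores[w] for w in words]
    if sum(weights) <= 0:
        return set()
    rng = random.Random(seed)
    return set(rng.choices(words, weights=weights, k=n_draws))


def reduce_documents(docs, kept):
    """Keep only the columns (words) that survived the resampling."""
    return [{w: c for w, c in doc.items() if w in kept} for doc in docs]


if __name__ == "__main__":
    # Hypothetical toy corpus: three short "opinions" as word counts.
    toy_docs = [
        {"great": 3, "the": 1},
        {"awful": 2, "the": 1},
        {"click": 1, "the": 1},
    ]
    word_scores = entropy_word_scores(toy_docs)
    kept_words = bootstrap_vocabulary(word_scores, n_draws=10, seed=1)
    print(sorted(kept_words))
    print(reduce_documents(toy_docs, kept_words))
```

In this sketch, a word such as "the" that occurs equally in every document receives a score near zero and is therefore very unlikely to be drawn, which is one simple way a score-weighted bootstrap can shrink the term-document matrix. Note that the plain log-entropy weight also favours rare words, so a real pipeline would presumably need the sentiment-related relevance information the abstract mentions to separate spam from signal.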


Keywords: Opinion Mining; Topic Modeling; Latent Dirichlet Allocation; Sentiment Analysis; Latent Dirichlet Allocation Model



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Sergio Hernández (1)
  • Philip Sallis (2)
  1. Laboratorio de Procesamiento de Información Geoespacial, Universidad Católica del Maule, Talca, Chile
  2. Geoinformatics Research Centre, Auckland University of Technology, Auckland, New Zealand
