
Improving the Sentiment Analysis Process of Spanish Tweets with BM25

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9612)

Abstract

The enormous growth of user-generated content on social networks has created a need for new algorithms and methods to classify it. Sentiment Analysis (SA) methods attempt to identify the polarity of a text using, among other resources, ranking algorithms. One of the most popular is the Okapi BM25 ranking function, designed to rank documents according to their relevance to a topic. In this paper, we present an approach to sentiment analysis of Spanish tweets that combines the BM25 ranking function with a supervised linear Support Vector Machine model. We describe the procedure implemented to adapt BM25 to the peculiarities of SA on Twitter. The results confirm the potential of the BM25 algorithm to improve sentiment analysis tasks.
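The abstract names Okapi BM25 but does not spell out how it scores text. The following is a minimal sketch of the standard Okapi BM25 score, assuming the common defaults k1 = 1.2 and b = 0.75 and a non-negative IDF variant (the "+1" form used by, for example, Lucene). None of these values are reported in the paper. The toy usage is a hypothetical illustration only: it treats all training tokens of each polarity class as one "document" and scores a tweet against each class, so the resulting scores could serve as extra features for a linear classifier. It is not presented as the authors' adaptation procedure.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Return the Okapi BM25 score of every tokenized document in `docs` for `query_tokens`."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n_docs
    if avgdl == 0:
        # Every document is empty, so no query term can match.
        return [0.0] * n_docs

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation: documents longer than average are penalised (controlled by b).
        length_norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for term in set(query_tokens):
            freq = tf[term]
            if freq == 0:
                continue
            # Non-negative IDF variant (log(... + 1)), as used by e.g. Lucene.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Saturating term-frequency component (controlled by k1).
            score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical toy data: each "document" concatenates the training tokens of one polarity class.
class_docs = {
    "P": "me encanta genial feliz genial".split(),
    "N": "odio horrible triste malo".split(),
    "NEU": "hoy es lunes normal".split(),
}
labels = list(class_docs)
tweet = "que dia tan genial".split()
features = dict(zip(labels, bm25_scores(tweet, [class_docs[label] for label in labels])))
print(features)  # only "P" shares a term ("genial") with the tweet, so only it scores above 0
```

Under this assumed setup, the per-class scores in `features` would be appended to each tweet's other features before training the linear Support Vector model. The paper's own adaptation of BM25 to Twitter may differ.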




Author information

Corresponding author

Correspondence to Juan Sixto.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Sixto, J., Almeida, A., López-de-Ipiña, D. (2016). Improving the Sentiment Analysis Process of Spanish Tweets with BM25. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_26


  • DOI: https://doi.org/10.1007/978-3-319-41754-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41753-0

  • Online ISBN: 978-3-319-41754-7

  • eBook Packages: Computer Science, Computer Science (R0)
