Semi-supervised Sentiment Annotation of Large Corpora

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11122)

Abstract

Large annotated corpora are valuable for many Natural Language Processing tasks, such as Sentiment Analysis. However, careful manual annotation is costly and becomes prohibitive when the corpus is very large. This paper presents CasSUL, a framework based on semi-supervised learning for extending sentiment-annotated corpora with unlabeled data. The framework was used to extend TTsBR, a corpus of 15,000 tweets in Brazilian Portuguese manually annotated with three polarity classes, eightfold. The extended corpus was then used to train several polarity classifiers, and the results show that some combinations of classifier and features preserve the annotation quality of the original corpus in the resulting corpus.
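
The abstract does not describe the internals of CasSUL, so the following is only a minimal sketch of generic self-training, one common semi-supervised way to extend a labeled corpus with unlabeled text. It is not the authors' method. The function names (`train_nb`, `predict_proba`, `self_train`), the toy Naive Bayes classifier, the 0.8 confidence threshold, and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Fit a toy multinomial Naive Bayes model over whitespace tokens."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return a dict mapping each class to its posterior probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing.
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Move confidently classified texts from the pool into the labeled set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb([t for t, _ in labeled], [y for _, y in labeled])
        accepted, remaining = [], []
        for text in pool:
            probs = predict_proba(model, text)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                accepted.append((text, label))
            else:
                remaining.append(text)
        if not accepted:  # nothing confident enough: stop early
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


# Hypothetical toy data standing in for a manually annotated seed corpus
# and a pool of unlabeled tweets.
seed = [
    ("adorei o filme", "positive"),
    ("odiei o filme", "negative"),
    ("o filme estreia hoje", "neutral"),
]
unlabeled = ["adorei", "odiei tudo", "estreia hoje", "xyz"]

extended, leftover = self_train(seed, unlabeled)
print(len(extended), "labeled texts;", len(leftover), "still unlabeled")
```

The threshold trades corpus growth against label noise: a higher value admits fewer automatically labeled texts, but each one is less likely to degrade the annotation quality of the seed corpus.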

Acknowledgement

We acknowledge financial support from CNPq and CAPES during the experiment that originated this research paper.

Author information

Corresponding author

Correspondence to Henrico Bertini Brum.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Brum, H.B., Nunes, M.d.G.V. (2018). Semi-supervised Sentiment Annotation of Large Corpora. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_39

  • DOI: https://doi.org/10.1007/978-3-319-99722-3_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99721-6

  • Online ISBN: 978-3-319-99722-3

  • eBook Packages: Computer Science, Computer Science (R0)
