Semi-supervised Sentiment Annotation of Large Corpora

Brum, Henrico Bertini; Nunes, Maria das Graças Volpe

doi:10.1007/978-3-319-99722-3_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11122)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Large annotated corpora are valuable for many Natural Language Processing tasks, such as Sentiment Analysis. However, precise manual annotation is costly and becomes prohibitive when the corpus is too large. This paper presents CasSUL, a framework based on semi-supervised learning for extending sentiment-annotated corpora with unlabeled data. The framework was used to extend TTsBR eightfold; TTsBR is a corpus of 15,000 tweets in Brazilian Portuguese manually annotated with three polarity classes. The extended corpus was used to train several polarity classifiers, and the results show that some combinations of classifier and features preserve the annotation quality of the original corpus in the resulting corpus.
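The abstract does not spell out how CasSUL works internally, so the following is only a minimal sketch of generic self-training, one common semi-supervised scheme for growing a labeled corpus from unlabeled text. It is not the authors' implementation. The toy Naive Bayes classifier, the function names (`train`, `predict_proba`, `self_train`), the confidence threshold, and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled):
    """Fit a tiny multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labeled:
        class_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return a dict mapping each label to its posterior probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                # Add-one (Laplace) smoothing.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Move confidently classified examples from the pool into the corpus."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            probs = predict_proba(model, text)
            label, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:  # nothing new was accepted, so stop early
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


# Hypothetical seed corpus with three polarity classes.
seed = [
    ("adorei o filme", "positive"),
    ("odiei o filme", "negative"),
    ("o filme estreia hoje", "neutral"),
]
unlabeled = ["adorei adorei", "odiei odiei", "xyz"]
extended, leftover = self_train(seed, unlabeled, threshold=0.6)
```

In this sketch the threshold controls the trade-off the abstract points at: a lower value grows the corpus faster but admits more automatic labels of doubtful quality, while a higher value keeps the extension closer to the quality of the manual annotation.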

Acknowledgement

We acknowledge financial support from CNPq and CAPES during the experiments that originated this research paper.

Author information

Authors and Affiliations

Núcleo Interinstitucional de Linguística Computacional, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Paulo, Brazil
Henrico Bertini Brum & Maria das Graças Volpe Nunes

Corresponding author

Correspondence to Henrico Bertini Brum.

Editor information

Editors and Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Aline Villavicencio
Instituto de Informática - UFRGS, Porto Alegre, Brazil
Viviane Moreira
INESC-ID, Lisbon, Portugal
Alberto Abad
UFSCAR, Sao Carlos, Brazil
Helena Caseli
Centro Singular de Investigación en Tecnoloxías, Universidade de Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Pablo Gamallo
Université de Toulon, Parc Scientifique Technologique Luminy, Marseille, France
Carlos Ramisch
Centro de Informática e Sistemas, Universidade de Coimbra, Coimbra, Portugal
Hugo Gonçalo Oliveira
Federal University of Technology, Dois Vizinhos, Paraná, Brazil
Gustavo Henrique Paetzold

About this paper

Cite this paper

Brum, H.B., Nunes, M.d.G.V. (2018). Semi-supervised Sentiment Annotation of Large Corpora. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_39

DOI: https://doi.org/10.1007/978-3-319-99722-3_39
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)