SICK-BR: A Portuguese Corpus for Inference

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2018)

Abstract

We describe SICK-BR, a Brazilian Portuguese corpus annotated with inference relations and semantic relatedness between pairs of sentences. SICK-BR is a translation and adaptation of the original SICK, a corpus of English sentences used in several semantic evaluations. SICK-BR consists of around 10k sentence pairs, each annotated with a neutral/contradiction/entailment relation and with a semantic relatedness score on a 5-point scale. Here we describe the strategies used to adapt SICK so that its original inference and relatedness labels are preserved in the Portuguese version. We also discuss some issues with the original corpus and how we might deal with them.
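The abstract describes each SICK-BR item as a sentence pair carrying one of three inference labels plus a relatedness score. As a minimal sketch, assuming a record layout of our own choosing, such an item and a simple label count could be modeled in Python as shown below. The class `SentencePair`, its field names, the uppercase label strings and the helper `label_distribution` are hypothetical and are not the released SICK-BR file format. The range check also assumes relatedness scores lie between 1 and 5, matching the 5-point scale.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label strings for the three inference relations named in the abstract.
LABELS = {"NEUTRAL", "CONTRADICTION", "ENTAILMENT"}


@dataclass(frozen=True)
class SentencePair:
    """One SICK-BR-style item; field names are illustrative assumptions."""
    pair_id: int
    sentence_a: str
    sentence_b: str
    entailment_label: str
    relatedness_score: float  # assumed to lie in [1, 5] (5-point scale)

    def __post_init__(self) -> None:
        # Reject labels outside the assumed three-way label set.
        if self.entailment_label not in LABELS:
            raise ValueError(f"unknown label: {self.entailment_label!r}")
        # Reject scores outside the assumed 1-5 relatedness range.
        if not 1.0 <= self.relatedness_score <= 5.0:
            raise ValueError(f"relatedness out of range: {self.relatedness_score}")


def label_distribution(pairs):
    """Count how many pairs carry each inference label."""
    return Counter(p.entailment_label for p in pairs)
```

Validating at construction time means that a mistranslated or mislabeled row fails immediately instead of silently skewing label counts. This is one simple way to check that an adaptation keeps the original label inventory intact.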

Notes

  1. https://aclweb.org/aclwiki/Recognizing_Textual_Entailment.
  2. http://propor2016.di.fc.ul.pt/?page_id=381.
  3. http://clic.cimec.unitn.it/composes/sick.html.
  4. Available at https://github.com/livyreal/SICK-BR/tree/master/Glossary.
  5. We thank Katerina Kalouli for the processing of the original SICK, made publicly available at https://github.com/kkalouli/SICK-processing.
  6. These analyses are available at https://github.com/livyreal/SICK-BR.


Author information

Corresponding author

Correspondence to Livy Real.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Real, L. et al. (2018). SICK-BR: A Portuguese Corpus for Inference. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_31

  • DOI: https://doi.org/10.1007/978-3-319-99722-3_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99721-6

  • Online ISBN: 978-3-319-99722-3

  • eBook Packages: Computer Science, Computer Science (R0)
