CoRTE: A Corpus of Recognizing Textual Entailment Data Annotated for Coreference and Bridging Relations

  • Conference paper
Text, Speech, and Dialogue (TSD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11107)

Abstract

This paper presents CoRTE, an English corpus annotated with coreference and bridging relations, built on data taken from the main task of Recognizing Textual Entailment (RTE). Our annotation scheme extends existing schemes by introducing subcategories, and each coreference and bridging relation is assigned one of these categories. CoRTE is a useful resource for researchers working on coreference and bridging resolution, as well as on the RTE task itself. Because RTE has applications in many NLP domains, CoRTE can supply readily available contextual information to NLP systems being developed for domains that require textual inference and discourse understanding. The paper describes the annotation scheme with examples. We have annotated 340 text-hypothesis pairs, comprising 24,742 tokens and 8,072 markables.
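The abstract describes the corpus in terms of text-hypothesis pairs, markables, and coreference and bridging relations that each carry a category. The following Python sketch shows one possible in-memory model for such annotations and how pairwise coreference links could be merged into chains with union-find. It is an illustration under stated assumptions, not the corpus's actual format: the class names, fields, the "T"/"H" segment labels and the category values "identity" and "part-of" are all hypothetical, and the toy sentences are invented rather than taken from CoRTE.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical data model for one annotated text-hypothesis pair.
# Field names, segment labels ("T"/"H") and category values are
# illustrative assumptions, not CoRTE's actual file format or tag set.


@dataclass(frozen=True)
class Markable:
    id: str
    segment: Literal["T", "H"]  # markable occurs in the text or the hypothesis
    surface: str


@dataclass(frozen=True)
class Relation:
    kind: Literal["coreference", "bridging"]
    category: str    # subcategory label (hypothetical values below)
    anaphor: str     # id of the later markable
    antecedent: str  # id of the markable it points back to


@dataclass
class Pair:
    markables: dict[str, Markable] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)


def coreference_chains(pair: Pair) -> list[set[str]]:
    """Merge pairwise coreference links into chains with union-find.

    Bridging links are skipped: they connect related but distinct
    entities, so they must not merge chains.
    """
    parent = {m: m for m in pair.markables}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for rel in pair.relations:
        if rel.kind == "coreference":
            parent[find(rel.anaphor)] = find(rel.antecedent)

    groups: dict[str, set[str]] = {}
    for m in pair.markables:
        groups.setdefault(find(m), set()).add(m)
    # Drop singletons: markables that take part in no coreference link.
    return [g for g in groups.values() if len(g) > 1]


# Toy example (invented sentences, not taken from CoRTE):
# T: "John bought a car. The engine was new."  H: "John owns a car."
pair = Pair(
    markables={
        "m1": Markable("m1", "T", "John"),
        "m2": Markable("m2", "T", "a car"),
        "m3": Markable("m3", "T", "The engine"),
        "m4": Markable("m4", "H", "John"),
        "m5": Markable("m5", "H", "a car"),
    },
    relations=[
        Relation("coreference", "identity", "m4", "m1"),
        Relation("coreference", "identity", "m5", "m2"),
        Relation("bridging", "part-of", "m3", "m2"),
    ],
)

print(coreference_chains(pair))
print(Counter(r.kind for r in pair.relations))
```

In this toy pair, the two coreference links yield the chains {m1, m4} and {m2, m5}, which connect entities in the hypothesis to their mentions in the text, while the bridging link from "The engine" to "a car" stays outside any chain. Keeping bridging separate from coreference in this way mirrors the distinction the annotation scheme draws between the two relation types.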

Notes

  1. https://aclweb.org/aclwiki/Textual_Entailment_Portal.
  2. https://repeval2017.github.io/shared/.
  3. https://sourceforge.net/projects/corte.

Author information

Corresponding author: Afifah Waseem.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Waseem, A. (2018). CoRTE: A Corpus of Recognizing Textual Entailment Data Annotated for Coreference and Bridging Relations. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_12

  • DOI: https://doi.org/10.1007/978-3-030-00794-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00793-5

  • Online ISBN: 978-3-030-00794-2

  • eBook Packages: Computer Science, Computer Science (R0)