Abstract
This paper presents CoRTE, an English corpus annotated with coreference and bridging relations, where the dataset is taken from the main task of recognizing textual entailment (RTE). Our annotation scheme elaborates existing schemes by introducing subcategories. Each coreference and bridging relation has been assigned a category. CoRTE is a useful resource for researchers working on coreference and bridging resolution, as well as recognizing textual entailment (RTE) task. RTE has its applications in many NLP domains. CoRTE would thus provide contextual information readily available to the NLP systems being developed for domains requiring textual inference and discourse understanding. The paper describes the annotation scheme with examples. We have annotated 340 text-hypothesis pairs, consisting of 24,742 tokens and 8,072 markables.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bos, J., Markert, K.: When logical inference helps determining textual entailment (and when it doesn’t). In: Proceedings of the Second PASCAL RTE Challenge, p. 26 (2006)
Abad, A., et al.: A resource for investigating the impact of anaphora and coreference on inference. In: Proceedings of LREC (2010)
Mirkin, S., Dagan, I., Padó, S.: Assessing the role of discourse references in entailment inference. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1209–1219 (2010)
Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference, pp. 632–642 (2015)
White, A.S., Rastogi, P., Duh, K.: Inference is everything: recasting semantic resources into a unified evaluation framework. In: Proceedings of the Eighth International Joint Conference on Natural Language Processing, p. 10 (2017)
Harabagiu, S., Hickl, A.: Methods for using textual entailment in open-domain question answering. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 905–912 (2006)
Romano, L., Kouylekov, M., Szpektor, I., Dagan, I., Lavelli, A.: Investigating a generic paraphrase-based approach for relation extraction. In: 11th Conference of the European Chapter of the ACL (2006)
Padó, S., Galley, M., Jurafsky, D., Manning, C.: Robust machine translation evaluation with entailment features. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Stroudsburg, PA, USA, vol. 1, pp. 297–305 (2009)
Hirschman, L., Chinchor, N.: Appendix F: MUC-7 coreference task definition (version 3.0). In: Seventh Message Understanding Conference (MUC-7), Virginia (1998)
Doddington, G.R., Mitchell, A., Przybocki, M.A., Ramshaw, L.A., Strassel, S., Weischedel, R.M.: The automatic content extraction (ACE) program-tasks, data, and evaluation. In: LREC, vol. 2, p. 1 (2004)
Pradhan, S.S., Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R.: OntoNotes: a unified relational semantic representation. Int. J. Semant. Comput. 1, 405–419 (2007)
Clark, H.H.: Bridging. In: Proceedings of the 1975 Workshop on Theoretical Issues in Natural Language Processing, TINLAP 1975, pp. 169–174 (1975)
Poesio, M.: The MATE/GNOME proposals for anaphoric annotation, revisited. In: Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL (2004)
Poesio, M., Artstein, R.: Anaphoric annotation in the ARRAU corpus. In: LREC (2008)
Nedoluzhko, A., Mírovský, J., Pajas, P.: The coding scheme for annotating extended nominal coreference and bridging anaphora in the Prague Dependency Treebank. In: Proceedings of the Third Linguistic Annotation Workshop, pp. 108–111 (2009)
Stede, M.: The Potsdam commentary corpus. In: Proceedings of the 2004 ACL Workshop on Discourse Annotation, pp. 96–102 (2004)
Riester, A., Lorenz, D., Seemann, N.: A recursive annotation scheme for referential information status. In: LREC (2010)
Eckart, K., Riester, A., Schweitzer, K.: A discourse information radio news database for linguistic analysis. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds.) Linked Data in Linguistics, pp. 65–76. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28249-2_7
Cahill, A., Riester, A.: Automatically acquiring fine-grained information status distinctions in German. In: Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 232–236 (2012)
Markert, K., Hou, Y., Strube, M.: Collective classification for fine-grained information status. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, vol. 1, pp. 795–804 (2012)
Hou, Y., Markert, K., Strube, M.: Cascading collective classification for bridging anaphora recognition using a rich linguistic feature set. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 814–820 (2013)
Grishina, Y.: Experiments on bridging across languages and genres. In: Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes, CORBON 2016 (2016)
Rösiger, I.: SciCorp: a corpus of English scientific articles annotated for information status analysis. In: LREC (2016)
Müller, C., Strube, M.: Multi-level annotation of linguistic data with MMAX2. In: Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods (2006)
Recasens, M., Martí, M.A., Orasan, C.: Annotating near-identity from coreference disagreements. In: LREC, pp. 165–172 (2012)
Schäfer, U., Spurk, C., Steffen, J.: A fully coreference-annotated corpus of scholarly papers from the ACL anthology. In: Proceedings of COLING 2012 Posters, pp. 1059–1070 (2012)
Carletta, J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Waseem, A. (2018). CoRTE: A Corpus of Recognizing Textual Entailment Data Annotated for Coreference and Bridging Relations. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science(), vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_12
Download citation
DOI: https://doi.org/10.1007/978-3-030-00794-2_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2
eBook Packages: Computer ScienceComputer Science (R0)