CoRTE: A Corpus of Recognizing Textual Entailment Data Annotated for Coreference and Bridging Relations

Waseem, Afifah

doi:10.1007/978-3-030-00794-2_12

Afifah Waseem

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11107)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue


Abstract

This paper presents CoRTE, an English corpus annotated with coreference and bridging relations, with the data taken from the main task of recognizing textual entailment (RTE). Our annotation scheme extends existing schemes by introducing subcategories, and each coreference and bridging relation has been assigned a category. CoRTE is a useful resource for researchers working on coreference and bridging resolution, as well as on the RTE task. Because RTE has applications in many NLP domains, CoRTE can make contextual information readily available to NLP systems being developed for domains that require textual inference and discourse understanding. The paper describes the annotation scheme with examples. We have annotated 340 text-hypothesis pairs, consisting of 24,742 tokens and 8,072 markables.
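The abstract describes the corpus in terms of text-hypothesis pairs, markables, and categorized coreference and bridging relations. The Python sketch below shows one possible way to hold such annotations in memory, and it checks the per-pair averages implied by the reported totals. Everything in the schema is hypothetical: the class and field names, the `per_pair_averages` helper, and the category labels `pronominal`, `same-head`, and `part-whole` are assumptions, not CoRTE's release format or label set. The toy pair is invented and is not drawn from the corpus.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Tuple

# HYPOTHETICAL schema for illustration only: field names and category
# labels are assumptions, not CoRTE's actual file format or label set.

Segment = Literal["text", "hypothesis"]


@dataclass
class Markable:
    """A token span on one side of an RTE pair that can enter a relation."""
    markable_id: str
    segment: Segment
    start: int  # first token index (inclusive)
    end: int    # last token index (exclusive)


@dataclass
class Relation:
    """A directed link from an anaphor to its antecedent or bridging anchor."""
    kind: Literal["coreference", "bridging"]
    category: str    # illustrative subcategory label (assumed)
    anaphor: str     # markable_id of the later mention
    antecedent: str  # markable_id of the earlier mention or anchor


@dataclass
class Pair:
    """One text-hypothesis pair with its markables and relations."""
    pair_id: str
    text: List[str]
    hypothesis: List[str]
    markables: List[Markable] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def span(self, markable_id: str) -> str:
        """Return the surface string of a markable."""
        m = next(m for m in self.markables if m.markable_id == markable_id)
        tokens = self.text if m.segment == "text" else self.hypothesis
        return " ".join(tokens[m.start:m.end])

    def relations_of(self, kind: str) -> List[Relation]:
        """Return all relations of the given kind."""
        return [r for r in self.relations if r.kind == kind]


def per_pair_averages(n_pairs: int, n_tokens: int, n_markables: int) -> Tuple[float, float]:
    """Average tokens and markables per pair, rounded to two decimals."""
    return round(n_tokens / n_pairs, 2), round(n_markables / n_pairs, 2)


# A made-up pair (not taken from CoRTE) showing both relation kinds.
example = Pair(
    pair_id="toy-1",
    text="John bought a car . The engine was new .".split(),
    hypothesis="He owns a car .".split(),
    markables=[
        Markable("m1", "text", 0, 1),        # John
        Markable("m2", "text", 2, 4),        # a car
        Markable("m3", "text", 5, 7),        # The engine
        Markable("m4", "hypothesis", 0, 1),  # He
        Markable("m5", "hypothesis", 2, 4),  # a car
    ],
    relations=[
        Relation("coreference", "pronominal", "m4", "m1"),
        Relation("coreference", "same-head", "m5", "m2"),
        Relation("bridging", "part-whole", "m3", "m2"),
    ],
)

if __name__ == "__main__":
    print(per_pair_averages(340, 24_742, 8_072))  # (72.77, 23.74)
    for r in example.relations:
        print(r.kind, r.category, example.span(r.anaphor), "->", example.span(r.antecedent))
```

With the totals reported in the abstract, a pair averages roughly 72.77 tokens and 23.74 markables. Storing relations as directed anaphor-to-antecedent links keyed by markable ID keeps coreference and bridging in one list, distinguished only by `kind`, and lets a link cross from the hypothesis to the text.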



Author information

Authors and Affiliations

Department of Computer Science, University of Oxford, Oxford, OX1 3QD, UK
Afifah Waseem


Corresponding author

Correspondence to Afifah Waseem.

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Aleš Horák
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Karel Pala


About this paper

Cite this paper

Waseem, A. (2018). CoRTE: A Corpus of Recognizing Textual Entailment Data Annotated for Coreference and Bridging Relations. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_12


DOI: https://doi.org/10.1007/978-3-030-00794-2_12
Published: 08 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2
eBook Packages: Computer Science, Computer Science (R0)
