Evaluation Campaigns

Recasens, Marta; Pradhan, Sameer

doi:10.1007/978-3-662-47909-4_6

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

In this chapter, we survey the major evaluation campaigns (shared tasks) for coreference resolution, in which all participants are given the same datasets and annotations and are evaluated on the same test set with the same scoring software, making it possible to compare the participating systems. Specifically, we cover the Message Understanding Conference (MUC), the Automatic Content Extraction (ACE) program, SemEval-2010 Task 1, the i2b2-2011 shared task, and the CoNLL-2011 and CoNLL-2012 shared tasks. We discuss the critical issues behind the practice of coreference resolution evaluation, such as the range of mentions defined in the annotation guidelines, the use of gold vs. predicted mentions, the layers of preprocessing information that are provided, and the multiple coreference evaluation measures.

Notes

1.
The first ACE editions focused on five classes, and later editions added vehicles and weapons.
2.
Although OntoNotes was not originally annotated with singletons, they were identified heuristically and added in the dataset used in SemEval so as to make the different datasets as similar as possible. A few non-referential NPs that could not be automatically detected (e.g., expletive pronouns) were unavoidably annotated as singletons in this process. In the Dutch dataset, only singletons for named entities are annotated.
3.
Some of the original corpora from which the SemEval datasets were extracted contain coreference annotations for non-NP mentions, but verbal mentions were removed to keep the evaluation campaign simpler.
4.
Even though the gold scenario at the CoNLL-2011 and 2012 evaluations provided coreferent mentions only, not all participants exploited this hint to corefer every given mention; some left mentions unlinked and thus did not achieve 100% recall for mention detection.
5.
ACE also had “diagnostic” tasks where gold mentions were provided.
6.
To summarize some of the variations that have been proposed:
- Bengtson and Roth [8] discard the predicted mentions that have no counterpart in the gold.
- Stoyanov et al. [92] use B³_all, which retains all predicted mentions, and B³_0, which discards all predicted mentions with no counterpart in the gold.
- Rahman and Ng [74] only discard the predicted mentions that have no counterpart in the gold and that are singletons.
- Cai and Strube [15] adjust a system output in three ways: gold mentions with no system counterpart are added as predicted singleton mentions, predicted singleton mentions with no counterpart are removed, and to compute precision, predicted coreferent mentions with no gold counterpart are added as gold singleton mentions.
7.
This assumption used to hold for BLANC [76], but not anymore since Luo et al.’s extension [64].
8.
Word senses in OntoNotes have a direct one-to-many mapping to WordNet senses.
9.
http://conll.github.io/reference-coreference-scorers/
10.
The scorer used at SemEval-2010 was not the same version as the one used at the CoNLL-2011 and CoNLL-2012 shared tasks, as the latter incorporated a (buggy) implementation of Cai and Strube’s [15] variations.
11.
Nominal predicates and appositive phrases fell under the Identity type in the MUC annotation scheme.
12.
http://www.itl.nist.gov/iad/mig/tests/ace/
13.
http://projects.ldc.upenn.edu/ace/data/
14.
The full list of seven ACE entity types includes: person (e.g., the President of the U.S.), organization (e.g., University of Tennessee), geopolitical entity (e.g., the people of France), location (e.g., Germany), facility (e.g., the oil refinery), vehicle (e.g., the train), and weapon (e.g., knife).
15.
In Chinese, the word count is approximated by multiplying the number of characters by 1.5.
16.
We ran the scorer using the head-word relaxed flag, as the original SemEval task did.
17.
https://www.i2b2.org/NLP/
18.
The corpus guidelines are given as an appendix to the main JAMIA publication [96].
19.
http://conll.github.io/reference-coreference-scorers
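
The mention-set adjustments summarized in note 6 can be illustrated with a minimal Python sketch. It follows only the prose descriptions given in that note, not the code of any released scorer; all function names are hypothetical, mentions are assumed to be hashable identifiers (for example token spans), and a clustering is assumed to be a list of sets, one set per entity.

```python
from typing import Hashable, List, Set, Tuple

Mention = Hashable               # e.g. a (start, end) token span
Clustering = List[Set[Mention]]  # one set of mentions per entity


def all_mentions(clusters: Clustering) -> Set[Mention]:
    """Return every mention that appears in any cluster."""
    return set().union(*clusters)


def drop_twinless(system: Clustering, gold: Clustering) -> Clustering:
    """Discard every predicted mention that has no gold counterpart
    (the variation described for Bengtson and Roth, and for B3_0)."""
    gold_m = all_mentions(gold)
    kept = [cluster & gold_m for cluster in system]
    return [cluster for cluster in kept if cluster]  # drop emptied clusters


def drop_twinless_singletons(system: Clustering, gold: Clustering) -> Clustering:
    """Discard only predicted singletons that have no gold counterpart
    (the variation described for Rahman and Ng)."""
    gold_m = all_mentions(gold)
    return [set(c) for c in system if len(c) > 1 or c <= gold_m]


def cai_strube_adjust(system: Clustering,
                      gold: Clustering) -> Tuple[Clustering, Clustering]:
    """Apply the three adjustments described for Cai and Strube.

    Returns the adjusted system output and the gold clustering
    to be used when computing precision.
    """
    gold_m = all_mentions(gold)
    # Predicted singletons with no gold counterpart are removed.
    adjusted = drop_twinless_singletons(system, gold)
    kept_m = all_mentions(adjusted)
    # Gold mentions the system missed are added as predicted singletons.
    adjusted += [{m} for m in gold_m - kept_m]
    # For precision, the remaining twinless (coreferent) predicted
    # mentions are added to the gold as singletons.
    gold_for_precision = [set(c) for c in gold] + [{m} for m in kept_m - gold_m]
    return adjusted, gold_for_precision


# Toy example: "x" and "y" are predicted mentions with no gold counterpart.
gold = [{"a", "b", "c"}, {"d"}]
system = [{"a", "b"}, {"c", "x"}, {"y"}]
```

On this toy input, drop_twinless keeps the clusters {a, b} and {c}; drop_twinless_singletons keeps {a, b} and {c, x}; and cai_strube_adjust yields the system clusters {a, b}, {c, x}, {d} together with the precision-side gold clusters {a, b, c}, {d}, {x}.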

References

Anick, P., Hong, P., Xue, N., et al.: Coreference resolution for electronic medical records. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Appelt, D.E., Hobbs, J.R., Bear, J., Israel, D., Kameyama, M., Kehler, A., Martin, D., Myers, K., Tyson, M.: SRI International FASTUS system MUC-6 test results and analysis. In: Proceedings of MUC-6, Columbia, pp. 237–248 (1995)
Attardi, G., Rossi, S.D., Simi, M.: TANL-1: coreference resolution by parse analysis and similarity clustering. In: Proceedings of SemEval-2, Uppsala, pp. 108–111 (2010)
Bagga, A., Baldwin, B.: Algorithms for scoring coreference chains. In: Proceedings of the LREC Workshop on Linguistic Coreference, Granada, pp. 563–566 (1998)
Baldwin, B., Morton, T., Bagga, A., Baldridge, J., Chandraseker, R., Dimitriadis, A., Snyder, K., Wolska, M.: Description of the UPenn CAMP system as used for coreference. In: Proceedings of MUC-7, Fairfax (1998)
Baldwin, B., Reynar, J., Collins, M., Eisner, J., Ratnaparkhi, A., Rosenzweig, J., Sarkar, A., Srinivas: University of Pennsylvania: description of the University of Pennsylvania system used for MUC-6. In: Proceedings of MUC-6, Columbia, pp. 177–191 (1995)
Benajiba, Y., Shaw, J.: An SVM-based coreference resolution system based on Philips information extraction. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Bengtson, E., Roth, D.: Understanding the value of features for coreference resolution. In: Proceedings of EMNLP 2008, Honolulu, pp. 294–303 (2008)
Bergsma, S., Lin, D.: Bootstrapping path-based pronoun resolution. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, pp. 33–40 (2006)
Björkelund, A., Farkas, R.: Data-driven multilingual coreference resolution using resolver stacking. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 49–55 (2012)
Björkelund, A., Nugues, P.: Exploring lexicalized features for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 45–50 (2011)
Broscheit, S., Poesio, M., Ponzetto, S.P., Rodríguez, K.J., Romano, L., Uryupina, O., Versley, Y., Zanoli, R.: BART: a multilingual anaphora resolution system. In: Proceedings of SemEval-2, Uppsala, pp. 104–107 (2010)
Cai, J., Mujdricza, E., Hou, Y., Strube, M.: Weakly supervised graph-based coreference resolution for clinical texts. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Cai, J., Mujdricza-Maydt, E., Strube, M.: Unrestricted coreference resolution via global hypergraph partitioning. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 56–60 (2011)
Cai, J., Strube, M.: Evaluation metrics for end-to-end coreference resolution systems. In: Proceedings of SIGDIAL, University of Tokyo, Tokyo, pp. 28–36 (2010)
Chang, K.W., Samdani, R., Rozovskaya, A., Rizzolo, N., Sammons, M., Roth, D.: Inference protocols for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 40–44 (2011)
Chang, K.W., Samdani, R., Rozovskaya, A., Sammons, M., Roth, D.: Illinois-coref: the UI system in the CoNLL-2012 shared task. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 113–117 (2012)
Charton, E., Gagnon, M.: Poly-co: a multilayer perceptron approach for coreference detection. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 97–101 (2011)
Chen, C., Ng, V.: Combining the best of two worlds: a hybrid approach to multilingual coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 56–63 (2012)
Chen, W., Zhang, M., Qin, B.: Coreference resolution system using maximum entropy classifier. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 127–130 (2011)
Chinchor, N.A.: Overview of MUC-7/MET-2. In: Proceedings of the Seventh Message Understanding Conference (MUC-7), Fairfax (1998)
Choi, Y., Cardie, C.: Structured local training and biased potential functions for conditional random fields with application to coreference resolution. In: Proceedings of HLT-NAACL, Rochester, pp. 65–72 (2007)
Culotta, A., Wick, M., Hall, R., McCallum, A.: First-order probabilistic models for coreference resolution. In: HLT/NAACL, Rochester, pp. 81–88 (2007)
Dai, H., Wu, C., Chen, C., et al.: Co-reference resolution of the medical concepts in the patient discharge summaries. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
van Deemter, K., Kibble, R.: On coreferring: coreference in MUC and related annotation schemes. Comput. Linguist. 26 (4), 629–637 (2000). Squib
Denis, P., Baldridge, J.: Joint determination of anaphoricity and coreference resolution using integer programming. In: Proceedings of NAACL-HLT 2007, Rochester (2007)
Denis, P., Baldridge, J.: Global joint models for coreference resolution and named entity classification. Procesamiento del Lenguaje Natural 42, 87–96 (2009)
Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S., Weischedel, R.: The automatic content extraction (ACE) program – tasks, data, and evaluation. In: Proceedings of LREC 2004, Lisbon, pp. 837–840 (2004)
Fernandes, E., dos Santos, C., Milidiú, R.: Latent structure perceptron with feature induction for unrestricted coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 41–48 (2012)
Fisher, D., Soderland, S., McCarthy, J., Feng, F., Lehnert, W.: Description of the UMass system as used for MUC-6. In: Proceedings of MUC-6, Columbia, pp. 127–140 (1995)
Fukumoto, J., Masui, F., Shimohata, M., Sasaki, M.: Oki electric industry: description of the Oki system as used for MUC-7. In: Proceedings of MUC-7, Fairfax (1998)
Gaizauskas, R., Wakao, T., Humphreys, K., Cunningham, H., Wilks, Y.: University of Sheffield: description of the LaSIE system as used for MUC-6. In: Proceedings of MUC-6, Columbia, pp. 207–220 (1995)
Garigliano, R., Urbanowicz, A., Nettleton, D.J.: University of Durham: description of the LOLITA system as used in MUC-7. In: Proceedings of MUC-7, Fairfax (1998)
Gärtner, M., Björkelund, A., Thiele, G., Seeker, W., Kuhn, J.: Visualization, search, and error analysis for coreference annotations. In: Proceedings of ACL: System Demonstrations, Baltimore, pp. 7–12 (2014)
Glinos, D.: A search based method for clinical text coreference resolution. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Gooch, P.: Coreference resolution in clinical discharge summaries, progress notes, surgical and pathology reports: a unified lexical approach. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Grishman, R.: The NYU system for MUC-6 or where’s the syntax? In: Proceedings of MUC-6, Columbia, pp. 167–175 (1995)
Grishman, R., Sundheim, B.: Design of the MUC-6 evaluation. In: Proceedings of the Sixth Message Understanding Conference (MUC-6), Columbia (1995)
Grishman, R., Sundheim, B.: Message understanding conference-6: a brief history. In: Proceedings of COLING, Copenhagen, pp. 466–471 (1996)
Grouin, C., Dinarelli, M., Rosset, S.: Coreference resolution in clinical reports – the LIMSI participation in the i2b2/VA 2011 challenge. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Haghighi, A., Klein, D.: Unsupervised coreference resolution in a nonparametric Bayesian model. In: Proceedings of ACL, Prague, pp. 848–855 (2007)
Hendrickx, I., Bouma, G., Coppens, F., Daelemans, W., Hoste, V., Kloosterman, G., Mineur, A.M., Van Der Vloet, J., Verschelde, J.L.: A coreference corpus and resolution system for Dutch. In: Proceedings of LREC, Marrakech (2008)
Hinote, D., Ramirez, C., Chen, P.: A comparative study of co-reference resolution in clinical text. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Hinrichs, E., Kübler, S., Naumann, K.: A unified representation for morphological, syntactic, semantic and referential annotations. In: ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, Ann Arbor (2005)
Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R.: OntoNotes: the 90% solution. In: Proceedings of HLT/NAACL, pp. 57–60. Association for Computational Linguistics, New York City (2006)
Humphreys, K., Gaizauskas, R., Azzam, S., Huyck, C., Mitchell, B., Cunningham, H., Wilks, Y.: University of Sheffield: description of the LaSIE-II system as used for MUC-7. In: Proceedings of MUC-7, Fairfax (1998)
Irwin, J., Komachi, M., Matsumoto, Y.: Narrative schema as world knowledge for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 86–92 (2011)
Jindal, P., Roth, D.: Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Klein, D., Kummerfeld, J.K., Bansal, M., Burkett, D.: Mention detection: heuristics for the OntoNotes annotations. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 102–106 (2011)
Klenner, M., Tuggener, D.: An incremental model for coreference resolution with restrictive antecedent accessibility. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 81–85 (2011)
Kobdani, H., Schütze, H.: SUCRE: a modular system for coreference resolution. In: Proceedings of SemEval-2, Uppsala, pp. 92–95 (2010)
Kobdani, H., Schütze, H.: Supervised coreference resolution with SUCRE. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 71–75 (2011)
Kummerfeld, J.K., Klein, D.: Error-driven analysis of challenges in coreference resolution. In: Proceedings of EMNLP, Seattle, pp. 265–277 (2013)
Lan, M., Zhao, J., Zhang, K., et al.: Comparative investigation on learning-based and rule-based approaches to coreference resolution in clinic domain: a case study in i2b2 challenge 2011 Task 1. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 28–34 (2011)
Li, B.: Learning to model multilingual unrestricted coreference in OntoNotes. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 129–135 (2012)
Li, X., Wang, X., Liao, X.: Simple maximum entropy models for multilingual coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 83–87 (2012)
Li, X., Wang, X., Qi, S.: Coreference resolution with loose transitivity constraints. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 107–111 (2011)
Lin, D.: University of Manitoba: description of the PIE system used for MUC-6. In: Proceedings of MUC-6, Columbia, pp. 114–126 (1995)
Lin, D.: Using collocation statistics in information extraction. In: Proceedings of MUC-7, Fairfax (1998)
Luo, X.: On coreference resolution performance metrics. In: Proceedings of HLT-EMNLP, Vancouver, pp. 25–32 (2005)
Luo, X.: Coreference or not: a twin model for coreference resolution. In: Proceedings of HLT-NAACL 2007, Rochester, pp. 73–80 (2007)
Luo, X., Ittycheriah, A., Jing, H., Kambhatla, N., Roukos, S.: A mention-synchronous coreference resolution algorithm based on the Bell tree. In: Proceedings of ACL, Barcelona, pp. 21–26 (2004)
Luo, X., Pradhan, S., Recasens, M., Hovy, E.: An extension of BLANC to system mentions. In: Proceedings of ACL, Baltimore, pp. 24–29 (2014)
Màrquez, L., Recasens, M., Sapena, E.: Coreference resolution: an empirical study based on SemEval-2010 shared Task 1. Lang. Resour. Eval. 47 (3), 661–694 (2012)
Martschat, S., Cai, J., Broscheit, S., Mújdricza-Maydt, É., Strube, M.: A multigraph model for coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 100–106 (2012)
Mitkov, R.: Towards a more consistent and comprehensive evaluation of anaphora resolution algorithms and systems. In: Proceedings of the Discourse Anaphora and Anaphora Resolution Colloquium (DAARC 2000), Lancaster, pp. 96–107 (2000)
Morgan, R., Garigliano, R., Callaghan, P., Poria, S., Smith, M., Urbanowicz, A., Collingham, R., Costantino, M., Cooper, C., the LOLITA Group: University of Durham: description of the LOLITA system as used in MUC-6. In: Proceedings of MUC-6, Columbia, pp. 71–85 (1995)
Ng, V.: Graph-cut-based anaphoricity determination for coreference resolution. In: Proceedings of NAACL-HLT 2009, Boulder, pp. 575–583 (2009)
Pradhan, S., Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R.: OntoNotes: a unified relational semantic representation. Int. J. Semant. Comput. 1 (4), 405–419 (2007)
Pradhan, S., Luo, X., Recasens, M., Hovy, E., Ng, V., Strube, M.: Scoring coreference partitions of predicted mentions: a reference implementation. In: Proceedings of ACL, Baltimore, pp. 30–35 (2014)
Pradhan, S., Moschitti, A., Xue, N., Uryupina, O., Zhang, Y.: CoNLL-2012 shared task: modeling multilingual unrestricted coreference in OntoNotes. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 1–40 (2012)
Pradhan, S., Ramshaw, L., Marcus, M., Palmer, M., Weischedel, R., Xue, N.: CoNLL-2011 shared task: modeling unrestricted coreference in OntoNotes. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 1–27 (2011)
Rahman, A., Ng, V.: Supervised models for coreference resolution. In: Proceedings of EMNLP 2009, Suntec, pp. 968–977 (2009)
Recasens, M., Hovy, E.: Coreference resolution across corpora: languages, coding schemes, and preprocessing information. In: Proceedings of ACL, Uppsala, pp. 1423–1432 (2010)
Recasens, M., Hovy, E.: BLANC: implementing the Rand index for coreference evaluation. Nat. Lang. Eng. 17 (4), 485–510 (2011)
Recasens, M., de Marneffe, M.C., Potts, C.: The life and death of discourse entities: identifying singleton mentions. In: Proceedings of NAACL-2013, Atlanta, pp. 627–633 (2013)
Recasens, M., Màrquez, L., Sapena, E., Martí, M.A., Taulé, M., Hoste, V., Poesio, M., Versley, Y.: SemEval-2010 Task 1: coreference resolution in multiple languages. In: Proceedings of SemEval-2, Uppsala, pp. 1–8 (2010)
Recasens, M., Martí, M.A.: AnCora-CO: coreferentially annotated corpora for Spanish and Catalan. Lang. Resour. Eval. 44 (4), 315–345 (2010)
Rink, B., Harabagiu, S.: A supervised multi-pass sieve approach for resolving coreference in clinical records. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Rodríguez, K.J., Delogu, F., Versley, Y., Stemle, E., Poesio, M.: Anaphoric annotation of Wikipedia and blogs in the Live Memories corpus. In: Proceedings of LREC, Valletta (poster) (2010)
dos Santos, C.N., Carvalho, D.L.: Rule and tree ensembles for unrestricted coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 51–55 (2011)
Sapena, E., Padró, L., Turmo, J.: RelaxCor: a global relaxation labeling approach to coreference resolution for the SemEval-2 coreference task. In: Proceedings of SemEval-2, Uppsala, pp. 88–91 (2010)
Sapena, E., Padró, L., Turmo, J.: RelaxCor participation in CoNLL shared task on coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 35–39 (2011)
Savova, G.K., Chapman, W.W., Zheng, J., Crowley, R.S.: Anaphoric relations in the clinical narrative: corpus creation. J. Am. Med. Inform. Assoc. 18 (4), 459–465 (2011)
Shou, H., Zhao, H.: System paper for CoNLL-2012 shared task: hybrid rule-based algorithm for coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 118–121 (2012)
Sobha, L.D., Pattabhi, R.K.R., Vijay Sundar Ram, R., Malarkodi, C.S., Akilandeswari, A.: Hybrid approach for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland (2011)
Song, Y., Wang, H., Jiang, J.: Link type based pre-cluster pair model for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 131–135 (2011)
Soon, W.M., Ng, H.T., Lim, D.C.Y.: A machine learning approach to coreference resolution of noun phrases. Comput. Linguist. 27 (4), 521–544 (2001)
Stamborg, M., Medved, D., Exner, P., Nugues, P.: Using syntactic dependencies to solve coreferences. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 64–70 (2012)
Stoyanov, V., Babbar, U., Gupta, P., Cardie, C.: Reconciling OntoNotes: unrestricted coreference resolution in OntoNotes with Reconcile. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 122–126 (2011)
Stoyanov, V., Gilbert, N., Cardie, C., Riloff, E.: Conundrums in noun phrase coreference resolution: making sense of the state-of-the-art. In: Proceedings of ACL-IJCNLP, Singapore, pp. 656–664 (2009)
Uryupina, O.: Corry: a system for coreference resolution. In: Proceedings of SemEval-2, Uppsala, pp. 100–103 (2010)
Uryupina, O., Moschitti, A., Poesio, M.: BART goes multilingual: the UniTN/Essex submission to the CoNLL-2012 shared task. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 122–128 (2012)
Uryupina, O., Saha, S., Ekbal, A., Poesio, M.: Multi-metric optimization for coreference: the UniTN/IITP/Essex submission to the 2011 CONLL shared task. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 61–65 (2011)
Uzuner, O., Bodnari, A., Shen, S., Forbush, T., Pestian, J., South, B.R.: Evaluating the state of the art in coreference resolution for electronic medical records. J. Am. Med. Inform. Assoc. 19 (5), 786–791 (2012)
Vilain, M., Burger, J., Aberdeen, J., Connolly, D., Hirschman, L.: A model-theoretic coreference scoring scheme. In: Proceedings of MUC-6, Columbia, pp. 45–52 (1995)
Ware, H., Mullet, C., Jagannathan, V., El-Rawas, O.: Machine learning-based coreference resolution of concepts in clinical documents. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Weischedel, R., Hovy, E., Palmer, M., Marcus, M., Belvin, R., Pradhan, S., Ramshaw, L., Xue, N.: OntoNotes: a large training corpus for enhanced processing. In: Olive, J., Christianson, C., McCary, J. (eds.) Handbook of Natural Language Processing and Machine Translation. Springer, New York (2011)
Xiong, H., Liu, Q.: ICT: system description for CoNLL-2012. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 71–75 (2012)
Xiong, H., Song, L., Meng, F., Liu, Y., Liu, Q., Lv, Y.: ETS: an error tolerable system for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 76–80 (2011)
Xu, Y., Liu, J., Wu, J., Wang, Y., Chang, E.: EHUATUO: a mention-pair coreference system by exploiting document intrinsic latent structures and world knowledge in discharge summaries (Rank 1). In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Xu, R., Xu, J., Liu, J., Liu, C., Zou, C., Gui, L., Zheng, Y., Qu, P.: Incorporating rule-based and statistic-based techniques for coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 107–112 (2012)
Yang, H., Willis, A., De Roeck, A., Nuseibeh, B.: A system for coreference resolution in clinical documents. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)
Yang, Y., Xue, N., Anick, P.: A machine learning-based coreference detection system for OntoNotes. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 117–121 (2011)
Yuan, B., Chen, Q., Xiang, Y., Wang, X., Ge, L., Liu, Z., Liao, M., Si, X.: A mixed deterministic model for coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 76–82 (2012)
Zhang, X., Wu, C., Zhao, H.: Chinese coreference resolution via ordered filtering. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 95–99 (2012)
Zhekova, D., Kübler, S.: UBIU: a language-independent system for coreference resolution. In: Proceedings of SemEval-2, Uppsala, pp. 96–99 (2010)
Zhekova, D., Kübler, S.: UBIU: a robust system for resolving unrestricted coreference. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 112–116 (2011)
Zhekova, D., Kübler, S., Bonner, J., Ragheb, M., Hsu, Y.Y.: UBIU for multilingual coreference resolution in OntoNotes. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 88–94 (2012)
Zhou, H., Li, Y., Huang, D., Zhang, Y., Wu, C., Yang, Y.: Combining syntactic and semantic features by SVM for unrestricted coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 66–70 (2011)

Acknowledgements

We would like to thank our co-organizers of SemEval-2010 Task 1 (Lluís Màrquez, Emili Sapena, M. Antònia Martí, Mariona Taulé, Véronique Hoste, Massimo Poesio, and Yannick Versley) and the CoNLL-2011/2012 Shared Tasks (Lance Ramshaw, Mitchell Marcus, Martha Palmer, Ralph Weischedel, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang), as well as the organizers of the MUC, ACE and i2b2 evaluation campaigns.

We would also like to thank all the participants. Without their hard work, patience and perseverance, these evaluations would not have happened.

The second author gratefully acknowledges the support of the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022; grants R01LM10090 from the National Library of Medicine and IIS-1219142 from the National Science Foundation; and the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant number 288024 (LiMoSINe).

Author information

Authors and Affiliations

Google Inc., Mountain View, CA, USA
Marta Recasens
Boulder Learning, Inc., Boulder, CO, USA
Sameer Pradhan

Corresponding author

Correspondence to Marta Recasens.

Editor information

Editors and Affiliations

Trento, Italy
Massimo Poesio
Frankfurt am Main, Germany
Roland Stuckardt
Heidelberg, Germany
Yannick Versley

About this chapter

Cite this chapter

Recasens, M., Pradhan, S. (2016). Evaluation Campaigns. In: Poesio, M., Stuckardt, R., Versley, Y. (eds) Anaphora Resolution. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47909-4_6

DOI: https://doi.org/10.1007/978-3-662-47909-4_6
Published: 05 August 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-47908-7
Online ISBN: 978-3-662-47909-4
eBook Packages: Computer Science, Computer Science (R0)
