Research on Language and Computation, Volume 6, Issue 3–4, pp 333–353

Vagueness and Referential Ambiguity in a Large-Scale Annotated Corpus



In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus corroborate this point: a quantitative investigation assesses which effects or problems are likely to be the most prominent. We discuss several examples where such problems occur in more detail, and then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement. We argue that a deeper understanding of the phenomena involved makes it possible to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions.


Keywords: Coreference annotation, Vagueness, Sloppiness




  1. Asher, N. (2006). Things and their aspects. Philosophical Issues, 16(1), 1–23.
  2. Asher, N., & Lascarides, A. (2003). Logics of conversation. Cambridge: Cambridge University Press.
  3. Asher, N., & Pustejovsky, J. (2005). Word meaning and commonsense metaphysics.
  4. Burchardt, A., Erk, K., Frank, A., Kowalski, A., Padó, S., & Pinkal, M. (2006). The SALSA Corpus: A German corpus resource for lexical semantics. In Proceedings of LREC 2006.
  5. Cahill, A., McCarthy, M., van Genabith, J., & Way, A. (2002). Parsing with PCFGs and automatic F-structure annotation. In Proceedings of the Seventh International Conference on LFG. CSLI Publications.
  6. Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2), 249–254.
  7. Castaño, J., Zhang, J., & Pustejovsky, J. (2002). Anaphora resolution in biomedical literature. In International Symposium on Reference Resolution.
  8. Charniak, E., & Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005).
  9. Chiarcos, C., & Krasavina, O. (2005). PoCoS – Potsdam Coreference Scheme. Technical report, SFB 632 “Information structure: The linguistic means for structuring utterances, sentences and texts”.
  10. Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43(6), 551–558.
  11. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
  12. Dickinson, M., & Meurers, W. D. (2005). Prune diseased branches to get healthy trees! How to find erroneous local trees in a treebank and why it matters. In Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005). Barcelona, Spain.
  13. Di Eugenio, B., & Glass, M. (2004). The kappa statistic: A second look. Computational Linguistics, 30(1), 95–101.
  14. Fauconnier, G. (1984). Espaces Mentaux. Editions de Minuit.
  15. Gardent, C., & Manuélian, H. (2005). Création d’un corpus annoté pour le traitement des descriptions définies. Traitement Automatique des Langues, 46(1), 115–140.
  16. Hinrichs, E., Kübler, S., & Naumann, K. (2005). A unified representation for morphological, syntactic, semantic and referential annotations. In ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky. Ann Arbor.
  17. Hirschman, L., Robinson, P., Burger, J., & Vilain, M. (1997). Automating coreference: The role of annotated training data. In Proceedings of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing.
  18. Hobbs, J. (1985). Granularity. In Proceedings IJCAI 1985.
  19. Hockenmaier, J., & Steedman, M. (2002). Acquiring compact lexicalized grammars from a cleaner treebank. In Proceedings LREC 2002.
  20. Hoste, V., & Daelemans, W. (2004). Learning Dutch coreference resolution. In Fifteenth Computational Linguistics in the Netherlands Meeting (CLIN 2004).
  21. Hripcsak, G., & Rothschild, A. S. (2005). Agreement, the F-measure, and reliability in information retrieval. Journal of the American Medical Informatics Association, 12, 296–298.
  22. Karttunen, L. (1976). Discourse referents. In J. D. McCawley (Ed.), Syntax and semantics 7: Notes from the linguistic underground (pp. 363–385). Academic Press.
  23. Kintsch, W., & van Dijk, T. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
  24. Knees, M. (2006). The German temporal anaphor danach – ambiguity in interpretation and annotation. In ESSLLI 2006 Workshop on Ambiguity and Anaphora.
  25. Link, G. (1983). The logical analysis of plurals and mass terms: A lattice-theoretical approach. In R. Bäuerle, C. Schwarze & A. Stechow (Eds.), Meaning, use and interpretation of language. de Gruyter, NY, USA.
  26. Luo, X., Ittycheriah, A., Jing, H., Kambhatla, N., & Roukos, S. (2004). A mention-synchronous coreference resolution algorithm based on the bell tree. In ACL 2004.
  27. Magerman, D. M. (1995). Statistical decision-tree models for parsing. In ACL 1995.
  28. Mani, I. (1998). A theory of granularity and its application to problems of polysemy and underspecification of meaning. In A. G. Cohn, L. K. Schubert & S. C. Shapiro (Eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Sixth International Conference (KR’98) (pp. 245–255). Morgan Kaufmann, San Mateo, Menlo Park.
  29. McCarthy, J. F., & Lehnert, W. G. (1995). Using decision trees for coreference resolution. In IJCAI 1995 (pp. 1050–1055).
  30. Meurers, W. D. (2005). On the use of electronic corpora for theoretical linguistics. Case studies from the syntax of German. Lingua, 115(11), 1619–1639.
  31. Miyao, Y., & Tsujii, J. (2005). Probabilistic disambiguation models for wide-coverage HPSG parsing. In ACL 2005.
  32. MUC6. (1995). MUC-6 coreference task definition. DARPA Information Technology Office Tipster Text Program.
  33. Passonneau, R. (1997). Applying reliability metrics to co-reference annotation. Technical Report CUCS-025-03, Columbia University.
  34. Poesio, M. (2000). The GNOME annotation scheme manual. Technical report, University of Edinburgh, HCRC and Informatics.
  35. Poesio, M. (2004). The MATE/GNOME scheme for anaphoric annotation, revisited. In Proceedings of SIGDIAL’04. Boston.
  36. Poesio, M., & Artstein, R. (2005). Annotating (anaphoric) ambiguity. In Corpus Linguistics 2005. Birmingham.
  37. Poesio, M., & Reyle, U. (2001). Underspecification in anaphoric reference. In Fourth International Workshop on Computational Semantics (IWCS-4).
  38. Poesio, M., Reyle, U., & Stevenson, R. (2003). Justified sloppiness in anaphoric reference. In H. Bunt & R. Muskens (Eds.), Computing meaning 3. Dordrecht: Kluwer. (To appear).
  39. Poesio, M., Sturt, P., Artstein, R., & Filik, R. (2006). Underspecification and anaphora: Theoretical issues and preliminary evidence. Discourse Processes, 42(2), 152–175.
  40. Reitsma, F., & Bittner, T. (2003). Process, hierarchy and scale. In W. Kuhn, M. Worboys & S. Timpf (Eds.), Spatial information theory. Cognitive and computational foundations of geographic information science (COSIT’03).
  41. Setzer, A., & Gaizauskas, R. (2001). A pilot study on annotating temporal relations in text. In ACL 2001 Workshop on Temporal and Spatial Information Processing.
  42. Smith, B., & Brogaard, B. (2001). A unified theory of truth and reference. Logique et Analyse, 43(169–170), 49–93.
  43. Strassel, S., Walker, C., & Mitchell, A. (2004). Annotation consistency study. Slides found at
  44. Uryupina, O. (2006). Coreference resolution with and without linguistic knowledge. In Proceedings of LREC 2006.
  45. van Deemter, K., & Kibble, R. (2000). On coreferring: Coreference in MUC and related annotation schemes. Computational Linguistics, 26(4), 629–637.
  46. van Rijsbergen, C. J. K. (1979). Information retrieval. Butterworths.
  47. Vilain, M., Burger, J., Aberdeen, J., Connolly, D., & Hirschman, L. (1995). A model-theoretic coreference scoring scheme. In Proceedings of the 6th Message Understanding Conference.
  48. Zaenen, A., Carletta, J., Garretson, G., Bresnan, J., Koontz-Garboden, A., Nikitina, T., O’Connor, M. C., & Wasow, T. (2004). Animacy encoding in English: Why and how. In ACL 2004 Workshop on Discourse Annotation.

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. Collaborative Research Center 441 “Linguistic Data Structures”, University of Tübingen, Tübingen, Germany
