Vagueness and Referential Ambiguity in a Large-Scale Annotated Corpus

Versley, Yannick

doi:10.1007/s11168-008-9059-1

Published: 24 December 2008

Volume 6, pages 333–353 (2008)
Research on Language and Computation

Abstract

In this paper, we argue that difficulties in defining coreference itself contribute to lower inter-annotator agreement in certain cases. We corroborate this point with data from a large referentially annotated corpus, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. We discuss several examples of such problems in more detail and then propose a generalisation of Poesio, Reyle and Stevenson's (2003) Justified Sloppiness Hypothesis that provides a unified model for these cases of disagreement. Finally, we argue that a deeper understanding of the phenomena involved allows problematic cases to be tackled in a more principled fashion than would be possible using only pre-theoretic intuitions.
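The abstract does not specify how inter-annotator agreement on coreference is measured. As an illustration only, the following minimal Python sketch scores agreement between two annotators' coreference chains with the model-theoretic MUC score of Vilain et al. (1995), treating each annotation in turn as the key and combining both directions into an F-measure, which Hripcsak and Rothschild (2005) discuss as an agreement measure. The representation of chains as sets of mention IDs and the function names `muc_recall` and `muc_agreement` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Set

# Hypothetical representation: a chain is a set of mention IDs.
Chain = Set[Hashable]


def muc_recall(key: List[Chain], response: List[Chain]) -> float:
    """MUC recall of `response` against `key` (Vilain et al., 1995).

    For each key chain S, count |S| - |p(S)|, where p(S) is the partition of S
    induced by the response chains; mentions missing from every response
    chain form singleton parts.
    """
    # Map each mention to the index of the response chain that contains it.
    owner: Dict[Hashable, int] = {}
    for idx, chain in enumerate(response):
        for mention in chain:
            owner[mention] = idx

    correct_links = 0
    key_links = 0
    for chain in key:
        # Unaligned mentions get a unique tuple label, so each is its own part.
        parts = {owner.get(m, ("unaligned", m)) for m in chain}
        correct_links += len(chain) - len(parts)
        key_links += len(chain) - 1
    return correct_links / key_links if key_links else 0.0


def muc_agreement(a: List[Chain], b: List[Chain]) -> float:
    """Symmetric MUC F-measure between two annotators' chains."""
    recall = muc_recall(a, b)     # annotator a treated as key
    precision = muc_recall(b, a)  # annotator b treated as key
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    annotator_a = [{1, 2, 3}, {4, 5}]
    annotator_b = [{1, 2}, {3, 4, 5}]
    print(f"{muc_agreement(annotator_a, annotator_b):.3f}")  # prints 0.667
```

Because swapping the two annotations swaps precision and recall, the resulting F-measure is symmetric, so neither annotator has to be designated as the gold standard.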

References

Asher, N. (2006). Things and their aspects. Philosophical Issues, 16(1), 1–23.
Asher, N., & Lascarides, A. (2003). Logics of conversation. Cambridge: Cambridge University Press.
Asher, N., & Pustejovsky, J. (2005). Word meaning and commonsense metaphysics. http://semanticsarchive.net/Archive/TgxMDNkM/.
Burchardt, A., Erk, K., Frank, A., Kowalski, A., Padó, S., & Pinkal, M. (2006). The SALSA Corpus: A German corpus resource for lexical semantics. In Proceedings of LREC 2006.
Cahill, A., McCarthy, M., van Genabith, J., & Way, A. (2002). Parsing with PCFGs and automatic F-structure annotation. In Proceedings of the Seventh International Conference on LFG. CSLI Publications.
Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2), 249–254.
Castaño, J., Zhang, J., & Pustejovsky, J. (2002). Anaphora resolution in biomedical literature. In International Symposium on Reference Resolution.
Charniak, E., & Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005).
Chiarcos, C., & Krasavina, O. (2005). PoCoS—Potsdam Coreference Scheme. Technical report, SFB 632 “Information structure: The linguistic means for structuring utterances, sentences and texts”.
Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43(6), 551–558.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Dickinson, M., & Meurers, W. D. (2005). Prune diseased branches to get healthy trees! How to find erroneous local trees in a treebank and why it matters. In Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005). Barcelona, Spain.
Di Eugenio, B., & Glass, M. (2004). The kappa statistic: A second look. Computational Linguistics, 30(1), 95–101.
Fauconnier, G. (1984). Espaces Mentaux. Editions de Minuit.
Gardent, C., & Manuélian, H. (2005). Création d'un corpus annoté pour le traitement des descriptions définies. Traitement Automatique des Langues, 46(1), 115–140.
Hinrichs, E., Kübler, S., & Naumann, K. (2005). A unified representation for morphological, syntactic, semantic and referential annotations. In ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky. Ann Arbor.
Hirschman, L., Robinson, P., Burger, J., & Vilain, M. (1997). Automating coreference: The role of automated training data. In Proceedings of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing.
Hobbs, J. (1985). Granularity. In Proceedings of IJCAI 1985.
Hockenmaier, J., & Steedman, M. (2002). Acquiring compact lexicalized grammars from a cleaner treebank. In Proceedings of LREC 2002.
Hoste, V., & Daelemans, W. (2004). Learning Dutch coreference resolution. In Fifteenth Computational Linguistics in the Netherlands Meeting (CLIN 2004).
Hripcsak, G., & Rothschild, A. S. (2005). Agreement, the F-measure, and reliability in information retrieval. Journal of the American Medical Informatics Association, 12, 296–298.
Karttunen, L. (1976). Discourse Referents. In J. D. McCawley (Ed.), Syntax and semantics 7: Notes from the linguistic underground (pp. 363–385). Academic Press.
Kintsch, W., & van Dijk, T. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Knees, M. (2006). The German temporal anaphor danach—ambiguity in interpretation and annotation. In ESSLLI 2006 workshop on Ambiguity and Anaphora.
Link, G. (1983). The logical analysis of plurals and mass terms: A lattice-theoretical approach. In R. Bäuerle, C. Schwarze, & A. von Stechow (Eds.), Meaning, use and interpretation of language. New York: de Gruyter.
Luo, X., Ittycheriah, A., Jing, H., Kambhatla, N., & Roukos, S. (2004). A mention-synchronous coreference resolution algorithm based on the bell tree. In ACL 2004.
Magerman, D. M. (1995). Statistical decision-tree models for parsing. In ACL 1995.
Mani, I. (1998). A theory of granularity and its application to problems of polysemy and underspecification of meaning. In A. G. Cohn, L. K. Schubert, & S. C. Shapiro (Eds.), Principles of knowledge representation and reasoning: Proceedings of the Sixth International Conference (KR'98) (pp. 245–255). San Mateo, Menlo Park: Morgan Kaufmann.
McCarthy, J. F., & Lehnert, W. G. (1995). Using decision trees for coreference resolution. In IJCAI 1995 (pp. 1050–1055).
Meurers, W. D. (2005). On the use of electronic corpora for theoretical linguistics: Case studies from the syntax of German. Lingua, 115(11), 1619–1639.
Miyao, Y., & Tsujii, J. (2005). Probabilistic disambiguation models for wide-coverage HPSG parsing. In ACL 2005.
MUC6. (1995). MUC-6 coreference task definition. DARPA Information Technology Office Tipster Text Program.
Passonneau, R. (1997). Applying reliability metrics to co-reference annotation. Technical Report CUCS-025-03, Columbia University.
Poesio, M. (2000). The GNOME annotation scheme manual. Technical report, University of Edinburgh, HCRC and Informatics. http://www.hcrc.ed.ac.uk/~gnome.
Poesio, M. (2004). The MATE/GNOME scheme for anaphoric annotation, revisited. In Proceedings of SIGDIAL’04. Boston.
Poesio, M., & Artstein, R. (2005). Annotating (Anaphoric) ambiguity. In Corpus Linguistics 2005. Birmingham.
Poesio, M., & Reyle, U. (2001). Underspecification in anaphoric reference. In Fourth International Workshop on Computational Semantics (IWCS-4).
Poesio, M., Reyle, U., & Stevenson, R. (2003). Justified sloppiness in anaphoric reference. In H. Bunt & R. Muskens (Eds.), Computing meaning 3. Dordrecht: Kluwer. (To appear).
Poesio, M., Sturt, P., Artstein, R., & Filik, R. (2006). Underspecification and anaphora: Theoretical issues and preliminary evidence. Discourse Processes, 42(2), 152–175.
Reitsma, F., & Bittner, T. (2003). Process, hierarchy and scale. In W. Kuhn, M. Worboys & S. Timpf (Eds.), Spatial information theory. Cognitive and computational foundations of geographic information science (COSIT’03).
Setzer, A., & Gaizauskas, R. (2001). A pilot study on annotating temporal relations in text. In ACL 2001 Workshop on Temporal and Spatial Information Processing.
Smith, B., & Brogaard, B. (2001). A unified theory of truth and reference. Logique et Analyse, 43(169–170), 49–93.
Strassel, S., Walker, C., & Mitchell, A. (2004). Annotation consistency study. Slides found at http://projects.ldc.upenn.edu/ace/workshops/Feb2004.html.
Uryupina, O. (2006). Coreference resolution with and without linguistic knowledge. In Proceedings of LREC 2006.
van Deemter, K., & Kibble, R. (2000). On coreferring: Coreference in MUC and related annotation schemes. Computational Linguistics, 26(4), 629–637.
van Rijsbergen, C. J. K. (1979). Information retrieval. Butterworths.
Vilain, M., Burger, J., Aberdeen, J., Connolly, D., & Hirschman, L. (1995). A model-theoretic coreference scoring scheme. In Proceedings of the 6th Message Understanding Conference.
Zaenen, A., Carletta, J., Garretson, G., Bresnan, J., Koontz-Garboden, A., Nikitina, T., O'Connor, M. C., & Wasow, T. (2004). Animacy encoding in English: Why and how. In ACL 2004 Workshop on Discourse Annotation.

Author information

Authors and Affiliations

Collaborative Research Center 441 “Linguistic Data Structures”, University of Tübingen, Tübingen, Germany
Yannick Versley

Corresponding author

Correspondence to Yannick Versley.

About this article

Cite this article

Versley, Y. Vagueness and Referential Ambiguity in a Large-Scale Annotated Corpus. Res on Lang and Comput 6, 333–353 (2008). https://doi.org/10.1007/s11168-008-9059-1

Received: 19 October 2006
Revised: 30 January 2008
Accepted: 07 February 2008
Published: 24 December 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s11168-008-9059-1
