Abstract
Understanding discourse relies to a great extent on correctly interpreting the relations that hold between the eventualities and facts mentioned in it. These discourse relations, such as causal, contrastive and temporal relations, can be expressed explicitly or implicitly, and are the subject of annotation in the Penn Discourse Treebank (PDTB). This chapter presents a case study of the PDTB. Starting with the main ideas behind the annotation framework, we provide a brief overview of the annotation and its representation, and describe the research and other annotation efforts that the corpus has led to. Finally, we discuss some major challenges that have arisen in annotating the PDTB, focusing in particular on two problems: characterizing and identifying, via annotation, explicit as well as implicit signals of discourse relations, and designing the overall annotation workflow.
Notes
As this example shows, annotations in the PDTB can be discontinuous. Discontinuous annotation is also possible for connectives, such as "on the one hand ... on the other hand".
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Prasad, R., Webber, B., Joshi, A. (2017). The Penn Discourse Treebank: An Annotated Corpus of Discourse Relations. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_45
DOI: https://doi.org/10.1007/978-94-024-0881-2_45
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences, Social Sciences (R0)