
The Penn Discourse Treebank: An Annotated Corpus of Discourse Relations


Abstract

Understanding discourse relies to a great extent on correctly interpreting the relations that hold between the eventualities and facts mentioned in it. These discourse relations, such as causal, contrastive and temporal relations, can be expressed explicitly or implicitly, and are the subject of annotation in the Penn Discourse Treebank (PDTB). This chapter presents a case study of the PDTB. Starting with the main ideas behind the annotation framework, we provide a brief overview of the annotation and its representation, and describe the research and other annotation efforts that the corpus has led to. Finally, we discuss some major challenges that have arisen in annotating the PDTB, focusing in particular on two problems: characterizing and identifying, via annotation, explicit as well as implicit signals of discourse relations, and designing the overall annotation workflow.


Notes

  1. http://www.seas.upenn.edu/~pdtb.

  2. https://catalog.ldc.upenn.edu/LDC2008T05.

  3. As this example shows, annotations in the PDTB can be discontinuous. Discontinuous annotation is possible for connectives as well, such as for "on the one hand ... on the other hand".

Correspondence to Rashmi Prasad.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Prasad, R., Webber, B., Joshi, A. (2017). The Penn Discourse Treebank: An Annotated Corpus of Discourse Relations. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_45

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_45

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences (R0)
