Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5320)


Abstract

The overall goal is to discuss some issues concerning dependencies at the discourse level and at the sentence level. First, however, I will briefly describe the Penn Discourse Treebank (PDTB), a corpus in which we annotate discourse connectives (explicit and implicit) and their arguments, together with “attributions” of the arguments and of the relations denoted by the connectives, as well as the senses of the connectives. I will then focus on the complexity of dependencies in terms of (a) the elements that bear the dependency relations, (b) graph-theoretic properties of these dependencies, such as nested and crossed dependencies and dependencies with shared arguments, and (c) attributions and their relationship to the dependencies, among others. I will compare these dependencies with those at the sentence level, discuss some issues that relate to the transition from the sentence level to the level of “immediate discourse”, and propose some conjectures.

An increasing interest in moving human language technology beyond the level of the sentence in text summarization, question answering, and natural language generation, among others, has recently led to the development of several resources that are richly annotated at the discourse level. Among these is the Penn Discourse Treebank (PDTB), a large-scale resource of annotated discourse relations and their arguments over the one-million-word Wall Street Journal (WSJ) Corpus. Since the sentence-level syntactic annotations of the Penn Treebank [2] and the predicate-argument annotations of the PropBank [4] have been done over the same target corpus, the PDTB provides a richer substrate for the development and evaluation of practical algorithms, while supporting the extraction of useful features pertaining to syntax, semantics and discourse all at once. The PDTB is the first to follow a lexically grounded approach to the annotation of discourse relations. Discourse relations, when realized explicitly in the text, are annotated by marking the lexical items, called discourse connectives, that express them, thus supporting their automatic identification.

The PDTB adopts a theory-neutral approach to the annotation, making no commitments to what kinds of high-level structures may be created from the low-level annotations of relations and their arguments. This approach has the appeal of allowing the corpus to be useful for researchers working within different frameworks. This theory neutrality also permits investigation of the general question of how structure at the sentence level relates to structure at the discourse level, at least for that part of the discourse structure that is “parallel” to the sentence structure [6]. In addition to the argument structure of discourse relations, the PDTB provides sense labels for each relation following a hierarchical classification scheme. Annotation of senses highlights the polysemy of connectives, making the PDTB useful for sense disambiguation tasks [3]. Finally, the PDTB separately annotates the attribution of each discourse relation and of each of its two arguments. While attribution is a relation between agents and abstract objects and thus not a discourse relation, it has been annotated in the PDTB because (a) it is useful for applications such as subjectivity analysis and multi-perspective QA [5], and (b) it exhibits an interesting and complex interaction between sentence-level structure and discourse structure [1]. The first preliminary release of the PDTB was in April 2006. A significantly extended version was released as PDTB-2.0 in February 2008 through the Linguistic Data Consortium (LDC). See http://www.seas.upenn.edu/ pdtb for the annotation manual, published papers, tutorial slides and a link to LDC.
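
To make the annotation layers described above more concrete, the following is a minimal sketch of how one might hold a PDTB-style relation in memory (connective, two arguments, sense, attribution) and classify a pair of relations as sharing an argument, nested, crossed, or disjoint. Everything here is an illustrative assumption: the names `Span`, `Relation` and `classify_pair`, the token-offset encoding, the example sense and attribution values, and the extent-based test for nesting and crossing are not the PDTB file format, its label inventory, or the analysis used in the cited work.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical encoding: an argument is a half-open token range [start, end).
Span = Tuple[int, int]


@dataclass(frozen=True)
class Relation:
    """Simplified, assumed record for one annotated discourse relation."""
    connective: Optional[str]      # None stands for an implicit connective
    arg1: Span
    arg2: Span
    sense: str                     # illustrative label from a hierarchical scheme
    attribution: str = "Writer"    # illustrative attribution source

    def extent(self) -> Span:
        """Smallest token range that covers both arguments."""
        return (min(self.arg1[0], self.arg2[0]),
                max(self.arg1[1], self.arg2[1]))


def classify_pair(a: Relation, b: Relation) -> str:
    """Classify two relations by how their argument extents interact.

    This is a deliberate simplification: real arguments can be
    discontinuous, which a single (start, end) pair cannot express.
    """
    # Same span used as an argument of both relations.
    if {a.arg1, a.arg2} & {b.arg1, b.arg2}:
        return "shared"
    (a0, a1), (b0, b1) = a.extent(), b.extent()
    # No overlap between the two extents.
    if a1 <= b0 or b1 <= a0:
        return "disjoint"
    # One extent lies entirely inside the other.
    if (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1):
        return "nested"
    # Partial overlap: the dependencies cross.
    return "crossed"


r1 = Relation("but", (0, 5), (5, 10), "Comparison.Contrast")
r2 = Relation("because", (5, 10), (10, 15), "Contingency.Cause")
r3 = Relation(None, (6, 8), (8, 10), "Expansion.Conjunction")
r4 = Relation("then", (3, 7), (12, 14), "Temporal.Asynchronous")

for other in (r2, r3, r4):
    print(other.connective, classify_pair(r1, other))
```

Keeping the sense and the attribution as separate fields mirrors the point made above that attribution is annotated separately from the discourse relation itself; a fuller model would also attach an attribution to each argument.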


References

  1. Dinesh, N., Lee, A., Miltsakaki, E., Prasad, R., Joshi, A., Webber, B.: Attribution and the (non)-alignment of syntactic and discourse arguments of connectives. In: Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, Ann Arbor, Michigan (2005)

  2. Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19(2), 313–330 (1993)

  3. Miltsakaki, E., Dinesh, N., Prasad, R., Joshi, A., Webber, B.: Experiments on sense annotation and sense disambiguation of discourse connectives. In: Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), Barcelona, Spain (2005)

  4. Palmer, M., Gildea, D., Kingsbury, P.: The Proposition Bank: an annotated corpus of semantic roles. Computational Linguistics 31(1), 71–106 (2005)

  5. Prasad, R., Dinesh, N., Lee, A., Joshi, A., Webber, B.: Annotating attribution in the Penn Discourse Treebank. In: Proceedings of the COLING/ACL Workshop on Sentiment and Subjectivity in Text, pp. 31–38 (2006)

  6. Lee, A., Prasad, R., Joshi, A., Dinesh, N., Webber, B.: Complexity of Dependencies in Discourse: Are Dependencies in Discourse More Complex than in Syntax? In: Proceedings of the 5th International Workshop on Treebanks and Linguistic Theories, Prague, Czech Republic (December 2006)


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Joshi, A.K. (2008). Towards Discourse Meaning. In: Paech, B., Martell, C. (eds) Innovations for Requirement Analysis. From Stakeholders’ Needs to Formal Designs. Monterey Workshop 2007. Lecture Notes in Computer Science, vol 5320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89778-1_2


  • DOI: https://doi.org/10.1007/978-3-540-89778-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89777-4

  • Online ISBN: 978-3-540-89778-1

  • eBook Packages: Computer Science (R0)
