Discourse Relations and Document Structure

Linguistic Modeling of Information and Markup Languages

Part of the book series: Text, Speech and Language Technology (TLTB, volume 41)

Abstract

This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that, besides the lexical and grammatical discourse markers traditionally employed in discourse parsing, cues derived from the logical and generic document structure and from the thematic structure of a text must be taken into account. An approach is presented for modelling these types of linguistic information as XML-based multi-layer annotations, together with a text-technological representation of additional knowledge sources. Cues and constraints for automatic discourse analysis can be derived by means of quantitative and qualitative corpus analyses. Furthermore, the proposed representations serve as the input sources for discourse parsing. A short overview of the projected parsing architecture is given.
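The idea of deriving discourse cues from logical document structure in a multi-layer annotation can be illustrated with a small sketch. It assumes (this is not taken from the chapter) that each annotation layer is a separate XML document over the identical primary text, so that elements from different layers can be aligned by character offsets. The element names `doc`, `sec`, `p`, `segs`, `seg` and the functions `element_spans`, `enclosing`, and `paragraph_cues` are hypothetical. The derived cue is whether two adjacent discourse segments fall inside the same paragraph.

```python
import xml.etree.ElementTree as ET


def primary_text(xml_string):
    """All character data of one layer, concatenated in document order."""
    return "".join(ET.fromstring(xml_string).itertext())


def element_spans(xml_string):
    """Return (tag, start, end) character spans for all elements of one layer.

    Offsets index into the layer's primary text (see primary_text)."""
    root = ET.fromstring(xml_string)
    spans = []
    pos = 0

    def walk(elem):
        nonlocal pos
        start = pos
        pos += len(elem.text or "")
        for child in elem:
            walk(child)
            pos += len(child.tail or "")  # text after the child, still inside elem
        spans.append((elem.tag, start, pos))

    walk(root)
    return spans


def enclosing(span, layer_spans, tag):
    """Smallest element with the given tag in another layer that contains span."""
    _, start, end = span
    hits = [s for s in layer_spans if s[0] == tag and s[1] <= start and end <= s[2]]
    return min(hits, key=lambda s: s[2] - s[1]) if hits else None


def paragraph_cues(seg_layer, doc_layer):
    """For each pair of adjacent segments: True if both lie in the same <p>."""
    if primary_text(seg_layer) != primary_text(doc_layer):
        raise ValueError("layers must annotate identical primary text")
    doc_spans = element_spans(doc_layer)
    segs = sorted((s for s in element_spans(seg_layer) if s[0] == "seg"),
                  key=lambda s: s[1])
    cues = []
    for left, right in zip(segs, segs[1:]):
        p_left = enclosing(left, doc_spans, "p")
        p_right = enclosing(right, doc_spans, "p")
        cues.append(p_left is not None and p_left == p_right)
    return cues


# Hypothetical two-layer example: same primary text, different markup.
DOC_LAYER = ("<doc><sec><p>We tested the parser. It failed on long texts.</p></sec> "
             "<sec><p>Therefore we added cues.</p></sec></doc>")
SEG_LAYER = ("<segs><seg>We tested the parser.</seg> <seg>It failed on long texts.</seg> "
             "<seg>Therefore we added cues.</seg></segs>")

print(paragraph_cues(SEG_LAYER, DOC_LAYER))  # [True, False]
```

The boolean is only one candidate cue; in a parser it would presumably be combined with lexical discourse markers such as *Therefore* in the third segment, and how the two kinds of evidence are weighted is left open in this sketch.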

Author information

Correspondence to Harald Lüngen.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Lüngen, H., Bärenfänger, M., Hilbert, M., Lobin, H., Puskás, C. (2010). Discourse Relations and Document Structure. In: Witt, A., Metzing, D. (eds) Linguistic Modeling of Information and Markup Languages. Text, Speech and Language Technology, vol 41. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3331-4_6

  • DOI: https://doi.org/10.1007/978-90-481-3331-4_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-3330-7

  • Online ISBN: 978-90-481-3331-4

  • eBook Packages: Computer Science; Computer Science (R0)
