Abstract
In this paper, we investigate some of the problems associated with the automatic extraction of discourse relations. In particular, we study how the communicative goals encoded in a genre influence the use of discourse relations, both from one genre to another and between the sections of documents of the same genre. Previous work has examined differences across either genres or textual organization, but none has carried out a thorough statistical analysis of these differences across currently available annotated corpora. We show that both the communicative goal of a given genre and, to a lesser extent, that of a particular topic tackled by that genre do in fact influence the distribution of discourse relations. Using a statistically grounded approach, we show that certain discourse relations are more likely to appear within given genres and, in turn, within particular sections of a genre. In particular, we observe that Attribution relations are common in newspaper articles, while Joint relations are comparatively more frequent in online reviews. We also find that Temporal relations are statistically more common in the methodology sections of scientific research documents than in the rest of the text. These results are important because they provide clues for tailoring current discourse taggers to specific textual genres.
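The abstract does not name the statistical test used. As an illustration only, the sketch below assumes a two-term log-likelihood (G2) comparison of relation frequencies, a common choice for comparing corpora. The function names and all counts are hypothetical and are not taken from the paper.

```python
import math


def log_likelihood(count_a, total_a, count_b, total_b):
    """Two-term log-likelihood (G2) for one relation in corpus A vs corpus B.

    count_* is the number of times the relation occurs in a corpus,
    total_* is the number of relation instances in that corpus.
    """
    pooled_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * pooled_rate
    expected_b = total_b * pooled_rate
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def compare(corpus_a, corpus_b):
    """Return {relation: (g2, corpus with the higher relative frequency)}."""
    total_a = sum(corpus_a.values())
    total_b = sum(corpus_b.values())
    results = {}
    for relation in sorted(set(corpus_a) | set(corpus_b)):
        a = corpus_a.get(relation, 0)
        b = corpus_b.get(relation, 0)
        g2 = log_likelihood(a, total_a, b, total_b)
        more_frequent_in = "A" if a / total_a > b / total_b else "B"
        results[relation] = (g2, more_frequent_in)
    return results


# Hypothetical relation counts, invented for illustration only.
news = {"Attribution": 300, "Joint": 100, "Temporal": 50}
reviews = {"Attribution": 60, "Joint": 250, "Temporal": 40}

results = compare(news, reviews)
for relation, (g2, side) in results.items():
    print(f"{relation}: G2 = {g2:.2f}, relatively more frequent in corpus {side}")
```

A G2 value above 3.84 corresponds to p < 0.05 for a chi-square distribution with one degree of freedom. With many relations tested at once, a stricter threshold or a multiple-comparison correction would be a reasonable precaution.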
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bachand, FH., Davoodi, E., Kosseim, L. (2014). An Investigation on the Influence of Genres and Textual Organisation on the Use of Discourse Relations. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_37
DOI: https://doi.org/10.1007/978-3-642-54906-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54905-2
Online ISBN: 978-3-642-54906-9
eBook Packages: Computer Science (R0)