Language Resources and Evaluation, Volume 52, Issue 1, pp 149–184

RST Signalling Corpus: a corpus of signals of coherence relations

  • Debopam Das
  • Maite Taboada
Original Paper

Abstract

We present the RST Signalling Corpus (Das et al. in RST signalling corpus, LDC2015T10. https://catalog.ldc.upenn.edu/LDC2015T10, 2015), a corpus annotated for signals of coherence relations. The corpus is built on the RST Discourse Treebank (Carlson et al. in RST Discourse Treebank, LDC2002T07. https://catalog.ldc.upenn.edu/LDC2002T07, 2002), which is annotated for coherence relations. In the RST Signalling Corpus, these relations are further annotated with signalling information. The corpus includes annotation not only for discourse markers, which are considered to be the most typical (or sometimes the only) type of signal in discourse, but also for a wide array of other signals, such as reference, lexical, semantic, syntactic, graphical and genre features, as potential indicators of coherence relations. We describe the research underlying the development of the corpus and the annotation process, and provide details of the corpus. We also present the results of an inter-annotator agreement study, illustrating the validity and reproducibility of the annotation. The corpus is available through the Linguistic Data Consortium, and can be used to investigate the psycholinguistic mechanisms behind the interpretation of relations through signalling, and also to develop discourse-specific computational systems such as discourse parsing applications.

Keywords

RST Signalling Corpus, RST Discourse Treebank, Coherence relations, Rhetorical Structure Theory, Signals, Discourse markers

Notes

Acknowledgements

We are greatly indebted to the late Dr. Paul McFetridge for his invaluable contribution to this work. Dr. McFetridge was the Senior Supervisor of Debopam Das’ Ph.D. dissertation (Das 2014). Sadly, he passed away on March 14, 2014, only a few months before the completion of the final version of the RST Signalling Corpus. He was a major driving force and a source of constant support for our work. Funding for this research was provided by the Natural Sciences and Engineering Research Council of Canada (Discovery Grant 261104-2008).

References

  1. Afantenos, S., Asher, N., Benamara, F., Bras, M., Fabre, C., & Ho-Dac, M., et al. (2012). An empirical resource for discovering cognitive principles of discourse organization: the ANNODIS corpus. Paper presented at the 8th international conference on language resources and evaluation (LREC 2012), Istanbul, Turkey.
  2. Alonso, L., Castellón, I., Gibert, K., & Padró, L. (2002). An empirical approach to discourse markers by clustering. In M. T. Escrig, F. Toledo, & E. Golobardes (Eds.), Topics in artificial intelligence (Vol. 2504, pp. 173–183). Berlin: Springer.
  3. Al-Saif, A., & Markert, K. (2010). The Leeds Arabic Discourse Treebank: Annotating discourse connectives for Arabic. Paper presented at the 7th International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta.
  4. Bateman, J., Kamps, T., Kleinz, J., & Reichenberger, K. (2001). Towards constructive text, diagram, and layout generation for information presentation. Computational Linguistics, 27(3), 409–449.
  5. Berzlánovich, I., & Redeker, G. (2012). Genre-dependent interaction of coherence and lexical cohesion in written discourse. Corpus Linguistics and Linguistic Theory, 8(1), 183–208.
  6. Blakemore, D. (1987). Semantic constraints on relevance. Oxford: Blackwell.
  7. Blakemore, D. (1992). Understanding utterances: An introduction to pragmatics. Oxford: Blackwell.
  8. Blakemore, D. (2002). Relevance and linguistic meaning: The semantics and pragmatics of discourse markers. Cambridge: Cambridge University Press.
  9. Cain, K., & Nash, H. M. (2011). The influence of connectives on young readers’ processing and comprehension of text. Journal of Educational Psychology, 103(2), 429–441.
  10. Carlson, L., & Marcu, D. (2001). Discourse tagging manual. Los Angeles: University of Southern California.
  11. Carlson, L., Marcu, D., & Okurowski, M. E. (2002). RST Discourse Treebank, LDC2002T07. https://catalog.ldc.upenn.edu/LDC2002T07.
  12. Cevasco, J. (2009). The role of connectives in the comprehension of spontaneous spoken discourse. The Spanish Journal of Psychology, 12(1), 56–65.
  13. Corston-Oliver, S. (1998). Beyond string matching and cue phrases: Improving efficiency and coverage in discourse analysis. Paper presented at the AAAI 1998 spring symposium series, intelligent text summarization, Madison, Wisconsin.
  14. da Cunha, I., Juan, E. S., Torres-Moreno, J. M., Cabré, M. T., & Sierra, G. (2012). A symbolic approach for automatic detection of nuclearity and rhetorical relations among intra-sentence discourse segments in Spanish. Paper presented at CICLing, New Delhi, India.
  15. da Cunha, I., Torres-Moreno, J.-M., & Sierra, G. (2011). On the development of the RST Spanish Treebank. Paper presented at the 5th linguistic annotation workshop, 49th annual meeting of the association for computational linguistics (ACL), Portland, OR.
  16. Dale, R. (1991a). Exploring the role of punctuation in the signalling of discourse structure. Paper presented at the workshop on text representation and domain modeling: Ideas from linguistics and AI. Technical University of Berlin.
  17. Dale, R. (1991b). The role of punctuation in discourse structure. Paper presented at the AAAI fall symposium on discourse structure in natural language understanding and generation, Asilomar, CA.
  18. Dancygier, B., & Sweetser, E. (2005). Mental spaces in grammar: Conditional constructions. Cambridge: Cambridge University Press.
  19. Das, D. (2012). Investigating the role of discourse markers in signalling coherence relations: A corpus study. Paper presented at the Northwest linguistics conference. Seattle: University of Washington.
  20. Das, D. (2014). Signalling of coherence relations in discourse. Ph.D. dissertation. Burnaby: Simon Fraser University.
  21. Das, D., & Taboada, M. (2013). Explicit and implicit coherence relations: A corpus study. Paper presented at the Canadian linguistic association (CLA) conference. Victoria: University of Victoria.
  22. Das, D., Taboada, M., & McFetridge, P. (2015). RST signalling corpus, LDC2015T10. https://catalog.ldc.upenn.edu/LDC2015T10.
  23. Degand, L., & Sanders, T. (2002). The impact of relational markers on expository text comprehension in L1 and L2. Reading and Writing, 15(7–8), 739–758.
  24. Derczynski, L., & Gaizauskas, R. (2013). Temporal signals help label temporal relations. Paper presented at the annual meeting of the association for computational linguistics, ACL, Sofia, Bulgaria.
  25. Dipper, S., Götze, M., & Stede, M. (2004). Simple annotation tools for complex annotation tasks: An evaluation. Paper presented at the LREC workshop on XML-based richly annotated corpora, Lisbon, Portugal.
  26. Duque, E. (2014). Signaling causal coherence relations. Discourse Studies, 16(1), 25–46.
  27. Feng, V. W., & Hirst, G. (2012). Text-level discourse parsing with rich linguistic features. Paper presented at the 50th annual meeting of the association for computational linguistics.
  28. Feng, V. W., & Hirst, G. (2014). A linear-time bottom-up discourse parser with constraints and post-editing. Paper presented at the 52nd annual meeting of the association for computational linguistics (ACL-2014), Baltimore, USA.
  29. Fraser, B. (1990). An approach to discourse markers. Journal of Pragmatics, 14, 383–395.
  30. Fraser, B. (1999). What are discourse markers? Journal of Pragmatics, 31, 931–953.
  31. Fraser, B. (2006). Towards a theory of discourse markers. In K. Fischer (Ed.), Approaches to discourse particles (pp. 189–204). Amsterdam: Elsevier Press.
  32. Fraser, B. (2009). An account of discourse markers. International Review of Pragmatics, 1, 293–320.
  33. Haberlandt, K. (1982). Reader expectations in text comprehension. In J.-F. Le Ny & W. Kintsch (Eds.), Language and comprehension (pp. 239–249). Amsterdam: North-Holland.
  34. Halliday, M., & Hasan, R. (1976). Cohesion in English. London: Longman.
  35. Hernault, H., Bollegala, D., & Ishizuka, M. (2011). Semi-supervised discourse relation classification with structural learning. Paper presented at the 12th international conference on computational linguistics and intelligent text processing (CICLing ’11), Tokyo, Japan.
  36. Hernault, H., Prendinger, H., duVerle, D. A., & Ishizuka, M. (2010). HILDA: A discourse parser using support vector machine classification. Dialogue and Discourse, 1(3), 1–33.
  37. Kamalski, J. (2007). Coherence marking, comprehension and persuasion: On the processing and representation of discourse. Utrecht: LOT.
  38. Knott, A. (1996). A data-driven methodology for motivating a set of coherence relations. Ph.D. dissertation. Edinburgh: University of Edinburgh.
  39. Knott, A., & Dale, R. (1994). Using linguistic phenomena to motivate a set of coherence relations. Discourse Processes, 18(1), 35–62.
  40. Knott, A., & Sanders, T. (1998). The classification of coherence relations and their linguistic markers: An exploration of two languages. Journal of Pragmatics, 30, 135–175.
  41. Kolachina, S., Prasad, R., Misra Sharma, D., & Joshi, A. (2012). Evaluation of discourse relation annotation in the Hindi Discourse Treebank. Paper presented at the 8th international conference on language resources and evaluation (LREC 2012), Istanbul, Turkey.
  42. Lapata, M., & Lascarides, A. (2004). Inferring sentence-internal temporal relations. Paper presented at the North American chapter of the association of computational linguistics.
  43. Le Thanh, H. (2007). An approach in automatically generating discourse structure of text. Journal of Computer Science and Cybernetics, 23(3), 212–230.
  44. Lin, Z., Kan, M.-Y., & Ng, H. T. (2009). Recognizing implicit discourse relations in the Penn Discourse Treebank. Paper presented at the 2009 conference on empirical methods in natural language processing, Singapore.
  45. Louis, A., Joshi, A., Prasad, R., & Nenkova, A. (2010). Using entity features to classify implicit discourse relations. Paper presented at the 11th annual meeting of the special interest group on discourse and dialogue, SIGDIAL’10.
  46. Mak, W. M., & Sanders, T. J. M. (2013). The role of causality in discourse processing: Effects on expectation and coherence relations. Language and Cognitive Processes, 28(9), 1414–1437.
  47. Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281.
  48. Marcu, D. (1999). A decision-based approach to rhetorical parsing. Paper presented at the 37th annual meeting of the association for computational linguistics on computational linguistics, College Park, Maryland.
  49. Marcu, D. (2000). The rhetorical parsing of unrestricted texts: A surface based approach. Computational Linguistics, 26(3), 395–448.
  50. Marcu, D., & Echihabi, A. (2002). An unsupervised approach to recognising discourse relations. Paper presented at the 40th annual meeting of the association for computational linguistics (ACL’02), Philadelphia, PA.
  51. Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
  52. Martin, J. R. (1992). English text: System and structure. Amsterdam: John Benjamins.
  53. Matthiessen, C. M. I. M. (2015). Register in the round: Registerial cartography. Functional Linguistics, 2(9), 1–48.
  54. Matthiessen, C. M. I. M., & Teruya, K. (2015). Grammatical realizations of rhetorical relations in different registers. Word, 61(3), 232–281.
  55. Maziero, E. G., Pardo, T. A. S., da Cunha, I., Torres-Moreno, J.-M., & SanJuan, E. (2011). DiZer 2.0—An Adaptable on-line discourse parser. Paper presented at the III RST meeting (8th Brazilian symposium in information and human language technology), Cuiaba, MT, Brazil.
  56. Meyer, B. J. F. (1975). The organization of prose and its effects on memory. Amsterdam: North-Holland.
  57. Meyer, T., & Webber, B. (2013). Implicitation of discourse connectives in (machine) translation. Paper presented at the 1st DiscoMT workshop at ACL 2013 (51st annual meeting of the association for computational linguistics), Sofia, Bulgaria.
  58. Millis, K. K., & Just, M. A. (1994). The influence of connectives on sentence comprehension. Journal of Memory and Language, 33, 128–147.
  59. Mithun, S., & Kosseim, L. (2011). Comparing approaches to tag discourse relations. Paper presented at the 12th international conference on computational linguistics and intelligent text processing (CICLing’11), Tokyo, Japan.
  60. Mladová, L., Zikánová, Š., & Hajičova, E. (2008). From sentence to discourse: Building an annotation scheme for discourse based on Prague Dependency Treebank. Paper presented at the 6th international conference on language resources and evaluation (LREC 2008), Marrakech, Morocco.
  61. Mulder, G. (2008). Understanding causal coherence relations. Ph.D. dissertation. Utrecht: Utrecht University.
  62. Mulder, G., & Sanders, T. J. M. (2012). Causal coherence relations and levels of discourse representation. Discourse Processes, 49(6), 501–522.
  63. Murray, J. D. (1995). Logical connectives and local coherence. In J. R. F. Lorch & E. J. O’Brien (Eds.), Sources of coherence in reading (pp. 107–125). Hillsdale, NJ: Lawrence Erlbaum.
  64. O’Donnell, M. (1997). RSTTool. http://www.wagsoft.com/RSTTool/.
  65. O’Donnell, M. (2008). The UAM CorpusTool: Software for corpus annotation and exploration. Paper presented at the XXVI Congreso de AESLA, Almeria, Spain.
  66. Pardo, T. A. S., & Nunes, M. D. G. V. (2008). On the development and evaluation of a Brazilian Portuguese discourse parser. Journal of Theoretical and Applied Computing, 15(2), 43–64.
  67. Pitler, E., Louis, A., & Nenkova, A. (2009). Automatic sense prediction for implicit discourse relations in text. Paper presented at the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP, Singapore.
  68. Polanyi, L., Culy, C., van den Berg, M., Thione, G. L., & Ahn, D. (2004). A rule based approach to discourse parsing. Paper presented at the 5th SIGdial workshop on discourse and dialogue. Cambridge, MA: ACL.
  69. Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A., & Webber, B. (2008). The Penn Discourse Treebank 2.0. Paper presented at the 6th international conference on language resources and evaluation (LREC 2008), Marrakech, Morocco.
  70. Prasad, R., Joshi, A., & Webber, B. (2010). Realization of discourse relations by other means: Alternative lexicalizations. Paper presented at the 23rd international conference on computational linguistics, Beijing.
  71. Prasad, R., Miltsakaki, E., Dinesh, N., Lee, A., Joshi, A., Robaldo, L., & Webber, B. (2007). The Penn Discourse Treebank 2.0 annotation manual. The PDTB Research Group. University of Pennsylvania.
  72. Redeker, G., Berzlánovich, I., van der Vliet, N., Bouma, G., & Egg, M. (2012). Multi-layer discourse annotation of a Dutch text corpus. Paper presented at the 8th international conference on language resources and evaluation (LREC 2012), Istanbul, Turkey.
  73. Renkema, J. (2004). Introduction to discourse studies. Amsterdam: Benjamins.
  74. Renkema, J. (2009). The texture of discourse. Amsterdam: John Benjamins Publishing Company.
  75. Roze, C., Danlos, L., & Muller, P. (2012). EXCONN: A French lexicon of discourse connectives. Discours, 10, 114–125.
  76. Sanders, T., Land, J., & Mulder, G. (2007). Linguistic markers of coherence improve text comprehension in functional contexts—On text representation and document design. Information Design Journal, 15(3), 219–235.
  77. Sanders, T., & Noordman, L. (2000). The role of coherence relations and their linguistic markers in text processing. Discourse Processes, 29(1), 37–60.
  78. Sanders, T., & Spooren, W. (2007). Discourse and text structure. In D. Geeraerts & J. Cuykens (Eds.), Handbook of cognitive linguistics (pp. 916–941). Oxford: Oxford University Press.
  79. Sanders, T., & Spooren, W. (2009). The cognition of discourse coherence. In J. Renkema (Ed.), Discourse, of course (pp. 197–212). Amsterdam: Benjamins.
  80. Sanders, T., Spooren, W., & Noordman, L. (1992). Toward a taxonomy of coherence relations. Discourse Processes, 15, 1–35.
  81. Sanders, T., Spooren, W., & Noordman, L. (1993). Coherence relations in a cognitive theory of discourse representation. Cognitive Linguistics, 4(2), 93–133.
  82. Scanlan, C. (2000). Reporting and writing: Basics for the 21st century. Oxford: Oxford University Press.
  83. Schiffrin, D. (1987). Discourse markers. Cambridge: Cambridge University Press.
  84. Schiffrin, D. (2001). Discourse markers: Language, meaning and context. In D. Schiffrin, D. Tannen, & H. E. Hamilton (Eds.), The handbook of discourse analysis (pp. 54–75). Malden, MA: Blackwell.
  85. Schilder, F. (2002). Robust discourse parsing via discourse markers, topicality and position. Natural Language Engineering, 8(2/3), 235–255.
  86. Scott, D., & de Souza, C. S. (1990). Getting the message across in RST-based text generation. In R. Dale, C. Mellish, & M. Zock (Eds.), Current research in natural language generation (pp. 47–73). London: Academic Press.
  87. Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.
  88. Sporleder, C., & Lascarides, A. (2005). Exploiting linguistic cues to classify rhetorical relations. Paper presented at recent advances in natural language processing (RANLP-05).
  89. Sporleder, C., & Lascarides, A. (2008). Using automatically labelled examples to classify rhetorical relations: An assessment. Natural Language Engineering, 14, 369–416.
  90. Spyridakis, J. H., & Standal, T. C. (1987). Signals in expository prose: Effects on reading comprehension. Reading Research Quarterly, 12, 285–298.
  91. Stede, M., & Umbach, C. (1998). DiMLex: A lexicon of discourse markers for text generation and understanding. Paper presented at the COLING-ACL ’98 conference, Montreal.
  92. Taboada, M. (2006). Discourse markers as signals (or not) of rhetorical relations. Journal of Pragmatics, 38(4), 567–592.
  93. Taboada, M. (2009). Implicit and explicit coherence relations. In J. Renkema (Ed.), Discourse, of course. Amsterdam: John Benjamins.
  94. Taboada, M., & Das, D. (2013). Annotation upon annotation: Adding signaling information to a corpus of discourse relations. Dialogue and Discourse, 4(2), 249–281.
  95. Taboada, M., & Mann, W. C. (2006a). Applications of rhetorical structure theory. Discourse Studies, 8(4), 567–588.
  96. Taboada, M., & Mann, W. C. (2006b). Rhetorical structure theory: Looking back and moving ahead. Discourse Studies, 8(3), 423–459.
  97. Theijssen, D. (2007). Features for automatic discourse analysis of paragraphs. M.A. dissertation. Nijmegen: Radboud University Nijmegen.
  98. Theijssen, D., van Halteren, H., Verberne, S., & Boves, L. (2008). Features for automatic discourse analysis of paragraphs. Paper presented at the 18th meeting of computational linguistics in the Netherlands (CLIN 2007).
  99. Tonelli, S., Riccardi, G., Prasad, R., & Joshi, A. (2010). Annotation of discourse relations for conversational spoken dialogs. Paper presented at the 7th international conference on language resources and evaluation (LREC 2010), Valletta, Malta.
  100. Versley, Y. (2013). Subgraph-based classification of explicit and implicit discourse relations. Paper presented at the 10th international conference on computational semantics (IWCS 2013), Potsdam, Germany.
  101. Versley, Y., & Gastel, A. (2013). Linguistic tests for discourse relations in the TüBa-D/Z corpus of written German. Dialogue and Discourse, 4(2), 142–173.
  102. Zeyrek, D., Demirşahin, I., Sevdik-Çalli, A. B., Balaban, H. Ö., Yalçinkaya, I., & Turan, Ü. D. (2010). The annotation scheme of the Turkish Discourse Bank and an evaluation of inconsistent annotation. Paper presented at the fourth linguistic annotation workshop (LAW-IV).

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Simon Fraser University, Burnaby, Canada
