Abstract
In this chapter, we provide an overview of the Turkish Discourse Bank (TDB), a resource of approximately 400,000 words built on a sub-corpus of the 2-million-word METU Turkish Corpus and annotated following the principles of the Penn Discourse Treebank (PDTB). We first present the annotation framework we adopted, explaining how it differs from the annotation of English, the language for which the framework was originally developed. We then focus on a novel annotation procedure that we devised and named pair annotation, after pair programming. We discuss the advantages it has offered as well as its potential drawbacks.
Notes
- 1.
The project website is at http://medid.ii.metu.edu.tr/. The corpus is freely available to researchers. Due to copyright agreements with the publishers, the texts from the METU Turkish Corpus (MTC) cannot be redistributed in commercial products.
- 2.
One of our reviewers suggested that we estimate what percentage of PDTB-style discourse relations is covered by annotating explicit connectives, their arguments, and supplementary material. However, without annotating a substantial portion of TDB for implicit connectives, such an estimate is very difficult to make. Moreover, the ratio may vary by genre, which makes an estimate even harder.
- 3.
Csató and Johanson [6] classify çünkü ‘because’ as a conjoining device on the basis of examples such as Ali gelemiyor çünkü çalışıyor ‘Ali is not coming because he is working’, which cannot be subordinated as the complement of verbs such as bilmek ‘know’: *[Ali gelemiyor çünkü çalıştığını] biliyorum ‘I know [that Ali is not coming because he is working]’. Such subordination is possible for coordinated structures, e.g. [Ali’nin geldiğini ve çalıştığını] biliyorum ‘I know that Ali came and worked’. This supports our decision to group çünkü and the coordinating conjunctions under a single category.
- 4.
Simplex subordinators and the dependent parts of complex subordinators have morphological variants due to the vowel and consonant harmony rules of Turkish. Briefly, vowel harmony works incrementally within a word, affecting the vowels of the root as well as those of the suffixes. Consonant harmony is an assimilatory process affecting, for example, the consonants at the boundary between a root and a suffix. In our notation, a capital letter represents a harmonizing vowel or consonant.
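To make the capitalization convention concrete, the Python sketch below spells out a suffix template after a stem. It assumes a convention widely used in Turkish grammars: A stands for a two-way harmonizing vowel (a/e), I for a four-way harmonizing vowel (ı/i/u/ü), and D for a consonant that surfaces as t after a voiceless consonant and as d otherwise. The function name `realize` and the template DIğI are illustrative assumptions rather than TDB notation, and the rules are deliberately simplified (no loanword exceptions, no buffer consonants).

```python
# Simplified Turkish harmony rules, assumed for illustration only.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
ROUNDED_VOWELS = set("ouöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS
VOICELESS_CONSONANTS = set("çfhkpsşt")


def realize(stem: str, template: str) -> str:
    """Return the surface form of a suffix template after a lowercase stem.

    Capital letters are harmonizing segments: A -> a/e, I -> ı/i/u/ü,
    D -> d/t. Any other character is copied unchanged.
    """
    word = stem
    for symbol in template:
        # Harmony looks at the most recent vowel, including vowels already
        # added by earlier symbols of the suffix (incremental application).
        # If there is no vowel yet, default to a front vowel (arbitrary here).
        last_vowel = next((c for c in reversed(word) if c in VOWELS), "e")
        if symbol == "A":
            symbol = "a" if last_vowel in BACK_VOWELS else "e"
        elif symbol == "I":
            if last_vowel in BACK_VOWELS:
                symbol = "u" if last_vowel in ROUNDED_VOWELS else "ı"
            else:
                symbol = "ü" if last_vowel in ROUNDED_VOWELS else "i"
        elif symbol == "D":
            # Voicing assimilation to the preceding segment.
            symbol = "t" if word[-1] in VOICELESS_CONSONANTS else "d"
        word += symbol
    return word[len(stem):]


for stem in ["gel", "bak", "gör", "git"]:
    print(stem + realize(stem, "DIğI"))  # geldiği, baktığı, gördüğü, gittiği
```

A single template thus stands for all of its harmonic variants, which is why one capitalized form can represent a subordinator in the text.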
- 5.
In example (9), the temporal adverbial is used discourse-initially and scopes over the whole relation. This closely resembles the cases discussed by Asher et al. [4], who argue that locative sentence adverbials have a topic-framing role due to their forward-looking character. They use such examples from French to discuss a specific kind of backgrounding relation, Background\(_{\mathrm{forward}}\), within the framework of SDRT. Further research is needed to identify the role of adverbials marked as shared material in TDB and their contribution to discourse interpretation.
- 6.
The minimality principle may sometimes lead to disagreements among annotators, as discussed in Zeyrek et al. [30].
- 7.
The connectives dolayısı-yla and dolayısı ile are variant forms with the same meaning. The first contains the suffix -yla, which is semantically equivalent to the clitic ile ‘with’.
- 8.
We are aware that this results in multiple annotation files for one raw text file. In the next version of TDB, all the annotations for a raw text are planned to be stored in a single XML file, sorted by the character offset of the connective. This will reduce the number of annotation files and make processing easier [8].
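The planned consolidation can be sketched with Python's standard-library `xml.etree.ElementTree` module: annotations are grouped by raw text file, and each group becomes one XML tree whose relations are ordered by the connective's character offset. The `Annotation` record, its fields, the file names, and the element and attribute names (`Relations`, `Relation`, `Conn`, `rawFile`, `begin`) are hypothetical placeholders, not the actual TDB XML schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Annotation:
    # Hypothetical record; real TDB annotations carry much more information.
    raw_file: str    # raw text file the relation belongs to
    conn_begin: int  # character offset of the connective in the raw text
    conn_text: str   # surface form of the connective


def merge_by_raw_text(annotations):
    """Build one XML tree per raw text, with relations sorted by the
    character offset of their connective."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann.raw_file].append(ann)

    trees = {}
    for raw_file, anns in groups.items():
        root = ET.Element("Relations", {"rawFile": raw_file})
        for ann in sorted(anns, key=lambda a: a.conn_begin):
            relation = ET.SubElement(root, "Relation")
            conn = ET.SubElement(relation, "Conn", {"begin": str(ann.conn_begin)})
            conn.text = ann.conn_text
        trees[raw_file] = ET.ElementTree(root)
    return trees


# Hypothetical annotations from two raw texts, listed out of offset order.
annotations = [
    Annotation("00001.txt", 812, "çünkü"),
    Annotation("00001.txt", 97, "ama"),
    Annotation("00002.txt", 40, "ve"),
]
trees = merge_by_raw_text(annotations)
print(ET.tostring(trees["00001.txt"].getroot(), encoding="unicode"))
```

Each resulting tree could then be saved with `ElementTree.write(path, encoding="utf-8")`, giving one annotation file per raw text instead of one per connective.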
- 9.
References
Aktaş, B., Bozsahin, C., Zeyrek, D.: Discourse relation configurations in Turkish and an annotation environment. In: Proceedings of the Fourth Linguistic Annotation Workshop, pp. 202–206 (2010)
Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)
Asher, N.: Reference to Abstract Objects in Discourse. Springer, New York (1993)
Asher, N., Prévot, L., Vieu, L.: Setting the background in discourse. Discours. Revue de linguistique, psycholinguistique et informatique (2007). http://discours.revues.org/301. Accessed 15 February 2015
Bozşahin, C.: Word order as projection. Dilbilim Araştırmaları Dergisi/J. Linguist. Res. 1–23 (2014)
Csató, É.Á., Johanson, L.: Turkish. In: Johanson, L., Csató, É.Á. (eds.) The Turkic Languages, pp. 203–235. Routledge, London (1998)
Cresswell, C., Forbes, K., Miltsakaki, E., Prasad, R., Joshi, A., Webber, B.: The discourse anaphoric properties of connectives. In: Proceedings of the 4th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC), Lisbon, Portugal, pp. 45–50 (2002)
Demirşahin, I., Sevdik-Çallı, A., Ögel Balaban, H., Çakıcı, R., Zeyrek, D.: Turkish discourse bank: ongoing developments. In: Proceedings of LREC 2012 The First Turkic Languages Workshop (2012)
Demirşahin, I., Yalçınkaya, I., Zeyrek, D.: Pair annotation: adaption of pair programming to corpus annotation. In: Proceedings of the Sixth Linguistic Annotation Workshop, pp. 31–39 (2012)
Demirşahin, I., Öztürel, A., Bozşahin, C., Zeyrek, D.: Applicative structures and immediate discourse in the Turkish discourse bank. In: Proceedings of the Seventh Linguistic Annotation Workshop and Interoperability with Discourse, pp. 122–130 (2013)
Enç, M.: The semantics of specificity. Linguist. Inq. 22, 1–25 (1991)
Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971)
Göksel, A., Kerslake, C.: Turkish: A Comprehensive Grammar. Routledge, London (2005)
Grosz, B.J., Sidner, C.L.: Attention, intention and the structure of discourse. Comput. Linguist. 12(3), 175–204 (1986)
Hoffman, B.: The computational analysis of the syntax and interpretation of “free” word order in Turkish. IRCS Technical Reports Series, 130 (1995)
Lee, A., Prasad, R., Joshi, A.K., Webber, B.: Departures from tree structures in discourse. In: Proceedings of the Workshop on Constraints in Discourse III (2008)
Miltsakaki, E., Prasad, R., Joshi, A., Webber, B.: Annotating discourse connectives and their arguments. In: Proceedings of the HLT/NAACL Workshop on Frontiers in Corpus Annotation, pp. 9–16 (2004)
Prasad, R., Webber, B., Joshi, A.: The Penn discourse treebank 2.0. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), p. 2961 (2008)
Prasad, R., Joshi, A., Webber, B.: Realization of discourse relations by other means: alternative lexicalizations. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 1023–1031 (2010)
Prasad, R., Webber, B., Joshi, A.: Reflections on the Penn Discourse TreeBank, comparable corpora and complementary annotation. Comput. Linguist. 40(4), 921–950 (2014)
Say, B., Zeyrek, D., Oflazer, K., Özge, U.: Development of a corpus and a treebank for present-day written Turkish. In: Proceedings of the Eleventh International Conference of Turkish Linguistics, pp. 183–192 (2002)
Spooren, W., Degand, L.: Coding coherence relations: reliability and validity. Corpus Linguist. Linguist. Theory 6(2), 241–266 (2010)
Taylan, E.E.: The Function of Word Order in Turkish Grammar, vol. 106. University of California Press, Berkeley (1984)
Traugott, E.C.: The role of the development of discourse markers in a theory of grammaticalization. In: ICHL XII, Manchester, pp. 1–23 (1995)
Turan, Ü.D.: Subject and object in Turkish discourse: a centering analysis. Ph.D. dissertation, University of Pennsylvania (1995)
Webber, B.: D-LTAG: extending lexicalized TAG to discourse. Cogn. Sci. 28(5), 751–779 (2004)
Williams, L.A., Kessler, R.R.: All I ever needed to know about pair programming I learned in kindergarten. Commun. ACM 43(5), 108–114 (2000)
Williams, L.A., Kessler, R.R., Cunningham, W., Jeffries, R.: Strengthening the case for pair programming. IEEE Software 17(4), 19–25 (2000)
Zeyrek, D., Webber, B.: A discourse resource for Turkish: annotating discourse connectives in the METU Turkish corpus. In: Proceedings of the Sixth Workshop on Asian Language Resources, Hyderabad, India, pp. 65–72 (2008)
Zeyrek, D., Demirşahin, I., Sevdik-Çallı, A., Ögel, B.H., Yalçınkaya, İ.: The evaluation scheme of the Turkish discourse bank and an evaluation of inconsistent annotations. In: Proceedings of the Fourth Linguistic Annotation Workshop, ACL 2010, Uppsala, Sweden, 15–16 July 2010, pp. 282–289 (2010)
Zeyrek, D., Turan, Ü.D., Demirşahin, I., Çakıcı, R.: Differential properties of three discourse connectives in Turkish: a corpus-based analysis of Fakat, Yoksa, Ayrıca. In: Benz, A., Kühnlein, P., Stede, M. (eds.) Constraints in Discourse III. John Benjamins, Amsterdam (2012)
Zeyrek, D., Demirşahin, I., Sevdik-Çallı, A., Çakıcı, R.: Turkish discourse bank: porting a discourse annotation style to a morphologically rich language. Dialogue Discourse 4(2), 174–184 (2013)
Appendix 1: The Number of Files and Annotations in TDB 1.0
See Table 8.
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Demirşahin, I., Zeyrek, D. (2017). Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_46
DOI: https://doi.org/10.1007/978-94-024-0881-2_46
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences, Social Sciences (R0)