Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank

Handbook of Linguistic Annotation

Abstract

In this chapter, we provide an overview of the Turkish Discourse Bank (TDB), a resource of \(\sim\)400,000 words built on a sub-corpus of the 2-million-word METU Turkish Corpus (MTC) and annotated following the principles of the Penn Discourse TreeBank (PDTB). We first present the annotation framework we adopted, explaining how it differs from the annotation of the original language, English. We then focus on a novel annotation procedure that we have devised and named pair annotation, after pair programming. We discuss the advantages it has offered as well as its potential drawbacks.

Notes

  1.

    The project website is at http://medid.ii.metu.edu.tr/. The corpus is freely available to researchers. Due to copyright agreements with the publishers, the content of the texts from the MTC cannot be redistributed in any commercial products.

  2.

    One of our reviewers suggests that we speculate on what percentage of PDTB-style discourse relations are covered by annotating explicit connectives, their arguments and supplementary materials. However, without annotating a substantial portion of TDB for implicit connectives, it is very difficult to offer such an estimate. Moreover, the ratio might vary by genre, which makes an estimate even more difficult.

  3.

    Csató and Johanson [6] classify çünkü ‘because’ as a conjoining device on the basis of examples such as Ali gelemiyor çünkü çalışıyor ‘Ali is not coming because he is working’, which cannot be subordinated as the complement of verbs such as bilmek ‘know’: *[Ali gelemiyor çünkü çalıştığını] biliyorum. ‘I know [that Ali is not coming because he is working].’ Such embedding is possible for coordinated structures, e.g. [Ali’nin geldiğini ve çalıştığını] biliyorum. ‘I know that Ali came and worked’. This supports our categorization of çünkü and various coordinating conjunctions under a single category.

  4.

    Simplex subordinators and the dependent part of complex subordinators have morphological variants due to the vowel and consonant harmony rules of Turkish. Briefly, vowel harmony works incrementally in a word, affecting all of the vowels in the root as well as the suffixes. Consonant harmony is an assimilatory process affecting, for example, the consonants at the boundary of a root and suffix. The capitalization we use represents a harmonizing vowel or consonant.

  5.

    In example (9), the temporal adverbial is used discourse-initially and scopes over the whole relation. This is very similar to the account of Asher et al. [4], who argue that locative sentence adverbials have a topic-framing role due to their forward-looking character. Asher et al. use such examples from French to discuss a specific kind of backgrounding relation, i.e. Background\(_{\mathrm{forward}}\), within the framework of SDRT. Further research will identify the role of adverbials marked as shared material in TDB and their contribution to discourse interpretation.

  6.

    The minimality principle may sometimes lead to disagreements among annotators, as discussed in Zeyrek et al. [30].

  7.

    The connective devices dolayısı-yla and dolayısı ile are different forms with the same meaning. The first word contains the suffix –yla, which is semantically equivalent to the clitic ile ‘with’.

  8.

    We are aware that this results in multiple annotation files for one raw text file. The next version of TDB is planned to include all the annotations for a raw text in the same XML file sorted by the character offset of the connective. This will result in fewer annotation files and allow easier processing [8].

  9.

    Artstein and Poesio [2] suggest 0.8 as a good cut-off point for reasonable quality. On the other hand, Spooren and Degand [22] suggest reaching a minimal value of 0.7 in annotating discourse coherence. In this chapter, we take 0.8 as the cut-off point.
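
To make the role of the cut-off in Note 9 concrete, the following minimal Python sketch computes an agreement coefficient and checks it against 0.8. It assumes the coefficient is Fleiss' kappa [12]; the note itself does not name the statistic. The function names `fleiss_kappa` and `meets_cutoff` and the toy ratings are hypothetical illustrations, not TDB data or tooling.

```python
from typing import Sequence

CUTOFF = 0.8  # cut-off adopted in Note 9, following Artstein and Poesio [2]


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (hypothetical helper)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement: sum of squared overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # Degenerate case: all ratings fall in a single category.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def meets_cutoff(kappa: float, cutoff: float = CUTOFF) -> bool:
    """True if the agreement value reaches the chosen cut-off."""
    return kappa >= cutoff


# Toy data (invented): 10 tokens, 3 annotators, 2 categories
# (column 0 = discourse connective, column 1 = non-discourse use).
toy_counts = [[3, 0]] * 5 + [[0, 3]] * 4 + [[2, 1]]
toy_kappa = fleiss_kappa(toy_counts)
toy_ok = meets_cutoff(toy_kappa)
```

With a stricter or looser threshold, such as the 0.7 suggested by Spooren and Degand [22], only the `cutoff` argument would change.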

References

  1. Aktaş, B., Bozşahin, C., Zeyrek, D.: Discourse relation configurations in Turkish and an annotation environment. In: Proceedings of the Fourth Linguistic Annotation Workshop, pp. 202–206 (2010)

  2. Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)

  3. Asher, N.: Reference to Abstract Objects in Discourse. Springer, New York (1993)

  4. Asher, N., Prévot, L., Vieu, L.: Setting the background in discourse. Discours. Revue de linguistique, psycholinguistique et informatique (2007). http://discours.revues.org/301. Accessed 15 February 2015

  5. Bozşahin, C.: Word order as projection. Dilbilim Araştırmaları Dergisi/J. Linguist. Res. 1–23 (2014)

  6. Csató, É.Á., Johanson, L.: Turkish. In: Csató, É.Á., Johanson, L. (eds.) The Turkic Languages, pp. 203–235. Routledge, London (1998)

  7. Cresswell, C., Forbes, K., Miltsakaki, E., Prasad, R., Joshi, A., Webber, B.: The discourse anaphoric properties of connectives. In: Proceedings of the 4th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC), Lisbon, Portugal, pp. 45–50 (2002)

  8. Demirşahin, I., Sevdik-Çallı, A., Ögel Balaban, H., Çakıcı, R., Zeyrek, D.: Turkish discourse bank: ongoing developments. In: Proceedings of LREC 2012 The First Turkic Languages Workshop (2012)

  9. Demirşahin, I., Yalçınkaya, I., Zeyrek, D.: Pair annotation: adaption of pair programming to corpus annotation. In: Proceedings of the Sixth Linguistic Annotation Workshop, pp. 31–39 (2012)

  10. Demirşahin, I., Öztürel, A., Bozşahin, C., Zeyrek, D.: Applicative structures and immediate discourse in the Turkish discourse bank. In: Proceedings of the Seventh Linguistic Annotation Workshop and Interoperability with Discourse, pp. 122–130 (2013)

  11. Enç, M.: The semantics of specificity. Linguist. Inq. 22, 1–25 (1991)

  12. Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971)

  13. Göksel, A., Kerslake, C.: Turkish: A Comprehensive Grammar. Routledge, London (2005)

  14. Grosz, B.J., Sidner, C.L.: Attention, intention and the structure of discourse. Comput. Linguist. 12(3), 175–204 (1986)

  15. Hoffman, B.: The computational analysis of the syntax and interpretation of “free” word order in Turkish. IRCS Technical Reports Series, 130 (1995)

  16. Lee, A., Prasad, R., Joshi, A.K., Webber, B.: Departures from tree structures in discourse. In: Proceedings of the Workshop on Constraints in Discourse III (2008)

  17. Miltsakaki, E., Prasad, R., Joshi, A., Webber, B.: Annotating discourse connectives and their arguments. In: Proceedings of the HLT/NAACL Workshop on Frontiers in Corpus Annotation, pp. 9–16 (2004)

  18. Prasad, R., Webber, B., Joshi, A.: The Penn discourse treebank 2.0. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), p. 2961 (2008)

  19. Prasad, R., Joshi, A., Webber, B.: Realization of discourse relations by other means: alternative lexicalizations. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 1023–1031 (2010)

  20. Prasad, R., Webber, B., Joshi, A.: Reflections on the Penn discourse treebank, comparable corpora and complementary annotation. Comput. Linguist. 40(4), 921–950 (2014)

  21. Say, B., Zeyrek, D., Oflazer, K., Özge, U.: Development of a corpus and a treebank for present-day written Turkish. In: Proceedings of the Eleventh International Conference of Turkish Linguistics, pp. 183–192 (2002)

  22. Spooren, W., Degand, L.: Coding coherence relations: reliability and validity. Corpus Linguist. Linguist. Theory 6(2), 241–266 (2010)

  23. Taylan, E.E.: The Function of Word Order in Turkish Grammar, vol. 106. University of California Press, Berkeley (1984)

  24. Traugott, E.C.: The role of the development of discourse markers in a theory of grammaticalization. In: ICHL XII, Manchester, pp. 1–23 (1995)

  25. Turan, Ü.D.: Subject and object in Turkish discourse: a centering analysis. Ph.D. dissertation, University of Pennsylvania (1995)

  26. Webber, B.: D-LTAG: extending lexicalized TAG to discourse. Cogn. Sci. 28(5), 751–779 (2004)

  27. Williams, L.A., Kessler, R.R.: All I ever needed to know about pair programming I learned in kindergarten. Commun. ACM 43(5), 108–114 (2000)

  28. Williams, L.A., Kessler, R.R., Cunningham, W., Jeffries, R.: Strengthening the case for pair programming. IEEE Software 17(4), 19–25 (2000)

  29. Zeyrek, D., Webber, B.: A discourse resource for Turkish: annotating discourse connectives in the METU Turkish corpus. In: Proceedings of the Sixth Workshop on Asian Language Resources, Hyderabad, India, pp. 65–72 (2008)

  30. Zeyrek, D., Demirşahin, I., Sevdik-Çallı, A., Ögel Balaban, H., Yalçınkaya, İ.: The evaluation scheme of the Turkish discourse bank and an evaluation of inconsistent annotations. In: Proceedings of the Fourth Linguistic Annotation Workshop, ACL 2010, Uppsala, Sweden, 15–16 July 2010, pp. 282–289 (2010)

  31. Zeyrek, D., Turan, Ü.D., Demirşahin, I., Çakıcı, R.: Differential properties of three discourse connectives in Turkish: a corpus-based analysis of Fakat, Yoksa, Ayrıca. In: Benz, A., Kühnlein, P., Stede, M. (eds.) Constraints in Discourse III. John Benjamins, Amsterdam (2012)

  32. Zeyrek, D., Demirşahin, I., Sevdik-Çallı, A., Çakıcı, R.: Turkish discourse bank: porting a discourse annotation style to a morphologically rich language. Dialogue Discourse 4(2), 174–184 (2013)

Author information

Corresponding author

Correspondence to Deniz Zeyrek.

Appendix 1: The Number of Files and Annotations in TDB 1.0

See Table 8.

Table 8 List of search tokens for TDB sorted by their occurrence as discourse connectives

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Demirşahin, I., Zeyrek, D. (2017). Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_46

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_46

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences (R0)