On the Use of Kappa Coefficients to Measure the Reliability of the Annotation of Non-acted Emotions

Callejas, Zoraida; López-Cózar, Ramón

doi:10.1007/978-3-540-69369-7_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5078)

Included in the following conference series:

International Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems

Abstract

In this paper we study the impact of three main factors on measuring the reliability of the annotation of non-acted emotions: annotator bias, the similarity between the classified emotions, and the use of contextual information during annotation. We employed a corpus collected from real interactions between users and a spoken dialogue system. The user utterances were classified by nine non-expert annotators into four categories. We discuss the problems that the nature of non-acted emotional corpora imposes on evaluating the reliability of the annotations using Kappa coefficients. Although the results are deeply affected by the so-called paradoxes of Kappa coefficients, our study shows that taking into account context information and the similarity between emotions helps to obtain values closer to the maximum attainable agreement rates, and allows the detection of emotions that users express more subtly.
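
To make the "paradoxes" mentioned in the abstract concrete, the following is a minimal sketch of Fleiss' kappa, one common multi-annotator Kappa coefficient. The abstract does not say which coefficients the authors computed, so this choice is an assumption, and the category labels and the two toy datasets are hypothetical (they are not taken from the paper's corpus). The sketch shows that two annotation sets with the same observed agreement can yield very different kappa values when one category dominates.

```python
from collections import Counter

# Hypothetical labels: the abstract only says there were four categories.
CATEGORIES = ["neutral", "e1", "e2", "e3"]


def observed_and_kappa(ratings, categories):
    """Return (observed agreement, Fleiss' kappa).

    ratings: list of items, each a list with one label per annotator.
    Every item must be labelled by the same number of annotators.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][k] = number of annotators who gave category k to item i
    counts = [Counter(item) for item in ratings]

    # Observed agreement: per item, the share of annotator pairs that agree.
    pairs = n_raters * (n_raters - 1)
    p_obs = sum(
        sum(c[k] * (c[k] - 1) for k in categories) / pairs for c in counts
    ) / n_items

    # Chance agreement from the pooled proportion of each category.
    total = n_items * n_raters
    p_exp = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_exp == 1.0:
        # Only one category was ever used: kappa is undefined (0/0).
        return p_obs, float("nan")
    return p_obs, (p_obs - p_exp) / (1.0 - p_exp)


# Toy data, nine annotators per item (as in the abstract), 20 items each.
# Skewed set: almost every label is "neutral".
skewed = [["neutral"] * 9 for _ in range(18)]
skewed += [["neutral"] * 7 + ["e1"] * 2 for _ in range(2)]

# Balanced set: same pattern of disagreement, categories equally frequent.
unanimous = ["neutral"] * 5 + ["e1"] * 5 + ["e2"] * 4 + ["e3"] * 4
balanced = [[label] * 9 for label in unanimous]
balanced += [["e2"] * 7 + ["e3"] * 2, ["e3"] * 7 + ["e2"] * 2]

for name, data in [("skewed", skewed), ("balanced", balanced)]:
    p_obs, kappa = observed_and_kappa(data, CATEGORIES)
    print(f"{name}: observed agreement = {p_obs:.3f}, kappa = {kappa:.3f}")
```

In both toy sets the annotators agree on about 96% of annotator pairs, yet kappa is roughly 0.11 for the skewed set and roughly 0.95 for the balanced one, because the chance-agreement term grows when a single category dominates.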

Author information

Authors and Affiliations

Dept. of Languages and Computer Systems, 18071 Granada, Spain
Zoraida Callejas & Ramón López-Cózar

Editor information

Elisabeth André, Laila Dybkjær, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber

About this paper

Cite this paper

Callejas, Z., López-Cózar, R. (2008). On the Use of Kappa Coefficients to Measure the Reliability of the Annotation of Non-acted Emotions. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_25

DOI: https://doi.org/10.1007/978-3-540-69369-7_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69368-0
Online ISBN: 978-3-540-69369-7
eBook Packages: Computer Science, Computer Science (R0)
