Abstract
The ‘traditional’ first two dimensions in emotion research are VALENCE and AROUSAL. Normally, they are obtained by using elicited, acted data. In this paper, we use realistic, spontaneous speech data from our ‘AIBO’ corpus (human-robot communication, children interacting with Sony’s AIBO robot). The recordings were done in a Wizard-of-Oz scenario: the children believed that AIBO obeyed their commands; in fact, AIBO followed a fixed script and often disobeyed. Five labellers annotated each word as belonging to one of eleven emotion-related states; the seven of these states that occurred frequently enough are dealt with in this paper. The confusion matrices of these labels were used in a Non-Metrical Multi-dimensional Scaling to display two dimensions; the first we interpret as VALENCE, the second, however, not as AROUSAL but as INTERACTION, i.e., addressing oneself (angry, joyful) or the communication partner (motherese, reprimanding). We show that it depends on the specificity of the scenario and on the subjects’ conceptualizations whether this new dimension can be observed, and discuss impacts on the practice of labelling and processing emotional data. Two-dimensional solutions based on acoustic and linguistic features that were used for automatic classification of these emotional states are interpreted along the same lines.
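The core analysis step described above — deriving a low-dimensional map of emotion labels from inter-labeller confusions — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the labels shown are four of the states named in the abstract, but the confusion counts are invented for demonstration, and scikit-learn's non-metric MDS is used as a stand-in for the Non-Metrical Multi-dimensional Scaling procedure.

```python
import numpy as np
from sklearn.manifold import MDS

# Four of the emotion-related states from the abstract; the confusion
# counts below are purely illustrative (rows: one labeller's choice,
# columns: another labeller's choice for the same word).
labels = ["angry", "reprimanding", "motherese", "joyful"]
C = np.array([
    [50, 18,  3,  5],
    [18, 55,  6,  4],
    [ 3,  6, 60, 14],
    [ 5,  4, 14, 58],
], dtype=float)

# Convert confusions to dissimilarities: labels that are often
# confused with each other should end up close together in the map.
P = C / C.sum(axis=1, keepdims=True)   # row-normalized proportions
S = (P + P.T) / 2.0                    # symmetrize
D = 1.0 - S                            # high confusion -> low dissimilarity
np.fill_diagonal(D, 0.0)

# Non-metric MDS uses only the rank order of the dissimilarities,
# which suits confusion-derived data with no metric interpretation.
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          random_state=0)
coords = mds.fit_transform(D)
for name, (x, y) in zip(labels, coords):
    print(f"{name:>12}: ({x:+.2f}, {y:+.2f})")
```

In a plot of `coords`, one axis would separate negative from positive states (VALENCE) and, on data like the AIBO corpus, the other would separate self-directed states (angry, joyful) from partner-directed ones (motherese, reprimanding), i.e., the INTERACTION dimension the paper proposes.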
Cite this article
Batliner, A., Steidl, S., Hacker, C. et al. Private emotions versus social interaction: a data-driven approach towards analysing emotion in speech. User Model User-Adap Inter 18, 175–206 (2008). https://doi.org/10.1007/s11257-007-9039-4