Abstract
Research in affective computing has mainly focused on analyzing human emotional states as perceived within limited contexts, such as speech utterances. In this study, we focus on the dynamic transitions of emotional states that appear over the course of a conversation, and we investigate computational models that automatically label emotional states using the proposed affective scene framework. An affective scene comprises the complete sequence of emotional states in a conversation, from its start to its end. Affective scene instances capture different patterns of behavior, such as who manifests an emotional state, when it is manifested, and which changes occur through the influence of one interlocutor’s emotion on another. We present the design and training of an automatic affective scene segmentation and classification system for spoken conversations. We comparatively evaluate the contributions of acoustic, lexical, and psycholinguistic feature types, as well as their correlations and combinations.
Notes
- 1. In this chapter, the word ‘interlocutor’ covers both the person speaking and expressing the emotion (the speaker) and the person listening to and perceiving that emotion (the listener).
- 2.
- 3. A turn is the spoken content of one speaker at a time. For example, speaker A says something, which is speaker A’s turn; then speaker B says something, which is speaker B’s turn.
- 4.
- 5. By data level, we refer to the data preparation phase: before feature extraction, we select segments of the majority class, which is neutral in this case.
- 6. By feature level, we refer to the over-sampling of feature vectors for the minority classes.
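The two class-balancing steps described in notes 5 and 6 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the chapter's implementation: the notes do not say how majority segments are selected or which over-sampling method is used, so the sketch uses random selection at the data level and a simplified interpolation in the spirit of SMOTE (without the nearest-neighbour step) at the feature level. The function names, the `keep` default, and the "anger" label are hypothetical; only the "neutral" majority class comes from the notes.

```python
import random
from collections import Counter


def undersample_majority(segments, labels, majority="neutral", keep=None, seed=0):
    """Data level: keep only a random subset of majority-class segments."""
    rng = random.Random(seed)
    counts = Counter(labels)
    if keep is None:
        # Assumption: shrink the majority class to the size of the
        # largest remaining class.
        keep = max((n for lab, n in counts.items() if lab != majority), default=0)
    majority_idx = [i for i, lab in enumerate(labels) if lab == majority]
    kept = set(rng.sample(majority_idx, min(keep, len(majority_idx))))
    pairs = [
        (seg, lab)
        for i, (seg, lab) in enumerate(zip(segments, labels))
        if lab != majority or i in kept
    ]
    return [seg for seg, _ in pairs], [lab for _, lab in pairs]


def oversample_minority(vectors, labels, minority, n_new, seed=0):
    """Feature level: add n_new synthetic vectors for one minority class.

    Each synthetic vector lies on the line between two randomly chosen
    minority vectors. Real SMOTE interpolates towards one of the k nearest
    neighbours instead; that step is omitted here for brevity.
    """
    rng = random.Random(seed)
    pool = [v for v, lab in zip(vectors, labels) if lab == minority]
    if len(pool) < 2:
        raise ValueError("need at least two minority vectors to interpolate")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(pool, 2)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return vectors + synthetic, labels + [minority] * n_new
```

In this sketch the under-sampling would run on conversation segments before feature extraction, and the over-sampling would run afterwards on the extracted feature vectors of each minority class.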
© 2019 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Alam, F., Danieli, M., Riccardi, G. (2019). Automatic Labeling Affective Scenes in Spoken Conversations. In: Klempous, R., Nikodem, J., Baranyi, P. (eds) Cognitive Infocommunications, Theory and Applications. Topics in Intelligent Engineering and Informatics, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-95996-2_6
DOI: https://doi.org/10.1007/978-3-319-95996-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95995-5
Online ISBN: 978-3-319-95996-2
eBook Packages: Intelligent Technologies and Robotics (R0)