Abstract
Speech carries information not only about lexical content but also about the age, gender, identity, and emotional state of the speaker. Speech produced in different emotional states is accompanied by distinct changes in the production mechanism. In this chapter, we present a review of analysis methods for emotional speech. In particular, we focus on issues in data collection, feature representation, and the development of automatic emotion recognition systems. The significance of the excitation source component of speech production in emotional states is examined in detail, and the derived excitation source features are shown to carry emotion correlates.
© 2016 Springer International Publishing Switzerland
Gangamohan, P., Kadiri, S.R., Yegnanarayana, B. (2016). Analysis of Emotional Speech—A Review. In: Esposito, A., Jain, L. (eds) Toward Robotic Socially Believable Behaving Systems – Volume I. Intelligent Systems Reference Library, vol 105. Springer, Cham. https://doi.org/10.1007/978-3-319-31056-5_11
Print ISBN: 978-3-319-31055-8
Online ISBN: 978-3-319-31056-5