
Analysis of Emotional Speech—A Review

  • Chapter
Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 105))

Abstract

Speech carries information not only about the lexical content, but also about the age, gender, identity, and emotional state of the speaker. Speech produced in different emotional states is accompanied by distinct changes in the production mechanism. In this chapter, we present a review of analysis methods used for emotional speech. In particular, we focus on issues in data collection, feature representation, and the development of automatic emotion recognition systems. The significance of the excitation source component of speech production in emotional states is examined in detail, and the derived excitation source features are shown to carry emotion correlates.
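As an illustration of the kind of feature representation the chapter reviews, the sketch below computes two common prosodic correlates of emotion, short-time energy and a naive autocorrelation-based F0 estimate, on a synthetic voiced signal. This is a minimal, hypothetical example for orientation only, not the authors' method; the function names and parameter choices (40 ms frames, 10 ms hop) are assumptions.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Mean-square energy per frame, a common arousal correlate."""
    return np.mean(frames ** 2, axis=1)

def estimate_f0(frame, fs, fmin=50.0, fmax=500.0):
    """Naive autocorrelation-based F0 estimate for one voiced frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    lo, hi = int(fs / fmax), int(fs / fmin)   # search lags in [1/fmax, 1/fmin]
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

# Synthetic "voiced" test signal: 200 Hz harmonic sum, 0.5 s at 16 kHz.
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 5))

frames = frame_signal(x, frame_len=640, hop=160)   # 40 ms frames, 10 ms hop
energy = short_time_energy(frames)
f0 = estimate_f0(frames[10], fs)                   # close to 200 Hz
```

Real systems reviewed in the chapter go well beyond this, adding spectral, voice-quality, and excitation source features, but the frame/feature pipeline has the same shape.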



Author information


Corresponding author

Correspondence to P. Gangamohan.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Gangamohan, P., Kadiri, S.R., Yegnanarayana, B. (2016). Analysis of Emotional Speech—A Review. In: Esposito, A., Jain, L. (eds) Toward Robotic Socially Believable Behaving Systems - Volume I. Intelligent Systems Reference Library, vol 105. Springer, Cham. https://doi.org/10.1007/978-3-319-31056-5_11

  • DOI: https://doi.org/10.1007/978-3-319-31056-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31055-8

  • Online ISBN: 978-3-319-31056-5
