
Stress and Emotion Recognition Using Acoustic Speech Analysis

Mental Health Informatics

Part of the book series: Studies in Computational Intelligence ((SCI,volume 491))

Abstract

This chapter describes computational methods for the automatic recognition of stress levels and of different types of emotion expressed in natural (not acted) speech. A range of acoustic features is examined and compared with respect to speech classification accuracy. Nonlinear features, such as the area under the TEO (Teager Energy Operator) autocorrelation envelope derived using different spectral decompositions, are compared with features based on the classical linear model of speech production, including F0, formants, and MFCC. Two classifiers, GMM and KNN, are applied independently to check the consistency of the classification. The experiments used speech under actual stress from the SUSAS database (7 speakers; 3 female and 4 male) and speech with five naturally expressed emotions (neutral, angry, anxious, dysphoric, and happy) from the ORI corpora (71 speakers; 27 female and 44 male). The classification results are consistent with the nonlinear model of the phonation process, indicating that features related to the harmonic structure and the spectral distribution of the glottal energy provide the most important acoustic cues for stress and emotion recognition in natural speech.
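The TEO-based features described above build on Kaiser's discrete Teager Energy Operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below illustrates the general idea of an area-under-the-TEO-autocorrelation-envelope feature for a single speech frame; the function names, the single-band setup (the chapter derives such areas per band after a spectral decomposition), and the normalization are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def teager_energy(x):
    """Kaiser's discrete Teager Energy Operator:
    Psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def teo_autocorr_envelope_area(frame):
    """Area under the normalized autocorrelation envelope of a
    frame's TEO profile.  Illustrative single-band version: the
    per-band spectral decomposition used in the chapter is omitted."""
    teo = teager_energy(frame)
    n = len(teo)
    # One-sided autocorrelation of the TEO profile
    r = np.correlate(teo, teo, mode="full")[n - 1:]
    if r[0] == 0:
        return 0.0
    r = r / r[0]  # normalize so that r[0] == 1
    # Envelope approximated by |r|; area by simple summation
    return float(np.sum(np.abs(r)))
```

For a pure sinusoid x(n) = sin(ωn), the operator yields the constant sin²(ω), which is one reason the TEO is attractive for tracking changes in the nonlinear energy of voiced speech under stress.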




Corresponding author

Correspondence to Margaret Lech.


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lech, M., He, L. (2014). Stress and Emotion Recognition Using Acoustic Speech Analysis. In: Lech, M., Song, I., Yellowlees, P., Diederich, J. (eds) Mental Health Informatics. Studies in Computational Intelligence, vol 491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38550-6_9


  • DOI: https://doi.org/10.1007/978-3-642-38550-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38549-0

  • Online ISBN: 978-3-642-38550-6

  • eBook Packages: Engineering, Engineering (R0)
