
Stress and Emotion Recognition Using Acoustic Speech Analysis

  • Margaret Lech
  • Ling He
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 491)

Abstract

This chapter describes computational methods for the automatic recognition of stress levels and of different types of speaker emotion expressed in natural (non-acted) speech. A range of acoustic features is examined and compared with respect to speech classification accuracy. Nonlinear features, such as the area under the TEO (Teager Energy Operator) autocorrelation envelope derived using different spectral decompositions, are compared with features based on the classical linear model of speech production, including F0, formants, and MFCC. Two classifiers, GMM and KNN, are applied independently to assess classification consistency. The experiments used speech under actual stress from the SUSAS database (7 speakers; 3 female and 4 male) and speech with five naturally expressed emotions (neutral, angry, anxious, dysphoric, and happy) from the ORI corpora (71 speakers; 27 female and 44 male). The classification results are consistent with the nonlinear model of the phonation process, indicating that features related to the harmonic structure and the spectral distribution of the glottal energy provide the most important acoustic cues for stress and emotion recognition in natural speech.
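The TEO feature described above can be illustrated with a minimal sketch. The discrete Teager Energy Operator itself is standard, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1); the chapter's actual features apply it within spectral sub-bands (e.g., wavelet or filterbank decompositions), whereas the full-band computation below, including the function names and the simple rectangular-sum area estimate, is a simplified, hypothetical illustration only:

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: Psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def teo_autocorr_area(x):
    """Area under the normalized autocorrelation envelope of the TEO profile
    (full-band sketch; the chapter computes this per spectral sub-band)."""
    psi = teager_energy(x)
    psi = psi - psi.mean()
    # One-sided autocorrelation, normalized so r[0] == 1
    r = np.correlate(psi, psi, mode="full")[len(psi) - 1:]
    r = r / r[0]
    # Simple rectangular-sum estimate of the area under |r|
    return float(np.sum(np.abs(r)))
```

For a pure sinusoid A·cos(ωn), the operator returns the constant A²·sin²(ω), which is why the TEO profile tracks instantaneous amplitude–frequency modulation and, in turn, why its autocorrelation envelope is sensitive to the harmonic irregularities that stress introduces into phonation.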

Keywords

Speech signal, Gaussian mixture model, emotion recognition, emotion classification, correct classification rate


Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. School of Electrical and Computer Engineering, RMIT University, Melbourne, Australia
  2. Department of Medical Informatics and Engineering, School of Electrical Engineering and Information, Sichuan University, Chengdu, China
