Mental Health Informatics, pp. 163–184
Stress and Emotion Recognition Using Acoustic Speech Analysis
Abstract
This chapter describes computational methods for the automatic recognition of stress levels and of different types of emotion expressed in natural, non-acted speech. A range of acoustic features is examined and compared with respect to speech-classification accuracy. Nonlinear features, such as the area under the TEO autocorrelation envelope derived using different spectral decompositions, are compared with features based on the classical linear model of speech production, including F0, formants, and MFCC. Two classifiers, GMM and KNN, are applied independently to check classification consistency. The experiments used speech under actual stress from the SUSAS database (7 speakers; 3 female and 4 male) and speech with five naturally expressed emotions (neutral, angry, anxious, dysphoric, and happy) from the ORI corpora (71 speakers; 27 female and 44 male). The classification results were consistent with the nonlinear model of the phonation process, indicating that features related to the harmonic structure and the spectral distribution of the glottal energy provide the most important acoustic cues for stress and emotion recognition in natural speech.
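The nonlinear features mentioned above build on the Teager Energy Operator (TEO) of Kaiser [30], defined for a discrete signal as Ψ[x(n)] = x(n)² − x(n−1)x(n+1). The following is a minimal NumPy sketch of that formula for illustration only; it is not the authors' feature-extraction pipeline, and all names in it are the editor's own.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator (Kaiser, 1990):
    Psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    # Vectorized interior samples; the operator needs one neighbor on each side.
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    # Replicate edge values where a neighbor is missing.
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi

# For a pure tone x(n) = A*cos(w*n), the operator yields the constant
# A^2 * sin^2(w), so it tracks amplitude and frequency jointly.
n = np.arange(1000)
tone = 0.5 * np.cos(0.1 * n)
psi = teager_energy(tone)
```

Features such as the area under the TEO autocorrelation envelope are then computed from Ψ within subbands obtained from a spectral decomposition, rather than from the raw operator output shown here.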
Keywords
Speech signal, Gaussian mixture model, emotion recognition, emotion classification, correct classification rate

References
- 1. Airas, M.: TKK Aparat: An environment for voice inverse filtering and parameterization. Logopedics Phoniatrics Vocology 33(1), 49–64 (2008)
- 2. Airas, M., Pulakka, H., Bäckström, T., Alku, P.: A toolkit for voice inverse filtering and parametrisation. In: Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005—Eurospeech), pp. 2145–2148, Lisbon, Portugal, 4–8 Sept 2005
- 3. Alku, P.: Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Commun. 11(2–3), 109–118 (1992)
- 4. Ang, J., Dhillon, R., Krupski, A.: Prosody-based automatic detection of annoyance and frustration in human-computer dialog. In: Proceedings of the International Conference on Spoken Language Processing, ICSLP 2002, Colorado (2002)
- 5. Banse, R., Scherer, K.R.: Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol. 70(3), 614–636 (1996)
- 6. Barney, A., Shadle, C.H., Davies, P.O.A.L.: Fluid flow in a dynamic mechanical model of the vocal folds and tract. J. Acoust. Soc. Am. 105(1), 444–455 (1999)
- 7. Boersma, P.: Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. IFA Proc. 17, 97–110 (1993)
- 8. Clavel, C., Vasilescu, I., Devillers, L., Richard, G., Ehrette, T.: Fear-type emotion recognition for future audio-based surveillance systems. Speech Commun. 50(6), 487–503 (2008)
- 9. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.: Emotion recognition in human-computer interaction. IEEE Signal Process. Mag. 18(1), 32–80 (2001)
- 10. Davis, B., Sheeber, L., Hops, H., Tildesley, E.: Adolescent responses to depressive parental behaviors in problem-solving interactions. J. Abnorm. Child Psychol. 28(5), 451–465 (2000)
- 11. Donoho, D.L.: De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41(3), 613–627 (1995)
- 12. Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall/CRC, New York (1993)
- 13. France, D.J., Shiavi, R.G., Silverman, S., Silverman, M., Wilkes, M.: Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Trans. Biomed. Eng. 47(7), 829–837 (2000)
- 14. Gaillard, A.W.K., Wientjes, C.J.E.: Mental load and work stress as two types of energy mobilization. Work Stress 8(2), 141–152 (1994)
- 15. Gao, H., Chen, S., Su, G.: Emotion classification of Mandarin speech based on TEO nonlinear features. In: Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, SNPD 2007 (2007)
- 16. Gersho, A.: Vector Quantization and Signal Compression. Kluwer Academic Publishers, Dordrecht (1992)
- 17. Grimm, M., Kroschel, K., Mower, E., Narayanan, S.: Primitives-based evaluation and estimation of emotions in speech. Speech Commun. 49(10–11), 787–800 (2007)
- 18. Hansen, J.H.L., Kim, W., Rahurkar, M., Ruzanski, E., Meyerhoff, J.: Robust emotional stressed speech detection. EURASIP J. Adv. Signal Process. 2011, Article ID 906789 (2011)
- 19. Hansen, J.H.L., Bou-Ghazale, S.: Getting started with SUSAS: A speech under simulated and actual stress database. In: EUROSPEECH 1997, pp. 1743–1746 (1997)
- 20. He, L., Lech, M., Allen, N.: On the importance of glottal flow spectral energy for the recognition of emotions in speech. In: Interspeech 2010, Makuhari, Japan, 26–30 Sept 2010
- 21. He, L., Lech, M., Maddage, N., Memon, S., Allen, N.: Emotion recognition in spontaneous speech within work and family environments. In: iCBBE 2009, Beijing, China, June 2009
- 22. He, L., Lech, M., Maddage, N., Allen, N.: Emotion recognition in speech of parents of depressed adolescents. In: iCBBE 2009, Beijing, China, 11–13 June 2009
- 23. He, L., Lech, M., Maddage, N., Memon, S., Allen, N.: Stress and emotion recognition using log-Gabor filter analysis of speech spectrograms. In: ACII 2009, Amsterdam, Sept 2009
- 24. Hemant, A.P., Basu, T.K.: Identifying perceptually similar languages using Teager energy based cepstrum. Eng. Lett. 16(1) (2008)
- 25. Huber, R., Batliner, A., Buckow, J., Nöth, E., Warnke, V., Niemann, H.: Recognition of emotion in a realistic dialogue scenario. In: Proceedings of the International Conference on Spoken Language Processing, ICSLP 2000, Beijing (2000)
- 26. Iliev, A.I., Scordilis, M.S.: Emotion recognition in speech using inter-sentence glottal statistics. In: 15th International Conference on Systems, Signals and Image Processing, IWSSIP 2008 (2008)
- 27. Iliev, A.I., Scordilis, M.S., Papa, J.P., Falcão, A.X.: Spoken emotion recognition through optimum-path forest classification using glottal features. Comput. Speech Lang. 24(3), 445–460 (2010)
- 28. Iliou, T., Anagnostopoulos, C.N.: Comparison of different classifiers for emotion recognition. In: 13th Panhellenic Conference on Informatics, PCI '09 (2009)
- 29. Ingram, R. (ed.): The International Encyclopedia of Depression. Springer, New York (2009)
- 30. Kaiser, J.F.: On a simple algorithm to calculate the 'energy' of a signal. Proc. Int. Conf. Acoust. Speech Signal Process. 1, 381–384 (1990)
- 31. Kaiser, J.F.: Some useful properties of Teager's energy operator. Proc. Int. Conf. Acoust. Speech Signal Process. 3, 149–152 (1993)
- 32. Kim, E.H., Hyun, K.H., Kim, S.H., Kwak, Y.K.: Improved emotion recognition with a novel speaker-independent feature. IEEE/ASME Trans. Mechatronics 14(3), 317–325 (2009)
- 33. Kim, E.H., Hyun, K.H., Kim, S.H., Kwak, Y.K.: Speech emotion recognition separately from voiced and unvoiced sound for emotional interaction robot. In: International Conference on Control, Automation and Systems, ICCAS 2008 (2008)
- 34. Khosla, S., Murugappan, S., Gutmark, E.: What can vortices tell us about vocal fold vibration and voice production. Curr. Opin. Otolaryngol. Head Neck Surg. 16, 183–187 (2008)
- 35. Khosla, S., Murugappan, S., Paniello, R., Ying, J., Gutmark, E.: Role of vortices in voice production: Normal versus asymmetric tension. Laryngoscope 119, 216–221 (2009)
- 36. Lee, C.M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C.: Emotion recognition based on phoneme classes. In: Proceedings of the International Conference on Spoken Language Processing, ICSLP 2004, Korea (2004)
- 37. Iida, A., Campbell, N., Iga, S., Higuchi, F., Yasumura, M.: A speech synthesis system with emotion for assisting communication. In: Proceedings of the ISCA Workshop on Speech and Emotion, Belfast, vol. 1, pp. 167–172 (2000)
- 38. Greenberg, L.S., Safran, J.D.: Emotion in Psychotherapy: Affect, Cognition, and the Process of Change. Guilford Press, New York (1987)
- 39. Liscombe, J., Riccardi, G., Hakkani-Tür, D.: Using context to improve emotion detection in spoken dialog systems. In: Interspeech 2005, pp. 1845–1848 (2005)
- 40. Liscombe, J.: Detecting emotion in speech: Experiments in three domains. In: Proceedings of HLT/NAACL 2006, New York (2006)
- 41. Longoria, N., Sheeber, L., Davis, B.: Living in Family Environments (LIFE) coding: A reference manual for coders. Oregon Research Institute (2006)
- 42. Low, L.S.A., Maddage, N., Lech, M., Sheeber, L.B., Allen, N.B.: Detection of clinical depression in adolescents' speech during family interactions. IEEE Trans. Biomed. Eng. 58(3), 574–586 (2011)
- 43. Low, L.S.A., Lech, M., Maddage, N.C., Allen, N.: Mel frequency cepstral feature and Gaussian mixtures for modeling clinical depression in adolescents. In: Proceedings of the IEEE International Conference on Cognitive Informatics, ICCI '09 (2009)
- 44. Maragos, P., Kaiser, J.F., Quatieri, T.: On amplitude and frequency demodulation using energy operators. IEEE Trans. Signal Process. 41(4), 1532–1550 (1993)
- 45. Maragos, P., Kaiser, J.F., Quatieri, T.: Energy separation in signal modulations with application to speech analysis. IEEE Trans. Signal Process. 41(10), 3024–3051 (1993)
- 46. Moore, B.: An Introduction to the Psychology of Hearing. Academic Press, San Diego (2001)
- 47. Moore, E., II, Clements, M.A., Peifer, J.W., Weisser, L.: Critical analysis of the impact of glottal features in the classification of clinical depression in speech. IEEE Trans. Biomed. Eng. 55(1), 96–107 (2008)
- 48. Moore, E., II, Clements, M., Peifer, J., Weisser, L.: Comparing objective feature statistics of speech for classifying clinical depression. In: 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEMBS '04 (2004)
- 49. Murphy, F.C., Nimmo-Smith, I., Lawrence, A.D.: Functional neuroanatomy of emotions: A meta-analysis. Cogn. Affect. Behav. Neurosci. 2, 207–233 (2002)
- 50. Myers, D.G.: Theories of emotion. In: Psychology, 7th edn. Worth Publishers, New York (2004)
- 51. Nwe, T.L., Foo, S.W., De Silva, L.C.: Speech emotion recognition using hidden Markov models. Speech Commun. 41(4), 603–623 (2003)
- 52. Nwe, T.L., Foo, S.W., De Silva, L.C.: Classification of stress in speech using linear and nonlinear features. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2003 (2003)
- 53. Ozdas, A., Shiavi, R.G., Silverman, S.E., Silverman, M.K., Wilkes, D.M.: Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk. IEEE Trans. Biomed. Eng. 51(9), 1530–1540 (2004)
- 54. Pulakka, H.: Analysis of human voice production using inverse filtering, high-speed imaging, and electroglottography. Master's thesis, Helsinki University of Technology, Espoo, Finland (2005)
- 55. Quatieri, T.: Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall, Englewood Cliffs (2002)
- 56. Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals (Signal Processing Series). Prentice-Hall, Englewood Cliffs (1978)
- 57. Reynolds, D., Rose, R.: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 3(1), 72–83 (1995)
- 58. Scherer, K., Zei, B.: Vocal indicators of affective disorders. Psychother. Psychosom. 49, 179–186 (1988)
- 59. Scherer, K.: Expression of emotion in voice and music. J. Voice 9(3), 235–248 (1995)
- 60. Scherer, K.R.: Vocal communication of emotion: A review of research paradigms. Speech Commun. 40, 227–256 (2003)
- 61. Schuller, B., Batliner, A., Seppi, D., Steidl, S., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V.: The relevance of feature type for the automatic classification of emotional user states: Low level descriptors and functionals. In: Proceedings of Interspeech 2007, pp. 2253–2256 (2007)
- 62. Shinwari, D., Scherer, R.C., Afjeh, A., DeWitt, K.: Flow visualization in a model of the glottis with a symmetric and oblique angle. J. Acoust. Soc. Am. 113, 487–497 (2003)
- 63. Steeneken, H.J.M., Hansen, J.H.L.: Speech under stress conditions: Overview of the effect on speech production and on system performance. In: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '99 (1999)
- 64. Teager, H.: Some observations on oral air flow during phonation. IEEE Trans. Acoust. Speech Signal Process. 28(5), 599–601 (1980)
- 65. Teager, H.M., Teager, S.: Evidence for nonlinear production mechanisms in the vocal tract. In: NATO Advanced Study Institute on Speech Production and Speech Modeling, Bonas, France, vol. 55, pp. 241–261. Kluwer Academic Publishers, Boston (1989)
- 66. Tolkmitt, F.J., Scherer, K.R.: Effect of experimentally induced stress on vocal parameters. J. Exp. Psychol. Hum. Percept. Perform. 12(3), 302–313 (1986)
- 67. Thayer, R.E.: The Biopsychology of Mood and Arousal. Oxford University Press, New York (1989)
- 68. Torabi, S., Almas Ganj, F., Mohammadian, A.: Semi-supervised classification of speaker's psychological stress. In: Cairo International Biomedical Engineering Conference, CIBEC 2008 (2008)
- 69. Wagner, J., Vogt, T., André, E.: A systematic comparison of different HMM designs for emotion recognition from acted and spontaneous speech. In: Affective Computing and Intelligent Interaction. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg (2007)
- 70. Veeneman, D., BeMent, S.: Automatic glottal inverse filtering from speech and electroglottographic signals. IEEE Trans. Acoust. Speech Signal Process. 33(2), 369–377 (1985)
- 71. Ververidis, D., Kotropoulos, C.: Emotional speech recognition: Resources, features, and methods. Speech Commun. 48, 1162–1181 (2006)
- 72. Yildirim, S., Narayanan, S., Potamianos, A.: Detecting emotional state of a child in a conversational computer game. Comput. Speech Lang. 25(1), 29–44 (2011)
- 73. Zhao, W., Zhang, C., Frankel, S.H., Mongeau, L.: Computational aeroacoustics of phonation, part I: Computational methods and sound generation mechanisms. J. Acoust. Soc. Am. 112, 2134–2146 (2002)
- 74. Zhou, G., Hansen, J.H.L., Kaiser, J.F.: Nonlinear feature based classification of speech under stress. IEEE Trans. Speech Audio Process. 9(3), 201–216 (2001)
- 75. Zhou, J., Wang, G., Yang, Y., Chen, P.: Speech emotion recognition based on rough set and SVM. In: IEEE International Conference on Cognitive Informatics, ICCI 2006 (2006)