Abstract
This chapter describes computational methods for the automatic recognition of stress levels and of different types of emotion expressed in natural, not acted, speech. A range of acoustic features is examined and compared with respect to classification accuracy. Nonlinear features, such as the area under the TEO autocorrelation envelope derived using different spectral decompositions, are compared with features based on the classical linear model of speech production, including F0, formants, and MFCC. Two classifiers, GMM and KNN, are applied independently to check the consistency of the classification results. The experiments used speech under actual stress from the SUSAS database (7 speakers: 3 female and 4 male) and speech with five naturally expressed emotions (neutral, angry, anxious, dysphoric, and happy) from the ORI corpora (71 speakers: 27 female and 44 male). The classification results were consistent with the nonlinear model of the phonation process, indicating that features related to the harmonic structure and the spectral distribution of the glottal energy provide the most important acoustic cues for stress and emotion recognition in natural speech.
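The TEO-based feature mentioned in the abstract builds on Kaiser's discrete Teager Energy Operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), applied per frame (typically within separate frequency bands), followed by the area under the normalized autocorrelation envelope of the TEO output. The following is a minimal illustrative sketch, not the chapter's implementation: the function names are hypothetical, and the per-band filtering and envelope-extraction details of the published TEO-CB-Auto-Env feature are omitted.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator (Kaiser 1990):
    Psi[x(n)] = x(n)^2 - x(n-1) * x(n+1).
    Returns an array two samples shorter than the input."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def teo_autocorr_area(frame):
    """Simplified area-under-autocorrelation feature for one
    speech frame (assumed already band-pass filtered).
    This approximates the envelope area by summing the
    magnitude of the one-sided normalized autocorrelation;
    it is a sketch of the idea, not the exact published feature."""
    teo = teager_energy(frame)
    teo = teo - teo.mean()
    # One-sided autocorrelation of the TEO profile
    ac = np.correlate(teo, teo, mode="full")[len(teo) - 1:]
    if ac[0] != 0:
        ac = ac / ac[0]  # normalize so lag 0 equals 1
    return float(np.sum(np.abs(ac)))
```

For a pure sinusoid A·sin(Ωn), the operator yields the constant A²·sin²(Ω), which is why the TEO is sensitive to amplitude and frequency modulations that a linear energy measure misses; stress-induced nonlinearity in phonation perturbs this profile, and the autocorrelation-envelope area summarizes that perturbation per band.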
© 2014 Springer-Verlag Berlin Heidelberg
Cite this chapter
Lech, M., He, L. (2014). Stress and Emotion Recognition Using Acoustic Speech Analysis. In: Lech, M., Song, I., Yellowlees, P., Diederich, J. (eds) Mental Health Informatics. Studies in Computational Intelligence, vol 491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38550-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38549-0
Online ISBN: 978-3-642-38550-6
eBook Packages: Engineering (R0)