Abstract
This chapter describes computational methods for the automatic recognition of stress levels and of different types of emotion expressed in natural, not acted, speech. A range of acoustic features is examined and compared with respect to classification accuracy. Nonlinear features, such as the area under the TEO autocorrelation envelope derived using different spectral decompositions, are compared with features based on the classical linear model of speech production, including F0, formants, and MFCC. Two classifiers, GMM and KNN, are applied independently to check the consistency of the classification results. The experiments used speech under actual stress from the SUSAS database (7 speakers: 3 female and 4 male) and speech with five naturally expressed emotions (neutral, angry, anxious, dysphoric, and happy) from the ORI corpora (71 speakers: 27 female and 44 male). The classification results were consistent with the nonlinear model of the phonation process, indicating that features related to the harmonic structure and the spectral distribution of the glottal energy provide the most important acoustic cues for stress and emotion recognition in natural speech.
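The TEO-based feature mentioned in the abstract builds on Kaiser's discrete Teager Energy Operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), applied per frame (typically within separate frequency bands), followed by the area under the normalized autocorrelation envelope of the TEO output. The following is a minimal illustrative sketch, not the chapter's implementation: the function names are hypothetical, and the per-band filtering and envelope-extraction details of the published TEO-CB-Auto-Env feature are omitted.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator (Kaiser 1990):
    Psi[x(n)] = x(n)^2 - x(n-1) * x(n+1).
    Returns an array two samples shorter than the input."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def teo_autocorr_area(frame):
    """Simplified area-under-autocorrelation feature for one
    speech frame (assumed already band-pass filtered).
    This approximates the envelope area by summing the
    magnitude of the one-sided normalized autocorrelation;
    it is a sketch of the idea, not the exact published feature."""
    teo = teager_energy(frame)
    teo = teo - teo.mean()
    # One-sided autocorrelation of the TEO profile
    ac = np.correlate(teo, teo, mode="full")[len(teo) - 1:]
    if ac[0] != 0:
        ac = ac / ac[0]  # normalize so lag 0 equals 1
    return float(np.sum(np.abs(ac)))
```

For a pure sinusoid A·sin(Ωn), the operator yields the constant A²·sin²(Ω), which is why the TEO is sensitive to amplitude and frequency modulations that a linear energy measure misses; stress-induced nonlinearity in phonation perturbs this profile, and the autocorrelation-envelope area summarizes that perturbation per band.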
© 2014 Springer-Verlag Berlin Heidelberg
Cite this chapter
Lech, M., He, L. (2014). Stress and Emotion Recognition Using Acoustic Speech Analysis. In: Lech, M., Song, I., Yellowlees, P., Diederich, J. (eds) Mental Health Informatics. Studies in Computational Intelligence, vol 491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38550-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38549-0
Online ISBN: 978-3-642-38550-6
eBook Packages: Engineering (R0)