Speech Emotion Recognition Using Local and Global Features

Gao, Yuanbo; Li, Baobin; Wang, Ning; Zhu, Tingshao

doi:10.1007/978-3-319-70772-3_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10654)

Included in the following conference series:

International Conference on Brain Informatics

Abstract

Speech offers an easy and useful way to assess a speaker's mental and psychological health, and automatic emotion recognition in speech has been widely investigated in human-machine interaction, psychology, psychiatry, and related fields. In this paper, we extract prosodic and spectral features, including pitch, MFCC, intensity, ZCR and LSP, to build an emotion recognition model with an SVM classifier. In particular, we find that the frame duration and the frame overlap both influence the final results, so a depth-first search is applied to find the best parameters. Experimental results on two well-known databases, EMODB and RAVDESS, show that the model works well and that our speech features are effective in characterizing and recognizing emotions.
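The role of frame duration and overlap can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how features are aggregated or how the depth-first search is organised, so the sketch assumes that frame-level (local) ZCR and RMS intensity are summarised by their mean and standard deviation (global statistics), and that the search enumerates a small grid of candidate values depth-first. All function names, the candidate values, and the `score` callback are hypothetical; in practice `score` would presumably be something like the cross-validated accuracy of the SVM.

```python
import math
from statistics import mean, pstdev


def frame_signal(signal, sample_rate, frame_ms, overlap):
    """Split a signal into fixed-length frames with fractional overlap."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = max(1, int(frame_len * (1.0 - overlap)))
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def intensity(frame):
    """Root-mean-square amplitude, used here as a simple intensity proxy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def utterance_features(signal, sample_rate, frame_ms, overlap):
    """Local (per-frame) features summarised into global statistics."""
    frames = frame_signal(signal, sample_rate, frame_ms, overlap)
    zcr = [zero_crossing_rate(f) for f in frames]
    rms = [intensity(f) for f in frames]
    return [mean(zcr), pstdev(zcr), mean(rms), pstdev(rms)]


def dfs_search(choices, score, partial=()):
    """Depth-first enumeration of parameter combinations.

    choices: one sequence of candidate values per parameter.
    score:   hypothetical callback returning a number to maximise.
    Returns (best_score, best_parameters).
    """
    if len(partial) == len(choices):
        return score(*partial), partial
    best = (-math.inf, None)
    for value in choices[len(partial)]:
        candidate = dfs_search(choices, score, partial + (value,))
        if candidate[0] > best[0]:
            best = candidate
    return best


if __name__ == "__main__":
    # Synthetic stand-in for an utterance: 1 s of a 440 Hz tone at 16 kHz.
    rate = 16000
    tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
    print(utterance_features(tone, rate, frame_ms=25, overlap=0.5))

    # Toy score that peaks at 25 ms frames with 50% overlap (assumption).
    toy_score = lambda ms, ov: -(ms - 25) ** 2 - (ov - 0.5) ** 2
    print(dfs_search([(10, 25, 40), (0.25, 0.5, 0.75)], toy_score))
```

Changing `frame_ms` or `overlap` changes both the number of frames and the statistics computed over them, which is one plausible reason the two parameters affect classification results.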

Acknowledgments

The research was supported in part by the NSFC under Grants 11301504 and U1536104, and in part by the National Basic Research Program of China (973 Program, 2014CB744600).

Author information

Authors and Affiliations

School of Computer and Control, University of Chinese Academy of Sciences, Beijing, 100190, China
Yuanbo Gao & Baobin Li
Beijing Institute of Electronics Technology and Application, Beijing, 100091, China
Ning Wang
Institute of Psychology, Chinese Academy of Sciences, Beijing, 100101, China
Tingshao Zhu

Corresponding author

Correspondence to Baobin Li.

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Beijing, China
Yi Zeng
Beijing Normal University, Beijing, China
Yong He
KTH Royal Institute of Technology and Karolinska Institute, Stockholm, Sweden
Jeanette Hellgren Kotaleski
University of California, San Diego, San Diego, California, USA
Maryann Martone
Chinese Academy of Sciences, Beijing, China
Bo Xu
Allen Institute for Brain Science, Seattle, Washington, USA
Hanchuan Peng
Wuhan National Lab Optoelectronics, Huazhong University of Science and Technology, Wuhan, China
Qingming Luo

About this paper

Cite this paper

Gao, Y., Li, B., Wang, N., Zhu, T. (2017). Speech Emotion Recognition Using Local and Global Features. In: Zeng, Y., et al. Brain Informatics. BI 2017. Lecture Notes in Computer Science, vol 10654. Springer, Cham. https://doi.org/10.1007/978-3-319-70772-3_1

DOI: https://doi.org/10.1007/978-3-319-70772-3_1
Published: 04 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70771-6
Online ISBN: 978-3-319-70772-3
eBook Packages: Computer Science, Computer Science (R0)
