Abstract
In this paper, modulation spectral features (MSFs) are proposed for automatic emotion recognition from speech. The features are extracted from an auditory-inspired long-term spectro-temporal (ST) representation. In an experiment classifying four emotion categories, the MSFs show promising performance compared with features based on mel-frequency cepstral coefficients and perceptual linear prediction coefficients, two commonly used short-term spectral representations. The MSFs further yield a substantial improvement in recognition performance when used to augment prosodic features, which have been used extensively for speech emotion recognition. Using both types of features, an overall recognition rate of 91.55% is obtained for classifying the four emotion categories.
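The abstract's core idea, extracting a modulation spectrum from a long-term spectro-temporal representation, can be illustrated with a minimal sketch. This is not the authors' exact pipeline (the paper uses an auditory filterbank; here acoustic bands are formed by simple linear grouping of FFT bins), and all function names and parameters below are illustrative assumptions: band the short-time spectrogram, track each band's temporal envelope, then take the spectrum of that envelope and pool it into modulation-frequency bands.

```python
import numpy as np

def modulation_spectral_features(signal, fs, n_acoustic_bands=8,
                                 n_mod_bands=4, frame_len=0.025, hop=0.010):
    """Illustrative MSF extraction: acoustic bands -> temporal
    envelopes -> modulation spectrum -> pooled features."""
    # 1. Short-time magnitude spectrogram
    n, h = int(frame_len * fs), int(hop * fs)
    win = np.hanning(n)
    frames = np.array([signal[i:i + n] * win
                       for i in range(0, len(signal) - n, h)])
    spec = np.abs(np.fft.rfft(frames, axis=1))            # shape (T, F)

    # 2. Group FFT bins into acoustic frequency bands
    #    (linear grouping here; an auditory filterbank in the paper)
    T, F = spec.shape
    edges = np.linspace(0, F, n_acoustic_bands + 1).astype(int)
    env = np.stack([spec[:, a:b].sum(axis=1)
                    for a, b in zip(edges[:-1], edges[1:])], axis=1)

    # 3. Modulation spectrum: FFT of each band's (mean-removed)
    #    envelope along the time axis
    mod = np.abs(np.fft.rfft(env - env.mean(axis=0), axis=0))

    # 4. Pool modulation bins into modulation-frequency bands
    M = mod.shape[0]
    medges = np.linspace(0, M, n_mod_bands + 1).astype(int)
    feats = np.stack([mod[a:b].mean(axis=0)
                      for a, b in zip(medges[:-1], medges[1:])])
    return feats.ravel()   # n_mod_bands * n_acoustic_bands values

# Toy input: a 500 Hz tone amplitude-modulated at 4 Hz, so energy
# should concentrate in the lowest modulation-frequency band.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
f = modulation_spectral_features(x, fs)
```

In an actual classifier, such feature vectors (here 4 modulation bands x 8 acoustic bands = 32 values per utterance) would be fed to a back-end such as an SVM alongside prosodic features.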
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Qin, Y., Zhang, X. (2011). MSF-Based Automatic Emotional Computing for Speech Signal. In: Zhiguo, G., Luo, X., Chen, J., Wang, F.L., Lei, J. (eds) Emerging Research in Web Information Systems and Mining. WISM 2011. Communications in Computer and Information Science, vol 238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24273-1_33
DOI: https://doi.org/10.1007/978-3-642-24273-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24272-4
Online ISBN: 978-3-642-24273-1