Abstract
In this paper, modulation spectral features (MSFs) are proposed for automatic emotion recognition from speech. The features are extracted from an auditory-inspired long-term spectro-temporal (ST) representation. In an experiment classifying four emotion categories, the MSFs show promising performance compared with features based on mel-frequency cepstral coefficients and perceptual linear prediction coefficients, two commonly used short-term spectral representations. The MSFs further yield a substantial improvement in recognition performance when used to augment prosodic features, which have been used extensively for speech emotion recognition. Using both types of features, an overall recognition rate of 91.55% is obtained for classifying the four emotion categories.
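The abstract's core idea, extracting a modulation spectrum from a long-term spectro-temporal representation, can be illustrated with a minimal sketch. This is not the authors' exact pipeline (the paper uses an auditory filterbank; here acoustic bands are formed by simple linear grouping of FFT bins), and all function names and parameters below are illustrative assumptions: band the short-time spectrogram, track each band's temporal envelope, then take the spectrum of that envelope and pool it into modulation-frequency bands.

```python
import numpy as np

def modulation_spectral_features(signal, fs, n_acoustic_bands=8,
                                 n_mod_bands=4, frame_len=0.025, hop=0.010):
    """Illustrative MSF extraction: acoustic bands -> temporal
    envelopes -> modulation spectrum -> pooled features."""
    # 1. Short-time magnitude spectrogram
    n, h = int(frame_len * fs), int(hop * fs)
    win = np.hanning(n)
    frames = np.array([signal[i:i + n] * win
                       for i in range(0, len(signal) - n, h)])
    spec = np.abs(np.fft.rfft(frames, axis=1))            # shape (T, F)

    # 2. Group FFT bins into acoustic frequency bands
    #    (linear grouping here; an auditory filterbank in the paper)
    T, F = spec.shape
    edges = np.linspace(0, F, n_acoustic_bands + 1).astype(int)
    env = np.stack([spec[:, a:b].sum(axis=1)
                    for a, b in zip(edges[:-1], edges[1:])], axis=1)

    # 3. Modulation spectrum: FFT of each band's (mean-removed)
    #    envelope along the time axis
    mod = np.abs(np.fft.rfft(env - env.mean(axis=0), axis=0))

    # 4. Pool modulation bins into modulation-frequency bands
    M = mod.shape[0]
    medges = np.linspace(0, M, n_mod_bands + 1).astype(int)
    feats = np.stack([mod[a:b].mean(axis=0)
                      for a, b in zip(medges[:-1], medges[1:])])
    return feats.ravel()   # n_mod_bands * n_acoustic_bands values

# Toy input: a 500 Hz tone amplitude-modulated at 4 Hz, so energy
# should concentrate in the lowest modulation-frequency band.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
f = modulation_spectral_features(x, fs)
```

In an actual classifier, such feature vectors (here 4 modulation bands x 8 acoustic bands = 32 values per utterance) would be fed to a back-end such as an SVM alongside prosodic features.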
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Qin, Y., Zhang, X. (2011). MSF-Based Automatic Emotional Computing for Speech Signal. In: Zhiguo, G., Luo, X., Chen, J., Wang, F.L., Lei, J. (eds) Emerging Research in Web Information Systems and Mining. WISM 2011. Communications in Computer and Information Science, vol 238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24273-1_33
DOI: https://doi.org/10.1007/978-3-642-24273-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24272-4
Online ISBN: 978-3-642-24273-1