MSF-Based Automatic Emotional Computing for Speech Signal

  • Conference paper
Emerging Research in Web Information Systems and Mining (WISM 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 238)

Abstract

In this paper, modulation spectral features (MSFs) are proposed for automatic emotion recognition from speech signals. The features are extracted from an auditory-inspired, long-term spectro-temporal (ST) representation. In an experiment classifying four emotion categories, the MSFs show promising performance compared with features based on mel-frequency cepstral coefficients and perceptual linear prediction coefficients, two commonly used short-term spectral representations. The MSFs further yield a substantial improvement in recognition performance when used to augment prosodic features, which have been used extensively for speech emotion recognition. Using both feature types, an overall recognition rate of 91.55% is obtained for the four emotion categories.
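
The full text details the auditory-inspired ST representation; as a rough illustration of the general idea only (a two-stage analysis: acoustic frequency first, then modulation frequency across time), the sketch below substitutes a plain STFT for the auditory filterbank and pools modulation energies into a few coarse bands. All names and parameters here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import stft

def modulation_spectral_features(x, sr, n_mod_bands=8,
                                 frame_len=0.025, hop=0.010):
    """Illustrative MSF-style features (hypothetical, simplified).

    Stage 1: |STFT| rows give per-band temporal envelopes.
    Stage 2: an FFT of each envelope across frames gives the
             modulation spectrum (energy vs. modulation frequency),
             which is pooled into n_mod_bands coarse bands.
    """
    nperseg = int(frame_len * sr)
    noverlap = nperseg - int(hop * sr)

    # Acoustic-frequency analysis (a plain STFT stands in for the
    # auditory filterbank used in the paper).
    _, _, Z = stft(x, fs=sr, nperseg=nperseg, noverlap=noverlap)
    env = np.abs(Z)                           # (acoustic_bands, frames)

    # Modulation-frequency analysis along the time axis.
    mod = np.abs(np.fft.rfft(env, axis=1))    # (acoustic_bands, mod_bins)

    # Pool into coarse modulation bands, averaging over acoustic bands.
    # Assumes the utterance is long enough that every band is non-empty.
    edges = np.linspace(0, mod.shape[1], n_mod_bands + 1, dtype=int)
    return np.array([mod[:, a:b].mean()
                     for a, b in zip(edges[:-1], edges[1:])])

# Example: one second of synthetic audio at 16 kHz -> 8 band energies.
sr = 16000
x = np.random.randn(sr)
print(modulation_spectral_features(x, sr))
```

A classifier for the four emotion categories would then be trained on such per-utterance features, typically alongside prosodic descriptors such as pitch and energy statistics.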

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qin, Y., Zhang, X. (2011). MSF-Based Automatic Emotional Computing for Speech Signal. In: Zhiguo, G., Luo, X., Chen, J., Wang, F.L., Lei, J. (eds) Emerging Research in Web Information Systems and Mining. WISM 2011. Communications in Computer and Information Science, vol 238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24273-1_33

  • DOI: https://doi.org/10.1007/978-3-642-24273-1_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24272-4

  • Online ISBN: 978-3-642-24273-1

  • eBook Packages: Computer Science (R0)
