Multi-algorithm Fusion for Speech Emotion Recognition

  • Conference paper
Advances in Computing and Communications (ACC 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 192)

Abstract

In this paper, we propose a speech emotion recognition system based on multi-algorithm fusion. Mel Frequency Cepstral Coefficients (MFCC) and the Discrete Wavelet Transform (DWT), two prominent algorithms for speech analysis, are used to extract emotion information from the speech signal. MFCC, a representation of the short-term power spectrum of a sound, is a classical approach to analyzing speech signals, whilst the DWT, a multiresolution approach, captures frequency information together with time information. Feature-level fusion of the two algorithms is performed after the features have been extracted by acoustic analysis of the emotional speech signal. The final emotion state is determined by classification with a Support Vector Machine. The widely used Berlin emotion database is used to evaluate the proposed system. The results are very promising, as the proposed fusion algorithm performed well compared with the individual algorithms.
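The feature-level fusion described in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption: the frame length, hop size, number of mel filters and coefficients, the use of a Haar wavelet, the log subband energies, the per-utterance averaging, and the z-score normalization before concatenation. The function names (`mel_filterbank`, `mfcc_features`, `haar_dwt_energies`, `fuse`) are hypothetical, and the input is synthetic noise rather than the Berlin database.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (illustrative)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc_features(signal, sr, frame_len=400, hop=160, n_fft=512,
                  n_filters=20, n_coeffs=13):
    """Mean MFCC vector over all frames of one utterance (assumed summary)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II basis turns log mel energies into cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return (log_mel @ dct.T).mean(axis=0)

def haar_dwt_energies(signal, levels=4):
    """Log energy of each Haar DWT subband (assumed wavelet and statistic)."""
    approx = np.asarray(signal, dtype=float)
    energies = []
    for _ in range(levels):
        if len(approx) % 2:
            approx = approx[:-1]            # drop one sample so pairs line up
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
        energies.append(np.log(np.mean(detail ** 2) + 1e-10))
    energies.append(np.log(np.mean(approx ** 2) + 1e-10))
    return np.array(energies)

def fuse(feature_sets):
    """Feature-level fusion: z-score each block across utterances, then concatenate."""
    blocks = []
    for X in feature_sets:
        X = np.asarray(X, dtype=float)
        blocks.append((X - X.mean(axis=0)) / (X.std(axis=0) + 1e-10))
    return np.hstack(blocks)

# Synthetic stand-in for four one-second utterances sampled at 16 kHz.
rng = np.random.default_rng(0)
sr = 16000
utterances = [rng.standard_normal(sr) * (1 + i) for i in range(4)]
mfcc = np.array([mfcc_features(u, sr) for u in utterances])
dwt = np.array([haar_dwt_energies(u) for u in utterances])
fused = fuse([mfcc, dwt])
print(fused.shape)  # (4, 18): 13 MFCC values plus 5 DWT subband energies
```

Each row of `fused` is one utterance's combined feature vector. In a pipeline like the one the abstract outlines, these rows and their emotion labels would then be passed to a Support Vector Machine classifier, for example scikit-learn's `SVC`. The classifier is left out here to keep the sketch dependent on NumPy alone.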

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Verma, G.K., Tiwary, U.S., Agrawal, S. (2011). Multi-algorithm Fusion for Speech Emotion Recognition. In: Abraham, A., Mauri, J.L., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22720-2_47

  • DOI: https://doi.org/10.1007/978-3-642-22720-2_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22719-6

  • Online ISBN: 978-3-642-22720-2

  • eBook Packages: Computer Science, Computer Science (R0)
