Multi-algorithm Fusion for Speech Emotion Recognition

Verma, Gyanendra K.; Tiwary, U. S.; Agrawal, Shaishav

doi:10.1007/978-3-642-22720-2_47

Part of the book series: Communications in Computer and Information Science (CCIS, volume 192)

Included in the following conference series:

International Conference on Advances in Computing and Communications

Abstract

This paper proposes a speech emotion recognition system based on multi-algorithm fusion. Mel Frequency Cepstral Coefficients (MFCC) and the Discrete Wavelet Transform (DWT), two prominent algorithms for speech analysis, are used to extract emotion information from the speech signal. MFCC, a representation of the short-term power spectrum of a sound, is a classical approach to speech signal analysis, whereas the DWT, a multiresolution approach, approximates frequency information together with time information. The features extracted by each algorithm through acoustic analysis of the emotional speech signal are combined by feature-level fusion. The final emotion state is determined by classification with a Support Vector Machine. The popular Berlin emotion database is used to evaluate the proposed system. The results are very promising: the proposed fusion algorithm performed well compared with the individual algorithms.
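The feature-level fusion described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the wavelet family, the number of decomposition levels, the number of cepstral coefficients, or the frame settings, so everything here is an assumption made for illustration: a Haar wavelet with 4 levels, 13 cepstral coefficients averaged over frames, 26 mel filters, and the helper names `haar_dwt_features`, `mfcc_features`, and `fuse_features`. It is not the authors' implementation.

```python
import numpy as np

def haar_dwt_features(signal, levels=4):
    """Log-energy of each Haar DWT sub-band (assumed wavelet and depth)."""
    approx = np.asarray(signal, dtype=float)
    feats = []
    for _ in range(levels):
        if len(approx) % 2:  # pad to even length for pairwise filtering
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)  # high-pass branch
        approx = (even + odd) / np.sqrt(2.0)  # low-pass branch
        feats.append(np.log(np.mean(detail ** 2) + 1e-12))
    feats.append(np.log(np.mean(approx ** 2) + 1e-12))
    return np.array(feats)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc_features(signal, sr=16000, frame_len=400, hop=160,
                  n_fft=512, n_filters=26, n_ceps=13):
    """Frame-averaged cepstral coefficients (simplified MFCC pipeline)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-12)
    # DCT-II of the log mel energies gives the cepstral coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return (log_mel @ basis.T).mean(axis=0)

def fuse_features(signal, sr=16000):
    """Feature-level fusion: concatenate the two feature vectors."""
    return np.concatenate([mfcc_features(signal, sr), haar_dwt_features(signal)])

# Synthetic one-second test tone standing in for an utterance
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
demo = np.sin(2 * np.pi * 220.0 * t) + 0.05 * rng.standard_normal(t.size)
fused = fuse_features(demo)
print(fused.shape)  # (18,)
```

Each utterance yields one fused vector (13 cepstral values followed by 5 sub-band log-energies under these assumed settings). A matrix of such vectors, typically standardized per feature, would then be passed to a Support Vector Machine, for example scikit-learn's `SVC`; the classification step is omitted here to keep the sketch dependency-free.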

Author information

Authors and Affiliations

Indian Institute of Information Technology, Allahabad, Allahabad, India, 211012
Gyanendra K. Verma, U. S. Tiwary & Shaishav Agrawal

Editor information

Editors and Affiliations

Machine Intelligence Research Labs (MIR Labs), Auburn, 98071-2259, Washington, USA
Ajith Abraham
Departamento de Comunicaciones, Universidad Politécnica de Valencia, 46071, Valencia, Spain
Jaime Lloret Mauri
Avaya Labs Research, Basking Ridge, NJ, USA
John F. Buford
University of Massachusetts, 100 Morrissey Blvd., 02125-3393, Boston, MA, USA
Junichi Suzuki
Rajagiri School of Engineering and Technology, Rajagiri Valley, Kakkanad, 682 039, Kochi, India
Sabu M. Thampi

About this paper

Cite this paper

Verma, G.K., Tiwary, U.S., Agrawal, S. (2011). Multi-algorithm Fusion for Speech Emotion Recognition. In: Abraham, A., Mauri, J.L., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22720-2_47

DOI: https://doi.org/10.1007/978-3-642-22720-2_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22719-6
Online ISBN: 978-3-642-22720-2
eBook Packages: Computer Science, Computer Science (R0)
