Abstract
Spoken emotion recognition is a multidisciplinary research area that has received increasing attention in recent years. In this paper, restricted Boltzmann machines and deep belief networks are used to classify emotions in speech. The motivation lies in the recent success reported for these techniques in speech processing and speech recognition. The deep classifier is compared with a multilayer perceptron, using spectral and prosodic features. A well-known German emotional speech database is used in the experiments, and two cross-validation methodologies are proposed. Our experimental results show that the deep method achieves an improvement of 8.67% over the baseline in a speaker-independent scheme.
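The building block of the deep belief networks mentioned above is the restricted Boltzmann machine, typically pre-trained with contrastive divergence (CD-1). As a minimal sketch of that idea (not the paper's actual implementation; the layer sizes, learning rate, and toy data are illustrative assumptions), a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence can be written as:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights; zero biases, as is customary.
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def train_step(self, v0, lr=0.05):
        # Positive phase: hidden probabilities given the data.
        h0_prob = sigmoid(v0 @ self.W + self.b_h)
        h0 = (self.rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: one Gibbs step (this is the "1" in CD-1).
        v1_prob = sigmoid(h0 @ self.W.T + self.b_v)
        h1_prob = sigmoid(v1_prob @ self.W + self.b_h)
        # Update from the difference of data and model correlations.
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
        self.b_v += lr * (v0 - v1_prob).mean(axis=0)
        self.b_h += lr * (h0_prob - h1_prob).mean(axis=0)
        # Reconstruction error, a common (if rough) progress monitor.
        return float(np.mean((v0 - v1_prob) ** 2))

# Toy binary "feature" data standing in for binarized acoustic features.
rng = np.random.default_rng(1)
X = (rng.random((64, 20)) < 0.3).astype(float)
rbm = RBM(n_visible=20, n_hidden=8)
errors = [rbm.train_step(X) for _ in range(200)]
```

In a DBN, several such layers would be stacked, each trained on the hidden activations of the one below, before supervised fine-tuning of the whole network for the emotion classes.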
© 2014 Springer International Publishing Switzerland
Cite this paper
Albornoz, E.M., Sánchez-Gutiérrez, M., Martinez-Licona, F., Rufiner, H.L., Goddard, J. (2014). Spoken Emotion Recognition Using Deep Learning. In: Bayro-Corrochano, E., Hancock, E. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2014. Lecture Notes in Computer Science, vol 8827. Springer, Cham. https://doi.org/10.1007/978-3-319-12568-8_13
Print ISBN: 978-3-319-12567-1
Online ISBN: 978-3-319-12568-8