Deep Learning for Emotional Speech Recognition

Sánchez-Gutiérrez, Máximo E.; Albornoz, E. Marcelo; Martinez-Licona, Fabiola; Rufiner, H. Leonardo; Goddard, John

doi:10.1007/978-3-319-07491-7_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8495)

Included in the following conference series:

Mexican Conference on Pattern Recognition

Abstract

Emotional speech recognition is a multidisciplinary research area that has received increasing attention over the last few years. The present paper considers the application of restricted Boltzmann machines (RBMs) and deep belief networks (DBNs) to the difficult task of automatic Spanish emotional speech recognition. The principal motivation lies in the success reported in a growing body of work employing these techniques as alternatives to traditional methods in speech processing and speech recognition. Here, a well-known Spanish emotional speech database is used to experiment extensively with, and compare, different combinations of parameters and classifiers. It is found that, with a suitable choice of parameters, RBMs and DBNs can achieve results comparable to those of other classifiers.
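The abstract does not describe how the RBMs and DBNs are trained. As a rough illustration only, and not the authors' implementation, the following Python sketch shows a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1) and a DBN built by greedy layer-wise stacking of such RBMs. The names `RBM`, `train_dbn` and `sigmoid`, the learning rate, the epoch counts and the toy binary data are hypothetical choices for this sketch. It assumes binary inputs; real-valued acoustic features are commonly modelled with Gaussian visible units instead, and the supervised classification stage that would sit on top of the DBN is omitted.

```python
import math
import random

# Illustrative sketch only: names, hyperparameters and data are hypothetical,
# and this is not the implementation used in the paper.


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine (binary units)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        # Small random weights break the symmetry between hidden units.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij * h_j)
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(self.nh)))
                for i in range(self.nv)]

    def sample(self, probs):
        # Draw binary states from Bernoulli probabilities.
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        # One CD-1 step on a single example. The reconstruction uses
        # probabilities rather than binary samples (a common simplification).
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        for i in range(self.nv):
            for j in range(self.nh):
                # Positive (data) statistics minus negative (reconstruction) ones.
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - pv1[i])
        for j in range(self.nh):
            self.c[j] += lr * (ph0[j] - ph1[j])
        # Squared reconstruction error: only a rough progress indicator.
        return sum((v0[i] - pv1[i]) ** 2 for i in range(self.nv))

    def train(self, data, epochs=50, lr=0.1):
        # Returns the mean reconstruction error of each epoch.
        errors = []
        for _ in range(epochs):
            order = list(range(len(data)))
            self.rng.shuffle(order)
            errors.append(sum(self.cd1_update(data[k], lr) for k in order) / len(data))
        return errors


def train_dbn(data, layer_sizes, epochs=50, lr=0.1):
    # Greedy layer-wise pre-training: each RBM is trained on the hidden
    # probabilities produced by the RBM below it.
    rbms, x = [], data
    n_in = len(data[0])
    for n_hidden in layer_sizes:
        rbm = RBM(n_in, n_hidden, seed=len(rbms))
        rbm.train(x, epochs, lr)
        x = [rbm.hidden_probs(v) for v in x]
        rbms.append(rbm)
        n_in = n_hidden
    return rbms, x


if __name__ == "__main__":
    # Hypothetical toy data: two binary patterns, repeated.
    data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10
    errs = RBM(6, 3).train(data, epochs=200, lr=0.1)
    print("reconstruction error: first epoch %.3f, last epoch %.3f" % (errs[0], errs[-1]))
    rbms, top = train_dbn(data, [4, 2], epochs=50)
    print("top-layer features of first example:", [round(p, 2) for p in top[0]])
```

In a DBN classifier, the stacked layers would typically be followed by a supervised output layer and fine-tuning; the sketch stops at unsupervised pre-training to keep the CD-1 update itself visible.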

Similar content being viewed by others

Spoken Emotion Recognition Using Deep Learning

Tamil Speech Emotion Recognition Using Deep Belief Network (DBN)

Speech emotion recognition with unsupervised feature learning

Author information

Authors and Affiliations

Departamento de Ingeniería Eléctrica, Universidad Autónoma Metropolitana, México
Máximo E. Sánchez-Gutiérrez, Fabiola Martinez-Licona & John Goddard
Centro de Investigación SINC(i), Universidad Nacional del Litoral - CONICET, Argentina
E. Marcelo Albornoz & H. Leonardo Rufiner

Editor information

Editors and Affiliations

Computer Science Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Luis Enrique Erro No. 1, 72840, Sta. Maria Tonantzintla, Puebla, Mexico
José Francisco Martínez-Trinidad & Jesús Ariel Carrasco-Ochoa
Faculty of Computer Sciences, Autonomous University of Puebla (BUAP), Av. San Claudio y 14 Sur, 72570, Ciudad Universitaria, Puebla, Mexico
José Arturo Olvera-Lopez
Instituto Politécnico Nacional (IPN), Cerro Blanco 141, 76090, Colinas del Cimatario, Querétaro, Mexico
Joaquín Salas-Rodríguez
Centre for Pattern Recognition and Machine Intelligence, Computer Science and Software Engineering Department, Concordia University, 1455 de Maisonneuve Blvd West, Suite EV3.403, Montreal, QC, Canada
Ching Y. Suen

About this paper

Cite this paper

Sánchez-Gutiérrez, M.E., Albornoz, E.M., Martinez-Licona, F., Rufiner, H.L., Goddard, J. (2014). Deep Learning for Emotional Speech Recognition. In: Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Olvera-Lopez, J.A., Salas-Rodríguez, J., Suen, C.Y. (eds) Pattern Recognition. MCPR 2014. Lecture Notes in Computer Science, vol 8495. Springer, Cham. https://doi.org/10.1007/978-3-319-07491-7_32

DOI: https://doi.org/10.1007/978-3-319-07491-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07490-0
Online ISBN: 978-3-319-07491-7
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition