Abstract
Emotional speech recognition is a multidisciplinary research area that has received increasing attention over the last few years. This paper considers the application of restricted Boltzmann machines (RBMs) and deep belief networks (DBNs) to the difficult task of automatic recognition of Spanish emotional speech. The principal motivation is the success reported in a growing body of work that employs these techniques as alternatives to traditional methods in speech processing and speech recognition. A well-known Spanish emotional speech database is used to experiment extensively with, and compare, different combinations of parameters and classifiers. With a suitable choice of parameters, RBMs and DBNs can achieve results comparable to those of other classifiers.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sánchez-Gutiérrez, M.E., Albornoz, E.M., Martinez-Licona, F., Rufiner, H.L., Goddard, J. (2014). Deep Learning for Emotional Speech Recognition. In: Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Olvera-Lopez, J.A., Salas-Rodríguez, J., Suen, C.Y. (eds) Pattern Recognition. MCPR 2014. Lecture Notes in Computer Science, vol 8495. Springer, Cham. https://doi.org/10.1007/978-3-319-07491-7_32
DOI: https://doi.org/10.1007/978-3-319-07491-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07490-0
Online ISBN: 978-3-319-07491-7
eBook Packages: Computer Science (R0)