Abstract
In this work, a convolutional neural network is applied to the classification of emotional speech. Two significantly different approaches to speech signal pre-processing are compared: a traditional one based on the frequency spectrum, and a time-domain one. In the first approach, a mel-scale spectrogram of the sound signal is computed and used as a 2-dimensional input to the network, as in image recognition tasks. In the second approach, the raw sound signal in the time domain is fed to the network. Despite the radically different form and content of the input data, the neural architecture is similar: 2D convolutional layers in the first approach, 1D convolutional layers in the second, and identical fully-connected output layers in both. We took care to use practically the same number of trainable parameters in both networks, as well as the same size of input signal snippets for training. The results show that, under this setting, the frequency-based approach offers very little advantage over direct application of the raw sound signal. In both cases, the total accuracy of whole-file classification exceeded 93% for a dataset with three emotion types.
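To illustrate what matching the trainable-parameter budget of the two networks can look like, the following Python sketch counts the weights of convolutional and fully-connected layers. The layer sizes (16 and 32 filters, 3×3 and length-9 kernels, a 64-feature input to the output layer) are hypothetical values chosen for illustration; they are not the configuration used in this work.

```python
def conv1d_params(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters of a 1D convolutional layer."""
    return out_channels * (in_channels * kernel_size + (1 if bias else 0))


def conv2d_params(in_channels, out_channels, kernel_h, kernel_w, bias=True):
    """Trainable parameters of a 2D convolutional layer."""
    return out_channels * (in_channels * kernel_h * kernel_w + (1 if bias else 0))


def dense_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully-connected layer."""
    return out_features * (in_features + (1 if bias else 0))


# Hypothetical spectrogram branch: two 2D convolutions with 3x3 kernels.
spectrogram_conv_total = (
    conv2d_params(1, 16, 3, 3) + conv2d_params(16, 32, 3, 3)
)

# Hypothetical raw-signal branch: two 1D convolutions with length-9 kernels.
waveform_conv_total = (
    conv1d_params(1, 16, 9) + conv1d_params(16, 32, 9)
)

# Identical output layer in both branches: 64 assumed features, 3 emotion classes.
output_layer_total = dense_params(64, 3)

print(spectrogram_conv_total, waveform_conv_total, output_layer_total)
```

With these assumed sizes, a 3×3 kernel and a length-9 kernel carry the same number of weights per filter, so both convolutional stacks hold 4800 parameters, and the shared output layer adds 195 in either case.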
Notes
1. Defined as the ratio of correctly classified segments to the total number of segments.
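The segment-level accuracy defined in this note can be sketched in Python as follows. The emotion labels are hypothetical placeholders. The `majority_vote` helper is an assumption as well: it shows one common way of turning per-segment predictions into a whole-file decision, and this text does not state that the authors aggregate segments this way.

```python
from collections import Counter


def segment_accuracy(true_labels, predicted_labels):
    """Ratio of correctly classified segments to the total number of segments."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one segment is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def majority_vote(segment_predictions):
    """Assumed aggregation rule: the most frequent segment label wins."""
    return Counter(segment_predictions).most_common(1)[0][0]


# Hypothetical labels for five segments taken from two files.
true_labels = ["anger", "anger", "anger", "neutral", "neutral"]
predicted_labels = ["anger", "joy", "anger", "neutral", "neutral"]

accuracy = segment_accuracy(true_labels, predicted_labels)
file_label = majority_vote(predicted_labels[:3])  # segments of the first file
print(accuracy, file_label)
```

In this toy case four of the five segments are classified correctly, so the segment accuracy is 0.8, and the first file is still labelled "anger" under the assumed voting rule despite one misclassified segment.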
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Stasiak, B., Opałka, S., Szajerman, D., Wojciechowski, A. (2019). Convolutional Neural Networks in Speech Emotion Recognition – Time-Domain and Spectrogram-Based Approach. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds) Information Technology in Biomedicine. ITIB 2019. Advances in Intelligent Systems and Computing, vol 1011. Springer, Cham. https://doi.org/10.1007/978-3-030-23762-2_15
DOI: https://doi.org/10.1007/978-3-030-23762-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23761-5
Online ISBN: 978-3-030-23762-2
eBook Packages: Intelligent Technologies and Robotics (R0)