Speech Emotion Recognition Using Neural Network and Wavelet Features

  • Tanmoy Roy
  • Tshilidzi Marwala
  • S. Chakraverty
Conference paper
Part of the Lecture Notes in Mechanical Engineering book series (LNME)


Human speech, which is produced by the vibration of the vocal cords, is affected by the emotional state of the speaker. Accurate recognition of the emotions concealed in human speech is a significant factor in further improving the quality of Human–Computer Interaction (HCI). However, satisfactory accuracy has not yet been achieved, mainly because there is no well-accepted standard feature set. Emotions are hard to distinguish from speech even for humans, which is why such a standard feature set is difficult to extract. This paper presents a model that classifies emotions from speech signals with high accuracy compared to the current state of the art. The dataset used in this experiment consists of speech recordings that are specifically labeled with the emotions of the speakers. A novel wavelet-based feature set is extracted from the speech signals, and a Neural Network (NN) with a single hidden layer is then trained on this feature set to classify the different emotions. The feature set is newly introduced and is tested here with an NN architecture for the first time; the classification results are also compared with those of other prominent classification techniques.
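The abstract names two stages, wavelet feature extraction followed by a single-hidden-layer NN classifier, but does not state the wavelet family, the decomposition depth, the exact features, the hidden-layer size, or the training procedure. The pure-Python sketch below fills those gaps with assumptions that are not taken from the paper: a Haar wavelet, four decomposition levels, relative sub-band energies plus a Shannon wavelet entropy as the feature vector, and a tanh hidden layer with a softmax output trained by plain stochastic gradient descent on cross-entropy. All names (`haar_dwt`, `wavelet_features`, `OneHiddenLayerNet`, `toy_signal`) are hypothetical, and the two synthetic classes (slow versus fast oscillation) merely stand in for emotion-labeled recordings.

```python
# Illustrative sketch only: the wavelet, features and network size are assumptions,
# not the feature set or architecture used in the paper.
import math
import random


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    n = len(signal) - len(signal) % 2  # drop a trailing sample if the length is odd
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, n, 2)]
    return approx, detail


def wavelet_features(signal, levels=4):
    """Hypothetical feature vector: relative energy of each sub-band plus wavelet entropy."""
    if len(signal) < 2 ** levels:
        raise ValueError("signal too short for the requested decomposition depth")
    bands, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)  # detail bands, finest (highest frequency) first
    bands.append(approx)  # coarsest approximation band
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies) or 1.0  # avoid dividing by zero for an all-zero signal
    rel = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in rel if p > 0)  # Shannon entropy of rel
    return rel + [entropy]


class OneHiddenLayerNet:
    """Fully connected classifier: inputs -> tanh hidden layer -> softmax outputs."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max logit for a numerically stable softmax
        exps = [math.exp(v - m) for v in z]
        total = sum(exps)
        return h, [e / total for e in exps]

    def train_step(self, x, label, lr=0.1):
        """One stochastic gradient step on the cross-entropy loss."""
        h, p = self.forward(x)
        dz = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Hidden-layer gradient, computed before w2 changes (tanh' = 1 - tanh^2).
        dh = [(1.0 - h[j] ** 2) * sum(dz[k] * self.w2[k][j] for k in range(len(dz)))
              for j in range(len(h))]
        for k, dzk in enumerate(dz):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dzk * hj
            self.b2[k] -= lr * dzk
        for j, dhj in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dhj * xi
            self.b1[j] -= lr * dhj

    def predict(self, x):
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)


def toy_signal(label, rng, n=64):
    """Synthetic stand-in for a labeled utterance: slow (0) or fast (1) oscillation."""
    period = 64 if label == 0 else 2
    return [math.cos(2 * math.pi * t / period) + 0.05 * rng.gauss(0, 1) for t in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(wavelet_features(toy_signal(c, rng)), c) for _ in range(10) for c in (0, 1)]
    net = OneHiddenLayerNet(n_in=len(data[0][0]), n_hidden=8, n_out=2)
    for _ in range(50):
        rng.shuffle(data)
        for x, y in data:
            net.train_step(x, y)
    acc = sum(net.predict(x) == y for x, y in data) / len(data)
    print(f"training accuracy: {acc:.2f}")
```

Using relative rather than absolute sub-band energies makes the features invariant to recording gain: scaling a signal by a constant multiplies every band energy by the same factor, so the ratios and the entropy are unchanged. Whether the paper normalizes its wavelet features this way is not stated in the abstract.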


Keywords: Speech emotion recognition, Neural network, Wavelet, Feature extraction


Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Electrical and Electronic Engineering, University of Johannesburg, Johannesburg, South Africa
  2. Department of Mathematics, National Institute of Technology, Rourkela, India
