
Emotion Speech Recognition Based on Adaptive Fractional Deep Belief Network and Reinforcement Learning

  • Conference paper
  • First Online:
Cognitive Informatics and Soft Computing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 768)

Abstract

Identifying emotion has become an important yet challenging task with the rapid development of human–computer interaction frameworks. Speech Emotion Recognition (SER) can be characterized as the extraction of the emotional state of a speaker from his or her spoken utterances. Detecting emotion is difficult for a computer because emotional expression varies from speaker to speaker. To address this problem, a system is implemented based on an Adaptive Fractional Deep Belief Network (AFDBN) and Reinforcement Learning (RL). Pitch chroma, spectral flux, tonal power ratio, and MFCC features are extracted from the speech signal. The extracted features are then passed to the classification stage. Finally, the performance is analyzed using evaluation metrics and compared with existing systems.
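Two of the descriptors named above, spectral flux and the tonal power ratio, can be illustrated with a minimal NumPy sketch. This is not the chapter's implementation: the window/hop sizes are arbitrary, and defining "tonal" bins as the top-quantile power bins per frame is a simplifying assumption (pitch chroma and MFCCs would typically come from an audio library such as librosa).

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed short-time FFT."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1))

def spectral_flux(mag):
    """Mean positive spectral change between consecutive frames."""
    diff = np.diff(mag, axis=0)
    return np.sum(np.maximum(diff, 0.0), axis=1) / mag.shape[1]

def tonal_power_ratio(mag, quantile=0.9):
    """Share of per-frame power held by the strongest bins.

    Simplified stand-in for the tonal-vs-total power ratio: bins at or
    above the per-frame power quantile are treated as 'tonal'.
    """
    power = mag ** 2
    thresh = np.quantile(power, quantile, axis=1, keepdims=True)
    tonal = np.where(power >= thresh, power, 0.0).sum(axis=1)
    total = power.sum(axis=1) + 1e-12
    return tonal / total

# Toy "utterance": a 440 Hz tone with additive noise at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) \
    + 0.05 * np.random.default_rng(0).standard_normal(sr)

mag = stft_mag(x)                 # (frames, bins) magnitude spectrum
flux = spectral_flux(mag)         # one value per frame transition
ratio = tonal_power_ratio(mag)    # one value per frame, in (0, 1]
```

In a full SER pipeline these per-frame values would be pooled (e.g. mean and variance over the utterance) and concatenated with the pitch-chroma and MFCC features before classification.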



Author information


Corresponding author

Correspondence to J. Sangeetha.



Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Sangeetha, J., Jayasankar, T. (2019). Emotion Speech Recognition Based on Adaptive Fractional Deep Belief Network and Reinforcement Learning. In: Mallick, P., Balas, V., Bhoi, A., Zobaa, A. (eds) Cognitive Informatics and Soft Computing. Advances in Intelligent Systems and Computing, vol 768. Springer, Singapore. https://doi.org/10.1007/978-981-13-0617-4_16

