A Hierarchical Classification Scheme for Efficient Speech Emotion Recognition

Heracleous, Panikos; Takai, Kohichi; Yasuda, Keiji; Yoneyama, Akio

doi:10.1007/978-3-030-90179-0_12

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1499)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

The current study focuses on speech emotion recognition based on a hierarchical classification scheme. The study aims to overcome the low accuracy that arises when a task considers a large number of emotions. In the proposed method, the emotions are classified into groups based on the valence-arousal two-dimensional map, and models are trained for each group. In a second pass, within-group recognition is performed for the group selected in the previous stage.
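
The two-pass idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the emotion labels, the mapping `EMOTION_TO_GROUP` from emotions to valence-arousal groups, the nearest-centroid classifier, and the toy 2-D feature vectors that stand in for real acoustic features. The sketch only shows the control flow: a first pass selects a group, and a second pass chooses among the emotions of that group.

```python
from collections import defaultdict
from math import dist

# Hypothetical grouping of emotions by valence-arousal region
# (an assumption for illustration, not the grouping used in the paper).
EMOTION_TO_GROUP = {
    "happy": "positive_high",
    "excited": "positive_high",
    "angry": "negative_high",
    "afraid": "negative_high",
    "sad": "negative_low",
    "bored": "negative_low",
}


def fit_centroids(features, labels):
    """Return {label: mean feature vector} for a nearest-centroid model."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y in sums:
            sums[y] = [s + v for s, v in zip(sums[y], x)]
        else:
            sums[y] = list(x)
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


class HierarchicalEmotionClassifier:
    """Two-pass classifier: group first, then emotion within the chosen group."""

    def fit(self, features, emotions):
        groups = [EMOTION_TO_GROUP[e] for e in emotions]
        # First-pass model: one centroid per valence-arousal group.
        self.group_model = fit_centroids(features, groups)
        # Second-pass models: one per group, trained only on that group's data.
        self.within_models = {}
        for g in set(groups):
            idx = [i for i, gi in enumerate(groups) if gi == g]
            self.within_models[g] = fit_centroids(
                [features[i] for i in idx], [emotions[i] for i in idx]
            )
        return self

    def predict(self, x):
        group = nearest(self.group_model, x)             # first pass
        emotion = nearest(self.within_models[group], x)  # second pass
        return group, emotion


# Toy usage with made-up 2-D features (stand-ins for acoustic features).
train_x = [(0.8, 0.6), (0.6, 0.9), (-0.7, 0.8), (-0.5, 0.9), (-0.7, -0.5), (-0.3, -0.8)]
train_y = ["happy", "excited", "angry", "afraid", "sad", "bored"]
model = HierarchicalEmotionClassifier().fit(train_x, train_y)
print(model.predict((0.75, 0.55)))
```

In this sketch the second pass only has to separate the few emotions inside the selected group, which is the motivation the abstract gives for the hierarchy. A real system would replace the centroid models with whatever classifiers and features the task requires.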

Dr. Panikos Heracleous is currently with the Artificial Intelligence Research Center (AIRC), AIST, Japan.

Author information

Authors and Affiliations

KDDI Research, Inc., 2-1-15 Ohara, Fujimino-shi, Saitama, 356-8502, Japan
Panikos Heracleous, Kohichi Takai & Akio Yoneyama
Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
Keiji Yasuda

Corresponding author

Correspondence to Panikos Heracleous.

Editor information

Editors and Affiliations

University of Crete and Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Constantine Stephanidis
Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Margherita Antona
Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Stavroula Ntoa

About this paper

Cite this paper

Heracleous, P., Takai, K., Yasuda, K., Yoneyama, A. (2021). A Hierarchical Classification Scheme for Efficient Speech Emotion Recognition. In: Stephanidis, C., Antona, M., Ntoa, S. (eds) HCI International 2021 - Late Breaking Posters. HCII 2021. Communications in Computer and Information Science, vol 1499. Springer, Cham. https://doi.org/10.1007/978-3-030-90179-0_12

DOI: https://doi.org/10.1007/978-3-030-90179-0_12
Published: 06 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90178-3
Online ISBN: 978-3-030-90179-0
eBook Packages: Computer Science, Computer Science (R0)
