Abstract
Speech emotion recognition (SER) is an important field of human–computer interaction. Although humans express emotions in many ways, speech is one of the most direct. Extracting as much emotional information as possible from the speech signal is therefore a key technical challenge. To address it, we propose a local frame-level global dynamic attention network (LF-GANet) for extracting emotional information from speech signals. The network consists of two main parts: a local frame-level module (LFM) and a global dynamic attention module (GAM). The LFM extracts rich frame-level emotional features by processing the forward and reversed time series separately, while the GAM extracts global correlations from the speech signal in real time. We conducted experiments on the EMODB and SAVEE datasets; the results show that our method outperforms existing state-of-the-art (SOTA) models in unweighted average recall (UAR) on both datasets, verifying the effectiveness of the model.
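The abstract does not include code, but the two-module design it describes can be sketched concisely. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the LFM is rendered as two separate recurrent encoders over the forward and the reversed frame sequence, and the GAM as self-attention over all frames to capture global correlations. All layer sizes, the GRU/multi-head-attention primitives, the pooling step, and the input feature dimension are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LFGANetSketch(nn.Module):
    """Illustrative sketch of LF-GANet's two-module structure (all sizes hypothetical)."""

    def __init__(self, n_feats=39, hidden=128, n_classes=7):
        super().__init__()
        # LFM: separate encoders for the forward and the reversed frame sequence.
        self.fwd_rnn = nn.GRU(n_feats, hidden, batch_first=True)
        self.rev_rnn = nn.GRU(n_feats, hidden, batch_first=True)
        # GAM: self-attention over frames to model global correlations.
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                                 # x: (batch, frames, n_feats)
        h_fwd, _ = self.fwd_rnn(x)                        # forward time series
        h_rev, _ = self.rev_rnn(torch.flip(x, dims=[1]))  # reversed time series
        h_rev = torch.flip(h_rev, dims=[1])               # re-align with forward time
        h = torch.cat([h_fwd, h_rev], dim=-1)             # frame-level local features
        g, _ = self.attn(h, h, h)                         # global attention over frames
        utt = g.mean(dim=1)                               # pool frames to utterance level
        return self.classifier(utt)

# Example: a batch of 4 utterances, 300 frames of 39-dim features (e.g. MFCCs).
logits = LFGANetSketch()(torch.randn(4, 300, 39))
print(logits.shape)  # torch.Size([4, 7])
```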
Acknowledgements
This work was supported by the Tianjin Science and Technology Planning Project under Grant No. 20JCYBJC00300 and the National Natural Science Foundation of China under Grant No. 62001328.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Dou, S., Han, T., Liu, R., Xia, W., Zhong, H. (2024). LF-GANet: Local Frame-Level Global Dynamic Attention Network for Speech Emotion Recognition. In: Wang, W., Liu, X., Na, Z., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2023. Lecture Notes in Electrical Engineering, vol 1032. Springer, Singapore. https://doi.org/10.1007/978-981-99-7505-1_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7539-6
Online ISBN: 978-981-99-7505-1
eBook Packages: Engineering, Engineering (R0)