
LF-GANet: Local Frame-Level Global Dynamic Attention Network for Speech Emotion Recognition

  • Conference paper
  • First Online:
Communications, Signal Processing, and Systems (CSPS 2023)

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 1032))


Abstract

Speech emotion recognition (SER) is an important field of human–computer interaction. Although humans express emotions in many ways, speech is one of the most direct. Extracting as much emotional information as possible from the speech signal is therefore a key technical challenge. To address this, we propose a local frame-level global dynamic attention network (LF-GANet) to extract emotional information from speech signals. The network consists of two main parts: a local frame-level module (LFM) and a global dynamic attention module (GAM). The LFM extracts rich frame-level emotional features by processing the forward and reverse time series separately, while the GAM extracts global correlations from the speech signal in real time. We conducted experiments on the EMODB and SAVEE datasets. The results show that our method outperforms the existing SOTA models in unweighted average recall (UAR) on both datasets, verifying the effectiveness of the model.
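The two-module structure described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation: the layer choices, dimensions, and the use of plain scaled dot-product self-attention for the GAM are all hypothetical, chosen only to show the idea of processing frames in forward and reverse order and then weighting them by global correlations.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_frame_module(x, w_f, w_b):
    """Toy LFM: process frames in forward and reverse temporal order
    with separate (assumed) projections, then concatenate."""
    fwd = np.tanh(x @ w_f)               # forward time series
    bwd = np.tanh(x[::-1] @ w_b)[::-1]   # reverse time series, re-aligned to forward order
    return np.concatenate([fwd, bwd], axis=-1)

def global_attention_module(h):
    """Toy GAM: scaled dot-product self-attention over all frames,
    capturing global frame-to-frame correlations."""
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)                            # (T, T) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # row-wise softmax
    return weights @ h                                       # globally re-weighted frames

T, F, H = 100, 40, 32                    # frames, feature dim (e.g. MFCCs), hidden dim
x = rng.standard_normal((T, F))          # stand-in for frame-level acoustic features
w_f = rng.standard_normal((F, H)) * 0.1  # hypothetical forward projection
w_b = rng.standard_normal((F, H)) * 0.1  # hypothetical backward projection

local = local_frame_module(x, w_f, w_b)               # (100, 64) frame-level features
pooled = global_attention_module(local).mean(axis=0)  # (64,) utterance-level vector
print(local.shape, pooled.shape)
```

In a real SER pipeline the pooled utterance vector would feed a classifier over the emotion categories; here the pooling and projections are placeholders standing in for whatever trainable layers the paper actually uses.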



Acknowledgements

This work was supported by the Tianjin Science and Technology Planning Project under Grant No. 20JCYBJC00300 and the National Natural Science Foundation of China under Grant No. 62001328.

Corresponding author

Correspondence to Tingting Han.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Dou, S., Han, T., Liu, R., Xia, W., Zhong, H. (2024). LF-GANet: Local Frame-Level Global Dynamic Attention Network for Speech Emotion Recognition. In: Wang, W., Liu, X., Na, Z., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2023. Lecture Notes in Electrical Engineering, vol 1032. Springer, Singapore. https://doi.org/10.1007/978-981-99-7505-1_13


  • DOI: https://doi.org/10.1007/978-981-99-7505-1_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7539-6

  • Online ISBN: 978-981-99-7505-1

  • eBook Packages: Engineering (R0)
