Abstract
Emotion recognition is a key problem in human-computer interaction (HCI). This paper addresses multi-modal emotion recognition from untrimmed visual signals and EEG signals. We propose a model based on a multi-layer long short-term memory recurrent neural network (LSTM-RNN) that combines two attention mechanisms: temporal attention and band attention. At each time step, the LSTM-RNN takes a video slice and an EEG slice as inputs and generates representations of the two signals, which are fed into a multi-modal fusion unit. Based on the fused representation, the network predicts the emotion label and selects the next time slice to analyze. Within this process, the band attention applies different levels of attention to the different frequency bands of the EEG signal, while the temporal attention determines where to analyze the signal next, suppressing information that is redundant for recognition. Experiments on the Mahnob-HCI database demonstrate encouraging results: the proposed method achieves higher accuracy while improving computational efficiency.
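To make the recurrent temporal-band attention loop concrete, the following is a minimal PyTorch sketch of the idea described above. It is an illustration under assumptions, not the authors' implementation: the module names (BandAttention, TemporalBandAttentionNet), the feature dimensions, the choice of five EEG frequency bands, and the single-layer LSTM cell are all placeholders.

```python
# A minimal sketch of the temporal-band attention recurrence described in the
# abstract. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BandAttention(nn.Module):
    """Soft attention over EEG frequency bands, conditioned on the LSTM state."""

    def __init__(self, band_dim, hidden_dim):
        super().__init__()
        self.score = nn.Linear(band_dim + hidden_dim, 1)

    def forward(self, eeg_bands, h):
        # eeg_bands: (batch, n_bands, band_dim); h: (batch, hidden_dim)
        h_rep = h.unsqueeze(1).expand(-1, eeg_bands.size(1), -1)
        scores = self.score(torch.cat([eeg_bands, h_rep], dim=-1)).squeeze(-1)
        alpha = F.softmax(scores, dim=1)                  # per-band weights
        return (alpha.unsqueeze(-1) * eeg_bands).sum(1)   # weighted EEG feature


class TemporalBandAttentionNet(nn.Module):
    """One recurrent step: fuse the video feature with the band-attended EEG
    feature, update the LSTM state, then predict the emotion label and the
    next time slice to analyze (temporal attention)."""

    def __init__(self, video_dim, band_dim, hidden_dim, n_classes, n_slices):
        super().__init__()
        self.band_attn = BandAttention(band_dim, hidden_dim)
        self.fusion = nn.Linear(video_dim + band_dim, hidden_dim)  # fusion unit
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        self.emotion_head = nn.Linear(hidden_dim, n_classes)       # emotion label
        self.slice_head = nn.Linear(hidden_dim, n_slices)          # next slice

    def forward(self, video_feat, eeg_bands, state):
        h, c = state
        eeg_feat = self.band_attn(eeg_bands, h)
        fused = torch.tanh(self.fusion(torch.cat([video_feat, eeg_feat], dim=-1)))
        h, c = self.lstm(fused, (h, c))
        return self.emotion_head(h), self.slice_head(h), (h, c)
```

A toy forward pass over an untrimmed recording split into T time slices might then look as follows. The model visits only a subset of the slices rather than scanning all of them, which is where the computational saving comes from; note that the greedy argmax used here for slice selection is non-differentiable and stands in for whatever learned selection scheme the paper trains.

```python
B, T = 4, 12                              # batch size, number of slices (assumed)
video = torch.randn(B, T, 128)            # per-slice video features (assumed 128-d)
eeg = torch.randn(B, T, 5, 32)            # 5 frequency bands, 32-d each (assumed)
net = TemporalBandAttentionNet(video_dim=128, band_dim=32,
                               hidden_dim=64, n_classes=3, n_slices=T)
h = c = torch.zeros(B, 64)
t = torch.zeros(B, dtype=torch.long)      # start at the first slice
for _ in range(6):                        # analyze only a few slices
    v_t = video[torch.arange(B), t]       # (B, 128)
    e_t = eeg[torch.arange(B), t]         # (B, 5, 32)
    emotion_logits, slice_logits, (h, c) = net(v_t, e_t, (h, c))
    t = slice_logits.argmax(dim=-1)       # greedy stand-in for learned selection
```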
Acknowledgements
This work was funded by the National Natural Science Foundation of China (Grant No. 91520301).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Liu, J., Su, Y., Liu, Y. (2018). Multi-modal Emotion Recognition with Temporal-Band Attention Based on LSTM-RNN. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. Lecture Notes in Computer Science, vol 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_19
Print ISBN: 978-3-319-77379-7
Online ISBN: 978-3-319-77380-3