Abstract
Emotion recognition is a key problem in human-computer interaction (HCI). This paper addresses multi-modal emotion recognition from untrimmed visual signals and EEG signals. We propose a model based on a multi-layer long short-term memory recurrent neural network (LSTM-RNN) that combines two attention mechanisms: temporal attention and band attention. At each time step, the LSTM-RNN takes a video slice and an EEG slice as inputs and generates representations of the two signals, which are fed into a multi-modal fusion unit. Based on the fused representation, the network predicts the emotion label and selects the next time slice to analyze. Within this process, the band attention applies different levels of attention to the different frequency bands of the EEG signal, while the temporal attention determines where to analyze the signal next, suppressing information that is redundant for recognition. Experiments on the Mahnob-HCI database demonstrate encouraging results: the proposed method achieves higher accuracy while improving computational efficiency.
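To make the recurrent temporal-band attention loop concrete, the following is a minimal PyTorch sketch of the idea described above. It is an illustration under assumptions, not the authors' implementation: the module names (BandAttention, TemporalBandAttentionNet), the feature dimensions, the choice of five EEG frequency bands, and the single-layer LSTM cell are all placeholders.

```python
# A minimal sketch of the temporal-band attention recurrence described in the
# abstract. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BandAttention(nn.Module):
    """Soft attention over EEG frequency bands, conditioned on the LSTM state."""

    def __init__(self, band_dim, hidden_dim):
        super().__init__()
        self.score = nn.Linear(band_dim + hidden_dim, 1)

    def forward(self, eeg_bands, h):
        # eeg_bands: (batch, n_bands, band_dim); h: (batch, hidden_dim)
        h_rep = h.unsqueeze(1).expand(-1, eeg_bands.size(1), -1)
        scores = self.score(torch.cat([eeg_bands, h_rep], dim=-1)).squeeze(-1)
        alpha = F.softmax(scores, dim=1)                  # per-band weights
        return (alpha.unsqueeze(-1) * eeg_bands).sum(1)   # weighted EEG feature


class TemporalBandAttentionNet(nn.Module):
    """One recurrent step: fuse the video feature with the band-attended EEG
    feature, update the LSTM state, then predict the emotion label and the
    next time slice to analyze (temporal attention)."""

    def __init__(self, video_dim, band_dim, hidden_dim, n_classes, n_slices):
        super().__init__()
        self.band_attn = BandAttention(band_dim, hidden_dim)
        self.fusion = nn.Linear(video_dim + band_dim, hidden_dim)  # fusion unit
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        self.emotion_head = nn.Linear(hidden_dim, n_classes)       # emotion label
        self.slice_head = nn.Linear(hidden_dim, n_slices)          # next slice

    def forward(self, video_feat, eeg_bands, state):
        h, c = state
        eeg_feat = self.band_attn(eeg_bands, h)
        fused = torch.tanh(self.fusion(torch.cat([video_feat, eeg_feat], dim=-1)))
        h, c = self.lstm(fused, (h, c))
        return self.emotion_head(h), self.slice_head(h), (h, c)
```

A toy forward pass over an untrimmed recording split into T time slices might then look as follows. The model visits only a subset of the slices rather than scanning all of them, which is where the computational saving comes from; note that the greedy argmax used here for slice selection is non-differentiable and stands in for whatever learned selection scheme the paper trains.

```python
B, T = 4, 12                              # batch size, number of slices (assumed)
video = torch.randn(B, T, 128)            # per-slice video features (assumed 128-d)
eeg = torch.randn(B, T, 5, 32)            # 5 frequency bands, 32-d each (assumed)
net = TemporalBandAttentionNet(video_dim=128, band_dim=32,
                               hidden_dim=64, n_classes=3, n_slices=T)
h = c = torch.zeros(B, 64)
t = torch.zeros(B, dtype=torch.long)      # start at the first slice
for _ in range(6):                        # analyze only a few slices
    v_t = video[torch.arange(B), t]       # (B, 128)
    e_t = eeg[torch.arange(B), t]         # (B, 5, 32)
    emotion_logits, slice_logits, (h, c) = net(v_t, e_t, (h, c))
    t = slice_logits.argmax(dim=-1)       # greedy stand-in for learned selection
```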
Acknowledgements
This work was funded by the National Natural Science Foundation of China (Grant No. 91520301).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Liu, J., Su, Y., Liu, Y. (2018). Multi-modal Emotion Recognition with Temporal-Band Attention Based on LSTM-RNN. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. Lecture Notes in Computer Science, vol 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_19
Print ISBN: 978-3-319-77379-7
Online ISBN: 978-3-319-77380-3