Video Emotion Recognition Using Local Enhanced Motion History Image and CNN-RNN Networks

  • Haowen Wang
  • Guoxiang Zhou
  • Min Hu
  • Xiaohua Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10996)


This paper focuses on recognizing facial expressions in video sequences and proposes a local-with-global method based on a local enhanced motion history image (LEMHI) and CNN-RNN networks. On the one hand, the traditional motion history image method is improved by using detected facial landmarks as attention areas that boost local values in the difference-image calculation, so that the actions of crucial facial units are captured effectively; the generated LEMHI is then fed into a CNN for classification. On the other hand, a CNN-LSTM model is used as a global feature extractor and classifier for video emotion recognition. Finally, a weighted summation whose weights are chosen by random search serves as the late-fusion strategy for the final prediction. Experiments on the AFEW, CK+ and MMI datasets under a subject-independent validation scheme demonstrate that the integrated framework outperforms state-of-the-art methods.
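The core idea of LEMHI described above can be sketched as a standard motion history image whose frame-difference values are amplified inside landmark-centred attention regions before thresholding. The following is a minimal illustrative sketch, not the paper's actual implementation; the `landmark_masks` input, the threshold, and the boost factor are assumptions (the paper derives attention areas from a facial landmark detector).

```python
import numpy as np

def lemhi(frames, landmark_masks, tau=None, thresh=20.0, boost=2.0):
    """Sketch of a Local Enhanced Motion History Image.

    frames: list of grayscale frames as 2-D float arrays.
    landmark_masks: per-frame boolean masks marking attention regions
        around detected facial landmarks (hypothetical input format).
    tau: MHI decay horizon; defaults to the number of frame differences.
    """
    tau = tau if tau is not None else len(frames) - 1
    h = np.zeros_like(frames[0], dtype=float)
    for t in range(1, len(frames)):
        diff = np.abs(frames[t] - frames[t - 1])
        diff[landmark_masks[t]] *= boost      # enhance motion near landmarks
        moving = diff > thresh                # binarize the enhanced difference
        # classic MHI update: refresh moving pixels, decay the rest
        h = np.where(moving, float(tau), np.maximum(h - 1.0, 0.0))
    return h / tau                            # normalize to [0, 1] for CNN input
```

The boost step is what distinguishes LEMHI from a plain MHI: motion around eyes, brows and mouth crosses the threshold more easily, so the recorded history emphasizes crucial facial action units.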


Keywords: Video emotion recognition · Motion history image · LSTM · Facial landmarks
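The late-fusion step mentioned in the abstract combines the two streams' class probabilities by a weighted summation, with the weight chosen by random search. A minimal sketch under assumed names (`probs_a` for the LEMHI-CNN stream, `probs_b` for the CNN-LSTM stream; the trial count and scalar-weight parameterization are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def random_search_fusion(probs_a, probs_b, labels, n_trials=200):
    """Pick the fusion weight w maximizing validation accuracy of
    w * probs_a + (1 - w) * probs_b, where probs_* are (N, C)
    class-probability matrices from the two streams."""
    best_w, best_acc = 0.5, -1.0
    for _ in range(n_trials):
        w = rng.uniform(0.0, 1.0)                    # random candidate weight
        fused = w * probs_a + (1.0 - w) * probs_b    # weighted summation
        acc = np.mean(fused.argmax(axis=1) == labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

The selected weight is then fixed and reused at test time to produce the final prediction.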



This research has been partially supported by National Natural Science Foundation of China under Grant Nos. 61672202, 61502141 and 61432004.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Computer and Information, Hefei University of Technology, Hefei, China
  2. Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei, China
