Video Emotion Recognition Using Local Enhanced Motion History Image and CNN-RNN Networks

Wang, Haowen; Zhou, Guoxiang; Hu, Min; Wang, Xiaohua

doi:10.1007/978-3-319-97909-0_12

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10996)

Included in the following conference series:

Chinese Conference on Biometric Recognition

Abstract

This paper focuses on recognizing facial expressions in video sequences and proposes a local-with-global method based on a local enhanced motion history image (LEMHI) and CNN-RNN networks. On the one hand, the traditional motion history image method is improved by using detected facial landmarks as attention areas to boost local values in the difference-image calculation, so that the action of crucial facial units can be captured effectively; the generated LEMHI is then fed into a CNN for categorization. On the other hand, a CNN-LSTM model is used as a global feature extractor and classifier for video emotion recognition. Finally, a random-search weighted summation strategy is used as the late-fusion scheme for the final prediction. Experiments on the AFEW, CK+ and MMI datasets using a subject-independent validation scheme demonstrate that the integrated framework achieves better performance than state-of-the-art methods.
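
The abstract does not give the formulas, so the following is only a rough sketch of the two ideas it names: a motion history image whose frame differences are boosted near facial landmarks, and a late fusion that picks a summation weight by random search. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (lemhi, fuse, random_search_weight), the parameters (tau, decay, thresh, boost, radius), the square landmark neighbourhood, and the use of accuracy as the search criterion.

```python
import random

def lemhi(frames, landmarks, tau=255, decay=32, thresh=20, boost=2.0, radius=1):
    """Landmark-boosted motion history image (illustrative, not the paper's formula).

    frames: list of 2D grayscale frames (lists of lists of numbers).
    landmarks: for each frame, a list of (row, col) landmark positions.
    """
    h, w = len(frames[0]), len(frames[0][0])
    mhi = [[0] * w for _ in range(h)]
    for t in range(1, len(frames)):
        # Assumed attention area: a square window around each landmark.
        near = {(r + dr, c + dc)
                for r, c in landmarks[t]
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)}
        for y in range(h):
            for x in range(w):
                diff = abs(frames[t][y][x] - frames[t - 1][y][x])
                if (y, x) in near:
                    diff *= boost  # amplify motion inside attention areas
                # Standard MHI update: refresh moving pixels, decay the rest.
                if diff >= thresh:
                    mhi[y][x] = tau
                else:
                    mhi[y][x] = max(0, mhi[y][x] - decay)
    return mhi

def fuse(p_a, p_b, w):
    """Weighted summation of two models' per-class scores."""
    return [[w * a + (1 - w) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(p_a, p_b)]

def accuracy(probs, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    hits = sum(max(range(len(p)), key=p.__getitem__) == y
               for p, y in zip(probs, labels))
    return hits / len(labels)

def random_search_weight(p_a, p_b, labels, trials=200, seed=0):
    """Sample fusion weights in [0, 1) and keep the most accurate one."""
    rng = random.Random(seed)
    best_w, best_acc = 0.5, accuracy(fuse(p_a, p_b, 0.5), labels)
    for _ in range(trials):
        w = rng.random()
        acc = accuracy(fuse(p_a, p_b, w), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

In a setup like the one described, p_a and p_b would hold the class scores of the LEMHI-CNN branch and the CNN-LSTM branch on a held-out validation split, and the selected weight would then be applied unchanged to the test predictions.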

Acknowledgments

This research has been partially supported by the National Natural Science Foundation of China under Grant Nos. 61672202, 61502141 and 61432004.

Author information

Authors and Affiliations

School of Computer and Information, Hefei University of Technology, Hefei, China
Haowen Wang, Guoxiang Zhou, Min Hu & Xiaohua Wang
Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei, 230009, China
Haowen Wang, Min Hu & Xiaohua Wang

Corresponding authors

Correspondence to Haowen Wang or Min Hu.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Zhou
Beihang University, Beijing, China
Yunhong Wang
Chinese Academy of Sciences, Beijing, China
Zhenan Sun
Xinjiang University, Urumqi, China
Zhenhong Jia
Tsinghua University, Beijing, China
Jianjiang Feng
Chinese Academy of Sciences, Beijing, China
Shiguang Shan
Xinjiang University, Urumqi, China
Kurban Ubul
Tsinghua University, Shenzhen, China
Zhenhua Guo

About this paper

Cite this paper

Wang, H., Zhou, G., Hu, M., Wang, X. (2018). Video Emotion Recognition Using Local Enhanced Motion History Image and CNN-RNN Networks. In: Zhou, J., et al. Biometric Recognition. CCBR 2018. Lecture Notes in Computer Science, vol 10996. Springer, Cham. https://doi.org/10.1007/978-3-319-97909-0_12

DOI: https://doi.org/10.1007/978-3-319-97909-0_12
Published: 09 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97908-3
Online ISBN: 978-3-319-97909-0
eBook Packages: Computer Science, Computer Science (R0)
