Abstract
Recently, emotion recognition in the wild has attracted increasing attention in computer vision and affective computing. In contrast to classical emotion recognition, emotion recognition in the wild is more challenging because the databases are collected under real-world conditions. Such databases inevitably contain adverse samples whose emotion labels are hard to identify with classical methods designed for databases recorded under ideal conditions, which significantly increases the difficulty of emotion recognition on wild databases. In this paper, we propose a transductive transfer learning framework to handle the problem of emotion recognition in the wild. We develop a sparse transductive transfer linear discriminant analysis (STTLDA) for facial expression recognition and speech emotion recognition under real-world environments. To the best of our knowledge, we are the first to treat emotion recognition in the wild as a transfer learning problem and to use a transductive transfer learning method to eliminate the distribution difference between training and testing samples caused by the "wild" conditions. We conduct extensive experiments on the SFEW 2.0 and AFEW 4.0/5.0 (audio part) databases, which were used in the Emotion Recognition in the Wild Challenges (EmotiW 2014 and 2015), to evaluate the proposed method. Experimental results demonstrate that STTLDA achieves satisfactory performance compared with the baseline provided by the challenge organizers and several competitive methods. In addition, we report our earlier results from the static-image facial expression recognition challenge of EmotiW 2015, where we achieved an accuracy of 50% on the Test set, a 10.87% improvement over the baseline released by the challenge organizers.
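The core transductive idea, using unlabeled test samples during training to absorb the source/target distribution shift, can be illustrated with a minimal numpy sketch. This is a generic self-training LDA loop, not the paper's sparse STTLDA formulation, and all function names and the synthetic data are illustrative assumptions: a classifier is fit on labeled source data, pseudo-labels are assigned to the shifted target data, and the model is refit on both until the pseudo-labels stabilize.

```python
import numpy as np

def fit_lda(X, y):
    """Class means and pooled within-class covariance (with a small ridge)."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    Sw = sum(np.cov(X[y == c].T, bias=True) * (y == c).sum()
             for c in classes) / len(X)
    Sw += 1e-6 * np.eye(X.shape[1])        # regularize for invertibility
    return classes, means, np.linalg.inv(Sw)

def predict_lda(model, X):
    """Assign each sample to the nearest class mean in Mahalanobis distance."""
    classes, means, Sw_inv = model
    d = np.stack([np.einsum('ij,jk,ik->i', X - means[c], Sw_inv, X - means[c])
                  for c in classes], axis=1)
    return classes[np.argmin(d, axis=1)]

def transductive_lda(Xs, ys, Xt, n_iter=5):
    """Self-training loop: pseudo-label the target set, refit on source+target."""
    model = fit_lda(Xs, ys)
    yt = predict_lda(model, Xt)
    for _ in range(n_iter):
        model = fit_lda(np.vstack([Xs, Xt]), np.concatenate([ys, yt]))
        new = predict_lda(model, Xt)
        if np.array_equal(new, yt):        # pseudo-labels converged
            break
        yt = new
    return yt

# Toy demo: two Gaussian classes; the target domain is shifted by (1, -1).
rng = np.random.default_rng(0)
Xs = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal([4, 4], 1, (50, 2))])
ys = np.array([0] * 50 + [1] * 50)
Xt = np.vstack([rng.normal(0, 1, (40, 2)),
                rng.normal([4, 4], 1, (40, 2))]) + np.array([1.0, -1.0])
yt_true = np.array([0] * 40 + [1] * 40)

pred = transductive_lda(Xs, ys, Xt)
acc = (pred == yt_true).mean()
```

The sparse formulation in the paper additionally constrains the projection, but the loop above captures why transduction helps: the refit model's class statistics are estimated partly from the (pseudo-labeled) wild test samples themselves.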
Acknowledgments
The authors would like to thank anonymous reviewers for their useful comments and valuable suggestions.
Additional information
This work was partly supported by the National Basic Research Program of China under Grants 2015CB351704 and 2011CB302202, the National Natural Science Foundation of China (NSFC) under Grants 61231002 and 61201444, the Ph.D. Program Foundation of the Ministry of Education of China under Grant 20120092110054, the Natural Science Foundation of Jiangsu Province under Grant BK20130020, and the Graduate Research Innovation Project of Jiangsu Province under Grant KYZZ15_0055.
Cite this article
Zong, Y., Zheng, W., Huang, X. et al. Emotion recognition in the wild via sparse transductive transfer linear discriminant analysis. J Multimodal User Interfaces 10, 163–172 (2016). https://doi.org/10.1007/s12193-015-0210-7