International Journal of Computer Vision, Volume 127, Issue 6–7, pp. 884–906

Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning

  • Shan Li
  • Weihong Deng

Abstract

Comprehending different categories of facial expressions plays an important role in designing computational models that analyze human perceived and affective states. Authoritative studies have revealed that facial expressions in daily life often reflect multiple or co-occurring mental states. However, due to the lack of suitable datasets, most previous studies remain restricted to basic emotions with a single label. In this paper, we present a novel multi-label facial expression database, RAF-ML, along with a new deep learning algorithm, to address this problem. Specifically, a crowdsourced annotation of 1.2 million labels from 315 participants was collected to identify the multi-label expressions gathered from social networks, and an EM algorithm was designed to filter out unreliable labels. To the best of our knowledge, RAF-ML is the first in-the-wild database that provides crowdsourced annotations for multi-label expressions. Focusing on the ambiguity and continuity of blended expressions, we propose a new deep manifold learning network, called Deep Bi-Manifold CNN, which learns discriminative features for multi-label expressions by jointly preserving the local affinity of deep features and the manifold structure of emotion labels. Furthermore, a deep domain adaptation method is leveraged to extend the deep manifold features learned from RAF-ML to other expression databases captured under various imaging conditions and cultures. Extensive experiments on RAF-ML and other diverse databases (JAFFE, CK+, SFEW and MMI) show that the deep manifold feature is not only superior for multi-label expression recognition in the wild, but also captures the elemental and generic components that are effective for a wide range of expression recognition tasks.
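
The exact formulation of the Deep Bi-Manifold CNN is given in the full paper; purely as a rough illustration of the idea described above, the following PyTorch sketch shows one way a joint objective of this flavor could be written: a multi-label classification term plus graph-style pull/push regularizers driven by label-vector similarity, so that samples with similar blended-emotion labels are drawn together in feature space and dissimilar ones are pushed apart. The function name, similarity threshold, margin, and loss weights are hypothetical and are not taken from the paper.

# A minimal sketch (not the authors' implementation) of a bi-manifold style
# objective: multi-label BCE plus regularizers that (i) pull together deep
# features of samples whose emotion-label vectors are similar and (ii) push
# apart features of samples with dissimilar labels. All hyperparameters below
# are illustrative assumptions.
import torch
import torch.nn.functional as F

def bi_manifold_loss(features, logits, labels,
                     sim_thresh=0.5, margin=2.0,
                     lambda_pull=0.1, lambda_push=0.1):
    """features: (B, D) deep features; logits: (B, C); labels: (B, C) multi-hot or soft."""
    # Standard multi-label classification term.
    cls = F.binary_cross_entropy_with_logits(logits, labels)

    # Label-space affinity: cosine similarity between label vectors (in [0, 1]).
    lbl = F.normalize(labels.float(), dim=1, eps=1e-8)
    label_sim = lbl @ lbl.t()

    # Pairwise Euclidean distances between deep features.
    dist = torch.cdist(features, features, p=2)

    # Exclude self-pairs from both regularizers.
    off_diag = 1.0 - torch.eye(features.size(0), device=features.device)

    # Pull term: samples with similar labels should have nearby features,
    # weighted by how similar their label vectors are.
    pull_mask = (label_sim > sim_thresh).float() * off_diag
    pull = (pull_mask * label_sim * dist.pow(2)).sum() / pull_mask.sum().clamp(min=1.0)

    # Push term: samples with dissimilar labels should be at least `margin` apart.
    push_mask = (1.0 - (label_sim > sim_thresh).float()) * off_diag
    push = (push_mask * F.relu(margin - dist).pow(2)).sum() / push_mask.sum().clamp(min=1.0)

    return cls + lambda_pull * pull + lambda_push * push

if __name__ == "__main__":
    torch.manual_seed(0)
    feats = torch.randn(8, 128)                 # deep features from any backbone CNN
    logits = torch.randn(8, 6)                  # logits over 6 basic emotions
    labels = (torch.rand(8, 6) > 0.7).float()   # toy multi-hot blended labels
    print(bi_manifold_loss(feats, logits, labels).item())

In this sketch the label-similarity graph plays the role of the label manifold, while the pull/push terms preserve local affinity among the deep features; the actual network architecture and crowdsourced label filtering are described in the paper itself.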

Keywords

Facial expression recognition · Deep feature learning · Multi-label classification · Crowdsourced database in-the-wild

Notes

Acknowledgements

The funding was provided by the National Natural Science Foundation of China (Grant Nos. 61573068 and 61471048) and the Beijing Nova Program (Grant No. Z161100004916088).

Supplementary material

Supplementary material 1: 11263_2018_1131_MOESM1_ESM.pdf (118 KB)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Beijing University of Posts and Telecommunications (BUPT), Beijing, China
