Multimedia Tools and Applications, Volume 78, Issue 14, pp 20309–20332

Handling missing labels and class imbalance challenges simultaneously for facial action unit recognition

  • Yongqiang Li
  • Baoyuan Wu (corresponding author)
  • Yongping Zhao
  • Hongxun Yao
  • Qiang Ji


Facial action unit (AU) recognition has attracted considerable attention because of its applications in a wide range of fields. Missing labels and class imbalance (CIB) are both challenges for AU recognition. Missing labels means that only partial label assignments are available for the training samples. CIB appears in two forms: first, each expression image has far fewer positive AUs than negative AUs; second, the rate of positive samples differs significantly across AUs. Both missing labels and CIB degrade AU recognition performance. In this work, we propose to handle the two challenges simultaneously. Specifically, we formulate AU recognition with missing labels as a multi-label learning with missing labels (MLML) problem, which handles the missing-label challenge naturally. However, unlike most existing MLML approaches, which usually use the same whole-image features for all classes, we select the most relevant features for each AU. To handle the CIB challenge, we further introduce class cardinality bounds, which constrain the number of positive AUs for each data instance as well as the number of positive labels for each AU over the whole dataset. The class cardinality bounds serve as linear constraints on the objective function, which makes the optimization problem NP-hard. We therefore present a convex approximation based on the Lovász extension, which leads to a linear program that can be solved efficiently by the alternating direction method of multipliers (ADMM). Experimental results on both posed and spontaneous facial expression datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
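
Two ingredients named in the abstract can be illustrated with a small, self-contained sketch: the Lovász extension of a set function, and a check of class cardinality bounds on a binary label matrix. This is not the authors' implementation. The function names, the chain-graph cut used as a toy submodular function, and the use of a single (lower, upper) pair for all rows and all columns are assumptions made only for illustration.

```python
def lovasz_extension(F, w):
    """Evaluate the Lovász extension of a set function F at the point w.

    Assumes F(empty set) == 0. Elements are visited in order of decreasing
    w, and each one contributes w[i] times the marginal gain of adding it.
    """
    order = sorted(range(len(w)), key=lambda i: -w[i])
    value, prefix, prev = 0.0, set(), F(frozenset())
    for i in order:
        prefix.add(i)
        cur = F(frozenset(prefix))
        value += w[i] * (cur - prev)
        prev = cur
    return value


# Hypothetical toy submodular function: the cut of a chain graph 0-1-2-3.
EDGES = [(0, 1), (1, 2), (2, 3)]


def cut(S):
    """Number of chain edges with exactly one endpoint in S."""
    return float(sum((a in S) != (b in S) for a, b in EDGES))


def within_cardinality_bounds(Y, row_bounds, col_bounds):
    """Check cardinality bounds on a 0/1 label matrix Y (samples x AUs).

    row_bounds limits the number of positive AUs per sample; col_bounds
    limits the number of positive samples per AU. Using one (lower, upper)
    pair for every row and every column is a simplifying assumption.
    """
    lo_r, hi_r = row_bounds
    lo_c, hi_c = col_bounds
    rows_ok = all(lo_r <= sum(row) <= hi_r for row in Y)
    cols_ok = all(lo_c <= sum(col) <= hi_c for col in zip(*Y))
    return rows_ok and cols_ok


if __name__ == "__main__":
    w = [0.9, 0.2, 0.5, 0.1]
    print(lovasz_extension(cut, w))
    Y = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
    print(within_cardinality_bounds(Y, (1, 2), (0, 2)))
```

On 0/1 indicator vectors the extension agrees with the set function itself, and for a cut function it equals the sum of absolute differences across edges. Because the extension is convex when the set function is submodular, it is a natural convex surrogate for a discrete objective.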


Keywords: Face action unit recognition; Multi-label learning; Missing labels; Class imbalance



Yongqiang Li is supported by the National Natural Science Foundation of China (No. 61402129) and Postdoctoral Foundation Projects (No. LBH-Z14090, No. 2015M571417, and No. 2017T100243). Baoyuan Wu is supported by the Tencent AI Lab Foundation. Hongxun Yao is partially supported by the National Natural Science Foundation of China (No. 61472103) and Key Program (No. 61133003).

Supplementary material (ZIP, 237 KB)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Harbin Institute of Technology, Harbin, China
  2. Tencent AI Lab, Bellevue, USA
  3. Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, USA
