Deep Learning Face Attributes for Detection and Alignment

Visual Attributes

Abstract

Describable face attributes are labels that can be given to a face image to describe its characteristics. Examples of face attributes include gender, age, ethnicity, face shape, and nose size. Predicting face attributes in the wild is challenging due to complex face variations. This chapter presents in depth the recent progress and current state-of-the-art approaches to some of the fundamental challenges in face attribute recognition, particularly from the angle of deep learning. We highlight effective techniques for training deep convolutional networks to predict face attributes in the wild and for addressing the problem of imbalanced attribute distributions. In addition, we discuss the use of face attributes as rich contexts to facilitate accurate face detection and face alignment in return. The chapter ends by posing an open question for the face attribute recognition challenge arising from emerging and future applications.

Notes

  1. The method is also applicable to other visual recognition problems that encounter imbalanced class distributions.

  2. An imposter of a data point \(x_i\) is another data point \(x_j\) with a different class label, \(y_i \ne y_j\).

  3. Employing clustering to aid classification is common in the literature [6, 65].

  4. http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html.

  5. The layers of a CNN have neurons arranged in three dimensions: width, height, and depth (the third dimension of an activation volume).

  6. IoU indicates Intersection over Union.

  7. Data and code for this work are available at http://mmlab.ie.cuhk.edu.hk/projects/TCDCN.html.
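
Two of the definitions in these notes can be made concrete with a short sketch. The Python below is illustrative only and is not taken from the chapter: the function names `iou` and `imposters`, and the corner-based box format `(x_min, y_min, x_max, y_max)`, are assumptions chosen for the example. It computes Intersection over Union for two axis-aligned boxes and lists the imposters of a data point, that is, the points whose class label differs from its own.

```python
from typing import List, Sequence, Tuple

# Assumed box format (not specified in the notes): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def imposters(i: int, labels: Sequence[int]) -> List[int]:
    """Indices j with labels[j] != labels[i], i.e. the imposters of point i."""
    return [j for j, y in enumerate(labels) if y != labels[i]]


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(imposters(0, [0, 1, 0, 2]))
```

In the first call, two 2×2 boxes offset by one unit share a unit square, so the intersection is 1 and the union is 4 + 4 − 1 = 7. In the second, point 0 has label 0, so its imposters are the points at indices 1 and 3.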

References

  1. Arbeláez, P., Pont-Tuset, J., Barron, J.T., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  2. Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response map fitting with constrained local models. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

  3. Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of faces using a consensus of exemplars. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2930–2940 (2013)

  4. Berg, T., Belhumeur, P.N.: Poof: Part-based one-versus-one features for fine-grained categorization, face verification, and attribute estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

  5. Bourdev, L., Maji, S., Malik, J.: Describing people: a poselet-based approach to attribute classification. In: International Conference on Computer Vision (ICCV) (2011)

  6. Boureau, Y.L., Roux, N.L., Bach, F., Ponce, J., LeCun, Y.: Ask the locals: multi-way local pooling for image recognition. In: International Conference on Computer Vision (ICCV) (2011)

  7. Burgos-Artizzu, X., Perona, P., Dollár, P.: Robust face landmark estimation under occlusion. In: International Conference on Computer Vision (ICCV) (2013)

  8. Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

  9. Chechik, G., Shalit, U., Sharma, V., Bengio, S.: An online algorithm for large scale image similarity learning. In: Conference on Neural Information Processing Systems (NIPS) (2009)

  10. Chen, D., Ren, S., Wei, Y., Cao, X., Sun, J.: Joint cascade face detection and alignment. In: European Conference on Computer Vision (ECCV) (2014)

  11. Chen, K., Gong, S., Xiang, T., Loy, C.C.: Cumulative attribute space for age and crowd density estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

  12. Chrysos, G.G., Antonakos, E., Snape, P., Asthana, A., Zafeiriou, S.: A comprehensive performance evaluation of deformable face tracking “in-the-wild”. arXiv preprint arXiv:1603.06015 (2016)

  13. Chung, J., Lee, D., Seo, Y., Yoo, C.D.: Deep attribute networks. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2012)

  14. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)

  15. Cootes, T.F., Ionita, M.C., Lindner, C., Sauer, P.: Robust and accurate shape model fitting using random forest regression voting. In: European Conference on Computer Vision (ECCV) (2012)

  16. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2009)

  17. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: European Conference on Computer Vision (ECCV) (2014)

  18. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2015)

  19. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)

  20. Face++: http://www.faceplusplus.com/

  21. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: Liblinear: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)

  22. Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1915–1929 (2013)

  23. Girshick, R.: Fast R-CNN. In: International Conference on Computer Vision (ICCV) (2015)

  24. Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2006)

  25. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)

  26. Hu, Y., Lam, K.M., Qiu, G., Shen, T.: From local pixel structure to global image super-resolution: a new face hallucination framework. IEEE Trans. Image Process. 20(2), 433–445 (2011)

  27. Huang, C., Loy, C.C., Tang, X.: Discriminative sparse neighbor approximation for imbalanced learning. arXiv preprint arXiv:1602.01197 (2016)

  28. Huang, C., Loy, C.C., Tang, X.: Learning deep representation for imbalanced classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

  29. Huang, H., He, H., Fan, X., Zhang, J.: Super-resolution of human face image using canonical correlation analysis. Pattern Recogn. 43(7), 2532–2543 (2010)

  30. Huang, Z., Zhao, X., Shan, S., Wang, R., Chen, X.: Coupling alignments with recognition for still-to-video face recognition. In: International Conference on Computer Vision (ICCV), pp. 3296–3303 (2013)

  31. Jain, V., Learned-Miller, E.: FDDB: a benchmark for face detection in unconstrained settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst (2010)

  32. Jain, V., Learned-Miller, E.: Online domain adaptation of a pre-trained cascade of classifiers. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2011)

  33. Jin, Y., Bouganis, C.S.: Robust multi-image based blind face hallucination. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  34. Kazemi, V., Sullivan, J.: One millisecond face alignment with an ensemble of regression trees. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  35. Koestinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In: First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies (2011)

  36. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Conference on Neural Information Processing Systems (NIPS) (2012)

  37. Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Attribute and simile classifiers for face verification. In: International Conference on Computer Vision (ICCV) (2009)

  38. Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Describable visual attributes for face verification and image search. IEEE Trans. Pattern Anal. Mach. Intell. 33(10), 1962–1977 (2011)

  39. Le, V., Brandt, J., Lin, Z., Bourdev, L., Huang, T.S.: Interactive facial feature localization. In: European Conference on Computer Vision (ECCV) (2012)

  40. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Conference on Neural Information Processing Systems (NIPS) (1990)

  41. Li, J., Zhang, Y.: Learning SURF cascade for fast and accurate object detection. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

  42. Li, H., Hua, G., Lin, Z., Brandt, J., Yang, J.: Probabilistic elastic part model for unsupervised face detector adaptation. In: International Conference on Computer Vision (ICCV) (2013)

  43. Li, H., Lin, Z., Brandt, J., Shen, X., Hua, G.: Efficient boosted exemplar-based face detection. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  44. Liu, Z., Li, X., Luo, P., Loy, C.C., Tang, X.: Semantic image segmentation via deep parsing network. In: International Conference on Computer Vision (ICCV) (2015)

  45. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: International Conference on Computer Vision (ICCV) (2015)

  46. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  47. Lu, C., Tang, X.: Surpassing human-level face verification performance on LFW with gaussianface. arXiv preprint arXiv:1404.3840 (2014)

  48. Luo, P., Wang, X., Tang, X.: A deep sum-product architecture for robust facial attributes analysis. In: International Conference on Computer Vision (ICCV) (2013)

  49. Mathias, M., Benenson, R., Pedersoli, M., Van Gool, L.: Face detection without bells and whistles. In: European Conference on Computer Vision (ECCV) (2014)

  50. McCullagh, P., Nelder, J.A.: Generalized linear models. Chapman and Hall, London (1989)

  51. Mnih, V., Hinton, G.: Learning to label aerial images from noisy data. In: International Conference on Machine Learning (ICML) (2012)

  52. Ren, S., Cao, X., Wei, Y., Sun, J.: Face alignment at 3000 fps via regressing local binary features. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  53. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Conference on Neural Information Processing Systems (NIPS) (2015)

  54. Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge: the first facial landmark localization challenge. In: International Conference on Computer Vision Workshop (2013)

  55. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  56. Shen, X., Lin, Z., Brandt, J., Wu, Y.: Detecting and aligning faces by image retrieval. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

  57. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  58. Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

  59. Sun, Y., Chen, Y., Wang, X., Tang, X.: Deep learning face representation by joint identification-verification. In: Conference on Neural Information Processing Systems (NIPS) (2014)

  60. Sun, Y., Wang, X., Tang, X.: Deeply learned face representations are sparse, selective, and robust. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  61. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  62. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: Deepface: closing the gap to human-level performance in face verification. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  63. Tzimiropoulos, G., Pantic, M.: Gauss-newton deformable part models for face alignment in-the-wild. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  64. Uijlings, J.R., van de Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)

  65. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

  66. Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  67. Wang, N., Tao, D., Gao, X., Li, X., Li, J.: A comprehensive survey to face hallucination. Int. J. Comput. Vis. 106(1), 9–30 (2014)

  68. Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10, 207–244 (2009)

  69. Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

  70. Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

  71. Yan, J., Lei, Z., Wen, L., Li, S.: The fastest deformable part model for object detection. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  72. Yan, J., Zhang, X., Lei, Z., Li, S.Z.: Face detection by structural models. Image Vis. Comput. 32(10), 790–799 (2014)

  73. Yang, B., Yan, J., Lei, Z., Li, S.Z.: Aggregate channel features for multi-view face detection. In: International Joint Conference on Biometrics (IJCB) (2014)

  74. Yang, C.Y., Liu, S., Yang, M.H.: Structured face hallucination. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

  75. Yang, H., Patras, I.: Sieving regression forest votes for facial feature detection in the wild. In: International Conference on Computer Vision (ICCV) (2013)

  76. Yang, H., Jia, X., Loy, C.C., Robinson, P.: An empirical study of recent face alignment methods. arXiv preprint arXiv:1511.05049 (2015)

  77. Yang, S., Luo, P., Loy, C.C., Tang, X.: From facial parts responses to face detection: a deep learning approach. In: International Conference on Computer Vision (ICCV) (2015)

  78. Yang, S., Luo, P., Loy, C.C., Tang, X.: Wider face: a face detection benchmark. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

  79. Yu, X., Huang, J., Zhang, S., Yan, W., Metaxas, D.N.: Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model. In: International Conference on Computer Vision (ICCV) (2013)

  80. Zhang, J., Shan, S., Kan, M., Chen, X.: Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment. In: European Conference on Computer Vision (ECCV) (2014)

  81. Zhang, N., Paluri, M., Ranzato, M., Darrell, T., Bourdev, L.: PANDA: pose aligned networks for deep attribute modeling. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  82. Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Facial landmark detection by deep multi-task learning. In: European Conference on Computer Vision (ECCV) (2014)

  83. Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Learning deep representation for face alignment with auxiliary attributes. IEEE Trans. Pattern Anal. Mach. Intell. 38(5), 918–930 (2015)

  84. Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Learning social relation traits from face images. In: International Conference on Computer Vision (ICCV) (2015)

  85. Zhu, S., Li, C., Loy, C.C., Tang, X.: Face alignment by coarse-to-fine shape searching. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  86. Zhu, S., Li, C., Loy, C.C., Tang, X.: Unconstrained face alignment via cascaded compositional learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

  87. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

  88. Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: European Conference on Computer Vision (ECCV) (2014)

Author information

Correspondence to Chen Change Loy.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Loy, C.C., Luo, P., Huang, C. (2017). Deep Learning Face Attributes for Detection and Alignment. In: Feris, R., Lampert, C., Parikh, D. (eds) Visual Attributes. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-50077-5_8

  • DOI: https://doi.org/10.1007/978-3-319-50077-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50075-1

  • Online ISBN: 978-3-319-50077-5

  • eBook Packages: Computer Science, Computer Science (R0)
