Abstract
We train a deep convolutional neural network to perform identity classification using a new dataset of public figures annotated with age, gender, ethnicity and emotion labels, and then fine-tune it for attribute classification. An optimal sharing pattern of computational resources within this network is determined by experiment, requiring only 1 G flops to produce all predictions. Rather than fine-tune by re-learning weights in one additional layer after the penultimate layer of the identity network, we try several different depths for each attribute. We find that prediction of age and emotion is improved by fine-tuning from earlier layers onward, presumably because deeper layers are progressively invariant to non-identity related changes in the input.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Baker, S.: Multi-PIE. In: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition. IEEE Computer Society (2008)
Chen, B.-C., Chen, C.-S., Hsu, W.H.: Cross-age reference coding for age-invariant face recognition and retrieval. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 768–783. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10599-4_49
Dhall, A., Goecke, R., Joshi, J., Wagner, M., Gedeon, T.: Emotion recognition in the wild challenge. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI (2013)
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: DeCAF: a deep convolutional activation feature for generic visual recognition. CoRR, abs/1310.1531 (2013)
Eidinger, E., Enbar, R., Hassner, T.: Age and gender estimation of unfiltered faces. IEEE Trans. Inf. Forensics Secur. 9(12), 2170–2179 (2014)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical report 07–49, University of Massachusetts, Amherst, October 2007
Kahou, S.E., Bouthillier, X., Lamblin, P., Gülçehre, Ç., Michalski, V., Konda, K.R., Jean, S., Froumenty, P., Dauphin, Y., Boulanger-Lewandowski, N., Ferrari, R.C., Mirza, M., Warde-Farley, D., Courville, A.C., Vincent, P., Memisevic, R., Pal, C.J., Bengio, Y.: EmoNets: multimodal deep learning approaches for emotion recognition in video. J. Multimodal User Interfaces, abs/1503.01800 (2015)
Levi, G., Hassner, T.: Age and gender classification using convolutional neural networks. In: CVPR workshops (2015)
Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: ICCV (2015)
Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: CVPR (2014)
Pinto, L., Gandhi, D., Han, Y., Park, Y.-L., Gupta, A.: The curious robot: learning visual representations via physical interactions. In: ECCV (2016)
Ranjan, R., Patel, V.M., Chellappa, R.: Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. CoRR, abs/1603.01249 (2016)
Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: CVPR workshops (2014)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Sermanet, P., Kavukcuoglu, K., Chintala, S., LeCun, Y.: Pedestrian detection with unsupervised multi-stage feature learning. In: CVPR (2013)
Susskind, J.M., Anderson, A.K., Hinton, G.E.: The toronto face database. Department of Computer Science, University of Toronto, Toronto, ON, Canada. Technical report 3 (2010)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10590-1_53
Zhang, N., Paluri, M., Ranzato, M., Darrell, T., Bourdev, L.D.: PANDA: pose aligned networks for deep attribute modeling. In: CVPR (2014)
Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Facial landmark detection by deep multi-task learning. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 94–108. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10599-4_7
Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Learning deep representation for face alignment with auxiliary attributes. IEEE Trans. Pattern Anal. Mach. Intell. 38(5), 918–930 (2016)
Acknowledgments
We would like to thank Neil O’Hare for collaborating with us on the search engine query logs, and the entire Yahoo Vision and Machine Learning Team.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Baumgartner, T., Culpepper, J. (2017). Deep Architectures for Face Attributes. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science(), vol 10117. Springer, Cham. https://doi.org/10.1007/978-3-319-54427-4_25
Download citation
DOI: https://doi.org/10.1007/978-3-319-54427-4_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54426-7
Online ISBN: 978-3-319-54427-4
eBook Packages: Computer ScienceComputer Science (R0)