A Novel Multi-purpose Deep Architecture for Facial Attribute and Emotion Understanding

  • Ankit Sharma
  • Pooyan Balouchian
  • Hassan Foroosh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11401)


Facial expression estimation has been studied for years, benefiting a wide array of application areas ranging from information retrieval and sentiment analysis to video surveillance and emotion analysis. Deep architectures proposed for facial attribute recognition yield high accuracies; however, less effort has focused on the performance of these architectures. In this work, we use Squeeze-Net [6], for the first time in the literature, to perform facial emotion recognition, benchmarked on the Celeb-A and AffectNet datasets. We extend Squeeze-Net by introducing a new 5 \(\times\) 5 convolution kernel after the last fully-connected layer offered by Squeeze-Net, merging the 1 \(\times\) 1 and 3 \(\times\) 3 outputs from the last fully-connected layers, to perform more domain-specific feature extraction. We run extensive experiments on these two widely used datasets with AlexNet and Squeeze-Net in addition to our proposed architecture. The proposed architecture yields results in line with the state of the art while being simpler and less complex, reporting accuracies of 90.47% and 56.38%, compared to 90.94% and 52.36%, in Attribute Prediction and Expression Prediction, respectively.
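To make the complexity argument concrete, the following is a minimal sketch of how the parameter count of such an extension could be tallied. It assumes that the 1 \(\times\) 1 and 3 \(\times\) 3 outputs are the expand branches of the last Fire module described in [6], that they are merged by channel-wise concatenation, and that the 5 \(\times\) 5 layer is applied to the merged result. The function names and the output width `OUT5` are hypothetical and are not taken from the paper.

```python
def conv_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Learnable parameters of a k x k convolution layer."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def fire_params(in_ch: int, squeeze: int, expand1: int, expand3: int) -> int:
    """Fire module as in [6]: 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    return (conv_params(in_ch, squeeze, 1)
            + conv_params(squeeze, expand1, 1)
            + conv_params(squeeze, expand3, 3))


def merged_channels(expand1: int, expand3: int) -> int:
    """Channel-wise concatenation of the 1x1 and 3x3 expand outputs."""
    return expand1 + expand3


# Last Fire module of SqueezeNet v1.0 (fire9 in [6]): 512 -> 64 -> 256 + 256.
IN_CH, SQUEEZE, EXPAND1, EXPAND3 = 512, 64, 256, 256
# Hypothetical output width for the added 5x5 layer (not given in the abstract).
OUT5 = 64

base = fire_params(IN_CH, SQUEEZE, EXPAND1, EXPAND3)
extra = conv_params(merged_channels(EXPAND1, EXPAND3), OUT5, 5)

print(f"last Fire module: {base:,} parameters")
print(f"added 5x5 layer:  {extra:,} parameters (OUT5={OUT5} is assumed)")
print(f"total:            {base + extra:,} parameters")
```

The totals depend entirely on the assumed `OUT5`; changing it shows how the width of the added layer drives the cost of the extension.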


Attribute prediction, Emotion recognition, Convolutional neural network


  1. Abadi, M., et al.: Tensorflow: a system for large-scale machine learning. In: OSDI, vol. 16, pp. 265–283 (2016)
  2. Abdulnabi, A.H., Wang, G., Lu, J., Jia, K.: Multi-task CNN model for attribute prediction. IEEE Trans. Multimedia 17(11), 1949–1959 (2015)
  3. Chollet, F., et al.: Keras: deep learning library for theano and tensorflow, vol. 7(8) (2015)
  4. Han, H., Jain, A.K., Shan, S., Chen, X.: Heterogeneous face attribute estimation: a deep multi-task learning approach. IEEE Trans. Pattern Anal. Mach. Intell. 40(11), 2597–2609 (2017)
  5. Hand, E.M., Chellappa, R.: Attributes for improved attributes: a multi-task network utilizing implicit and explicit relationships for facial attribute classification. In: AAAI, pp. 4068–4074 (2017)
  6. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: Squeezenet: alexnet-level accuracy with 50x fewer parameters and \(<\) 0.5 mb model size. arXiv preprint arXiv:1602.07360 (2016)
  7. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675–678. ACM (2014)
  8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
  9. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738 (2015)
  10. Mollahosseini, A., Hasani, B., Mahoor, M.H.: Affectnet: a database for facial expression, valence, and arousal computing in the wild. arXiv preprint arXiv:1708.03985 (2017)
  11. Rudd, E.M., Günther, M., Boult, T.E.: MOON: a mixed objective optimization network for the recognition of facial attributes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 19–35. Springer, Cham (2016)
  12. Zhang, N., Paluri, M., Ranzato, M., Darrell, T., Bourdev, L.: Panda: pose aligned networks for deep attribute modeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1637–1644 (2014)
  13. Zhong, Y., Sullivan, J., Li, H.: Leveraging mid-level deep representations for predicting face attributes in the wild. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3239–3243. IEEE (2016)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, University of Central Florida, Orlando, USA
