Face Detection in a Complex Background Using Cascaded Conventional Networks

  • Jianjun Li
  • Juxian Wang
  • Chin-Chen Chang
  • Zhuo Tang
  • Zhenxing Luo
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 733)


Although face detection has advanced significantly in recent years, detecting faces against complex backgrounds remains a challenging problem. Face detection has wide real-world applications, such as face-recognition attendance systems and crowd-size estimation. In this paper, we propose a novel cascaded framework that addresses the challenges posed by blur, illumination, pose, expression, and occlusion. Our framework uses facial landmark localization to boost detection performance. In addition, our detector extracts features from different layers of a deep residual network to combine the complementary information in low-dimensional and high-dimensional features. Our method achieves notable results compared with state-of-the-art techniques on the challenging WIDER FACE face detection benchmark, reaching an average precision of 89.2%. Importantly, we demonstrate superior performance and robustness in challenging environments.
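The abstract reports its result as average precision (AP) on WIDER FACE. As a reader's aid, the following Python sketch shows a generic, PASCAL VOC-style AP computation: detections are sorted by confidence, greedily matched to ground-truth boxes at an assumed IoU threshold of 0.5, and the area under the interpolated precision-recall curve is summed. This is not the authors' evaluation code or the official WIDER FACE toolkit, which may differ in details such as how its easy, medium, and hard subsets are scored; the function names `iou` and `average_precision` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[str, float, Box]],  # (image_id, score, box)
    ground_truth: Dict[str, List[Box]],        # image_id -> face boxes
    iou_thresh: float = 0.5,                   # assumed matching threshold
) -> float:
    """VOC-style all-point interpolated AP (a sketch, not WIDER FACE's code)."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Greedy matching in descending score order: each ground-truth face
    # may be claimed by at most one detection; later duplicates are FPs.
    tp_flags = []
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thresh and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)

    # Cumulative precision and recall after each ranked detection.
    precisions, recalls = [], []
    tp = fp = 0
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)

    # Make precision monotonically non-increasing, then sum the area
    # under the stepwise precision-recall curve.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
```

For example, with two ground-truth faces in one image and three ranked detections (a hit, a false positive, then a second hit), precision drops to 0.5 at the false positive and recovers to 2/3 at full recall, so the sketch returns an AP of 5/6 (about 0.833). A detector that ranks its false positives below all true faces would instead score 1.0, which is why AP rewards confident, well-ordered detections rather than raw hit counts.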


Face detection · Cascaded conventional neural network · Facial landmarks



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jianjun Li (1)
  • Juxian Wang (1)
  • Chin-Chen Chang (2)
  • Zhuo Tang (3)
  • Zhenxing Luo (3)

  1. School of Computer Science and Engineering, Hangzhou Dianzi University, Hangzhou, China
  2. Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
  3. Key Lab of the 36th Institute of CETC of China, Jiaxing, China
