Single Shot Attention-Based Face Detector

  • Chubin Zhuang
  • Shifeng Zhang
  • Xiangyu Zhu
  • Zhen Lei (corresponding author)
  • Stan Z. Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10996)


Although face detection has taken a big step forward with the development of anchor-based face detectors, effectively detecting faces at different scales remains a challenge. To address this problem, we present a one-stage face detector, named Single Shot Attention-Based Face Detector (AFD), which detects faces at multiple scales accurately and efficiently, especially small faces. Specifically, AFD consists of two interconnected modules: an attention proposal module (APM) and a face detection module (FDM). The former generates the attention region and coarsely refines the anchors; the latter takes the output of APM as input and further improves the detection results. We obtain state-of-the-art results on common face detection benchmarks, i.e., FDDB and WIDER FACE, and AFD runs at 20 FPS on an NVIDIA Titan X (Pascal) for VGA-resolution images.
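
The abstract describes a two-step flow: APM coarsely refines the anchors, and FDM further refines what APM produces. The following is a minimal sketch of such a cascaded box regression in plain Python, not the authors' implementation. It assumes the common center-size box encoding used by SSD-style detectors, and it models the "attention region" as a per-anchor APM score compared against a threshold. The names `decode`, `two_stage_refine` and `attn_thresh` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]      # (cx, cy, w, h)
Offset = Tuple[float, float, float, float]   # (dx, dy, dw, dh)


def decode(box: Box, offset: Offset) -> Box:
    """Apply regression offsets to a box.

    Assumed center-size parameterization (not specified in the abstract):
    cx' = cx + dx*w, cy' = cy + dy*h, w' = w*exp(dw), h' = h*exp(dh).
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = offset
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def two_stage_refine(
    anchors: Sequence[Box],
    apm_scores: Sequence[float],
    apm_offsets: Sequence[Offset],
    fdm_offsets: Sequence[Offset],
    attn_thresh: float = 0.5,  # hypothetical cut-off modelling the attention region
) -> List[Box]:
    """Hypothetical two-stage refinement: APM filters and coarsely refines, FDM refines again."""
    refined: List[Box] = []
    for anchor, score, off1, off2 in zip(anchors, apm_scores, apm_offsets, fdm_offsets):
        if score < attn_thresh:
            continue  # assumed: anchors outside the attention region are not passed to FDM
        coarse = decode(anchor, off1)  # stage 1 (APM): coarse refinement of the anchor
        fine = decode(coarse, off2)    # stage 2 (FDM): offsets are relative to the coarse box
        refined.append(fine)
    return refined


if __name__ == "__main__":
    boxes = two_stage_refine(
        anchors=[(32.0, 32.0, 16.0, 16.0), (100.0, 60.0, 32.0, 32.0)],
        apm_scores=[0.9, 0.1],
        apm_offsets=[(0.25, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
        fdm_offsets=[(0.0, 0.5, math.log(2), 0.0), (0.0, 0.0, 0.0, 0.0)],
    )
    print(boxes)
```

In this example only the first anchor clears the threshold. It is shifted right by a quarter of its width in the first stage. In the second stage it is shifted down by half its height and its width is doubled, which gives a single box of roughly (36.0, 40.0, 32.0, 16.0). The key design choice is that FDM's offsets are decoded relative to APM's output rather than the original anchor. This is one common way to realize a coarse-to-fine cascade.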


Keywords: Face detection, Attention mechanism, Single shot



This work was supported by the Chinese National Natural Science Foundation Projects #61473291, #61572536, #61572501, #61573356, the National Key Research and Development Plan (Grant No. 2016YFC0801002), and AuthenMetric R&D Funds.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Chubin Zhuang (1, 2)
  • Shifeng Zhang (1, 2)
  • Xiangyu Zhu (1, 2)
  • Zhen Lei (1, 2), corresponding author
  • Stan Z. Li (1, 2)

  1. CBSR & NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China