PyramidBox: A Context-Assisted Single Shot Face Detector

  • Xu Tang
  • Daniel K. Du
  • Zeqiang He
  • Jingtuo Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


Face detection has been studied for many years, and one of the remaining challenges is detecting small, blurred, and partially occluded faces in uncontrolled environments. This paper proposes a novel context-assisted single shot face detector, named PyramidBox, to handle this hard face detection problem. Observing the importance of context, we improve the utilization of contextual information in three ways. First, we design a novel context anchor, which we call PyramidAnchors, to supervise high-level contextual feature learning with a semi-supervised method. Second, we propose the Low-level Feature Pyramid Network to combine adequate high-level context semantic features with low-level facial features, which also allows PyramidBox to predict faces of all scales in a single shot. Third, we introduce a context-sensitive structure to increase the capacity of the prediction network and improve the final accuracy of the output. In addition, we use Data-anchor-sampling to augment the training samples across different scales, which increases the diversity of training data for smaller faces. By exploiting the value of context, PyramidBox achieves superior performance among state-of-the-art methods on two common face detection benchmarks, FDDB and WIDER FACE. Our code is available in PaddlePaddle.
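To make the scale-resampling idea behind Data-anchor-sampling more concrete, the following is a minimal sketch and not the authors' PaddlePaddle implementation. The abstract does not specify the procedure, so the anchor scale list, the function name `sample_resize_ratio`, and the sampling rule (pick an anchor scale no more than one step above the one nearest to the face, then jitter around it) are all assumptions made for illustration.

```python
import random

# Assumed anchor scales in pixels; illustrative values, not given in the abstract.
ANCHOR_SCALES = [16, 32, 64, 128, 256, 512]


def sample_resize_ratio(face_size, rng=random):
    """Return a hypothetical image resize ratio for one training face.

    The ratio maps a face of `face_size` pixels onto a randomly chosen
    anchor scale that is at most one step larger than its nearest scale,
    so large faces are frequently shrunk into small training faces.
    """
    # Index of the anchor scale closest to the face size.
    nearest = min(
        range(len(ANCHOR_SCALES)),
        key=lambda i: abs(ANCHOR_SCALES[i] - face_size),
    )
    # Candidate targets: every scale up to one step above the nearest one.
    upper = min(nearest + 1, len(ANCHOR_SCALES) - 1)
    target_idx = rng.randint(0, upper)
    # Jitter the target size around the chosen anchor scale.
    scale = ANCHOR_SCALES[target_idx]
    target_size = rng.uniform(scale / 2, scale * 2)
    return target_size / face_size


if __name__ == "__main__":
    rng = random.Random(0)
    for size in (20, 90, 300):
        print(size, round(sample_resize_ratio(size, rng), 3))
```

Under these assumptions, a 300-pixel face can be resized to any scale between the smallest anchor and one step above its nearest anchor, which is one way a sampler could increase the share of small faces seen during training.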


Keywords: Face detection, Context, Single shot, PyramidBox



We wish to thank Dr. Shifeng Zhang and Dr. Yuguang Liu for many helpful discussions.




Copyright information

© Springer Nature Switzerland AG 2018
