Focal Loss for Region Proposal Network

  • Chengpeng Chen
  • Xinhang Song
  • Shuqiang Jiang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11257)


Currently, most state-of-the-art object detection models follow the two-stage scheme pioneered by R-CNN and incorporate a region proposal network (RPN), which serves as the proposal generator. During RPN training, only a fixed number of samples with a fixed object/not-object ratio are sampled to avoid the class-imbalance problem. In contrast to such sampling strategies, the focal loss addresses the class imbalance encountered in one-stage detection methods by down-weighting the losses of the vast number of easy samples. Inspired by this, we investigate adapting the focal loss to the RPN, which allows us to train the RPN without the sampling process. Building on Faster R-CNN, we apply the focal loss to the RPN; the experimental results on the PASCAL VOC 2007 and COCO datasets outperform the baseline, which demonstrates the effectiveness of the proposed method and implies that the focal loss can be applied to the RPN directly.
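The down-weighting mechanism described above can be illustrated with a minimal sketch of the binary focal loss of Lin et al. as it would apply to a single RPN anchor's object/not-object score (the function name and scalar formulation are illustrative, not the authors' implementation):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one anchor's objectness prediction.

    p     -- predicted probability of 'object' (0 < p < 1)
    y     -- ground-truth label: 1 for object, 0 for background
    gamma -- focusing parameter; gamma = 0 recovers (alpha-weighted)
             cross-entropy
    alpha -- class-balancing weight for the positive class
    """
    # p_t is the probability the model assigns to the true class
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # the (1 - p_t)**gamma factor shrinks the loss of well-classified
    # (easy) anchors, so the many easy negatives no longer dominate
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently rejected background anchor contributes almost nothing,
# while a misclassified one keeps a large loss:
easy_neg = focal_loss(0.01, 0)  # easy background anchor
hard_neg = focal_loss(0.90, 0)  # hard (misclassified) background anchor
```

Because easy negatives are suppressed by the modulating factor rather than discarded, all anchors can contribute to the loss, removing the need for the fixed-ratio sampling step.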


Keywords: Object detection · Region proposal network · Focal loss



This work was supported in part by the National Natural Science Foundation of China under Grant 61532018, in part by the Lenovo Outstanding Young Scientists Program, in part by National Program for Special Support of Eminent Professionals and National Program for Support of Top-notch Young Professionals, in part by the National Postdoctoral Program for Innovative Talents under Grant BX201700255.


  1. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
  2. Chen, X., Gupta, A.: An implementation of faster R-CNN with study for region sampling. arXiv:1702.02138 (2017)
  3. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV (2017)
  4. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 346–361. Springer, Cham (2014)
  5. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: NIPS (2016)
  6. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
  7. Girshick, R.: Fast R-CNN. In: ICCV (2015)
  8. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
  9. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
  10. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014)
  11. Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016)
  12. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR (2016)
  13. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: CVPR (2017)
  14. Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: CVPR (2016)
  15. Uijlings, J.R., Van de Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. IJCV 104, 154–171 (2013)
  16. Alexe, B., Deselaers, T., Ferrari, V.: Measuring the objectness of image windows. IEEE TPAMI 34(11), 2189–2202 (2012)
  17. Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Cham (2014)
  18. Szegedy, C., Reed, S., Erhan, D., Anguelov, D.: Scalable, high-quality object detection. arXiv:1412.1441v2 (2014)
  19. Erhan, D., Szegedy, C., Toshev, A., Anguelov, D.: Scalable object detection using deep neural networks. In: CVPR (2014)
  20. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014)
  21. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
  22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  23. Sung, K.-K., Poggio, T.: Learning and example selection for object and pattern detection. MIT A.I. Memo No. 1521 (1994)
  24. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. IJCV 88(2), 303–338 (2010)
  25. Singh, B., Davis, L.S.: An analysis of scale invariance in object detection - SNIP. In: CVPR (2018)
  26. Hu, H., Gu, J., Zhang, Z., Dai, J., Wei, Y.: Relation networks for object detection. In: CVPR (2018)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Chengpeng Chen (1, 2)
  • Xinhang Song (1, 2)
  • Shuqiang Jiang (1, 2)
  1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
