Abstract
Object detection methods fall into two categories: two-stage methods, which offer higher accuracy at lower speed, and one-stage methods, which offer higher speed at lower accuracy. To inherit the advantages of both, this paper proposes a novel dense object detector called Path Augmented RetinaNet (PA-RetinaNet), which achieves better accuracy than two-stage methods while maintaining the efficiency of one-stage methods. Specifically, we introduce a bottom-up path augmentation module that enhances the feature extraction hierarchy by shortening the information path between the lower feature layers and the topmost layers. We further address the class imbalance problem with a Class-Imbalance loss, which weights the loss of each training sample by a function of its predicted probability so that the trained model focuses on hard examples. To evaluate the effectiveness of PA-RetinaNet, we conducted extensive experiments on the MS COCO dataset: our method achieves 4.3% higher accuracy than the existing two-stage method while running at a speed similar to state-of-the-art one-stage methods.
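The bottom-up path augmentation described above can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: `downsample` stands in for a stride-2 convolution, the element-wise add stands in for the fusion step, and all names and shapes are illustrative. Given FPN levels P2..P5 (from high to low resolution), an augmented level is built from the previous augmented level plus the lateral FPN level, so low-level detail reaches the top in only a few hops.

```python
import numpy as np

def downsample(x):
    # Stride-2 spatial subsampling (a stand-in for a stride-2 conv).
    return x[::2, ::2]

def bottom_up_augmentation(pyramid):
    """Build augmented levels N2..N5 from FPN levels P2..P5:
    N2 = P2, and N_i = downsample(N_{i-1}) + P_i for i > 2.
    This shortens the information path from the lowest layers
    to the topmost pyramid levels."""
    n = [pyramid[0]]
    for p in pyramid[1:]:
        n.append(downsample(n[-1]) + p)
    return n

# Toy pyramid: each level halves the spatial resolution.
pyramid = [np.ones((s, s)) for s in (32, 16, 8, 4)]
augmented = bottom_up_augmentation(pyramid)
```

Each augmented level keeps the spatial size of its FPN counterpart, so the detection heads can be attached to N2..N5 exactly as they would be to P2..P5.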
Supported by the National Key R&D Program of China (2018YFB0203904), National Natural Science Foundation of China (61602165) and Natural Science Foundation of Hunan Province (2018JJ3074), NSFC from PRC (61872137, 61502158), Hunan NSF (2017JJ3042).
Notes
1. Lin et al. [9] found \(\gamma\) = 2 to work best through extensive experiments. The loss function in this paper is therefore compared mainly against the focal loss at \(\gamma\) = 2.
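For reference, the focal loss of Lin et al. [9] that the footnote compares against can be written down directly. A short numpy sketch (function names are illustrative), where p_t denotes the model's estimated probability for the true class:

```python
import numpy as np

def focal_loss(p_t, gamma=2.0):
    # FL(p_t) = -(1 - p_t)^gamma * log(p_t)  (Lin et al. [9])
    # The (1 - p_t)^gamma factor down-weights well-classified examples.
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

def cross_entropy(p_t):
    # Standard cross-entropy, i.e. focal loss with gamma = 0.
    return -np.log(p_t)
```

At \(\gamma\) = 2, an easy example with p_t = 0.9 contributes only 1% of its cross-entropy loss, while a hard example with p_t = 0.1 keeps most of it, which is why training concentrates on hard examples.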
References
Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks (2015). https://doi.org/10.1109/TPAMI.2016.2577031
Dai, J., Li, Y., He, K., et al.: R-FCN: object detection via region-based fully convolutional networks (2016)
Lin, T., Dollár, P., Girshick, R., et al.: Feature pyramid networks for object detection (2016). https://doi.org/10.1109/CVPR.2017.106
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: unified, real-time object detection (2015). https://doi.org/10.1109/CVPR.2016.91
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525 (2017)
Fu, C.Y., Liu, W., Ranga, A., et al.: DSSD: deconvolutional single shot detector (2017)
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Lin, T.Y., Goyal, P., Girshick, R., et al.: Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 2999–3007 (2017). https://doi.org/10.1109/TPAMI.2018.2858826
Uijlings, J.R.R., van de Sande, K.E.A.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013). https://doi.org/10.1007/s11263-013-0620-5
Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 761–769 (2016). https://doi.org/10.1109/CVPR.2016.89
Zhang, S., Zhu, X., Lei, Z., et al.: S\(^3\)FD: single shot scale-invariant face detector (2017). https://doi.org/10.1109/ICCV.2017.30
Kong, T., Sun, F., Yao, A., et al.: RON: reverse connection with objectness prior networks for object detection (2017). https://doi.org/10.1109/CVPR.2017.557
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition (2015). https://doi.org/10.1109/CVPR.2016.90
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: 2001 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2001). https://doi.org/10.1109/CVPR.2001.990517
Felzenszwalb, P.F., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. PAMI 32(9), 1627–1645 (2010)
Girshick, R.: Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision (ICCV) (2015). https://doi.org/10.1109/ICCV.2015.169
Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_26
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR, pp. 580–587 (2014)
He, K., Zhang, X., Ren, S., et al.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2014). https://doi.org/10.1007/978-3-319-10578-9_23
Wang, X., Shrivastava, A., Gupta, A.: A-Fast-RCNN: hard positive generation via adversary for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3039–3048 (2017). https://doi.org/10.1109/cvpr.2017.324
Bell, S., Zitnick, C.L., Bala, K., et al.: Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks (2015). https://doi.org/10.1109/CVPR.2016.314
Cai, Z., Fan, Q., Feris, R.S., Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 354–370. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_22
Kong, T., Yao, A., Chen, Y., et al.: HyperNet: towards accurate region proposal generation and joint object detection (2016). https://doi.org/10.1109/CVPR.2016.98
Shrivastava, A., Sukthankar, R., Malik, J., et al.: Beyond skip connections: top-down modulation for object detection (2016)
Sermanet, P., Eigen, D., Zhang, X., et al.: OverFeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint (2013)
Liu, S., Qi, L., Qin, H., et al.: Path aggregation network for instance segmentation (2018). https://doi.org/10.1109/CVPR.2018.00913
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2014). https://doi.org/10.1109/TPAMI.2016.2572683
He, K., Gkioxari, G., Dollar, P., et al.: Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 1 (2017). https://doi.org/10.1109/TPAMI.2018.2844175
PyTorch homepage. https://pytorch.org
Acknowledgments
This project builds on yhenon's work; we thank yhenon for the code provided at https://github.com/yhenon/pytorch-retinanet.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tan, G., Guo, Z., Xiao, Y. (2019). PA-RetinaNet: Path Augmented RetinaNet for Dense Object Detection. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science(), vol 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_12
DOI: https://doi.org/10.1007/978-3-030-30484-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30483-6
Online ISBN: 978-3-030-30484-3
eBook Packages: Computer Science; Computer Science (R0)