Multi-flow Sub-network and Multiple Connections for Single Shot Detection

  • Ye Li
  • Huicheng Zheng
  • Lvran Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11257)

Abstract

One-stage object detection methods are usually more computationally efficient than two-stage methods, which makes them more suitable for practical applications. However, one-stage methods often suffer from lower detection accuracy, especially when the objects to be detected are small. In this paper, we propose a multi-flow sub-network and multiple connections for single shot detection (MSSD), which is built upon a one-stage strategy to inherit its computational efficiency while improving detection accuracy. The multi-flow sub-network in MSSD aims to extract high-quality feature maps with high spatial resolution, sufficient non-linear transformation, and multiple receptive fields, which facilitates detection of small objects in particular. In addition, MSSD uses multiple connections, including up-sampling, down-sampling, and resolution-invariant connections, to combine feature maps of different layers, which helps the model capture fine-grained details and improve feature representation. Extensive experiments on PASCAL VOC and MS COCO demonstrate that MSSD achieves competitive detection accuracy with high computational efficiency compared to state-of-the-art methods. MSSD with an input size of 320 × 320 achieves 80.6% mAP on VOC2007 at 45 FPS and 29.7% mAP on COCO, both on an NVIDIA Titan X GPU.
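The multiple connections described above can be illustrated with a minimal sketch. The function names, channel counts, and resolutions below are hypothetical, not the authors' implementation: a higher-resolution (shallow) map is down-sampled, a deeper map is up-sampled, the map already at the target resolution passes through unchanged, and the three are concatenated along the channel axis.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x up-sampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    # 2x2 max-pooling down-sampling of a (C, H, W) feature map
    # (H and W are assumed even).
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def fuse(shallow, target, deep):
    # Combine three feature maps at the target resolution:
    # down-sample the shallow map, keep the resolution-invariant
    # target map, up-sample the deep map, then concatenate channels.
    return np.concatenate(
        [downsample2x(shallow), target, upsample2x(deep)], axis=0)

# Toy maps: 40x40 shallow, 20x20 target, 10x10 deep, 4 channels each.
shallow = np.random.rand(4, 40, 40)
target = np.random.rand(4, 20, 20)
deep = np.random.rand(4, 10, 10)
fused = fuse(shallow, target, deep)
print(fused.shape)  # (12, 20, 20)
```

In a real detector the resampled maps would also pass through learned convolutions before fusion; this sketch only shows how the three connection types bring maps of different scales to a common resolution.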

Keywords

Object detection · Single shot detection · Feature representation enhancement

Notes

Acknowledgements

This work was supported by National Natural Science Foundation of China (U1611461), Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase, No. U1501501), and Science and Technology Program of Guangzhou (No. 201803030029).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
  2. Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China
  3. Guangdong Key Laboratory of Information Security Technology, Guangzhou, China