
SCOD: Dynamical Spatial Constraints for Object Detection

  • Kai-Jun Zhang
  • Cheng-Hao Guo
  • Zhong-Han Niu
  • Lu-Fei Liu
  • Yu-Bin Yang (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11295)

Abstract

One-stage detectors are widely used in real-world computer vision applications because they combine competitive accuracy with high speed. However, for high-resolution input (e.g., \(512 \times 512\)), most one-stage detectors run too slowly to process images in real time. In this paper, we propose a novel one-stage detector called Dynamical Spatial Constraints for Object Detection (SCOD). We apply dynamical spatial constraints to handle multiple detections of the same object, and we use two parallel classifiers to address severe class imbalance. Experimental results show that SCOD significantly improves speed while achieving competitive accuracy on the challenging PASCAL VOC2007 and PASCAL VOC2012 benchmarks. On VOC2007 test, SCOD runs at 41 FPS with a mAP of 80.4%, making it \(2.2\times\) faster than SSD, which runs at 19 FPS with a mAP of 79.8%. On VOC2012 test, SCOD runs at 71 FPS with a mAP of 75.4%, making it \(1.8\times\) faster than YOLOv2, which runs at 40 FPS with a mAP of 73.4%.
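The speedup factors quoted in the abstract are throughput ratios of the reported frame rates. The worked check below uses only the FPS figures given above. The abstract does not state the hardware setup, so the ratios are meaningful only under the same test conditions.

```latex
\[
\frac{41\ \text{FPS}}{19\ \text{FPS}} \approx 2.16 \approx 2.2\times \;(\text{VOC2007, vs. SSD}),
\qquad
\frac{71\ \text{FPS}}{40\ \text{FPS}} = 1.775 \approx 1.8\times \;(\text{VOC2012, vs. YOLOv2})
\]
```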

Keywords

Object detection; Spatial constraints; Class imbalance; Non-maximum suppression
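The keywords name non-maximum suppression (NMS), the conventional post-processing step that detectors use to remove multiple detections of the same object. The abstract says SCOD handles this problem with dynamical spatial constraints but does not specify how they work. The sketch below therefore shows only standard single-class greedy NMS for reference. It is not SCOD's method. The `(x1, y1, x2, y2)` box format and the default IoU threshold of 0.45 are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.45) -> List[int]:
    """Standard greedy NMS for one class (not SCOD's spatial constraints).

    Returns the indices of the kept boxes, highest score first.
    The 0.45 default threshold is an illustrative assumption.
    """
    # Visit candidates from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept box too strongly;
        # they are treated as duplicate detections of the same object.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))  # [0, 2]
```

In this example, the second box overlaps the first with IoU 81/119 ≈ 0.68, so NMS suppresses it as a duplicate. The third box does not overlap either of the others and is kept.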

Notes

Acknowledgments

This work is funded by the Natural Science Foundation of China (No. 61673204) and the State Grid Corporation of Science and Technology Projects (Funded No. SGLNXT00DKJS1700166).


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Kai-Jun Zhang (1)
  • Cheng-Hao Guo (2)
  • Zhong-Han Niu (1)
  • Lu-Fei Liu (1)
  • Yu-Bin Yang (1), corresponding author
  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. Science and Technology on Information System Engineering Laboratory, Nanjing, China
