An over-regression suppression method to discriminate occluded objects of same category

Abstract

Occlusion is a key challenge in object detection. Objects are hard to discriminate accurately when they gather together and occlude each other, especially when they belong to the same category, which easily leads to multiple objects being regressed into the same bounding box. To address this problem, an over-regression suppression (ORS) method is proposed that takes full advantage of the supervised information. First, the annotations are used to compute the overlaps between different ground-truth boxes. Then, the regression loss function is redesigned by adding a penalty term, based on these overlaps, that prevents over-regression. Finally, the validity of the method is demonstrated on a modified Faster R-CNN, in which the k-means++ clustering algorithm automatically generates anchors of various sizes by learning the shape regularities of objects from the dataset, and Soft-NMS, a nearly cost-free method, replaces traditional NMS. Extensive evaluations on the challenging PASCAL VOC and MS COCO benchmarks demonstrate the superiority of ORS in handling intra-class occlusion. As the MS COCO results show, ORS performs better when the dataset contains more large objects and hard samples.
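
The abstract does not give the exact form of the ORS penalty term. The sketch below is a hypothetical illustration only, not the authors' formulation: it first computes the IoU between every pair of same-category ground-truth boxes, then adds to a smooth L1 regression term a penalty that grows when a predicted box overlaps a same-category neighbor more than its own ground-truth box does. The helper names (`gt_overlap_matrix`, `over_regression_loss`), the weight `lam`, the restriction to same-category pairs, and the use of smooth L1 on raw corner coordinates (rather than Faster R-CNN's parameterized offsets) are all assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gt_overlap_matrix(gts: Sequence[Box], labels: Sequence[int]) -> List[List[float]]:
    """Hypothetical helper: IoU between distinct same-category ground-truth boxes, 0 otherwise."""
    n = len(gts)
    overlaps = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                overlaps[i][j] = overlaps[j][i] = iou(gts[i], gts[j])
    return overlaps


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Smooth L1 loss on a single residual."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def over_regression_loss(pred: Box, target: int, gts: Sequence[Box], labels: Sequence[int],
                         overlaps: List[List[float]], lam: float = 1.0) -> float:
    """Hypothetical ORS-style loss: regression term plus an over-regression penalty."""
    # Assumption: smooth L1 on raw corner coordinates, for readability only.
    reg = sum(smooth_l1(p - g) for p, g in zip(pred, gts[target]))
    penalty = 0.0
    for j, neighbor in enumerate(gts):
        if j != target and labels[j] == labels[target]:
            # Only overlap with a same-category neighbor beyond what the true box
            # already has with it counts as being pulled toward that neighbor.
            penalty += max(0.0, iou(pred, neighbor) - overlaps[target][j])
    return reg + lam * penalty


gts = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0)]
labels = [1, 1]
overlaps = gt_overlap_matrix(gts, labels)
print(overlaps[0][1])  # about 0.333
print(over_regression_loss((2.0, 0.0, 12.0, 10.0), 0, gts, labels, overlaps))  # about 3.205
print(over_regression_loss((-2.0, 0.0, 8.0, 10.0), 0, gts, labels, overlaps))  # 3.0
```

Both shifted predictions have the same regression error, but only the one drifting toward the neighboring box is penalized. Subtracting the ground-truth overlap keeps the penalty at exactly zero for a perfect prediction, whereas a penalty on raw overlap would still be nonzero there.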

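The abstract does not state which distance the k-means++ anchor clustering uses. A common choice, assumed here following YOLO9000, is d = 1 - IoU between box shapes, so that small and large boxes are compared by relative overlap instead of pixel distance. The sketch below seeds the centers with k-means++ (each new seed drawn with probability proportional to its squared distance to the nearest existing seed) and then runs standard Lloyd iterations on ground-truth (width, height) pairs. The function names and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Shape = Tuple[float, float]  # (width, height) of a ground-truth box


def shape_iou(a: Shape, b: Shape) -> float:
    """IoU of two boxes aligned at the same corner, so only their shapes matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def shape_dist(a: Shape, b: Shape) -> float:
    """Assumed clustering distance d = 1 - IoU (as in YOLO9000)."""
    return 1.0 - shape_iou(a, b)


def kmeanspp_anchors(shapes: Sequence[Shape], k: int, iters: int = 100, seed: int = 0) -> List[Shape]:
    """Hypothetical helper: cluster box shapes into k anchor shapes, sorted by area."""
    rng = random.Random(seed)
    # k-means++ seeding: first center uniformly at random, later ones with
    # probability proportional to the squared distance to the nearest center.
    centers: List[Shape] = [shapes[rng.randrange(len(shapes))]]
    while len(centers) < k:
        d2 = [min(shape_dist(s, c) for c in centers) ** 2 for s in shapes]
        if sum(d2) == 0.0:  # fewer distinct shapes than k
            break
        centers.append(rng.choices(shapes, weights=d2)[0])
    # Lloyd iterations: assign each shape to its nearest center, then re-average.
    for _ in range(iters):
        clusters: List[List[Shape]] = [[] for _ in centers]
        for s in shapes:
            nearest = min(range(len(centers)), key=lambda c: shape_dist(s, centers[c]))
            clusters[nearest].append(s)
        updated = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


small = [(10.0, 10.0), (11.0, 9.0), (9.0, 11.0), (10.0, 12.0)]
large = [(100.0, 50.0), (98.0, 52.0), (102.0, 48.0), (100.0, 50.0)]
print(kmeanspp_anchors(small + large, k=2))  # one small (about 10 x 10.5) and one large (about 100 x 50) anchor
```

The returned shapes could then stand in for the hand-picked scales and aspect ratios of the Faster R-CNN region proposal network.
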

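Soft-NMS (Bodla et al., 2017) replaces the hard suppression step of NMS with a score decay. Instead of discarding every box whose IoU with the current top-scoring box exceeds a threshold, it lowers that box's score, either linearly by (1 - IoU) above the threshold N_t or with a Gaussian factor exp(-IoU^2 / sigma). The pure-Python sketch below implements both variants for the detections of one class. The defaults sigma = 0.5, N_t = 0.3 and a final pruning threshold of 0.001 are assumed here as typical settings and would need tuning.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes: Sequence[Box], scores: Sequence[float], method: str = "gaussian",
             sigma: float = 0.5, nt: float = 0.3, score_thresh: float = 0.001) -> List[Tuple[int, float]]:
    """Soft-NMS for one class; returns (box index, rescored score) in selection order."""
    if method not in ("gaussian", "linear"):
        raise ValueError("method must be 'gaussian' or 'linear'")
    remaining: Dict[int, float] = dict(enumerate(scores))
    kept: List[Tuple[int, float]] = []
    while remaining:
        best = max(remaining, key=remaining.get)  # current top-scoring box M
        kept.append((best, remaining.pop(best)))
        for i in list(remaining):
            overlap = iou(boxes[best], boxes[i])
            if method == "gaussian":
                remaining[i] *= math.exp(-(overlap ** 2) / sigma)
            elif overlap > nt:  # linear: decay only above the NMS threshold
                remaining[i] *= 1.0 - overlap
            if remaining[i] < score_thresh:
                del remaining[i]  # prune boxes whose score has decayed away
    return kept


boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 0.0, 11.0, 10.0), (50.0, 50.0, 60.0, 60.0)]
print(soft_nms(boxes, [0.9, 0.8, 0.7]))  # box 1 survives, its score cut from 0.8 to about 0.21
```

With classic NMS at N_t = 0.3, box 1 (IoU of about 0.82 with box 0) would be deleted outright. Soft-NMS keeps it with a reduced score, which is what gives a heavily occluded same-category neighbor a chance to survive.
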
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Author information

Corresponding author

Correspondence to Chunping Wang.

About this article

Cite this article

Zhao, B., Wang, C. & Fu, Q. An over-regression suppression method to discriminate occluded objects of same category. Pattern Anal Applic 23, 1251–1261 (2020). https://doi.org/10.1007/s10044-019-00853-9

Keywords

  • Convolutional neural network
  • Object detector
  • k-means++
  • Over-regression suppression
  • Occlusion