Abstract
Occlusion is a key challenge in object detection. It is hard to discriminate objects accurately when they cluster together and occlude each other, especially when they belong to the same category, which easily leads to multiple objects being regressed into the same bounding box. To address this problem, an over-regression suppression (ORS) method is proposed that takes full advantage of the supervisory information. First, the annotations are used to compute the overlaps between different ground-truth boxes. Then, the regression loss function is redesigned by adding a penalty term, associated with these overlaps, that prevents over-regression. Finally, the effectiveness of the method is validated on a modified Faster R-CNN, in which a k-means++ clustering algorithm automatically generates anchors of various sizes by learning the shape regularities of objects from the dataset, and Soft-NMS, a nearly cost-free method, replaces traditional NMS. Extensive evaluations on the challenging PASCAL VOC and MS COCO benchmarks demonstrate the superiority of ORS in handling intra-class occlusion. ORS performs better when the dataset contains more large objects and hard samples, as the results on MS COCO show.
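Both the first step of ORS (overlaps between ground-truth boxes) and the Soft-NMS replacement rest on the intersection-over-union (IoU) between axis-aligned boxes. The Python sketch below is illustrative only and is not the authors' code. It assumes boxes are given as (x1, y1, x2, y2) corners, computes the pairwise overlap matrix between ground-truth boxes, and implements the Gaussian variant of Soft-NMS from Bodla et al. (2017) with their default sigma of 0.5. The exact form of the ORS penalty term is not given here, so it is not sketched; the function names `iou`, `gt_overlaps` and `soft_nms_gaussian` are hypothetical.

```python
import math
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def gt_overlaps(gt_boxes: List[Box]) -> List[List[float]]:
    """Symmetric matrix of pairwise IoU between ground-truth boxes.
    The diagonal stays 0 so a box is not counted as overlapping itself."""
    n = len(gt_boxes)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = iou(gt_boxes[i], gt_boxes[j])
    return m


def soft_nms_gaussian(boxes: List[Box], scores: List[float],
                      sigma: float = 0.5,
                      score_thresh: float = 0.001) -> List[Tuple[int, float]]:
    """Gaussian Soft-NMS: rather than deleting boxes that overlap the current
    top-scoring box, multiply their scores by exp(-iou^2 / sigma).
    Returns (original index, rescored confidence) in selection order."""
    s = list(scores)
    remaining = list(range(len(boxes)))
    keep: List[Tuple[int, float]] = []
    while remaining:
        # Select the highest-scoring box still in play.
        best = max(remaining, key=lambda k: s[k])
        remaining.remove(best)
        keep.append((best, s[best]))
        # Decay, instead of discard, every box overlapping the selected one.
        for k in remaining:
            s[k] *= math.exp(-(iou(boxes[best], boxes[k]) ** 2) / sigma)
        # Drop boxes whose score has decayed below the threshold.
        remaining = [k for k in remaining if s[k] >= score_thresh]
    return keep


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (5, 0, 15, 10), (30, 30, 40, 40)]
    print(gt_overlaps(gts)[0][1])  # ≈ 0.333: two half-overlapping boxes
    dets = [(0, 0, 10, 10), (1, 0, 11, 10), (30, 30, 40, 40)]
    print(soft_nms_gaussian(dets, [0.9, 0.8, 0.7]))
```

In the usage example, the second detection overlaps the first heavily, so its score is decayed rather than removed and it is selected last. Hard NMS would discard it outright, which is exactly what suppresses a genuinely occluded object of the same class.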
Cite this article
Zhao, B., Wang, C. & Fu, Q. An over-regression suppression method to discriminate occluded objects of same category. Pattern Anal Applic 23, 1251–1261 (2020). https://doi.org/10.1007/s10044-019-00853-9