Smoother Soft-NMS for Overlapping Object Detection in X-Ray Images

  • Chunhui Lin
  • Xudong Bao
  • Xuan Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11935)


As a contactless security technology, X-ray security inspection machines are widely used to detect dangerous objects in densely populated public places. Unlike in a natural image, objects in an X-ray image can be seen overlapping one another, because X-rays pass through the items being scanned. This poses a challenge: the traditional NMS (non-maximum suppression) algorithm tends to suppress the less significant of the overlapping objects. In this paper, we propose a Smoother Soft NMS, based on the differences in aspect ratio and area between object bounding boxes, to improve the accuracy of overlapping object detection. We also propose a dedicated data augmentation method that simulates complex samples of overlapping objects. On our dataset, we boost the mean Average Precision of ResNet-101 FPN from 89.44% to 96.67% and of Cascade R-CNN from 96.43% to 97.21%. Detectors trained with Smoother Soft NMS show a significant improvement in overlapping cases.
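To make the suppression problem concrete, the following is a minimal sketch contrasting traditional (hard) NMS with the standard Gaussian Soft-NMS of Bodla et al., on which the proposed method builds. It does not reproduce the Smoother Soft NMS itself, whose aspect-ratio and area terms are not given in this abstract. The function names, the `[x1, y1, x2, y2]` box format, and the default values of `iou_thresh`, `sigma`, and `score_thresh` are assumptions chosen for illustration.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes in [x1, y1, x2, y2] format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def hard_nms(boxes, scores, iou_thresh=0.5):
    """Traditional NMS: discard every box whose IoU with a kept box exceeds iou_thresh."""
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_thresh]
    return keep


def gaussian_soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of discarding them."""
    scores = scores.astype(float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size > 0:
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        ious = iou_one_to_many(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(ious ** 2) / sigma)  # Gaussian penalty
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores


# Two heavily overlapping objects (IoU of about 0.67) plus one isolated object.
boxes = np.array([[0, 0, 100, 100], [20, 0, 120, 100], [300, 300, 400, 400]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])

hard_keep = hard_nms(boxes, scores)
soft_keep, soft_scores = gaussian_soft_nms(boxes, scores)
print(hard_keep)   # the overlapping second box is dropped
print(soft_keep)   # the second box survives with a reduced score
```

In this toy case hard NMS removes the second box outright, while Soft-NMS keeps it with a lower score, which is the behaviour that matters when two real objects overlap in an X-ray image.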


Keywords: Smoother Soft NMS, Dangerous object detection, X-ray images



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Lab of Image Science and Technology, School of Computer Science and Engineering, Southeast University, Nanjing, China
  2. School of Automation, Southeast University, Nanjing, China
