Bi-box Regression for Pedestrian Detection and Occlusion Estimation

  • Chunluan Zhou
  • Junsong Yuan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

Occlusions present a great challenge for pedestrian detection in practical applications. In this paper, we propose a novel approach to simultaneous pedestrian detection and occlusion estimation by regressing two bounding boxes that localize the full body and the visible part of a pedestrian, respectively. For this purpose, we train a deep convolutional neural network (CNN) consisting of two branches, one for full body estimation and the other for visible part estimation. The two branches are treated differently during training so that they learn to produce complementary outputs, which can then be fused to improve detection performance. The full body estimation branch is trained to regress full body regions for positive pedestrian proposals, while the visible part estimation branch is trained to regress visible part regions for both positive and negative pedestrian proposals. The visible part region of a negative pedestrian proposal is forced to shrink to its center. In addition, we introduce a new criterion for selecting positive training examples, which contributes substantially to the detection of heavily occluded pedestrians. We validate the effectiveness of the proposed bi-box regression approach on the Caltech and CityPersons datasets. Experimental results show that our approach achieves promising performance for detecting both non-occluded and occluded pedestrians, especially heavily occluded ones.
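
To make the asymmetric treatment of the two branches concrete, the following minimal Python sketch shows how regression targets could be assigned to a proposal. It is an illustration under stated assumptions, not the network or loss used in the paper: the corner-offset parameterization normalized by proposal size, the `Box` type, and the function names `corner_offsets`, `full_body_target` and `visible_part_target` are all hypothetical. A corner-based encoding is assumed here only because it stays finite when a box shrinks to a single point.

```python
from typing import Optional, Tuple

# (x1, y1, x2, y2) in image coordinates; hypothetical type for this sketch.
Box = Tuple[float, float, float, float]


def corner_offsets(proposal: Box, target: Box) -> Box:
    """Offsets of the target corners from the proposal corners,
    normalized by proposal width and height (assumed encoding)."""
    px1, py1, px2, py2 = proposal
    w, h = px2 - px1, py2 - py1
    tx1, ty1, tx2, ty2 = target
    return ((tx1 - px1) / w, (ty1 - py1) / h,
            (tx2 - px2) / w, (ty2 - py2) / h)


def full_body_target(proposal: Box, full_body: Optional[Box]) -> Optional[Box]:
    """Full body branch: a target exists only for positive proposals.
    Negatives (full_body is None) return None, i.e. no regression loss."""
    if full_body is None:
        return None
    return corner_offsets(proposal, full_body)


def visible_part_target(proposal: Box, visible: Optional[Box]) -> Box:
    """Visible part branch: positives regress to the annotated visible box;
    negatives (visible is None) regress to a degenerate box at the
    proposal center, so the visible region shrinks to its center."""
    if visible is None:
        cx = (proposal[0] + proposal[2]) / 2
        cy = (proposal[1] + proposal[3]) / 2
        visible = (cx, cy, cx, cy)
    return corner_offsets(proposal, visible)


proposal = (0.0, 0.0, 10.0, 20.0)
negative_visible = visible_part_target(proposal, None)
positive_visible = visible_part_target(proposal, (0.0, 0.0, 10.0, 10.0))
negative_full = full_body_target(proposal, None)
```

In this sketch a negative proposal still contributes a visible-part target (both corners pulled to the center), while it contributes nothing to the full body branch, which mirrors the different training treatment of the two branches described above.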

Keywords

Pedestrian detection, Occlusion handling, Deep CNN

Notes

Acknowledgement

This work is supported in part by Singapore Ministry of Education Academic Research Fund Tier 2 MOE2015-T2-2-114 and start-up grants of University at Buffalo.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Nanyang Technological University, Singapore, Singapore
  2. The State University of New York at Buffalo, Buffalo, USA
