
Putting the Anchors Efficiently: Geometric Constrained Pedestrian Detection

  • Conference paper
Computer Vision – ACCV 2018 (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11365)


Abstract

The anchor box mechanism is of vital importance for deep-network-based object detection. Current object detectors place anchors uniformly over the entire image, which significantly increases false positives as well as training and testing costs. In this work, we break through this homogeneity limitation in anchor placement by introducing geometric constraints for pedestrian detection. We first derive the relationship between a pedestrian's height and its location in the image plane from geometric priors, for which we only need to know the rough relative height between the camera and the pedestrian. As a result, we narrow the distribution space of anchors of a given pixel height from 2D to 1D. In implementation, we propose a novel Geometric Constrained Loss, through which the new anchor mechanism is embedded into the deep learning architecture. To further remove false positives at inference time, Geometric Constrained Suppression is introduced. Complemented with two effective prediction modules, the Dilated Double Shots Detector is designed to achieve better performance. Comprehensive experiments on the KITTI and Caltech datasets validate the superior performance and generalization ability of our method for both one-stage and two-stage detectors, in both the training and inference phases.
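The 2D-to-1D reduction described above follows from single-view ground-plane geometry. Under a pinhole camera with a roughly horizontal optical axis over a flat ground plane, a pedestrian of real height H standing at depth Z projects to pixel height h_px = f * H / Z, while its foot row lies f * h_cam / Z below the horizon row; eliminating Z gives h_px = (H / h_cam) * (y_foot - y_horizon), so each pixel height maps to a single image row. The sketch below illustrates this anchor-row constraint and a height-consistency filter in the spirit of Geometric Constrained Suppression; it is not the paper's formulation, and the camera height, pedestrian height, horizon row, and tolerance rule are all illustrative assumptions.

```python
import numpy as np

# Illustrative constants (assumptions, not values from the paper):
# camera mounted ~1.6 m above the ground, average pedestrian ~1.7 m tall.
H_CAM = 1.6
H_PED = 1.7


def expected_foot_row(h_px, y_horizon, h_cam=H_CAM, h_ped=H_PED):
    """Image row at which the feet of a pedestrian with pixel height h_px
    are geometrically plausible: y_foot = y_horizon + h_px * h_cam / h_ped.
    This is the 1D locus onto which anchors of that height collapse."""
    return y_horizon + h_px * h_cam / h_ped


def geometric_suppression(boxes, y_horizon, tol=0.3,
                          h_cam=H_CAM, h_ped=H_PED):
    """Drop detections whose bottom edge disagrees with the row predicted
    from their pixel height by more than tol * height. This is a stand-in
    for the paper's Geometric Constrained Suppression; the tolerance rule
    is an assumption. boxes: (N, 4) array of [x1, y1, x2, y2] in pixels."""
    boxes = np.asarray(boxes, dtype=np.float64)
    h_px = boxes[:, 3] - boxes[:, 1]            # pixel heights
    y_foot = boxes[:, 3]                        # bottom edge = foot row
    expected = expected_foot_row(h_px, y_horizon, h_cam, h_ped)
    keep = np.abs(y_foot - expected) <= tol * h_px
    return boxes[keep]


if __name__ == "__main__":
    horizon = 180.0                             # assumed horizon row
    dets = [[100, 200, 140, 300],   # 100 px tall, foot at row 300: kept
            [300, 100, 340, 200],   # 100 px tall, foot at row 200: floats above plausible row, dropped
            [500, 250, 530, 330]]   # 80 px tall, foot at row 330: too low in the image, dropped
    print(geometric_suppression(dets, horizon))
```

With the assumed geometry, a 100-pixel-tall box is only plausible with its feet near row 274, which is why the first detection survives and the other two are suppressed; the same relation tells a detector which rows need anchors of a given height at all.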

This research is supported by funding from NSFC programs (61673269, 61273285, U1764264).


Notes

  1. Seriously occluded and truncated examples are ignored.


Author information

Corresponding author

Correspondence to Xu Zhao.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 174 KB)


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Fang, L., Zhao, X., Song, X., Zhang, S., Yang, M. (2019). Putting the Anchors Efficiently: Geometric Constrained Pedestrian Detection. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds.) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol. 11365. Springer, Cham. https://doi.org/10.1007/978-3-030-20873-8_25


  • DOI: https://doi.org/10.1007/978-3-030-20873-8_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20872-1

  • Online ISBN: 978-3-030-20873-8

  • eBook Packages: Computer Science, Computer Science (R0)
