Abstract
The anchor box mechanism is of vital importance for deep-network-based object detection. Current object detectors place anchors uniformly across the entire image, which significantly increases false positives as well as training and testing costs. In this work, we break through the homogeneity limitation of anchor placement by introducing geometric constraints for pedestrian detection. We first derive the relationship between the height of a pedestrian and its location in the image plane using geometric priors, for which we only need to know the rough relative height between the camera and the pedestrian. As a result, we narrow the distribution space of anchors of a given pixel height from 2D to 1D. In implementation, we propose a novel Geometric Constrained Loss, by which the new anchor mechanism is embedded into the deep learning architecture. To further remove false positives at inference, Geometric Constrained Suppression is introduced. Complemented with two effective prediction modules, a Dilated Double Shots Detector is designed to achieve better performance. Comprehensive experiments conducted on the KITTI and Caltech datasets validate the superior performance and generalization ability of our method for both one-stage and two-stage detectors, in both training and inference phases.
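The geometric prior behind this can be sketched as follows. Under a pinhole camera over a flat ground plane, a pedestrian of real height H standing at depth Z projects to a box of pixel height h = fH/Z, whose bottom edge sits at y_b = y_horizon + f·h_cam/Z; eliminating f/Z gives the linear relation h = (H/h_cam)·(y_b − y_horizon), which depends only on the relative height H/h_cam. The snippet below is a minimal illustrative sketch of this relation and of a geometric suppression step, not the paper's actual loss or implementation; the function names, the horizon parameter, and the tolerance are assumptions introduced here.

```python
import numpy as np

def expected_pixel_height(y_bottom, y_horizon, person_height=1.7, camera_height=1.7):
    """Expected pixel height of a pedestrian whose box bottom is at y_bottom,
    from the ground-plane relation h = (H / h_cam) * (y_b - y_horizon).
    Only the ratio person_height / camera_height matters."""
    return (person_height / camera_height) * (y_bottom - y_horizon)

def geometric_suppression(boxes, y_horizon, tol=0.5):
    """Drop detections whose pixel height deviates from the geometric
    prediction by more than a relative tolerance.
    boxes: (N, 4) array of (x1, y1, x2, y2) with y increasing downward."""
    boxes = np.asarray(boxes, dtype=float)
    h = boxes[:, 3] - boxes[:, 1]                      # observed box heights
    h_exp = expected_pixel_height(boxes[:, 3], y_horizon)
    keep = np.abs(h - h_exp) <= tol * np.maximum(h_exp, 1e-6)
    return boxes[keep]
```

With a horizon at y = 100, a box whose bottom is at y = 300 should be about 200 px tall; a 50 px box at that location is geometrically implausible and gets suppressed. The same linear relation is what lets anchors of a fixed pixel height be restricted to a narrow band of rows rather than the whole image.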
This research is supported by funding from NSFC programs (61673269, 61273285, U1764264).
Notes
1. Seriously occluded and truncated examples are ignored.
© 2019 Springer Nature Switzerland AG
Cite this paper
Fang, L., Zhao, X., Song, X., Zhang, S., Yang, M. (2019). Putting the Anchors Efficiently: Geometric Constrained Pedestrian Detection. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11365. Springer, Cham. https://doi.org/10.1007/978-3-030-20873-8_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20872-1
Online ISBN: 978-3-030-20873-8
eBook Packages: Computer Science (R0)