Abstract
Faster R-CNN has recently achieved great performance in deep-learning-based object detection. However, a major bottleneck of Faster R-CNN lies in the sharp performance deterioration when detecting objects that are small in size or similar in appearance to their backgrounds. To address this problem, we present a new pedestrian detection approach based on Faster R-CNN that combines contextual information with multi-level features. The contextual information is embedded by pooling features from a larger area around the original region of interest, which helps to detect pedestrians against cluttered backgrounds. The multi-level features are obtained by pooling proposal-specific features from several shallow but high-resolution layers; these features are more informative for detecting small pedestrians. Extensive experiments on the challenging Caltech dataset validate that each component improves on the Faster R-CNN baseline and that combining contextual information with multi-level features boosts detection performance further. Moreover, compared with numerous pedestrian detection approaches, our combined method outperforms all of them and achieves superior performance.
Acknowledgment
This project was supported by Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467), Shenzhen Peacock Plan (20130408-183003656), and National Science Foundation of China (No. U1611461).
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Kong, W., Li, N., Li, T.H., Li, G. (2018). Deep Pedestrian Detection Using Contextual Information and Multi-level Features. In: Schoeffmann, K., et al. (eds.) MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol. 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_14
DOI: https://doi.org/10.1007/978-3-319-73603-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science (R0)