Abstract
Faster R-CNN has recently achieved great performance in deep-learning-based object detection. However, a major bottleneck of Faster R-CNN lies in the sharp performance deterioration when detecting objects that are small in size or similar in appearance to their backgrounds. To address this problem, we present a new pedestrian detection approach based on Faster R-CNN that combines contextual information with multi-level features. The contextual information is embedded by pooling features from a larger area around the original region of interest, which helps to detect pedestrians against cluttered backgrounds. The multi-level features are obtained by pooling proposal-specific features from several shallow but high-resolution layers; these features are more informative for detecting small pedestrians. Extensive experiments on the challenging Caltech dataset validate that each component improves on the Faster R-CNN baseline and that combining contextual information with multi-level features boosts detection performance further. Moreover, compared with numerous pedestrian detection approaches, our combined method outperforms all of them and achieves superior performance.
Acknowledgment
This project was supported by Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467), Shenzhen Peacock Plan (20130408-183003656), and National Science Foundation of China (No. U1611461).
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Kong, W., Li, N., Li, T.H., Li, G. (2018). Deep Pedestrian Detection Using Contextual Information and Multi-level Features. In: Schoeffmann, K., et al. (eds.) MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol. 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_14
DOI: https://doi.org/10.1007/978-3-319-73603-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science (R0)