Abstract
Object detection is an important research area in image processing and computer vision. The performance of object detection has significantly improved through applying deep learning technology. Among these methods, convolutional neural network (CNN)-based methods are most frequently used. CNN methods mainly include two classes: two-stage methods and one-stage methods. This chapter firstly introduces some typical CNN-based architectures in details. After that, pedestrian detection, as a classical subset of object detection, is further introduced. According to whether CNN is used or not, pedestrian detection can be divided into two types: handcrafted feature-based methods and CNN-based methods. Among these methods, NNNF (non-neighboring and neighboring features) inspired by pedestrian attributes (i.e., appearance constancy and shape symmetry) and MCF based on handcrafted channels and each layer of CNN are specifically illustrated. Finally, some challenges of object detection (i.e., scale variation, occlusion, and deformation) will be discussed.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bell, S., Zitnick, C. L., Bala, K., and Girshick, R.: Inside-Outside Net: Detecting objects in context with skip pooling and recurrent neural networks. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Benenson, R., Mathias, M., Tuytelaars, T., and Gool, L. V.: Seeking the strongest rigid detector. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2013)
Benenson, R., Omran, M., Hosang, J. and Schiele, B.: Ten years of pedestrian detection, what have we learned? in Proc. Eur. Conf. Comput. Vis. (2014)
Cai, Z., Saberian, M., and Vasconcelos, N.: Learning complexity-aware cascades for deep pedestrian detection. in Proc. IEEE Int. Conf. Comput. Vis. (2015)
Cai, Z. Fan, Q., Feris, R. S., and Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. in Proc. Eur. Conf. Comput. Vis. (2016)
Cao, J., Pang, Y., and Li, X.: Pedestrian detection inspired by appearance constancy and shape symmetry. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Cao, J., Pang, Y., and Li, X.: Pedestrian detection inspired by appearance constancy and shape symmetry. IEEE Trans. Image Processing 25(12), 5538–5551 (2016)
Cao, J., Pang, Y., and Li, X.: Learning multilayer features for pedestrian detection. IEEE Trans. Image Processing 26(7), 3310–3320 (2017)
Cheng, M. M., Zhang, Z., Lin, W.Y., and Torr, P.: BING: Binarized normed gradients for objectness estimation at 300fps. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2014)
Dalal, N. and Triggs, B.: Histograms of oriented gradients for human detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)
Dai, J., Li, Y., He, K., and Sun, J.: R-FCN: Object detection via region-based fully convolutional networks. in Proc. Advances in Neural Information Processing Systems (2016)
Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y.: Deformable convolutional networks. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2009)
Dollár, P., Tu, Z., Perona, P., and Belongie, S.: Integral channel features. in Proc. Brit. Mach. Vis. Conf. (2009)
Dollár, P., Wojek, C., Schiele, B., and Perona, P.: Pedestrian detection: An evaluation of the state of the art. IEEE Trans. Pattern Analysis and Machine Intelligence 34(4), 743–761 (2012)
Dollár, P., Appel, R., Belongie, S., and Perona, P.: Fastest feature pyramids for object detection. IEEE Trans. Pattern Analysis and Machine Intelligence 36(8), 1532–1545 (2014)
Everingham, M., Van Gool, L.,Williams, C. K. I.,Winn, J., and Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)
Felzenszwalb, P. F., Girshick, R., and McAllester, D.: Cascade object detection with deformable part models. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2010)
Felzenszwalb, P., Girshick, R., McAllester, D., and Ramanan, D.: Object detection with discriminatively trained part based models. IEEE Trans. Pattern Analysis and Machine Intelligence 32(9), 1627–1645 (2012)
Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A. C.: Dssd: Deconvolutional single shot detector. CoRR abs/1701.06659 (2017)
Geiger, A., Lenz, P., and Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2012)
Gidaris, S. and Komodakis, N.: LocNet: Improving localization accuracy for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Girshick, R., Donahue, J., Darrell, T., and Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2014)
Girshick, R.: Fast RCNN. in Proc. Int. Conf. Comput. Vis. (2015)
He, K., Zhang, X., Ren, S., and Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Analysis and Machine Intelligence 37(9), 1904–1916 (2015)
He, K., Zhang, X., Ren, S., and Sun, J.: Deep residual learning for image recognition. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
He, K., Gkioxari, G., Dollár, P., and Girshick, R.: Mask R-CNN. in Proc. Int. Conf. Comput. Vis. (2017)
Hosang, J., Omran, M., Benenson, R., and Schiele, B.: Taking a deeper look at pedestrians. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)
Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K.: Spatial transformer networks. in Proc. Advances in Neural Information Processing Systems (2015)
Jeon, Y. and Kim, J.: Active convolution: Learning the shape of convolution for image classification. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)
Kong, T., Yao, A., Chen, Y., and Sun, F.: HyperNet: Towards accurate region proposal generation and joint object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Kong, T., Sun, F., Yao, A., Liu, H., Lu, M., and Chen, Y.: Ron: Reverse connection with objectness prior networks for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)
Krizhevsky, A., Sutskever, I., and Hinton, G. E.: ImageNet classification with deep convolutional neural networks. in Proc. Advances in Neural Information Processing Systems (2012)
Li, J., Liang, X., Shen, S., Xu, T., and Yan, S.: Scale-aware Fast R-CNN for pedestrian detection. CoRR abs/1510.08160 2015
Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., and Yan, S.: Perceptual generative adversarial networks for small object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P.: Focal loss for dense object detection. in Proc. Int. Conf. Comput. Vis. (2017)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S.: Feature pyramid networks for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C.: SSD: Single shot multibox detector. in Proc. Eur. Conf. Comput. Vis. (2016)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Mathias, M., Benenson, R., Timofte, R., and Van Gool, L.: Handling occlusions with franken-classifiers. in Proc. Int. Conf. Comput. Vis. (2013)
Mao, J., Xiao, T., Jiang, Y., and Cao, Z.: What can help pedestrian detection? in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)
Najibi, M., Rastegari, M., and Davis, L. S.: G-CNN: an iterative grid based object detector. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Nam, N., Dollár, P., and Han, J.: Local decorrelation for improved detection. in Proc. Advances in Neural Information Processing Systems (2014)
Ojala, T., Pietikainen, M., and Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 971–987 (2002)
Ouyang, W., Wang, X., Zeng, X., Qiu, S. Luo, P., Tian, Y., Li, H., Yang, S., Wang, Z., Loy, C.-C., and Tang, X.: DeepID-Net: Deformable deep convolutional neural networks for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)
Paisitkriangkrai, S., Shen, C., and van den Hengel, A.: Pedestrian detection with spatially pooled features and structured ensemble learning. IEEE Trans. Pattern Analysis and Machine Intelligence. 38(6), 1243–1257 (2016)
Pang, Y., Cao, J., and Shao, L.: Small-scale pedestrian detection by joint classification and super-resolution into a unified network. Tech. report (2017)
Park, D., Ramanan. D., and Fowlkes, C.: Multiresolution models for object detection. in Proc. Eur. Conf. Comput. Vis. (2010)
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.: You only look once: unified, real-time object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Ren, J., Chen, X., Liu, J., Sun, W., Pang, J., Yan, Q., Tai, Y., and Xu, L:. Accurate single stage detector using recurrent rolling convolution. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)
Ren, X., and Ramanan, D.: Histograms of sparse codes for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Ren, S., He, K., Girshick, R., and Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. in Proc. Advances in Neural Information Processing Systems (2015)
Sermanet, P., Kavukcuoglu, K., Chintala, S., and LeCun, Y.: Pedestrian detection with unsupervised multi-stage feature learning. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2013)
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y.: Overfeat: Integrated recognition, localization and detection using convolutional networks. In ICLR, (2014)
Shrivastava, A., Gupta, A., and Girshick, R.: Training region-based object detectors with online hard example mining. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Simonyan, K., and Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A.: Going deeper with convolutions. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)
Tian, Y., Luo, P., Wang, X., Tang, X.: Deep learning strong parts for pedestrian detection. in Proc. Int. Conf. Comput. Vis. (2015)
Tian, Y., Luo, P., Wang, X., and Tang, X.: Pedestrian detection aided by deep learning semantic tasks. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)
Uijlings, J. R. R., van de Sande, K. E. A., Gevers, T., and Smeulders, A. W. M.: Selective search for object recognition. Int. J. Comput. Vis. (2013)
Viola, P. and Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)
Wang, X., Han, T. X., and Yan, S.: An HOG-LBP human detector with partial occlusion handling. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2008)
Wang, X., Yang, M., Zhu, S., and Lin, Y.: Regionlets for generic object detection. in Proc. Int. Conf. Comput. Vis. (2013)
Wang, X., Shrivastava, A., and Gupta, A.: A-Fast-RCNN: Hard positive generation via adversary for object detection. in Proc. Int. Conf. Comput. Vis. (2017)
Yan, J., Lei, Z., Wen, L., and Li, S. Z.: The fastest deformable part model for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2014)
Yang, B., Yan, J., Lei, Z., and Li, S. Z.: Convolutional channel features. in Proc. Int. Conf. Comput. Vis. (2015)
Yang, B., Yan, J., Lei, Z., Li, and S. Z.: CRAFT objects from images. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Yang, F., Choi, W., and Lin, Y.: Exploit All the Layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Yu, F., Koltun, V., and Funkhouser, T.: Dilated residual networks. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)
Zagoruyko, S., Lerer, A., Lin, T.-Y., Pinheiro, P. O., Gross, S., Chintala, S., and Dollár, P.: A multipath network for object detection. in Proc. British Machine Vision Conference (2016)
Zhang, L., Lin, L., Liang, X., and He, K.: Is faster R-CNN doing well for pedestrian detection? in Proc. Eur. Conf. Comput. Vis. (2016)
Zhang, S., Bauckhage, C., and Cremers, A. B.: Informed haar-like features improve pedestrian detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2014)
Zhang, S., Benenson, R., and Schiele, B.: Filtered channel features for pedestrian detection, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)
Zhang, S., Benenson, R., Hosang, J. and Schiele, B.: CityPersons: A diverse dataset for pedestrian detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)
Zitnick, C. L., and Dollár, P.: Edge boxes: locating object proposals from edges. in Proc. Eur. Conf. Comput. Vis. (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Pang, Y., Cao, J. (2019). Deep Learning in Object Detection. In: Jiang, X., Hadid, A., Pang, Y., Granger, E., Feng, X. (eds) Deep Learning in Object Detection and Recognition. Springer, Singapore. https://doi.org/10.1007/978-981-10-5152-4_2
Download citation
DOI: https://doi.org/10.1007/978-981-10-5152-4_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5151-7
Online ISBN: 978-981-10-5152-4
eBook Packages: Computer ScienceComputer Science (R0)