Skip to main content

Deep Learning in Object Detection

  • Chapter

Abstract

Object detection is an important research area in image processing and computer vision. The performance of object detection has significantly improved through applying deep learning technology. Among these methods, convolutional neural network (CNN)-based methods are most frequently used. CNN methods mainly include two classes: two-stage methods and one-stage methods. This chapter firstly introduces some typical CNN-based architectures in details. After that, pedestrian detection, as a classical subset of object detection, is further introduced. According to whether CNN is used or not, pedestrian detection can be divided into two types: handcrafted feature-based methods and CNN-based methods. Among these methods, NNNF (non-neighboring and neighboring features) inspired by pedestrian attributes (i.e., appearance constancy and shape symmetry) and MCF based on handcrafted channels and each layer of CNN are specifically illustrated. Finally, some challenges of object detection (i.e., scale variation, occlusion, and deformation) will be discussed.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   139.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   179.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD   179.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Bell, S., Zitnick, C. L., Bala, K., and Girshick, R.: Inside-Outside Net: Detecting objects in context with skip pooling and recurrent neural networks. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  2. Benenson, R., Mathias, M., Tuytelaars, T., and Gool, L. V.: Seeking the strongest rigid detector. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2013)

    Google Scholar 

  3. Benenson, R., Omran, M., Hosang, J. and Schiele, B.: Ten years of pedestrian detection, what have we learned? in Proc. Eur. Conf. Comput. Vis. (2014)

    Google Scholar 

  4. Cai, Z., Saberian, M., and Vasconcelos, N.: Learning complexity-aware cascades for deep pedestrian detection. in Proc. IEEE Int. Conf. Comput. Vis. (2015)

    Google Scholar 

  5. Cai, Z. Fan, Q., Feris, R. S., and Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. in Proc. Eur. Conf. Comput. Vis. (2016)

    Google Scholar 

  6. Cao, J., Pang, Y., and Li, X.: Pedestrian detection inspired by appearance constancy and shape symmetry. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  7. Cao, J., Pang, Y., and Li, X.: Pedestrian detection inspired by appearance constancy and shape symmetry. IEEE Trans. Image Processing 25(12), 5538–5551 (2016)

    Article  MathSciNet  Google Scholar 

  8. Cao, J., Pang, Y., and Li, X.: Learning multilayer features for pedestrian detection. IEEE Trans. Image Processing 26(7), 3310–3320 (2017)

    Article  MathSciNet  Google Scholar 

  9. Cheng, M. M., Zhang, Z., Lin, W.Y., and Torr, P.: BING: Binarized normed gradients for objectness estimation at 300fps. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2014)

    Google Scholar 

  10. Dalal, N. and Triggs, B.: Histograms of oriented gradients for human detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)

    Article  Google Scholar 

  11. Dai, J., Li, Y., He, K., and Sun, J.: R-FCN: Object detection via region-based fully convolutional networks. in Proc. Advances in Neural Information Processing Systems (2016)

    Google Scholar 

  12. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y.: Deformable convolutional networks. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)

    Google Scholar 

  13. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2009)

    Google Scholar 

  14. Dollár, P., Tu, Z., Perona, P., and Belongie, S.: Integral channel features. in Proc. Brit. Mach. Vis. Conf. (2009)

    Google Scholar 

  15. Dollár, P., Wojek, C., Schiele, B., and Perona, P.: Pedestrian detection: An evaluation of the state of the art. IEEE Trans. Pattern Analysis and Machine Intelligence 34(4), 743–761 (2012)

    Article  Google Scholar 

  16. Dollár, P., Appel, R., Belongie, S., and Perona, P.: Fastest feature pyramids for object detection. IEEE Trans. Pattern Analysis and Machine Intelligence 36(8), 1532–1545 (2014)

    Article  Google Scholar 

  17. Everingham, M., Van Gool, L.,Williams, C. K. I.,Winn, J., and Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)

    Article  Google Scholar 

  18. Felzenszwalb, P. F., Girshick, R., and McAllester, D.: Cascade object detection with deformable part models. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2010)

    Google Scholar 

  19. Felzenszwalb, P., Girshick, R., McAllester, D., and Ramanan, D.: Object detection with discriminatively trained part based models. IEEE Trans. Pattern Analysis and Machine Intelligence 32(9), 1627–1645 (2012)

    Article  Google Scholar 

  20. Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A. C.: Dssd: Deconvolutional single shot detector. CoRR abs/1701.06659 (2017)

    Google Scholar 

  21. Geiger, A., Lenz, P., and Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2012)

    Google Scholar 

  22. Gidaris, S. and Komodakis, N.: LocNet: Improving localization accuracy for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  23. Girshick, R., Donahue, J., Darrell, T., and Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2014)

    Google Scholar 

  24. Girshick, R.: Fast RCNN. in Proc. Int. Conf. Comput. Vis. (2015)

    Google Scholar 

  25. He, K., Zhang, X., Ren, S., and Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Analysis and Machine Intelligence 37(9), 1904–1916 (2015)

    Article  Google Scholar 

  26. He, K., Zhang, X., Ren, S., and Sun, J.: Deep residual learning for image recognition. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  27. He, K., Gkioxari, G., Dollár, P., and Girshick, R.: Mask R-CNN. in Proc. Int. Conf. Comput. Vis. (2017)

    Google Scholar 

  28. Hosang, J., Omran, M., Benenson, R., and Schiele, B.: Taking a deeper look at pedestrians. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)

    Google Scholar 

  29. Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K.: Spatial transformer networks. in Proc. Advances in Neural Information Processing Systems (2015)

    Google Scholar 

  30. Jeon, Y. and Kim, J.: Active convolution: Learning the shape of convolution for image classification. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)

    Google Scholar 

  31. Kong, T., Yao, A., Chen, Y., and Sun, F.: HyperNet: Towards accurate region proposal generation and joint object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  32. Kong, T., Sun, F., Yao, A., Liu, H., Lu, M., and Chen, Y.: Ron: Reverse connection with objectness prior networks for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)

    Google Scholar 

  33. Krizhevsky, A., Sutskever, I., and Hinton, G. E.: ImageNet classification with deep convolutional neural networks. in Proc. Advances in Neural Information Processing Systems (2012)

    Google Scholar 

  34. Li, J., Liang, X., Shen, S., Xu, T., and Yan, S.: Scale-aware Fast R-CNN for pedestrian detection. CoRR abs/1510.08160 2015

    Google Scholar 

  35. Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., and Yan, S.: Perceptual generative adversarial networks for small object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)

    Google Scholar 

  36. Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P.: Focal loss for dense object detection. in Proc. Int. Conf. Comput. Vis. (2017)

    Google Scholar 

  37. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S.: Feature pyramid networks for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)

    Google Scholar 

  38. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C.: SSD: Single shot multibox detector. in Proc. Eur. Conf. Comput. Vis. (2016)

    Google Scholar 

  39. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

    Article  Google Scholar 

  40. Mathias, M., Benenson, R., Timofte, R., and Van Gool, L.: Handling occlusions with franken-classifiers. in Proc. Int. Conf. Comput. Vis. (2013)

    Google Scholar 

  41. Mao, J., Xiao, T., Jiang, Y., and Cao, Z.: What can help pedestrian detection? in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)

    Google Scholar 

  42. Najibi, M., Rastegari, M., and Davis, L. S.: G-CNN: an iterative grid based object detector. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  43. Nam, N., Dollár, P., and Han, J.: Local decorrelation for improved detection. in Proc. Advances in Neural Information Processing Systems (2014)

    Google Scholar 

  44. Ojala, T., Pietikainen, M., and Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 971–987 (2002)

    Article  Google Scholar 

  45. Ouyang, W., Wang, X., Zeng, X., Qiu, S. Luo, P., Tian, Y., Li, H., Yang, S., Wang, Z., Loy, C.-C., and Tang, X.: DeepID-Net: Deformable deep convolutional neural networks for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)

    Google Scholar 

  46. Paisitkriangkrai, S., Shen, C., and van den Hengel, A.: Pedestrian detection with spatially pooled features and structured ensemble learning. IEEE Trans. Pattern Analysis and Machine Intelligence. 38(6), 1243–1257 (2016)

    Article  Google Scholar 

  47. Pang, Y., Cao, J., and Shao, L.: Small-scale pedestrian detection by joint classification and super-resolution into a unified network. Tech. report (2017)

    Google Scholar 

  48. Park, D., Ramanan. D., and Fowlkes, C.: Multiresolution models for object detection. in Proc. Eur. Conf. Comput. Vis. (2010)

    Google Scholar 

  49. Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.: You only look once: unified, real-time object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  50. Ren, J., Chen, X., Liu, J., Sun, W., Pang, J., Yan, Q., Tai, Y., and Xu, L:. Accurate single stage detector using recurrent rolling convolution. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)

    Google Scholar 

  51. Ren, X., and Ramanan, D.: Histograms of sparse codes for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  52. Ren, S., He, K., Girshick, R., and Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. in Proc. Advances in Neural Information Processing Systems (2015)

    Google Scholar 

  53. Sermanet, P., Kavukcuoglu, K., Chintala, S., and LeCun, Y.: Pedestrian detection with unsupervised multi-stage feature learning. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2013)

    Google Scholar 

  54. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y.: Overfeat: Integrated recognition, localization and detection using convolutional networks. In ICLR, (2014)

    Google Scholar 

  55. Shrivastava, A., Gupta, A., and Girshick, R.: Training region-based object detectors with online hard example mining. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  56. Simonyan, K., and Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)

    Google Scholar 

  57. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A.: Going deeper with convolutions. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)

    Google Scholar 

  58. Tian, Y., Luo, P., Wang, X., Tang, X.: Deep learning strong parts for pedestrian detection. in Proc. Int. Conf. Comput. Vis. (2015)

    Google Scholar 

  59. Tian, Y., Luo, P., Wang, X., and Tang, X.: Pedestrian detection aided by deep learning semantic tasks. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)

    Google Scholar 

  60. Uijlings, J. R. R., van de Sande, K. E. A., Gevers, T., and Smeulders, A. W. M.: Selective search for object recognition. Int. J. Comput. Vis. (2013)

    Book  Google Scholar 

  61. Viola, P. and Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)

    Article  Google Scholar 

  62. Wang, X., Han, T. X., and Yan, S.: An HOG-LBP human detector with partial occlusion handling. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2008)

    Google Scholar 

  63. Wang, X., Yang, M., Zhu, S., and Lin, Y.: Regionlets for generic object detection. in Proc. Int. Conf. Comput. Vis. (2013)

    Google Scholar 

  64. Wang, X., Shrivastava, A., and Gupta, A.: A-Fast-RCNN: Hard positive generation via adversary for object detection. in Proc. Int. Conf. Comput. Vis. (2017)

    Google Scholar 

  65. Yan, J., Lei, Z., Wen, L., and Li, S. Z.: The fastest deformable part model for object detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2014)

    Google Scholar 

  66. Yang, B., Yan, J., Lei, Z., and Li, S. Z.: Convolutional channel features. in Proc. Int. Conf. Comput. Vis. (2015)

    Google Scholar 

  67. Yang, B., Yan, J., Lei, Z., Li, and S. Z.: CRAFT objects from images. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  68. Yang, F., Choi, W., and Lin, Y.: Exploit All the Layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  69. Yu, F., Koltun, V., and Funkhouser, T.: Dilated residual networks. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2017)

    Google Scholar 

  70. Zagoruyko, S., Lerer, A., Lin, T.-Y., Pinheiro, P. O., Gross, S., Chintala, S., and Dollár, P.: A multipath network for object detection. in Proc. British Machine Vision Conference (2016)

    Google Scholar 

  71. Zhang, L., Lin, L., Liang, X., and He, K.: Is faster R-CNN doing well for pedestrian detection? in Proc. Eur. Conf. Comput. Vis. (2016)

    Google Scholar 

  72. Zhang, S., Bauckhage, C., and Cremers, A. B.: Informed haar-like features improve pedestrian detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2014)

    Google Scholar 

  73. Zhang, S., Benenson, R., and Schiele, B.: Filtered channel features for pedestrian detection, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2015)

    Google Scholar 

  74. Zhang, S., Benenson, R., Hosang, J. and Schiele, B.: CityPersons: A diverse dataset for pedestrian detection. in Proc. IEEE Conf. Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  75. Zitnick, C. L., and Dollár, P.: Edge boxes: locating object proposals from edges. in Proc. Eur. Conf. Comput. Vis. (2014)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yanwei Pang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Pang, Y., Cao, J. (2019). Deep Learning in Object Detection. In: Jiang, X., Hadid, A., Pang, Y., Granger, E., Feng, X. (eds) Deep Learning in Object Detection and Recognition. Springer, Singapore. https://doi.org/10.1007/978-981-10-5152-4_2

Download citation

  • DOI: https://doi.org/10.1007/978-981-10-5152-4_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5151-7

  • Online ISBN: 978-981-10-5152-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics