Deep Learning Architectures for Object Detection and Classification

  • Bhaumik Vaidya
  • Chirag Paunwala
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 374)


Object detection and classification have seen a large amount of research and transformation following advances in machine learning algorithms. Growth in computing power and data availability has complemented this transformation. In recent years, research in object detection has been dominated by a special type of neural network called the Convolutional Neural Network (CNN). An object detection system has to localize the objects in an image and classify them accurately. A CNN is well suited to this task because it can find features such as edges and corners, as well as the more advanced features needed to detect objects. This chapter provides a detailed overview of how a CNN works and how it is used for object detection and classification. It then explains in detail popular deep networks based on the CNN, such as ResNet, VGG16, VGG19, GoogleNet and MobileNet. These networks work well for object classification but need a sliding window technique to localize objects in an image. That approach is slow because many windows must be processed for a single image. This led to more advanced CNN-based object detection algorithms such as the Convolutional Neural Network with Region proposals (R-CNN), Fast R-CNN, Faster R-CNN, the Single shot multi-box detector (SSD) and You Only Look Once (YOLO). The chapter gives a detailed explanation of how these algorithms work and a comparison between them. Most deep learning algorithms require a large amount of data and dedicated hardware such as GPUs for training. To overcome this, the concept of transfer learning was introduced, in which pre-trained models of popular CNN architectures are reused to solve new problems. The last part of the chapter explains transfer learning and when it is useful.
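To make the cost of the sliding window technique concrete, the following is a minimal sketch, not taken from the chapter, that enumerates the windows a classifier would have to evaluate for a single image. The image size, window sizes, stride, and the function names `sliding_windows` and `count_windows` are all assumptions chosen for illustration.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x, y, w, h) boxes for one window size slid over an image."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def count_windows(image_w, image_h, window_sizes, stride):
    """Count the classifier evaluations needed across all window sizes."""
    return sum(
        sum(1 for _ in sliding_windows(image_w, image_h, w, h, stride))
        for (w, h) in window_sizes
    )


# Hypothetical setup: a 640x480 image, three square window sizes, stride 16.
total = count_windows(640, 480, [(64, 64), (128, 128), (256, 256)], stride=16)
print(total)
```

Under these assumed settings the count is 2133 windows, each of which would need its own forward pass through the classification network. Smaller strides or more window sizes increase the count further, which is the inefficiency that region-proposal and single-shot detectors aim to avoid.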


  • Deep learning
  • Convolutional neural network (CNN)
  • CNN with region proposals (R-CNN)
  • You only look once (YOLO)
  • Single shot multi-box detector (SSD)
  • Transfer learning



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Research Scholar, Gujarat Technological University, Ahmedabad, India
  2. SCET, Surat, India
