Vehicle Detection Using Alex Net and Faster R-CNN Deep Learning Models: A Comparative Study

  • Jorge E. Espinosa
  • Sergio A. Velastin
  • John W. Branch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10645)


This paper presents a comparative study of two deep learning models for vehicle detection. Alex Net and Faster R-CNN are compared by analysing an urban video sequence. Several tests were carried out to evaluate the quality of detections, failure rates and the time required to complete the detection task. The results support important conclusions about the architectures and strategies used to implement such networks for the task of detection in video, and encourage future research on this topic.


Convolutional Neural Network, Feature extraction, Vehicle classification



S.A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, the Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander. The authors wish to thank Dr. Fei Yin for the code for the metrics used in the evaluations. Finally, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research. The data and code used for this work are available upon request from the authors.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jorge E. Espinosa (1)
  • Sergio A. Velastin (2, 3)
  • John W. Branch (4)
  1. Facultad de Ingenierías, Politécnico Colombiano Jaime Isaza Cadavid – Medellín, Medellín, Colombia
  2. University Carlos III - Madrid, Madrid, Spain
  3. Queen Mary University of London, London, UK
  4. Facultad de Minas, Universidad Nacional de Colombia – Sede Medellín, Medellín, Colombia
