Speeding up inference on deep neural networks for object detection by performing partial convolution


Real-time object detection is an expected application of deep neural networks (DNNs). It can be achieved by employing graphics processing units (GPUs) or dedicated hardware accelerators. As an alternative, in this work we present a software scheme to accelerate the inference stage of DNNs designed for object detection. The scheme relies on partial processing within the consecutive convolution layers of a DNN. It uses the relationships between the locations of the components of an input feature, an intermediate feature representation, and an output feature to identify the components that have been modified. The matrix multiplicand is then downsized to cover only those modified components, which accelerates the matrix multiplication within a convolution layer. The same relationships can also be used to signal to the next convolution layer which components have been modified, further reducing the overhead of the member-by-member comparison otherwise needed to identify them. The proposed scheme has been experimentally benchmarked against a conceptually similar approach, namely CBinfer, and against the original Darknet on the Tiny-You Only Look Once (Tiny-YOLO) network. The experiments were conducted on video data sets from YouTube, using a personal computer with a dual CPU running at 3.5 GHz and no GPU acceleration. The results show that, on average, improvement ratios of 1.56 and 13.10 in detection frame rate over CBinfer and Darknet, respectively, are attainable. Our scheme was also extended to exploit GPU-assisted acceleration. On an NVIDIA Jetson TX2, it reached a detection frame rate of 28.12 frames per second (1.25\(\times\) with respect to CBinfer). In all experiments, detection accuracy was preserved at 90% of that of the original Darknet.
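The idea of recomputing only the modified components can be illustrated with a minimal NumPy sketch. It is not the implementation evaluated in this work: the function names (`gather_cols`, `affected_outputs`, `partial_conv`), the stride of 1, the absence of padding, and the exact-difference test for "modified" are all assumptions made for illustration. The sketch builds im2col columns only for the output locations whose receptive field contains a changed input, multiplies the weight matrix by that reduced matrix, and passes the resulting change mask to the next layer so that no second element-wise comparison is needed there.

```python
import numpy as np

def gather_cols(x, k, ys, xs):
    """im2col restricted to output positions (ys, xs); stride 1, no padding."""
    patches = [x[:, ys + dy, xs + dx] for dy in range(k) for dx in range(k)]
    # Rows are ordered (channel, dy, dx), matching weights.reshape(F, -1).
    return np.stack(patches, axis=1).reshape(x.shape[0] * k * k, ys.size)

def full_conv(x, weights):
    """Reference convolution over every output position; weights is (F, C, k, k)."""
    f, _, k, _ = weights.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    ys, xs = np.divmod(np.arange(h_out * w_out), w_out)
    return (weights.reshape(f, -1) @ gather_cols(x, k, ys, xs)).reshape(f, h_out, w_out)

def diff_mask(prev_x, x, tol=0.0):
    """(H, W) mask of input locations that changed in any channel."""
    return (np.abs(x - prev_x) > tol).any(axis=0)

def affected_outputs(changed_in, k):
    """Output locations whose k x k receptive field contains a changed input."""
    h, w = changed_in.shape
    h_out, w_out = h - k + 1, w - k + 1
    mask = np.zeros((h_out, w_out), dtype=bool)
    for dy in range(k):
        for dx in range(k):
            mask |= changed_in[dy:dy + h_out, dx:dx + w_out]
    return mask

def partial_conv(x, weights, prev_out, changed_in):
    """Recompute only the affected output columns; reuse prev_out elsewhere."""
    f, _, k, _ = weights.shape
    changed_out = affected_outputs(changed_in, k)
    ys, xs = np.nonzero(changed_out)
    out = prev_out.copy()
    if ys.size:
        # Reduced GEMM: (F, C*k*k) @ (C*k*k, number of affected positions)
        out[:, ys, xs] = weights.reshape(f, -1) @ gather_cols(x, k, ys, xs)
    return out, changed_out

def relu(a):
    return np.maximum(a, 0.0)

# Two consecutive frames that differ only in a 3 x 3 patch (synthetic data).
rng = np.random.default_rng(0)
w1 = rng.standard_normal((4, 3, 3, 3))
w2 = rng.standard_normal((5, 4, 3, 3))
frame0 = rng.standard_normal((3, 16, 16))
frame1 = frame0.copy()
frame1[:, 5:8, 5:8] += 1.0

# Frame 0 is processed in full and its layer outputs are cached.
a0 = full_conv(frame0, w1)
b0 = full_conv(relu(a0), w2)

# Frame 1: only layer 1 compares inputs; layer 2 reuses layer 1's mask.
a1, m1 = partial_conv(frame1, w1, a0, diff_mask(frame0, frame1))
b1, m2 = partial_conv(relu(a1), w2, b0, m1)
print("layer 1 recomputed", int(m1.sum()), "of", m1.size, "positions")
print("layer 2 recomputed", int(m2.sum()), "of", m2.size, "positions")
```

In this sketch the mask handed to the second layer is a conservative superset of the truly modified locations, so correctness is kept at the cost of some redundant work. The affected region also grows with each layer, which suggests that the benefit of such a scheme depends on how localized the frame-to-frame changes are.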



  1. We make use of the im2col implementation from Berkeley Vision's Caffe, available at https://github.com/BVLC/caffe/blob/master/LICENSE.




This work was supported by the Thailand Research Fund (TRF) and Walailak University, Thailand, under Grant number RSA6280097.

Author information



Corresponding author

Correspondence to Wattanapong Kurdthongmee.


Cite this article

Kurdthongmee, W. Speeding up inference on deep neural networks for object detection by performing partial convolution. J Real-Time Image Proc 17, 1487–1503 (2020). https://doi.org/10.1007/s11554-019-00906-6



  • Deep neural networks
  • DNNs object detection
  • Convolution
  • Inference acceleration