Abstract
Object detection and recognition algorithms using deep convolutional neural networks (CNNs) tend to be computationally intensive. This presents a particular challenge for embedded systems, such as mobile robots, whose computational resources are far more limited than those of workstations. As an alternative to standard, uniformly sampled images, we propose the use of foveated image sampling to reduce image size, which speeds up CNN processing by reducing the number of convolution operations. We evaluate object detection and recognition on the Microsoft COCO database, using foveated image sampling at image sizes ranging from \(416\times 416\) to \(96\times 96\) pixels, on an embedded GPU (an NVIDIA Jetson TX2 with 256 CUDA cores). The results show that it is possible to achieve a \(4{\times }\) speed-up in frame rate, from 3.59 FPS to 15.24 FPS, using \(416\times 416\) and \(128\times 128\) pixel images respectively. For foveated sampling, this image size reduction led to only a small decrease in recall performance in the foveal region, to 92.0% of the baseline performance with full-sized images, compared to a significant decrease to 50.1% of baseline recall for uniformly sampled images, demonstrating the advantage of foveated sampling.
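The abstract's core idea, resampling an image densely at a fovea and sparsely in the periphery to shrink the CNN input, can be illustrated with a minimal radial-warp sketch. This is not the authors' exact sampling geometry: the power-law warp exponent `p`, the nearest-neighbour lookup, and the `foveate` function name are all illustrative assumptions.

```python
import numpy as np

def foveate(img, out_size=128, p=2.0, center=None):
    """Resample `img` onto a smaller out_size x out_size grid whose
    sampling density falls off with distance from the fovea.

    Output radius u in [0, 1] maps to input radius u**p (p > 1), so
    sampling is dense near the centre and sparse in the periphery.
    """
    h, w = img.shape[:2]
    cy, cx = (h / 2, w / 2) if center is None else center
    # Normalized output coordinates in [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_size),
                         np.linspace(-1, 1, out_size), indexing="ij")
    r = np.sqrt(xs ** 2 + ys ** 2) + 1e-9
    # Radial warp: multiplying (xs, ys) by r**(p-1) gives radius r**p.
    scale = r ** (p - 1.0)
    # Corner samples can warp outside the image; clip to the border.
    src_y = np.clip(cy + ys * scale * (h / 2 - 1), 0, h - 1).astype(int)
    src_x = np.clip(cx + xs * scale * (w / 2 - 1), 0, w - 1).astype(int)
    return img[src_y, src_x]
```

With `p = 1` this reduces to a plain nearest-neighbour resize; larger `p` devotes more of the (for example) 128×128 output to the foveal region, which is consistent with the paper's finding that foveal recall degrades far less than with uniform downsampling.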
Notes
1. Person, bicycle, car, motorbike, aeroplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow.
2. 96 \(\times \) 96, 128 \(\times \) 128, 160 \(\times \) 160, 192 \(\times \) 192, 224 \(\times \) 224, 256 \(\times \) 256, 288 \(\times \) 288, 320 \(\times \) 320, 352 \(\times \) 352, 384 \(\times \) 384 and 416 \(\times \) 416.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Jaramillo-Avila, U., Anderson, S.R. (2019). Foveated Image Processing for Faster Object Detection and Recognition in Embedded Systems Using Deep Convolutional Neural Networks. In: Martinez-Hernandez, U., et al. Biomimetic and Biohybrid Systems. Living Machines 2019. Lecture Notes in Computer Science(), vol 11556. Springer, Cham. https://doi.org/10.1007/978-3-030-24741-6_17
Print ISBN: 978-3-030-24740-9
Online ISBN: 978-3-030-24741-6