An Efficient FIFO Based Accelerator for Convolutional Neural Networks

Panchbhaiyye, Vineet; Ogunfunmi, Tokunbo

doi:10.1007/s11265-020-01632-0

Published: 20 February 2021

Volume 93, pages 1117–1129, (2021)

Journal of Signal Processing Systems

Abstract

Over the last decade, Convolutional Neural Networks (CNNs) have become the go-to technique for deep learning tasks such as computer vision and speech recognition (LeCun et al., 2015). Although CNNs are very effective at these tasks, they are not well suited to embedded applications, which have a limited power budget. In this work we present an improved architecture for processing the convolution layers of a CNN. It builds on our earlier architecture, which uses FIFO (First In First Out) memories to accelerate CNNs (Panchbhaiyye and Ogunfunmi, 2020). The architecture takes advantage of sparsity in the inputs and outputs of CNN layers to improve performance. We evaluate the proposed improvement on 16-bit floating-point and 8-bit integer data types and find that it yields an improvement of more than 13% in the processing time of the convolution layers of VGG16 with the float16 data type. We also show how the architecture can be used to compute fully connected layers. Overall, we exceed the performance of state-of-the-art architectures by more than 1.65x using an inexpensive Pynq Z1 board running at 100 MHz.
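
The abstract does not give the details of the architecture, so the following is only a minimal software sketch of the two ideas it names: a FIFO that buffers a raster-order pixel stream, and skipping work for zero-valued (sparse) inputs. The function name `fifo_conv2d`, the single-channel input, the stride of 1, and the MAC counters are all assumptions made for illustration; this is not the authors' hardware design.

```python
from collections import deque


def fifo_conv2d(image, kernel):
    """Valid, stride-1 2D convolution (CNN-style cross-correlation)
    over a pixel stream, skipping multiplies for zero inputs.

    Hypothetical illustration only; names and structure are assumptions.
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    # The FIFO keeps just enough of the stream to cover one k x k window:
    # (k - 1) full rows plus k pixels of the current row.
    fifo = deque(maxlen=(k - 1) * width + k)
    output = [[0] * (width - k + 1) for _ in range(height - k + 1)]
    macs_done = 0
    macs_skipped = 0

    for row in range(height):
        for col in range(width):
            fifo.append(image[row][col])  # oldest pixel drops out when full
            if row < k - 1 or col < k - 1:
                continue  # window is not yet fully inside the image
            acc = 0
            for i in range(k):
                for j in range(k):
                    # fifo[0] is the top-left pixel of the current window.
                    pixel = fifo[i * width + j]
                    if pixel == 0:
                        macs_skipped += 1  # zero input contributes nothing
                        continue
                    acc += pixel * kernel[i][j]
                    macs_done += 1
            output[row - k + 1][col - k + 1] = acc
    return output, macs_done, macs_skipped
```

For example, calling `fifo_conv2d([[1, 0, 2], [0, 3, 0], [4, 0, 5]], [[1, 2], [3, 4]])` returns the 2x2 output `[[13, 13], [18, 23]]` while performing only 8 of the 16 multiply-accumulates a dense implementation would need. In software the skipped branch saves little; the point is only to show where sparsity in a layer's inputs removes work.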

References

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–44. [Online]. Available: https://doi.org/10.1038/nature14539.
Panchbhaiyye, V., & Ogunfunmi, T. (2020). A FIFO based accelerator for convolutional neural networks. In ICASSP 2020 - 2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 1758–1762).
Falk, T., Mai, D., & Bensch, R. (2019). U-net: Deep learning for cell counting, detection, and morphometry. Nature Methods, 16, 67–70.
Sze, V., Chen, Y., Yang, T., & Emer, J.S. (2017). Efficient processing of deep neural networks: A Tutorial and Survey. Proceedings of the IEEE, 105(12), 2295–2329.
Wang, X., Han, Y., Leung, V.C.M., Niyato, D., Yan, X., & Chen, X. (2020). Convergence of edge computing and deep learning: A comprehensive survey. IEEE Communications Surveys Tutorials, 22 (2), 869–904.
Lin, D.D., Talathi, S.S., & Annapureddy, V.S. (2016). Fixed point quantization of deep convolutional networks. In Proceedings of the 33rd international conference on machine learning - Volume 48, ser. ICML’16. JMLR.org (pp. 2849–2858).
Han, S., Mao, H., & Dally, W.J. (2016). Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. arXiv:1510.00149.
Han, S., Pool, J., Narang, S., Mao, H., Gong, E., Tang, S., Elsen, E., Vajda, P., Paluri, M., Tran, J., Catanzaro, B., & Dally, W.J. (2017). DSD: Dense-sparse-dense training for deep neural networks. arXiv: Computer Vision and Pattern Recognition.
Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., & Zhang, C. (2017). Learning efficient convolutional networks through network slimming. In 2017 IEEE international conference on computer vision (ICCV) (pp. 2755–2763).
Blott, M., Preußer, T.B., Fraser, N.J., Gambardella, G., O’brien, K., Umuroglu, Y., Leeser, M., & Vissers, K. (2018). Finn-r: An end-to-end deep-learning framework for fast exploration of quantized neural networks. ACM Transactions on Reconfigurable Technology and Systems 11(3). [Online]. Available: https://doi.org/10.1145/3242897.
Jouppi, N.P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., Boyle, R., Cantin, P., Chao, C., Clark, C., Coriell, J., Daley, M., Dau, M., Dean, J., Gelb, B., Ghaemmaghami, T.V., Gottipati, R., Gulland, W., Hagmann, R., Ho, C.R., Hogberg, D., Hu, J., Hundt, R., Hurt, D., Ibarz, J., Jaffey, A., Jaworski, A., Kaplan, A., Khaitan, H., Killebrew, D., Koch, A., Kumar, N., Lacy, S., Laudon, J., Law, J., Le, D., Leary, C., Liu, Z., Lucke, K., Lundin, A., MacKean, G., Maggiore, A., Mahony, M., Miller, K., Nagarajan, R., Narayanaswami, R., Ni, R., Nix, K., Norrie, T., Omernick, M., Penukonda, N., Phelps, A., Ross, J., Ross, M., Salek, A., Samadiani, E., Severn, C., Sizikov, G., Snelham, M., Souter, J., Steinberg, D., Swing, A., Tan, M., Thorson, G., Tian, B., Toma, H., Tuttle, E., Vasudevan, V., Walter, R., Wang, W., Wilcox, E., & Yoon, D.H. (2017). In-datacenter performance analysis of a tensor processing unit. In 2017 ACM/IEEE 44th annual international symposium on computer architecture (ISCA) (pp. 1–12).
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Gordon, G., Dunson, D., & Dudík, M. (Eds.) Proceedings of the fourteenth international conference on artificial intelligence and statistics. Fort Lauderdale, FL, USA: PMLR, 11-13, (Vol. 15 pp. 315–323).
Nair, V., & Hinton, G.E. (2010). Rectified linear units improve restricted boltzmann machines, (pp. 807–814). USA: Omnipress. [Online]. Available: http://dl.acm.org/citation.cfm?id=3104322.3104425.
Hennessy, J.L., & Patterson, D.A. (2017). Computer architecture: A quantitative approach, 6th edn. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Dumoulin, V., & Visin, F. (2018). A guide to convolution arithmetic for deep learning.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–9).
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Navab, N., Hornegger, J., Wells, W.M., & Frangi, A.F. (Eds.) Medical image computing and computer-assisted intervention – MICCAI 2015 (pp. 234–241). Cham: Springer International Publishing.
Digilent. (2019). PYNQ-Z1 Reference Manual. [Online]. Available: https://reference.digilentinc.com/reference/programmable-logic/pynq-z1/reference-manual.
Xilinx. (2019). Vivado design suite user guide - high-level synthesis, UG902 (v2019.2). [Online]. Available: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug902-vivado-high-level-synthesis.pdf.
Xilinx. (2018). Pynq python library, v2.4. [Online]. Available: https://pynq.readthedocs.io/en/v2.4/index.html.
ARM. (2010). AMBA® 4 AXI4-Stream Protocol. ARM. [Online]. Available: https://static.docs.arm.com/ihi0051/a/IHI0051A_amba4_axi4_stream_v1_0_protocol_spec.pdf.
Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., & Kalenichenko, D. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference. In 2018 IEEE/CVF conference on computer vision and pattern recognition (pp. 2704–2713).
Chen, Y., Krishna, T., Emer, J.S., & Sze, V. (2017). Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1), 127–138.
Ardakani, A., Condo, C., Ahmadi, M., & Gross, W. (2017). An architecture to accelerate convolution in deep neural networks. IEEE Transactions on Circuits and Systems I: Regular Papers, 10, 1–14.
Aimar, A., Mostafa, H., Calabrese, E., Rios-Navarro, A., Tapiador-Morales, R., Lungu, I., Milde, M.B., Corradi, F., Linares-Barranco, A., Liu, S., & Delbruck, T. (2019). Nullhop: A flexible convolutional neural network accelerator based on sparse representations of feature maps. IEEE Transactions on Neural Networks and Learning Systems, 30(3), 644–656.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, USA
Vineet Panchbhaiyye & Tokunbo Ogunfunmi

Corresponding author

Correspondence to Tokunbo Ogunfunmi.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Panchbhaiyye, V., Ogunfunmi, T. An Efficient FIFO Based Accelerator for Convolutional Neural Networks. J Sign Process Syst 93, 1117–1129 (2021). https://doi.org/10.1007/s11265-020-01632-0

Received: 08 July 2020
Revised: 03 December 2020
Accepted: 21 December 2020
Published: 20 February 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s11265-020-01632-0
