An Efficient FIFO Based Accelerator for Convolutional Neural Networks

Abstract

Over the last decade, Convolutional Neural Networks (CNNs) have become the go-to technique for deep learning tasks such as computer vision and speech recognition (LeCun et al., 2015). Although CNNs are very effective at these tasks, they are poorly suited to embedded applications because of the limited power budget. In this work we present an improved architecture for processing the convolution layers of a CNN. It builds on our earlier architecture, which uses First In First Out (FIFO) memories to accelerate CNNs (Panchbhaiyye and Ogunfunmi, 2020). The architecture presented here takes advantage of sparsity in the inputs and outputs of CNN layers to improve performance. We evaluate the proposed improvement on 16-bit floating point and 8-bit integer data types and find that it yields more than a 13% improvement in the processing time of convolution layers for VGG16 with the float16 data type. We also show how this architecture can be used to compute fully connected layers. Overall, we exceed the performance of state-of-the-art architectures by more than 1.65x using an inexpensive PYNQ-Z1 board running at 100 MHz.
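To make the idea of exploiting input sparsity concrete, the following is a minimal software sketch, not the hardware architecture described in this work. It assumes a single-channel input, a "valid" (no padding) convolution with stride 1, and ReLU-style activations that contain exact zeros; the function names `conv2d_dense` and `conv2d_zero_skip` are hypothetical and chosen only for illustration. The sketch counts how many multiply-accumulate (MAC) operations are skipped when zero inputs are never multiplied.

```python
def conv2d_dense(image, kernel):
    """Reference 'valid' 2D convolution (cross-correlation form), stride 1."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            for i in range(kh):
                for j in range(kw):
                    out[r][c] += image[r + i][c + j] * kernel[i][j]
    return out


def conv2d_zero_skip(image, kernel):
    """Same result as conv2d_dense, but zero inputs are skipped entirely.

    Illustrative assumption: the loop is organised per input pixel, so a
    zero pixel costs no MACs at all.
    Returns (output, macs_done, macs_dense).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    macs_done = 0
    for r in range(h):
        for c in range(w):
            x = image[r][c]
            if x == 0:
                continue  # zero activation contributes nothing
            # Scatter this input into every output position it overlaps.
            for i in range(kh):
                for j in range(kw):
                    orow, ocol = r - i, c - j
                    if 0 <= orow < oh and 0 <= ocol < ow:
                        out[orow][ocol] += x * kernel[i][j]
                        macs_done += 1
    macs_dense = oh * ow * kh * kw  # MAC count without zero skipping
    return out, macs_done, macs_dense


if __name__ == "__main__":
    # Hypothetical 4x4 post-ReLU feature map with many zeros.
    image = [[0, 1, 0, 0],
             [2, 0, 0, 3],
             [0, 0, 0, 0],
             [0, 4, 0, 0]]
    kernel = [[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]]
    out, done, dense = conv2d_zero_skip(image, kernel)
    print(out == conv2d_dense(image, kernel), done, dense)
```

The saving in this sketch scales with the fraction of zero activations, which is why sparsity produced by ReLU layers is attractive to exploit; how a hardware design realises the skip is outside the scope of this illustration.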

References

  1.

    LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–44. [Online]. Available: https://doi.org/10.1038/nature14539.

  2.

    Panchbhaiyye, V., & Ogunfunmi, T. (2020). A FIFO based accelerator for convolutional neural networks. In ICASSP 2020 - 2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 1758–1762).

  3.

    Falk, T., Mai, D., & Bensch, R. (2019). U-net: Deep learning for cell counting, detection, and morphometry. Nature Methods, 16, 67–70.

  4.

    Sze, V., Chen, Y., Yang, T., & Emer, J.S. (2017). Efficient processing of deep neural networks: A Tutorial and Survey. Proceedings of the IEEE, 105(12), 2295–2329.

  5.

    Wang, X., Han, Y., Leung, V.C.M., Niyato, D., Yan, X., & Chen, X. (2020). Convergence of edge computing and deep learning: A comprehensive survey. IEEE Communications Surveys Tutorials, 22 (2), 869–904.

  6.

    Lin, D.D., Talathi, S.S., & Annapureddy, V.S. (2016). Fixed point quantization of deep convolutional networks. In Proceedings of the 33rd international conference on machine learning - Volume 48, ser. ICML’16. JMLR.org (pp. 2849–2858).

  7.

    Han, S., Mao, H., & Dally, W.J. (2016). Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. arXiv:1510.00149.

  8.

    Han, S., Pool, J., Narang, S., Mao, H., Gong, E., Tang, S., Elsen, E., Vajda, P., Paluri, M., Tran, J., Catanzaro, B., & Dally, W.J. (2017). DSD: Dense-sparse-dense training for deep neural networks. arXiv: Computer Vision and Pattern Recognition.

  9.

    Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., & Zhang, C. (2017). Learning efficient convolutional networks through network slimming. In 2017 IEEE international conference on computer vision (ICCV) (pp. 2755–2763).

  10.

    Blott, M., Preußer, T.B., Fraser, N.J., Gambardella, G., O’Brien, K., Umuroglu, Y., Leeser, M., & Vissers, K. (2018). FINN-R: An end-to-end deep-learning framework for fast exploration of quantized neural networks. ACM Transactions on Reconfigurable Technology and Systems, 11(3). [Online]. Available: https://doi.org/10.1145/3242897.

  11.

    Jouppi, N.P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., Boyle, R., Cantin, P., Chao, C., Clark, C, Coriell, J., Daley, M., Dau, M., Dean, J., Gelb, B., Ghaemmaghami, T.V., Gottipati, R., Gulland, W., Hagmann, R., Ho, C.R., Hogberg, D., Hu, J., Hundt, R., Hurt, D., Ibarz, J., Jaffey, A., Jaworski, A., Kaplan, A., Khaitan, H., Killebrew, D., Koch, A., Kumar, N., Lacy, S., Laudon, J., Law, J., Le, D., Leary, C., Liu, Z., Lucke, K., Lundin, A., MacKean, G., Maggiore, A., Mahony, M., Miller, K., Nagarajan, R., Narayanaswami, R., Ni, R., Nix, K., Norrie, T., Omernick, M., Penukonda, N., Phelps, A., Ross, J., Ross, M., Salek, A., Samadiani, E., Severn, C., Sizikov, G., Snelham, M., Souter, J., Steinberg, D., Swing, A., Tan, M., Thorson, G., Tian, B., Toma, H., Tuttle, E., Vasudevan, V., Walter, R., Wang, W., Wilcox, E., & Yoon, D.H. (2017). In-datacenter performance analysis of a tensor processing unit. In 2017 ACM/IEEE 44th annual international symposium on computer architecture (ISCA) (pp. 1–12).

  12.

    Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Gordon, G., Dunson, D., & Dudík, M. (Eds.) Proceedings of the fourteenth international conference on artificial intelligence and statistics. Fort Lauderdale, FL, USA: PMLR, 11-13, (Vol. 15 pp. 315–323).

  13.

    Nair, V., & Hinton, G.E. (2010). Rectified linear units improve restricted boltzmann machines, (pp. 807–814). USA: Omnipress. [Online]. Available: http://dl.acm.org/citation.cfm?id=3104322.3104425.

  14.

    Hennessy, J.L., & Patterson, D.A. (2017). Computer architecture: A quantitative approach, 6th edn. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

  15.

    Dumoulin, V., & Visin, F. (2018). A guide to convolution arithmetic for deep learning.

  16.

    Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations.

  17.

    He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778).

  18.

    Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–9).

  19.

    Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Navab, N., Hornegger, J., Wells, W.M., & Frangi, A.F. (Eds.) Medical image computing and computer-assisted intervention – MICCAI 2015 (pp. 234–241). Cham: Springer International Publishing.

  20.

    Digilent. (2019). PYNQ-Z1 Reference Manual. [Online]. Available: https://reference.digilentinc.com/reference/programmable-logic/pynq-z1/reference-manual.

  21.

    Xilinx. (2019). Vivado design suite user guide - high-level synthesis, UG902 (v2019.2). [Online]. Available: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug902-vivado-high-level-synthesis.pdf.

  22.

    Xilinx. (2018). PYNQ Python library, v2.4. [Online]. Available: https://pynq.readthedocs.io/en/v2.4/index.html.

  23.

    ARM. (2010). AMBA® 4 AXI4-Stream Protocol. ARM. [Online]. Available: https://static.docs.arm.com/ihi0051/a/IHI0051A_amba4_axi4_stream_v1_0_protocol_spec.pdf.

  24.

    Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., & Kalenichenko, D. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference. In 2018 IEEE/CVF conference on computer vision and pattern recognition (pp. 2704–2713).

  25.

    Chen, Y., Krishna, T., Emer, J.S., & Sze, V. (2017). Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1), 127–138.

  26.

    Ardakani, A., Condo, C., Ahmadi, M., & Gross, W. (2017). An architecture to accelerate convolution in deep neural networks. IEEE Transactions on Circuits and Systems I: Regular Papers, 10, 1–14.

  27.

    Aimar, A., Mostafa, H., Calabrese, E., Rios-Navarro, A., Tapiador-Morales, R., Lungu, I., Milde, M.B., Corradi, F., Linares-Barranco, A., Liu, S., & Delbruck, T. (2019). Nullhop: A flexible convolutional neural network accelerator based on sparse representations of feature maps. IEEE Transactions on Neural Networks and Learning Systems, 30(3), 644–656.

Author information

Corresponding author

Correspondence to Tokunbo Ogunfunmi.

About this article

Cite this article

Panchbhaiyye, V., Ogunfunmi, T. An Efficient FIFO Based Accelerator for Convolutional Neural Networks. J Sign Process Syst (2021). https://doi.org/10.1007/s11265-020-01632-0

Keywords

  • Convolutional neural networks
  • FPGA
  • Machine learning
  • Dataflow