Optimization of Convolution Neural Network Algorithm Based on FPGA

  • Conference paper
  • In: Embedded Systems Technology (ESTC 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 857)

Abstract

The traditional CNN algorithm is computationally intensive and difficult to optimize, and on hardware platforms its computational throughput is poorly matched to the available memory bandwidth. Existing designs exploit neither the logic resources nor the memory bandwidth fully, so they fail to reach the best achievable performance. In this paper, on the software side we adopt the widely used im2col method to convert the convolution operation into a matrix multiplication, which effectively improves the computation speed. On the hardware side, we propose a nested-loop optimization structure. First, we analyze the correlation among the loop parameters, reduce the number of multiplications, and replace the multiplication in the inner loop with an addition, which noticeably improves the maximum operating frequency and the power consumption. Second, the input data and the convolution kernels are partitioned at multiple levels: the multi-layer input data is divided into groups of 2k, the data within each layer is further divided into L groups, and the convolution kernels are likewise divided into groups of 2k and processed in parallel, synchronized with the data. This structure significantly improves the degree of parallelism, so that both the external and the internal bandwidth can be increased markedly for the same total amount of computation.
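
As a concrete illustration of the software-side ideas above, the sketch below shows in plain C (a) how im2col unfolds each receptive field into a column so that the convolution reduces to a matrix multiplication, and (b) a direct loop nest in which the per-element index arithmetic is strength-reduced to additions, which is one plausible reading of "the multiplication of the inner loop is replaced by an addition". The sizes (5x5 single-channel input, 3x3 kernel, stride 1, no padding) and all function names are illustrative assumptions; this is not the authors' FPGA implementation.

```c
/*
 * Plain-C sketch of two ideas from the abstract (illustrative only):
 *   (1) im2col: unfold each KxK receptive field into a column so the
 *       convolution becomes a matrix multiplication, and
 *   (2) strength reduction in the nested loops: the per-element index
 *       arithmetic (oy+ky)*W + (ox+kx) is replaced by pointers that
 *       advance with additions only.
 */
#include <stdio.h>

#define H  5               /* input height (example value) */
#define W  5               /* input width  (example value) */
#define K  3               /* kernel size  (example value) */
#define OH (H - K + 1)     /* output height                */
#define OW (W - K + 1)     /* output width                 */

/* (1a) im2col: one column per output position, one row per kernel element. */
static void im2col(float in[H][W], float col[K * K][OH * OW]) {
    for (int oy = 0; oy < OH; ++oy)
        for (int ox = 0; ox < OW; ++ox)
            for (int ky = 0; ky < K; ++ky)
                for (int kx = 0; kx < K; ++kx)
                    col[ky * K + kx][oy * OW + ox] = in[oy + ky][ox + kx];
}

/* (1b) Convolution as a (1 x K*K) * (K*K x OH*OW) matrix product. */
static void gemm_conv(const float *kernel, float col[K * K][OH * OW],
                      float *out) {
    for (int o = 0; o < OH * OW; ++o) {
        float acc = 0.0f;
        for (int i = 0; i < K * K; ++i)
            acc += kernel[i] * col[i][o];
        out[o] = acc;
    }
}

/* (2) Direct convolution with strength-reduced addressing: the inner loops
 *     contain only the multiply-accumulate plus pointer additions.         */
static void conv_direct(const float *in, const float *kernel, float *out) {
    for (int oy = 0; oy < OH; ++oy) {
        for (int ox = 0; ox < OW; ++ox) {
            const float *win = in + oy * W + ox;   /* top-left of window */
            const float *krn = kernel;
            float acc = 0.0f;
            for (int ky = 0; ky < K; ++ky) {
                for (int kx = 0; kx < K; ++kx)
                    acc += krn[kx] * win[kx];      /* indices step by +1 */
                win += W;                           /* next input row     */
                krn += K;                           /* next kernel row    */
            }
            out[oy * OW + ox] = acc;
        }
    }
}

int main(void) {
    float in[H][W], col[K * K][OH * OW];
    float out_gemm[OH * OW], out_direct[OH * OW];
    float kernel[K * K] = { 1/9.f, 1/9.f, 1/9.f,   /* 3x3 averaging kernel */
                            1/9.f, 1/9.f, 1/9.f,
                            1/9.f, 1/9.f, 1/9.f };

    for (int y = 0; y < H; ++y)                    /* toy input 0..24 */
        for (int x = 0; x < W; ++x)
            in[y][x] = (float)(y * W + x);

    im2col(in, col);
    gemm_conv(kernel, col, out_gemm);
    conv_direct(&in[0][0], kernel, out_direct);

    /* Both paths give the same result; output position 0 is the mean of
       the top-left 3x3 block of the toy input (expected value 6.0).     */
    printf("gemm: %.1f  direct: %.1f\n", out_gemm[0], out_direct[0]);
    return 0;
}
```

On the FPGA, the same loop nest would additionally be tiled and unrolled across the 2k data and kernel groups described above; the C version is only meant to make the arithmetic structure explicit.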

Author information

Correspondence to Weichao Zhang or Xiaogang Tian.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tang, F., Zhang, W., Tian, X., Fan, X., Cao, X. (2018). Optimization of Convolution Neural Network Algorithm Based on FPGA. In: Bi, Y., Chen, G., Deng, Q., Wang, Y. (eds) Embedded Systems Technology. ESTC 2017. Communications in Computer and Information Science, vol 857. Springer, Singapore. https://doi.org/10.1007/978-981-13-1026-3_10

  • DOI: https://doi.org/10.1007/978-981-13-1026-3_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1025-6

  • Online ISBN: 978-981-13-1026-3

  • eBook Packages: Computer Science, Computer Science (R0)
