
Systolic Array Based Accelerator and Algorithm Mapping for Deep Learning Algorithms

  • Zhijie Yang
  • Lei Wang
  • Dong Ding
  • Xiangyu Zhang
  • Yu Deng
  • Shiming Li
  • Qiang Dou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11276)

Abstract

As DNNs grow deeper, their demands on the storage capacity and computing power of the underlying platform keep increasing. In this work, we implement an FPGA accelerator for deep learning algorithms (CNNs and RNNs). The accelerator's core computing module is a 32 × 32 systolic array of processing elements (PEs). We propose a mapping method that handles CNN and RNN layers of varying sizes. Experimental results show that the accelerator's maximum power consumption is 7.5 W at 100 MHz, its peak performance is 0.2 Tops/s, and its real performance is 7.6 Mops at 100 MHz when running the first layer of LeNet-5.
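
A systolic array computes matrix multiplication by streaming operands between neighboring PEs, with every PE performing one multiply-accumulate per cycle. As a minimal sketch (assuming an output-stationary dataflow, which the abstract does not specify), the following Python model streams rows of A in from the left edge and columns of B in from the top edge, skewed so that matching operands meet at PE (i, j):

    import numpy as np

    def systolic_matmul(A, B):
        """Cycle-by-cycle model of an output-stationary systolic array.

        PE (i, j) accumulates C[i, j], forwarding its A operand to the
        right neighbor and its B operand to the neighbor below each cycle.
        Row i of A and column j of B are injected with skews of i and j
        cycles so that A[i, s] and B[s, j] arrive at PE (i, j) together.
        """
        n, k = A.shape
        k2, m = B.shape
        assert k == k2, "inner dimensions must match"
        acc = np.zeros((n, m))    # per-PE accumulators (the C tile)
        a_reg = np.zeros((n, m))  # A operand currently held by each PE
        b_reg = np.zeros((n, m))  # B operand currently held by each PE
        for t in range(n + m + k - 2):         # cycles until the array drains
            a_reg = np.roll(a_reg, 1, axis=1)  # pass A operands right
            b_reg = np.roll(b_reg, 1, axis=0)  # pass B operands down
            for i in range(n):  # left edge: row i of A, skewed by i cycles
                a_reg[i, 0] = A[i, t - i] if 0 <= t - i < k else 0.0
            for j in range(m):  # top edge: column j of B, skewed by j cycles
                b_reg[0, j] = B[t - j, j] if 0 <= t - j < k else 0.0
            acc += a_reg * b_reg  # every PE performs one MAC per cycle
        return acc

    A = np.random.randn(32, 48)
    B = np.random.randn(48, 32)
    assert np.allclose(systolic_matmul(A, B), A @ B)

For the CNN side of such a mapping, one common lowering (shown here as a generic illustration, not necessarily the paper's proposed method) is im2col: each convolution window is flattened into a column so a whole layer becomes a single matrix multiplication the array can execute. For LeNet-5's first layer (six 5 × 5 filters over a 1 × 32 × 32 input) this produces a 6 × 25 weight matrix times a 25 × 784 activation matrix; an RNN step is already a matrix-vector product and maps onto the same array directly.

    def conv_as_matmul(x, w, stride=1):
        """Lower a convolution to a matmul via im2col (generic technique).

        x: input of shape (C, H, W); w: filters of shape (F, C, K, K).
        Returns the (F, Ho, Wo) output feature maps.
        """
        C, H, W = x.shape
        F, _, K, _ = w.shape
        Ho, Wo = (H - K) // stride + 1, (W - K) // stride + 1
        cols = np.empty((C * K * K, Ho * Wo))
        for i in range(Ho):
            for j in range(Wo):
                patch = x[:, i*stride:i*stride+K, j*stride:j*stride+K]
                cols[:, i * Wo + j] = patch.ravel()
        out = systolic_matmul(w.reshape(F, -1), cols)  # runs on the array
        return out.reshape(F, Ho, Wo)

    x = np.random.randn(1, 32, 32)   # LeNet-5 layer-1 input
    w = np.random.randn(6, 1, 5, 5)  # six 5 x 5 filters
    assert conv_as_matmul(x, w).shape == (6, 28, 28)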

Keywords

Accelerator · Systolic array · DNN · Data mapping

Acknowledgment

This work is supported by project HGJ2017ZX01028103.


Copyright information

© IFIP International Federation for Information Processing 2018

Authors and Affiliations

  • Zhijie Yang, Lei Wang, Dong Ding, Xiangyu Zhang, Yu Deng, Shiming Li, Qiang Dou
  1. College of Computer, National University of Defense Technology, Changsha, China (all authors)
