
Efficient Processing of Convolutional Neural Networks on SW26010

  • Yi Zhang
  • Bing Shu
  • Yan Yin
  • Yawei Zhou
  • Shaodi Li
  • Junmin Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11783)

Abstract

Artificial intelligence has developed rapidly in recent years, and deep neural networks are the basis of many artificial intelligence applications, so accelerating their computational processing is very important. To explore the potential for accelerating deep neural networks on various hardware platforms, we propose a convolutional neural network optimization method based on the weight-stationary dataflow for the SW26010 processor. We restructure the convolution loops and use a hybrid DMA transmission mode to increase memory bandwidth and reduce memory access overhead. On top of this, we apply further optimizations based on register communication, asynchronous double-buffered DMA transfers, instruction scheduling, and other schemes. Finally, we achieve double-precision convolution performance of over 2.4 Tflops, 81% of the processor’s peak performance. Across multiple parameter configurations, we achieve a \(2.4-4.0\times \) speedup over the Tesla K80 GPU with cuDNN v7.
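
As a concrete illustration of the weight-stationary idea, the following minimal C sketch shows a 1-D convolution tile in which the weight tile is loaded into a core's local device memory (LDM) once and reused across all output positions, while inputs and outputs are streamed through. The dma_get/dma_put helpers, tile sizes, and data layouts are illustrative assumptions rather than the paper's actual kernel; here they are blocking memcpy stubs, whereas on the SW26010 they would be asynchronous athread DMA transfers, double-buffered so that the next input tile is fetched while the current one is being computed.

    /* Minimal weight-stationary convolution sketch (illustrative only).
     * Assumes: C divisible by TC, OW divisible by TW, filter width == TK,
     * out[] zero-initialized by the caller, in[] laid out as in[c][iw]
     * with IW = OW + TK - 1, and w[] laid out as w[c][k]. */
    #include <string.h>

    enum { TC = 4, TK = 3, TW = 8 };  /* illustrative tile sizes */

    /* Stand-ins for DMA between main memory and LDM; a real kernel would
     * issue these asynchronously and overlap them with compute. */
    static void dma_get(void *ldm, const void *mem, size_t n) { memcpy(ldm, mem, n); }
    static void dma_put(void *mem, const void *ldm, size_t n) { memcpy(mem, ldm, n); }

    void conv1d_ws(const double *in, const double *w, double *out, int C, int OW)
    {
        double w_ldm[TC * TK];        /* stationary: loaded once per channel tile */
        double in_ldm[TC * (TW + TK - 1)];
        double out_ldm[TW];
        const int IW = OW + TK - 1;

        for (int c0 = 0; c0 < C; c0 += TC) {
            /* Load the weight tile once; reuse it for every output tile below. */
            dma_get(w_ldm, &w[c0 * TK], sizeof(double) * TC * TK);

            for (int ow0 = 0; ow0 < OW; ow0 += TW) {
                dma_get(out_ldm, &out[ow0], sizeof(double) * TW);
                for (int c = 0; c < TC; c++)
                    dma_get(&in_ldm[c * (TW + TK - 1)],
                            &in[(c0 + c) * IW + ow0],
                            sizeof(double) * (TW + TK - 1));

                /* Each weight element is reused TW times before being evicted. */
                for (int c = 0; c < TC; c++)
                    for (int k = 0; k < TK; k++)
                        for (int ow = 0; ow < TW; ow++)
                            out_ldm[ow] += w_ldm[c * TK + k]
                                         * in_ldm[c * (TW + TK - 1) + ow + k];

                dma_put(&out[ow0], out_ldm, sizeof(double) * TW);
            }
        }
    }

Swapping the channel and output loops would make the outputs stationary instead; keeping the weights resident pays off when the spatial extent is large relative to LDM capacity.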

Keywords

SW26010 processor · Convolutional neural networks · Weight-stationary · Parallel model · Many-core architecture · Deep learning

References

  1. Chetlur, S., et al.: cuDNN: efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)
  2. Chen, Y., Chen, T., Xu, Z., Sun, N., Temam, O.: DianNao family: energy-efficient hardware accelerators for machine learning. Commun. ACM 59(11), 105–112 (2016)
  3. Fang, J., Fu, H., Zhao, W., Chen, B., Zheng, W., Yang, G.: swDNN: a library for accelerating deep learning applications on Sunway TaihuLight. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 615–624. IEEE (2017)
  4. Li, L., et al.: swCaffe: a parallel framework for accelerating deep learning applications on Sunway TaihuLight. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 413–422. IEEE (2018)
  5. Jiang, L., et al.: Towards highly efficient DGEMM on the emerging SW26010 many-core processor. In: 2017 46th International Conference on Parallel Processing (ICPP), pp. 422–431. IEEE (2017)

Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  • Yi Zhang (1)
  • Bing Shu (2)
  • Yan Yin (1)
  • Yawei Zhou (1)
  • Shaodi Li (1)
  • Junmin Wu (1)

  1. University of Science and Technology of China, Hefei, China
  2. Jiangnan Institute of Computing Technology, Wuxi, China
