On Retargeting the AI Programming Framework to New Hardwares

  • Jiacheng Zhao
  • Yisong Chang
  • Denghui Li
  • Chunwei Xia
  • Huimin Cui
  • Ke Zhang
  • Xiaobing Feng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11276)


Nowadays, a large number of accelerators are being proposed to increase the performance of AI applications, making it a significant challenge to extend existing AI programming frameworks to support these new accelerators. In this paper, we use TensorFlow to demonstrate how to port an AI programming framework to new hardware, namely FPGA and Sunway TaihuLight. These two platforms represent distinct and significant hardware architectures for studying the retargeting process. We describe our retargeting processes and experiences on both platforms, from the source code to the compilation processes, compare the two retargeting approaches, and present preliminary experimental results.


Keywords: Retargeting · AI programming framework · FPGA · Sunway
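At its core, retargeting a framework like TensorFlow to a new device means supplying per-device kernels for each operation, which the framework then selects at dispatch time. The sketch below illustrates that registry-and-dispatch pattern in plain Python; the names (`register_kernel`, `run`) are illustrative assumptions, not TensorFlow's actual C++ API.

```python
# Illustrative sketch of a per-device kernel registry: each (op, device)
# pair maps to a concrete kernel, so porting to a new device amounts to
# registering kernels for that device type. Not TensorFlow's real API.

KERNEL_REGISTRY = {}

def register_kernel(op, device):
    """Decorator that records a kernel for an (op, device) pair."""
    def wrap(fn):
        KERNEL_REGISTRY[(op, device)] = fn
        return fn
    return wrap

@register_kernel("MatMul", "CPU")
def matmul_cpu(a, b):
    # Naive reference implementation for the existing CPU backend.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

@register_kernel("MatMul", "FPGA")
def matmul_fpga(a, b):
    # A real port would hand the operands to the accelerator driver;
    # here it reuses the CPU kernel so the sketch stays runnable.
    return matmul_cpu(a, b)

def run(op, device, *args):
    # Dispatch: fall back to the CPU kernel when the target device has
    # no registered kernel, mirroring how frameworks cope with a
    # partially ported backend.
    fn = KERNEL_REGISTRY.get((op, device)) or KERNEL_REGISTRY[(op, "CPU")]
    return fn(*args)

print(run("MatMul", "FPGA", [[1, 2]], [[3], [4]]))  # [[11]]
```

The fallback path in `run` matters in practice: during an incremental port, only a subset of operations has device-specific kernels, and the remainder must still execute somewhere.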



This work is supported in part by the National Key R&D Program of China (2016YFB1000402), the National Natural Science Foundation of China (61802368, 61521092, 61432016, 61432018, 61332009, 61702485). The authors would like to thank all the anonymous reviewers for their valuable comments and helpful suggestions.


Copyright information

© IFIP International Federation for Information Processing 2018

Authors and Affiliations

  • Jiacheng Zhao (1, 2)
  • Yisong Chang (1, 2)
  • Denghui Li (1)
  • Chunwei Xia (1, 2)
  • Huimin Cui (1, 2)
  • Ke Zhang (1, 2)
  • Xiaobing Feng (1, 2)
  1. SKL Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
