Leveraging Subgraph Extraction for Performance Portable Programming Frameworks on DL Accelerators
Deep learning frameworks play an important role in connecting hardware platforms and algorithms. In recent years, researchers have proposed domain-specific deep learning accelerators with better performance and energy efficiency. However, current frameworks give too little consideration to supporting the new features that these accelerators may introduce. In this paper, we propose building a performance-portable programming framework based on subgraph extraction. The intuition is that an increasing share of optimizations is moving from the top-level framework down to the low-level software stack of the accelerator. In response to this trend, the framework needs to pay more attention to how the computation graph is split for heterogeneous computation.
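As a rough illustration of what splitting a computation graph for heterogeneous computation can look like, the sketch below partitions a topologically ordered operator list into consecutive segments, each assigned to one device. It is not the splitting strategy proposed in the paper: the operator names, the `ACCEL_OPS` set, and the `split_graph` helper are all hypothetical, and real graphs have branches that this linear simplification ignores.

```python
from itertools import groupby

# Hypothetical set of operator types the accelerator's software stack supports.
ACCEL_OPS = {"conv", "relu", "pool", "fc"}


def split_graph(ops, accel_ops=ACCEL_OPS):
    """Split a topologically ordered list of (name, op_type) pairs into
    consecutive segments, each placed on a single device.

    Consecutive runs of a topological order never depend on later runs,
    so the resulting segments can be executed in order.
    """
    def device(node):
        # Unsupported operators fall back to the CPU.
        return "accel" if node[1] in accel_ops else "cpu"

    return [(dev, [name for name, _ in run])
            for dev, run in groupby(ops, key=device)]


# A toy linear network (assumed for illustration only).
graph = [
    ("c1", "conv"), ("r1", "relu"), ("p1", "pool"),
    ("nms", "custom_nms"),
    ("f1", "fc"), ("sm", "softmax"),
]

segments = split_graph(graph)
for dev, names in segments:
    print(dev, names)
```

Each `accel` segment here is a subgraph that could be handed to the accelerator's low-level stack as one unit, so that stack is free to apply its own optimizations, such as operator fusion, inside the segment.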
This work is partially supported by the National Key Research and Development Program of China (under Grants 2017YFA0700902, 2017YFB1003101), the NSF of China (under Grants 6147239, 61432016, 61473275, 61522211, 61532016, 61521092, 61502446, 61672491, 61602441, 61602446, 61732002, 61702478), the 973 Program of China (under Grant 2015CB358800), the National Science and Technology Major Project (2018ZX01031102), and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDBS01050200).