Understanding Data Partition for Applications on CPU-GPU Integrated Processors

  • Juan Fang
  • Huanhuan Chen
  • Junjie Mao
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 747)


Integrating a GPU with the CPU on the same chip is increasingly common in modern processor architectures aimed at high performance. The CPU and GPU share the on-chip network, the last-level cache, and memory, so data need not be copied back and forth as a discrete GPU requires. Shared virtual memory, memory coherence, and system-wide atomics have been introduced into heterogeneous architectures and programming models to enable fine-grained CPU-GPU collaboration, and programming models such as OpenCL 2.0, CUDA 8.0, and C++ AMP support these architecture features. Data partitioning is one such collaboration pattern: balancing the data processed between the CPU and the GPU is essential for improving performance and energy efficiency. In this paper, we first demonstrate that, for one application, the optimal allocation of data between CPU and GPU provides 20% higher performance than a fixed partition ratio of 20%. Second, we evaluate five additional heterogeneous applications covering the latest architecture features and characterize the relationship between data partitioning and performance.
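The data-partitioning pattern described above can be sketched as a static split: a fraction of the input is assigned to the GPU and the remainder to the CPU, with both devices applying the same kernel. The sketch below is illustrative only; the function names and the element-wise stand-in kernel are assumptions, whereas the paper's actual workloads run as OpenCL 2.0 / CUDA kernels on an integrated CPU-GPU chip.

```python
# Hypothetical sketch of static data partitioning between CPU and GPU.
# All names are illustrative; real partitioned applications would launch
# a device kernel on the GPU chunk and a threaded loop on the CPU chunk.

def partition(data, gpu_ratio):
    """Split `data` so that a fraction `gpu_ratio` goes to the GPU."""
    split = int(len(data) * gpu_ratio)
    return data[:split], data[split:]   # (gpu_chunk, cpu_chunk)

def run(data, gpu_ratio):
    gpu_chunk, cpu_chunk = partition(data, gpu_ratio)
    # Stand-ins for device code: both sides apply the same element-wise op.
    gpu_out = [x * x for x in gpu_chunk]   # would be a GPU kernel launch
    cpu_out = [x * x for x in cpu_chunk]   # would run on CPU threads
    return gpu_out + cpu_out

print(run(list(range(8)), 0.25))  # 2 of 8 elements assigned to the "GPU"
```

Because the two chunks share memory on an integrated chip, no host-device copy separates the partition step from the kernel launches; tuning `gpu_ratio` per application is exactly the balance the paper studies.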


Keywords: Data partition · GPU · Heterogeneous architectures



This work is partially supported by the National Natural Science Foundation of China under Grants No. 61202076 and No. 61202062.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Beijing University of Technology, Beijing, China
  2. China Information Technology Security Evaluation Center, Beijing, China
