
Providing Source Code Level Portability Between CPU and GPU with MapCG

  • Regular Paper
  • Published in the Journal of Computer Science and Technology

Abstract

Graphics processing units (GPUs) have taken an important role in the general-purpose computing market in recent years. At present, the common approach to programming GPUs is to write GPU-specific code with low-level GPU APIs such as CUDA. Although this approach can achieve good performance, it creates serious portability issues: programmers are required to write a specific version of the code for each potential target architecture, which results in high development and maintenance costs. We believe it is desirable to have a programming model that provides source code portability between CPUs and GPUs, as well as across different GPUs, allowing programmers to write one version of the code that can be compiled and executed efficiently on either CPUs or GPUs without modification. In this paper, we propose MapCG, a MapReduce framework that provides source code level portability between CPUs and GPUs. In contrast to other approaches such as OpenCL, our framework is based on MapReduce, which offers a high-level programming model and makes programming much easier. We describe the design of MapCG, including the MapReduce-style high-level programming framework and the runtime systems on the CPU and GPU. A prototype of the MapCG runtime, supporting multi-core CPUs and NVIDIA GPUs, was implemented. Our experimental results show that this implementation can execute the same source code efficiently on multi-core CPU platforms and GPUs, achieving an average speedup of 1.6x to 2.5x over previous MapReduce implementations on eight commonly used applications.



Corresponding author

Correspondence to Chun-Tao Hong.

Additional information

This work was supported by the National Natural Science Foundation of China under Grant No. 60973143, the National High Technology Research and Development 863 Program of China under Grant No. 2008AA01A201, and the National Basic Research 973 Program of China under Grant No. 2007CB310900.

Electronic Supplementary Material

Electronic supplementary material is available for this article (PDF, 105 kb).


Cite this article

Hong, CT., Chen, DH., Chen, YB. et al. Providing Source Code Level Portability Between CPU and GPU with MapCG. J. Comput. Sci. Technol. 27, 42–56 (2012). https://doi.org/10.1007/s11390-012-1205-4
