Load Balancing for Heterogeneous Parallel Architecture

Chapter in: Task Scheduling for Multi-core and Parallel Architectures

Abstract

Besides traditional CPU-based parallel computers, heterogeneous parallel architectures that consist of both CPUs and GPGPUs are used in many emerging large-scale clusters and supercomputers. In order to better utilize both the CPU and the GPU, an application can divide its workload and distribute it across the two types of hardware at the same time. However, it is not trivial to find an optimal allocation for all applications offline, because applications have varied characteristics and therefore achieve different speedup ratios on the GPGPU compared with the CPU. To solve this problem, this chapter presents techniques that balance the application workload across heterogeneous hardware.

Part of the content in this chapter has been published in the International Workshop on Programming Models and Applications for Multicores and Manycores. Reprinted from Ref. [14], with permission from ACM.
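The chapter itself presents the actual balancing techniques; as a rough illustration of the general idea of splitting work by a measured speedup ratio, the hypothetical Python sketch below re-estimates CPU and GPU throughput after each chunk and partitions the next chunk proportionally. The process_on_cpu and process_on_gpu functions are placeholder stand-ins simulated with sleeps, not real kernels, and for brevity the two parts run sequentially rather than concurrently as a real co-scheduler would.

```python
# Hypothetical sketch of speedup-ratio-based CPU/GPU workload splitting.
# process_on_cpu / process_on_gpu are placeholders simulated with sleeps;
# a real system would launch actual kernels and run both parts
# concurrently instead of back to back as done here for brevity.
import time


def process_on_cpu(chunk):
    # Placeholder: pretend the CPU processes one work item per 0.1 ms.
    time.sleep(len(chunk) * 1e-4)


def process_on_gpu(chunk):
    # Placeholder: pretend the GPU is roughly 4x faster on this workload.
    time.sleep(len(chunk) * 2.5e-5)


def partition(chunk, cpu_rate, gpu_rate):
    """Split a chunk so both devices should finish at about the same time."""
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    split = int(len(chunk) * gpu_share)
    return chunk[split:], chunk[:split]  # (cpu_part, gpu_part)


def run(work, chunk_size=1000):
    cpu_rate, gpu_rate = 1.0, 1.0  # no prior knowledge: start with a 50/50 split
    for start in range(0, len(work), chunk_size):
        chunk = work[start:start + chunk_size]
        cpu_part, gpu_part = partition(chunk, cpu_rate, gpu_rate)

        t0 = time.perf_counter()
        process_on_cpu(cpu_part)
        cpu_time = time.perf_counter() - t0

        t0 = time.perf_counter()
        process_on_gpu(gpu_part)
        gpu_time = time.perf_counter() - t0

        # Re-estimate per-device throughput from the observed times so the
        # next chunk is split according to the measured speedup ratio.
        if cpu_part and cpu_time > 0:
            cpu_rate = len(cpu_part) / cpu_time
        if gpu_part and gpu_time > 0:
            gpu_rate = len(gpu_part) / gpu_time
        print(f"chunk at {start}: CPU={len(cpu_part)} items, GPU={len(gpu_part)} items")


if __name__ == "__main__":
    run(list(range(10_000)))
```

Because the split is recomputed from measured execution times, the ratio adapts per application at run time instead of relying on a single offline allocation, which is exactly the difficulty the abstract points out.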


References

  1. C. Augonnet, S. Thibault, R. Namyst, P. Wacrenier, StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience 23 (2) (2011) 187–198.

  2. S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, W.-m. W. Hwu, An adaptive performance modeling tool for GPU architectures, in: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '10, ACM, New York, NY, USA, 2010, pp. 105–114.

  3. I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, P. Hanrahan, Brook for GPUs: Stream computing on graphics hardware, in: ACM SIGGRAPH 2004 Papers, SIGGRAPH '04, ACM, New York, NY, USA, 2004, pp. 777–786.

  4. J. Bueno, L. Martinell, A. Duran, M. Farreras, X. Martorell, R. Badia, E. Ayguade, J. Labarta, Productive cluster programming with OmpSs, in: Euro-Par 2011 Parallel Processing, 2011, pp. 555–566.

  5. B. He, W. Fang, Q. Luo, N. K. Govindaraju, T. Wang, Mars: A MapReduce framework on graphics processors, in: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT '08, ACM, New York, NY, USA, 2008, pp. 260–269.

  6. S. Hong, H. Kim, An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, in: Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA '09, ACM, New York, NY, USA, 2009, pp. 152–163.

  7. S. Hong, H. Kim, An integrated GPU power and performance model, in: Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA '10, ACM, New York, NY, USA, 2010, pp. 280–289.

  8. C.-K. Luk, S. Hong, H. Kim, Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, in: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, ACM, 2009, pp. 45–55.

  9. P. McCormick, J. Inman, J. Ahrens, J. Mohd-Yusof, G. Roth, S. Cummins, Scout: A data-parallel programming language for graphics processors, Parallel Computing 33 (10–11) (2007) 648–662.

  10. A. Munshi, The OpenCL specification, version 1.2, 2011.

  11. NVIDIA Corporation, CUDA C programming guide, version 5.0, 2012.

  12. S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, W.-m. W. Hwu, Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, in: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '08, ACM, New York, NY, USA, 2008, pp. 73–82.

  13. T. R. Scogland, B. Rountree, W.-c. Feng, B. R. de Supinski, Heterogeneous task scheduling for accelerated OpenMP, in: 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), IEEE, 2012, pp. 144–155.

  14. Z. Wang, L. Zheng, Q. Chen, M. Guo, CAP: Co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems, in: Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores, ACM, 2013, pp. 107–114.

  15. Y. Zhang, J. Owens, A quantitative performance analysis model for GPU architectures, in: 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA), 2011, pp. 382–393.

  16. F. Zhang, B. Wu, J. Zhai, B. He, W. Chen, FinePar: Irregularity-aware fine-grained workload partitioning on integrated architectures, in: Proceedings of the 2017 International Symposium on Code Generation and Optimization, IEEE Press, 2017, pp. 27–38.


Author information

Corresponding author: Quan Chen.


Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Chen, Q., Guo, M. (2017). Load Balancing for Heterogeneous Parallel Architecture. In: Task Scheduling for Multi-core and Parallel Architectures. Springer, Singapore. https://doi.org/10.1007/978-981-10-6238-4_6


  • DOI: https://doi.org/10.1007/978-981-10-6238-4_6


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6237-7

  • Online ISBN: 978-981-10-6238-4

  • eBook Packages: Computer Science (R0)
