Predictive Runtime Code Scheduling for Heterogeneous Architectures

  • Conference paper
High Performance Embedded Architectures and Compilers (HiPEAC 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5409)

Abstract

Heterogeneous architectures are currently widespread. With the advent of easy-to-program general-purpose GPUs, virtually every recent desktop computer is a heterogeneous system. Combining the CPU and the GPU provides a large amount of processing power. However, such architectures are often used in a restricted way, for domain-specific workloads such as scientific applications and games, and they tend to be used by a single application at a time. We envision future heterogeneous computing systems in which all heterogeneous resources are continuously utilized by different applications, whose critical parts are versioned so that they can adapt their behavior at runtime and improve execution time, power consumption, response time and other constraints. Under such a model, adaptive scheduling becomes a critical component.

In this paper, we propose a novel predictive user-level scheduler for heterogeneous systems that is based on past performance history. We developed several scheduling policies and present a study of their impact on system performance. We demonstrate that such a scheduler allows multiple applications to fully utilize all available processing resources in CPU/GPU-like systems and consistently achieve speedups ranging from 30% to 40% compared to using only the GPU in single-application mode.
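The general idea of scheduling from past performance history can be illustrated with a minimal sketch. This is not the scheduler or any of the policies from the paper, whose details are not given in the abstract. It assumes a simple rule: predict each function's run time on each device as the mean of its past run times, then place the call on the device with the earliest predicted completion time. The class `HistoryScheduler`, its methods, and the timing values are all hypothetical.

```python
from collections import defaultdict


class HistoryScheduler:
    """Toy history-based scheduler (hypothetical, not the paper's policy)."""

    def __init__(self, devices):
        self.devices = list(devices)
        # (function name, device) -> list of observed run times in seconds
        self.history = defaultdict(list)
        # device -> predicted seconds of work already queued on it
        self.pending = {d: 0.0 for d in self.devices}

    def record(self, func, device, seconds):
        """Store one measured execution time."""
        self.history[(func, device)].append(seconds)

    def predict(self, func, device):
        """Mean of past run times, or None if the pair was never observed."""
        samples = self.history.get((func, device))
        if not samples:
            return None
        return sum(samples) / len(samples)

    def pick(self, func):
        """Choose the device with the earliest predicted completion time."""
        # Assumption: a device with no history is tried first, so that a
        # prediction exists for it on later calls.
        for d in self.devices:
            if self.predict(func, d) is None:
                return d
        return min(self.devices,
                   key=lambda d: self.pending[d] + self.predict(func, d))

    def enqueue(self, func):
        """Pick a device and add the predicted run time to its queue."""
        device = self.pick(func)
        self.pending[device] += self.predict(func, device) or 0.0
        return device


# Made-up timings: the GPU version is faster, but the GPU queue fills up.
sched = HistoryScheduler(["cpu", "gpu"])
sched.record("fft", "cpu", 2.5)
sched.record("fft", "gpu", 1.0)
placements = [sched.enqueue("fft") for _ in range(3)]
print(placements)
```

With these made-up numbers, the third call goes to the CPU because the GPU's queued work makes its predicted completion time later than the CPU's. That is the effect the abstract alludes to: keeping every processing resource busy rather than using only the GPU.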

All the authors are members of the HiPEAC European Network of Excellence.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiménez, V.J., Vilanova, L., Gelado, I., Gil, M., Fursin, G., Navarro, N. (2009). Predictive Runtime Code Scheduling for Heterogeneous Architectures. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_4

  • DOI: https://doi.org/10.1007/978-3-540-92990-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-92989-5

  • Online ISBN: 978-3-540-92990-1

  • eBook Packages: Computer Science, Computer Science (R0)
