Predictive Runtime Code Scheduling for Heterogeneous Architectures

Jiménez, Víctor J.; Vilanova, Lluís; Gelado, Isaac; Gil, Marisa; Fursin, Grigori; Navarro, Nacho

doi:10.1007/978-3-540-92990-1_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5409)

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

1347 Accesses
65 Citations

Abstract

Heterogeneous architectures are now widespread. With the advent of easy-to-program general-purpose GPUs, virtually every recent desktop computer is a heterogeneous system. Combining the CPU and the GPU provides a great amount of processing power. However, such architectures are often used in a restricted way, for domain-specific applications such as scientific codes and games, and they tend to be used by only one application at a time. We envision future heterogeneous computing systems in which all heterogeneous resources are continuously utilized by different applications whose critical parts are versioned, so that these applications can better adapt their behavior at runtime and improve execution time, power consumption, response time, and other constraints. Under such a model, adaptive scheduling becomes a critical component.

In this paper, we propose a novel predictive user-level scheduler for heterogeneous systems that is based on past performance history. We developed several scheduling policies and study their impact on system performance. We demonstrate that such a scheduler allows multiple applications to fully utilize all available processing resources in CPU/GPU-like systems and consistently achieves speedups of 30% to 40% compared to using only the GPU in single-application mode.
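The abstract describes the scheduler only at a high level. The following is a minimal sketch of the general idea of history-based predictive scheduling, not the authors' implementation. It assumes that each versioned function has a CPU and a GPU variant, predicts a call's run time on each device as the mean of its last few measured run times, and sends the call to the device whose already-queued work plus predicted run time is smallest. The class names `HistoryPredictor` and `PredictiveScheduler`, the `window` parameter, and the backlog bookkeeping are all hypothetical.

```python
from collections import defaultdict, deque


class HistoryPredictor:
    """Hypothetical helper: predicts a function's run time on a device.

    Keeps the last `window` measured run times per (function, device)
    pair and returns their mean.
    """

    def __init__(self, window=8):
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, func, device, seconds):
        self.samples[(func, device)].append(seconds)

    def predict(self, func, device):
        hist = self.samples.get((func, device))
        if not hist:
            return None  # no history yet for this pair
        return sum(hist) / len(hist)


class PredictiveScheduler:
    """Assigns each call to the device with the earliest predicted finish."""

    def __init__(self, devices, predictor):
        self.devices = list(devices)
        self.predictor = predictor
        # Predicted seconds of work already queued on each device.
        self.backlog = {d: 0.0 for d in self.devices}

    def choose(self, func):
        # Explore first: a device with no history for `func` gets the call,
        # so that a measurement for it will eventually be recorded.
        for d in self.devices:
            if self.predictor.predict(func, d) is None:
                return d
        # Exploit: queued work plus predicted run time; ties go to the
        # device listed first.
        return min(self.devices,
                   key=lambda d: self.backlog[d] + self.predictor.predict(func, d))

    def submit(self, func):
        device = self.choose(func)
        estimate = self.predictor.predict(func, device) or 0.0
        self.backlog[device] += estimate
        return device, estimate

    def complete(self, func, device, estimate, measured):
        # Retire the estimate from the queue and learn from the real time.
        self.backlog[device] = max(0.0, self.backlog[device] - estimate)
        self.predictor.record(func, device, measured)


if __name__ == "__main__":
    sched = PredictiveScheduler(["cpu", "gpu"], HistoryPredictor())
    # Illustrative timings only (not measurements from the paper).
    sched.predictor.record("fft", "cpu", 4.0)
    sched.predictor.record("fft", "gpu", 1.0)
    print([sched.submit("fft")[0] for _ in range(4)])
    # ['gpu', 'gpu', 'gpu', 'cpu']
```

With the GPU variant four times faster in this made-up example, the first three calls queue on the GPU. The fourth call goes to the idle CPU because its predicted finish time (4.0) ties with the GPU's (3.0 queued plus 1.0), and ties go to the first device listed. This spill-over is one way a history-based policy can keep both processors busy instead of serializing every call on the GPU.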

All the authors are members of the HiPEAC European Network of Excellence.

Author information

Authors and Affiliations

Barcelona Supercomputing Center (BSC), Spain
Víctor J. Jiménez
Departament d’Arquitectura de Computadors (UPC), Spain
Lluís Vilanova, Isaac Gelado, Marisa Gil & Nacho Navarro
ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France
Grigori Fursin

Editor information

Editors and Affiliations

IRISA, Campus de Beaulieu, 35042, Rennes Cedex, France
André Seznec
Intel Corporation, Massachusetts Microprocessor Design Center, 77 Reed Road, MA 01749, Hudson, USA
Joel Emer
School of Informatics, Institute for Computing Systems Architecture, King’s Buildings, EH9 3JZ, Edinburgh, United Kingdom
Michael O’Boyle
Department of Electrical Engineering, Princeton University, 34 Olden Street, NJ 08544-5263, Princeton, USA
Margaret Martonosi
Department of Computer Science, University of Augsburg, 86135, Augsburg, Germany
Theo Ungerer

About this paper

Cite this paper

Jiménez, V.J., Vilanova, L., Gelado, I., Gil, M., Fursin, G., Navarro, N. (2009). Predictive Runtime Code Scheduling for Heterogeneous Architectures. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_4

DOI: https://doi.org/10.1007/978-3-540-92990-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92989-5
Online ISBN: 978-3-540-92990-1
eBook Packages: Computer Science, Computer Science (R0)
