Abstract
Classic loop unrolling increases the performance of sequential loops by reducing the overhead of the non-computational parts of the loop. Unfortunately, when the loop body contains parallelism, most compilers either ignore it or perform a naïve transformation.
We propose extending the semantics of the loop unrolling transformation to cover loops that contain task parallelism. In these cases, the transformation aggregates the multiple tasks that appear after a classic unrolling phase, reducing the per-iteration overhead.
We present an implementation of this extended loop unrolling for OpenMP tasks with two phases: a classic unroll followed by a task-aggregation phase. Our aggregation technique also covers the special cases where task parallelism appears inside branches or where the loop is uncountable (its trip count is unknown at compile time).
Our experimental results show that this extended unrolling allows loops with fine-grained tasks to reduce the overhead associated with task creation and achieve much better scaling.
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Ferrer, R., Duran, A., Martorell, X., Ayguadé, E. (2010). Unrolling Loops Containing Task Parallelism. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_30
Print ISBN: 978-3-642-13373-2
Online ISBN: 978-3-642-13374-9