Towards Task-Parallel Reductions in OpenMP

Ciesko, Jan; Mateo, Sergi; Teruel, Xavier; Martorell, Xavier; Ayguadé, Eduard; Labarta, Jesús; Duran, Alex; de Supinski, Bronis R.; Olivier, Stephen; Li, Kelvin; Eichenberger, Alexandre E.

doi:10.1007/978-3-319-24595-9_14

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9342)

Included in the following conference series:

International Workshop on OpenMP


Abstract

Reductions represent a common algorithmic pattern in many scientific applications. OpenMP* has always supported them on parallel and worksharing constructs. The tasking constructs introduced in OpenMP 3.0 enable new parallelization opportunities through the annotation of irregular algorithms. Unfortunately, the tasking model does not easily allow the expression of concurrent reductions, which limits the general applicability of the programming model to such algorithms. In this work, we present an extension to OpenMP that supports task-parallel reductions on task and taskgroup constructs to improve productivity and programmability. We present a specification of the feature and explore issues for programmers and software vendors regarding programming transparency, as well as the impact on the current standard with respect to nesting, untied task support, and task data dependencies. Our performance evaluation shows results comparable to those of hand-coded task reductions.
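As a hedged illustration of the idea (a reduction scoped to a taskgroup, with individual tasks participating in it), the following C sketch uses the `task_reduction` and `in_reduction` clauses that were later standardized in OpenMP 5.0. The clause names and exact semantics proposed in this paper may differ. The names `total`, `CUTOFF`, `sum_range`, and `parallel_sum` are hypothetical. The sketch sums an array by recursive task decomposition: interior tasks only spawn more work, while leaf tasks participate in the reduction and accumulate into private copies that the runtime combines when the taskgroup ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grain size: ranges at or below this length become leaf tasks. */
#define CUTOFF 64

/* Hypothetical reduction target. Static storage, so it is shared by all tasks. */
static long total;

/* Hypothetical helper: recursively split [lo, hi). Interior tasks only create
 * more tasks; leaf tasks participate in the enclosing task reduction. */
static void sum_range(const long *v, size_t lo, size_t hi)
{
    if (hi - lo <= CUTOFF) {
        /* Participating task: updates a private copy of total that the
         * runtime combines with + when the taskgroup ends. */
        #pragma omp task in_reduction(+: total)
        {
            for (size_t i = lo; i < hi; i++)
                total += v[i];
        }
    } else {
        size_t mid = lo + (hi - lo) / 2;
        /* v, lo, mid and hi are firstprivate in these tasks by default. */
        #pragma omp task
        sum_range(v, lo, mid);
        #pragma omp task
        sum_range(v, mid, hi);
    }
}

/* Hypothetical entry point: sum n elements of v using a task reduction
 * scoped to a single taskgroup. */
long parallel_sum(const long *v, size_t n)
{
    assert(v != NULL || n == 0);
    total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        /* The reduction starts here and its result is stored in total when
         * the taskgroup ends, after all descendant tasks have finished. */
        #pragma omp taskgroup task_reduction(+: total)
        sum_range(v, 0, n);
    }
    return total;
}
```

For example, calling `parallel_sum` on an array holding the values 1 through 1000 returns 500500. Only the leaf tasks carry `in_reduction`, so each leaf updates its own private copy rather than synchronizing on every update (for instance with `#pragma omp atomic`). If the file is compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.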


Notes

1. This case may involve multiple private copies due to support for untied tasks.
2. mcxx 1.99.8 (git 538d492).
3. nanox 0.9a (git master 10f6134).


Acknowledgments

This work has been developed with the support of grant SEV-2011-00067 of the Severo Ochoa Program, awarded by the Spanish Government; by the Spanish Ministry of Science and Innovation (contracts TIN2012-34557 and CAC2007-00052); by the Generalitat de Catalunya (contract 2014-SGR-1051); and by the Intel-BSC Exascale Lab collaboration project.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

The authors would also like to thank the OpenMP community for their substantial contribution to this work.

Intel, Xeon, Xeon Phi and Many Integrated Core are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other brands and names are the property of their respective owners.

Author information

Authors and Affiliations

Barcelona Supercomputing Center, Barcelona, Spain
Jan Ciesko, Sergi Mateo, Xavier Teruel, Xavier Martorell, Eduard Ayguadé & Jesús Labarta
Universitat Politècnica de Catalunya, Barcelona, Spain
Sergi Mateo, Xavier Martorell, Eduard Ayguadé & Jesús Labarta
Intel Iberia Corporation, Madrid, Spain
Alex Duran
Lawrence Livermore National Laboratory, Livermore, USA
Bronis R. de Supinski
Sandia National Laboratories, Livermore, USA
Stephen Olivier
IBM Corporation, New York, USA
Kelvin Li & Alexandre E. Eichenberger

Corresponding author

Correspondence to Sergi Mateo.

Editor information

Editors and Affiliations

RWTH Aachen University, Aachen, Germany
Christian Terboven
Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
RWTH Aachen University, Aachen, Germany
Pablo Reble
University of Houston, Houston, Texas, USA
Barbara M. Chapman
RWTH Aachen University, Aachen, Germany
Matthias S. Müller


About this paper

Cite this paper

Ciesko, J. et al. (2015). Towards Task-Parallel Reductions in OpenMP. In: Terboven, C., de Supinski, B., Reble, P., Chapman, B., Müller, M. (eds) OpenMP: Heterogenous Execution and Data Movements. IWOMP 2015. Lecture Notes in Computer Science, vol 9342. Springer, Cham. https://doi.org/10.1007/978-3-319-24595-9_14


DOI: https://doi.org/10.1007/978-3-319-24595-9_14
Published: 26 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24594-2
Online ISBN: 978-3-319-24595-9
eBook Packages: Computer Science, Computer Science (R0)
