Skip to main content

Towards Task-Parallel Reductions in OpenMP

  • Conference paper
  • First Online:
OpenMP: Heterogenous Execution and Data Movements (IWOMP 2015)

Abstract

Reductions represent a common algorithmic pattern in many scientific applications. OpenMP\(^{*}\) has always supported them on parallel and worksharing constructs. OpenMP 3.0’s tasking constructs enable new parallelization opportunities through the annotation of irregular algorithms. Unfortunately the tasking model does not easily allow the expression of concurrent reductions, which limits the general applicability of the programming model to such algorithms. In this work, we present an extension to OpenMP that supports task-parallel reductions on task and taskgroup constructs to improve productivity and programmability. We present specification of the feature and explore issues for programmers and software vendors regarding programming transparency as well as the impact on the current standard with respect to nesting, untied task support and task data dependencies. Our performance evaluation demonstrates comparable results to hand coded task reductions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    This case may involve multiple private copies due to support for untied tasks.

  2. 2.

    mcxx 1.99.8 (git 538d492).

  3. 3.

    nanox 0.9a (git master 10f6134).

References

  1. Barcelona Supercomputing Center.: OmpSs Specification, 25 April 2014. http://pm.bsc.es/ompss-docs/specs

  2. Charles, P., Grothoff, C., Saraswat, V., Donawa, C., Kielstra, A., Ebcioglu, K., von Praun, C., Sarkar, V.: X10: an object-oriented approach to non-uniform cluster computing. In: SIGPLAN Notices, vol. 40(10), pp. 519–538 (2005)

    Google Scholar 

  3. Ciesko, J., Mateo, S., Teruel, X., Beltran, V., Martorell, X., Badia, R.M., Ayguadé, E., Labarta, J.: Task-parallel reductions in OpenMP and OmpSs. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2014. LNCS, vol. 8766, pp. 1–15. Springer, Heidelberg (2014)

    Google Scholar 

  4. Frigo, M., Halpern, P., Leiserson, C.E., Lewin-Berlin, S.: Reducers and other Cilk++ hyperobjects. In: Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures, SPAA 2009, pp. 79–90. ACM, New York (2009)

    Google Scholar 

  5. Leiserson, C.E.: The Cilk++ concurrency platform. In: Proceedings of the 46th Annual Design Automation Conference, DAC 2009, pp. 522–527. ACM, New York (2009)

    Google Scholar 

  6. Olivier, S., Huan, J., Liu, J., Prins, J.F., Dinan, J., Sadayappan, P., Tseng, C.-W.: UTS: an unbalanced tree search benchmark. In: Almási, G.S., Caşcaval, C., Wu, P. (eds.) KSEM 2006. LNCS, vol. 4382, pp. 235–250. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  7. OpenMP Architecture Review Board.: OpenMP Application ProgramInterface Version 4.0, July 2013

    Google Scholar 

  8. Shirako, J., Peixotto, D.M., Sarkar, V., Scherer, W.N.: Phasers: a unified deadlock-free construct for collective and point-to-point synchronization. In: ICS 2008: Proceedings of the 22nd Annual International Conference on Supercomputing, pp. 277–288. ACM, New York (2008)

    Google Scholar 

  9. Shirako, J., Peixotto, D.M., Sarkar, V., Scherer, W.N.: Phaser accumulators: a new reduction construct for dynamic parallelism. In: IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, pp. 1–12. IEEE, Rome, May 2009

    Google Scholar 

Download references

Acknowledgments

This work has been developed with the support of the grant SEV-2011-00067 of Severo Ochoa Program, awarded by the Spanish Government and by the Spanish Ministry of Science and Innovation (contracts TIN2012-34557, and CAC2007-00052) by the Generalitat de Catalunya (contract 2014-SGR-1051) and the Intel-BSC Exascale Lab collaboration project.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Also the authors would like to thank the OpenMP community for their substantial contribution to this work.

Intel, Xeon, Xeon Phi and Many Integrated Core are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

\(^{*}\)Other brands and names are the property of their respective owners.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sergi Mateo .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ciesko, J. et al. (2015). Towards Task-Parallel Reductions in OpenMP. In: Terboven, C., de Supinski, B., Reble, P., Chapman, B., Müller, M. (eds) OpenMP: Heterogenous Execution and Data Movements. IWOMP 2015. Lecture Notes in Computer Science(), vol 9342. Springer, Cham. https://doi.org/10.1007/978-3-319-24595-9_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-24595-9_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24594-2

  • Online ISBN: 978-3-319-24595-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics