Supporting Adaptive Privatization Techniques for Irregular Array Reductions in Task-Parallel Programming Models
Irregular array-type reductions are a recurring algorithmic pattern in many scientific applications. Executing them scalably on modern systems is not trivial: their irregular memory access patterns prevent efficient use of the memory subsystem, and costly techniques are needed to eliminate data races. A closer look at algorithms, memory access patterns and support techniques reveals that no one-size-fits-all solution exists, and that approaches are needed that adapt to individual properties while maintaining programming transparency. In this work we propose a solution framework that generalizes the concept of privatization to support a variety of techniques, implements an inspector-executor to provide memory access analytics to the runtime for automatic tuning, and identifies the language extensions that are needed. A reference implementation in OmpSs, a task-parallel programming model, demonstrates the programmability and scalability of this solution.
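For readers unfamiliar with the pattern, the following is a minimal sketch of an irregular array reduction and of its simplest privatized form, in which every worker accumulates into a full private copy of the target array. It is an illustration only and not the framework's implementation or the OmpSs API: the function names `reduce_serial` and `reduce_privatized`, their parameters, and the block-wise split of iterations are assumptions made for this example. The workers run one after another here so that the code compiles without any threading support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Reference version: irregular reduction a[idx[i]] += val[i].
 * If the loop were run in parallel as-is, two iterations with the same
 * idx[i] would update the same element and race. */
void reduce_serial(double *a, const size_t *idx, const double *val, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[idx[i]] += val[i];
}

/* Privatized version (hypothetical helper, not the paper's API).
 * Each of `workers` logical workers accumulates its block of iterations
 * into its own zero-initialized copy of the target array, so no two
 * workers write to the same memory. The private copies are then combined
 * into `a`. The outer loop over workers is the part a runtime would
 * distribute across threads or tasks; here it runs sequentially.
 * Assumes every idx[i] < a_len and that all-bits-zero represents 0.0.
 * Returns 0 on success, -1 if private storage cannot be allocated. */
int reduce_privatized(double *a, size_t a_len, const size_t *idx,
                      const double *val, size_t n, size_t workers)
{
    assert(workers > 0);

    double *priv = calloc(workers * a_len, sizeof *priv);
    if (priv == NULL)
        return -1;

    /* Accumulation phase: worker w owns iterations [begin, end). */
    for (size_t w = 0; w < workers; ++w) {
        size_t begin = n * w / workers;
        size_t end = n * (w + 1) / workers;
        double *mine = priv + w * a_len;
        for (size_t i = begin; i < end; ++i)
            mine[idx[i]] += val[i];
    }

    /* Combine phase: fold every private copy into the shared array. */
    for (size_t w = 0; w < workers; ++w)
        for (size_t j = 0; j < a_len; ++j)
            a[j] += priv[w * a_len + j];

    free(priv);
    return 0;
}
```

In this sketch the private storage grows with the number of workers times the array length, and the combine phase touches every element of every copy regardless of how many were written. Costs of this kind are one reason a single privatization strategy is unlikely to suit every access pattern.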
Keywords: Array reduction, Privatization, Inspector-executor, OmpSs, OpenMP
This work has been developed with the support of grant SEV-2011-00067 of the Severo Ochoa Program, awarded by the Spanish Government, and by the Spanish Ministry of Science and Innovation (contracts TIN2012-34557 and CAC2007-00052), by the Generalitat de Catalunya (contract 2009-SGR-980), and by the Intel-BSC Exascale Lab collaboration project.