Optimization of the GNU OpenMP Synchronization Barrier in MPSoC
Synchronization mechanisms have been a central issue in the race toward parallel computing units. As the number of cores increases, applications are split into more and more software tasks, which leads to heavier use of synchronization primitives to preserve the services of the original application. In this context, efficient synchronization mechanisms are essential to exploit the parallelism offered by Multi-Processor Systems-on-Chip (MPSoCs).
Using an instrumented emulation platform that lets us extract accurate timing information non-intrusively, we carried out a fine-grained analysis of the synchronization barriers of the GNU OpenMP library. This study reveals that a time-consuming function was needlessly called during the barrier awakening process. We propose a software optimization of this library that saves up to \(80\%\) of the release phase duration on a 16-core MPSoC. Moreover, because the optimization is confined to the OpenMP middleware library, benefiting from it requires no specific care from the application programmer beyond a library update, and it can be used on every kind of platform.
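The abstract does not name the function that was called needlessly, so the following is only a hedged illustration of the general pattern, not the GNU OpenMP implementation or the authors' patch. It assumes a spin-then-sleep barrier built on POSIX threads and C11 atomics, in which the last arriving thread skips the wake-up call when no waiter has gone to sleep. All names (`demo_barrier`, `demo_barrier_wait`, `demo_run`, `spin_limit`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical spin-then-sleep barrier; NOT the libgomp data structure. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    atomic_int arrived;      /* threads that reached the barrier this round */
    atomic_uint generation;  /* bumped once per release */
    int nthreads;
    int spin_limit;          /* polls of `generation` before sleeping */
    int sleepers;            /* protected by lock */
    long wake_calls;         /* releases that issued a broadcast (under lock) */
    long wake_skipped;       /* releases that skipped it (under lock) */
} demo_barrier;

static void demo_barrier_wait(demo_barrier *b) {
    unsigned gen = atomic_load(&b->generation);
    if (atomic_fetch_add(&b->arrived, 1) + 1 == b->nthreads) {
        /* Last arriver: reset the count, then publish the new generation. */
        atomic_store(&b->arrived, 0);
        atomic_fetch_add(&b->generation, 1);
        pthread_mutex_lock(&b->lock);
        if (b->sleepers > 0) {           /* wake only if someone is asleep */
            pthread_cond_broadcast(&b->cond);
            b->wake_calls++;
        } else {
            b->wake_skipped++;           /* everyone is still spinning */
        }
        pthread_mutex_unlock(&b->lock);
        return;
    }
    /* Spin phase: poll the generation a bounded number of times. */
    for (int i = 0; i < b->spin_limit; i++)
        if (atomic_load(&b->generation) != gen)
            return;
    /* Sleep phase: register as a sleeper and re-check under the lock. */
    pthread_mutex_lock(&b->lock);
    b->sleepers++;
    while (atomic_load(&b->generation) == gen)
        pthread_cond_wait(&b->cond, &b->lock);
    b->sleepers--;
    pthread_mutex_unlock(&b->lock);
}

typedef struct {
    demo_barrier *bar;
    atomic_long *counter;
    atomic_int *errors;
    int rounds;
} worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->counter, 1);
        demo_barrier_wait(a->bar);
        /* After round r, every thread must have incremented r + 1 times. */
        if (atomic_load(a->counter) < (long)(r + 1) * a->bar->nthreads)
            atomic_fetch_add(a->errors, 1);
    }
    return NULL;
}

/* Runs `rounds` barrier episodes; returns the number of ordering errors. */
int demo_run(int nthreads, int rounds, int spin_limit,
             long *wakes, long *skipped) {
    demo_barrier b;
    atomic_long counter;
    atomic_int errors;
    pthread_mutex_init(&b.lock, NULL);
    pthread_cond_init(&b.cond, NULL);
    atomic_init(&b.arrived, 0);
    atomic_init(&b.generation, 0);
    atomic_init(&counter, 0);
    atomic_init(&errors, 0);
    b.nthreads = nthreads;
    b.spin_limit = spin_limit;
    b.sleepers = 0;
    b.wake_calls = 0;
    b.wake_skipped = 0;
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    assert(tids != NULL);
    worker_arg arg = { &b, &counter, &errors, rounds };
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, &arg);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    *wakes = b.wake_calls;
    *skipped = b.wake_skipped;
    free(tids);
    pthread_cond_destroy(&b.cond);
    pthread_mutex_destroy(&b.lock);
    return atomic_load(&errors);
}
```

Calling `demo_run(4, 200, 1000, &wakes, &skipped)` runs 200 barrier episodes on four threads and reports how many releases needed a broadcast and how many skipped it; the split depends on scheduling and on `spin_limit`. In this sketch the release path still takes a mutex, so it shows only the conditional wake-up, not a tuned release phase.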
Keywords: GNU OpenMP library, Emulation platform, Synchronization barrier optimization, Generic middleware optimization