Extending Synchronization Constructs in OpenMP to Exploit Pipeline Parallelism on Heterogeneous Multi-core

  • Shigang Li
  • Shucai Yao
  • Haohu He
  • Lili Sun
  • Yi Chen
  • Yunfeng Peng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7017)


The ability to express multiple levels of parallelism is one of the significant features of the OpenMP parallel programming model. However, pipeline parallelism is not well supported in OpenMP. This paper proposes extensions to the OpenMP directives aimed at expressing pipeline parallelism effectively. The extended directives fall into two groups: one defines precedence at the thread level, the other at the iteration level. With these directives, programmers can establish a pipeline model more easily and exploit more parallelism to improve performance. To support the directives, a set of runtime synchronization interfaces is implemented on the Cell heterogeneous multi-core architecture using the signal-block communication mechanism. Experimental results indicate that the proposed pipeline scheme obtains good performance compared to naive parallel versions of the same applications.


Keywords: Pipeline parallelism · OpenMP · Cell architecture




Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Shigang Li¹
  • Shucai Yao¹
  • Haohu He¹
  • Lili Sun¹
  • Yi Chen¹
  • Yunfeng Peng¹

  1. University of Science and Technology Beijing, Beijing, China
