Parallel Programming with Algorithmic Skeletons

Herbert Kuchen


Today, parallel programming is typically based on low-level frameworks such as MPI, OpenMP, and CUDA. Developing software at this level of abstraction is tedious, error-prone, and restricted to a specific hardware platform. Parallel programming can be simplified considerably by introducing more structure. Thus, we suggest providing predefined, typical parallel-programming patterns. The user merely structures a parallel program by composing these patterns in a simple way, without having to know how they have been implemented efficiently in parallel on top of low-level frameworks. In this paper, we present the Muenster skeleton library (Muesli), which provides such a system of parallel-programming patterns and hence enables structured parallel programming.
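To illustrate the idea of composing predefined patterns, here is a minimal sketch in C++. It does not reproduce Muesli's actual interface. The names `map_skeleton`, `fold_skeleton`, and `sum_of_squares` are hypothetical, and plain `std::thread` stands in for the MPI, OpenMP, or CUDA back ends a real skeleton library would use. The point is the division of labor: the caller supplies only sequential functions, and each skeleton hides how the data is partitioned and how the threads are managed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical "map" skeleton: applies f to every element in parallel.
// The caller supplies only the sequential function f; partitioning and
// thread management stay hidden inside the skeleton.
template <typename T, typename F>
std::vector<T> map_skeleton(const std::vector<T>& in, F f,
                            std::size_t num_threads = 4) {
    assert(num_threads > 0);
    std::vector<T> out(in.size());
    std::vector<std::thread> workers;
    const std::size_t chunk = (in.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin >= end) break;  // no work left for this or later threads
        workers.emplace_back([&in, &out, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = f(in[i]);
        });
    }
    for (auto& w : workers) w.join();
    return out;
}

// Hypothetical "fold" skeleton: each thread reduces its own block with op,
// then the partial results are combined sequentially. op is assumed to be
// associative, and init is assumed to be its identity element.
template <typename T, typename Op>
T fold_skeleton(const std::vector<T>& in, T init, Op op,
                std::size_t num_threads = 4) {
    assert(num_threads > 0);
    const std::size_t chunk = (in.size() + num_threads - 1) / num_threads;
    std::vector<T> partial(num_threads, init);  // one slot per thread
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&in, &partial, op, init, t, begin, end] {
            T acc = init;
            for (std::size_t i = begin; i < end; ++i) acc = op(acc, in[i]);
            partial[t] = acc;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    T result = init;
    for (const T& p : partial) result = op(result, p);
    return result;
}

// Composing the two skeletons: the sum of squares is a map followed by a fold.
inline long long sum_of_squares(const std::vector<long long>& xs) {
    auto squares = map_skeleton(xs, [](long long x) { return x * x; });
    return fold_skeleton(squares, 0LL, std::plus<long long>());
}
```

For example, `sum_of_squares({1, 2, 3, 4, 5})` evaluates to 55. Note that the program using the skeletons contains no thread or message-passing code at all; swapping the implementation underneath (for instance, to MPI processes or GPU kernels) would leave `sum_of_squares` unchanged, which is the kind of separation the abstract argues for. Element types should avoid `bool`, since `std::vector<bool>` packs bits and concurrent writes to neighboring slots would race.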


Keywords: Parallel programming · Structure · Algorithmic skeletons

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Münster, Münster, Germany