An Abstract Annotation Model for Skeletons

  • Marco Aldinucci
  • Sonia Campa
  • Peter Kilpatrick
  • Fabio Tordini
  • Massimo Torquati
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7542)


Multi-core and many-core platforms are becoming increasingly heterogeneous and asymmetric. This significantly increases the effort required to port and tune parallel codes, which in turn often widens the gap between peak machine power and actual application performance. This work discusses a first step toward the automated optimization of high-level, skeleton-based parallel code. The paper presents an abstract annotation model for skeleton programs, aimed at formally describing suitable mappings of parallel activities onto a high-level representation of the platform. The derived mapping and scheduling strategies are then used to generate optimized run-time code.
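To make "mapping parallel activities onto a platform representation" concrete, the sketch below annotates each activity of a small skeleton program with a target core. It is not the paper's annotation model: the skeleton types (`Seq`, `Pipe`, `Farm`), the helper names `activities` and `map_round_robin`, and the choice of a flat list of core ids as the "platform" are all hypothetical, and the mapping policy is plain round-robin, chosen only because it is the simplest possible strategy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical skeleton tree; names and structure are illustrative only.
@dataclass
class Seq:
    """Sequential activity: a leaf of the skeleton tree."""
    name: str

@dataclass
class Pipe:
    """Pipeline: stages run concurrently over a stream of items."""
    stages: List[Skeleton]

@dataclass
class Farm:
    """Farm: `nworkers` replicas of the same worker skeleton."""
    worker: Skeleton
    nworkers: int

Skeleton = Union[Seq, Pipe, Farm]

def activities(s: Skeleton) -> List[str]:
    """Flatten the tree into the concurrent activities it spawns (farm workers are replicated)."""
    if isinstance(s, Seq):
        return [s.name]
    if isinstance(s, Pipe):
        return [a for stage in s.stages for a in activities(stage)]
    if isinstance(s, Farm):
        return [f"{a}[{i}]" for i in range(s.nworkers) for a in activities(s.worker)]
    raise TypeError(f"unknown skeleton: {s!r}")

def map_round_robin(s: Skeleton, cores: List[int]) -> Dict[str, int]:
    """Annotate each activity with a target core, cycling over the available cores."""
    if not cores:
        raise ValueError("platform must expose at least one core")
    return {a: cores[i % len(cores)] for i, a in enumerate(activities(s))}

if __name__ == "__main__":
    # source -> farm of two filter workers -> sink, mapped onto a two-core platform
    prog = Pipe([Seq("source"), Farm(Seq("filter"), 2), Seq("sink")])
    print(map_round_robin(prog, cores=[0, 1]))
    # {'source': 0, 'filter[0]': 1, 'filter[1]': 0, 'sink': 1}
```

Round-robin ignores exactly the heterogeneity and asymmetry the abstract identifies as the problem. A richer platform representation (for instance, one that records which cores share a cache) would presumably let a mapping strategy favour placing communicating pipeline stages close together.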







Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marco Aldinucci (1)
  • Sonia Campa (2)
  • Peter Kilpatrick (3)
  • Fabio Tordini (1)
  • Massimo Torquati (2)

  1. Computer Science Department, University of Torino, Italy
  2. Computer Science Department, University of Pisa, Italy
  3. Computer Science Department, Queen’s University Belfast, UK
