Using Free Scheduling for Programming Graphic Cards

  • Wlodzimierz Bielecki
  • Marek Palkowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7174)


An approach is presented permitting us to build free scheduling for statement instances of affine loops. Under the free schedule, loop statement instances are executed as soon as their operands are available. To describe and implement the approach, the dependence analysis by Pugh and Wonnacott was chosen where dependences are found in the form of tuple relations. The proposed algorithm has been implemented and verified by means of the Omega project software. Results of experiments with the NAS benchmark suite are discussed. Speed-up and efficiency of parallel code produced by means of the approach are studied. Problems to be resolved in order to enhance the power of the presented technique are outlined.


fine-grained parallelism free scheduling parameterized affine loops NVIDIA cards 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Bastoul, C.: Code generation in the polyhedral model is easier than you think. In: PACT 13 IEEE International Conference on Parallel Architecture and Compilation Techniques, Juan-les-Pins, pp. 7–16 (September 2004)Google Scholar
  2. 2.
    Beletska, A., Bielecki, W., Cohen, A., Palkowski, M., Siedlecki, K.: Coarse-grained loop parallelization: Iteration space slicing vs affine transformations. Parallel Computing 37, 479–497 (2011)CrossRefGoogle Scholar
  3. 3.
    Beletskyy, V., Siedlecki, K.: Finding Free Schedules for Non-uniform Loops. In: Kosch, H., Böszörményi, L., Hellwagner, H. (eds.) Euro-Par 2003. LNCS, vol. 2790, pp. 297–302. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  4. 4.
    Bielecki, W., Klimek, T., Trifunovic, K.: Calculating exact transitive closure for a normalized affine integer tuple relation. Electronic Notes in Discrete Mathematics 33, 7–14 (2009)MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Wlodzimierz, B., Tomasz, K., Marek, P., Beletska, A.: An Iterative Algorithm of Computing the Transitive Closure of a Union of Parameterized Affine Integer Tuple Relations. In: Wu, W., Daescu, O. (eds.) COCOA 2010, Part I. LNCS, vol. 6508, pp. 104–113. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  6. 6.
    Bielecki, W., Palkowski, M.: Extracting Both Affine and Non-linear Synchronization-Free Slices in Program Loops. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2009. LNCS, vol. 6067, pp. 196–205. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  7. 7.
    Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. In: Conference on Programming Language Design and Implementation, pp. 101–113. ACM (2008)Google Scholar
  8. 8.
    Chen, D.K.: Compiler optimizations for parallel loops with fine-grained synchronization. Ph.D. thesis, Champaign, IL, USA (1994), uMI Order No. GAX95-12325Google Scholar
  9. 9.
    Darte, A., Robert, Y.: Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distrib. Syst. 5, 814–822 (1994)CrossRefGoogle Scholar
  10. 10.
    Darte, A., Vivien, F.: Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. In: Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques, PACT 1996, pp. 281–291. IEEE Computer Society, Washington, DC, USA (1996)CrossRefGoogle Scholar
  11. 11.
    Darte, A., Khachiyan, L., Robert, Y.: Linear scheduling is nearly optimal. Parallel Processing Letters 1(2), 73–81 (1991)CrossRefGoogle Scholar
  12. 12.
    Feautrier, P.: Some efficient solutions to the affine scheduling problem: I. one-dimensional time. Int. J. Parallel Program. 21(5), 313–348 (1992)MathSciNetCrossRefzbMATHGoogle Scholar
  13. 13.
    Feautrier, P.: Some efficient solutions to the affine scheduling problem: II. multi-dimensional time. Int. J. Parallel Program. 21(5), 389–420 (1992)MathSciNetCrossRefzbMATHGoogle Scholar
  14. 14.
    Kelly, W., Maslov, V., Pugh, W., Rosser, E., Shpeisman, T., Wonnacott, D.: The omega library interface guide. Tech. rep., College Park, MD, USA (1995)Google Scholar
  15. 15.
    Kelly, W., Pugh, W.: A framework for unifying reordering transformations. Tech. rep., Univ. of Maryland Institute for Advanced Computer Studies Report No. UMIACS-TR-92-126.1, College Park, MD, USA (1993)Google Scholar
  16. 16.
    Kelly, W., Pugh, W., Rosser, E., Shpeisman, T.: Transitive closure of infinite graphs and its applications. Int. J. Parallel Programming 24(6), 579–598 (1996)CrossRefGoogle Scholar
  17. 17.
    Krothapalli, V., Sadayappan, P.: Removal of redundant dependences in doacross loops with constant dependences. IEEE Transactions on Parallel and Distributed Systems 2, 281–289 (1991)CrossRefGoogle Scholar
  18. 18.
    Le Gouëslier d’Argence, P.: Affine scheduling on bounded convex polyhedric domains is asymptotically optimal. Theor. Comput. Sci. 196, 395–415 (1998)MathSciNetCrossRefzbMATHGoogle Scholar
  19. 19.
    Lim, A.W., Cheong, G.I., Lam, M.S.: An affine partitioning algorithm to maximize parallelism and minimize communication. In: Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing, pp. 228–237. ACM Press (1999)Google Scholar
  20. 20.
    Surhone, L.M., Tennoe, M.T., Henssonow, S.F.: Presburger Arithmetic. VDM Verlag Dr. Mueller AG & Co. Kg (2010); ISBN: 6133083557Google Scholar
  21. 21.
    Midkiff, S.P., Padua, D.A.: Compiler algorithms for synchronization. IEEE Transactions on Computers 36, 1485–1495 (1987)CrossRefzbMATHGoogle Scholar
  22. 22.
    Midkiff, S.P., Padua, D.A.: A comparison of four synchronization optimization techniques. In: ICPP (2), pp. 9–16 (1991)Google Scholar
  23. 23.
    NAS: Parallel Benchmarks Suite, Version 3.3 (February 2008),
  24. 24.
  25. 25.
    Pugh, W., Wonnacott, D.: An Exact Method for Analysis of Value-Based Array Data Dependences. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D.A. (eds.) LCPC 1993. LNCS, vol. 768, pp. 546–566. Springer, Heidelberg (1994)CrossRefGoogle Scholar
  26. 26.
    Verdoolaege, S.: Integer set library - manual. Tech. rep. (2011),
  27. 27.
    Vivien, F.: On the Optimality of Feautrier’s Scheduling Algorithm. In: Monien, B., Feldmann, R.L. (eds.) Euro-Par 2002. LNCS, vol. 2400, pp. 299–308. Springer, Heidelberg (2002)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Wlodzimierz Bielecki
    • 1
  • Marek Palkowski
    • 1
  1. 1.Faculty of Computer ScienceWest Pomeranian University of TechnologySzczecinPoland

Personalised recommendations