Iterative Collective Loop Fusion

  • T. J. Ashby
  • M. F. P. O’Boyle
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3923)


Naive code generation from high-level languages that encourage modularity can give rise to large numbers of simple loops in array-based programs. Collective loop fusion and array contraction can be applied to such code to improve temporal locality and performance. The problem is typically formalised using a loop dependence graph (LDG), with solutions denoted by fusion partitions. Much previous work has concentrated on approaches to the abstract formulation. We present a technique, which we call iterative collective loop fusion, that is based on empirically evaluating different transformations. We show that it can provide speedups of up to 1.38 over existing approaches. We also give results showing that applying such techniques to high-level languages can provide speedups of up to 2.45 over the original code, and that the resulting code outperforms equivalent Fortran code.
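The abstract does not spell out the search procedure. The Python below is a rough sketch of the general idea under stated assumptions. It uses a small LDG whose nodes are loops in program order and whose edges are data dependences, with some edges marked fusion-preventing. The code enumerates every partition of the loops and discards illegal ones: those with a fusion-preventing edge inside a cluster, or with a cycle among the clusters. It then keeps the partition that an `evaluate` callback scores lowest.

All names here are hypothetical, including `set_partitions`, `is_legal`, `iterative_fusion`, `evaluate`, `DEPS` and `PREVENTING`. In an empirical technique, the score would come from generating, compiling and timing each variant. Here, `len` (the number of fused loops) is only a toy stand-in for that score. Array contraction is not modelled. Brute-force enumeration also grows with the Bell number, so this should not be read as the authors' algorithm.

```python
from typing import Callable, Dict, Iterator, List, Set, Tuple

Edge = Tuple[int, int]        # (source loop, sink loop); source precedes sink
Partition = List[List[int]]   # each inner list is one fused cluster of loops


def set_partitions(items: List[int]) -> Iterator[Partition]:
    """Yield every partition of `items` into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + part


def is_acyclic(succ: Dict[int, Set[int]]) -> bool:
    """Kahn's algorithm: True iff the cluster graph has a topological order."""
    indeg = {n: 0 for n in succ}
    for n in succ:
        for m in succ[n]:
            indeg[m] += 1
    ready = [n for n, d in indeg.items() if d == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return visited == len(succ)


def is_legal(part: Partition, deps: List[Edge], preventing: Set[Edge]) -> bool:
    """Legal if no fusion-preventing edge lies inside a cluster and the
    clusters can still be ordered so that every dependence is respected."""
    cluster_of = {loop: c for c, block in enumerate(part) for loop in block}
    if any(cluster_of[u] == cluster_of[v] for u, v in preventing):
        return False
    succ: Dict[int, Set[int]] = {c: set() for c in range(len(part))}
    for u, v in deps:
        if cluster_of[u] != cluster_of[v]:
            succ[cluster_of[u]].add(cluster_of[v])
    return is_acyclic(succ)


def iterative_fusion(n_loops: int, deps: List[Edge], preventing: Set[Edge],
                     evaluate: Callable[[Partition], float]) -> Tuple[Partition, float]:
    """Score every legal fusion partition and keep the cheapest one.
    In an empirical setting, `evaluate` would generate, compile and time code."""
    best: Partition = []
    best_cost = float("inf")
    for part in set_partitions(list(range(n_loops))):
        if is_legal(part, deps, preventing):
            cost = evaluate(part)
            if cost < best_cost:
                best, best_cost = part, cost
    return best, best_cost


# Hypothetical LDG: loop 1 feeds loop 2 through a fusion-preventing dependence.
DEPS: List[Edge] = [(0, 1), (1, 2), (2, 3), (0, 3)]
PREVENTING: Set[Edge] = {(1, 2)}

if __name__ == "__main__":
    # Toy stand-in for a timing run: fewer clusters means fewer passes over memory.
    best, cost = iterative_fusion(4, DEPS, PREVENTING, evaluate=len)
    print(best, cost)
```

For this example graph, fusing all four loops is illegal, and every two-cluster grouping except {0, 1}, {2, 3} either splits across the preventing edge badly or creates a cycle. The toy cost therefore selects {0, 1}, {2, 3} with cost 2. Replacing `evaluate` with a measured runtime is what would turn this exhaustive search into an empirically driven one.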





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • T. J. Ashby (1)
  • M. F. P. O’Boyle (1)
  1. Institute for Computer Systems Architecture, University of Edinburgh, Scotland, UK
