Iterative Collective Loop Fusion

Ashby, T. J.; O’Boyle, M. F. P.

doi:10.1007/11688839_17

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3923)

Included in the following conference series: International Conference on Compiler Construction

Abstract

Naive code generation from high-level languages that encourage modularity can give rise to large numbers of simple loops in array-based programs. Collective loop fusion and array contraction can be applied to such code to improve temporal locality and performance. The problem is typically formalised using a loop dependence graph (LDG), with solutions expressed as fusion partitions. Much previous work has concentrated on approaches to the abstract formulation. We present a technique called iterative collective loop fusion, based on empirically evaluating different transformations, and show that it can provide speedups of up to 1.38 over existing approaches. We also give results showing that applying such techniques to high-level languages can provide speedups of up to 2.45 over the original code and can outperform equivalent Fortran code.
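
As a rough illustration of the terms used in the abstract, the sketch below shows two simple loops that communicate through a temporary array, the same computation after fusion with the temporary contracted to a scalar, and a small driver that chooses between variants by timing them. It is not the authors' implementation: the function names (`unfused`, `fused`, `pick_fastest`), the example computation, and the timing scheme are all assumptions made for illustration, and timings from interpreted Python say nothing about the speedups reported above.

```python
import time
from typing import Callable, Dict, List


def unfused(a: List[float], b: List[float]) -> List[float]:
    """Two simple loops linked by a full-size temporary array (hypothetical example)."""
    n = len(a)
    tmp = [0.0] * n
    for i in range(n):          # loop 1: produce tmp
        tmp[i] = a[i] + b[i]
    out = [0.0] * n
    for i in range(n):          # loop 2: consume tmp
        out[i] = tmp[i] * 2.0
    return out


def fused(a: List[float], b: List[float]) -> List[float]:
    """The same computation after loop fusion and array contraction."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        t = a[i] + b[i]         # tmp contracted to a scalar
        out[i] = t * 2.0
    return out


def pick_fastest(variants: Dict[str, Callable[[List[float], List[float]], List[float]]],
                 a: List[float], b: List[float], repeats: int = 3) -> str:
    """Empirically evaluate each variant and return the name of the quickest one."""
    best_name, best_time = "", float("inf")
    for name, fn in variants.items():
        elapsed = float("inf")
        for _ in range(repeats):            # keep the best of several runs
            start = time.perf_counter()
            fn(a, b)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


if __name__ == "__main__":
    xs = [float(i) for i in range(100_000)]
    ys = [1.0] * len(xs)
    assert fused(xs, ys) == unfused(xs, ys)  # the transformation preserves results
    print(pick_fastest({"unfused": unfused, "fused": fused}, xs, ys))
```

In a real setting the candidate variants would correspond to different legal fusion partitions of the loop dependence graph rather than two hand-written functions.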



Author information

Authors and Affiliations

Institute for Computer Systems Architecture, University of Edinburgh, Scotland, UK
T. J. Ashby & M. F. P. O’Boyle

Editor information

Editors and Affiliations

Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, CB3 0FD, Cambridge, UK
Alan Mycroft
Saarland University, Germany
Andreas Zeller

About this paper

Cite this paper

Ashby, T.J., O’Boyle, M.F.P. (2006). Iterative Collective Loop Fusion. In: Mycroft, A., Zeller, A. (eds) Compiler Construction. CC 2006. Lecture Notes in Computer Science, vol 3923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11688839_17

DOI: https://doi.org/10.1007/11688839_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33050-9
Online ISBN: 978-3-540-33051-6
eBook Packages: Computer Science, Computer Science (R0)
