Abstract
This paper presents a system, Loopy, for programming loop transformations. Manual loop transformation can be tedious and error-prone, while fully automated methods do not guarantee improvements. Loopy takes a middle path: a programmer specifies a loop transformation at a high level, which is then carried out automatically by Loopy, and formally verified to guard against specification and implementation mistakes. Loopy ’s notation offers considerable flexibility with assembling transformations, while automation and checking prevent errors. Loopy is implemented for the LLVM framework, building on a polyhedral compilation library. Experiments show substantial improvements over fully automated loop transformations, using simple and direct specifications.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The name is an obvious pun on ‘loop’ transformation. Moreover, “loopy” is slang for “crazy”, which we hope is not how this work strikes the reader!
References
Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison Wesley, Boston (2006)
Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, San Francisco (2001)
Banerjee, U.: Loop Transformations for Restructuring Compilers: Dependence analysis. Kluwer (1997)
Benabderrahmane, M., Pouchet, L., Cohen, A., Bastoul, C.: The polyhedral model is more widely applicable than you think. In: Compiler Construction, 19th International Conference, CC 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, 20–28 March 2010, Proceedings, pp. 283–303 (2010). http://dx.doi.org/10.1007/978-3-642-11970-5_16
Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. In: Proceedings ofthe ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, AZ, USA, 7–13 June 2008, pp. 101–113 (2008). http://doi.acm.org/10.1145/1375581.1375595
Chen, C., Chame, J., Hall, M.: CHiLL: A framework for composing high-level loop transformations. Technical Report 08–897, University of Southern California (2008)
Donadio, S., Brodman, J., Roeder, T., Yotov, K., Barthou, D., Cohen, A., Garzarán, M.J., Padua, D.A., Pingali, K.K.: A language for the compact representation of multiple program versions. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds.) LCPC 2005. LNCS, vol. 4339, pp. 136–151. Springer, Heidelberg (2006)
Feautrier, P.: Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. Int. J. Parallel Program. 21(6), 389–420 (1992). http://dx.doi.org/10.1007/BF01379404
Girbal, S., Vasilache, N., Bastoul, C., Cohen, A., Parello, D., Sigler, M., Temam, O.: Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. Int. J. Parallel Program. 34(3), 261–317 (2006). http://dx.doi.org/10.1007/s10766-006-0012-3
Grosser, T., Größlinger, A., Lengauer, C.: Polly-performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. 22(4) (2012). http://dx.doi.org/10.1142/S0129626412500107
Grosser, T., Verdoolaege, S., Cohen, A.: Polyhedral AST generation is more than scanning polyhedra. ACM Trans. Program. Lang. Syst. 37(4) (2015). http://dx.doi.org/10.1145/2743016. Article no. 12
Hartono, A., Norris, B., Sadayappan, P.: Annotation-based empirical performance tuning using Orio. In: 23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, 23–29 May 2009, pp. 1–11 (2009). http://dx.doi.org/10.1109/IPDPS.2009.5161004
Intel: Intel Math Kernel Library (MKL) (2016). https://software.intel.com/en-us/intel-mkl/
Kelly, W., Pugh, W.: A framework for unifying reordering transformations. Technical Report UMIAS-TR-92-126.1, Univ. of Maryland, College Park, MD, USA (1993)
Khan, M.M., Basu, P., Rudy, G., Hall, M.W., Chen, C., Chame, J.: A script-based autotuning compiler system to generate high-performance CUDA code. TACO 9(4), 31 (2013). http://doi.acm.org/10.1145/2400682.2400690
Lamport, L.: The parallel execution of DO loops. Commun. ACM 17(2), 83–93 (1974). http://doi.acm.org/10.1145/360827.360844
Lattner, C., Adve, V.S.: LLVM: A compilation framework for lifelong program analysis & transformation. In: CGO, pp. 75–88 (2004). llvm.org
Muchnick, S.S.: Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco (1997)
Namjoshi, K.S., Singhania, N.: Loopy: programmable and formally verified loop transformations. Technical Report MS-CIS-16-04, Department of Computer and Information Science, University of Pennsylvania (2016)
Pouchet, L.N.: Polybench, the polyhedral benchmark suite (2015). http://polybench.sourceforge.net/
Püschel, M., Moura, J.M.F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R.W., Rizzolo, N.: SPIRAL: code generation for DSP transforms. Proc. IEEE 93(2), 232–275 (2005). Program Generation, Optimization, and Adaptation
Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., Amarasinghe, S.: Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In: Proceedings of the 34th ACMSIGPLAN Conference on Programming Language Design and Implementation, pp. 519–530, PLDI 2013. ACM, New York (2013). http://doi.acm.org/10.1145/2491956.2462176
Rudy, G., Khan, M.M., Hall, M.W., Chen, C., Chame, J.: A programming language interface to describe transformations and code generation. In: Languages and Compilers for Parallel Computing - 23rd International Workshop, LCPC 2010, Houston, TX, USA, 7–9 October 2010, Revised Selected Papers, pp. 136–150 (2010). http://dx.doi.org/10.1007/978-3-642-19595-2_10
Spampinato, D.G., Püschel, M.: A basic linear algebra compiler. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 23:23–23:32, CGO 2014. ACM, New York (2014). http://doi.acm.org/10.1145/2544137.2544155
Steuwer, M., Fensch, C., Lindley, S., Dubach, C.: Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code. SIGPLAN Not. 50(9), 205–217 (2015). http://doi.acm.org/10.1145/2858949.2784754
Tiwari, A., Chen, C., Chame, J., Hall, M.W., Hollingsworth, J.K.: A scalable auto-tuning framework for compiler optimization. In: 23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, 23–29 May 2009, pp. 1–12 (2009). http://dx.doi.org/10.1109/IPDPS.2009.5161054
Verdoolaege, S.: isl: an integer set library for the polyhedral model. In: Mathematical Software - ICMS 2010, Third International Congress on Mathematical Software, Kobe, Japan, 13–17 September 2010, Proceedings, pp. 299–302 (2010). http://dx.doi.org/10.1007/978-3-642-15582-6_49
Whaley, R.C., Dongarra, J.J.: Automatically tuned linear algebra software. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, pp. 1–27, SC 1998. IEEE Computer Society, Washington, DC (1998). http://dl.acm.org/citation.cfm?id=509058.509096
Yi, Q.: POET: a scripting language for applying parameterized source-to-source program transformations. Softw. Pract. Exper. 42(6), 675–706 (2012). http://dx.doi.org/10.1002/spe.1089
Acknowledgements
We would like to thank our colleagues at Bell Labs and at the University of Pennsylvania for their helpful comments on this research. This work was supported, in part, by DARPA under agreement number FA8750-12-C-0166. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer-Verlag GmbH Germany
About this paper
Cite this paper
Namjoshi, K.S., Singhania, N. (2016). Loopy: Programmable and Formally Verified Loop Transformations. In: Rival, X. (eds) Static Analysis. SAS 2016. Lecture Notes in Computer Science(), vol 9837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53413-7_19
Download citation
DOI: https://doi.org/10.1007/978-3-662-53413-7_19
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53412-0
Online ISBN: 978-3-662-53413-7
eBook Packages: Computer ScienceComputer Science (R0)