Beyond Do Loops: Data Transfer Generation with Convex Array Regions

  • Serge Guelton
  • Mehdi Amini
  • Béatrice Creusillet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7760)

Abstract

Automatic data transfer generation is a critical step for guided or automatic code generation for accelerators using distributed memories. Although good results have been achieved for loop nests, more complex control flows such as switches or while loops are generally not handled. This paper shows how to leverage the convex array regions abstraction to generate data transfers. The scope of this study ranges from inter-procedural analysis in simple loop nests with function calls, to inter-iteration data reuse optimization and arbitrary control flow in loop bodies. Generated transfers are approximated when an exact solution cannot be found. Array regions are also used to extend redundant load store elimination to array variables. The approach has been successfully applied to GPUs and domain-specific hardware accelerators.
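As a rough illustration of the idea, and not the paper's actual algorithm, the sketch below offloads a loop whose body contains a data-dependent branch. The `region` type, the `kernel` and `offload` functions, and the use of `malloc`/`memcpy` as a stand-in for accelerator memory are all assumptions made for this example. The read region of the loop is the interval `{ a[phi] | lo <= phi <= hi }`. Because of the branch, the set of written elements is not known statically, so the store uses the same interval as an over-approximation. Copying the whole interval in beforehand keeps that approximate store safe, since elements the kernel leaves untouched are written back with their original values.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical 1-D convex region: { a[phi] | lo <= phi <= hi }. */
typedef struct { size_t lo, hi; } region;

/* Kernel run on the "accelerator" copy; buf[0] corresponds to a[r.lo].
 * The data-dependent guard means only some elements are written. */
void kernel(int *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (buf[i] % 2 == 0)
            buf[i] += 1;
}

/* Offloads: for (i = lo; i <= hi; i++) if (a[i] % 2 == 0) a[i] += 1;
 * Returns 0 on success, -1 if the stand-in device buffer cannot be allocated. */
int offload(int *a, region r) {
    size_t n = r.hi - r.lo + 1;
    int *dev = malloc(n * sizeof *dev);       /* stand-in for accelerator memory */
    if (!dev)
        return -1;
    memcpy(dev, a + r.lo, n * sizeof *dev);   /* load: the read region */
    kernel(dev, n);
    memcpy(a + r.lo, dev, n * sizeof *dev);   /* store: over-approximated write region */
    free(dev);
    return 0;
}
```

Calling `offload(a, (region){2, 5})` on an eight-element array moves only elements 2 to 5 in each direction; the rest of `a` is never transferred.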

Keywords

data transfers, convex array regions, redundant transfer elimination, GPU

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Serge Guelton (1)
  • Mehdi Amini (2, 3)
  • Béatrice Creusillet (3)

  1. Telecom Bretagne, Brest, France
  2. MINES ParisTech/CRI, Fontainebleau, France
  3. HPC-Project, Meudon, France
