ASSIST: An FDO Source-to-Source Transformation Tool for HPC Applications
Abstract
The complexity and diversity of computer architectures have increased dramatically over the last decade, which makes it impossible in practice to manually optimize codes for all of these architectures. In addition, compilers must remain conservative in their optimization choices because they rely on a static cost model. One way to guide them is to use feedback data obtained by profiling a given application on a representative training dataset (feedback-directed or profile-guided optimization, FDO/PGO). Based on that knowledge, it becomes possible to add specific compiler directives and/or flags to enhance performance. Moreover, automatic transformations that simplify portions of the application (e.g. specialization) can be applied. In this paper we present ASSIST, a directive-oriented source-to-source manipulation tool that aims at providing such assistance. The tool is integrated into the MAQAO toolset and takes advantage of all the available static and dynamic profiling data produced by the other tools. It also features a set of code transformations triggered by directives. The combination of both leads to an autotuning process that helps users keep their code as generic as possible while still benefiting from performance gains derived from feedback data or user knowledge. We demonstrate how to build a tool that behaves like a compiler's PGO mode and compare our first results to the Intel compiler PGO mode.
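To make the idea of profile-guided specialization concrete, here is a minimal sketch in C. It is not taken from ASSIST and does not use its directive syntax. The function names (`scale_rows_generic`, `scale_rows_n4`, `scale_rows`) and the assumption that profiling showed an inner trip count of 4 to be dominant are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Generic kernel: the inner trip count n is only known at run time,
   so the compiler has to generate a general-purpose loop. */
void scale_rows_generic(double *a, size_t rows, size_t n, double s)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < n; ++j)
            a[i * n + j] *= s;
}

/* Specialized clone for n == 4. The value 4 is an assumption standing in
   for "the trip count that dominated the training run". With a constant
   bound, the inner loop can be fully unrolled. */
static void scale_rows_n4(double *a, size_t rows, double s)
{
    for (size_t i = 0; i < rows; ++i) {
        double *row = a + i * 4;
        row[0] *= s;
        row[1] *= s;
        row[2] *= s;
        row[3] *= s;
    }
}

/* Dispatcher: take the specialized path when the profiled value occurs,
   otherwise fall back to the generic code so behavior is unchanged. */
void scale_rows(double *a, size_t rows, size_t n, double s)
{
    if (n == 4)
        scale_rows_n4(a, rows, s);
    else
        scale_rows_generic(a, rows, n, s);
}
```

The original source stays generic. Only the dispatcher and the clone are added, which is the kind of change a source-to-source tool could generate automatically once feedback data identifies the frequent value.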
Notes
Acknowledgements
We would like to thank Gabriel Staffelbach (CERFACS) for providing our laboratory with the AVBP application, as well as Ghislain Lartigue and Vincent Moureau (CORIA) for providing us with YALES2. This work has been carried out by the Li-PaRAD laboratory, PeXL and the Exascale Computing Research laboratory, with the support of CEA, Intel, and UVSQ. Intel granted us dedicated access to a Skylake SP machine on which the experiments were run. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of CEA, Intel, or UVSQ.