Performance Evaluation of Stencil Computations Based on Source-to-Source Transformations

  • Víctor Martínez
  • Matheus S. Serpa
  • Pablo J. Pavan
  • Edson Luiz Padoin
  • Philippe O. A. Navaux
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 979)

Abstract

Stencil computations are common in High Performance Computing (HPC) applications; they consist of a pattern that replicates the same calculation across a data domain. The Finite-Difference Method is an example of a stencil computation, and it is used to solve real problems in diverse areas related to Partial Differential Equations (electromagnetics, fluid dynamics, geophysics, etc.). Although a large body of literature on the optimization of this class of applications is available, performance evaluation and optimization on different HPC architectures remain a challenge. In this work, we implemented the 7-point Jacobian stencil in a Source-to-Source Transformation Framework (BOAST) to evaluate the performance of different HPC architectures. The results show that the same source code can be executed on current architectures with a performance improvement, which helps programmers develop applications without depending on hardware features.
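
To make the stencil pattern concrete, the following is a minimal sketch of one sweep of a 7-point stencil on a 3D grid, written in plain Python for readability. It is not the BOAST implementation evaluated in the paper: the function names `jacobi7_step` and `make_grid` and the coefficients `c0` and `c1` are assumptions chosen for illustration only.

```python
def make_grid(n, value=0.0):
    """Build an n x n x n grid as nested lists (hypothetical helper)."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


def jacobi7_step(src, c0=0.4, c1=0.1):
    """Apply one sweep of a 7-point stencil to the interior of a 3D grid.

    c0 weights the centre point and c1 weights each of the six face
    neighbours; both values are illustrative assumptions.
    """
    nz, ny, nx = len(src), len(src[0]), len(src[0][0])
    # Write into a separate output grid so every update reads only
    # values from the previous sweep. Boundary points are copied as-is.
    dst = [[row[:] for row in plane] for plane in src]
    for k in range(1, nz - 1):
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                neighbours = (
                    src[k][j][i - 1] + src[k][j][i + 1]
                    + src[k][j - 1][i] + src[k][j + 1][i]
                    + src[k - 1][j][i] + src[k + 1][j][i]
                )
                dst[k][j][i] = c0 * src[k][j][i] + c1 * neighbours
    return dst


if __name__ == "__main__":
    grid = make_grid(5)
    grid[2][2][2] = 1.0          # single non-zero point in the centre
    result = jacobi7_step(grid)
    print(result[2][2][2], result[2][2][1])
```

The same calculation is repeated at every interior point, which is why this kind of loop nest is a natural target for source-to-source generation of architecture-specific variants.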

Keywords

Stencil applications · Heterogeneous architectures · Source-to-source transformation · Performance evaluation · Performance improvement

Notes

Acknowledgments

This work has been supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and the Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS). The research has received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under the HPC4E Project, grant agreement no. 689772. It was also supported by Intel under the Modern Code project, and by the PETROBRAS oil company under Ref. 2016/00133-9. We also thank RICAP, partially funded by the Ibero-American Program of Science and Technology for Development (CYTED), Ref. 517RT0529.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Víctor Martínez (1)
  • Matheus S. Serpa (1)
  • Pablo J. Pavan (1)
  • Edson Luiz Padoin (2)
  • Philippe O. A. Navaux (1)

  1. Informatics Institute, UFRGS, Porto Alegre, Brazil
  2. Department of Exact Sciences and Engineering, UNIJUI, Ijuí, Brazil
