
Improving Performance and Energy Efficiency of Geophysics Applications on GPU Architectures

  • Pablo J. Pavan
  • Matheus S. Serpa
  • Emmanuell Diaz Carreño
  • Víctor Martínez
  • Edson Luiz Padoin
  • Philippe O. A. Navaux
  • Jairo Panetta
  • Jean-François Mehaut
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 979)

Abstract

Energy consumption and performance of parallel systems are a growing concern for new large-scale systems, and research in response to this challenge aims at building more energy-efficient systems. In this context, this paper proposes optimization methods that exploit algorithm and GPU memory characteristics to improve the performance and energy efficiency of geophysics applications. Applied to Graphics Processing Unit (GPU) algorithms for stencil applications, our optimizations achieve a performance improvement of up to 44.65% compared with the read-only version. The computational results show that combining read-only memory, Z-axis internalization, and reuse of architecture-specific registers increases energy efficiency by up to 54.11% when shared memory is used and by up to 44.53% when read-only memory is used.
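The abstract names Z-axis internalization and register reuse without spelling them out. A common formulation of this idea gives each GPU thread one (x, y) column and lets it march along Z, keeping the Z-neighbors of the current point in a small sliding window of local variables (registers on a GPU), so each value in the column is loaded from memory once instead of 2R+1 times. The sketch below is a minimal C++ emulation of that per-thread loop on the CPU, checked against a naive stencil; the radius `R`, the coefficients `kCoef` (a 4th-order Laplacian for unit grid spacing), and all names are assumptions for illustration, not the paper's kernel. On a GPU, the X/Y neighbors would typically come from a shared-memory tile or the read-only data cache; here they are plain loads.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stencil radius and coefficients (assumption: 4th-order
// Laplacian, unit spacing; center = 3 * (-5/2)).
constexpr int R = 2;
constexpr std::array<float, R + 1> kCoef = {-7.5f, 4.0f / 3.0f, -1.0f / 12.0f};

// Dense 3D grid stored x-fastest, then y, then z.
struct Grid {
    int nx, ny, nz;
    std::vector<float> v;
    Grid(int x, int y, int z)
        : nx(x), ny(y), nz(z), v(static_cast<std::size_t>(x) * y * z, 0.0f) {}
    float& at(int x, int y, int z) { return v[(static_cast<std::size_t>(z) * ny + y) * nx + x]; }
    float at(int x, int y, int z) const { return v[(static_cast<std::size_t>(z) * ny + y) * nx + x]; }
};

// Reference version: every neighbor is re-read from memory for every point.
void stencil_naive(const Grid& in, Grid& out) {
    assert(in.nx == out.nx && in.ny == out.ny && in.nz == out.nz);
    for (int z = R; z < in.nz - R; ++z)
        for (int y = R; y < in.ny - R; ++y)
            for (int x = R; x < in.nx - R; ++x) {
                float acc = kCoef[0] * in.at(x, y, z);
                for (int k = 1; k <= R; ++k)
                    acc += kCoef[k] * (in.at(x - k, y, z) + in.at(x + k, y, z) +
                                       in.at(x, y - k, z) + in.at(x, y + k, z) +
                                       in.at(x, y, z - k) + in.at(x, y, z + k));
                out.at(x, y, z) = acc;
            }
}

// Z-internalized version: the body of the (x, y) loop is what one GPU thread
// would run. q[i] holds in(x, y, z - R + i), i.e. the column's Z-window.
void stencil_zmarch(const Grid& in, Grid& out) {
    assert(in.nx == out.nx && in.ny == out.ny && in.nz == out.nz);
    if (in.nz < 2 * R + 1) return;  // no interior points along Z
    for (int y = R; y < in.ny - R; ++y)
        for (int x = R; x < in.nx - R; ++x) {
            std::array<float, 2 * R + 1> q{};
            // Prime the window with planes 0 .. 2R-1 in q[1 .. 2R].
            for (int i = 0; i < 2 * R; ++i) q[i + 1] = in.at(x, y, i);
            for (int z = R; z < in.nz - R; ++z) {
                // Slide the window one plane and load only the new top plane.
                for (int i = 0; i < 2 * R; ++i) q[i] = q[i + 1];
                q[2 * R] = in.at(x, y, z + R);
                float acc = kCoef[0] * q[R];
                for (int k = 1; k <= R; ++k)
                    acc += kCoef[k] * (in.at(x - k, y, z) + in.at(x + k, y, z) +
                                       in.at(x, y - k, z) + in.at(x, y + k, z) +
                                       q[R - k] + q[R + k]);  // Z-neighbors from registers
                out.at(x, y, z) = acc;
            }
        }
}

// Largest element-wise difference between two grids of the same shape.
float max_abs_diff(const Grid& a, const Grid& b) {
    assert(a.v.size() == b.v.size());
    float m = 0.0f;
    for (std::size_t i = 0; i < a.v.size(); ++i) m = std::fmax(m, std::fabs(a.v[i] - b.v[i]));
    return m;
}
```

Both functions compute the same interior values; the Z-marching variant trades 2R redundant Z-loads per point for 2R+1 values kept in registers, which is the kind of memory-traffic reduction the optimizations above target.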

Keywords

Geophysics applications · Manycore systems · Energy efficiency · GPU

Notes

Acknowledgments

This research has received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under the HPC4E Project, grant agreement no. 689772. It was also supported by Intel under the Modern Code project and by the PETROBRAS oil company under Ref. 2016/00133-9. We also thank RICAP, partially funded by the Ibero-American Program of Science and Technology for Development (CYTED), Ref. 517RT0529.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Pablo J. Pavan (1; corresponding author)
  • Matheus S. Serpa (1)
  • Emmanuell Diaz Carreño (2)
  • Víctor Martínez (1)
  • Edson Luiz Padoin (1, 3)
  • Philippe O. A. Navaux (1)
  • Jairo Panetta (4)
  • Jean-François Mehaut (5)
  1. Informatics Institute, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  2. Department of Informatics, Federal University of Paraná (UFPR), Curitiba, Brazil
  3. Department of Exact Sciences and Engineering, Regional University of the Northwest of the State of Rio Grande do Sul (UNIJUI), Ijuí, Brazil
  4. Computer Science Division, Technological Institute of Aeronautics (ITA), São José dos Campos, Brazil
  5. Laboratoire d’Informatique de Grenoble, University of Grenoble (UGA), Grenoble, France
