Abstract
Energy consumption and performance of parallel systems are a growing concern for new large-scale systems, and research in response to this challenge aims to build more energy-efficient systems. In this context, this paper proposes optimization methods that exploit algorithm and GPU memory characteristics to improve the performance and energy efficiency of geophysics applications. Applied to Graphics Processing Unit (GPU) stencil algorithms, our optimizations improve performance by up to 44.65% compared with the read-only version. The computational results show that combining read-only memory, Z-axis internalization, and reuse of architecture-specific registers increases energy efficiency by up to 54.11% when shared memory is used and by up to 44.53% when read-only memory is used.
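The abstract names Z-axis internalization and register reuse without showing them. Below is a minimal sketch of our reading of the general idea, not the authors' implementation. It assumes a 7-point, radius-1 stencil with hypothetical coefficients `c0` and `c1` and hypothetical function names. Each (x, y) column is swept along Z, and the z-1, z, and z+1 values are kept in local variables (registers on a GPU) that rotate forward one plane per step, so each column value is fetched from memory only once. The code is sequential C++ so it can be checked against a naive version; in a CUDA kernel the two outer loops would become one thread per (x, y) point, and the in-plane neighbor loads would go through shared memory or the read-only data cache.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index into a flattened nx*ny*nz grid with x varying fastest.
inline std::size_t idx(int x, int y, int z, int nx, int ny) {
    return (static_cast<std::size_t>(z) * ny + y) * nx + x;
}

// Reference 7-point stencil: every neighbor is re-read from memory.
void stencil_naive(const std::vector<float>& in, std::vector<float>& out,
                   int nx, int ny, int nz, float c0, float c1) {
    for (int z = 1; z < nz - 1; ++z)
        for (int y = 1; y < ny - 1; ++y)
            for (int x = 1; x < nx - 1; ++x)
                out[idx(x, y, z, nx, ny)] =
                    c0 * in[idx(x, y, z, nx, ny)] +
                    c1 * (in[idx(x - 1, y, z, nx, ny)] + in[idx(x + 1, y, z, nx, ny)] +
                          in[idx(x, y - 1, z, nx, ny)] + in[idx(x, y + 1, z, nx, ny)] +
                          in[idx(x, y, z - 1, nx, ny)] + in[idx(x, y, z + 1, nx, ny)]);
}

// Z-march with a register queue (hypothetical name): one (x, y) column per
// "thread"; the z-1, z, z+1 column values live in local variables and are
// rotated each step instead of being reloaded.
void stencil_zmarch(const std::vector<float>& in, std::vector<float>& out,
                    int nx, int ny, int nz, float c0, float c1) {
    assert(nz >= 3);  // the queue needs three planes to start
    for (int y = 1; y < ny - 1; ++y)
        for (int x = 1; x < nx - 1; ++x) {  // on a GPU: one thread per (x, y)
            float behind  = in[idx(x, y, 0, nx, ny)];
            float current = in[idx(x, y, 1, nx, ny)];
            float infront = in[idx(x, y, 2, nx, ny)];
            for (int z = 1; z < nz - 1; ++z) {
                out[idx(x, y, z, nx, ny)] =
                    c0 * current +
                    c1 * (in[idx(x - 1, y, z, nx, ny)] + in[idx(x + 1, y, z, nx, ny)] +
                          in[idx(x, y - 1, z, nx, ny)] + in[idx(x, y + 1, z, nx, ny)] +
                          behind + infront);
                // Rotate the queue one plane forward along Z.
                behind = current;
                current = infront;
                if (z + 2 < nz) infront = in[idx(x, y, z + 2, nx, ny)];
            }
        }
}
```

Both functions add the neighbors in the same order, so they produce bit-identical results; the only difference is how often each column value is loaded, which is the memory traffic this kind of optimization targets.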
Acknowledgments
This research has received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under the HPC4E Project, grant agreement no. 689772. It was also supported by Intel under the Modern Code project, and by the PETROBRAS oil company under Ref. 2016/00133-9. We also thank RICAP, partially funded by the Ibero-American Program of Science and Technology for Development (CYTED), Ref. 517RT0529.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Pavan, P.J. et al. (2019). Improving Performance and Energy Efficiency of Geophysics Applications on GPU Architectures. In: Meneses, E., Castro, H., Barrios Hernández, C., Ramos-Pollan, R. (eds) High Performance Computing. CARLA 2018. Communications in Computer and Information Science, vol 979. Springer, Cham. https://doi.org/10.1007/978-3-030-16205-4_9
DOI: https://doi.org/10.1007/978-3-030-16205-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16204-7
Online ISBN: 978-3-030-16205-4
eBook Packages: Computer Science, Computer Science (R0)