Skip to main content

Improving Performance and Energy Efficiency of Geophysics Applications on GPU Architectures

  • Conference paper
  • First Online:
High Performance Computing (CARLA 2018)

Abstract

Energy and performance of parallel systems are an increasing concern for new large-scale systems. Research has been developed in response to this challenge aiming the manufacture of more energy efficient systems. In this context, this paper proposes optimization methods to accelerate performance and increase energy efficiency of geophysics applications used in conjunction to algorithm and GPU memory characteristics. The optimizations we developed applied to Graphics Processing Units (GPU) algorithms for stencil applications achieve a performance improvement of up to 44.65% compared with the read-only version. The computational results have shown that the combination of use read-only memory, the Z-axis internalization and reuse of specific architecture registers allow increase the energy efficiency of up to 54.11% when shared memory was used and increase of up to 44.53% when read-only was used.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Bauer, M., Cook, H., Khailany, B.: Cudadma: optimizing GPU memory bandwidth via warp specialization. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2011, pp. 12:1–12:11. ACM, New York (2011). https://doi.org/10.1145/2063384.2063400. http://doi.acm.org/10.1145/2063384.2063400

  2. de la Cruz, R., Araya-Polo, M.: Towards a multi-level cache performance model for 3D stencil computation. Procedia Comput. Sci. 4, 2146–2155 (2011)

    Article  Google Scholar 

  3. Datta, K., et al.: Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, p. 4. IEEE Press (2008)

    Google Scholar 

  4. Dong, Y., Chen, J., Tang, T.: Power measurements and analyses of massive object storage system. In: Proceedings of the International Conference on Computer and Information Technology (CIT), pp. 1317–1322. IEEE Computer Society (2010). https://doi.org/10.1109/CIT.2010.237

  5. Falch, T.L., Elster, A.C.: Register caching for stencil computations on GPUs. In: 2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, pp. 479–486. IEEE, September 2014. https://doi.org/10.1109/SYNASC.2014.70

  6. Feng, X., Ge, R., Cameron, K.W.: Power and energy profiling of scientific applications on distributed systems. In: International Parallel and Distributed Processing Symposium (IPDPS), International Conference on Performance Engineering, p. 34. IEEE (2005). https://doi.org/10.1109/IPDPS.2005.346

  7. Hamilton, B., Webb, C.J., Gray, A., Bilbao, S.: Large stencil operations for GPU-based 3-d acoustics simulations. In: Proceedings of the Digital Audio Effects (DAFx), Trondheim, Norway (2015)

    Google Scholar 

  8. Laros, J., et al.: Topics on measuring real power usage on high performance computing platforms. In: Proceedings of the International Conference on Cluster Computing and Workshops (ICCC), pp. 1–8 (2009). https://doi.org/10.1109/CLUSTR.2009.5289179

  9. Maruyama, N., Aoki, T.: Optimizing stencil computations for NVIDIA Kepler GPUs. In: Proceedings of the 1st International Workshop on High-Performance Stencil Computations, Vienna, pp. 89–95 (2014)

    Google Scholar 

  10. Micikevicius, P.: 3D finite difference computation on GPUs using CUDA. In: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2, pp. 79–84. ACM, New York (2009). https://doi.org/10.1145/1513895.1513905. http://doi.acm.org/10.1145/1513895.1513905

  11. Nasciutti, T.C., Panetta, J.: Impacto da arquitetura de memória de GPGPUs na velocidade de computaçãpoundso de estênceis. In: XVII Simpósio de Sistemas Computacionais (WSCAD-SSC), Aracaju, SE, pp. 1–8 (2016)

    Google Scholar 

  12. Nikitin, V.V., Duchkov, A.A., Andersson, F.: Parallel algorithm of 3D wave-packet decomposition of seismic data: implementation and optimization for GPU. J. Comput. Sci. 3(6), 469–473 (2012)

    Article  Google Scholar 

  13. Padoin, E.L., de Oliveira, D.A.G., Velho, P., Navaux, P.O.A., Mehaut, J.F.: ARM-based cluster: performance, scalability and energy efficiency. In: 4th Workshop on Applications for Multi-Core Architectures (WAMCA SBAC-PAD), Porto de Galinhas, PB, Brasil, pp. 1–6 (2013)

    Google Scholar 

  14. Padoin, E.L., Pilla, L.L., Boito, F.Z., Kassick, R.V., Velho, P., Navaux, P.O.: Evaluating application performance and energy consumption on hybrid CPU+GPU architecture. Cluster Comput. 16(3), 511–525 (2013)

    Article  Google Scholar 

  15. Schafer, A., Fey, D.: High performance stencil code algorithms for GPGPUs. Procedia Comput. Sci. 4, 2027–2036 (2011). https://doi.org/10.1016/j.procs.2011.04.221. http://www.sciencedirect.com/science/article/pii/S1877050911002791. proceedings of the International Conference on Computational Science, ICCS 2011

  16. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009). https://doi.org/10.1145/1498765.1498785. http://doi.acm.org/10.1145/1498765.1498785

  17. Xue, Q., Wang, Y., Zhan, Y., Chang, X.: An efficient GPU implementation for locating micro-seismic sources using 3D elastic wave time-reversal imaging. Comput. Geosci. 82, 89–97 (2015)

    Article  Google Scholar 

  18. Zhou, G., et al.: A novel GPU-accelerated strategy for contingency screening of static security analysis. Int. J. Electr. Power Energy Syst. 83, 33–39 (2016)

    Article  Google Scholar 

  19. Zhou, J., Unat, D., Choi, D.J., Guest, C.C., Cui, Y.: Hands-on performance tuning of 3D finite difference earthquake simulation on GPU fermi chipset. Procedia Comput. Sci. 9, 976–985 (2012)

    Article  Google Scholar 

Download references

Acknowledgments

This research has received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under the HPC4E Project, grant agreement n.o 689772. It was also supported by Intel under the Modern Code project, and the PETROBRAS oil company under Ref. 2016/00133-9. We also thank to RICAP, partially funded by the Ibero-American Program of Science and Technology for Development (CYTED), Ref. 517RT0529.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Pablo J. Pavan .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Pavan, P.J. et al. (2019). Improving Performance and Energy Efficiency of Geophysics Applications on GPU Architectures. In: Meneses, E., Castro, H., Barrios Hernández, C., Ramos-Pollan, R. (eds) High Performance Computing. CARLA 2018. Communications in Computer and Information Science, vol 979. Springer, Cham. https://doi.org/10.1007/978-3-030-16205-4_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-16205-4_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16204-7

  • Online ISBN: 978-3-030-16205-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics