Porting the Princeton Ocean Model to GPUs

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8630)

Abstract

While GPUs are becoming a compelling acceleration solution for a range of scientific applications, most existing work on climate models has achieved only limited speedup. This is due to partial porting of the large code bases and the inherently memory-bound nature of these models. In this work, we design and implement a customized GPU-based acceleration of the Princeton Ocean Model (gpuPOM). Targeting Nvidia’s state-of-the-art GPU architectures (K20X and K40m), we completely rewrite the original model from Fortran into CUDA-C. We present several acceleration methods, including optimizing memory access on a single GPU and overlapping communication and boundary operations among multiple GPUs. The experimental results show that, compared with different Intel CPUs, gpuPOM achieves a 6.9-fold to 17.8-fold speedup on one K40m GPU and a 5.8-fold to 15.5-fold speedup on one K20X GPU. Further experiments on multiple GPUs indicate that the performance of gpuPOM on a super-workstation containing 4 GPUs is equivalent to that of a powerful cluster consisting of 34 pure CPU nodes with over 400 CPU cores.
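
The abstract does not describe how the multi-GPU overlap is implemented. One common reading is that the halo (boundary) exchange between devices is started asynchronously, interior points that need no neighbor data are updated in the meantime, and the boundary points are updated once the exchange has finished. The following is a minimal CPU-only sketch of that pattern in Python, not the authors' CUDA-C code: the 3-point averaging stencil, the two-way domain split, and every function name here are assumptions made for illustration, and a background thread stands in for a device-to-device or MPI transfer.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth(left, mid, right):
    """3-point average; a hypothetical stand-in for a real stencil kernel."""
    return (left + mid + right) / 3.0


def serial_step(field):
    """Reference: one step over the whole domain, end values held fixed."""
    out = list(field)
    for i in range(1, len(field) - 1):
        out[i] = smooth(field[i - 1], field[i], field[i + 1])
    return out


def fetch_halo(neighbor, index):
    """Hypothetical stand-in for transferring one edge value between devices."""
    return neighbor[index]


def overlapped_step(field, split):
    """One step with the domain split into two 'devices' at index `split`.

    Both halves must contain at least two points.
    """
    a, b = field[:split], field[split:]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # 1. Start both halo transfers in the background.
        halo_for_a = pool.submit(fetch_halo, b, 0)   # right halo of a
        halo_for_b = pool.submit(fetch_halo, a, -1)  # left halo of b
        # 2. Meanwhile, update interior points, which need no halo data.
        new_a, new_b = list(a), list(b)
        for i in range(1, len(a) - 1):
            new_a[i] = smooth(a[i - 1], a[i], a[i + 1])
        for i in range(1, len(b) - 1):
            new_b[i] = smooth(b[i - 1], b[i], b[i + 1])
        # 3. Wait for the transfers, then update the two boundary points.
        new_a[-1] = smooth(a[-2], a[-1], halo_for_a.result())
        new_b[0] = smooth(halo_for_b.result(), b[0], b[1])
    return new_a + new_b
```

Calling `overlapped_step(field, split)` should give the same values as `serial_step(field)` for any split that leaves at least two points on each side; the only difference is that the transfer and the interior update can proceed concurrently. In a CUDA setting the same ordering would typically be expressed with separate streams for the copy and the interior kernel.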

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Xu, S., Huang, X., Zhang, Y., Hu, Y., Fu, H., Yang, G. (2014). Porting the Princeton Ocean Model to GPUs. In: Sun, Xh., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2014. Lecture Notes in Computer Science, vol 8630. Springer, Cham. https://doi.org/10.1007/978-3-319-11197-1_1

  • DOI: https://doi.org/10.1007/978-3-319-11197-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11196-4

  • Online ISBN: 978-3-319-11197-1

  • eBook Packages: Computer Science, Computer Science (R0)
