OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters

  • Conference paper
Recent Advances in the Message Passing Interface (EuroMPI 2012)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7490)

Abstract

General-Purpose Graphics Processing Units (GPGPUs) are becoming a common component of modern supercomputing systems. Many MPI applications are being modified to take advantage of the superior compute potential offered by GPUs. To facilitate this process, many MPI libraries are being extended to support MPI communication from GPU device memory. However, there has been no standardized benchmark suite that helps users evaluate common communication models on GPU clusters and compare different MPI libraries fairly. In this paper, we extend the widely used OSU Micro-Benchmarks (OMB) suite with benchmarks that evaluate the performance of point-to-point, multi-pair, and collective MPI communication for different GPU cluster configurations. We illustrate the benefits of the proposed benchmarks with the MVAPICH2 and Open MPI libraries.
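To make the measurement model concrete, the following is a minimal sketch of a device-to-device ping-pong latency test in the spirit of the GPU-extended point-to-point benchmarks described above. It is not the suite's actual code: it assumes a CUDA-aware MPI library (e.g., MVAPICH2 or Open MPI built with CUDA support) that accepts GPU device pointers directly in MPI calls, and the message-size sweep, iteration counts, MPI_CHAR datatype, and per-rank GPU selection are illustrative choices only.

/*
 * Sketch: D-to-D ping-pong latency between two MPI ranks, with the message
 * buffer resident in GPU device memory. Requires a CUDA-aware MPI library.
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define MAX_MSG   (1 << 22)   /* 4 MB upper bound for the size sweep  */
#define ITERS     1000        /* timed iterations per message size    */
#define SKIP      100         /* warm-up iterations (not timed)       */

int main(int argc, char **argv)
{
    int rank, size;
    char *d_buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* select a GPU for this rank (assumes one GPU per rank is available)
       and allocate the message buffer in device memory */
    cudaSetDevice(rank % 2);
    cudaMalloc((void **)&d_buf, MAX_MSG);
    cudaMemset(d_buf, 'a', MAX_MSG);

    for (int msg = 1; msg <= MAX_MSG; msg *= 2) {
        double t_start = 0.0;

        for (int i = 0; i < ITERS + SKIP; i++) {
            if (i == SKIP) {               /* start timing after warm-up */
                MPI_Barrier(MPI_COMM_WORLD);
                t_start = MPI_Wtime();
            }
            if (rank == 0) {
                /* device pointer passed straight to MPI (CUDA-aware path) */
                MPI_Send(d_buf, msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(d_buf, msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(d_buf, msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(d_buf, msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0) {
            /* one-way latency = half the average round-trip time, in us */
            double lat = (MPI_Wtime() - t_start) * 1e6 / (2.0 * ITERS);
            printf("%-10d %10.2f us\n", msg, lat);
        }
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

The same skeleton generalizes to the host-to-device and device-to-host configurations by allocating one endpoint's buffer with malloc instead of cudaMalloc, which is how the benchmark suite distinguishes the different GPU cluster configurations mentioned in the abstract.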






Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bureddy, D., Wang, H., Venkatesh, A., Potluri, S., Panda, D.K. (2012). OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_16

  • DOI: https://doi.org/10.1007/978-3-642-33518-1_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33517-4

  • Online ISBN: 978-3-642-33518-1

  • eBook Packages: Computer Science, Computer Science (R0)
