
Improving MPI communication overlap with collaborative polling


As parallel applications grow in complexity, the need for computational power continues to increase. Recent trends in High-Performance Computing (HPC) have shown that improvements in single-core performance will not be sufficient to meet the challenges of an exascale machine: we expect an enormous growth in the number of cores as well as a multiplication of the data volume exchanged across compute nodes. To scale applications up to Exascale, the communication layer has to minimize the time spent waiting for network messages. This paper presents a message-progression technique based on Collaborative Polling which allows an efficient, auto-adaptive overlap of communication phases with computation. This approach is novel in that it increases an application's overlap potential without introducing the overhead of a threaded message progression. We implemented our approach for InfiniBand inside MPC, a thread-based MPI runtime. We evaluate the gain from Collaborative Polling on the NAS Parallel Benchmarks and three scientific applications, showing significant improvements in communication time of up to a factor of 2.






This paper is the result of work performed at the Exascale Computing Research Lab with support provided by CEA, GENCI, Intel, and UVSQ. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of CEA, GENCI, Intel, or UVSQ. We acknowledge that the results in this paper have been achieved using the PRACE Research Infrastructure resource Curie, based in France at Bruyères-le-Châtel.

Author information



Corresponding author

Correspondence to Sylvain Didelot.


Cite this article

Didelot, S., Carribault, P., Pérache, M. et al. Improving MPI communication overlap with collaborative polling. Computing 96, 263–278 (2014).
