MUST: A Scalable Approach to Runtime Error Detection in MPI Programs

  • Tobias Hilbrich
  • Martin Schulz
  • Bronis R. de Supinski
  • Matthias S. Müller
Conference paper


The Message-Passing Interface (MPI) is large and complex, which makes programming with MPI error-prone. Several MPI runtime correctness tools address classes of usage errors, such as deadlocks or non-portable constructs. To our knowledge, none of these tools scales to more than about 100 processes. However, some current HPC systems use more than 100,000 cores, and future systems are expected to use far more. Since errors often depend on the task count used, we need correctness tools that scale to the full system size. To address this need, we present a novel framework for scalable MPI correctness tools. Our fine-grained, module-based approach supports rapid prototyping and allows correctness tools built upon it to adapt to different architectures and use cases. The design uses PnMPI to instantiate a tool from a set of individual modules. We present an overview of our design, along with first performance results for a proof-of-concept implementation.
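
To make the idea of instantiating a tool from individual modules more concrete, the following is a minimal sketch in Python. It is not the framework's or PnMPI's interface: the `Call` record, the two check modules, and `build_tool` are hypothetical names chosen only for illustration, and the calls are plain records rather than real MPI operations.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Call:
    """A recorded MPI-like call (hypothetical record, not a real MPI binding)."""
    name: str
    rank: int
    count: int
    dest: int
    comm_size: int


# A module inspects one call and returns a (possibly empty) list of messages.
Module = Callable[[Call], List[str]]


def check_count(call: Call) -> List[str]:
    """Hypothetical module: flag negative element counts."""
    if call.count < 0:
        return [f"rank {call.rank}: {call.name} has negative count {call.count}"]
    return []


def check_dest(call: Call) -> List[str]:
    """Hypothetical module: flag destination ranks outside the communicator."""
    if not 0 <= call.dest < call.comm_size:
        return [f"rank {call.rank}: {call.name} targets invalid rank {call.dest}"]
    return []


def build_tool(modules: List[Module]) -> Module:
    """Instantiate a tool from a list of modules; each call visits them in order."""
    def tool(call: Call) -> List[str]:
        errors: List[str] = []
        for module in modules:
            errors.extend(module(call))
        return errors
    return tool


tool = build_tool([check_count, check_dest])
good = Call("MPI_Send", rank=0, count=4, dest=1, comm_size=2)
bad = Call("MPI_Send", rank=1, count=-1, dest=2, comm_size=2)

for message in tool(good) + tool(bad):
    print(message)
```

Because each check is a separate module, a different tool for another use case can be assembled simply by passing a different module list to `build_tool`, for example `build_tool([check_dest])` when only rank validation is wanted.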


Keywords: Message Passing Interface; State Tracker; Runtime Overhead; Type Match; Deadlock Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Tobias Hilbrich (1)
  • Martin Schulz
  • Bronis R. de Supinski
  • Matthias S. Müller
  1. GWT-TUD GmbH, Dresden, Germany
