MPL: Efficient Record/Replay of nondeterministic features of message passing libraries
A major source of problems when debugging message passing programs is the nondeterministic behavior of promiscuous receive and nonblocking test operations. This nondeterminism rules out cyclic debugging techniques, because the intrusion caused by a debugger is often large enough to change the order in which processes interact. This paper describes solutions for efficiently recording and replaying the nondeterministic features of message passing libraries (MPL) such as MPI or PVM. For promiscuous receive operations it is sufficient to keep track of the sender of each message, and for nonblocking test operations it is sufficient to keep track of the number of failed tests. The proposed solutions have been implemented for an existing MPI library. Performance measurements show that the time overhead of both record and replay executions is very low relative to the original (nondeterministic) execution, and that the log files remain very small.
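The two logging rules in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation and does not use a real MPI binding. `Mailbox`, `ANY_SOURCE`, `record_receive`, `replay_receive`, `count_failed_tests`, and `ReplayedTest` are hypothetical names introduced here. It assumes messages from one sender arrive in FIFO order, so that forcing the recorded sender during replay is enough to reproduce each wildcard receive.

```python
from collections import deque

ANY_SOURCE = -1  # hypothetical wildcard, standing in for a promiscuous receive


class Mailbox:
    """Toy mailbox for one process: a FIFO queue per sender plus arrival order."""

    def __init__(self):
        self.queues = {}
        self.arrival = deque()

    def deliver(self, sender, payload):
        self.queues.setdefault(sender, deque()).append(payload)
        self.arrival.append(sender)

    def receive(self, source=ANY_SOURCE):
        if source == ANY_SOURCE:
            source = self.arrival[0]       # whoever happened to arrive first
        self.arrival.remove(source)        # drop that sender's oldest arrival
        return source, self.queues[source].popleft()


def record_receive(mailbox, log):
    """Record phase: do the wildcard receive and log only the sender."""
    sender, payload = mailbox.receive(ANY_SOURCE)
    log.append(sender)
    return sender, payload


def replay_receive(mailbox, log_iter):
    """Replay phase: turn the wildcard receive into a receive from the logged sender."""
    return mailbox.receive(next(log_iter))


def count_failed_tests(outcomes):
    """Record phase for a nonblocking test: count failures before the first success."""
    failed = 0
    for succeeded in outcomes:
        if succeeded:
            break
        failed += 1
    return failed


class ReplayedTest:
    """Replay phase: fail exactly as many times as recorded, then succeed."""

    def __init__(self, failed_tests):
        self.remaining = failed_tests

    def test(self):
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True  # a real replay would have to wait here until the message is present


# Record run: the message from process 2 happens to arrive first.
rec_box, log = Mailbox(), []
rec_box.deliver(2, "b")
rec_box.deliver(1, "a")
recorded = [record_receive(rec_box, log) for _ in range(2)]

# Replay run: arrival order differs, but the log forces the same receive order.
rep_box, log_iter = Mailbox(), iter(log)
rep_box.deliver(1, "a")
rep_box.deliver(2, "b")
replayed = [replay_receive(rep_box, log_iter) for _ in range(2)]
```

In this sketch the log holds one sender identifier per wildcard receive and one integer per test sequence, which matches the abstract's claim that the log files remain small. Message contents are never logged, because the senders regenerate them during replay.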