Efficient Replay of PVM Programs
The paper defines replay of a distributed application as a function of three parameters: depth, width, and length. It addresses the problem of nondeterminism in distributed systems and proposes an efficient approach to tracing the behaviour of a PVM application in order to eliminate races in repeated executions. Detecting races in distributed computations requires a strongly consistent system of vector clocks, so such a system was adapted to a dynamic application model. Finally, the paper presents the architecture of a tool supporting replay of PVM applications.
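The abstract does not spell out the clock scheme, so the following is only a generic sketch of how vector clocks can expose a message race when the set of tasks is not fixed in advance. It assumes clocks stored as dictionaries keyed by task identifier (so a newly spawned task simply adds an entry), and the names `VectorClock`, `happened_before`, and `concurrent` are hypothetical, not taken from the paper or from PVM.

```python
class VectorClock:
    """Vector clock keyed by task id, so tasks may join at run time (assumed design)."""

    def __init__(self, owner):
        self.owner = owner
        self.ticks = {owner: 0}

    def local_event(self):
        # Advance this task's own component and return a snapshot (timestamp).
        self.ticks[self.owner] += 1
        return dict(self.ticks)

    def send(self):
        # A send is a local event; its timestamp travels with the message.
        return self.local_event()

    def receive(self, stamp):
        # Merge component-wise maxima, then count the receive as a local event.
        for task, count in stamp.items():
            self.ticks[task] = max(self.ticks.get(task, 0), count)
        return self.local_event()


def happened_before(a, b):
    """True if timestamp a causally precedes timestamp b (missing entries count as 0)."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    strictly_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and strictly_less


def concurrent(a, b):
    """True if two timestamps of distinct events are causally unordered."""
    return not happened_before(a, b) and not happened_before(b, a)


# Two tasks send to a third without any causal link between the sends.
t1, t2, t3 = VectorClock("t1"), VectorClock("t2"), VectorClock("t3")
s1 = t1.send()
s2 = t2.send()
r1 = t3.receive(s1)
race = concurrent(s1, s2)  # unordered sends to one receiver: arrival order may vary
```

In this sketch the two sends are causally unordered, so a receiver that accepts messages from any source could see them in either order. That is the kind of event a replay tool would need to record in the trace in order to reproduce the same ordering in a later execution.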