Modelling and Verification of a Checkpoint-Restart-Protocol
The paper presents a protocol for the treatment of temporary local crashes and communication failures in systems of communicating processes. Assuming the existence of a global partitioning of the processes’ interactions into a sequence of segments as well as the availability of a reliable memory to each participant, the recovery strategy lies in the provident permanent recording and discarding of checkpoint data and, in cases of failure, common restarts from some recent global checkpoint (boundary between two segments). Apart from the inherent limitations imposed by data flow and resources the processes may run totally asynchronously. The protocol is modelled by a PrT-net, which permits to formally state and verify the protocol’s service, namely guaranteeing liveness and the eventual progress of the system’s actions. The final part of this paper is devoted to the structure and basic ideas of this proof.
KeywordsMaster Process Communication Failure Firing Sequence Wait State Firing Rule
Unable to display preview. Download preview PDF.
- Baumgarten, B., and Ochsenschläger, P. On Termination and Phase Changes in the Presence of Unreliable Communication Arbeitspapier der GMD, St. Augustin (1984)Google Scholar
- Baumgarten, B., and Ochsenschläger, P. Modellierung und Verifikation eines Checkpoint-/Restart-Verfahrens Arbeitspapier der GMD, St. Augustin (1984)Google Scholar