Abstract
This paper presents a new scheme for recovering from errors caused by transient faults in a real-time multiprocessor system. The scheme, called dynamic redundancy at the task level, is implemented in a real-time multitasking environment. Using facilities provided by the operating system, the scheme creates backup tasks that serve as redundancy for the primary tasks. The paper introduces an algorithm that generates a fault-tolerant schedule for the tasks so that they recover from errors in the same way that retry or checkpointing does. A reliability model is proposed to evaluate the effectiveness of the scheme.
Cite this article
Li, W., Yuan, Y. Error recovery in a real-time multiprocessor system. J. of Comput. Sci. & Technol. 7, 83–87 (1992). https://doi.org/10.1007/BF02946170