
Adapting backward error recovery to parallel real time systems

  • Regular Papers
Journal of Computer Science and Technology

Abstract

This paper discusses the problem of adapting backward error recovery to parallel real-time systems. Because errors propagate among cooperating processes, an error occurring in one process may corrupt important outputs of other processes. A local output therefore has to be delayed until its validity is confirmed globally. Moreover, since backward error recovery relies on redundancy in computing time rather than in processing equipment, the actual execution time of a cooperating process may vary widely when it runs in an unreliable environment. These two problems are the primary obstacles to be removed. Previous studies have focused on how to eliminate the domino effect dynamically, but backward error recovery cannot be applied directly to parallel real-time systems even when no domino effect exists. How can output delays be reduced efficiently once the domino effect has been eliminated? How can the delay be estimated? How can the actual execution time of each process be calculated, and how can the processes be scheduled under unstable conditions? These questions have been largely neglected in the literature. This paper aims to provide satisfactory solutions to them, so that backward error recovery can be adopted efficiently in parallel real-time systems.
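
The requirement that a local output be delayed until its validity is confirmed globally can be pictured with a small sketch. This is an illustration only, not the paper's method: the class `DelayedOutputBuffer` and its methods `produce`, `confirm`, and `rollback` are hypothetical names, and the global validation step is assumed to be signalled from outside the buffer.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedOutputBuffer:
    """Holds a process's outputs until they are confirmed globally.

    Hypothetical illustration; the names are not taken from the paper.
    """
    pending: list = field(default_factory=list)   # produced, not yet validated
    released: list = field(default_factory=list)  # safe to emit externally

    def produce(self, value):
        # A local output is buffered, not emitted, when it is produced.
        self.pending.append(value)

    def confirm(self):
        # Global validation succeeded: release pending outputs in order.
        self.released.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        # An error was detected: discard unconfirmed outputs and
        # report how many were dropped.
        dropped = len(self.pending)
        self.pending.clear()
        return dropped


buf = DelayedOutputBuffer()
buf.produce("a")
buf.produce("b")
buf.confirm()             # "a" and "b" become visible
buf.produce("c")
dropped = buf.rollback()  # "c" never leaves the process
```

In this sketch the output delay is the time an item spends in `pending`, which is the quantity the abstract asks how to reduce and estimate.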




Cite this article

Zhou, D. Adapting backward error recovery to parallel real time systems. J. of Comput. Sci. & Technol. 7, 257–267 (1992). https://doi.org/10.1007/BF02946576


  • DOI: https://doi.org/10.1007/BF02946576
