Fault Control Using Triple Modular Redundancy (TMR)

  • Sharon Hudson
  • R. S. Shyama Sundar
  • Srinivas Koppu
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 710)

Abstract

Operating systems have been expanding rapidly in capabilities and resources. One unavoidable concern is the occurrence of faults in the system. A fault is an abnormal condition or defect in the system, and it can lead to one or more failures. To avoid such failures, the fault must be removed or controlled. The techniques most commonly used to control and isolate faults are replication and checkpointing. This paper aims to control detected faults using the long-established technique of triple modular redundancy (TMR), a form of N-modular redundancy. Although TMR offers very high reliability, it has not been used to build a fault-tolerant operating system. We propose a system that uses TMR to effectively mask and mitigate detected faults, providing uninterrupted use of the entire operating system.
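The masking idea behind TMR can be illustrated with a small majority voter. What follows is a minimal sketch, not the authors' implementation, which is not given here. It assumes three replicas of a deterministic computation whose outputs are hashable and can be compared for equality, and a single voter that is itself fault-free. The names `tmr_vote`, `run_tmr`, `tmr_reliability` and `VoterError` are hypothetical. The reliability function uses the standard TMR expression 3r² − 2r³, which assumes independent module failures and a perfect voter.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


class VoterError(RuntimeError):
    """Raised when no two of the three modules agree, so the fault cannot be masked."""


def tmr_vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Majority-vote the outputs of three redundant modules.

    Returns the majority value and the indices of the modules that disagreed
    with it, so a faulty module can be reported or isolated.
    Outputs must be hashable because they are counted with collections.Counter.
    """
    if len(outputs) != 3:
        raise ValueError("TMR requires exactly three module outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        # All three outputs differ: no majority exists, so the fault is not masked.
        raise VoterError(f"no majority among outputs {list(outputs)!r}")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


def run_tmr(modules: Sequence[Callable], *args) -> Tuple[Hashable, List[int]]:
    """Run the same computation on three replicas (here sequentially) and vote."""
    return tmr_vote([module(*args) for module in modules])


def tmr_reliability(r: float) -> float:
    """Reliability of TMR with a perfect voter and three independent modules of
    reliability r: at least two of the three must work, giving 3r^2 - 2r^3."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability in [0, 1]")
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    def square(x):
        return x * x

    def faulty_square(x):
        # Hypothetical module with an injected fault.
        return x * x + 1

    value, faulty = run_tmr([square, square, faulty_square], 7)
    print(value, faulty)  # 49 [2]
    print(round(tmr_reliability(0.9), 3))  # 0.972
```

Under these assumptions the single faulty replica is outvoted and identified, while the caller still receives the correct result. Note that TMR improves on a single module only when r > 0.5: `tmr_reliability(0.9)` is about 0.972, but `tmr_reliability(0.4)` is about 0.352. A practical system would also have to address voter failures and common-mode faults that affect all three replicas in the same way.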

Keywords

Fault control; Triple modular redundancy; Fault; Fault isolation; Fault correction; Fault tolerance

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Sharon Hudson (1)
  • R. S. Shyama Sundar (1)
  • Srinivas Koppu (1)
  1. School of Information and Technology (SITE), VIT University, Vellore, India