Modelling Fault Assumptions with Structural Failure Models
By formalising fault assumptions, fault models are essential for engineering fault-tolerant systems: a fault-tolerant system is designed and evaluated to tolerate all faults that are described by a fault model. The accuracy of a fault model is therefore of particular importance. An inaccurate fault model results in a system that is either not as resilient as expected or less efficient than possible. Unfortunately, many fault models used in the literature abstract away aspects that are relevant in realistic systems. For example, the prevalent threshold models abstract away dependences between faults, although there is empirical evidence that dependent faults are relevant in real-world systems [Tang and Iyer, 1992, 1993, Long et al., 1995, Amir and Wool, 1996, Weatherspoon et al., 2002, Bakkaloglu et al., 2002, Yalagandula et al., 2004, Warns et al., 2008]. Abstracting away dependent and, therefore, correlated faults is particularly problematic, as even small correlations have a significant impact on the dependability of a system [Tang and Iyer, 1992, Weatherspoon et al., 2002, Warns et al., 2008]. Likewise, threshold models hide the propagation of faults. As propagating faults have been observed in electric power transmission systems [Dobson et al., 2004, 2005], they are likely to occur in distributed computing systems as well.
Keywords: Fault Model, Threshold Model, Failure Model, Faulty Process, Crash Failure