Abstract
An important step in achieving robustness to run-time faults is the ability to detect and repair problems when they arise in a running system. Effective fault detection and repair could be greatly enhanced by run-time fault diagnosis and localization, since it would allow the repair mechanisms to focus adaptation effort on the parts most in need of attention. In this paper we describe an approach to run-time fault diagnosis that combines architectural models with spectrum-based reasoning for multiple fault localization. Spectrum-based reasoning is a lightweight technique that takes a form of trace abstraction and produces a list (ordered by probability) of likely fault candidates. We show how this technique can be combined with architectural models to support run-time diagnosis that can (a) scale to modern distributed software systems; (b) accommodate the use of black-box components and proprietary infrastructure for which one has neither a specification nor source code; and (c) handle inherent uncertainty about the probable cause of a problem even in the face of transient faults and faults that arise only when certain combinations of system components interact.
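To make the idea of spectrum-based reasoning concrete, the following is a minimal sketch and not the diagnosis algorithm used in the paper. It assumes that each observed transaction records which components were involved and whether it failed, and it ranks components with the Ochiai similarity coefficient. Ochiai is a simpler, single-fault statistical variant of spectrum-based fault localization, swapped in here for brevity in place of the probabilistic multiple-fault reasoning the abstract refers to. The component names (`web`, `db`, `cache`, `dispatcher`) and the function name `rank_components` are hypothetical.

```python
from math import sqrt


def rank_components(observations):
    """Rank components by Ochiai similarity to the observed failure pattern.

    observations: list of (involved_components, failed) pairs, where
    involved_components is a set of component names seen in one transaction
    and failed is True if that transaction was judged erroneous.
    Returns (component, score) pairs, most suspicious first.
    """
    total_failed = sum(1 for _, failed in observations if failed)
    components = set().union(*(involved for involved, _ in observations))

    scores = {}
    for c in components:
        # n11: failing transactions that involved c
        n11 = sum(1 for inv, failed in observations if c in inv and failed)
        # n10: passing transactions that involved c
        n10 = sum(1 for inv, failed in observations if c in inv and not failed)
        denom = sqrt(total_failed * (n11 + n10))
        scores[c] = n11 / denom if denom else 0.0

    # Sort by descending score; break ties by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical spectra: four transactions over four components.
observations = [
    ({"web", "db"}, True),
    ({"web", "cache"}, False),
    ({"dispatcher", "db"}, True),
    ({"web", "dispatcher", "cache"}, False),
]

ranking = rank_components(observations)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data set, `db` takes part in every failing transaction and in no passing one, so it is ranked first. A similarity coefficient of this kind scores components one at a time, so it cannot by itself express faults that arise only when particular combinations of components interact, which is one of the cases the approach described above is meant to handle.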
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Casanova, P., Schmerl, B., Garlan, D., Abreu, R. (2011). Architecture-Based Run-Time Fault Diagnosis. In: Crnkovic, I., Gruhn, V., Book, M. (eds) Software Architecture. ECSA 2011. Lecture Notes in Computer Science, vol 6903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23798-0_29
DOI: https://doi.org/10.1007/978-3-642-23798-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23797-3
Online ISBN: 978-3-642-23798-0
eBook Packages: Computer Science, Computer Science (R0)