Architecture-Based Run-Time Fault Diagnosis

  • Conference paper
Software Architecture (ECSA 2011)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6903)

Abstract

An important step in achieving robustness to run-time faults is the ability to detect and repair problems when they arise in a running system. Effective fault detection and repair could be greatly enhanced by run-time fault diagnosis and localization, since it would allow the repair mechanisms to focus adaptation effort on the parts most in need of attention. In this paper we describe an approach to run-time fault diagnosis that combines architectural models with spectrum-based reasoning for multiple fault localization. Spectrum-based reasoning is a lightweight technique that takes a form of trace abstraction and produces a list (ordered by probability) of likely fault candidates. We show how this technique can be combined with architectural models to support run-time diagnosis that can (a) scale to modern distributed software systems; (b) accommodate the use of black-box components and proprietary infrastructure for which one has neither a specification nor source code; and (c) handle inherent uncertainty about the probable cause of a problem even in the face of transient faults and faults that arise only when certain combinations of system components interact.
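To make the idea of ranking fault candidates from a trace abstraction concrete, the following is a minimal sketch in Python. It does not implement the spectrum-based reasoning approach described in the paper; as a simpler stand-in it uses the Ochiai similarity coefficient, a well-known spectrum-based fault localization metric, which yields similarity scores rather than probabilities. The function name, the component names, and the transaction data are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def rank_components(
    spectra: Sequence[Sequence[int]],
    failed: Sequence[bool],
    components: Sequence[str],
) -> List[Tuple[str, float]]:
    """Rank components by Ochiai similarity to the observed failures.

    spectra[i][j] is 1 if component j was involved in transaction i, else 0.
    failed[i] is True if transaction i was observed to fail.
    """
    total_failed = sum(1 for f in failed if f)
    scores = []
    for j, name in enumerate(components):
        # Failed and passed transactions in which component j took part.
        in_failed = sum(1 for row, f in zip(spectra, failed) if row[j] and f)
        in_passed = sum(1 for row, f in zip(spectra, failed) if row[j] and not f)
        denom = math.sqrt(total_failed * (in_failed + in_passed))
        # A component never involved (or no failures at all) scores 0.
        score = in_failed / denom if denom > 0 else 0.0
        scores.append((name, score))
    # Highest score first: the most suspicious component leads the list.
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical data: four observed transactions over three components.
components = ["web", "db", "cache"]
spectra = [
    [1, 1, 0],  # web + db         -> failed
    [1, 0, 1],  # web + cache      -> passed
    [1, 1, 1],  # web + db + cache -> failed
    [0, 0, 1],  # cache only       -> passed
]
failed = [True, False, True, False]

for name, score in rank_components(spectra, failed, components):
    print(f"{name}: {score:.3f}")
# db: 1.000
# web: 0.816
# cache: 0.408
```

In this toy data, "db" takes part in every failed transaction and in no passing one, so it ranks first. Only the involvement of each component and the pass/fail outcome are needed, which is why this style of analysis can be applied to black-box components.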

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Casanova, P., Schmerl, B., Garlan, D., Abreu, R. (2011). Architecture-Based Run-Time Fault Diagnosis. In: Crnkovic, I., Gruhn, V., Book, M. (eds) Software Architecture. ECSA 2011. Lecture Notes in Computer Science, vol 6903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23798-0_29

  • DOI: https://doi.org/10.1007/978-3-642-23798-0_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23797-3

  • Online ISBN: 978-3-642-23798-0

  • eBook Packages: Computer Science, Computer Science (R0)
