Abstract
As Grids grow in size and complexity, manual diagnosis of individual application faults becomes impractical and time-consuming. Quick and accurate identification of the root cause of failures is an important prerequisite for building reliable systems. We describe a pragmatic model-based technique for application-specific fault diagnosis based on indicators, symptoms, and rules. Customized wrapper services then apply this knowledge to reason about the root causes of failures. In addition to user-provided diagnosis models, we show that, given a set of past classified fault events, new models can be extracted through learning and used to diagnose new faults. We investigated and compared algorithms for supervised classification learning and cluster analysis. Our approach was implemented as part of the Otho Toolkit, which 'service-enables' legacy applications by synthesizing wrapper services.
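The indicator, symptom, and rule layering can be illustrated with a minimal sketch. The abstract does not give the paper's actual model format, so everything here is an assumption made for illustration: the indicator names (`exit_code`, `stderr`, `output_exists`), the symptom predicates, the rule contents, and the first-match evaluation order are all hypothetical and are not taken from the Otho Toolkit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Indicators: raw observations a wrapper service might collect about a run.
# The keys used below are hypothetical examples.
Indicators = Dict[str, object]


@dataclass
class Symptom:
    """A named predicate over indicators."""
    name: str
    holds: Callable[[Indicators], bool]


@dataclass
class Rule:
    """Maps a set of symptoms that must all be present to a diagnosis."""
    diagnosis: str
    required: List[str]


def diagnose(indicators: Indicators, symptoms: List[Symptom], rules: List[Rule]) -> str:
    """Return the diagnosis of the first rule whose symptoms are all present."""
    present = {s.name for s in symptoms if s.holds(indicators)}
    for rule in rules:
        if all(name in present for name in rule.required):
            return rule.diagnosis
    return "unknown"


# Hypothetical diagnosis model for a legacy command-line application.
SYMPTOMS = [
    Symptom("nonzero_exit", lambda i: i.get("exit_code", 0) != 0),
    Symptom("no_output_file", lambda i: not i.get("output_exists", False)),
    Symptom("disk_full_message", lambda i: "No space left" in str(i.get("stderr", ""))),
]

RULES = [
    # More specific rules come first because evaluation stops at the first match.
    Rule("disk quota exceeded", ["nonzero_exit", "disk_full_message"]),
    Rule("application crashed before writing output", ["nonzero_exit", "no_output_file"]),
]

if __name__ == "__main__":
    observed = {
        "exit_code": 1,
        "stderr": "write failed: No space left on device",
        "output_exists": False,
    }
    print(diagnose(observed, SYMPTOMS, RULES))
```

In this sketch the rules are written by hand, which corresponds to the user-provided models mentioned in the abstract. The learned models would instead be derived from past classified fault events, which this example does not attempt.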
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hofer, J., Fahringer, T. (2007). Grid Application Fault Diagnosis Using Wrapper Services and Machine Learning. In: Krämer, B.J., Lin, KJ., Narasimhan, P. (eds) Service-Oriented Computing – ICSOC 2007. ICSOC 2007. Lecture Notes in Computer Science, vol 4749. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74974-5_19
DOI: https://doi.org/10.1007/978-3-540-74974-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74973-8
Online ISBN: 978-3-540-74974-5
eBook Packages: Computer Science (R0)