A Black-Box Approach for Detecting the Failure Traces

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 426)

Abstract

Detecting failure traces can help system administrators recover from failures promptly and avoid them in the future. For system managers, it is not difficult to detect whether a failure is currently occurring, because they only need to watch a few key measurements: if these measurements exceed their normal thresholds, a failure event is generated. It is much more complicated, however, to detect failure traces, which are represented as failure-related events, because these traces may last for quite a long time and affect many components. Furthermore, current distributed systems add and remove components so quickly that administrators may not have enough time or knowledge to set a monitoring threshold for each of them. To address these problems, we propose the FTD system. We first compare each component's historical states and extract the outlier states as anomalous events. Then, combining these with the failure events provided by the system, we detect correlations between failure events and anomalous events and report them as failure traces. We evaluate our work on the KDD99 network intrusion benchmark and achieve good performance.
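The abstract describes three stages: key measurements crossing a fixed threshold yield failure events, per-component outliers relative to each component's own history yield anomalous events, and correlations between the two yield failure traces. The following is a minimal sketch of that pipeline under stated assumptions, not the authors' FTD implementation. The abstract does not say which outlier detector or correlation measure FTD uses, so the z-score outlier test, the fixed look-back window, the support-based correlation score, and every name here (`threshold_failures`, `detect_anomalies`, `correlate`, `k`, `window`, `min_support`) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Event:
    time: int        # discrete time step of the observation
    component: str   # component (or key measurement) that produced the event
    kind: str        # "failure" or "anomaly"


def threshold_failures(key_metrics, thresholds):
    """Failure events: a key measurement exceeds its fixed threshold."""
    return [
        Event(t, name, "failure")
        for name, series in key_metrics.items()
        for t, value in enumerate(series)
        if value > thresholds[name]
    ]


def detect_anomalies(history, k=3.0, min_history=5):
    """Anomalous events: a value is an outlier w.r.t. that component's own past.

    Hypothetical z-score test: no per-component threshold has to be configured,
    only the shared multiplier k.
    """
    events = []
    for component, series in history.items():
        for t in range(min_history, len(series)):
            past = series[:t]
            mu, sigma = mean(past), stdev(past)
            if sigma == 0:
                is_outlier = series[t] != mu  # any change from a constant history
            else:
                is_outlier = abs(series[t] - mu) / sigma > k
            if is_outlier:
                events.append(Event(t, component, "anomaly"))
    return events


def correlate(failures, anomalies, window=3, min_support=0.5):
    """For each component, the fraction of failures preceded (within `window`
    steps) by one of its anomalies; components with fraction >= min_support
    are returned as the candidate failure trace."""
    if not failures:
        return {}
    hits = {}
    for f in failures:
        preceding = {a.component for a in anomalies if 0 <= f.time - a.time <= window}
        for component in preceding:
            hits[component] = hits.get(component, 0) + 1
    total = len(failures)
    return {c: n / total for c, n in hits.items() if n / total >= min_support}


if __name__ == "__main__":
    history = {
        "db": [10, 11, 10, 9, 10, 10, 50, 10, 11],
        "web": [5, 5, 6, 5, 5, 6, 5, 5, 5],
    }
    key_metrics = {"latency_ms": [100, 110, 105, 120, 115, 118, 119, 900, 120]}
    failures = threshold_failures(key_metrics, {"latency_ms": 500})
    anomalies = detect_anomalies(history)
    print(failures)  # [Event(time=7, component='latency_ms', kind='failure')]
    print(correlate(failures, anomalies))  # {'db': 1.0}
```

Because each component is compared only with its own history, a newly added component needs no hand-set threshold, which is the motivation the abstract gives; a practical detector would also have to cope with drift and seasonality, which this sketch ignores.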



Acknowledgement

This research was supported by the National 863 Program (No. 2011AA01A203) and the National Natural Science Foundation (No. 61133004), P. R. China.

Author information

Corresponding author

Correspondence to You Meng.


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Meng, Y., Yu, L., Luan, Z., Qian, D., Xie, M., Du, Z. (2014). A Black-Box Approach for Detecting the Failure Traces. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2013. Communications in Computer and Information Science, vol 426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43908-1_32

  • DOI: https://doi.org/10.1007/978-3-662-43908-1_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43907-4

  • Online ISBN: 978-3-662-43908-1

  • eBook Packages: Computer Science, Computer Science (R0)
