A Black-Box Approach for Detecting the Failure Traces

Meng, You; Yu, Lang; Luan, Zhongzhi; Qian, Depei; Xie, Ming; Du, Zhigao

doi:10.1007/978-3-662-43908-1_32

Conference paper
First Online: 01 January 2014

Part of the book series: Communications in Computer and Information Science (CCIS, volume 426)

Abstract

Detecting failure traces helps system administrators recover from failures promptly and avoid them in the future. Detecting whether a failure is currently occurring is not difficult for system managers, because they are concerned with only a few key measurements: if one of these measurements exceeds its normal threshold, a failure event is generated. Detecting failure traces, which are represented as failure-related events, is much more complicated, because these traces may last for a long time and affect many components. Furthermore, current distributed systems add and remove components so quickly that administrators may not have enough time or knowledge to set a monitoring threshold for each of them. To address these problems, we propose the FTD system. We first compare each component's historical states and take the outlier states as anomalous events. Then, combined with the failure events that the system provides, we detect the correlations between failure events and anomalous events and report them as failure traces. We evaluate our work on KDD99, a network intrusion benchmark, and achieve good performance.
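The two steps described in the abstract can be illustrated with a minimal sketch. It is not the FTD implementation, which the abstract does not specify. It assumes a z-score test for outliers in each component's own history and a fixed look-back window for relating anomalies to failure events. The function names, the threshold of 3.0, and the window of 5 time steps are all hypothetical.

```python
from statistics import mean, stdev


def anomalous_events(history, z_threshold=3.0):
    """Return the time indices at which a component's metric is an outlier
    relative to its own history (assumed z-score test, not the paper's method)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant history has no outliers.
        return []
    return [t for t, x in enumerate(history) if abs(x - mu) / sigma > z_threshold]


def failure_trace_scores(anomalies_by_component, failure_times, window=5):
    """For each component, compute the fraction of failure events that were
    preceded by one of its anomalous events within `window` time steps."""
    scores = {}
    for name, times in anomalies_by_component.items():
        hits = sum(
            any(0 <= f - t <= window for t in times)
            for f in failure_times
        )
        scores[name] = hits / len(failure_times) if failure_times else 0.0
    return scores


# Hypothetical data: two components sampled over 40 time steps.
cpu = [10.0] * 40
cpu[7] = 90.0
cpu[25] = 90.0
mem = [50.0] * 40
failures = [9, 27]  # time steps of failure events reported by the system

anomalies = {"cpu": anomalous_events(cpu), "mem": anomalous_events(mem)}
print(failure_trace_scores(anomalies, failures))
```

A component with a high score is a candidate member of a failure trace. Because thresholds are derived from each component's own history, no per-component monitoring threshold has to be set by hand, which is the motivation the abstract gives for a black-box approach.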

Acknowledgement

This research was supported by the National 863 Program (No. 2011AA01A203) and the National Natural Science Foundation (No. 61133004), P. R. China.

Author information

Authors and Affiliations

Sino-German Joint Software Institute, Beihang University, Beijing, China
You Meng, Lang Yu, Zhongzhi Luan & Depei Qian
Tencent Corporation, Shenzhen, China
Ming Xie
CNPC Research Institute of Safety and Environment Technology, Beijing, China
Zhigao Du

Corresponding author

Correspondence to You Meng.

Editor information

Editors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, China
Yuyu Yuan
Beijing University of Posts and Telecommunications, Beijing, China
Xu Wu
Beijing University of Posts and Telecommunications, Beijing, China
Yueming Lu

About this paper

Cite this paper

Meng, Y., Yu, L., Luan, Z., Qian, D., Xie, M., Du, Z. (2014). A Black-Box Approach for Detecting the Failure Traces. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2013. Communications in Computer and Information Science, vol 426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43908-1_32

DOI: https://doi.org/10.1007/978-3-662-43908-1_32
Published: 27 June 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43907-4
Online ISBN: 978-3-662-43908-1
eBook Packages: Computer Science, Computer Science (R0)
