Mining Hot Clusters of Similar Anomalies for System Management

Zhang, Dapeng; Lin, Fen; Shi, Zhongzhi; Huang, Heqing

doi:10.1007/978-3-642-15246-7_34

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6230)

Abstract

Automatic system management has recently attracted much attention, particularly the mining of system log files for anomaly detection, diagnosis and prediction. An important problem in this area is mining hot clusters of similar anomalies. A hot anomaly cluster is defined as a largest group of similar anomalies whose similarity satisfies user-specified constraints. Because some major anomalies have common symptoms and are shared by several hot clusters, these clusters need not be disjoint. The problem therefore cannot be solved easily by existing clustering algorithms such as k-means and EM, which assign each item to a single cluster. In this paper we propose a novel heuristic clustering algorithm, named Hot Clustering (HC), for mining these patterns. The key idea of HC is to group neighboring anomalies into hot clusters according to heuristic rules. To validate our approach, we run k-means, EM and HC on bug reports from the Bugzilla database. The experimental results show that our approach is both efficient and effective for this problem.
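
The abstract does not spell out HC's heuristic rules, so the following is only an illustrative sketch of the problem setting, not the authors' algorithm. It assumes anomalies are numeric feature vectors, that the user-specified constraint is a minimum pairwise cosine similarity (`min_sim`), and that clusters are grown greedily from each seed; the function names and parameters are hypothetical. It shows how one anomaly can legitimately belong to several clusters.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_overlapping_clusters(vectors, min_sim=0.7, min_size=2):
    """Grow one cluster per seed; clusters may share members.

    Assumption (not from the paper): a cluster is valid when every pair
    of members has cosine similarity >= min_sim.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    found = set()
    for seed in range(n):
        cluster = [seed]
        # Visit the other anomalies from most to least similar to the seed.
        order = sorted((j for j in range(n) if j != seed),
                       key=lambda j: -sim[seed][j])
        for j in order:
            # Admit j only if it is similar enough to every current member.
            if all(sim[j][m] >= min_sim for m in cluster):
                cluster.append(j)
        if len(cluster) >= min_size:
            found.add(frozenset(cluster))
    # Drop clusters that are strict subsets of another found cluster.
    maximal = [c for c in found if not any(c < other for other in found)]
    return sorted((sorted(c) for c in maximal), key=lambda c: (-len(c), c))


# Anomaly 2 shares symptoms with both anomaly 0 and anomaly 1,
# while anomalies 0 and 1 have nothing in common.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
clusters = greedy_overlapping_clusters(vectors)
print(clusters)  # [[0, 2], [1, 2]]
```

Anomaly 2 appears in both clusters. A partitioning method such as k-means would have to place it in exactly one, which is the limitation the abstract points to.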

Author information

Authors and Affiliations

Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Dapeng Zhang, Fen Lin & Zhongzhi Shi
Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China
Heqing Huang
Graduate School of the Chinese Academy of Sciences, Beijing, 100039, China
Dapeng Zhang & Fen Lin
Institute of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004, China
Dapeng Zhang

Editor information

Editors and Affiliations

School of Computer Science and Engineering, Seoul National University, 151-744, Seoul, Korea
Byoung-Tak Zhang
Department of Computing, Macquarie University, NSW, Sydney, Australia
Mehmet A. Orgun

About this paper

Cite this paper

Zhang, D., Lin, F., Shi, Z., Huang, H. (2010). Mining Hot Clusters of Similar Anomalies for System Management. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_34

DOI: https://doi.org/10.1007/978-3-642-15246-7_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15245-0
Online ISBN: 978-3-642-15246-7
eBook Packages: Computer Science, Computer Science (R0)
