Abstract
Mining system log files for anomaly detection, diagnosis, and prediction has recently attracted much attention in automatic system management. An important problem in this area is mining hot clusters of similar anomalies. A hot anomaly cluster is defined as a largest-sized group of similar anomalies whose similarity satisfies user-specified constraints. Because some major anomalies have common symptoms and are shared by several hot clusters, these clusters do not have to be disjoint. The problem therefore cannot be easily solved by existing clustering algorithms such as k-means and EM. In this paper we propose a novel heuristic clustering algorithm, named Hot Clustering (HC), for mining these patterns. The key idea of HC is to group neighboring anomalies into hot clusters based on heuristic rules. To validate our approach, we run k-means, EM, and HC on bug reports from the Bugzilla database. The experimental results show that our approach is both efficient and effective for this problem.
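To make the notion of non-disjoint clusters concrete, the following is a minimal sketch and not the HC algorithm itself, whose heuristic rules are not given in the abstract. It assumes each anomaly report is a set of tokens, uses Jaccard similarity with a hypothetical threshold `min_sim` as the user-specified constraint, and grows one group greedily from every seed so that a report with shared symptoms can land in several groups. All function and parameter names are illustrative.

```python
from typing import FrozenSet, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def grow_group(seed: int, items: List[Set[str]], min_sim: float) -> FrozenSet[int]:
    """Greedily add every item that is similar to ALL current members."""
    group = [seed]
    for i in range(len(items)):
        if i != seed and all(jaccard(items[i], items[j]) >= min_sim for j in group):
            group.append(i)
    return frozenset(group)


def overlapping_groups(items: List[Set[str]], min_sim: float,
                       min_size: int = 2) -> List[List[int]]:
    """Return maximal groups, largest first; groups may share members."""
    groups = {grow_group(s, items, min_sim) for s in range(len(items))}
    groups = {g for g in groups if len(g) >= min_size}
    # Drop any group that is strictly contained in another one.
    maximal = [g for g in groups if not any(g < h for h in groups)]
    return sorted((sorted(g) for g in maximal), key=lambda g: (-len(g), g))


# Hypothetical bug reports, already tokenized (assumed data, not from the paper).
reports = [
    {"crash", "startup", "plugin"},
    {"crash", "profile", "login"},
    {"crash", "startup", "plugin", "profile", "login"},  # shares symptoms with both
    {"render", "font", "blur"},
    {"render", "font", "zoom"},
]
result = overlapping_groups(reports, min_sim=0.5)
print(result)
```

In this toy input, report 2 is similar to reports 0 and 1, which are not similar to each other, so it appears in two groups. A partitioning method such as k-means would have to place it in exactly one.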
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, D., Lin, F., Shi, Z., Huang, H. (2010). Mining Hot Clusters of Similar Anomalies for System Management. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_34
DOI: https://doi.org/10.1007/978-3-642-15246-7_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15245-0
Online ISBN: 978-3-642-15246-7
eBook Packages: Computer Science, Computer Science (R0)