Abstract
Mining patterns from log messages is valuable for real-time analysis and for detecting faults, anomalies, and security threats. A data-streaming algorithm with an efficient pattern-finding approach is a more practical way to classify these ubiquitous logs. In this paper, the authors therefore propose a novel online approach for finding patterns in log data sets, in which a locality-sensitive signature is generated for similar log messages. The similarity of log messages is identified by parsing them and then logically analyzing the signature bit streams associated with them. In addition, the approach reflects the change when an entirely new log message appears in the system. The proposed method is validated against SLCT by comparing the F-measure of the clustering results on labeled datasets and, for unlabeled datasets, the percentage of log messages in a cluster whose word order matches.
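The abstract does not describe how the signature is constructed or how clusters are matched. As a rough illustration only, the sketch below assumes a SimHash-style signature (a standard locality-sensitive technique, not necessarily the authors' construction) computed over whitespace tokens, a hypothetical parsing rule that masks any token containing a digit, and a single-pass clusterer that opens a new cluster whenever no existing cluster signature lies within a Hamming-distance threshold. The names `parse`, `signature`, and `OnlineLogClusterer`, the 64-bit width, and the threshold of 3 are illustrative assumptions; MD5 from the standard library stands in for a fast hash such as MurmurHash.

```python
import hashlib
import re

NUM_BITS = 64  # assumed signature width; not specified in the abstract


def parse(message: str) -> list[str]:
    """Tokenize a log line, masking variable fields.

    Hypothetical rule: any token containing a digit (IPs, PIDs, counters)
    is replaced by the wildcard '<*>'.
    """
    return ["<*>" if re.search(r"\d", tok) else tok for tok in message.split()]


def token_hash(token: str) -> int:
    # MD5 stands in for a non-cryptographic hash such as MurmurHash;
    # only the first 64 bits of the digest are used.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(tokens: list[str]) -> int:
    """SimHash-style signature: each bit is a majority vote over token hashes."""
    votes = [0] * NUM_BITS
    for tok in tokens:
        h = token_hash(tok)
        for i in range(NUM_BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    sig = 0
    for i, v in enumerate(votes):
        if v > 0:
            sig |= 1 << i
    return sig


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


class OnlineLogClusterer:
    """Single-pass clustering: each message is processed once, in arrival order."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold              # max distance to join a cluster
        self.representatives: list[int] = []    # first signature of each cluster
        self.members: list[list[str]] = []

    def add(self, message: str) -> int:
        """Assign a message to the nearest cluster, or open a new one."""
        sig = signature(parse(message))
        best, best_dist = -1, self.threshold + 1
        for cid, rep in enumerate(self.representatives):
            d = hamming(sig, rep)
            if d < best_dist:
                best, best_dist = cid, d
        if best == -1:
            # No cluster is close enough: treat this as a new kind of log.
            self.representatives.append(sig)
            self.members.append([message])
            return len(self.representatives) - 1
        self.members[best].append(message)
        return best


if __name__ == "__main__":
    clusterer = OnlineLogClusterer(threshold=3)
    for line in ["Connection from 10.0.0.1 closed",
                 "Connection from 10.0.0.7 closed",
                 "User admin logged in"]:
        print(clusterer.add(line), line)
```

The linear scan over cluster representatives is the simplest possible choice; a locality-sensitive hash index would avoid comparing each new signature against every cluster as the number of clusters grows.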
References
Vaarandi, R.: A data clustering algorithm for mining patterns from event logs. In: Proceedings of the IEEE IPOM 2003, pp. 119–126 (2003)
Makanju, A., Brooks, S., Zincir-Heywood, A.N., Milios, E.E.: LogView: Visualizing event log clusters. In: Sixth Annual Conference on Privacy, Security and Trust, PST 2008, pp. 99–108 (2008)
Muller-Molina, A.J., Shinohara, T.: Efficient similarity search by reducing I/O with compressed sketches. In: Proceedings of the Second International Workshop on Similarity Search and Applications, SISAP 2009, pp. 30–38. IEEE Computer Society, Washington, DC (2009)
Hansen, S.E., Atkins, E.T., Todd, E.: Automated system monitoring and notification with Swatch. In: Proceedings of the 7th Systems Administration Conference, Monterey, CA, pp. 145–155 (1993)
Stearley, J., Corwell, S., Lord, K.: Bridging the gaps: Joining information sources with Splunk. In: Proceedings of the Workshop on Managing Systems via Log Analysis and Machine Learning Techniques (2010)
Yamanishi, K., Maruyama, Y.: Dynamic syslog mining for network failure monitoring. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD 2005, pp. 499–508. ACM, New York (2005)
Seipel, D., Neubeck, P., Köhler, S., Atzmueller, M.: Mining complex event patterns in computer networks. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z.W. (eds.) NFMCP 2012. LNCS, vol. 7765, pp. 33–48. Springer, Heidelberg (2013)
Nagappan, M., Vouk, M.A.: Abstracting log lines to log event types for mining software system logs. In: 2010 7th IEEE Working Conference on Mining Software Repositories (MSR), pp. 114–117 (2010)
Mannila, H., Toivonen, H., Inkeri Verkamo, A.: Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery 1, 259–289 (1997)
Zheng, Q., Xu, K., Lv, W., Ma, S.: Intelligent search of correlated alarms from database containing noise data. In: 2002 IEEE/IFIP Network Operations and Management Symposium, NOMS 2002, pp. 405–419 (2002)
Wen, L., Wang, J., Aalst, W., Huang, B., Sun, J.: A novel approach for process mining based on event types. Journal of Intelligent Information Systems 32, 163–190 (2009)
Makanju, A.A., Zincir-Heywood, A.N., Milios, E.E.: Clustering event logs using iterative partitioning. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2009, pp. 1255–1264. ACM, New York (2009)
Demetrescu, C., Finocchi, I.: Algorithms for data streams. Handbook of Applied Algorithms: Solving Scientific, Engineering, and Practical Problems, 241 (2007)
Andoni, A.: Nearest Neighbor Search: the Old, the New, and the Impossible. PhD thesis, Massachusetts Institute of Technology (2009)
Panigrahy, R.: Hashing, Searching, Sketching. PhD thesis, Stanford University (2006)
Paulevé, L., Jégou, H., Amsaleg, L.: Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters 31, 1348–1358 (2010)
Slaney, M., Lifshits, Y., He, J.: Optimal parameters for locality-sensitive hashing. Proceedings of the IEEE 100, 2604–2623 (2012)
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006, pp. 459–468 (2006)
Broder, A., Mitzenmacher, M.: Network applications of bloom filters: A survey. Internet Mathematics 1, 485–509 (2004)
Song, H., Dharmapurikar, S., Turner, J., Lockwood, J.: Fast hash table lookup using extended bloom filter: an aid to network processing. SIGCOMM Comput. Commun. Rev. 35(4), 181–192 (2005)
Appleby, A.: MurmurHash 2.0 (2010), http://sites.google.com/site/murmurhash/
Fung, B.C., Wang, K., Ester, M.: Hierarchical document clustering using frequent itemsets. In: Proceedings of the Third SIAM International Conference on Data Mining (2003)
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Joshi, B., Bista, U., Ghimire, M. (2014). Intelligent Clustering Scheme for Log Data Streams. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_38
DOI: https://doi.org/10.1007/978-3-642-54903-8_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)