Intelligent Clustering Scheme for Log Data Streams

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

Mining patterns from log messages is valuable for real-time analysis and for detecting faults, anomalies, and security threats. A data-streaming algorithm with an efficient pattern-finding approach is a more practical way to classify these ubiquitous logs. In this paper, the authors therefore propose a novel online approach for finding patterns in log data sets, in which a locally sensitive signature is generated for similar log messages. The similarity of log messages is identified by parsing them and then logically analyzing the signature bit streams associated with them. In addition, the approach is intelligent enough to reflect changes when a totally new log appears in the system. The proposed method is validated against SLCT by comparing the F-measure of the clustering results for labeled datasets and, for unlabeled data, the percentage of matched word order among the log messages in a cluster.
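The abstract does not spell out how the signature is built or compared, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a Bloom-filter-style bit signature over the constant tokens of a message, a Jaccard comparison of set bits, and a fixed similarity threshold. The names `tokenize`, `signature`, `similarity`, and `StreamClusterer`, the 64-bit width, the digit-based filter for variable fields, and the 0.6 threshold are all hypothetical choices made for illustration.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width, not taken from the paper


def tokenize(message):
    """Split on whitespace and drop tokens containing digits.

    Assumption: tokens with digits (IPs, IDs, timestamps) are variable fields.
    """
    return [t for t in message.split() if not any(c.isdigit() for c in t)]


def signature(message):
    """Set one bit per remaining token, Bloom-filter style."""
    sig = 0
    for token in tokenize(message):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIGNATURE_BITS)
    return sig


def similarity(a, b):
    """Jaccard similarity of the set bits of two signatures."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


class StreamClusterer:
    """Assign each incoming message to the closest cluster, or open a new one."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.clusters = []  # list of (representative signature, messages)

    def add(self, message):
        sig = signature(message)
        best_index, best_score = None, 0.0
        for index, (representative, _) in enumerate(self.clusters):
            score = similarity(sig, representative)
            if score > best_score:
                best_index, best_score = index, score
        if best_index is not None and best_score >= self.threshold:
            self.clusters[best_index][1].append(message)
            return best_index
        # A message unlike anything seen so far starts a new cluster.
        self.clusters.append((sig, [message]))
        return len(self.clusters) - 1


clusterer = StreamClusterer()
first = clusterer.add("Connection from 10.0.0.1 closed")
second = clusterer.add("Connection from 10.0.0.2 closed")
third = clusterer.add("Disk quota exceeded for user alice")
print(first, second, third)
```

Each message is processed once and compared only against one representative signature per cluster, which is what makes a scheme of this kind suitable for a stream. The last branch of `add` corresponds to the abstract's point about reflecting a totally new log.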




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Joshi, B., Bista, U., Ghimire, M. (2014). Intelligent Clustering Scheme for Log Data Streams. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_38


  • DOI: https://doi.org/10.1007/978-3-642-54903-8_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science (R0)
