A Confidence-Guided Evaluation for Log Parsers Inner Quality


Logs faithfully record application behaviors and system states. Log parsing converts unstructured log messages into structured event templates by extracting the constant portion of raw logs. It is a prerequisite for further log analysis such as usage analysis, anomaly detection, performance modeling, and failure diagnosis. When processing logs of varied length, log parsing suffers from decreased accuracy or over-fitting. In addition, traditional probability-based accuracy assessment methods are ineffective at assessing the inner quality of log parsing, especially at explaining why accuracy declines on variable-length logs. In this paper we present a p-value-guided inner quality assessment of multiple log parsing algorithms. The method uses conformal evaluation to gain deeper insight into log parser quality, with the string edit distance as the underlying non-conformity measure. We introduce two quality indicators to evaluate log parsers: credibility and confidence. Credibility reflects how well a log message conforms to the event template that a log parser generated for it, whereas confidence reflects how strongly the same log message fails to conform to all other event templates. To demonstrate the inherent differences among log parsers, we display the distribution of the credibility and confidence of each prediction in a 2D t-SNE space. In the experiment, we evaluate 13 log parsers on different datasets. The results show that, compared to traditional confusion-matrix-based metrics, our approach effectively reveals the inherent quality of log parsers and recognizes the variable-length problem.
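
The two indicators can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a character-level Levenshtein distance between a message and a template string as the non-conformity score, and it assumes the common conformal-evaluation definitions in which credibility is the p-value for the assigned template and confidence is one minus the largest p-value over all other templates. The function names (`edit_distance`, `p_value`, `assess`) and the `<*>` wildcard in the sample templates are hypothetical choices made for illustration.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def p_value(message: str, template: str, members: list) -> float:
    """Fraction of a template's members that are at least as
    non-conformal to the template as `message` is (assumed definition)."""
    score = edit_distance(message, template)
    at_least = sum(1 for m in members if edit_distance(m, template) >= score)
    return at_least / len(members)


def assess(parsed: list) -> list:
    """Take (message, template) pairs as produced by some log parser and
    return (message, credibility, confidence) triples."""
    groups = defaultdict(list)
    for message, template in parsed:
        groups[template].append(message)

    results = []
    for message, template in parsed:
        # Credibility: p-value with respect to the assigned template.
        credibility = p_value(message, template, groups[template])
        # Confidence: 1 minus the best p-value among all other templates.
        others = [p_value(message, t, ms)
                  for t, ms in groups.items() if t != template]
        confidence = 1.0 - max(others, default=0.0)
        results.append((message, credibility, confidence))
    return results


if __name__ == "__main__":
    # Hypothetical parser output; the third message is much longer
    # than the template it was assigned to.
    parsed = [
        ("open file a", "open file <*>"),
        ("open file b", "open file <*>"),
        ("open file a b c d", "open file <*>"),
        ("close socket 1", "close socket <*>"),
        ("close socket 22", "close socket <*>"),
    ]
    for message, credibility, confidence in assess(parsed):
        print(f"{message!r}: credibility={credibility:.2f}, "
              f"confidence={confidence:.2f}")
```

In this toy input the over-long message receives a credibility of 1/3 while the two short messages under the same template receive 1.0, which is the kind of per-prediction signal that a single accuracy figure does not expose.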


This work is partially supported by the National Natural Science Foundation (61872200, 61872202, 11801284), the National Key Research and Development Program of China (2016YFC0400709, 2018YFB2100300), the Natural Science Foundation of Tianjin (18YFYZCG00060), and the CERNET Innovation Project (NGII20180401).

Author information

Correspondence to Zhi Wang.

About this article

Cite this article

Xie, X., Wang, Z., Xiao, X. et al. A Confidence-Guided Evaluation for Log Parsers Inner Quality. Mobile Netw Appl (2020).


  • Log parser
  • Conformal evaluation
  • Confidence
  • Credibility