Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

Event Log Cleaning for Business Process Analytics

  • Andreas Solti
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_87



Event log cleaning is a data preparation phase that turns event data into event logs to enable or improve the quality of business process analytics methods like process mining, model enrichment, and conformance checking. Event data might have to be collected from different sources and formats, filtered, transformed, and assigned to the corresponding processes and cases.
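
The preparation steps named above (collect, filter, transform, assign to cases) can be illustrated with a minimal Python sketch. It is not taken from the entry: the record layouts, field names, and the helper functions `normalize` and `build_event_log` are hypothetical, and the sketch assumes that every usable event already carries a case identifier.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records from two sources that use different
# field names and time stamp formats (illustrative data only).
RAW_A = [
    {"order": "o1", "task": "Create order", "time": "2019-01-01 09:00:00", "user": "ann"},
    {"order": "o1", "task": "Ship order", "time": "2019-01-02 15:30:00", "user": "bob"},
    {"order": "o2", "task": "Create order", "time": "2019-01-01 10:00:00", "user": "ann"},
]
RAW_B = [
    {"ref": "o1", "action": "Check credit", "ts": "2019-01-01T11:00:00", "by": "cat"},
    {"ref": None, "action": "Heartbeat", "ts": "2019-01-01T11:05:00", "by": "sys"},
]


def normalize(record, case_key, activity_key, time_key, resource_key, time_format):
    """Transform one source-specific record into a common event schema."""
    return {
        "case": record[case_key],
        "activity": record[activity_key],
        "timestamp": datetime.strptime(record[time_key], time_format),
        "resource": record[resource_key],
    }


def build_event_log(events):
    """Filter out events without a case and group the rest into ordered traces."""
    traces = defaultdict(list)
    for event in events:
        if event["case"] is None:
            continue  # filter: event cannot be assigned to any case
        traces[event["case"]].append(event)
    for trace in traces.values():
        trace.sort(key=lambda e: e["timestamp"])  # order events within a case
    return dict(traces)


# Collect from both sources, transform to the common schema, then build the log.
events = [normalize(r, "order", "task", "time", "user", "%Y-%m-%d %H:%M:%S") for r in RAW_A]
events += [normalize(r, "ref", "action", "ts", "by", "%Y-%m-%dT%H:%M:%S") for r in RAW_B]
log = build_event_log(events)

for case_id, trace in sorted(log.items()):
    print(case_id, [e["activity"] for e in trace])
```

In this sketch the event log is a mapping from case identifier to a time-ordered list of events. Real event data often lacks a reliable case identifier, in which case the assignment step needs a dedicated correlation technique rather than a simple grouping.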


The goal of business process analytics projects is to gain insights into the execution of business processes. It helps to know in advance which questions the analysis should answer. Typical questions are what is done (activities), when it is done or how long it takes (time stamps), in which order (relations), and by whom (resources). In contrast to traditional questionnaires, the process participants do not need to be asked personally about their perception of the process. In business process analytics, the event logs containing process...




Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Vienna University of Economics and Business, Vienna, Austria