DB-XES: Enabling Process Discovery in the Large

  • Alifah Syamsiyah
  • Boudewijn F. van Dongen
  • Wil M. P. van der Aalst
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 307)


Dealing with the abundance of event data is one of the main challenges in process discovery. Current process discovery techniques efficiently handle imported event log files that fit in the computer’s memory. Once data files grow beyond that, scalability drops quickly because the speed of data access becomes the limiting factor. This paper proposes a new technique based on relational database technology as a solution for scalable process discovery. A relational database is used both for storing event data (i.e., we move the location of the data) and for pre-processing the event data (i.e., we move some computations from analysis time to insertion time). To this end, we first introduce DB-XES, a database schema that resembles the standard XES structure; we provide a transparent way to access event data stored in DB-XES; and we show how this greatly reduces the memory requirements of state-of-the-art process discovery techniques. Second, we show how to move the computation of intermediate data structures to the database engine in order to reduce the time required during process discovery. The work presented in this paper is implemented in the ProM tool, and a range of experiments demonstrates the feasibility of our approach.
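The idea of moving computation from analysis time to insertion time can be illustrated with a minimal sketch. It assumes a deliberately simplified two-table schema (`event` and `dfr`), which is not the actual DB-XES schema, and it uses the directly-follows relation as one plausible example of an intermediate data structure. All table, column, trigger, and function names are hypothetical, and SQLite stands in for whatever database engine is used in practice. A trigger updates the directly-follows counts whenever an event is inserted, so a discovery algorithm only needs to read the small `dfr` table instead of loading the whole log into memory.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified (hypothetical) schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per event; pos is the event's index within its trace.
        CREATE TABLE event (
            trace_id TEXT NOT NULL,
            pos      INTEGER NOT NULL,
            activity TEXT NOT NULL,
            PRIMARY KEY (trace_id, pos)
        );

        -- Precomputed directly-follows relation: a is directly followed by b.
        CREATE TABLE dfr (
            a    TEXT NOT NULL,
            b    TEXT NOT NULL,
            freq INTEGER NOT NULL,
            PRIMARY KEY (a, b)
        );

        -- Insertion-time computation: each new event updates the count of
        -- the pair (previous activity in the same trace, new activity).
        CREATE TRIGGER update_dfr AFTER INSERT ON event
        BEGIN
            INSERT OR IGNORE INTO dfr (a, b, freq)
            SELECT prev.activity, NEW.activity, 0
            FROM event AS prev
            WHERE prev.trace_id = NEW.trace_id AND prev.pos = NEW.pos - 1;

            UPDATE dfr SET freq = freq + 1
            WHERE b = NEW.activity
              AND a = (SELECT prev.activity
                       FROM event AS prev
                       WHERE prev.trace_id = NEW.trace_id
                         AND prev.pos = NEW.pos - 1);
        END;
        """
    )
    return conn


def load_log(conn: sqlite3.Connection, log: dict) -> None:
    """Insert events trace by trace, in order; the trigger maintains dfr."""
    for trace_id, activities in log.items():
        for pos, activity in enumerate(activities):
            conn.execute(
                "INSERT INTO event (trace_id, pos, activity) VALUES (?, ?, ?)",
                (trace_id, pos, activity),
            )
    conn.commit()


def directly_follows(conn: sqlite3.Connection) -> dict:
    """Analysis-time step: read the precomputed relation, not the raw events."""
    rows = conn.execute("SELECT a, b, freq FROM dfr")
    return {(a, b): freq for a, b, freq in rows}


if __name__ == "__main__":
    conn = build_db()
    load_log(conn, {"t1": ["a", "b", "c"], "t2": ["a", "b", "d"]})
    print(directly_follows(conn))
```

The sketch assumes that the events of a trace arrive in order. Handling out-of-order or interleaved insertions would need a more careful update rule than this trigger provides.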


Process discovery, Process mining, Big event data, Relational database


Copyright information

© IFIP International Federation for Information Processing 2018

Authors and Affiliations

  • Alifah Syamsiyah (1)
  • Boudewijn F. van Dongen (1)
  • Wil M. P. van der Aalst (1)
  1. Eindhoven University of Technology, Eindhoven, The Netherlands
