LogRank: An Approach to Sample Business Process Event Log for Efficient Discovery
Modern information systems collect considerable amounts of business process event logs. Process discovery aims to uncover a process model from an event log. Many process discovery approaches have been proposed; however, most of them have difficulty handling large-scale event logs. Motivated by PageRank, in this paper we propose LogRank, a graph-based ranking model for event log sampling. Using LogRank, a large-scale event log can be sampled down to a smaller log that existing discovery approaches can handle efficiently. Moreover, we introduce an approach to measure the quality of a sample log with respect to the original one from a discovery perspective. The proposed sampling approach has been implemented in the open-source process mining toolkit ProM. Experimental analyses with both synthetic and real-life event logs demonstrate that the proposed sampling approach is an effective way to improve process discovery efficiency while ensuring high quality of the discovered model.
Keywords: LogRank, Log sampling, Process discovery, Quality measure
This work was supported in part by the NSFC under Grant 61472229, Grant 61602279, Grant 71704096, and Grant 31671588, in part by the Science and Technology Development Fund of Shandong Province of China under Grant 2016ZDJS02A11, Grant 2014GGX101035, and Grant ZR2017MF027, in part by the Taishan Scholar Climbing Program of Shandong Province, and in part by the SDUST Research Fund under Grant 2015TDJH102.
- 2. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: On the role of fitness, precision, generalization and simplicity in process discovery. In: Meersman, R., et al. (eds.) OTM 2012. LNCS, vol. 7565, pp. 305–322. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33606-5_19
- 4. Cheng, L., Kotoulas, S., Ward, T.E., Theodoropoulos, G.: Robust and efficient large-large table outer joins on distributed infrastructures. In: Silva, F., Dutra, I., Santos Costa, V. (eds.) Euro-Par 2014. LNCS, vol. 8632, pp. 258–269. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09873-9_22
- 5. Cheng, L., Li, T.: Efficient data redistribution to speedup big data analytics in large systems. In: 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), pp. 91–100. IEEE (2016)
- 8. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured process models from event logs - a constructive approach. In: Colom, J.-M., Desel, J. (eds.) PETRI NETS 2013. LNCS, vol. 7927, pp. 311–329. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38697-8_17
- 10. Liu, C., Duan, H., Qingtian, Z., Zhou, M., Lu, F., Cheng, J.: Towards comprehensive support for privacy preservation cross-organization business process mining. IEEE Trans. Serv. Comput. 1–15 (2016). https://doi.org/10.1109/TSC.2016.2617331
- 14. Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. Association for Computational Linguistics (2004)
- 15. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)