
Entropy as a Measure of Log Variability

  • Christoffer Olling Back
  • Søren Debois
  • Tijs Slaats
Original Article

Abstract

Process mining algorithms fall into two classes: imperative miners output flow diagrams, showing all possible paths, whereas declarative miners output constraints, showing the rules governing a process. But given a log, how do we know which of the two to apply? Assuming that logs exhibiting a large degree of variability are better suited to declarative miners, we can attempt to answer this question by defining a suitable measure of the variability of the log. This paper reports on an exploratory study into the use of entropy measures as metrics of variability. We survey notions of entropy used, e.g., in physics; we propose variant notions likely more suitable for the field of process mining; we provide an implementation of every entropy notion discussed; and we report entropy measures for a collection of both synthetic and real-life logs. Finally, based on anecdotal indications of which logs are better suited for declarative or imperative mining, we identify the most promising measures for future studies. For estimating overall entropy, global block and k-nearest neighbour estimators of entropy appear most promising and excel at identifying noise in logs. For estimating entropy rate, we identify Lempel–Ziv and certain variants of k-block estimators as performing well, and note that the former is more stable but sensitive to noise, while the latter is less stable, being sensitive to the cut-off constraints that determine block size.
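
To make the idea of entropy as a variability measure concrete, the following is a minimal sketch and not the authors' implementation. It assumes a log is simply a list of traces, each trace a list of activity names, and it uses naive plug-in (frequency-based) Shannon entropy. The function names `trace_entropy` and `k_block_entropy` are hypothetical, and the block variant shown here is only one simple way of counting length-k blocks within traces; the estimators studied in the paper may be defined differently.

```python
from collections import Counter
from math import log2


def _plug_in_entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p), written as log2(total / c) so every term is non-negative
    return sum((c / total) * log2(total / c) for c in counts.values())


def trace_entropy(log):
    """Entropy over whole-trace variants (illustrative assumption, see lead-in)."""
    return _plug_in_entropy(Counter(tuple(trace) for trace in log))


def k_block_entropy(log, k):
    """Entropy over length-k activity blocks taken inside each trace."""
    blocks = Counter(
        tuple(trace[i:i + k])
        for trace in log
        for i in range(len(trace) - k + 1)
    )
    return _plug_in_entropy(blocks)


# Two toy logs: one fully repetitive, one where every trace is a different variant.
structured = [["a", "b", "c"]] * 4
variable = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"], ["c", "b", "a"]]

print(trace_entropy(structured), trace_entropy(variable))
print(k_block_entropy(structured, 2), k_block_entropy(variable, 2))
```

Under these assumptions, the repetitive log has zero trace entropy, while the log with four equally frequent variants has two bits, which matches the intuition that higher entropy indicates higher variability.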

Keywords

Process mining, Hybrid models, Process variability, Process flexibility, Information theory, Entropy, Knowledge work

Notes

Acknowledgements

We would like to thank Jakob Grue Simonsen for valuable discussions.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, University of Copenhagen, Copenhagen Ø, Denmark
  2. IT University of Copenhagen, Copenhagen S, Denmark
