Data Mining and Knowledge Discovery, Volume 24, Issue 1, pp 218–246

Projection approaches to process mining using region-based techniques



Traces are everywhere, from information systems that store their continuous executions to health care applications that record each patient’s history. Transforming a set of traces into a mathematical model that can be used for formal reasoning is therefore of great value. The discovery of process models from traces is an interesting problem that has received significant attention in recent years. It is a central problem in Process Mining, a novel area that tries to close the cycle between system design and validation by resorting to methods for the automated discovery, analysis and extension of process models. In this work, algorithms for the derivation of a Petri net from a set of traces are presented. The methods are grounded on the theory of regions, which maps a model in the state-based domain (e.g., an automaton) into a model in the event-based domain (e.g., a Petri net). When dealing with large examples, a direct application of the theory of regions suffers from two problems. The first is the state-explosion problem: the resources required by algorithms that work at the state level are sometimes prohibitive. This paper introduces decomposition and projection techniques to alleviate the complexity of the region-based algorithms for Petri net discovery, thus extending their applicability to large inputs. The second is the overfitting problem for region-based approaches, which informally means that, in order to represent the trace set with high accuracy, the models obtained are often spaghetti-like. By focusing on a special type of processes, called conservative, for which an elegant theory and efficient algorithms can be devised, the techniques presented in this paper alleviate the overfitting problem and, moreover, incorporate structure into the generated models.
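To make the idea of projection concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes that a trace is a tuple of event names, that projection keeps only the occurrences of a chosen subset of events, and that the state-based model is a simple prefix-tree transition system. The names `project` and `prefix_transition_system` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Trace = Tuple[str, ...]


def project(log: Iterable[Trace], events: Set[str]) -> List[Trace]:
    """Keep only occurrences of the chosen events, preserving their order."""
    return [tuple(e for e in trace if e in events) for trace in log]


def prefix_transition_system(log: Iterable[Trace]) -> Dict[Tuple[Trace, str], Trace]:
    """Build a prefix-tree transition system (an assumption of this sketch).

    Each state is a trace prefix; each arc maps (state, event) to the next state.
    """
    arcs: Dict[Tuple[Trace, str], Trace] = {}
    for trace in log:
        for i, event in enumerate(trace):
            arcs[(trace[:i], event)] = trace[: i + 1]
    return arcs


# Two traces in which b and c occur in either order between a and d.
log = [("a", "b", "c", "d"), ("a", "c", "b", "d")]

# Projecting onto {a, b, d} hides c, so both traces collapse to the same one.
projected = project(log, {"a", "b", "d"})

full_ts = prefix_transition_system(log)
small_ts = prefix_transition_system(projected)
print(len(full_ts), len(small_ts))
```

In this toy log the projected transition system has fewer arcs than the full one, which illustrates why working on projections can reduce the cost of state-based algorithms. How the partial results are combined into a single Petri net is outside the scope of this sketch.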


Keywords: Process mining, Theory of regions, Petri nets





Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Universitat Politècnica de Catalunya, Barcelona, Spain
