A uniformization-based approach to preserve individuals’ privacy during process mining analyses

Abstract

Process mining is a set of techniques that aim to discover, monitor and improve real processes using the event logs created and stored by corporate information systems. The growing use of information and communication technologies and the imminent wide deployment of the Internet of Things enable the massive collection of events, which will be studied to improve the efficiency of all kinds of systems. Despite its enormous benefits, analyzing event logs might endanger individuals’ privacy, especially when those logs contain personal and confidential information, such as healthcare data. This article contributes to an emerging research direction within the process mining field, known as Privacy-Preserving Process Mining (PPPM), which embraces the privacy-by-design principle when conducting process mining analyses. We show that current solutions based on pseudonyms and encryption are vulnerable to attacks that analyze the distribution of events and combine it with well-known location-oriented attacks, such as the restricted space identification and object identification attacks. To counteract these attacks, we present u-PPPM, a novel privacy-preserving process mining technique based on the uniformization of event distributions. This approach protects the privacy of the individuals appearing in event logs while minimizing the information loss during process discovery analyses. Experimental results on six real-life event logs demonstrate the feasibility of our approach in real settings.
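To make the distribution-based weakness of pseudonymization concrete, the following is a minimal toy sketch. It is not the u-PPPM technique or the attack model evaluated in the article: the event log, the resource names, the `pseudonymize` helper and the attacker's background knowledge (that one resource handles the most events) are all assumptions made for illustration. It only shows that a one-to-one renaming of identifiers leaves the per-resource event counts intact, so those counts can be matched against outside knowledge.

```python
from collections import Counter
import hashlib

# Toy event log of (case id, activity, resource). All values are invented.
log = [
    ("c1", "triage", "alice"), ("c1", "x-ray", "bob"),
    ("c2", "triage", "alice"), ("c2", "discharge", "alice"),
    ("c3", "triage", "alice"), ("c3", "x-ray", "bob"),
    ("c4", "triage", "carol"),
]

def pseudonymize(name: str) -> str:
    """Hypothetical helper: replace an identifier with a short hash-based pseudonym."""
    return hashlib.sha256(name.encode()).hexdigest()[:8]

def events_per_resource(events):
    """Count how many events each resource appears in."""
    return Counter(resource for _, _, resource in events)

# Pseudonymized copy of the log: only the resource column is renamed.
pseudo_log = [(case, activity, pseudonymize(resource))
              for case, activity, resource in log]

original_counts = events_per_resource(log)
pseudo_counts = events_per_resource(pseudo_log)

# The renaming is one-to-one, so the multiset of counts is unchanged.
assert sorted(original_counts.values()) == sorted(pseudo_counts.values())

# Assumed background knowledge: "alice" is known to be the busiest resource.
# The attacker simply picks the pseudonym with the highest event count.
busiest_pseudonym, _ = pseudo_counts.most_common(1)[0]
reidentified = busiest_pseudonym == pseudonymize("alice")
```

In this toy log the busiest pseudonym does correspond to the assumed target, which is the kind of linkage that flattening event distributions is meant to prevent.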

Acknowledgements

The authors are supported by the Government of Catalonia (GC) with grant 2017-DI-002. A. Solanas is supported by the GC with project 2017-SGR-896, and by Fundació PuntCAT with the Vinton Cerf Distinction, and by the Spanish Ministry of Science & Technology with project IoTrain - RTI2018-095499-B-C32, and by the EU with project LOCARD (Grant Agreement no. 832735). Pictures designed by Freepik.

Author information

Corresponding author

Correspondence to Edgar Batista.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Privacy-Preserving Computing

Guest Editors: Kaiping Xue, Zhe Liu, Haojin Zhu, Miao Pan and David S.L. Wei

About this article

Cite this article

Batista, E., Solanas, A. A uniformization-based approach to preserve individuals’ privacy during process mining analyses. Peer-to-Peer Netw. Appl. (2021). https://doi.org/10.1007/s12083-020-01059-1

Keywords

  • Process mining
  • Privacy
  • Privacy-preserving process mining
  • Distribution-based attacks
  • Uniformization strategies