Privacy-Preserving Process Mining

Differential Privacy for Event Logs
  • Felix Mannhardt
  • Agnes Koschmider (email author)
  • Nathalie Baracaldo
  • Matthias Weidlich
  • Judith Michael
Research Paper

Abstract

Privacy regulations for data can be regarded as a major driver for data sovereignty measures. A specific example of this is event data that is recorded by information systems during the processing of entities in domains such as e-commerce or health care. Since such data, typically available in the form of event log files, contains personalized information on the specific processed entities, it can expose sensitive information that may be traced back to individuals. In recent years, a plethora of methods have been developed to analyse event logs under the umbrella of process mining. However, the impact of privacy regulations on the technical design as well as the organizational application of process mining has been largely neglected. This paper sets out to develop a protection model for event data privacy that applies the well-established notion of differential privacy. Starting from common assumptions about the event logs used in process mining, this paper presents potential privacy leakages and means to protect against them. The paper also shows at which stages of privacy leakage a protection model for event logs should be applied. Relying on this understanding, the notion of differential privacy is instantiated for process discovery methods, i.e., algorithms that aim at the construction of a process model from an event log. The general feasibility of our approach is demonstrated by its application to two publicly available real-life event logs.

Keywords

  • Differential privacy
  • Process mining
  • Event logs
  • Data protection
  • Data sovereignty

Copyright information

© Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2019

Authors and Affiliations

  • Felix Mannhardt (1)
  • Agnes Koschmider (2, email author)
  • Nathalie Baracaldo (3)
  • Matthias Weidlich (4)
  • Judith Michael (5)

  1. Department of Technology Management, SINTEF Digital, Trondheim, Norway
  2. Department of Computer Science, Kiel University, Kiel, Germany
  3. IBM Almaden Research Center, San Jose, USA
  4. Humboldt-Universität zu Berlin, Berlin, Germany
  5. Software Engineering, RWTH Aachen University, Aachen, Germany
