
Differential Privacy Analysis of Data Processing Workflows

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 9987)

Abstract

Differential privacy is an established paradigm for measuring and controlling the private information leakage that occurs when derivatives of sensitive data sources are disclosed. The bulk of differential privacy research has focused on designing mechanisms that ensure the output of a program or query is \(\epsilon \)-differentially private with respect to its input. In an enterprise environment, however, data processing generally occurs in the context of business processes consisting of chains of tasks performed by multiple IT system components, which disclose outputs to multiple parties along the way. Ensuring privacy in this setting requires reasoning about series of disclosures of intermediate and final outputs derived from multiple data sources. This paper proposes a method to quantify the amount of private information that leaks from each sensitive data source vis-à-vis each party involved in a business process. The method relies on generalized composition rules for sensitivity and differential privacy that apply to chained compositions of tasks, where each task may have multiple inputs and outputs of different types and a differentially private output of one task may be taken as input by other tasks.


Notes

  1. A pool in BPMN (represented by a horizontal rectangle) represents an independent organizational entity that communicates with other entities via message flows, represented via dashed arrows.

  2. Acyclicity is also required on data dependencies as a way to simplify the presentation. However, this restriction does not affect the generality of our approach. A cyclic data dependency would usually stand for a data update. The same intuition can alternatively be represented with two data nodes: one representing the read data object and the other representing the written data.

  3. Note that there exists another topological order of the processing nodes of the example, namely [ACBD]. Either one would produce the same output matrices.

  4. The tool is available at http://pleak.io/ for research purposes.


Acknowledgments

This work is funded by DARPA’s “Brandeis” programme.

Author information


Correspondence to Luciano García-Bañuelos.


A Lifting Distances to Probability Distributions

To interpret a privacy-enhanced DP-workflow \((W,\mathcal {E},\mathcal {C})\) (where \(W=(D,P,F)\)), we have to give metrics \(\mathsf {d}_{d}\) on \(\mathcal {D}({X_{d}})\) for each \(d\in D\). Moreover, for the interpretation to be matched by the annotations, the mappings \(\overline{f}_{{p}\rightarrow {d}}\) between these probability distributions must have the sensitivities given by \(\mathcal {E}\) and \(\mathcal {C}\). It may be more natural to assume that the interpretation gives us metrics on \(X_{d}\), not on \(\mathcal {D}({X_{d}})\). It is also more natural to require the mappings \(f_{{p}\rightarrow {d}}\) to have a certain sensitivity.

We thus define a pre-interpretation as consisting of a set \(X_{d}\) together with a metric \(\mathsf {d}^\flat _{d}\) on it for each \(d\in D\), as well as the mappings \(f_{{p}\rightarrow {d}}\) for each \(p\in P\) and \(d\in p\bullet \). We have to specify what kind of interpretation it generates, and when the annotations \(\mathcal {E},\mathcal {C}\) match the pre-interpretation. The key to this is specifying the metric \(\mathsf {d}_{d}\) on \(\mathcal {D}({X_{d}})\).

Let X be a set and \(d_X\) a metric on it. It turns out that the following definition of a metric \(d^\#_X\) on \(\mathcal {D}({X})\) is a suitable one. Let \(\chi ,\chi '\in \mathcal {D}({X})\). Then

$$\begin{aligned} d^\#_X(\chi ,\chi ')=\inf _{\psi \in \chi \otimes \chi '}\sup _{(x,x')\in \mathrm {supp}({\psi })} d_X(x,x'). \end{aligned}$$
(A.1)

The proposed metric \(d^\#_X\) can be seen as a kind of “worst-case” earth mover’s distance (or Wasserstein metric). In the “usual” earth mover’s distance, one would take the average over \(\psi \), not the supremum over \(\mathrm {supp}({\psi })\).
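For distributions with finite support, (A.1) can be computed directly: \(d^\#_X(\chi ,\chi ')\) is the smallest threshold t such that some coupling of \(\chi \) and \(\chi '\) is supported only on pairs at distance at most t, and the existence of such a coupling is exactly Gale's supply-demand feasibility condition over the bipartite graph of allowed pairs. The following Python sketch is merely an executable reading of the definition on finite supports (the function names and the numerical tolerance are our own, not taken from the paper or its tool):

    from itertools import combinations

    def d_sharp(chi, chi_prime, dist):
        """Worst-case earth mover's distance (A.1) between two finite-support
        distributions, given as dicts {point: probability}; dist is d_X."""
        xs = [x for x, p in chi.items() if p > 0]
        ys = [y for y, p in chi_prime.items() if p > 0]

        def feasible(t):
            # A coupling supported on {(x, y) : dist(x, y) <= t} exists iff
            # chi(S) <= chi'(N_t(S)) for every subset S of supp(chi)
            # (Gale's supply-demand condition for transportation problems).
            for r in range(1, len(xs) + 1):
                for S in combinations(xs, r):
                    neigh = {y for y in ys for x in S if dist(x, y) <= t}
                    if sum(chi[x] for x in S) > sum(chi_prime[y] for y in neigh) + 1e-12:
                        return False
            return True

        # The infimum in (A.1) is attained at one of the pairwise distances.
        return next(t for t in sorted({dist(x, y) for x in xs for y in ys})
                    if feasible(t))

    # Point mass at 0 versus the mixture 0.9*delta_0 + 0.1*delta_1 on the line:
    # every coupling must pair 0 with 1, so d_sharp returns 1, whereas the
    # averaged ("usual") earth mover's distance would be 0.1.
    print(d_sharp({0: 1.0}, {0: 0.9, 1: 0.1}, lambda a, b: abs(a - b)))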

The suitability of the construction (A.1) is given by the following two propositions. Note that the first of them would not hold for the “usual” earth mover’s distance.

Proposition 3

Let \(f:X\rightarrow \mathcal {D}({Y})\) be \(\varepsilon \)-sensitive according to the distance \(d_X\) on X and distance \(d_\mathrm {dp}\) on \(\mathcal {D}({Y})\). Then the lifting \(\overline{f}:\mathcal {D}({X})\rightarrow \mathcal {D}({Y})\) is \(\varepsilon \)-sensitive according to the distance \(d^\#_X\) on \(\mathcal {D}({X})\) and \(d_\mathrm {dp}\) on \(\mathcal {D}({Y})\).

Proof

Let \(\chi ,\chi '\in \mathcal {D}({X})\), \(\psi \in \chi \otimes \chi '\) and \(y\in Y\). Then

$$\begin{aligned} \mathrm{Pr}[\overline{f}(\chi)=y] &= \sum_{x\in X}\chi(x)\cdot \mathrm{Pr}[f(x)=y] = \sum_{x,x'\in X}\psi(x,x')\cdot \mathrm{Pr}[f(x)=y] \\
&\le \sum_{x,x'\in X}\psi(x,x')\cdot e^{\varepsilon\cdot d_X(x,x')}\,\mathrm{Pr}[f(x')=y] \\
&\le \sum_{x,x'\in X}\psi(x,x')\cdot e^{\sup_{x\in\mathrm{supp}(\psi(\cdot,x'))}\varepsilon\cdot d_X(x,x')}\,\mathrm{Pr}[f(x')=y] \\
&= \sum_{x'\in X}\chi'(x')\cdot e^{\sup_{x\in\mathrm{supp}(\psi(\cdot,x'))}\varepsilon\cdot d_X(x,x')}\,\mathrm{Pr}[f(x')=y] \\
&\le e^{\sup_{x,x'\in\mathrm{supp}(\psi)}\varepsilon\cdot d_X(x,x')}\cdot \sum_{x'\in X}\chi'(x')\cdot \mathrm{Pr}[f(x')=y] = e^{\sup_{x,x'\in\mathrm{supp}(\psi)}\varepsilon\cdot d_X(x,x')}\cdot \mathrm{Pr}[\overline{f}(\chi')=y], \end{aligned}$$

where \(\mathrm {supp}({\psi (\cdot ,x')})\) denotes the set of all \(x\in X\) such that \(\psi (x,x')>0\). We obtain

$$\begin{aligned} d_\mathrm{dp}(\overline{f}(\chi),\overline{f}(\chi')) &= \sup_{y\in Y}\left|\ln\frac{\mathrm{Pr}[\overline{f}(\chi')=y]}{\mathrm{Pr}[\overline{f}(\chi)=y]}\right| \\
&\le \inf_{\psi\in\chi\otimes\chi'}\sup_{x,x'\in\mathrm{supp}(\psi)}\varepsilon\cdot d_X(x,x') = \varepsilon\cdot d^\#_X(\chi,\chi'). \end{aligned}$$
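The following small numerical check (with toy distributions of our own choosing, reusing d_sharp from the sketch above) illustrates the bound of Proposition 3, as well as the earlier remark that the analogous bound with the averaged earth mover's distance would fail:

    import math

    def lift(f, chi):
        """The lifting f-bar: push a distribution chi over X through f,
        yielding a distribution over Y (all distributions are dicts)."""
        out = {}
        for x, px in chi.items():
            for y, py in f(x).items():
                out[y] = out.get(y, 0.0) + px * py
        return out

    def d_dp(p, q):
        """The distance sup_y |ln(Pr[p = y] / Pr[q = y])|; assumes a common support."""
        return max(abs(math.log(p[y] / q[y])) for y in p)

    def f(x):
        # d_dp(f(0), f(1)) = ln 2, so f is ln(2)-sensitive w.r.t. d_X(x, x') = |x - x'|.
        return {0: 0.75, 1: 0.25} if x == 0 else {0: 0.5, 1: 0.5}

    eps = math.log(2)
    chi, chi2 = {0: 1.0}, {0: 0.9, 1: 0.1}
    dist = lambda a, b: abs(a - b)

    lhs = d_dp(lift(f, chi), lift(f, chi2))        # about 0.095
    assert lhs <= eps * d_sharp(chi, chi2, dist)   # eps * d_sharp = ln 2, so the bound holds
    # With the averaged earth mover's distance (0.1 here) in place of d_sharp,
    # the right-hand side would be eps * 0.1, which is about 0.069 < lhs,
    # matching the remark before Proposition 3.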

Proposition 4

Let \(f:X\rightarrow \mathcal {D}({Y})\) be c-sensitive according to the distance \(d_X\) on X and distance \(d^\#_Y\) on \(\mathcal {D}({Y})\), where \(d^\#_Y\) is constructed from some distance \(d_Y\) on Y according to (A.1). Then \(\overline{f}:\mathcal {D}({X})\rightarrow \mathcal {D}({Y})\) is c-sensitive according to the distance \(d^\#_X\) on \(\mathcal {D}({X})\) and \(d^\#_Y\) on \(\mathcal {D}({Y})\).

Proof

Let \(\chi ,\chi '\in \mathcal {D}({X})\). Define \(\mathbf {F}\) as the following set of mappings of type \(X\times X\rightarrow \mathcal {D}({Y\times Y})\):

$$ \mathbf {F}=\{\xi \,|\,\forall x,x'\in X: \xi (x,x')\in f(x)\otimes f(x')\}. $$

Also consider the set \(\varPhi \subseteq \mathcal {D}({Y\times Y})\), defined as follows:

$$ \varPhi =\{\sum _{x,x'\in X} \psi (x,x')\cdot \xi (x,x')\,|\,\psi \in \chi \otimes \chi ', \xi \in \mathbf {F}\}. $$

In the definition of \(\varPhi \), we take the averages over \(\xi (x,x')\) with the weights given by \(\psi (x,x')\). We have \(\varPhi \subseteq \overline{f}(\chi )\otimes \overline{f}(\chi ')\) because the first [resp. second] projection of any element of \(\varPhi \) is \(\overline{f}(\chi )\) [resp. \(\overline{f}(\chi ')\)]. We now have

$$\begin{aligned} d^\#_Y(\overline{f}(\chi),\overline{f}(\chi')) &= \inf_{\phi\in\overline{f}(\chi)\otimes\overline{f}(\chi')}\ \sup_{(y,y')\in\mathrm{supp}(\phi)} d_Y(y,y') \le \inf_{\phi\in\varPhi}\ \sup_{(y,y')\in\mathrm{supp}(\phi)} d_Y(y,y') \\
&= \inf_{\psi\in\chi\otimes\chi'}\inf_{\xi\in\mathbf{F}}\ \sup_{(x,x')\in\mathrm{supp}(\psi)}\ \sup_{(y,y')\in\mathrm{supp}(\xi(x,x'))} d_Y(y,y') \\
&= \inf_{\psi\in\chi\otimes\chi'}\ \sup_{(x,x')\in\mathrm{supp}(\psi)}\ \inf_{\phi\in f(x)\otimes f(x')}\ \sup_{(y,y')\in\mathrm{supp}(\phi)} d_Y(y,y') \\
&= \inf_{\psi\in\chi\otimes\chi'}\ \sup_{(x,x')\in\mathrm{supp}(\psi)} d^\#_Y(f(x),f(x')) \le \inf_{\psi\in\chi\otimes\chi'}\ \sup_{(x,x')\in\mathrm{supp}(\psi)} c\cdot d_X(x,x') = c\cdot d^\#_X(\chi,\chi'). \end{aligned}$$
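Proposition 4 can likewise be checked numerically on a toy example, reusing d_sharp and lift from the sketches above (the mechanism f below is our own invention; its sensitivity c is computed rather than assumed):

    dist = lambda a, b: abs(a - b)   # d_X = d_Y = absolute difference on {0, 1, 2}

    def f(x):
        # Toy mechanism on X = Y = {0, 1, 2}: uniform over the neighbourhood of x.
        support = {max(x - 1, 0), x, min(x + 1, 2)}
        return {y: 1.0 / len(support) for y in support}

    # Smallest c such that f is c-sensitive w.r.t. d_X and d^#_Y:
    c = max(d_sharp(f(x), f(x2), dist) / dist(x, x2)
            for x in range(3) for x2 in range(3) if x != x2)

    chi, chi2 = {0: 1.0}, {0: 0.5, 1: 0.5}
    assert d_sharp(lift(f, chi), lift(f, chi2), dist) <= c * d_sharp(chi, chi2, dist) + 1e-9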

These two propositions tell us how to turn a pre-interpretation of a privacy-enhanced DP-workflow into an interpretation. We define \(\mathsf {d}_{d}=(\mathsf {d}^\flat _{d})^\#\) for each \(d\in D\). The annotations \(\mathcal {E},\mathcal {C}\) match the pre-interpretation if for all \(p\in P\), \(d'\in \bullet p\) and \(d\in p\bullet \):

  • the sensitivity of \(f_{{p}\rightarrow {d}}\) in its argument “\(d'\)” is \(c_p[d',d]\) with respect to the distances \(\mathsf {d}^\flat _{d'}\) on \(X_{d'}\) and \(\mathsf {d}_{d}\) on \(\mathcal {D}({X_{d}})\);

  • the sensitivity of \(f_{{p}\rightarrow {d}}\) in its argument “\(d'\)” is \(\epsilon _p[d',d]\) with respect to the distances \(\mathsf {d}^\flat _{d'}\) on \(X_{d'}\) and \(d_\mathrm {dp}\) on \(\mathcal {D}({X_{d}})\).

In this way, the corresponding interpretation is also matched by the annotations.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Dumas, M., García-Bañuelos, L., Laud, P. (2016). Differential Privacy Analysis of Data Processing Workflows. In: Kordy, B., Ekstedt, M., Kim, D. (eds) Graphical Models for Security. GraMSec 2016. Lecture Notes in Computer Science, vol 9987. Springer, Cham. https://doi.org/10.1007/978-3-319-46263-9_4

  • DOI: https://doi.org/10.1007/978-3-319-46263-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46262-2

  • Online ISBN: 978-3-319-46263-9
