
Differential Privacy Analysis of Data Processing Workflows

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 9987)

Abstract

Differential privacy is an established paradigm for measuring and controlling the private information leakage that occurs when derivatives of sensitive data sources are disclosed. The bulk of differential privacy research has focused on designing mechanisms that ensure the output of a program or query is \(\epsilon \)-differentially private with respect to its input. In an enterprise environment, however, data processing generally occurs in the context of business processes consisting of chains of tasks performed by multiple IT system components, which disclose outputs to multiple parties along the way. Ensuring privacy in this setting requires reasoning about series of disclosures of intermediate and final outputs derived from multiple data sources. This paper proposes a method to quantify the amount of private information that leaks from each sensitive data source vis-à-vis each party involved in a business process. The method relies on generalized composition rules for sensitivity and differential privacy that apply to chained compositions of tasks, where each task may have multiple inputs and outputs of different types and a differentially private output of one task may be taken as input by other tasks.


Notes

  1. A pool in BPMN (represented by a horizontal rectangle) represents an independent organizational entity that communicates with other entities via message flows, represented via dashed arrows.

  2. Acyclicity is also required on data dependencies as a way to simplify the presentation. However, this restriction does not affect the generality of our approach. A cyclic data dependency would usually stand for a data update. The same intuition can alternatively be represented with two data nodes: one representing the read data object and the other representing the written data.

  3. Note that there exists another topological order of the processing nodes of the example, namely [ACBD]. Either one would produce the same output matrices.

  4. The tool is available at http://pleak.io/ for research purposes.


Acknowledgments

This work is funded by DARPA’s “Brandeis” programme.

Author information


Correspondence to Luciano García-Bañuelos.


A Lifting Distances to Probability Distributions

To interpret a privacy-enhanced DP-workflow \((W,\mathcal {E},\mathcal {C})\) (where \(W=(D,P,F)\)), we have to give metrics \(\mathsf {d}_{d}\) on \(\mathcal {D}({X_{d}})\) for each \(d\in D\). Moreover, for the interpretation to be matched by the annotations, the mappings \(\overline{f}_{{p}\rightarrow {d}}\) between these probability distributions must have the sensitivities given by \(\mathcal {E}\) and \(\mathcal {C}\). It may be more natural to assume that the interpretation gives us metrics on \(X_{d}\), not on \(\mathcal {D}({X_{d}})\). It is also more natural to require the mappings \(f_{{p}\rightarrow {d}}\) to have a certain sensitivity.

We thus define a pre-interpretation as consisting of a set \(X_{d}\) together with a metric \(\mathsf {d}^\flat _{d}\) on it for each \(d\in D\), as well as the mappings \(f_{{p}\rightarrow {d}}\) for each \(p\in P\) and \(d\in p\bullet \). We have to specify what kind of interpretation it generates, and when the annotations \(\mathcal {E},\mathcal {C}\) match the pre-interpretation. The key to this is specifying the metric \(\mathsf {d}_{d}\) on \(\mathcal {D}({X_{d}})\).

Let X be a set and \(d_X\) a metric on it. It turns out that the following definition of a metric \(d^\#_X\) on \(\mathcal {D}({X})\) is a suitable one. Let \(\chi ,\chi '\in \mathcal {D}({X})\). Then

$$\begin{aligned} d^\#_X(\chi ,\chi ')=\inf _{\psi \in \chi \otimes \chi '}\sup _{(x,x')\in \mathrm {supp}({\psi })} d_X(x,x'). \end{aligned}$$
(A.1)

The proposed metric \(d^\#_X\) can be seen as a kind of “worst-case” earth mover’s distance (or Wasserstein metric). In the “usual” earth mover’s distance, one would take the average over \(\psi \), not the supremum over \(\mathrm {supp}({\psi })\).
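For distributions with finite support, (A.1) can be computed directly: \(d^\#_X(\chi ,\chi ')\) is the smallest threshold t such that some coupling of \(\chi \) and \(\chi '\) is supported only on pairs at distance at most t, and the existence of such a coupling is exactly Gale's supply-demand feasibility condition over the bipartite graph of allowed pairs. The following Python sketch is merely an executable reading of the definition on finite supports (the function names and the numerical tolerance are our own, not taken from the paper or its tool):

    from itertools import combinations

    def d_sharp(chi, chi_prime, dist):
        """Worst-case earth mover's distance (A.1) between two finite-support
        distributions, given as dicts {point: probability}; dist is d_X."""
        xs = [x for x, p in chi.items() if p > 0]
        ys = [y for y, p in chi_prime.items() if p > 0]

        def feasible(t):
            # A coupling supported on {(x, y) : dist(x, y) <= t} exists iff
            # chi(S) <= chi'(N_t(S)) for every subset S of supp(chi)
            # (Gale's supply-demand condition for transportation problems).
            for r in range(1, len(xs) + 1):
                for S in combinations(xs, r):
                    neigh = {y for y in ys for x in S if dist(x, y) <= t}
                    if sum(chi[x] for x in S) > sum(chi_prime[y] for y in neigh) + 1e-12:
                        return False
            return True

        # The infimum in (A.1) is attained at one of the pairwise distances.
        return next(t for t in sorted({dist(x, y) for x in xs for y in ys})
                    if feasible(t))

    # Point mass at 0 versus the mixture 0.9*delta_0 + 0.1*delta_1 on the line:
    # every coupling must pair 0 with 1, so d_sharp returns 1, whereas the
    # averaged ("usual") earth mover's distance would be 0.1.
    print(d_sharp({0: 1.0}, {0: 0.9, 1: 0.1}, lambda a, b: abs(a - b)))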

The suitability of the construction (A.1) is given by the following two propositions. Note that the first of them would not hold for the “usual” earth mover’s distance.

Proposition 3

Let \(f:X\rightarrow \mathcal {D}({Y})\) be \(\varepsilon \)-sensitive according to the distance \(d_X\) on X and distance \(d_\mathrm {dp}\) on \(\mathcal {D}({Y})\). Then the lifting \(\overline{f}:\mathcal {D}({X})\rightarrow \mathcal {D}({Y})\) is \(\varepsilon \)-sensitive according to the distance \(d^\#_X\) on \(\mathcal {D}({X})\) and \(d_\mathrm {dp}\) on \(\mathcal {D}({Y})\).

Proof

Let \(\chi ,\chi '\in \mathcal {D}({X})\), \(\psi \in \chi \otimes \chi '\) and \(y\in Y\). Then

$$\begin{aligned} \mathrm{Pr}[\overline{f}(\chi)=y] &= \sum_{x\in X}\chi(x)\cdot \mathrm{Pr}[f(x)=y] = \sum_{x,x'\in X}\psi(x,x')\cdot \mathrm{Pr}[f(x)=y] \\
&\le \sum_{x,x'\in X}\psi(x,x')\cdot e^{\varepsilon\cdot d_X(x,x')}\,\mathrm{Pr}[f(x')=y] \\
&\le \sum_{x,x'\in X}\psi(x,x')\cdot e^{\sup_{x\in\mathrm{supp}(\psi(\cdot,x'))}\varepsilon\cdot d_X(x,x')}\,\mathrm{Pr}[f(x')=y] \\
&= \sum_{x'\in X}\chi'(x')\cdot e^{\sup_{x\in\mathrm{supp}(\psi(\cdot,x'))}\varepsilon\cdot d_X(x,x')}\,\mathrm{Pr}[f(x')=y] \\
&\le e^{\sup_{x,x'\in\mathrm{supp}(\psi)}\varepsilon\cdot d_X(x,x')}\cdot \sum_{x'\in X}\chi'(x')\cdot \mathrm{Pr}[f(x')=y] = e^{\sup_{x,x'\in\mathrm{supp}(\psi)}\varepsilon\cdot d_X(x,x')}\cdot \mathrm{Pr}[\overline{f}(\chi')=y], \end{aligned}$$

where \(\mathrm {supp}({\psi (\cdot ,x')})\) denotes the set of all \(x\in X\) such that \(\psi (x,x')>0\). We obtain

$$\begin{aligned} d_\mathrm{dp}(\overline{f}(\chi),\overline{f}(\chi')) &= \sup_{y\in Y}\left|\ln\frac{\mathrm{Pr}[\overline{f}(\chi')=y]}{\mathrm{Pr}[\overline{f}(\chi)=y]}\right| \\
&\le \inf_{\psi\in\chi\otimes\chi'}\sup_{x,x'\in\mathrm{supp}(\psi)}\varepsilon\cdot d_X(x,x') = \varepsilon\cdot d^\#_X(\chi,\chi'). \end{aligned}$$
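The following small numerical check (with toy distributions of our own choosing, reusing d_sharp from the sketch above) illustrates the bound of Proposition 3, as well as the earlier remark that the analogous bound with the averaged earth mover's distance would fail:

    import math

    def lift(f, chi):
        """The lifting f-bar: push a distribution chi over X through f,
        yielding a distribution over Y (all distributions are dicts)."""
        out = {}
        for x, px in chi.items():
            for y, py in f(x).items():
                out[y] = out.get(y, 0.0) + px * py
        return out

    def d_dp(p, q):
        """The distance sup_y |ln(Pr[p = y] / Pr[q = y])|; assumes a common support."""
        return max(abs(math.log(p[y] / q[y])) for y in p)

    def f(x):
        # d_dp(f(0), f(1)) = ln 2, so f is ln(2)-sensitive w.r.t. d_X(x, x') = |x - x'|.
        return {0: 0.75, 1: 0.25} if x == 0 else {0: 0.5, 1: 0.5}

    eps = math.log(2)
    chi, chi2 = {0: 1.0}, {0: 0.9, 1: 0.1}
    dist = lambda a, b: abs(a - b)

    lhs = d_dp(lift(f, chi), lift(f, chi2))        # about 0.095
    assert lhs <= eps * d_sharp(chi, chi2, dist)   # eps * d_sharp = ln 2, so the bound holds
    # With the averaged earth mover's distance (0.1 here) in place of d_sharp,
    # the right-hand side would be eps * 0.1, which is about 0.069 < lhs,
    # matching the remark before Proposition 3.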

Proposition 4

Let \(f:X\rightarrow \mathcal {D}({Y})\) be c-sensitive according to the distance \(d_X\) on X and distance \(d^\#_Y\) on \(\mathcal {D}({Y})\), where \(d^\#_Y\) is constructed from some distance \(d_Y\) on Y according to (A.1). Then \(\overline{f}:\mathcal {D}({X})\rightarrow \mathcal {D}({Y})\) is c-sensitive according to the distance \(d^\#_X\) on \(\mathcal {D}({X})\) and \(d^\#_Y\) on \(\mathcal {D}({Y})\).

Proof

Let \(\chi ,\chi '\in \mathcal {D}({X})\). Define \(\mathbf {F}\) as the following set of mappings of type \(X\times X\rightarrow \mathcal {D}({Y\times Y})\):

$$ \mathbf {F}=\{\xi \,|\,\forall x,x'\in X: \xi (x,x')\in f(x)\otimes f(x')\}. $$

Also consider the set \(\varPhi \subseteq \mathcal {D}({Y\times Y})\), defined as follows:

$$ \varPhi =\{\sum _{x,x'\in X} \psi (x,x')\cdot \xi (x,x')\,|\,\psi \in \chi \otimes \chi ', \xi \in \mathbf {F}\}. $$

In the definition of \(\varPhi \), we take the averages over \(\xi (x,x')\) with the weights given by \(\psi (x,x')\). We have \(\varPhi \subseteq \overline{f}(\chi )\otimes \overline{f}(\chi ')\) because the first [resp. second] projection of any element of \(\varPhi \) is \(\overline{f}(\chi )\) [resp. \(\overline{f}(\chi ')\)]. We now have

$$\begin{aligned} d^\#_Y(\overline{f}(\chi),\overline{f}(\chi')) &= \inf_{\phi\in\overline{f}(\chi)\otimes\overline{f}(\chi')}\ \sup_{(y,y')\in\mathrm{supp}(\phi)} d_Y(y,y') \le \inf_{\phi\in\varPhi}\ \sup_{(y,y')\in\mathrm{supp}(\phi)} d_Y(y,y') \\
&= \inf_{\psi\in\chi\otimes\chi'}\inf_{\xi\in\mathbf{F}}\ \sup_{(x,x')\in\mathrm{supp}(\psi)}\ \sup_{(y,y')\in\mathrm{supp}(\xi(x,x'))} d_Y(y,y') \\
&= \inf_{\psi\in\chi\otimes\chi'}\ \sup_{(x,x')\in\mathrm{supp}(\psi)}\ \inf_{\phi\in f(x)\otimes f(x')}\ \sup_{(y,y')\in\mathrm{supp}(\phi)} d_Y(y,y') \\
&= \inf_{\psi\in\chi\otimes\chi'}\ \sup_{(x,x')\in\mathrm{supp}(\psi)} d^\#_Y(f(x),f(x')) \le \inf_{\psi\in\chi\otimes\chi'}\ \sup_{(x,x')\in\mathrm{supp}(\psi)} c\cdot d_X(x,x') = c\cdot d^\#_X(\chi,\chi'). \end{aligned}$$
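Proposition 4 can likewise be checked numerically on a toy example, reusing d_sharp and lift from the sketches above (the mechanism f below is our own invention; its sensitivity c is computed rather than assumed):

    dist = lambda a, b: abs(a - b)   # d_X = d_Y = absolute difference on {0, 1, 2}

    def f(x):
        # Toy mechanism on X = Y = {0, 1, 2}: uniform over the neighbourhood of x.
        support = {max(x - 1, 0), x, min(x + 1, 2)}
        return {y: 1.0 / len(support) for y in support}

    # Smallest c such that f is c-sensitive w.r.t. d_X and d^#_Y:
    c = max(d_sharp(f(x), f(x2), dist) / dist(x, x2)
            for x in range(3) for x2 in range(3) if x != x2)

    chi, chi2 = {0: 1.0}, {0: 0.5, 1: 0.5}
    assert d_sharp(lift(f, chi), lift(f, chi2), dist) <= c * d_sharp(chi, chi2, dist) + 1e-9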

These two propositions tell us how to turn a pre-interpretation of a privacy-enhanced DP-workflow into an interpretation. We define \(\mathsf {d}_{d}=(\mathsf {d}^\flat _{d})^\#\) for each \(d\in D\). The annotations \(\mathcal {E},\mathcal {C}\) match the pre-interpretation if for all \(p\in P\), \(d'\in \bullet p\) and \(d\in p\bullet \):

  • the sensitivity of \(f_{{p}\rightarrow {d}}\) in its argument “\(d'\)” is \(c_p[d',d]\) with respect to the distances \(\mathsf {d}^\flat _{d'}\) on \(X_{d'}\) and \(\mathsf {d}_{d}\) on \(\mathcal {D}({X_{d}})\);

  • the sensitivity of \(f_{{p}\rightarrow {d}}\) in its argument “\(d'\)” is \(\epsilon _p[d',d]\) with respect to the distances \(\mathsf {d}^\flat _{d'}\) on \(X_{d'}\) and \(d_\mathrm {dp}\) on \(\mathcal {D}({X_{d}})\).

In this way, the corresponding interpretation is also matched by the annotations.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Dumas, M., García-Bañuelos, L., Laud, P. (2016). Differential Privacy Analysis of Data Processing Workflows. In: Kordy, B., Ekstedt, M., Kim, D. (eds) Graphical Models for Security. GraMSec 2016. Lecture Notes in Computer Science, vol 9987. Springer, Cham. https://doi.org/10.1007/978-3-319-46263-9_4

  • DOI: https://doi.org/10.1007/978-3-319-46263-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46262-2

  • Online ISBN: 978-3-319-46263-9
