
A Differentially Private Kernel Two-Sample Test

Conference paper in Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Abstract

Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing conforming to differential privacy constraints, in order to guarantee the privacy of subjects in the data. Unlike existing differentially private parametric tests that simply add noise to data, kernel-based testing imposes a challenge due to a complex dependence of test statistics on the raw data, as these statistics correspond to estimators of distances between representations of probability measures in Hilbert spaces. Our approach considers finite dimensional approximations to those representations. As a result, a simple chi-squared test is obtained, where a test statistic depends on a mean and covariance of empirical differences between the samples, which we perturb for a privacy guarantee. We investigate the utility of our framework in two realistic settings and conclude that our method requires only a relatively modest increase in sample size to achieve a similar level of power to the non-private tests in both settings.

A. Raj and H. C. L. Law—Equal contribution.


Notes

  1. See Appendix C.3 and C.2 for other possible approaches.

  2. While this may produce negative privatized test statistics, the test threshold is appropriately adjusted for this. See Appendix C.2 and C.3 for alternative approaches for privatizing the test statistic.

  3. Code is available at https://github.com/hcllaw/private_tst.

  4. To ensure that \(\tilde{\mathbf {\Lambda }}\) is positive semi-definite, we project any negative singular values to a small positive value (e.g., 0.01).

References

  1. Balle, B., Wang, Y.-X.: Improving the Gaussian mechanism for differential privacy: analytical calibration and optimal denoising (2018)
  2. Balog, M., Tolstikhin, I., Schölkopf, B.: Differentially private database release via kernel mean embeddings (2017). arXiv:1710.01641
  3. Borgwardt, K.M., Gretton, A., Rasch, M.J., Kriegel, H.-P., Schölkopf, B., Smola, A.J.: Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 22(14), e49–e57 (2006)
  4. Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. JMLR 12, 1069–1109 (2011)
  5. Chen, X.: A new generalization of Chebyshev inequality for random vectors. arXiv preprint arXiv:0707.0805 (2007)
  6. Chwialkowski, K.P., Ramdas, A., Sejdinovic, D., Gretton, A.: Fast two-sample testing with analytic representations of probability measures. In: NIPS, pp. 1981–1989 (2015)
  7. Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: privacy via distributed noise generation. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 486–503. Springer, Heidelberg (2006). https://doi.org/10.1007/11761679_29
  8. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
  9. Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 211–407 (2014)
  10. Dwork, C., Talwar, K., Thakurta, A., Zhang, L.: Analyze Gauss: optimal bounds for privacy-preserving principal component analysis. In: Symposium on Theory of Computing, STOC 2014, pp. 11–20 (2014)
  11. Flaxman, S., Sejdinovic, D., Cunningham, J.P., Filippi, S.: Bayesian learning of kernel embeddings. In: UAI, pp. 182–191 (2016)
  12. Gaboardi, M., Lim, H.W., Rogers, R., Vadhan, S.P.: Differentially private chi-squared hypothesis testing: goodness of fit and independence testing. In: ICML 2016, vol. 48, pp. 2111–2120 (2016)
  13. Gaboardi, M., Rogers, R.M.: Local private hypothesis testing: chi-square tests. CoRR abs/1709.07155 (2017)
  14. Goyal, V., Khurana, D., Mironov, I., Pandey, O., Sahai, A.: Do distributed differentially-private protocols require oblivious transfer? In: ICALP, pp. 29:1–29:15 (2016)
  15. Gretton, A., Borgwardt, K.M., Rasch, M., Schölkopf, B., Smola, A.J.: A kernel method for the two-sample-problem. In: NIPS, pp. 513–520. MIT Press (2007)
  16. Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. JMLR 13(1), 723–773 (2012)
  17. Gretton, A., Fukumizu, K., Harchaoui, Z., Sriperumbudur, B.K.: A fast, consistent kernel two-sample test. In: NIPS, pp. 673–681 (2009)
  18. Gretton, A., et al.: Optimal kernel choice for large-scale two-sample tests. In: NIPS (2012)
  19. Hall, R., Rinaldo, A., Wasserman, L.: Differential privacy for functions and functional data. JMLR 14, 703–727 (2013)
  20. Homer, N., et al.: Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 4(8), 1–9 (2008)
  21. Jain, P., Thakurta, A.: Differentially private learning with kernels. In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, pp. 118–126 (2013)
  22. Jitkrittum, W., Szabó, Z., Chwialkowski, K., Gretton, A.: Interpretable distribution features with maximum testing power. In: NIPS (2016)
  23. Johnson, A., Shmatikov, V.: Privacy-preserving data exploration in genome-wide association studies. In: ACM SIGKDD 2013 (2013)
  24. Law, H.C.L., Sutherland, D.J., Sejdinovic, D., Flaxman, S.: Bayesian approaches to distribution regression. In: UAI (2017)
  25. McGregor, A., Mironov, I., Pitassi, T., Reingold, O., Talwar, K., Vadhan, S.: The limits of two-party differential privacy. In: IEEE, October 2010
  26. Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10(1–2), 1–141 (2017)
  27. Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: NIPS, pp. 1177–1184 (2008)
  28. Rogers, R., Kifer, D.: A new class of private chi-square hypothesis tests. In: AISTATS, pp. 991–1000 (2017)
  29. Rothe, R., Timofte, R., Van Gool, L.: Deep expectation of real and apparent age from a single image without facial landmarks. Int. J. Comput. Vision 126(2), 144–157 (2016). https://doi.org/10.1007/s11263-016-0940-3
  30. Wahba, G.: Spline Models for Observational Data. Society for Industrial and Applied Mathematics (1990)

Author information

Correspondence to Ho Chung Leon Law.

Appendices


A Covariance Perturbation

Theorem 4 (Modified Analyze Gauss)

Draw Gaussian random variables \(\mathbf {\varvec{\eta }}\sim \mathcal {N}(0, \beta ^2 \mathbf {I}_{J(J+1)/2})\), where \(\beta = \frac{\kappa ^2 J\sqrt{2 \log (1.25/\delta _2)}}{(n-1)\epsilon _2}\). Using \(\mathbf {\varvec{\eta }}\), we construct an upper triangular matrix (including the diagonal), then copy the upper part to the lower part so that the resulting matrix \(\mathbf {D}\) is symmetric. The perturbed matrix \(\tilde{\mathbf {\Lambda }} = \mathbf {\Lambda }+ \mathbf {D}\) is (\(\epsilon _2, \delta _2\))-differentially private (see footnote 4).

The proof is the same as the proof for Algorithm 1 in [10] with the exception that the global sensitivity of \(\mathbf {\Lambda }\) is

$$\begin{aligned} GS(\mathbf {\Lambda }) = \max _{\mathcal {D}, \mathcal {D}'}\Vert \mathbf {\Lambda }(\mathcal {D}) - \mathbf {\Lambda }(\mathcal {D}')\Vert _F = \max _{\mathbf {v}, \mathbf {v}'} \Vert \mathbf {v}\mathbf {v}{^\top } - \mathbf {v}' {\mathbf {v}'}{^\top }\Vert _F \le \tfrac{\kappa ^2 J}{n-1}, \end{aligned}$$
(5)

where \(\mathbf {v}\) is the single entry differing in \(\mathcal {D}\) and \({\mathcal {D}}'\), and \(\Vert \mathbf {v}\Vert _2 \le \frac{\kappa \sqrt{J}}{\sqrt{n-1}}\).
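As a concrete illustration of Theorem 4, here is a minimal NumPy sketch of the covariance perturbation, including the eigenvalue projection from footnote 4. The projection floor of 0.01 follows the footnote; the function and argument names, and the use of `numpy.random.default_rng`, are our own illustrative choices rather than the released implementation.

```python
import numpy as np

def analyze_gauss_perturb(Lambda, kappa, J, n, eps2, delta2, floor=0.01, rng=None):
    """Sketch of the modified Analyze Gauss step (Theorem 4): add a symmetric
    Gaussian noise matrix D to Lambda, then project back to the PSD cone."""
    rng = np.random.default_rng() if rng is None else rng
    # Noise scale beta from Theorem 4.
    beta = kappa**2 * J * np.sqrt(2.0 * np.log(1.25 / delta2)) / ((n - 1) * eps2)
    # Draw J(J+1)/2 i.i.d. Gaussians, place them in the upper triangle
    # (including the diagonal), and mirror to obtain a symmetric matrix D.
    iu = np.triu_indices(J)
    D = np.zeros((J, J))
    D[iu] = rng.normal(scale=beta, size=len(iu[0]))
    D = D + D.T - np.diag(np.diag(D))
    Lambda_tilde = Lambda + D
    # Footnote 4: project negative eigenvalues to a small positive value so
    # the perturbed matrix stays positive semi-definite.
    w, V = np.linalg.eigh(Lambda_tilde)
    w = np.maximum(w, floor)
    return V @ np.diag(w) @ V.T
```

Here `Lambda` would be the regularized second-moment matrix \(\mathbf {\Lambda }\) computed from the empirical differences \(\mathbf {z}_i\).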

B Sensitivity of \(\mathbf {w}_n^\top \left( \mathbf {\Sigma }_n+\gamma _n\mathbf {I}\right) ^{-1}\mathbf {w}_n \)

We first introduce some notation that we will use in the sensitivity analysis.

  • We split \(\mathbf {w}_n = \mathbf {m}+ \frac{1}{\sqrt{n}}\mathbf {v}\), where \(\mathbf {m}= \frac{1}{n}\sum _{i=1}^{n-1} \mathbf {z}_i\) and \(\mathbf {v}= \frac{1}{\sqrt{n}} \mathbf {z}_n\).

  • Similarly, we split \(\mathbf {\Lambda }= \mathbf {M}{^\top }\mathbf {M}+ \frac{n}{n-1}\mathbf {v}\mathbf {v}{^\top } + \gamma _n \mathbf {I}\), where \({\mathbf {M}}{^\top }\mathbf {M}= \frac{1}{n-1}\sum _{i=1}^{n-1} \mathbf {z}_i\mathbf {z}_i{^\top }\), and we denote \(\mathbf {M}_{\gamma _n} = {\mathbf {M}}^\top \mathbf {M}+ \gamma _n \mathbf {I}\), where \(\gamma _n > 0\).

  • We use a prime for quantities computed on the neighbouring dataset \({\mathcal {D}}'\), i.e., the mean vector is \({\mathbf {w}_n}'\) and the second-moment matrix is \({\mathbf {\Lambda }}'\) (including a regularization term \(\gamma _n \mathbf {I}\)). Here, \({\mathbf {w}_n}' = \mathbf {m}+ \frac{1}{\sqrt{n}}\mathbf {v}'\), \(\mathbf {v}' = \frac{1}{\sqrt{n}} \mathbf {z}_n'\), and \(\mathbf {\Lambda }' = \mathbf {M}{^\top }\mathbf {M}+ \frac{n}{n-1}\mathbf {v}' {\mathbf {v}'}{^\top } + \gamma _n \mathbf {I}= \mathbf {M}_{\gamma _n} + \frac{n}{n-1}\mathbf {v}' {\mathbf {v}'}{^\top }\). Similarly, the covariance given the dataset \(\mathcal {D}\) is \(\mathbf {\Sigma }= \mathbf {\Lambda }- \frac{n}{n-1}\mathbf {w}_n \mathbf {w}_n{^\top }\) and the covariance given the dataset \(\mathcal {D}'\) is \(\mathbf {\Sigma }' = \mathbf {\Lambda }' - \frac{n}{n-1}{\mathbf {w}_n}' {{\mathbf {w}_n}'}{^\top }\).

  • Note that \(\mathbf {\Lambda }\) and \(\mathbf {M}_{\gamma _n}\) are positive definite, hence invertible with positive eigenvalues. We denote the eigenvectors of \(\mathbf {M}_{\gamma _n}\) by \(\mathbf {u}_1, \cdots , \mathbf {u}_J\) and the corresponding eigenvalues by \(\mu _1, \cdots , \mu _J\), and choose the eigenvectors so that the matrix \(\mathbf {Q}\), whose columns are these eigenvectors, is orthogonal.

The L2-sensitivity of the test statistic is derived using the inequalities listed below:

$$\begin{aligned} GS_2(s_n)&= \max _{\mathcal {D}, \mathcal {D}'} \quad \left| s_n (\mathcal {D}) - s_n(\mathcal {D}')\right| , \end{aligned}$$
(6)
$$\begin{aligned}&= n \; \max _{\mathbf {v}, \mathbf {v}'} \left| \mathbf {w}_n{^\top }\mathbf {\Sigma }^{-1}\mathbf {w}_n - {\mathbf {w}_n}{^\top }{\mathbf {\Sigma }'}^{-1}{\mathbf {w}_n} \right| \end{aligned}$$
(7)
$$\begin{aligned}&= n \; \max _{\mathbf {v}, \mathbf {v}'} \left| \mathbf {w}_n{^\top }(\mathbf {\Lambda }- \frac{n}{n-1}\mathbf {w}_n {\mathbf {w}_n{^\top }})^{-1}\mathbf {w}_n - {\mathbf {w}_n}{^\top }(\mathbf {\Lambda }'-\frac{n}{n-1}{\mathbf {w}_n}{{\mathbf {w}_n}{^\top }})^{-1}{\mathbf {w}_n} \right| , \end{aligned}$$
(8)
$$\begin{aligned}&\le 2n\; \max _{\mathbf {v}, \mathbf {v}'} \left| \mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n - {\mathbf {w}_n}{^\top }{\mathbf {\Lambda }'}^{-1}{\mathbf {w}_n} \right| , \text{ due } \text{ to } \text{ inequality } \text{ I } \end{aligned}$$
(9)
$$\begin{aligned}&\le 2n \; \max _{\mathbf {v}, \mathbf {v}'} \left( \left| {{\mathbf {w}'_n}}{^\top }(\mathbf {\Lambda }^{-1}-{\mathbf {\Lambda }'}^{-1}){{\mathbf {w}'_n}}\right| +\big | \mathbf {w}_n^\top \mathbf {\Lambda }^{-1} \mathbf {w}_n - {{\mathbf {w}'_n}}^\top \mathbf {\Lambda }^{-1} {{\mathbf {w}'_n}}\big | \right) , \end{aligned}$$
(10)
$$\begin{aligned}&\le 2n \; \max _{\mathbf {v}, \mathbf {v}'} \Vert \mathbf {w}_n \Vert _2^2 \Vert \mathbf {\Lambda }^{-1} - {\mathbf {\Lambda }'}^{-1} \Vert _F + \frac{4\kappa ^2 J}{n} \frac{\sqrt{J}}{\mu _{min}(\mathbf {\Lambda })}, \text{ Cauchy } \text{ Schwarz } \text{ and } \text{ IV }, \end{aligned}$$
(11)
$$\begin{aligned}&\le \frac{2 \kappa ^2 J}{n} \; \; \max _{\mathbf {v}, \mathbf {v}'} \Vert \mathbf {\Lambda }^{-1} - {\mathbf {\Lambda }'}^{-1} \Vert _F + \frac{4\kappa ^2 J}{n} \frac{\sqrt{J}}{\mu _{min}(\mathbf {\Lambda })}, \text{ since } \Vert \mathbf {w}_n \Vert _2^2 \le \frac{1}{n^2}\kappa ^2 J, \end{aligned}$$
(12)
$$\begin{aligned}&\le \frac{4\kappa ^2 J \sqrt{J} B^2}{(n-1) \Vert \mu _{min}(\mathbf {M}_{\gamma _n})\Vert } + \frac{4\kappa ^2 J}{n} \frac{\sqrt{J}}{\mu _{min}(\mathbf {\Lambda })}, \text{ due } \text{ to } \text{ inequality } \text{ III }.\end{aligned}$$
(13)
$$\begin{aligned}&\le \frac{4\kappa ^2 J \sqrt{J} B^2}{(n-1) \gamma _n } + \frac{4\kappa ^2 J}{n} \frac{\sqrt{J}}{\gamma _n} \end{aligned}$$
(14)
$$\begin{aligned}&= \frac{4\kappa ^2 J \sqrt{J} }{n \gamma _n } \left( 1 + \frac{ \kappa ^2 J}{n-1} \right) \end{aligned}$$
(15)

Here, the regularization parameter \(\gamma _n\) is a lower bound on the minimum singular values of the matrices \(\mathbf {\Lambda }\) and \(\mathbf {M}_{\gamma _n}\).

Hence, the final sensitivity of the test statistic can be upper bounded by \(\frac{4\kappa ^2 J \sqrt{J} }{n \gamma _n } \left( 1 + \frac{ \kappa ^2 J}{n-1} \right) \).

The inequalities we used are given by

  • I: Due to the Sherman–Morrison formula, we can re-write

    $$\begin{aligned} \mathbf {w}_n{^\top }(\mathbf {\Lambda }- \frac{n}{n-1}\mathbf {w}_n \mathbf {w}_n{^\top })^{-1}\mathbf {w}_n = \mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n + \frac{\frac{n}{n-1}(\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n)^2}{1+\frac{n}{n-1}\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n}. \end{aligned}$$
    (16)

    Now, we can bound

    $$\begin{aligned}&\left| \mathbf {w}_n{^\top }(\mathbf {\Lambda }- \frac{n}{n-1}\mathbf {w}_n \mathbf {w}_n{^\top })^{-1}\mathbf {w}_n - {\mathbf {w}_n}{^\top }(\mathbf {\Lambda }'-\frac{n}{n-1}{\mathbf {w}_n}{\mathbf {w}_n}{^\top })^{-1}{\mathbf {w}_n} \right| \nonumber \\ \le \;&\;|\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n-{\mathbf {w}_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}_n}| + \left| \frac{\frac{n}{n-1}(\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n)^2}{1+\frac{n}{n-1}\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n} -\frac{\frac{n}{n-1}({\mathbf {w}_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}_n})^2}{1+\frac{n}{n-1}{\mathbf {w}'_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}'_n}}\right| ,\nonumber \\&\le 2|\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n-{\mathbf {w}_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}_n}|, \end{aligned}$$
    (17)

    where the last line is due to \(\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n\ge \frac{(\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n)^2}{1+\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n} \ge 0\), and \({\mathbf {w}_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}_n}\ge \frac{({\mathbf {w}'_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}'_n})^2}{1+{\mathbf {w}'_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}'_n}}\ge 0\). Let \(a = \mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n\) and \(b ={\mathbf {w}'_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}'_n}\), then:

    $$\begin{aligned} A = \left| \frac{a^2}{1+a} - \frac{b^2}{1+b}\right| = \left| \frac{a^2 - b^2+a^2b -b^2a}{(1+a)(1+b)}\right|&= \left| \frac{(a-b)(a+b) + (a-b)ab}{(1+a)(1+b)}\right| \\&= \left| \frac{(a-b)[(a+b) +ab]}{(1+a)(1+b)}\right| \end{aligned}$$

    and then we have that:

    $$ A = \left| \frac{(a-b)[(1+a)(1+b)-1]}{(1+a)(1+b)}\right| \le |a-b|$$

    Hence, \(\left| \frac{\frac{n}{n-1}(\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n)^2}{1+\frac{n}{n-1}\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n} -\frac{\frac{n}{n-1}({\mathbf {w}_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}_n})^2}{1+\frac{n}{n-1}{\mathbf {w}'_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}'_n}}\right| \le \left( \frac{n}{n-1}\right) \left( \frac{n-1}{n}\right) |\mathbf {w}_n{^\top }\mathbf {\Lambda }^{-1}\mathbf {w}_n-{\mathbf {w}_n}{^\top }\mathbf {\Lambda }'^{-1}{\mathbf {w}_n}| \)

  • II: For a positive semi-definite \(\mathbf {\Sigma }\), \(0 \le \mathbf {m}{^\top }\mathbf {\Sigma }\mathbf {m}\le \Vert \mathbf {m}\Vert ^2_2 \Vert \mathbf {\Sigma }\Vert _F\), where \(\Vert \mathbf {\Sigma }\Vert _F\) is the Frobenius norm.

  • III: We here will denote \(\tilde{\mathbf {v}} = \sqrt{\frac{n}{n-1}}\mathbf {v}\) and \(\tilde{\mathbf {v}}' = \sqrt{\frac{n}{n-1}}\mathbf {v}'\). Due to the Sherman–Morrison formula,

    $$\begin{aligned} \mathbf {\Lambda }^{-1} = (\mathbf {M}_{\gamma _n})^{-1} - (\mathbf {M}_{\gamma _n})^{-1}\frac{\tilde{\mathbf {v}}\tilde{\mathbf {v}}{^\top } }{1+\tilde{\mathbf {v}}{^\top }(\mathbf {M}_{\gamma _n})^{-1}\tilde{\mathbf {v}}} (\mathbf {M}_{\gamma _n})^{-1}. \end{aligned}$$
    (18)

    For any eigenvectors \(\mathbf {u}_j, \mathbf {u}_k\) of \(\mathbf {M}{^\top }\mathbf {M}\), we have

    $$\begin{aligned} \mathbf {u}_j{^\top }(\mathbf {\Lambda }^{-1}-\mathbf {\Lambda }'^{-1})\mathbf {u}_k = \mu _j^{-1}\mu _k^{-1}\left( \frac{(\mathbf {u}_j{^\top }\tilde{\mathbf {v}}) (\tilde{\mathbf {v}}{^\top } \mathbf {u}_k) }{1+\tilde{\mathbf {v}}{^\top }(\mathbf {M}_{\gamma _n})^{-1}\tilde{\mathbf {v}}}- \frac{(\mathbf {u}_j{^\top }\tilde{\mathbf {v}}') (\tilde{\mathbf {v}}'{^\top } \mathbf {u}_k) }{1+\tilde{\mathbf {v}}'{^\top }(\mathbf {M}_{\gamma _n})^{-1}\tilde{\mathbf {v}}'}\right) , \end{aligned}$$
    (19)

    where \(\mu _j, \mu _k\) are the corresponding eigenvalues. Now, we rewrite the Frobenius norm (using that it is invariant under multiplication by an orthogonal matrix, taking \(\mathbf {Q}\) to be the orthogonal matrix formed by the eigenvectors of \(M^\top M\)):

    $$\begin{aligned} \Vert ( \mathbf {\Lambda }^{-1} - \mathbf {\Lambda }'^{-1}) \Vert ^2_F&= \Vert \varvec{Q}(\mathbf {\Lambda }^{-1} - \mathbf {\Lambda }'^{-1}) \varvec{Q}{^\top } \Vert ^2_F, \nonumber \\&= \sum _{j,k}^J \left( \mathbf {u}_j{^\top } (\mathbf {\Lambda }^{-1} - \mathbf {\Lambda }'^{-1}) \mathbf {u}_k\right) ^2, \nonumber \\&\le \frac{2}{(1+\tilde{\mathbf {v}}{^\top }(\mathbf {M}_{\gamma _n})^{-1}\tilde{\mathbf {v}})^2}\sum _{j,k}^J\frac{(\mathbf {u}_j{^\top }\tilde{\mathbf {v}})^2 (\tilde{\mathbf {v}}{^\top } \mathbf {u}_k)^2 }{\mu _j^2\mu _k^2} \nonumber \\&\quad + \frac{2}{(1+\tilde{\mathbf {v}}'{^\top }(\mathbf {M}_{\gamma _n})^{-1}\tilde{\mathbf {v}}')^2}\sum _{j,k}^J\frac{(\mathbf {u}_j{^\top }\tilde{\mathbf {v}}')^2 (\tilde{\mathbf {v}}'{^\top } \mathbf {u}_k)^2 }{\mu _j^2\mu _k^2}, \end{aligned}$$
    (20)
    $$\begin{aligned}&\qquad \qquad \qquad \qquad \qquad \qquad \text{[due } \text{ to } \Vert a-b\Vert _2^2 \le 2\Vert a\Vert _2^2+2\Vert b\Vert _2^2], \nonumber \\&\le \frac{2}{(\tilde{\mathbf {v}}{^\top }(\mathbf {M}_{\gamma _n})^{-1}\tilde{\mathbf {v}})^2}\sum _{j,k}^J\frac{(\mathbf {u}_j{^\top }\tilde{\mathbf {v}})^2 (\tilde{\mathbf {v}}{^\top } \mathbf {u}_k)^2 }{\mu _j^2\mu _k^2} \nonumber \\&\quad + \frac{2}{(\tilde{\mathbf {v}}'{^\top }(\mathbf {M}_{\gamma _n})^{-1}\tilde{\mathbf {v}}')^2}\sum _{j,k}^J\frac{(\mathbf {u}_j{^\top }\tilde{\mathbf {v}}')^2 (\tilde{\mathbf {v}}'{^\top } \mathbf {u}_k)^2 }{\mu _j^2\mu _k^2}, \end{aligned}$$
    (21)
    $$\begin{aligned}&\le \frac{\mu _{min}(\mathbf {M}_{\gamma _n})^2}{B^4J} \sum _{j,k}^J\frac{2((\mathbf {u}_j{^\top }\tilde{\mathbf {v}})^2 (\tilde{\mathbf {v}}{^\top } \mathbf {u}_k)^2 + (\mathbf {u}_j{^\top }\tilde{\mathbf {v}}')^2 (\tilde{\mathbf {v}}'{^\top } \mathbf {u}_k)^2 )}{\mu _j^2\mu _k^2} \end{aligned}$$
    (22)
    $$\begin{aligned}&\le \frac{(n-1)^2\mu _{min}(\mathbf {M}_{\gamma _n})^2}{n^2 B^4J} \sum _{j,k}^J\frac{2((\mathbf {u}_j{^\top }\tilde{\mathbf {v}})^2 (\tilde{\mathbf {v}}{^\top } \mathbf {u}_k)^2 + (\mathbf {u}_j{^\top }\tilde{\mathbf {v}}')^2 (\tilde{\mathbf {v}}'{^\top } \mathbf {u}_k)^2 )}{\mu _{min}(\mathbf {M}_{\gamma _n})^4} \end{aligned}$$
    (23)
    $$\begin{aligned}&\le \frac{(n-1)^2 2 J}{n^2\mu _{min}(\mathbf {M}_{\gamma _n})^2 B^4}(\Vert \tilde{\mathbf {v}}\Vert _2^8+\Vert \tilde{\mathbf {v}}'\Vert _2^8), \end{aligned}$$
    (24)
    $$\begin{aligned}&\le \left( \frac{n}{n-1}\right) ^2\frac{ 4 J}{\mu _{min}(\mathbf {M}_{\gamma _n})^2}B^4, \end{aligned}$$
    (25)

    Note that we can get Eq. 25 by noticing that:

    $$\begin{aligned} \tilde{\mathbf {v}} {^\top } (\mathbf {M}_{\gamma _n})^{-1} \tilde{\mathbf {v}} \le \Vert \tilde{\mathbf {v}} \Vert ^2_2 \Vert (\mathbf {M}_{\gamma _n})^{-1} \Vert _F&\le \left( \frac{n}{n-1}\right) B^2 \sqrt{\frac{1}{\mu _1^2(\mathbf {M}_{\gamma _n})} + \dots + \frac{1}{\mu _{min}^2(\mathbf {M}_{\gamma _n})}} \\ {}&\le \left( \frac{n}{n-1}\right) \frac{B^2 \sqrt{J}}{\mu _{min}(\mathbf {M}_{\gamma _n})} \end{aligned}$$
  • IV:

    $$\begin{aligned} \max _{\mathbf {v},\mathbf {v}'} \big | \mathbf {w}_n^\top \mathbf {\Lambda }^{-1} \mathbf {w}_n - {\mathbf {w}'_n}^\top \mathbf {\Lambda }'^{-1} {\mathbf {w}'_n}\big | \le \max _{\mathbf {v},\mathbf {v}'}\Big [ \big | {\mathbf {w}'_n}^\top \mathbf {\Lambda }^{-1} {\mathbf {w}'_n} - {\mathbf {w}'_n}^\top \mathbf {\Lambda }'^{-1} {\mathbf {w}'_n}\big | \\ + \big | \mathbf {w}_n^\top \mathbf {\Lambda }^{-1} \mathbf {w}_n - {\mathbf {w}'_n}^\top \mathbf {\Lambda }^{-1} {\mathbf {w}'_n}\big | \Big ] \end{aligned}$$

    We write \(\mathbf {w}_n^\top \mathbf {\Lambda }^{-1} \mathbf {w}_n = \left( \mathbf {\Lambda }^{-1/2}\mathbf {w}_n\right) ^\top \left( \mathbf {\Lambda }^{-1/2}\mathbf {w}_n\right) \) and similarly \({\mathbf {w}'_n}^\top \mathbf {\Lambda }^{-1} {\mathbf {w}'_n} = \left( \mathbf {\Lambda }^{-1/2}{\mathbf {w}'_n}\right) ^\top \left( \mathbf {\Lambda }^{-1/2}{\mathbf {w}'_n}\right) \).

    $$\begin{aligned} \left| \mathbf {w}_n^\top \mathbf {\Lambda }^{-1} \mathbf {w}_n - {\mathbf {w}'_n}^\top \mathbf {\Lambda }^{-1} {\mathbf {w}'_n} \right|&= \left| \left( \mathbf {\Lambda }^{-1/2}\mathbf {w}_n\right) ^\top \left( \mathbf {\Lambda }^{-1/2}\mathbf {w}_n\right) - \left( \mathbf {\Lambda }^{-1/2}{\mathbf {w}'_n}\right) ^\top \left( \mathbf {\Lambda }^{-1/2}{\mathbf {w}'_n}\right) \right| \\&= \left| \left( \mathbf {\Lambda }^{-1/2}\mathbf {w}_n + \mathbf {\Lambda }^{-1/2}{\mathbf {w}'_n}\right) ^\top \left( \mathbf {\Lambda }^{-1/2}\mathbf {w}_n - \mathbf {\Lambda }^{-1/2}{\mathbf {w}'_n} \right) \right| \\&= \left| \left( \mathbf {\Lambda }^{-1/2}\left( \mathbf {w}_n + {\mathbf {w}'_n}\right) \right) ^\top \left( \mathbf {\Lambda }^{-1/2}\left( \mathbf {w}_n - {\mathbf {w}'_n}\right) \right) \right| \\&\le \left\| \mathbf {\Lambda }^{-1/2}\left( \mathbf {w}_n +{\mathbf {w}'_n} \right) \right\| _2 ~\left\| \mathbf {\Lambda }^{-1/2}\left( \mathbf {w}_n -{\mathbf {w}'_n} \right) \right\| _2 \\&\le \left\| \mathbf {\Lambda }^{-1/2}\left( \mathbf {w}_n +{\mathbf {w}'_n} \right) \right\| _2 ~ \frac{\kappa \sqrt{J}}{n} \left\| \mathbf {\Lambda }^{-1} \right\| _F^{1/2} \text {using equality (II)} \\&\le \frac{2\kappa ^2 J}{n^2} \left\| \mathbf {\Lambda }^{-1} \right\| _F,\\&= \frac{2\kappa ^2 J}{n^2} \sqrt{\frac{1}{\mu ^2_1(\mathbf {\Lambda })} + \cdots + \frac{1}{\mu ^2_{min}(\mathbf {\Lambda })}},\\&\le \frac{2\kappa ^2 J}{n^2} \frac{\sqrt{J}}{\mu _{min}(\mathbf {\Lambda })}. \end{aligned}$$

    where the last equality follows because \(\mathbf {\Lambda }\) is real and symmetric.
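For illustration, the bound derived above can be combined with the Gaussian mechanism to release a privatized test statistic directly (cf. footnote 2, which notes that such a release may be negative, so the test threshold must be adjusted). The sketch below assumes the standard calibration \(\sigma = GS_2 \sqrt{2\log (1.25/\delta )}/\epsilon \); this calibration and the function names are our own illustrative choices, not necessarily what is used in the paper's experiments.

```python
import numpy as np

def statistic_sensitivity(kappa, J, n, gamma_n):
    """Upper bound on the L2-sensitivity of s_n derived in Appendix B."""
    return (4.0 * kappa**2 * J * np.sqrt(J) / (n * gamma_n)) * (1.0 + kappa**2 * J / (n - 1))

def privatize_statistic(s_n, kappa, J, n, gamma_n, eps, delta, rng=None):
    """Gaussian-mechanism release of the test statistic; the output may be
    negative, so the test threshold has to be adjusted accordingly."""
    rng = np.random.default_rng() if rng is None else rng
    gs2 = statistic_sensitivity(kappa, J, n, gamma_n)
    sigma = gs2 * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return s_n + rng.normal(scale=sigma)
```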

C Other Possible Ways to Make the Test Private

C.1 Perturbing the Kernel Mean in RKHS

In [4], the authors propose to make the solution of regularized risk minimization differentially private by injecting noise into the objective itself, i.e.:

$$ f_{priv} = \arg \min \big (J(f,\varvec{x}) + \frac{1}{n} \varvec{b}^\top f \big ) $$

However, adding perturbations in function spaces is not straightforward. The authors of [19] propose adding a Gaussian process sample path to the function to make it private.

Lemma 1

(Proposition 7 [19]). Let G be a sample path of a Gaussian process with mean zero and covariance function k. Let K denote the Gram matrix, i.e. \(K = [k(\varvec{x}_i, \varvec{x}_j)]_{i,j=1}^n\). Let \(\{ f_D : D \in \mathcal {D} \}\) be a family of functions indexed by databases. Then the release of

$$\tilde{f}_D = f_D + \frac{\varDelta c(\beta )}{\alpha } G$$

is \((\alpha , \beta )\)-differentially private (with respect to the cylinder \(\sigma \)-field F) where \(\varDelta \) is the upper bound on

$$\begin{aligned} \sup _{D\sim D'}\sup _{n\in \mathbb N}\sup _{x_1,\ldots ,x_n} \sqrt{\left( \mathbf{{f}}_D-\mathbf{{f}}_{D'}\right) ^\top K^{-1}\left( \mathbf{{f}}_D-\mathbf{{f}}_{D'}\right) } \end{aligned}$$
(26)

and \(c(\beta )\ge \sqrt{2\log \frac{2}{\beta }}\).
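A rough sketch of how the release in Lemma 1 could be realised at a finite set of evaluation points, drawing the GP(0, k) sample path via a Cholesky factor of the Gram matrix; the Gaussian kernel choice, the jitter term, and the function names are our own illustrative assumptions, not part of [19].

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def release_private_function(f_vals, X_eval, Delta, alpha, beta, lengthscale=1.0, rng=None):
    """Release f_D evaluated at X_eval plus a scaled GP(0, k) sample path,
    as in Lemma 1: f_tilde = f_D + (Delta * c(beta) / alpha) * G."""
    rng = np.random.default_rng() if rng is None else rng
    K = rbf_kernel(X_eval, X_eval, lengthscale)
    # Cholesky factor with a small jitter for numerical stability.
    L = np.linalg.cholesky(K + 1e-8 * np.eye(len(X_eval)))
    G = L @ rng.standard_normal(len(X_eval))   # GP(0, k) sample path at X_eval
    c_beta = np.sqrt(2.0 * np.log(2.0 / beta))
    return f_vals + (Delta * c_beta / alpha) * G
```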

Now, we consider the optimization problem defining the MMD and inject noise into the objective itself. The optimization problem then becomes:

$$\begin{aligned} {d}_{priv}(p,q)&= \sup _{f \in \mathcal {H},~\Vert f\Vert _{\mathcal {H}}\le 1} \Big [\mathbb {E}_{\varvec{x} \sim p } [f(\varvec{x})] - \mathbb {E}_{\varvec{x} \sim q } [f(\varvec{x})] + \big \langle f, g(\varDelta ,\beta ,\alpha ) G \big \rangle \Big ] \\&= \sup _{f \in \mathcal {H},~\Vert f\Vert _{\mathcal {H}}\le 1} \Big [ \big \langle f ,\mu _p - \mu _q \big \rangle + \big \langle f, g(\varDelta ,\beta ,\alpha ) G \big \rangle \Big ] \\&= \sup _{f \in \mathcal {H},~\Vert f\Vert _{\mathcal {H}}\le 1} \Big [ \Big \langle f ,\mu _p - \mu _q + g(\varDelta ,\beta ,\alpha ) G \Big \rangle \Big ] \\&= \Vert \mu _p - \mu _q + g(\varDelta ,\beta ,\alpha ) G\Vert _{\mathcal {H}} \end{aligned}$$

In a similar way, one obtains the empirical version of the perturbed MMD distance by replacing the true expectations with empirical ones. The problem with the above construction, where the embedding is perturbed with a Gaussian process sample path using the same kernel k, is that for infinite-dimensional spaces the result will not lie in the corresponding RKHS \(\mathcal H_k\) (a well-known consequence of Kallianpur's 0/1 law), and thus the MMD cannot be computed: while \(f_D\) is in the RKHS, \(\tilde{f}_D\) need not be. This has, for example, been considered in Bayesian models for kernel embeddings [11], where an alternative kernel construction using convolution is given by:

$$\begin{aligned} r(x,x')=\int k(x,y)k(y,x')\nu (dy), \end{aligned}$$
(27)

where \(\nu \) is a finite measure. Such a smoother kernel r ensures that a sample path from GP(0, r) lies in the RKHS \(\mathcal H_k\).
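As a worked example of (27) (ours, and with the caveat that we take \(\nu \) to be the Lebesgue measure purely for this illustration, relaxing the finiteness requirement), let \(k(x,y)=\exp (-\Vert x-y\Vert ^2/(2\sigma ^2))\) on \(\mathbb {R}^d\); completing the square in y gives

$$\begin{aligned} r(x,x') = \int _{\mathbb {R}^d} e^{-\frac{\Vert x-y\Vert ^2}{2\sigma ^2}}\, e^{-\frac{\Vert y-x'\Vert ^2}{2\sigma ^2}}\, dy = (\pi \sigma ^2)^{d/2} \exp \left( -\frac{\Vert x-x'\Vert ^2}{4\sigma ^2}\right) , \end{aligned}$$

i.e. a rescaled Gaussian kernel with a larger bandwidth, whose GP(0, r) sample paths are smooth enough to lie in \(\mathcal H_k\).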

The key property in [19] is Prop. 8, which shows that for any \(h\in \mathcal H_k\) and for any finite collection of points \(\mathbf {x}= (x_1,\ldots ,x_n)\):

$$\mathbf{{h}}^\top K^{-1}\mathbf{{h}}\le \Vert h \Vert _{\mathcal H_k}^2.$$

which implies that we only require \(\sup _{D\sim D'}\Vert f_D-f_{D'} \Vert _{\mathcal H_k} \le \varDelta \) in order to upper bound (26). However, in nonparametric contexts such as MMD, one usually relies on permutation testing approaches, but this is not possible in the private setting, as one would effectively need to release the samples in order to generate the null distribution.

C.2 Adding \(\chi ^2\)-Noise to the Test Statistic

Since the unperturbed test statistic follows a \(\chi ^2\) distribution under the null, it is natural to consider adding noise sampled from a chi-square distribution to the test statistic \(s_n\). The probability density function of a chi-square distribution with k degrees of freedom is:

$$\begin{aligned}f(x,k) = {\left\{ \begin{array}{ll} \frac{x^{\frac{k}{2}-1} \exp ({-\frac{x}{2}})}{2^{\frac{k}{2}}\varGamma (\frac{k}{2})} , &{} \text {if } x \ge 0.\\ 0, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

For \(k = 2\), we simply have \(f(x) = \frac{\exp (-\frac{x}{2})}{2}\) for \(x\ge 0\). Recall that \(s_n = n \mathbf {w}_n^\top \mathbf {\Sigma }_n^{-1}\mathbf {w}_n\), which depends on \(\mathbf {z}_i\) for all \(i \in [n]\). We define \(s_n'\) as the statistic computed on a neighbouring dataset differing in a single sample, i.e. \(s_n'\) depends on \(\mathbf {z}_1, \cdots \mathbf {z}_{i'} , \cdots \mathbf {z}_n\), and denote \(\varDelta = s_n - s_n'\). The privacy guarantee requires bounding the following ratio:

$$\begin{aligned}&\frac{p\big (s_n + x = s_n + x_0\big )}{p\big (s_n' + x = s_n + x_0\big )} = \frac{p\big ( x = x_0\big )}{p\big ( x = s_n - s_n' + x_0\big )} \end{aligned}$$
(28)
$$\begin{aligned}&= \frac{\exp \Big (-\frac{x_0}{2}\Big )}{\exp \Big (-\frac{s_n - s_n' + x_0}{2}\Big )} = \exp \Big (\frac{s_n-s_n'}{2}\Big ) \le \exp \Big ( \frac{GS_2}{2}\Big ) \end{aligned}$$
(29)

Hence, Eq. (29) yields the final privacy guarantee. The problem with this approach is that the support of the chi-square distribution is limited to the positive reals. The distributions in the numerator and denominator of Eq. (29) may therefore have different supports, which makes the privacy analysis break down in the vicinity of zero. For Eq. (29) to hold, \(x_0\) must be at least \(|s_n - s_n'|\) for every pair of neighbouring datasets, which essentially requires \(x_0 > GS_2(s_n)\). Consequently, we obtain no privacy guarantee at all when the test statistic lies very close to zero.

However, proposing an alternative null distribution is simple in this case, since the sum of two independent chi-square random variables is again chi-square, with the degrees of freedom added. Let \(X_1\) and \(X_2\) denote two independent chi-square random variables:

$$\begin{aligned} X_1 \sim \chi ^2(r_1) ~~~\text {and} ~~~ X_2 \sim \chi ^2(r_2) \end{aligned}$$

then \(Y = X_1 + X_2 \sim \chi ^2(r_1 + r_2)\). Hence, under the null, the perturbed statistic follows a chi-square distribution with \(J+2\) degrees of freedom.
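For completeness, a small sketch of the mechanics of this variant in Python; we assume SciPy for the \(\chi ^2_{J+2}\) quantile, and the function name and significance level are illustrative. As discussed above, the privacy guarantee of this scheme depends on \(GS_2(s_n)\) and breaks down when the statistic is close to zero, so this is only a sketch, not a recommended mechanism.

```python
import numpy as np
from scipy import stats

def chi2_noise_test(s_n, J, level=0.05, rng=None):
    """Add chi^2(2) noise to the statistic and compare against a chi^2(J + 2)
    null (the sum of independent chi-squares adds degrees of freedom)."""
    rng = np.random.default_rng() if rng is None else rng
    s_tilde = s_n + rng.chisquare(df=2)
    threshold = stats.chi2.ppf(1.0 - level, df=J + 2)
    return s_tilde, s_tilde > threshold
```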

C.3 Adding Noise to \(\mathbf {\Sigma }_n^{-1/2}\mathbf {w}_n\)

One can also make the test statistic private by adding Gaussian noise to the quantity \(\sqrt{n}\mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n\) and then taking the squared 2-norm of the perturbed vector. As we have carried out the sensitivity analysis of \(\mathbf {w}_n^\top \mathbf {\Sigma }^{-1} \mathbf {w}_n\) in Theorem 1, the sensitivity of \(\sqrt{n}\mathbf {\Sigma }^{-1/2}\mathbf {w}_n\) can be derived in a very similar way. Again, by Slutsky's theorem, the perturbed test statistic converges asymptotically to the true one. However, similarly to Sect. 5, we approximate its distribution with an alternative null distribution, which experimentally shows more power under the noise as well. Suppose we add noise \(\mathbf {\varvec{\eta }}\sim \mathcal {N}(0,\sigma ^2(\epsilon ,\delta _n))\) to \(\mathbf {\Sigma }_n^{-1/2}\mathbf {w}_n\) to make the statistic \(s_n\) private. The noisy statistic can then be written as

$$ \tilde{s}_n = \sqrt{n} \left( \mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n + \mathbf {\varvec{\eta }}\right) ^\top \sqrt{n}\left( \mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n + \mathbf {\varvec{\eta }}\right) $$

Equivalently, \(\tilde{s}_n\) can be written as \(\tilde{s}_n = \left( \widetilde{\mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n}\right) ^\top \mathbf {A}~ \left( \widetilde{\mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n}\right) \), where

$$\begin{aligned} \widetilde{\mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n} = \begin{pmatrix} \sqrt{n} \mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n \\ \sqrt{n} \frac{\mathbf {\varvec{\eta }}}{\sigma (\epsilon ,\delta _n)} \end{pmatrix} \end{aligned}$$
(30)

\(\widetilde{\mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n}\) is a 2J-dimensional vector. Its covariance matrix \(\hat{\mathbf {\Sigma }}_n\) is the \(2J\times 2J\) identity matrix \(\mathbf {I}_{2J}\); hence, under the null, \(\widetilde{\mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n} \sim \mathcal {N}(0, \mathbf {I}_{2J})\). We define one further matrix, \(\mathbf {A}\), as

$$\begin{aligned} \mathbf {A} = \begin{bmatrix} \mathbf {I}_J &{}&{} \mathbf {V} \\ \mathbf {V} &{}&{} \mathbf {V}^2 \end{bmatrix} \text { where } \mathbf {V} ~=~\text {Diag}(\sigma (\epsilon ,\delta _n)) \end{aligned}$$
(31)

Since the matrix \(\mathbf {A}\) is symmetric by construction, there exists an orthogonal matrix \(\mathbf {H}\) such that \(\mathbf {H}^\top \mathbf {A} \mathbf {H} = diag(\lambda _1, \lambda _2 \cdots \lambda _r)\), where \(\mathbf {H}^\top \mathbf {H} = \mathbf {H} \mathbf {H}^\top = \mathbf {I}_{2J}\). Now, if we consider a random variable \(\mathbf {N}_2 \sim \mathcal {N}(0,\mathbf {I}_{2J})\) and \(\mathbf {N}_1 = \mathbf {H} \mathbf {N}_2\), then the following holds asymptotically:

$$\begin{aligned} \left( \widetilde{\mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n}\right) ^\top \mathbf {A}~ \left( \widetilde{\mathbf {\Sigma }_n^{-1/2} \mathbf {w}_n}\right) \sim (\mathbf {N}_2)^\top \mathbf {A}~(\mathbf {N}_2) \sim (\mathbf {H} \mathbf {N}_2)^\top \mathbf {A}~(\mathbf {H} \mathbf {N}_2)\sim \sum _{i=1}^r \lambda _i \chi _1^{2,i} \end{aligned}$$

As a short remark, we note that in this approach the weights of the weighted sum of \(\chi ^2\) random variables do not depend directly on the data, which is desirable from the privacy point of view. The sensitivity of \(\mathbf {\Sigma }_n^{-1/2}\mathbf {w}_n\) can be computed in a similar way as in Theorem 1.
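To make the null distribution in this construction concrete, here is a minimal NumPy sketch (ours, with illustrative names) that forms \(\mathbf {A}\) for \(\mathbf {V} = \sigma (\epsilon ,\delta _n)\mathbf {I}_J\), extracts its eigenvalues, and approximates the weighted sum of \(\chi ^2_1\) variables by Monte Carlo to obtain a test threshold.

```python
import numpy as np

def weighted_chi2_null(sigma_noise, J, n_draws=100000, rng=None):
    """Eigenvalues of A = [[I, V], [V, V^2]] with V = sigma_noise * I give the
    weights of the weighted chi-square null; its quantiles are approximated by
    Monte Carlo over independent chi^2_1 variables."""
    rng = np.random.default_rng() if rng is None else rng
    V = sigma_noise * np.eye(J)
    A = np.block([[np.eye(J), V], [V, V @ V]])
    lam = np.linalg.eigvalsh(A)                       # weights lambda_1..lambda_2J
    draws = rng.chisquare(df=1, size=(n_draws, 2 * J)) @ lam
    return lam, draws

def threshold_from_null(draws, level=0.05):
    """Monte Carlo (1 - level)-quantile used as the test threshold."""
    return np.quantile(draws, 1.0 - level)
```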

D Perturbed-Samples Interpretation of the Private Mean and Covariance

In order to define differential privacy, we need to define two neighbouring datasets \(\mathcal {D}\) and \(\mathcal {D}'\). Consider a class of databases \(\mathcal {D}^N\) in which any two neighbouring databases differ in just one data point, and assume that each database contains n data points of dimension d. If we want to release the data privately, we can consider a function \(f:\mathcal {D}^N \rightarrow \mathbb {R}^{nd}\) which simply stacks all n data points of the database into one large vector of dimension nd. It is not hard to see that:

$$\begin{aligned} GS_2(f) = \sup _{\mathcal {D},\mathcal {D}'} \Vert f(\mathcal {D}) - f(\mathcal {D}') \Vert _2 \approx \mathcal {O}(diam(\mathcal {X})) \end{aligned}$$
(32)

where \(diam(\mathcal {X})\) denotes the diameter of the input space. Since this sensitivity is very high (of the order of the diameter of the input space), the utility of the data is greatly reduced after adding noise to it.

Below, we discuss the perturbed-sample interpretation of the private mean and covariance, i.e., we analyse what level of noise added directly to the samples themselves would yield the same distribution as the private mean. From Lemma 2, we see that the noise variance is much more tractable when privatizing the mean than when adding noise directly to the samples.

Lemma 2

Let us assume that \(\sqrt{n}\tilde{\mathbf {w}}_n = \sqrt{n} \mathbf {w}_n + \eta \), where \(\eta \sim \mathcal {N}(0,\frac{c}{n})\) for some positive constant c, \(\sqrt{n} \mathbf {w}_n = \frac{1}{\sqrt{n}}\sum _{i=1}^n \mathbf {z}_i\), and the \(\mathbf {z}_i\) are i.i.d. samples. Then \(\sqrt{n} \tilde{\mathbf {w}}_n\) has the same distribution as \(\frac{1}{\sqrt{n}}\sum _{i=1}^n \tilde{\mathbf {z}}_i\), where \(\tilde{\mathbf {z}}_i = \mathbf {z}_i + \zeta _i\) and \(\zeta _i \sim \mathcal {N}(0,\sigma _p^2)\) i.i.d., if \(\sigma _p^2 = \frac{c}{n}\).

Proof

It is easy to see that \(\mathbb {E}~\left[ \sqrt{n}\tilde{\mathbf {w}}_n\right] = \sqrt{n} \mathbf {w}_n = \mathbb {E}~\left[ \frac{1}{\sqrt{n}}\sum _{i=1}^n \tilde{\mathbf {z}}_i \right] \).

Now we match the variances of the two terms. The noise added to \(\sqrt{n}\tilde{\mathbf {w}}_n\) has variance \(\frac{c}{n}\), while \(\frac{1}{\sqrt{n}}\sum _{i=1}^n \zeta _i\) has variance \(n \frac{\sigma _p^2}{n} = \sigma _p^2\). Equating the two,

$$\begin{aligned} \frac{c}{n} = n \frac{\sigma _p^2}{n} = \sigma _p^2, \end{aligned}$$

hence \(\sigma _p^2 = \frac{c}{n}\).

Similarly to Lemma 2, we now want to translate the noise added to the covariance matrix into noise on the samples. The empirical covariance matrix is \(\mathbf {\Sigma }_n = \frac{1}{n-1}\sum _{i=1}^n (\mathbf {z}_i - \mathbf {w}_n)(\mathbf {z}_i - \mathbf {w}_n)^\top \). Writing \(\hat{\mathbf {z}}_i = \mathbf {z}_i - \mathbf {w}_n\), we have \(\mathbf {\Sigma }_n = \sum _{i=1}^n \frac{\hat{\mathbf {z}}_i}{\sqrt{n-1}} \frac{\hat{\mathbf {z}}_i}{\sqrt{n-1}}^\top \). Adding Gaussian noise to each scaled residual \(\frac{\hat{\mathbf {z}}_i}{\sqrt{n-1}}\) results in the following:

$$\begin{aligned} \hat{\mathbf {\Sigma }}_n&= \sum _{i=1}^n \left( \frac{\hat{\mathbf {z}}_i}{\sqrt{n-1}} + \eta _i\right) \left( \frac{\hat{\mathbf {z}}_i}{\sqrt{n-1}} + \eta _i\right) ^\top ~~~~~\text {where } \eta _i \sim \mathcal {N}(0,\sigma ^2(\epsilon ,\delta _n)) \\&=\sum _{i=1}^n \left( \frac{\hat{\mathbf {z}}_i}{\sqrt{n-1}} \frac{\hat{\mathbf {z}}_i}{\sqrt{n-1}}^\top + \frac{\hat{\mathbf {z}}_i}{\sqrt{n-1}} \eta _i ^\top + \eta _i \frac{\hat{\mathbf {z}}_i}{\sqrt{n-1}}^\top + \eta _i \eta _i^\top \right) \end{aligned}$$

As the above equations show, we obtain terms similar to adding Wishart noise to the covariance matrix, plus two extra cross terms. Hence, instead of using the matrix \(\tilde{\mathbf {\Sigma }}_n\), one can use \(\hat{\mathbf {\Sigma }}_n\) in place of \(\mathbf {\Sigma }_n\) to compute the weights of the null distribution, i.e. the weighted sum of chi-squares in Sect. 5.1.
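A brief sketch (ours, with assumed argument names) of this sample-level construction of \(\hat{\mathbf {\Sigma }}_n\), which makes the Wishart-like term and the two cross terms explicit:

```python
import numpy as np

def perturbed_sample_covariance(Z, w_n, sigma_noise, rng=None):
    """Z: (n, J) matrix of feature vectors z_i; returns Sigma_hat built from
    noisy scaled residuals (z_i - w_n)/sqrt(n-1) + eta_i, as in Appendix D."""
    rng = np.random.default_rng() if rng is None else rng
    n, J = Z.shape
    Zhat = (Z - w_n) / np.sqrt(n - 1)                 # scaled residuals
    eta = rng.normal(scale=sigma_noise, size=(n, J))  # per-sample Gaussian noise
    noisy = Zhat + eta
    # Sum of outer products: contains the original term, the Wishart-like
    # term eta_i eta_i^T, and the two cross terms discussed above.
    return noisy.T @ noisy
```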

E Additional Experimental Details

We see that the Type I error is indeed approximately controlled at the required level for the TCMC, TCS and NTE algorithms, for both versions of the test, as shown in Fig. 5; note that we allow some leeway here due to multiple testing. Again, we emphasise that naively using the asymptotic \(\chi ^2\) distribution would yield an inflated Type I error, as shown in Fig. 6.

In Fig. 7, we show the effect of the regularisation parameter \(\gamma _n\) on the performance of the TCS algorithm, in terms of Type I error and power, on the SG, GMD and GVD datasets. For simplicity, we take \(\gamma _n = \gamma \) here, rather than letting it depend on the sample size n. From the results, we can see that if \(\gamma \) is too small, we inject too much noise and hence lose power. Note that any \(\gamma >0\) provides differential privacy; however, if we choose it too large, the null distribution becomes mis-calibrated, hurting performance. Hence, there is a trade-off between the calibration of the null distribution and the level of noise that needs to be added.

Fig. 5. Type I error for the SG dataset, with baselines ME and SCF, \(\delta =1e-5\). Left: vary \(\epsilon \), fix \(n=10000\). Right: vary n, fix \(\epsilon =2.5\).

F Proof of Theorem 5.1

Proof

The variance \(\sigma _\mathbf {n}^2\) of the zero-mean noise term \(\mathbf {n}\) added to the mean vector \(\mathbf {w}_n\) is of the order \(\mathcal {O}(\frac{1}{n^2})\). Hence the variance of \(\sqrt{n}\mathbf {n}\) is of the order \(\mathcal {O}(\frac{1}{{n}})\). According to Slutsky's theorem, \(\sqrt{n}\tilde{\mathbf {w}}_n\) and \(\sqrt{n}\mathbf {w}_n\) thus converge to the same limit in distribution, which under the null hypothesis is \(\mathcal N(0,\mathbf {\Sigma })\), with \(\mathbf {\Sigma }=\mathbb E\left[ \mathbf {z}\mathbf {z}^\top \right] \). Similarly, the eigenvalues of the covariance matrix corresponding to the Wishart noise added to \(\mathbf {\Sigma }_n\) are also of the order \(\mathcal {O}(\frac{1}{n})\), which implies that \(\tilde{\mathbf {\Sigma }}_n + \gamma _n I\) and \(\mathbf {\Sigma }_n + \gamma _n I\) converge to the same limit, i.e. \(\mathbf {\Sigma }\). Therefore, \(\tilde{s}_n\) converges in distribution to the same limit as the non-private test statistic, i.e. a chi-squared random variable with J degrees of freedom. We also assume that \(\tilde{\mathbf {\Sigma }}^{-1}\) and \(\mathbf {\Sigma }\) are bounded above by a constant c. Under the alternative, we assume that \(\mathbf {w}^\top \mathbf {\Sigma }^{-1}\mathbf {w}\ge \mathcal {O}(n^{-\gamma })\) for \(\gamma <1\), which is related to the smallest detectable local departure [16]. We then consider the following:

We consider the following term:

Let us consider term 1 first.

Since the variance \(\sigma ^2\) is of the order \(\mathcal {O}(n^{-2})\), by the Chebyshev inequality [5] term 1 decays at the rate \(\mathcal {O}(n^{-1})\). Similarly, we consider term 2.

Now,

Using the same argument as before, for a fixed J, term 21 decays as \(\mathcal {O}(n^{-2})\) and term 22 decays as \(\mathcal {O}(n^{-1})\). Hence, under the alternative and the assumptions stated in the proof, \(\tilde{s}_n = s_n(1+\epsilon )\), where \(\epsilon \) decays at the rate \(\mathcal {O}(n^{-1+\gamma })\) (Fig. 8).

Fig. 6. Type I error for the SCF versions of the test, using the asymptotic \(\chi ^2\) distribution as the null distribution.

Fig. 7. Type I error for the SG dataset and power for the GMD and GVD datasets over 500 runs, with \(\delta =1e-5\), for the TCS algorithm with different regularisations. Top: varying \(\epsilon \) with \(n=10000\). Bottom: varying n with \(\epsilon =2.5\). Here Asym * denotes using the asymptotic \(\chi ^2\) null distribution.

Fig. 8. Type I error for the SG dataset and power for the GMD and GVD datasets over 500 runs, with \(\delta =1e-5\), for the TCMC and NTE algorithms with different covariance perturbation methods. Here, we vary the privacy level \(\epsilon \) with \(n=10000\) test samples. *-Asym denotes using the asymptotic \(\chi ^2\) null distribution.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Raj, A., Law, H.C.L., Sejdinovic, D., Park, M. (2020). A Differentially Private Kernel Two-Sample Test. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science(), vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_41


  • DOI: https://doi.org/10.1007/978-3-030-46150-8_41


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46149-2

  • Online ISBN: 978-3-030-46150-8

