A low cost and un-cancelled laplace noise based differential privacy algorithm for spatial decompositions

World Wide Web

Abstract

Differentially private spatial decompositions recursively split the whole domain into sub-domains to generate a hierarchical private tree, and add Laplace noise to each node's point count. However, because the Laplace distribution is symmetric about the origin, the mean of a large number of query answers may cancel the Laplace noise and breach privacy. Existing methods address this by limiting the number of queries. But in a private tree, the real point count of an intermediate node may still be exposed, since summing over all of its descendants may cancel the Laplace noise. To address these problems of differentially private spatial decompositions, we propose a more secure algorithm in which the Laplace noise cannot be cancelled. It splits each domain according to its real point count rather than its noisy count, and adds indefeasible (non-cancellable) Laplace noise only to the leaves: the i-th randomly selected leaf of an intermediate node receives the noise \(\frac {\left (\upbeta -i+1 \right )+1+\upbeta }{(\upbeta -i+1)+\upbeta }Lap(\lambda )\). We also define Lapmin(λ), whose absolute value is not greater than Sensitivity(f). We prove that adding Lapmin(λ) noise to a query answer guarantees differential privacy and gives minimal relative error compared with unbounded Laplace noise. Experimental results on both synthetic and real datasets show that our algorithm achieves higher security and data utility, and that the noise cost is greatly reduced.

Notes

  1. http://research.microsoft.com/apps/pubs/?id=152883

  2. http://publish.illinois.edu/dbwork/open-data/

References

  1. Shen, F., Mu, Y., Yang, Y., Liu, W., Li, L., Song, J., Shen, H.T.: Classification by retrieval: Binarizing data and classifiers. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, pp. 595–604 (2017)

  2. An, L., Wang, W., Shang, S., Li, Q., Zhang, X.: Efficient task assignment in spatial crowdsourcing with worker and task privacy protection. GeoInformatica 22(2), 335–362 (2018)

  3. An, L., Li, Z., Liu, G., Zheng, K., Zhang, M., Li, Q., Zhang, X.: Privacy-preserving task assignment in spatial crowdsourcing. J. Comput. Sci. Technol. 32(5), 905–918 (2017)

  4. Xiao, M., Ma, K., Liu, A., Zhao, H., Li, Z., Zheng, K., Zhou, X.: SRA: Secure reverse auction for task assignment in spatial crowdsourcing. IEEE Trans. Knowl. Data Eng. PP, 1–1 (2019). https://doi.org/10.1109/TKDE.2019.2893240

  5. Li, X., Song, J., Gao, L., Liu, X., Huang, W., He, X., Gan, C.: Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, the Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, the Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, pp. 8658–8665 (2019)

  6. Zhai, D., Sun, Y., An, L., Li, Z., Liu, G., Zhao, L., Zheng, K.: Towards secure and truthful task assignment in spatial crowdsourcing. World Wide Web 22(5), 2017–2040 (2019)

  7. Friedman, A., Schuster, A.: Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, pp. 493–502 (2010)

  8. Fung, B.C.M., Ke, W., Chen, R., Yu, P.S.: Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv. 42(4), 14:1–14:53 (2010)

  9. Hardt, M., Ligett, K., McSherry, F.: A simple and practical algorithm for differentially private data release. In: Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held, Lake Tahoe, Nevada, pp. 2348–2356 (2012)

  10. Yu, S.: Big privacy: Challenges and opportunities of privacy study in the age of big data. IEEE Access 4, 2751–2763 (2016)

  11. Sweeney, L.: k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzz. Knowl.-Based Syst. 10(5), 557–570 (2002)

  12. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-diversity: Privacy beyond k-anonymity. In: Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006, Atlanta, GA, pp. 24 (2006)

  13. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: Privacy beyond k-anonymity and l-diversity. In: Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, pp. 106–115 (2007)

  14. Dwork, C.: Differential Privacy. In: Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, Proceedings, Part II, pp. 1–12 (2006)

  15. Dwork, C.: Differential Privacy: A Survey of Results. In: Theory and Applications of Models of Computation, 5th International Conference, TAMC 2008, Xi’an, China, Proceedings, pp. 1–19 (2008)

  16. Dwork, C.: A firm foundation for private data analysis. Commun. ACM 54(1), 86–95 (2011)

  17. Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3-4), 211–407 (2014)

  18. Dwork, C.: A firm foundation for private data analysis. Commun. ACM 54(1), 86–95 (2011)

  19. McSherry, F., Talwar, K.: Mechanism Design via Differential Privacy. In: 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, Proceedings, pp. 94–103 (2007)

  20. McSherry, F., Mironov, I.: Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, pp. 627–636 (2009)

  21. Xu, J., Zhang, Z., Xiao, X., Yang, Y., Yu, G., Winslett, M.: Differentially private histogram publication. VLDB J. 22(6), 797–822 (2013)

  22. Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. In: Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, Long Beach, California, pp. 225–236 (2010)

  23. Mohammed, N., Chen, R., Fung, B.C.M., Yu, P.S.: Differentially private data release for data mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, pp. 493–501 (2011)

  24. Verykios, V.S., Bertino, E., Fovino, I.N., Provenza, L.P., Saygin, Y., Theodoridis, Y.: State-of-the-art in privacy preserving data mining. SIGMOD Rec. 33(1), 50–57 (2004)

  25. Xu, Y., Ma, T., Tang, M., Tian, W.: A survey of privacy preserving data publishing using generalization and suppression (2014)

  26. Cormode, G., Procopiuc, C.M., Srivastava, D., Shen, E., Yu, T.: Differentially Private Spatial Decompositions. In: IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, pp. 20–31 (2012)

  27. Zhang, J., Xiao, X., Xie, X.: PrivTree: A differentially private algorithm for hierarchical decompositions. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, pp. 155–170 (2016)

  28. Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X.: PrivBayes: Private Data Release via Bayesian Networks. In: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, pp. 1423–1434 (2014)

  29. Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X.: Private release of graph statistics using ladder functions. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, pp. 731–745 (2015)

  30. Xiao, M., Wu, J., Huang, L., Cheng, R., Wang, Y.: Online task assignment for crowdsensing in predictable mobile social networks. IEEE Trans. Mob. Comput. 16(8), 2306–2320 (2017)

  31. Xiao, Z., Huang, W.: Kd-Tree Based Nonuniform Simplification of 3D Point Cloud. In: 2009 Third International Conference on Genetic and Evolutionary Computing, WGEC 2009, Guilin, China, pp. 339–342 (2009)

  32. Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: SIGMOD’84, Proceedings of Annual Meeting, Boston, Massachusetts, pp. 47–57 (1984)

  33. Bodlaender, H.L.: A linear time algorithm for finding tree-decompositions of small treewidth (1996)

  34. Demaine, E.D., Mozes, S., Rossman, B., Weimann, O.: An Optimal Decomposition Algorithm for Tree Edit Distance. In: Automata, Languages and Programming, 34th International Colloquium, ICALP 2007, Wroclaw, Poland, Proceedings, pp. 146–157 (2007)

  35. Li, C., Hay, M., Rastogi, V., Miklau, G., McGregor, A.: Optimizing linear counting queries under differential privacy. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, Indianapolis, Indiana, pp. 123–134 (2010)

  36. Qardaji, W.H., Yang, W., Li, N.: Differentially Private Grids for Geospatial Data. In: 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8-12, 2013, pp. 757–768 (2013)

  37. Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 23(8), 1200–1214 (2011)

Acknowledgments

The authors would like to thank Ping Huang for helping to revise the paper repeatedly. This work was supported by the National Natural Science Foundation of China under grants No. 61821003 and No. 61902135, the National Key Research and Development Program of China under grant No. 2016YFB0800402, the ARC Discovery Early Career Researcher Award (DE160100308) and ARC Discovery Projects (DP170103954; DP190101985).

Author information

Corresponding author

Correspondence to Chunhua Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Trust, Privacy, and Security in Crowdsourcing Computing

Guest Editors: An Liu, Guanfeng Liu, Mehmet A. Orgun, and Qing Li

Appendix

1.1 Proof of Theorem 1

Let F1, F2, …, Fk be k algorithms, where each Fi satisfies εi-differential privacy (\(i\in \left [1,\textit {k}\right ]\)). Then the sequential composition (F1, F2, …, Fk) satisfies \((\sum \limits _{\textit {i=1}}^{\textit {k}}\varepsilon _{\textit {i}})\)-differential privacy.

Proof

The theorem describes the serial (sequential) combination of ε-differential privacy. Each algorithm Fi(D) satisfies εi-differential privacy, so for neighbouring datasets D and D′:

$$\forall O_{i}\in Range(F_{i}), \ Pr[F_{i}(D)=O_{i}]\leq e^{\varepsilon_{i}} \times Pr[F_{i}(D^{\prime})=O_{i}]$$

Let F(D) be the serial combination of all the Fi(D), with output {O1, O2, ..., Ok}. Since the Fi(D) are mutually independent:

$$\forall O=\{O_{1},O_{2},...,O_{k}\}\in Range(F), \ Pr[F(D)=O]={\prod}_{i=1}^{k} Pr[F_{i}(D)=O_{i}]$$
$$Pr[F(D)=O]={\prod}_{i=1}^{k}Pr[F_{i}(D)=O_{i}]\leq {\prod}_{i=1}^{k}(e^{\varepsilon_{i}} \times Pr[F_{i}(D^{\prime})=O_{i}] ) $$
$$=e^{{\sum}_{i=1}^{k}\varepsilon_{i}}\times {\prod}_{i=1}^{k}Pr[F_{i}(D^{\prime})=O_{i}]=e^{{\sum}_{i=1}^{k}\varepsilon_{i}}\times Pr[F(D^{\prime})=O]$$

Hence \(Pr[F(D)=O]\leq e^{{\sum }_{i=1}^{k}\varepsilon _{i}}\times Pr[F(D^{\prime })=O]\), so the serial combination F(D) of the Fi(D) satisfies \({\sum }_{i=1}^{k} \varepsilon _{i}\)-differential privacy. □
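The composition bound can be checked numerically on a small case. The following is a minimal sketch, assuming each Fi is a Laplace mechanism that releases a count with sensitivity 1 and scale 1/εi; the function names, the counts 42 and 43, and the candidate outputs are hypothetical and chosen only for illustration.

```python
import itertools
import math

def laplace_log_pdf(x: float, mean: float, scale: float) -> float:
    """Log-density of a Laplace distribution centred at `mean` with the given scale."""
    return -math.log(2.0 * scale) - abs(x - mean) / scale

def joint_log_ratio(count_d, count_d_prime, outputs, epsilons):
    """ln(Pr[F(D)=O] / Pr[F(D')=O]) for independent Laplace mechanisms.

    Mechanism i releases count + Lap(1/eps_i); sensitivity 1 is an assumption.
    Independence lets the joint log-ratio be written as a sum of per-mechanism terms.
    """
    total = 0.0
    for o, eps in zip(outputs, epsilons):
        scale = 1.0 / eps
        total += laplace_log_pdf(o, count_d, scale) - laplace_log_pdf(o, count_d_prime, scale)
    return total

epsilons = [0.1, 0.5, 1.0]
candidates = [-3.0, 41.7, 42.4, 100.0]  # hypothetical output values

# Neighbouring datasets whose true counts differ by one (42 vs. 43).
worst = max(
    joint_log_ratio(42, 43, outputs, epsilons)
    for outputs in itertools.product(candidates, repeat=len(epsilons))
)
print(worst, "<=", sum(epsilons))
```

Over this grid the largest joint log-ratio does not exceed the sum of the εi, which is the statement of Theorem 1 for this particular mechanism.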

1.2 Proof of Lemma 1

$$ \begin{array}{@{}rcl@{}} ln\frac{Pr[D\rightarrow T]}{Pr[D^{\prime}\rightarrow T]}=\sum\limits_{i=1}^{h-1} ln \frac{Pr[v_{i}.count+\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )>\theta ]}{Pr[v_{i}^{\prime}.count+\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )>\theta ]}\\ +ln \frac{Pr[v_{h}.count+\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )=\overline{v_{h}.count}]}{Pr[v_{h}^{\prime}.count+\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )=\overline{v_{h}.count}]}<\frac{h}{\lambda } \end{array} $$

Proof

Dividing by the scale factor \(\frac{i+1+\upbeta }{i+\upbeta }\) and using the symmetry of the Laplace distribution about the origin:

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{i=1}^{h-1} ln \frac{Pr[v_{i}.count+\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )>\theta ]}{Pr[v_{i}^{\prime}.count+\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )>\theta ]}\\ &&=\sum\limits_{i=1}^{h-1} ln \frac{Pr[Lap(\lambda )<(v_{i}.count-\theta)\frac{i+\upbeta }{i+1+\upbeta }]}{Pr[Lap(\lambda )<(v_{i}^{\prime}.count-\theta)\frac{i+\upbeta }{i+1+\upbeta }]}\\ \end{array} $$

The Laplace cumulative distribution function is:

$$ F(x)= \left\{ \begin{array}{ll} \frac{1}{2}e^{\frac{x}{\lambda }}, & x<0\\ 1-\frac{1}{2}e^{-\frac{x}{\lambda }}, & x\geq 0 \end{array} \right. $$

If vi.count < θ, then \(F(x)=\frac {1}{2}e^{\frac {x}{\lambda }}\), and since the counts of neighbouring datasets differ by at most 1:

$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{i=1}^{h-1} ln \frac{Pr[Lap(\lambda )<(v_{i}.count-\theta)\frac{i+\upbeta }{i+1+\upbeta }]}{Pr[Lap(\lambda )<(v_{i}^{\prime}.count-\theta)\frac{i+\upbeta }{i+1+\upbeta }]}\\ &&=\sum\limits_{i=1}^{h-1} ln(e^{\frac{1}{\lambda }\cdot \frac{i+\upbeta }{i+1+\upbeta }[(v_{i}.count-\theta)-(v_{i}^{\prime}.count-\theta)]})\\ &&\leq\sum\limits_{i=1}^{h-1}\frac{1}{\lambda }\cdot \frac{i+\upbeta }{i+1+\upbeta }<{\sum}_{i=1}^{h-1}\frac{1}{\lambda }=\frac{h-1}{\lambda } \end{array} $$

Otherwise, if vi.count ≥ θ, then \(F(x)=1-\frac {1}{2}e^{-\frac {x}{\lambda }}\), and it is proved in [27] that:

$$ ln\frac{Pr[Lap(\lambda )<(v_{i}.count-\theta)]}{Pr[Lap(\lambda )<(v_{i}^{\prime}.count-\theta)]}\leq\frac{1}{\lambda} $$
$$ \begin{array}{@{}rcl@{}} &&\sum\limits_{i=1}^{h-1} ln \frac{Pr[Lap(\lambda )<(v_{i}.count-\theta)\frac{i+\upbeta }{i+1+\upbeta }]}{Pr[Lap(\lambda )<(v_{i}^{\prime}.count-\theta)\frac{i+\upbeta }{i+1+\upbeta }]}\\ &&\leq\sum\limits_{i=1}^{h-1} \frac{1}{\lambda}=\frac{h-1}{\lambda } \end{array} $$
For the leaf node vh, using the Laplace density \(\frac{1}{2\lambda }e^{-\frac{|x|}{\lambda }}\):

$$ \begin{array}{@{}rcl@{}} &&ln \frac{Pr[v_{h}.count+\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )=\overline{v_{h}.count}]}{Pr[v_{h}^{\prime}.count+\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )=\overline{v_{h}.count}]}\\ &&=ln \frac{Pr[\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )=\overline{v_{h}.count}-v_{h}.count]}{Pr[\frac{i+1+\upbeta }{i+\upbeta }Lap(\lambda )=\overline{v_{h}.count}-v_{h}^{\prime}.count]}\\ &&=ln \frac{Pr[Lap(\lambda )=(\overline{v_{h}.count}-v_{h}.count)\cdot\frac{i+\upbeta }{i+1+\upbeta } ]}{Pr[Lap(\lambda )=(\overline{v_{h}.count}-v_{h}^{\prime}.count)\cdot \frac{i+\upbeta }{i+1+\upbeta }]}\\ &&=ln~e^{\frac{i+\upbeta }{i+1+\upbeta }\cdot \frac{|\overline{v_{h}.count}-v_{h}^{\prime}.count|-|\overline{v_{h}.count}-v_{h}.count|}{\lambda }}\\ &&\leq\frac{i+\upbeta }{i+1+\upbeta }\cdot\frac{1}{\lambda}\\ &&<\frac{1}{\lambda} \end{array} $$

Adding this leaf bound to the bound \(\frac{h-1}{\lambda }\) obtained for the h − 1 intermediate nodes gives a total smaller than \(\frac{h}{\lambda }\). □
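The per-node bound used for the intermediate nodes can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it rewrites Pr[count + c·Lap(λ) > θ] as F((count − θ)/c) with c = (i+1+β)/(i+β), using the symmetry of the Laplace distribution, and the values of λ, θ and β are hypothetical.

```python
import math

def laplace_cdf(x: float, lam: float) -> float:
    """CDF of Lap(lam), the zero-mean Laplace distribution with scale lam."""
    if x < 0:
        return 0.5 * math.exp(x / lam)
    return 1.0 - 0.5 * math.exp(-x / lam)

def split_log_ratio(count: int, count_prime: int, theta: float,
                    lam: float, i: int, beta: int) -> float:
    """ln(Pr[count + c*Lap > theta] / Pr[count' + c*Lap > theta]), c = (i+1+beta)/(i+beta)."""
    inv_c = (i + beta) / (i + 1 + beta)  # dividing by c rescales the threshold gap
    num = laplace_cdf((count - theta) * inv_c, lam)
    den = laplace_cdf((count_prime - theta) * inv_c, lam)
    return math.log(num / den)

# Hypothetical parameters chosen only for illustration.
lam, theta, beta = 2.0, 20.0, 4

# Neighbouring counts differ by one; scan counts on both sides of the threshold.
worst = max(
    abs(split_log_ratio(c, c - 1, theta, lam, i, beta))
    for c in range(1, 41)
    for i in range(1, 6)
)
print(worst, "<", 1.0 / lam)
```

For these parameters every per-node log-ratio stays strictly below 1/λ, so h − 1 such nodes contribute less than (h − 1)/λ in total.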

1.3 Proof of Lemma 2

The total noise cost of the private tree generated by the improved algorithm is \(\frac {t}{\upbeta }\cdot \)\(\sum \limits _{i=1}^{\upbeta }|\frac {i+1+\upbeta }{i+\upbeta }Lap(\lambda )|\), which is smaller than the noise cost of the private tree generated by Algorithm 1, \(\sum \limits _{i=1}^{n}|Lap(\lambda )|\).

Proof

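Since every term carries the same factor |Lap(λ)|, proving \(\frac {t}{\upbeta }\cdot \)\(\sum \limits _{i=1}^{\upbeta }|\frac {i+1+\upbeta }{i+\upbeta }Lap(\lambda )|<\sum \limits _{i=1}^{n}|Lap(\lambda )|\) is equivalent to proving \(\frac {t}{\upbeta }\cdot \)\(\sum \limits _{i=1}^{\upbeta }\frac {i+1+\upbeta }{i+\upbeta }<n\). Here t is the number of leaves, m is the number of intermediate nodes, and n = t + m is the total number of nodes.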

$$ \begin{array}{@{}rcl@{}} &&\frac{t}{\upbeta }\cdot\sum\limits_{i=1}^{\upbeta }\frac{i+1+\upbeta }{i+\upbeta}=\frac{t}{\upbeta }\cdot\sum\limits_{i=1}^{\upbeta }(1+\frac{1}{i+\upbeta})\\ &&=\frac{t}{\upbeta }(\upbeta+\sum\limits_{i=1}^{\upbeta }(\frac{1}{i+\upbeta}))=t+\frac{t}{\upbeta }\sum\limits_{i=1}^{\upbeta }(\frac{1}{i+\upbeta})\\ &&<t+\frac{t}{\upbeta }\sum\limits_{i=1}^{\upbeta }(\frac{1}{\upbeta})=t+\frac{t}{\upbeta }\\ &&<t+\frac{t-1}{\upbeta-1 }=t+m=n \end{array} $$
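The inequality in this derivation can be verified with exact arithmetic on small trees. The following is a minimal sketch, assuming complete β-ary trees with t = β^height leaves (so that m = (t − 1)/(β − 1) is an integer and t > β); the function names and the chosen values of β and height are hypothetical.

```python
from fractions import Fraction

def scaled_cost_factor(t: int, beta: int) -> Fraction:
    """(t / beta) * sum_{i=1..beta} (i + 1 + beta) / (i + beta), computed exactly."""
    s = sum(Fraction(i + 1 + beta, i + beta) for i in range(1, beta + 1))
    return Fraction(t, beta) * s

def total_nodes(t: int, beta: int) -> int:
    """n = t + m for a full beta-ary tree with t leaves, where m = (t - 1) / (beta - 1)."""
    assert (t - 1) % (beta - 1) == 0, "t must be a valid leaf count for a full beta-ary tree"
    return t + (t - 1) // (beta - 1)

checks = []
for beta in (2, 3, 4, 8):
    for height in (2, 3, 4):  # complete tree of this height has beta**height leaves
        t = beta ** height
        checks.append((beta, t, scaled_cost_factor(t, beta), total_nodes(t, beta)))

for beta, t, lhs, n in checks:
    print(f"beta={beta} t={t}: {float(lhs):.3f} < {n} -> {lhs < n}")
```

Using `Fraction` avoids floating-point rounding, so each comparison is an exact instance of the chain of inequalities above.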

1.4 Proof of Lemma 3

Assume that, for any k ∈ [1, β], the β-tree contains xk leaves that are the k-th selected and receive the noise f(k)Lap(λ). The total noise cost of the private tree, \(\sum \limits _{k=1}^{\upbeta }|x_{k}f(k)Lap(\lambda )|\) = (x1f(1) + ⋅⋅⋅ + xβf(β))|Lap(λ)| (with \(\sum \limits _{k=1}^{\upbeta }x_{k}=t\)), is smaller than the noise cost of the full β-tree that has the same number of leaves and intermediate nodes as the general β-tree. Moreover, the noise cost of the full β-tree simplifies symbolically to \(\frac {t}{\upbeta }\sum \limits _{k=1}^{\upbeta }|f(k)Lap(\lambda )|\).

Proof

Proving \(\sum \limits _{k=1}^{\upbeta }|x_{k}f(k)Lap(\lambda )|<\frac {t}{\upbeta }\sum \limits _{k=1}^{\upbeta }|f(k)Lap(\lambda )|\) is equivalent to proving \(\sum \limits _{k=1}^{\upbeta }x_{k}f(k)<\frac {t}{\upbeta }\sum \limits _{k=1}^{\upbeta }f(k)\), subject to \(\sum \limits _{k=1}^{\upbeta }x_{k}=t\).

The β-tree adds the noise \(\frac {\left (\upbeta -i+1 \right )+1+\upbeta }{(\upbeta -i+1)+\upbeta }Lap(\lambda )\) to the i-th selected child leaf of an intermediate node, where i increases from 1 to β. For any \(k\in [1,\frac {\upbeta }{2}]\), \(\frac {t}{\upbeta }-x_{k}=x_{\upbeta +1-k}-\frac {t}{\upbeta }>0\).

$$ \begin{array}{@{}rcl@{}} &&D_{PrivCost}=\sum\limits_{k=1}^{\upbeta}x_{k}f(k)-\frac{t}{\upbeta}\sum\limits_{k=1}^{\upbeta}f(k)\\ &&=x_{1}f(1)+\cdot\cdot\cdot +x_{\upbeta}f(\upbeta)-\frac{t}{\upbeta}(f(1)+\cdot\cdot\cdot +f(\upbeta))\\ &&=(x_{1}-\frac{t}{\upbeta})(f(1)-f(\upbeta))+\cdot\cdot\cdot +(x_{\frac{\upbeta}{2}}-\frac{t}{\upbeta})(f(\frac{\upbeta}{2})-f(\frac{\upbeta}{2}+1))\\ &&={\sum}_{k=1}^{\frac{\upbeta}{2}}(x_{k}-\frac{t}{\upbeta})(f(k)-f(\upbeta+1-k)) \end{array} $$

For any \(k\in [1,\frac {\upbeta }{2}]\), \(x_{k}-\frac {t}{\upbeta }<0,f(k)-f(\upbeta +1-k)>0\), so DPrivCost < 0. □
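The pairing step that rewrites D_PrivCost as a sum over k ≤ β/2 can be checked on a concrete instance. This is a sketch under explicit assumptions: β is even, the leaf counts satisfy x_k + x_{β+1−k} = 2t/β, and f(k) = (k+1+β)/(k+β) is used purely as an example of a factor for which f(k) − f(β+1−k) > 0. The counts in `x` are hypothetical, and the pairing identity itself does not depend on the particular f.

```python
from fractions import Fraction

def f(k: int, beta: int) -> Fraction:
    """Assumed noise factor, decreasing in k (illustrative choice, not taken from the source)."""
    return Fraction(k + 1 + beta, k + beta)

def direct_difference(x: list, beta: int) -> Fraction:
    """sum_k x_k f(k) - (t / beta) * sum_k f(k), evaluated term by term."""
    c = Fraction(sum(x), beta)
    weighted = sum(x[k - 1] * f(k, beta) for k in range(1, beta + 1))
    return weighted - c * sum(f(k, beta) for k in range(1, beta + 1))

def paired_difference(x: list, beta: int) -> Fraction:
    """sum_{k <= beta/2} (x_k - t/beta) * (f(k) - f(beta + 1 - k))."""
    c = Fraction(sum(x), beta)
    return sum(
        (x[k - 1] - c) * (f(k, beta) - f(beta + 1 - k, beta))
        for k in range(1, beta // 2 + 1)
    )

beta = 4
x = [1, 2, 4, 5]  # hypothetical counts: t = 12, t/beta = 3, x_k + x_{beta+1-k} = 6

direct = direct_difference(x, beta)
paired = paired_difference(x, beta)
print(direct, paired)
```

Both forms give the same negative value here, matching the sign argument: each paired term multiplies a negative factor x_k − t/β by a positive factor f(k) − f(β+1−k).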

Cite this article

Li, X., Wang, Y., Song, J. et al. A low cost and un-cancelled laplace noise based differential privacy algorithm for spatial decompositions. World Wide Web 23, 549–572 (2020). https://doi.org/10.1007/s11280-019-00769-8

  • DOI: https://doi.org/10.1007/s11280-019-00769-8
