
Frontiers of Computer Science, Volume 13, Issue 2, pp 382–395

Differentially private high-dimensional data publication via grouping and truncating techniques

  • Ning Wang
  • Yu Gu
  • Jia Xu
  • Fangfang Li
  • Ge Yu
Research Article

Abstract

The count of a column in a high-dimensional dataset, i.e., the number of records containing that column, is widely used in applications such as analyzing popular spots from check-in location information and mining valuable items from shopping records. However, directly publishing this information poses a privacy threat. Differential privacy (DP), a notable paradigm for strong privacy guarantees, is therefore adopted to publish all column counts. Prior studies have verified that truncating records or grouping columns can effectively improve the accuracy of the published results. To leverage the advantages of both techniques, we combine them to further boost accuracy. However, the traditional penalty function, which measures the error introduced by a given pair of parameters (truncating length and group size), is so sensitive that the derived parameters deviate significantly from the optimal ones. To output preferable parameters, we first design a smart penalty function that is less sensitive than the traditional function. Moreover, we propose a two-phase selection method that computes these parameters efficiently while also improving accuracy. Extensive experiments on a broad spectrum of real-world datasets validate the effectiveness of our proposals.
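
To make the two techniques named in the abstract concrete, the following is a minimal sketch of how truncation and grouping can interact with the Laplace mechanism. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the rule of keeping the lowest-indexed columns when truncating, and the fixed contiguous groups are all hypothetical choices, and the parameters `trunc_len` and `group_size` are simply passed in rather than selected by any penalty function. After truncation each record contributes to at most `trunc_len` columns, so the vector of group sums changes by at most `trunc_len` in L1 norm when one record is added or removed, and Laplace noise of scale `trunc_len / epsilon` on each group sum is enough.

```python
import random


def truncated_counts(records, num_columns, trunc_len):
    """Count columns after keeping at most trunc_len columns per record."""
    counts = [0] * num_columns
    for rec in records:
        # Assumed truncation rule: keep the lowest-indexed distinct columns.
        for col in sorted(set(rec))[:trunc_len]:
            counts[col] += 1
    return counts


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_column_counts(records, num_columns, epsilon,
                          trunc_len, group_size, rng=None):
    """Publish noisy column counts using truncation and fixed grouping."""
    rng = rng or random.Random()
    counts = truncated_counts(records, num_columns, trunc_len)
    scale = trunc_len / epsilon  # L1 sensitivity of the group sums / epsilon
    published = []
    # Assumed grouping: contiguous, data-independent blocks of columns.
    for start in range(0, num_columns, group_size):
        group = counts[start:start + group_size]
        noisy_sum = sum(group) + laplace(scale, rng)
        # Every column in a group is published as the noisy group average.
        published.extend([noisy_sum / len(group)] * len(group))
    return published


if __name__ == "__main__":
    data = [[0, 1, 2], [1, 2], [2, 3, 4, 5]]
    print(publish_column_counts(data, num_columns=6, epsilon=1.0,
                                trunc_len=2, group_size=2))
```

The sketch shows the trade-off that the parameter selection has to balance: a smaller `trunc_len` lowers the noise scale but discards more of each record, and a larger `group_size` divides the noise across more columns but replaces each count with a group average.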

Keywords

differential privacy high-dimensional data truncation optimization grouping penalty function 

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61433008, 61472071 and U143520006), the Fundamental Research Funds for the Central Universities of China (161604005 and 171605001), and the Natural Science Foundation of Liaoning Province (2015020018).

Supplementary material

11704_2017_6591_MOESM1_ESM.ppt (110 KB)

Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science and Engineering, Northeastern University, Shenyang, China
  2. School of Computer, Electronics and Information, Guangxi University, Guangxi, China
