Differentially private high-dimensional data publication via grouping and truncating techniques

  • Research Article
  • Frontiers of Computer Science

Abstract

The count of a column in a high-dimensional dataset, i.e., the number of records containing that column, is widely used in applications such as identifying popular spots from check-in location data and mining valuable items from shopping records. Publishing these counts directly, however, poses a privacy threat. Differential privacy (DP), a notable paradigm offering strong privacy guarantees, is therefore adopted to publish all column counts. Prior studies have verified that truncating records or grouping columns can effectively improve the accuracy of the published results. To leverage the advantages of both techniques, we combine them to further boost accuracy. However, the traditional penalty function, which measures the error introduced by a given pair of parameters (truncating length and group size), is so sensitive that the derived parameters deviate significantly from the optimal ones. To obtain better parameters, we first design a smart penalty function that is less sensitive than the traditional one. Moreover, we propose a two-phase selection method that computes these parameters efficiently while also improving accuracy. Extensive experiments on a broad spectrum of real-world datasets validate the effectiveness of our proposals.
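
To make the two techniques named in the abstract concrete, the following is a minimal sketch of publishing column counts with record truncation and column grouping under the standard Laplace mechanism. It is not the authors' algorithm: the function names, the rule of keeping the smallest column ids when truncating, and the fixed data-independent grouping of consecutive columns are all assumptions made for illustration, and the penalty function and two-phase parameter selection described above are not modeled (here `theta` and `group_size` are simply given).

```python
import random

def truncate(records, theta):
    """Keep at most `theta` columns per record (assumed rule: smallest ids)."""
    return [sorted(set(r))[:theta] for r in records]

def column_counts(records, num_columns):
    """Number of records containing each column."""
    counts = [0] * num_columns
    for r in records:
        for c in r:
            counts[c] += 1
    return counts

def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def publish(records, num_columns, theta, group_size, epsilon, rng):
    """Noisy column counts via truncation, fixed grouping, and Laplace noise.

    After truncation, adding or removing one record changes at most
    `theta` counts by 1 each, so the L1 sensitivity of the group sums is
    at most `theta` and the Laplace scale is theta / epsilon.
    """
    counts = column_counts(truncate(records, theta), num_columns)
    scale = theta / epsilon
    published = [0.0] * num_columns
    # Hypothetical grouping: consecutive columns, chosen without looking at data.
    for start in range(0, num_columns, group_size):
        group = range(start, min(start + group_size, num_columns))
        noisy_sum = sum(counts[c] for c in group) + laplace(scale, rng)
        for c in group:
            # Every column in a group receives the noisy group average.
            published[c] = noisy_sum / len(group)
    return published
```

In this sketch a smaller `theta` lowers the noise scale but discards more of each record, while a larger `group_size` averages the noise over more columns but blurs differences between columns in the same group. This is the trade-off that the choice of truncating length and group size has to balance.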

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61433008, 61472071 and U143520006), the Fundamental Research Funds for the Central Universities of China (161604005 and 171605001), and the Natural Science Foundation of Liaoning Province (2015020018).

Author information

Corresponding author

Correspondence to Ge Yu.

Additional information

Ning Wang is a PhD candidate in computer software and theory at Northeastern University (NEU), China. She received her BE and ME degrees from NEU in 2007 and 2013, respectively. Her current research interest lies in data privacy protection.

Yu Gu received his BE, ME and PhD degrees in computer software and theory from Northeastern University (NEU), China, in 2004, 2007, and 2010, respectively. He is currently a professor at NEU. His research interests include graph data management and big data analysis.

Jia Xu received her PhD degree in computer science from Northeastern University, China, in 2013. She is currently an assistant professor with the School of Computer, Electronics and Information, Guangxi University, China. Her research interests include data query processing and data privacy protection.

Fangfang Li received her PhD degree in computer software and theory from Northeastern University (NEU), China. She is currently a lecturer at NEU and a member of CCF. Her current research interests include CPS database management, recommender systems and complex event processing.

Ge Yu is a full professor and PhD supervisor in the College of Information Science and Engineering, Northeastern University, China, from which he received his BS and MS degrees in computer science and technology in 1982 and 1985, respectively. He received his PhD degree in computer science from Kyushu University, Japan, in 1996. He is a fellow of CCF and a member of ACM and IEEE. His research interests include distributed and parallel databases, OLAP and data warehousing, data integration, and graph data management. He has published more than 200 papers in refereed journals and conferences, such as ACM SIGMOD, VLDB, ICDE, TKDE and VLDBJ. He is also an invited reviewer for many journals and conferences.

About this article

Cite this article

Wang, N., Gu, Y., Xu, J. et al. Differentially private high-dimensional data publication via grouping and truncating techniques. Front. Comput. Sci. 13, 382–395 (2019). https://doi.org/10.1007/s11704-017-6591-x

  • DOI: https://doi.org/10.1007/s11704-017-6591-x
