IHP: improving the utility in differential private histogram publication

  • Hui Li
  • Jiangtao Cui
  • Xue Meng
  • Jianfeng Ma


Differential privacy (DP) is a promising tool for preserving privacy during data publication, as it provides strong theoretical privacy guarantees in the face of adversaries with arbitrary background knowledge. A histogram, as the result of a set of count queries, serves as a core statistical tool for reporting data distributions and underpins many other statistical analyses, such as range queries; it is therefore an important form of data publishing. In this paper, we consider the scenario of publishing sensitive histogram data under a differential privacy scheme. Existing work in this field has shown that, compared with directly applying DP techniques (i.e., injecting noise) to the counts in histogram bins, grouping bins before noise injection is more effective (i.e., yields higher utility), as it introduces much less error into the sanitized histogram under the same privacy budget. However, state-of-the-art work has not revealed how the overall utility of a sanitized histogram is affected by the way the privacy budget is divided between the grouping and noise-injection phases. In this work, we conduct a theoretical study of how the probability of obtaining better groups can be increased so that the overall error introduced into the sanitized histogram is further reduced, which directly leads to higher utility for the sanitized histograms. In particular, we show that the probability of achieving better grouping is affected by two factors, namely the privacy budget assigned to grouping and the normalized utility function used for selecting groups. Motivated by this, we propose a new DP histogram publishing scheme, namely Iterative Histogram Partition, which carefully allocates the privacy budget between the grouping and injection phases based on our theoretical study. We also theoretically prove that our new scheme achieves \(\epsilon \)-differential privacy.
Moreover, we show that, under the same privacy budget, our scheme introduces fewer errors into the sanitized histograms than state-of-the-art methods. We also extend the model to multi-dimensional histogram publication. Finally, an empirical study over four real-world datasets confirms that our scheme achieves the least error among a series of state-of-the-art baseline methods.
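The two-phase pipeline described above (split the budget, form groups, then inject Laplace noise per group) can be sketched as follows. This is an illustrative simplification, not the paper's actual algorithm: the function name, the budget split `grouping_frac`, and the greedy adjacent-bin grouping rule are all assumptions standing in for the paper's exponential-mechanism-based group selection.

```python
import numpy as np

def sanitize_histogram(counts, eps, grouping_frac=0.5, rng=None):
    """Grouping-then-noise DP histogram release (illustrative sketch).

    `eps` is split into eps1 (grouping phase) and eps2 (noise-injection
    phase). The greedy grouping rule below is a hypothetical stand-in
    for a principled, exponential-mechanism-based selection.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(rng)
    eps1 = eps * grouping_frac   # budget spent deciding the groups
    eps2 = eps - eps1            # budget spent on Laplace noise

    # Toy grouping: merge adjacent bins whose noisy probes are close.
    probe = counts + rng.laplace(0.0, 1.0 / eps1, size=counts.size)
    groups, current = [], [0]
    for i in range(1, counts.size):
        if abs(probe[i] - probe[current[-1]]) <= probe.std():
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)

    # Noise injection: a count histogram has sensitivity 1, so each
    # group total is perturbed with Laplace(1/eps2) noise and the
    # noisy total is spread uniformly over the group's bins.
    out = np.empty(counts.size)
    for g in groups:
        noisy_total = counts[g].sum() + rng.laplace(0.0, 1.0 / eps2)
        out[g] = noisy_total / len(g)
    return out
```

Averaging within a group reduces per-bin noise variance (one noise draw is shared by all bins in the group), at the cost of approximation error when the grouped bins differ; balancing eps1 against eps2 trades grouping quality against noise magnitude, which is the tension the paper's theoretical study addresses.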


Keywords: Differential privacy · Data publication · Histogram



The work is supported by the National Natural Science Foundation of China (Nos. 61672408 and 61472298), the Director Fund of PSRPC, the Fundamental Research Funds for the Central Universities (No. JB181505), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2018JM6073), and the China 111 Project (No. B16037).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Cyber Engineering, Xidian University, Xi'an, China
  2. National Engineering Laboratory for Public Security Risk Perception and Control by Big Data (PSRPC), Beijing, China
  3. School of Computer Science and Technology, Xidian University, Xi'an, China
