Privacy via Maintaining Small Similitude Data for Big Data Statistical Representation

Derbeko, Philip; Dolev, Shlomi; Gudes, Ehud

doi:10.1007/978-3-319-94147-9_9

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10879)

Included in the following conference series:

International Symposium on Cyber Security Cryptography and Machine Learning

Abstract

Despite its attractiveness, Big Data is often hard, slow, and expensive to handle because of its size. Moreover, as the amount of collected data grows, individual privacy raises more and more concerns: “what do they know about me?” Various algorithms have been suggested to enable privacy-preserving data release, with differential privacy as the current de facto standard. However, the processing time required to keep the data private is prohibitive and currently not practical for everyday use. With data collection growing continuously, no solution is on the horizon.

In this research, we suggest replacing the Big Data with a much smaller similitude model. The model “resembles” the data with respect to a class of queries. The user defines the maximum acceptable error and the privacy requirements ahead of query execution. Those requirements define the minimal size of the similitude model. The suggested method is demonstrated by applying a wavelet transform and then pruning the resulting tree according to both the data reduction and the privacy requirements. We propose methods of combining the noise required for privacy preservation with the noise of the similitude model, which allow us to decrease the amount of added noise, thus improving the utility of the method.
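The pipeline outlined in the abstract (wavelet transform, pruning, added noise) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the wavelet family, the pruning rule, or the noise calibration, so the unnormalized Haar transform, the keep-the-largest-coefficients rule, the Laplace noise, and every name and parameter below (`haar_forward`, `prune`, `perturb`, `keep`, `scale`) are assumptions chosen for illustration. In particular, `scale` is not calibrated to any sensitivity, so the sketch carries no privacy guarantee.

```python
import random


def haar_forward(values):
    """Unnormalized Haar decomposition; length must be a power of two.

    Returns [overall average, detail coefficients ordered coarse to fine].
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(v) for v in values]
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels go in front
        current = averages
    return current + details


def haar_inverse(coeffs):
    """Invert haar_forward by expanding one resolution level at a time."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(current)]
        pos += len(current)
        expanded = []
        for avg, diff in zip(current, level):
            expanded.extend((avg + diff, avg - diff))
        current = expanded
    return current


def prune(coeffs, keep):
    """Small model: the average plus the `keep` largest-magnitude details.

    Assumption: ranking by raw magnitude is a simplification, not a rule
    taken from the paper.
    """
    ranked = sorted(range(1, len(coeffs)),
                    key=lambda i: abs(coeffs[i]), reverse=True)
    model = {0: coeffs[0]}
    model.update((i, coeffs[i]) for i in ranked[:keep])
    return model


def laplace(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def perturb(model, scale, rng):
    """Add Laplace noise to every retained coefficient.

    Assumption: `scale` is a hypothetical knob, not derived from a
    privacy budget.
    """
    return {i: c + laplace(scale, rng) for i, c in model.items()}


def reconstruct(model, n):
    """Rebuild an approximation of the data from the sparse model."""
    return haar_inverse([model.get(i, 0.0) for i in range(n)])


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_forward(data)
model = prune(coeffs, keep=3)            # 4 numbers stand in for 8
approx = reconstruct(model, len(data))
max_error = max(abs(a - b) for a, b in zip(data, approx))  # 0.5 here
noisy = reconstruct(perturb(model, 0.1, random.Random(0)), len(data))
```

Queries are then answered from `approx` (or `noisy`) rather than from `data`. Raising `keep` lowers the approximation error at the cost of a larger model, which mirrors the trade-off between acceptable error and model size described above.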

Acknowledgement

The research was partially supported by the Rita Altura Trust Chair in Computer Sciences; the Lynne and William Frankel Center for Computer Science; the Ministry of Foreign Affairs, Italy; the grant from the Ministry of Science, Technology and Space, Israel, and the National Science Council (NSC) of Taiwan; the Ministry of Science, Technology and Space, Infrastructure Research in the Field of Advanced Computing and Cyber Security; and the Israel National Cyber Bureau.

The authors are grateful to John Ullman for fruitful discussions of the paper's ideas and of differential privacy.

Author information

Authors and Affiliations

Computer Science Department, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Philip Derbeko, Shlomi Dolev & Ehud Gudes

Corresponding author

Correspondence to Philip Derbeko.

Editor information

Editors and Affiliations

Ben-Gurion University of the Negev, Beer Sheva, Israel
Itai Dinur
Ben-Gurion University of the Negev, Beer Sheva, Israel
Shlomi Dolev
Tata Consultancy Services (India), Chennai, Tamil Nadu, India
Sachin Lodha

About this paper

Cite this paper

Derbeko, P., Dolev, S., Gudes, E. (2018). Privacy via Maintaining Small Similitude Data for Big Data Statistical Representation. In: Dinur, I., Dolev, S., Lodha, S. (eds) Cyber Security Cryptography and Machine Learning. CSCML 2018. Lecture Notes in Computer Science, vol 10879. Springer, Cham. https://doi.org/10.1007/978-3-319-94147-9_9

DOI: https://doi.org/10.1007/978-3-319-94147-9_9
Published: 17 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94146-2
Online ISBN: 978-3-319-94147-9
eBook Packages: Computer Science, Computer Science (R0)
