Abstract
Data perturbation with random noise signals has been shown to be useful for data hiding in privacy-preserving data mining. Perturbation methods based on additive randomization allow accurate estimation of the Probability Density Function (PDF) via the Expectation-Maximization (EM) algorithm, but it has been shown that noise-filtering techniques can be used to reconstruct the original data in many cases, leading to security breaches. In this paper, we propose a generic PDF reconstruction algorithm that can be used with non-additive (as well as additive) randomization techniques for the purpose of privacy-preserving data mining. This two-step reconstruction algorithm is based on Parzen-window reconstruction and Quadratic Programming over a convex set, the probability simplex. Our algorithm eliminates the usual need for the iterative EM algorithm and is generic for most randomization models. Because the two-step reconstruction involves no iteration, its simplicity also makes it attractive for large datasets.
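The abstract describes the reconstruction only at a high level. Below is a minimal Python sketch of the two-step idea under stated assumptions; it is not the authors' implementation. It assumes the released value is W = g(X, Y) with a known conditional density f_{W|X}(w | x), passed in as the hypothetical `cond_pdf` argument, and that the unknown PDF of X is discretized into probability masses p_j on a fixed grid. Step 1 estimates the PDF of W with a Gaussian-kernel Parzen window; step 2 solves the quadratic program min ||A p - g||^2 over the probability simplex, where A[i][j] = f_{W|X}(w_i | x_j). As a simple stand-in for a general QP solver, the sketch uses projected gradient descent with a sort-based Euclidean projection onto the simplex. All function names (`parzen_estimate`, `project_to_simplex`, `reconstruct_pdf`, `demo`), the grids, the bandwidth `h`, the iteration count, and the toy data are illustrative assumptions.

```python
import math
import random


def gaussian_pdf(z, sigma=1.0):
    """Density of N(0, sigma^2) evaluated at z."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def parzen_estimate(samples, grid, h):
    """Step 1: Parzen-window estimate (Gaussian kernel, bandwidth h) of the PDF of W on a grid."""
    n = len(samples)
    return [sum(gaussian_pdf(g - s, h) for s in samples) / n for g in grid]


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_j >= 0, sum(p) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj > t:  # keeps the threshold for the largest j whose entry stays positive
            theta = t
    return [max(vj - theta, 0.0) for vj in v]


def reconstruct_pdf(w_samples, x_grid, w_grid, cond_pdf, h, iters=1000):
    """Step 2: find masses p_j on x_grid minimizing ||A p - g||^2 over the probability
    simplex, where A[i][j] = cond_pdf(w_i, x_j) and g is the Parzen estimate.
    Solved here by projected gradient descent (a stand-in for a general QP solver)."""
    g = parzen_estimate(w_samples, w_grid, h)
    A = [[cond_pdf(w, x) for x in x_grid] for w in w_grid]
    m, n = len(w_grid), len(x_grid)
    # 1 / (2 * ||A||_F^2) never exceeds 1 / L, where L = 2 * lambda_max(A^T A) is the
    # Lipschitz constant of the gradient, so no step increases the objective.
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    p = [1.0 / n] * n  # start from the uniform distribution on the grid
    for _ in range(iters):
        r = [sum(a * pj for a, pj in zip(A[i], p)) - g[i] for i in range(m)]  # residual A p - g
        grad = [2.0 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]  # 2 A^T r
        p = project_to_simplex([pj - step * gj for pj, gj in zip(p, grad)])
    return p


def demo(n_samples=1000, seed=0):
    """Hypothetical toy data: X is 2 or 6 with equal probability, released as W = X + Y, Y ~ N(0, 1)."""
    rng = random.Random(seed)
    ws = [rng.choice([2.0, 6.0]) + rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    x_grid = [-2.0 + 0.5 * k for k in range(25)]  # candidate support of X: -2.0, ..., 10.0
    w_grid = [-3.0 + 0.5 * k for k in range(29)]  # evaluation points for W: -3.0, ..., 11.0

    def additive(w, x):
        return gaussian_pdf(w - x, 1.0)  # f_{W|X}(w | x) when W = X + Y

    return x_grid, reconstruct_pdf(ws, x_grid, w_grid, additive, h=0.4)


if __name__ == "__main__":
    x_grid, p = demo()
    for x, pj in zip(x_grid, p):
        if pj > 0.01:
            print(f"x = {x:5.1f}   mass = {pj:.3f}")
```

Only `cond_pdf` encodes the randomization model, so a non-additive scheme needs only a different conditional density. For example, for multiplicative noise W = X·Y with x ≠ 0, the change of variables gives f_{W|X}(w | x) = f_Y(w/x)/|x|. Projected gradient descent is itself iterative. A dedicated QP solver could replace it without changing the formulation.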
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tan, V.Y.F., Ng, S.K. (2007). Generic Probability Density Function Reconstruction for Randomization in Privacy-Preserving Data Mining. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_7
DOI: https://doi.org/10.1007/978-3-540-73499-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)