Reviewing the Methods of Estimating the Density Function Based on Masked Data
Data privacy is an issue of increasing importance for big data mining, especially for micro-level data. A popular approach to protecting such data is perturbation. Therefore, techniques for recovering the statistical information of the original data from the perturbed data become indispensable in data mining.
This paper reviews and examines the existing techniques for estimating (alternatively, reconstructing) the density function of the original data from data perturbed by the additive or multiplicative noise method. Our studies show that the techniques developed for noise-added data cannot replace the techniques for noise-multiplied data, even though the two types of masked data can be converted into each other through data transformation. This conclusion may be of interest to data providers.
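The conversion between the two types of masked data can be illustrated with a minimal sketch. It assumes strictly positive original values and strictly positive multiplicative noise, so that a logarithm can be taken. The function names, the log-normal distributions, and all parameter values are illustrative assumptions, not taken from the paper. The sketch shows only the transformation itself, not any of the density estimation techniques under review.

```python
import math
import random


def additive_mask(values, noise):
    """Additive noise method: y_i = x_i + e_i."""
    return [x + e for x, e in zip(values, noise)]


def multiplicative_mask(values, noise):
    """Multiplicative noise method: y_i = x_i * c_i."""
    return [x * c for x, c in zip(values, noise)]


rng = random.Random(0)

# Hypothetical confidential data: strictly positive values (assumption).
original = [rng.lognormvariate(3.0, 0.5) for _ in range(1000)]

# Hypothetical multiplicative noise: strictly positive (assumption).
noise = [rng.lognormvariate(0.0, 0.2) for _ in range(1000)]

# Noise-multiplied data, as a data provider might release it.
masked_mult = multiplicative_mask(original, noise)

# Taking logs turns the product into a sum: log(x * c) = log(x) + log(c).
log_masked = [math.log(y) for y in masked_mult]

# The same values, built as noise-added data on the log scale.
additive_on_log_scale = additive_mask(
    [math.log(x) for x in original],
    [math.log(c) for c in noise],
)

# The two constructions agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(log_masked, additive_on_log_scale))
print(f"largest difference on the log scale: {max_gap:.2e}")
```

The log-transformed data is noise-added data for log(x), not for x. A density estimated on the log scale therefore still has to be mapped back to the original scale, which is one place where the two families of techniques can be expected to behave differently.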
Keywords: Confidential data; Masked data; Multiplicative noise method; Additive noise method
Part of the R code for implementing the AS2000 approach was developed by Miss A. Fernando, supported by the Winter Project Scholarship 2016, School of Mathematics and Applied Statistics, UoW.