Abstract
Privacy Preserving Data Mining (PPDM) prevents the disclosure of sensitive quasi-identifiers of a dataset during mining by perturbing the data. The perturbed dataset is then used by a trusted third party to derive association rules effectively. Many PPDM algorithms destroy the original data to generate the mining results. It is essential that the perturbed data preserve the statistical inference of the sensitive attributes and minimize information loss. Existing techniques based on additive, multiplicative and geometric transformations have minimal information loss but suffer from reconstruction vulnerabilities. We propose a histogram-modification-based method, HiMod-Pert, for preserving the sensitive numeric attributes of the perturbed dataset. Our method uses the difference between neighboring values to determine the perturbation factor. Experiments are performed to implement and test the applicability of the proposed technique. Evaluation using descriptive statistical metrics shows that the information loss is minimal.
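The abstract does not give the HiMod-Pert algorithm itself, so the following is only a minimal sketch of the general idea it names: deriving a perturbation from the difference between neighboring values, then comparing descriptive statistics before and after. The function names, the `scale` parameter, and the specific update rule are assumptions made for illustration and are not taken from the paper.

```python
import statistics


def neighbor_difference_perturb(values, scale=0.5):
    """Hypothetical rule: shift each value by `scale` times the gap to its
    predecessor. The first value has no predecessor and is left unchanged.
    This is an illustrative stand-in, not the published HiMod-Pert method."""
    if not values:
        return []
    perturbed = [values[0]]
    for prev, cur in zip(values, values[1:]):
        diff = cur - prev  # difference between neighboring values
        perturbed.append(cur + scale * diff)
    return perturbed


def descriptive_stats(values):
    """Summary statistics used to compare original and perturbed data."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }


def relative_loss(original, perturbed):
    """Relative change of each statistic; a statistic whose original value
    is zero is skipped to avoid division by zero."""
    before = descriptive_stats(original)
    after = descriptive_stats(perturbed)
    return {
        name: abs(after[name] - before[name]) / abs(before[name])
        for name in before
        if before[name] != 0
    }


if __name__ == "__main__":
    salaries = [10, 12, 15, 15, 20]  # made-up numeric attribute
    masked = neighbor_difference_perturb(salaries)
    print(masked)
    print(relative_loss(salaries, masked))
```

A smaller `scale` keeps the perturbed statistics closer to the originals at the cost of weaker masking; the trade-off reported in the paper would have to be checked against its own experiments.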
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Shah, A.K., Gulati, R. (2018). HiMod-Pert: Histogram Modification Based Perturbation Approach for Privacy Preserving Data Mining. In: Patel, Z., Gupta, S. (eds) Future Internet Technologies and Trends. ICFITT 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 220. Springer, Cham. https://doi.org/10.1007/978-3-319-73712-6_3
DOI: https://doi.org/10.1007/978-3-319-73712-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73711-9
Online ISBN: 978-3-319-73712-6
eBook Packages: Computer Science (R0)