HiMod-Pert: Histogram Modification Based Perturbation Approach for Privacy Preserving Data Mining

  • Conference paper
Future Internet Technologies and Trends (ICFITT 2017)

Abstract

Privacy Preserving Data Mining (PPDM) prevents the disclosure of sensitive quasi-identifiers in a dataset during mining by perturbing the data. The perturbed dataset is then used by a trusted third party to derive association rules. Many PPDM algorithms destroy the original data in order to generate the mining results. It is therefore essential that the perturbed data preserve the statistical inference of the sensitive attributes and minimize information loss. Existing techniques based on additive, multiplicative, and geometric transformations incur minimal information loss but suffer from reconstruction vulnerabilities. We propose a histogram modification based method, HiMod-Pert, for preserving the sensitive numeric attributes of the perturbed dataset. The method uses the difference between neighboring values to determine the perturbation factor. Experiments were performed to implement the proposed technique and test its applicability. Evaluation using descriptive statistical metrics shows that the information loss is minimal.

Author information

Correspondence to Alpa Kavin Shah.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Shah, A.K., Gulati, R. (2018). HiMod-Pert: Histogram Modification Based Perturbation Approach for Privacy Preserving Data Mining. In: Patel, Z., Gupta, S. (eds) Future Internet Technologies and Trends. ICFITT 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 220. Springer, Cham. https://doi.org/10.1007/978-3-319-73712-6_3

  • DOI: https://doi.org/10.1007/978-3-319-73712-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73711-9

  • Online ISBN: 978-3-319-73712-6

  • eBook Packages: Computer Science, Computer Science (R0)
