Journal of Zhejiang University-SCIENCE A

Volume 10, Issue 7, pp 952–963

Distributed anonymous data perturbation method for privacy-preserving data mining

Abstract

Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach for privacy preservation, but it performs poorly in large-scale distributed systems. Data perturbation techniques, by contrast, are comparatively efficient but are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a lightweight anonymous data perturbation method for efficient privacy preservation in distributed data mining. We first define the privacy constraints for perturbation-based PPDM in a semi-honest distributed environment. Two protocols are then proposed to address these constraints and to protect the data statistics and the randomization process against collusion attacks: an adaptive privacy-preserving summary protocol and an anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experimental results show that our approach achieves a high security level and is very efficient in large-scale distributed environments.
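To make the perturbation-based approach concrete, the minimal sketch below illustrates the general additive randomization idea behind perturbation-based PPDM: each party masks its raw values with zero-mean noise before sharing them, and the miner recovers unbiased aggregate statistics from the perturbed data. This is only an illustration of the underlying technique, not the paper's adaptive summary or anonymous exchange protocol; the noise scale and data values are illustrative assumptions.

```python
# Minimal sketch of additive data perturbation for privacy-preserving
# aggregate estimation (general randomization idea, not the paper's protocols).
import numpy as np

def perturb(values: np.ndarray, noise_std: float) -> np.ndarray:
    """Each party adds zero-mean Gaussian noise to its raw values
    before sharing them, so individual records are masked."""
    return values + np.random.normal(0.0, noise_std, size=values.shape)

def estimate_stats(perturbed: np.ndarray, noise_std: float):
    """The miner estimates aggregate statistics from the perturbed data:
    the noise is zero-mean, so the sample mean is unbiased, and the known
    noise variance is subtracted to estimate the true variance."""
    est_mean = perturbed.mean()
    est_var = max(perturbed.var() - noise_std ** 2, 0.0)
    return est_mean, est_var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(50.0, 10.0, size=10_000)  # a hypothetical sensitive numeric attribute
    noisy = perturb(raw, noise_std=20.0)
    print("true mean/var:", raw.mean(), raw.var())
    print("est. mean/var:", estimate_stats(noisy, noise_std=20.0))
```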

Key words

Privacy-preserving data mining (PPDM); Distributed data mining; Data perturbation

CLC number

TP391.7 

Copyright information

© Zhejiang University and Springer-Verlag GmbH 2009

Authors and Affiliations

  1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
