A sanitization approach for privacy preserving data mining on social distributed environment

Abstract

Data owners worry that private information in their data may be disclosed without authorization in a cloud computing environment. When applying privacy-preserving methods, they also want to retain the knowledge contained in the data. One approach to this problem is the distributed database, in which different parties hold horizontal or vertical partitions of the data. Cluster analysis is a frequently used data mining task that partitions a usually multivariate data set into groups such that the data objects in one group are more similar to each other than to objects in other groups. With the encryption-based kernel k-means algorithm, large data sets cannot be encrypted in a distributed environment. To extend the privacy concept, a novel method based on privacy-preserving distributed data mining is proposed. In this method, a sanitization approach is developed to improve the privacy of user data. In the sanitization process, a privacy-based objective function is defined and an optimal key is generated from it. The artificial bee colony algorithm is used for optimal key generation, and large amounts of data can be encrypted. Once sanitization is done, the helper user of each cluster sends the sanitized information to the service provider. Finally, experiments are carried out on an existing database to evaluate the efficiency of the proposed algorithm. The implementation is done in Java using a cloud simulator. Extensive performance evaluation and security analysis demonstrate the validity and efficiency of the proposed technique.
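
The optimal-key search named in the abstract can be illustrated with a minimal artificial bee colony (ABC) sketch. This is not the paper's implementation (which the abstract says is written in Java) and not its objective function: the real-valued key encoding, the stand-in objective `toy_privacy_cost`, the bounds, and all parameter names are assumptions made only for illustration. The sketch assumes a non-negative cost that is to be minimized.

```python
import random


def abc_minimize(objective, dim, lower, upper,
                 n_sources=10, limit=20, iterations=100, seed=0):
    """Basic artificial bee colony search for a vector minimizing `objective`.

    Assumes objective(x) >= 0 and n_sources >= 2.
    """
    rng = random.Random(seed)

    def random_source():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources          # failed improvement attempts per source
    best_cost = min(costs)
    best = list(sources[costs.index(best_cost)])

    def try_neighbour(i):
        # Perturb one coordinate of source i relative to a different source k.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = list(sources[i])
        phi = rng.uniform(-1.0, 1.0)
        candidate[d] += phi * (sources[i][d] - sources[k][d])
        candidate[d] = min(upper, max(lower, candidate[d]))
        cost = objective(candidate)
        if cost < costs[i]:           # greedy selection
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    def remember_best():
        nonlocal best, best_cost
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best_cost, best = costs[i_best], list(sources[i_best])

    for _ in range(iterations):
        # Employed bees: one local search step per food source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker bees: prefer sources with lower cost.
        weights = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_neighbour(i)
        remember_best()
        # Scout bees: abandon sources that stopped improving.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
    remember_best()
    return best, best_cost


# Hypothetical stand-in objective: the key is a vector of per-attribute
# offsets; sensitive attributes should be shifted by about 5 units,
# non-sensitive attributes should stay nearly unchanged.
SENSITIVE = [True, False, True]


def toy_privacy_cost(key):
    return sum((abs(k) - 5.0) ** 2 if s else k ** 2
               for k, s in zip(key, SENSITIVE))


best_key, best_cost = abc_minimize(toy_privacy_cost, dim=3,
                                   lower=-10.0, upper=10.0)
print(best_key, best_cost)
```

Any objective with the same shape (a function from a candidate key to a non-negative cost) could be substituted for `toy_privacy_cost`; the search loop itself does not depend on how the key is later applied to the data.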

Author information

Corresponding author

Correspondence to P. L. Lekshmy.

About this article

Cite this article

Lekshmy, P.L., Rahiman, M.A. A sanitization approach for privacy preserving data mining on social distributed environment. J Ambient Intell Human Comput 11, 2761–2777 (2020). https://doi.org/10.1007/s12652-019-01335-w

Keywords

  • Service provider
  • Sanitization
  • Helper user
  • Optimal key
  • User data
  • Kernel k-means
  • Artificial bee colony