DPHKMS: An Efficient Hybrid Clustering Preserving Differential Privacy in Spark

  • Conference paper
  • In: Advances in Internetworking, Data & Web Technologies (EIDWT 2017)

Abstract

k-means is one of the most widely used clustering algorithms. However, when faced with massive data clustering tasks, traditional data mining approaches, and existing clustering mechanisms in particular, fail to withstand malicious attacks under arbitrary background knowledge. This can violate individuals’ privacy and leak information through system resources and clustering outputs when untrusted code runs directly on the original data. To address this issue, this paper proposes a novel and effective hybrid k-means clustering algorithm that preserves differential privacy in Spark, namely Differential Privacy Hybrid k-means (DPHKMS). We combine Particle Swarm Optimization and Cuckoo search to select better initial cluster centroids within the big data computing platform Apache Spark. Furthermore, DPHKMS is implemented and theoretically proved to satisfy ε-differential privacy with determinative privacy budget allocation under the Laplace mechanism. Finally, experimental results on challenging benchmark data sets demonstrate that DPHKMS, while guaranteeing availability and scalability, significantly improves on existing variants of k-means and consistently outperforms state-of-the-art methods in terms of privacy preservation, verifying the effectiveness and advantages of incorporating heuristic swarm intelligence.
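
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of k-means under the Laplace mechanism, not the authors' DPHKMS implementation. It rests on several assumptions that are not taken from the paper: every coordinate is scaled to [0, 1], the privacy budget ε is split uniformly across iterations, the initial centroids are passed in by the caller (DPHKMS would obtain them from the Particle Swarm Optimization and Cuckoo search stage), and the code runs as plain Python rather than on Spark. The names `laplace_noise`, `nearest`, and `dp_kmeans` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as a difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def dp_kmeans(points, centroids, epsilon, iterations, rng):
    """Noisy Lloyd iterations; ASSUMES every coordinate lies in [0, 1]."""
    dim = len(points[0])
    # Assumed uniform split: each iteration spends epsilon / iterations.
    eps_iter = epsilon / iterations
    # Adding or removing one record changes one cluster count by 1 and that
    # cluster's sum by at most 1 per dimension, so L1 sensitivity is dim + 1.
    scale = (dim + 1) / eps_iter
    for _ in range(iterations):
        k = len(centroids)
        counts = [0.0] * k
        sums = [[0.0] * dim for _ in range(k)]
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        updated = []
        for j in range(k):
            noisy_count = counts[j] + laplace_noise(scale, rng)
            if noisy_count < 1.0:
                # Too little (noisy) mass to form a mean: keep the old centroid.
                updated.append(centroids[j])
                continue
            # Noisy mean, clamped back into the assumed data domain.
            updated.append([
                min(1.0, max(0.0, (sums[j][d] + laplace_noise(scale, rng)) / noisy_count))
                for d in range(dim)
            ])
        centroids = updated
    return centroids
```

Only the noisy counts and noisy sums leave each iteration, so the clamping and the fallback to the previous centroid are post-processing and consume no additional budget. A smaller `epsilon` gives stronger privacy and noisier centroids; the sketch also presumes the initial centroids were chosen without consulting the private data or were paid for from a separate share of the budget.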

Acknowledgments

This work was supported by National Natural Science Foundation funded project 61309008 and Shaanxi Province Natural Science Funded Project 2014JQ8049. We would also like to thank our partners at our Research Lab for their generous gifts in support of this research.

Author information

Corresponding author

Correspondence to Zhi-Qiang Gao.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Gao, ZQ., Zhang, LJ. (2018). DPHKMS: An Efficient Hybrid Clustering Preserving Differential Privacy in Spark. In: Barolli, L., Zhang, M., Wang, X. (eds) Advances in Internetworking, Data & Web Technologies. EIDWT 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-59463-7_37

  • DOI: https://doi.org/10.1007/978-3-319-59463-7_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59462-0

  • Online ISBN: 978-3-319-59463-7

  • eBook Packages: Engineering, Engineering (R0)
