Abstract
k-means is by far one of the most widely used clustering algorithms. However, when faced with massive data clustering tasks, traditional data mining approaches, and existing clustering mechanisms in particular, fail to withstand malicious attacks by adversaries with arbitrary background knowledge. This can violate individuals' privacy, and information can leak through system resources and clustering outputs when untrusted code runs directly on the original data. To address this issue, this paper proposes a novel and effective hybrid k-means clustering algorithm that preserves differential privacy in Spark, named Differential Privacy Hybrid k-means (DPHKMS). We combine Particle Swarm Optimization (PSO) and Cuckoo Search to select better initial cluster centroids within the big data computing platform Apache Spark. Furthermore, we implement DPHKMS and prove theoretically that it satisfies ε-differential privacy with a deterministic privacy budget allocation under the Laplace mechanism. Finally, experimental results on challenging benchmark data sets show that DPHKMS, while guaranteeing availability and scalability, significantly improves on existing k-means variants and consistently outperforms state-of-the-art variants in privacy preservation, confirming the effectiveness and advantages of incorporating heuristic swarm intelligence.
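As a rough illustration of how the Laplace mechanism can make Lloyd's k-means iterations ε-differentially private, the sketch below adds noise to each cluster's member count and coordinate sum before recomputing the centroids. It is not the DPHKMS algorithm and does not use Spark. The function names, the uniform split of ε across iterations, the assumption that every coordinate lies in [-bound, bound], the add/remove-one neighbouring-dataset model, and the random data-independent initialization (standing in for the PSO and Cuckoo Search seeding) are all hypothetical choices made for illustration.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def dp_kmeans(points, k, epsilon, iterations=5, bound=1.0, seed=None):
    """Hypothetical differentially private Lloyd's k-means (not DPHKMS itself).

    Assumes every coordinate of every point lies in [-bound, bound] and that
    neighbouring datasets differ by adding or removing one point.
    Budget: epsilon / iterations per iteration, split evenly between the
    noisy counts and the noisy coordinate sums.
    """
    rng = random.Random(seed)
    d = len(points[0])
    # Data-independent initialization (placeholder for PSO + Cuckoo Search
    # seeding); it consumes no privacy budget because it never reads the data.
    centroids = [[rng.uniform(-bound, bound) for _ in range(d)] for _ in range(k)]
    eps_iter = epsilon / iterations
    count_scale = 1.0 / (eps_iter / 2)       # one point changes one count by 1
    sum_scale = d * bound / (eps_iter / 2)   # one point changes one sum by <= d*bound (L1)
    for _ in range(iterations):
        counts = [0] * k
        sums = [[0.0] * d for _ in range(k)]
        # Assignment step: uses only the previous (already noisy) centroids.
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1
            for i, x in enumerate(p):
                sums[j][i] += x
        # Update step: perturb counts and sums with Laplace noise.
        for j in range(k):
            noisy_count = counts[j] + laplace(count_scale, rng)
            if noisy_count < 1:
                continue  # too few (noisy) members: keep the previous centroid
            centroids[j] = [
                # Clipping to the domain is post-processing and costs no budget.
                max(-bound, min(bound, (s + laplace(sum_scale, rng)) / noisy_count))
                for s in sums[j]
            ]
    return centroids
```

Because each point falls into exactly one cluster, the noisy counts and sums of different clusters compose in parallel, so one iteration costs ε/T and T iterations cost ε in total under sequential composition. The uniform per-iteration split is only one possible allocation; the abstract does not specify how DPHKMS distributes its budget.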
Acknowledgments
We thank the National Natural Science Foundation for funding project 61309008 and the Shaanxi Province Natural Science Foundation for funding project 2014JQ8049. We would also like to thank our partners at our research lab for their generous gifts in support of this research.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Gao, ZQ., Zhang, LJ. (2018). DPHKMS: An Efficient Hybrid Clustering Preserving Differential Privacy in Spark. In: Barolli, L., Zhang, M., Wang, X. (eds) Advances in Internetworking, Data & Web Technologies. EIDWT 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-59463-7_37
DOI: https://doi.org/10.1007/978-3-319-59463-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59462-0
Online ISBN: 978-3-319-59463-7
eBook Packages: Engineering, Engineering (R0)