Research on K-Means Clustering Algorithm Over Encrypted Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11983)

Abstract

To address privacy preservation in the data mining process, this paper proposes an improved K-Means algorithm over encrypted data, called HK-means++, which uses homomorphic encryption to solve the multiplication, distance calculation, and comparison problems on encrypted data. These security protocols are then applied to the improved clustering algorithm framework. To prevent privacy leakage while the distances between sample points and center points are calculated, the algorithm hides the cluster centers, which prevents an attacker from inferring a user's cluster grouping. To some extent, this reduces the risk of leaking private data during cluster mining. The traditional K-Means algorithm is well known to depend heavily on its initial values. This paper focuses on solving that problem in order to reduce the number of iterations and improve clustering efficiency. The experimental results demonstrate that the proposed HK-Means algorithm has good clustering performance and a reduced running time.
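The abstract does not name the homomorphic scheme or spell out its protocols, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes a toy Paillier cryptosystem (additively homomorphic) with tiny, insecure primes, and it assumes the data owner uploads both Enc(x_i) and Enc(x_i^2). Unlike the scheme described above, the cluster center here is in plaintext; hiding the center would need additional protocols that this preview does not show. All function names are hypothetical.

```python
import math
import random

# Toy Paillier parameters: tiny primes for illustration only, NOT secure.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                                          # standard generator choice
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)                               # valid because G = N + 1


def encrypt(m):
    """Encrypt an integer m (taken mod N) with fresh randomness."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover the plaintext (mod N) from ciphertext c."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N


def add(c1, c2):
    """Homomorphic addition: Dec(add(c1, c2)) = m1 + m2 (mod N)."""
    return (c1 * c2) % N2


def scalar_mul(c, k):
    """Multiply the hidden plaintext by a known integer k (may be negative)."""
    return pow(c, k % N, N2)


def encrypted_sq_distance(enc_x, enc_x_sq, center):
    """Return Enc(sum_i (x_i - c_i)^2) without decrypting any x_i.

    Uses (x - c)^2 = x^2 - 2*c*x + c^2 with a plaintext center c.
    """
    acc = encrypt(0)
    for cx, cx2, ci in zip(enc_x, enc_x_sq, center):
        acc = add(acc, cx2)                      # + x_i^2
        acc = add(acc, scalar_mul(cx, -2 * ci))  # - 2 * c_i * x_i
        acc = add(acc, encrypt(ci * ci))         # + c_i^2
    return acc
```

In this sketch the party holding the ciphertexts never sees the sample point; only the key holder can decrypt the resulting distance. Results are correct only while the true distance stays below N.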

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under grant No. 61872069, in part by the Fundamental Research Funds for the Central Universities (N171704005), and in part by the Shenyang Science and Technology Plan Projects (18-013-0-01).

Author information

Corresponding author

Correspondence to Jian Xu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, C., Wang, A., Liu, X., Xu, J. (2019). Research on K-Means Clustering Algorithm Over Encrypted Data. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol 11983. Springer, Cham. https://doi.org/10.1007/978-3-030-37352-8_16

  • DOI: https://doi.org/10.1007/978-3-030-37352-8_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37351-1

  • Online ISBN: 978-3-030-37352-8

  • eBook Packages: Computer Science, Computer Science (R0)