Privacy-Preserving K-Means Clustering Upon Negative Databases

  • Xiaoyi Hu
  • Liping Lu
  • Dongdong Zhao
  • Jianwen Xiang
  • Xing Liu
  • Haiying Zhou
  • Shengwu Xiong
  • Jing Tian
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11304)


Data mining has become very popular with the arrival of the big data era, but it also raises privacy issues. A negative database (NDB) is a new type of data representation that stores the negative image of the data and can protect privacy while supporting some basic data mining operations such as classification and clustering. However, the existing clustering algorithm upon NDBs is based on Hamming distance; when a dataset has many categories for each attribute, the encoded data become very long, resulting in low computational efficiency. In this paper, we propose a privacy-preserving k-means clustering algorithm based on Euclidean distance upon NDBs. The main step of the k-means algorithm is to calculate the distance between each record and the cluster centers. To prevent privacy disclosure in this step, we transform each record in the database into an NDB and propose a method to estimate the Euclidean distance from a binary string and an NDB. Our work opens up new ideas for data mining upon negative databases.
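The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the distance-estimation method, so the code below only shows (a) how a toy NDB over the alphabet {0, 1, *} represents the complement of a hidden binary record, and (b) the ordinary, non-private k-means assignment step that the proposed method is meant to protect. All function names and the example data are assumptions made for illustration.

```python
import math

def matches(entry: str, record: str) -> bool:
    """True if a binary record is covered by an NDB entry over {'0', '1', '*'}.

    '*' is a wildcard that matches either bit.
    """
    return len(entry) == len(record) and all(
        e == "*" or e == r for e, r in zip(entry, record)
    )

def is_hidden_record(ndb: list[str], record: str) -> bool:
    """A record belongs to the hidden (positive) data iff no NDB entry covers it."""
    return not any(matches(entry, record) for entry in ndb)

def nearest_center(record, centers) -> int:
    """Plain k-means assignment: index of the center closest in Euclidean distance."""
    return min(range(len(centers)), key=lambda i: math.dist(record, centers[i]))

# Toy NDB (assumed example) hiding the single 3-bit record "101":
# its entries cover every 3-bit string except "101".
toy_ndb = ["0**", "11*", "100"]

# Hypothetical cluster centers for the plaintext assignment step.
toy_centers = [(0.0, 0.0), (2.0, 2.5)]
```

A toy NDB this small is trivial to reverse; it only shows the representation. In the plaintext assignment step the record itself is visible to whoever computes the distance, which is the disclosure the abstract sets out to avoid.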


Privacy protection · Data mining · Negative database · k-means clustering



This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61806151, 61672398, 61702387), the Hubei Provincial Natural Science Foundation of China (Grant Nos. 2017CFA012, 2017CFB302), the Key Technical Innovation Project of Hubei (Grant No. 2017AAA122), the Provincial Science & Technology International Cooperation Program of Hubei (Grant No. 2017AHB048), the Applied Fundamental Research of Wuhan (Grant No. 20160101010004), and the Open Fund of Hubei Key Lab. of Transportation of IoT (Grant No. 2017III28-004).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiaoyi Hu (1)
  • Liping Lu (1)
  • Dongdong Zhao (1)
  • Jianwen Xiang (1)
  • Xing Liu (1)
  • Haiying Zhou (2)
  • Shengwu Xiong (1)
  • Jing Tian (1)

  1. Hubei Key Laboratory of Transportation of Internet of Things, School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
  2. Institution of Automotive Engineering, Hubei University of Automotive Technology, Shiyan, China
