Privacy-Preserving K-Means Clustering Upon Negative Databases

Hu, Xiaoyi; Lu, Liping; Zhao, Dongdong; Xiang, Jianwen; Liu, Xing; Zhou, Haiying; Xiong, Shengwu; Tian, Jing

doi:10.1007/978-3-030-04212-7_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11304)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Data mining has become very popular with the arrival of the big data era, but it also raises privacy issues. A negative database (NDB) is a new type of data representation that stores the negative image of the data; it can protect privacy while supporting basic data mining operations such as classification and clustering. However, the existing clustering algorithm upon NDBs is based on Hamming distance. When a dataset has many categories for each attribute, the encoded data become very long, which results in low computational efficiency. In this paper, we propose a privacy-preserving k-means clustering algorithm upon NDBs that is based on Euclidean distance. The main step of the k-means algorithm is to calculate the distance between each record and the cluster centers. To prevent privacy disclosure in this step, we transform each record in the database into an NDB and propose a method to estimate the Euclidean distance between a binary string and an NDB. Our work opens up new ideas for data mining upon negative databases.
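
To make the distance step named in the abstract concrete, the following is a minimal sketch of ordinary k-means on plaintext records, with the distance passed in as a function. It does not reproduce the paper's NDB construction or its Euclidean distance estimator, neither of which is specified in this abstract. The names `euclidean`, `kmeans`, and the `distance` parameter are illustrative assumptions; in the proposed scheme, the exact distance would presumably be replaced by an estimate computed from a binary string and an NDB.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Exact Euclidean distance between two plaintext vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(
    records: List[Vector],
    centers: List[Vector],
    distance: Callable[[Vector, Vector], float] = euclidean,
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with a pluggable distance (hypothetical helper).

    The assignment step is the only place records are compared with
    centers, so it is the step a privacy-preserving variant must protect.
    """
    centers = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: distance(r, centers[j]))
            for r in records
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


labels, centers = kmeans(
    [(0, 0), (0, 1), (10, 10), (10, 11)],
    centers=[(0, 0), (10, 10)],
)
```

Passing the distance as a parameter keeps the clustering loop unchanged when the comparison is swapped for a privacy-preserving estimate. How the update step would operate on NDBs is likewise not described here and is left out of the sketch.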

Acknowledgement

This work was partially supported by the National Natural Science Foundation of China (Grant No. 61806151, 61672398, 61702387), the Hubei Provincial Natural Science Foundation of China (Grant No. 2017CFA012, 2017CFB302), the Key Technical Innovation Project of Hubei (Grant No. 2017AAA122), Provincial Science & Technology International Cooperation Program of Hubei (Grant No. 2017AHB048), the Applied Fundamental Research of Wuhan (Grant No. 20160101010004), and the Open Fund of Hubei Key Lab. of Transportation of IoT (Grant No. 2017III28-004).

Author information

Authors and Affiliations

Hubei Key Laboratory of Transportation of Internet of Things, School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
Xiaoyi Hu, Liping Lu, Dongdong Zhao, Jianwen Xiang, Xing Liu, Shengwu Xiong & Jing Tian
Institution of Automotive Engineering, Hubei University of Automotive Technology, Shiyan, China
Haiying Zhou

Corresponding authors

Correspondence to Dongdong Zhao or Jianwen Xiang.

Editor information

Editors and Affiliations

The Chinese Academy of Sciences, Beijing, China
Long Cheng
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi Sing Leung
Kobe University, Kobe, Japan
Seiichi Ozawa

About this paper

Cite this paper

Hu, X. et al. (2018). Privacy-Preserving K-Means Clustering Upon Negative Databases. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11304. Springer, Cham. https://doi.org/10.1007/978-3-030-04212-7_17

DOI: https://doi.org/10.1007/978-3-030-04212-7_17
Published: 17 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04211-0
Online ISBN: 978-3-030-04212-7
eBook Packages: Computer Science, Computer Science (R0)
