Towards Secure Cloud Data Similarity Retrieval: Privacy Preserving Near-Duplicate Image Data Detection

  • Yulin Wu
  • Xuan Wang
  • Zoe L. Jiang
  • Xuan Li
  • Jin Li
  • S. M. Yiu
  • Zechao Liu
  • Hainan Zhao
  • Chunkai Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11337)


With the development of cloud computing technology, cloud storage services have been widely used in recent years. People upload most of their data files to the cloud to save local storage space and to make their data available for sharing anywhere. Besides storage, data similarity retrieval is another basic service that the cloud provides, especially for image data. As demand for near-duplicate image detection grows, it has become an attractive research topic in cloud image data similarity retrieval. However, because some image data (such as medical images and face recognition images) contain sensitive private information, cloud image data similarity retrieval should support privacy protection. In this paper, focusing on image data stored in the cloud, we propose a privacy-preserving near-duplicate image data detection scheme based on the LSH (locality-sensitive hashing) algorithm. In particular, users generate an image-feature LSH metadata vector from their own image data using the LSH algorithm, and store both the ciphertexts of the image data and the image-feature LSH metadata vector in the cloud. When an inquirer queries for near-duplicate image data, he generates a query token, i.e., an image-feature LSH metadata vector computed with the LSH algorithm, and sends it to the cloud. With the query token, the cloud executes privacy-preserving near-duplicate image data detection and returns the encrypted result to the inquirer, who then decrypts the ciphertext to obtain the final result. Our security and performance analysis shows that the proposed scheme is both privacy preserving and lightweight.
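The abstract describes the data flow only at a high level: the data owner computes an LSH metadata vector, the inquirer computes a query token the same way, and the cloud compares the two. The sketch below illustrates that flow under several assumptions that are not stated in the abstract: image features are already extracted as real-valued vectors (feature extraction is omitted), the LSH family is the p-stable (Gaussian, Euclidean) scheme of Datar et al., and each bucket index is blinded with a keyed HMAC so that the cloud compares opaque tokens rather than raw bucket values. The blinding step is a hypothetical stand-in, not necessarily the paper's construction, and image encryption is not shown. The names `PStableLSH`, `blind`, `match_count` and the parameters `w`, `num_hashes` and `key` are all illustrative.

```python
import hashlib
import hmac
import math
import random
from typing import List, Sequence


class PStableLSH:
    """Hypothetical p-stable LSH family: h(v) = floor((a . v + b) / w).

    Gaussian projection vectors are 2-stable, so nearby vectors in
    Euclidean distance tend to fall into the same bucket.
    """

    def __init__(self, dim: int, num_hashes: int, w: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One Gaussian projection vector a and one offset b in [0, w) per hash function.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, w) for _ in range(num_hashes)]
        self.w = w

    def metadata(self, feature: Sequence[float]) -> List[int]:
        """Map a feature vector to its LSH metadata vector (one bucket index per hash)."""
        return [
            math.floor((sum(ai * vi for ai, vi in zip(a, feature)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        ]


def blind(metadata: Sequence[int], key: bytes) -> List[bytes]:
    """Hypothetical blinding: replace bucket index h_i with HMAC-SHA256(key, "i:h_i")."""
    return [
        hmac.new(key, f"{i}:{h}".encode(), hashlib.sha256).digest()
        for i, h in enumerate(metadata)
    ]


def match_count(stored: Sequence[bytes], token: Sequence[bytes]) -> int:
    """Cloud side: count hash positions whose blinded bucket tokens are equal."""
    return sum(hmac.compare_digest(s, t) for s, t in zip(stored, token))


if __name__ == "__main__":
    lsh = PStableLSH(dim=4, num_hashes=16, w=4.0, seed=42)
    key = b"hypothetical-shared-key"  # assumed shared by data owner and inquirer
    stored = blind(lsh.metadata([0.10, 0.52, 0.33, 0.91]), key)  # uploaded with the ciphertext
    near = blind(lsh.metadata([0.11, 0.50, 0.34, 0.90]), key)    # token for a near-duplicate
    far = blind(lsh.metadata([5.0, -3.0, 7.5, -6.0]), key)       # token for an unrelated image
    print("near-duplicate matches:", match_count(stored, near), "/ 16")
    print("unrelated matches:", match_count(stored, far), "/ 16")
```

Equality on blinded tokens preserves the LSH property that close feature vectors share buckets, so the cloud can rank stored images by match count without seeing bucket indices. Deterministic tokens do, however, reveal which positions match across queries, a leakage that a real scheme has to account for in its security analysis.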


Near-duplicate, Privacy preserving, LSH algorithm, Cloud image data, Lightweight



This work is supported by the Basic Research Project of Shenzhen of China (No. JCYJ20160318094015947, JCYJ20170307151518535), the National Key Research and Development Program of China (No. 2017YFB0803002), the Natural Science Foundation of Fujian Province, China (No. 2017J05099), and the National Natural Science Foundation of China (No. 61472091).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yulin Wu (1)
  • Xuan Wang (1)
  • Zoe L. Jiang (1), corresponding author
  • Xuan Li (2)
  • Jin Li (3)
  • S. M. Yiu (4)
  • Zechao Liu (1)
  • Hainan Zhao (1)
  • Chunkai Zhang (1)
  1. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
  2. College of Mathematics and Informatics, Fujian Normal University, Fuzhou, China
  3. School of Computational Science and Education Software, Guangzhou University, Guangzhou, China
  4. The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
