A Secure Data Deduplication Scheme for Cloud Storage

  • Conference paper
Financial Cryptography and Data Security (FC 2014)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8437)

Abstract

As more corporate and private users outsource their data to cloud storage providers, recent data breach incidents make end-to-end encryption an increasingly prominent requirement. Unfortunately, semantically secure encryption schemes render various cost-effective storage optimization techniques, such as data deduplication, ineffective. We present a novel idea that differentiates data according to their popularity. Based on this idea, we design an encryption scheme that guarantees semantic security for unpopular data and provides weaker security and better storage and bandwidth benefits for popular data. This way, data deduplication can be effective for popular data, whilst semantically secure encryption protects unpopular content. We show that our scheme is secure under the Symmetric External Decisional Diffie-Hellman Assumption in the random oracle model.

Notes

  1. We have chosen to formalize this approach for the sake of readability. In practice, one would adopt a solution in which the file is encrypted only once with \(K\); this key, not the entire file, is in turn encrypted with a slightly modified version of \(\mathcal {E}_\mu \) that allows \(H_1(F_c)\) to be used as the \(H_1\)-hash for computing ciphertext and decryption shares for \(K\). This approach would require uploading and storing a single ciphertext of the file, not two as described above.

References

  1. Open Security Foundation: DataLossDB. http://datalossdb.org/

  2. Meister, D., Brinkmann, A.: Multi-level comparison of data deduplication in a backup scenario. In: SYSTOR ’09, pp. 8:1–8:12. ACM, New York (2009)

  3. Mandagere, N., Zhou, P., Smith, M.A., Uttamchandani, S.: Demystifying data deduplication. In: Middleware ’08, pp. 12–17. ACM, New York (2008)

  4. Aronovich, L., Asher, R., Bachmat, E., Bitner, H., Hirsch, M., Klein, S.T.: The design of a similarity based deduplication system. In: SYSTOR ’09, pp. 6:1–6:14 (2009)

  5. Dutch, M., Freeman, L.: Understanding data de-duplication ratios. SNIA forum (2008). http://www.snia.org/sites/default/files/Understanding_Data_Deduplication_Ratios-20080718.pdf

  6. Harnik, D., Margalit, O., Naor, D., Sotnikov, D., Vernik, G.: Estimation of deduplication ratios in large data sets. In: IEEE MSST ’12, pp. 1–11, April 2012

  7. Harnik, D., Pinkas, B., Shulman-Peleg, A.: Side channels in cloud services: deduplication in cloud storage. IEEE Security & Privacy 8(6), 40–47 (2010)

  8. Halevi, S., Harnik, D., Pinkas, B., Shulman-Peleg, A.: Proofs of ownership in remote storage systems. In: CCS ’11, pp. 491–500. ACM, New York (2011)

  9. Di Pietro, R., Sorniotti, A.: Boosting efficiency and security in proof of ownership for deduplication. In: ASIACCS ’12, pp. 81–82. ACM, New York (2012)

  10. Douceur, J.R., Adya, A., Bolosky, W.J., Simon, D., Theimer, M.: Reclaiming space from duplicate files in a serverless distributed file system. In: ICDCS ’02, pp. 617–632. IEEE Computer Society, Washington, DC (2002)

  11. Storer, M.W., Greenan, K., Long, D.D., Miller, E.L.: Secure data deduplication. In: StorageSS ’08, pp. 1–10. ACM, New York (2008)

  12. Bellare, M., Keelveedhi, S., Ristenpart, T.: Message-locked encryption and secure deduplication. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 296–312. Springer, Heidelberg (2013)

  13. Xu, J., Chang, E.C., Zhou, J.: Weak leakage-resilient client-side deduplication of encrypted data in cloud storage. In: 8th ACM SIGSAC Symposium, pp. 195–206

  14. Bellare, M., Keelveedhi, S., Ristenpart, T.: DupLESS: server-aided encryption for deduplicated storage. In: 22nd USENIX Conference on Security, pp. 179–194 (2013)

  15. Douceur, J.R.: The Sybil attack. In: Druschel, P., Kaashoek, M.F., Rowstron, A. (eds.) IPTPS 2002. LNCS, vol. 2429, pp. 251–260. Springer, Heidelberg (2002)

  16. Goldwasser, S., Micali, S.: Probabilistic encryption. J. Comput. Syst. Sci. 28, 270–299 (1984)

  17. Fahl, S., Harbach, M., Muders, T., Smith, M.: Confidentiality as a service-usable security for the cloud. In: TrustCom 2012, pp. 153–162 (2012)

  18. Fahl, S., Harbach, M., Muders, T., Smith, M., Sander, U.: Helping Johnny 2.0 to encrypt his Facebook conversations. In: SOUPS 2012, pp. 11–28 (2012)

  19. Ateniese, G., Blanton, M., Kirsch, J.: Secret handshakes with dynamic and fuzzy matching. In: NDSS ’07 (2007)

  20. Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979)

  21. Goyal, V., Pandey, O., Sahai, A., Waters, B.: Attribute-based encryption for fine-grained access control of encrypted data. In: ACM CCS ’06, pp. 89–98 (2006)

  22. Canetti, R., Lindell, Y., Ostrovsky, R., Sahai, A.: Universally composable two-party and multi-party secure computation. In: STOC ’02 (2002)

  23. Camenisch, J.L., Hohenberger, S., Lysyanskaya, A.: Balancing accountability and privacy using e-cash (extended abstract). In: De Prisco, R., Yung, M. (eds.) SCN 2006. LNCS, vol. 4116, pp. 141–155. Springer, Heidelberg (2006)

  24. Lysyanskaya, A., Rivest, R.L., Sahai, A., Wolf, S.: Pseudonym systems (extended abstract). In: Heys, H.M., Adams, C.M. (eds.) SAC 1999. LNCS, vol. 1758, pp. 184–199. Springer, Heidelberg (2000)

Acknowledgements

This work was supported by the Grant Agency of the Czech Technical University in Prague, grant No. SGS13/139/OHK3/2T/13.

Author information

Corresponding author

Correspondence to Elli Androulaki.

Appendices

Appendix A: Proof of Lemma 1

SXDH assumes two groups of prime order \(q\), \(\mathbb {G}_1 \) and \(\mathbb {G}_2 \), such that there is no efficiently computable distortion map between the two; a target group \(\mathbb {G}_{T} \); and an efficient, non-degenerate bilinear map \(\hat{e}: \mathbb {G}_1 \times \mathbb {G}_2 \rightarrow \mathbb {G}_{T} \). In this setting, the Decisional Diffie-Hellman (DDH) assumption holds in both \(\mathbb {G}_1\) and \(\mathbb {G}_2\), and the bilinear decisional Diffie-Hellman (BDDH) assumption holds given the existence of \(\hat{e}\) [19].

Challenger \(\mathcal {C}\) is given an SXDH context \(\mathbb {G}_1 ', \mathbb {G}_2 ', \mathbb {G}_{T} ', \hat{e}'\) and an instance of the DDH problem \(\langle \mathbb {G}_1 ', g ', A=(g ')^a, B=(g ')^b, W \rangle \) in \(\mathbb {G}_1 '\). \(\mathcal {C}\) simulates an environment in which \(\mathcal {A}\) operates, using \(\mathcal {A}\)'s advantage in the game \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\) to decide whether \(W = (g ')^{ab}\). \(\mathcal {C}\) interacts with \(\mathcal {A}\) in the \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\) game as follows:

  • Setup Phase \(\mathcal {C}\) sets \(\mathbb {G}_1 \leftarrow \mathbb {G}_1 '\), \(\mathbb {G}_2 \leftarrow \mathbb {G}_2 '\), \(\mathbb {G}_{T} \leftarrow \mathbb {G}_{T} '\), \(\hat{e} = \hat{e}'\), \(g \leftarrow g '\); picks a random generator \(\bar{g} \) of \(\mathbb {G}_2\) and sets \(\bar{g} _{pub} = (\bar{g})^\mathsf{sk}\), where \(\mathsf{sk}\leftarrow _R \mathbb {Z}_q^*\). \(\mathcal {C}\) also generates the set of user identities \(\mathsf{U}= \{{\mathsf {U}}_i \}_{i=0}^{n-1}\). The public key \(\mathsf{pk}= \{q, \mathbb {G}_1, \mathbb {G}_2, \mathbb {G}_{T}, \hat{e}, \mathcal {O}_{\mathsf{H}_\mathsf{1}}, \mathcal {O}_{\mathsf{H}_\mathsf{2}}, \bar{g}, \bar{g} _{pub}\}\) and \(\mathsf{U}\) are forwarded to \(\mathcal {A}\). \(\mathcal {A}\) declares the list \(\mathsf {U}_\mathcal {A} \) of \(n_\mathcal {A} < t-1\) user identities that will later on be subject to \(\mathcal {O}_{\mathsf{Corrupt}}\) calls. Let \(\mathsf {U}_\mathcal {A} =\{{\mathsf {U}}_i \}_{i=0}^{n_\mathcal {A}-1}\). To generate key-shares \(\{\mathsf{sk_{i}}\}_{i=0}^{n-1}\), \(\mathcal {C}\) constructs a Lagrange polynomial \(\mathrm {P} ()\) of degree \(t-1\) with interpolation points \(\mathrm {I_P} = \{ (0, \mathsf{sk}) \} \cup \{ (\mathsf{r_{i}}, y_i) \}_{i=0}^{t-2}, \) where \(\mathsf{r_{i}}, y_i\leftarrow _R \mathbb {Z}_q^*\), for \(i \in [0,t-3]\), and \(\mathsf{r_{t-2}} \leftarrow _R \mathbb {Z}_q^*\), \(y_{t-2} \leftarrow a\). Secret key-shares are set to \(\mathsf{sk_{i}} \leftarrow y_i, i \in [0,n-1]\). Since \(a\) is not known to \(\mathcal {C}\), \(\mathcal {C}\) sets the corrupted key-shares to be \(\mathsf{sk_{i}}\) for \(i \in [0, n_\mathcal {A}-1]\); because \(n_\mathcal {A} < t-1\), none of these indices equals \(t-2\), so every corrupted key-share is known to \(\mathcal {C}\).

  • Access to Oracles \(\mathcal {C}\) simulates oracles \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\), \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\), \(\mathcal {O}_{\mathsf{Corrupt}}\) and \(\mathcal {O}_{\mathsf{DShare}}\) :

    • \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\): to respond to \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\)-queries, \(\mathcal {C}\) maintains a list of tuples \(\{v, h_{v}, \rho _{v}, \mathrm {c}_v\}\) as explained below. We refer to this list as the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) list; it is initially empty. When \(\mathcal {A}\) submits an \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) query for \(v\), \(\mathcal {C}\) checks if \(v\) already appears in the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) list in a tuple \(\{v, h_{v}, \rho _{v}, \mathrm {c}_{v}\}\). If so, \(\mathcal {C}\) responds with \(\mathsf{H_1}(v) = h_{v}\). Otherwise, \(\mathcal {C}\) picks \(\rho _{v} \leftarrow _R \mathbb {Z}_q^*\) and flips a coin \(\mathrm {c}_{v}\) that comes up \('1'\) with probability \(\delta \) and \('0'\) otherwise, for some \(\delta \) to be determined later. If \(\mathrm {c}_{v}\) equals \('0'\), \(\mathcal {C}\) responds with \(\mathsf{H_1}(v) = h_{v} = g^{\rho _{v}}\); otherwise, she returns \(\mathsf{H_1}(v) = h_{v} = B^{\rho _{v}}\). In both cases she stores \(\{v, h_{v}, \rho _{v}, \mathrm {c}_{v}\}\).

    • \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\): The challenger \(\mathcal {C}\) responds to a newly submitted \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\) query for \(v\) with a randomly chosen \(h_{v} \in \mathbb {G}_{T} \). To be consistent in her \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\) responses, \(\mathcal {C}\) maintains the history of her responses in her local memory.

    • \(\mathcal {O}_{\mathsf{Corrupt}}\): \(\mathcal {C}\) responds to an \(\mathcal {O}_{\mathsf{Corrupt}}\) query involving user \({\mathsf {U}}_i \in \mathsf {U}_\mathcal {A} \) by returning the coordinate \(y_i\) chosen in the Setup Phase.

    • \(\mathcal {O}_{\mathsf{DShare}}\): simulation of \(\mathcal {O}_{\mathsf{DShare}}\) is performed as follows. As before, \(\mathcal {C}\) keeps track of the submitted \(\mathcal {O}_{\mathsf{DShare}}\) queries in her local memory. Let \(\langle m, {\mathsf {U}}_i \rangle \) be a decryption query submitted for message \(m\) and user identity \({\mathsf {U}}_i \). If there is no entry in \(\mathsf{H_1}\)-list for \(m\), then \(\mathcal {C}\) runs the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) algorithm for \(m\). Let \(\{m, h_{m}, \rho _{m}, \mathrm {c}_m\}\) be the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) entry in \(\mathcal {C}\)’s local memory for message \(m\). Let \(\mathrm {I_P} ' \leftarrow \mathrm {I_P} \setminus (\mathsf{r_{t-2}}, \mathsf{sk_{t-2}})\). \(\mathcal {C}\) responds with \({\mathsf {ds}}_{m,i} = \left( g^{\sum \limits _{(\mathsf{r_{j}}, \mathsf{sk_{j}}) \in \mathrm {I_P} '} \mathsf{sk_{j}} \lambda _{\mathsf{r_{i}},\mathsf{r_{j}}}^{\mathrm {I_P} '}} X^{\lambda _{\mathsf{r_{i}} ,\mathsf{r_{t-2}}}^{\mathrm {I_P} '}}\right) ^{\rho _m}\) where \(X \leftarrow A\) iff \(\mathrm {c}_m = 0\), and \(X \leftarrow W\) iff \(\mathrm {c}_m = 1\). In both cases, \(\mathcal {C}\) keeps a record of her response in her local memory.

  • Challenge Phase \(\mathcal {A}\) selects the challenge message \(m_*\). Let the corresponding entry in the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) list be \(\{m_*, h_{m_*}, \rho _{m_*}, \mathrm {c}_{m_*}\}\). If \(\mathrm {c}_{m_*} = 0\), then \(\mathcal {C}\) aborts.

  • Guessing Phase \(\mathcal {A}\) outputs one bit \({\mathsf {b}}'_{m_*}\) representing the guess for \({\mathsf {b}}_{m_*}\). \(\mathcal {C}\) responds positively to the DDH challenger if \({\mathsf {b}}_{m_*}'=0\), and negatively otherwise.
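
The \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) simulation described in the oracle list above is a lazily sampled random oracle with a biased coin attached to every entry. The following minimal sketch illustrates the bookkeeping only. It is not the authors' code: the toy group (the order-1019 subgroup of \(\mathbb{Z}_{2039}^*\) generated by 4), the class name `H1Simulator` and its fields are assumptions made for illustration, and the coin is taken to be 1 with probability \(\delta\), the convention under which the non-abort probability has the form \(\delta (1 - \delta )^{\mathcal {Q}_{H_1} - 1}\).

```python
import random

# Toy parameters (assumptions for illustration, far too small for real use):
# P_MOD = 2*Q + 1 is a safe prime and G = 4 generates the subgroup of order Q.
P_MOD, Q, G = 2039, 1019, 4


class H1Simulator:
    """Lazily sampled H_1 oracle that embeds the DDH element B via a biased coin."""

    def __init__(self, B, delta, rng):
        self.B = B            # B = g^b from the DDH instance
        self.delta = delta    # probability that the coin c_v equals 1
        self.rng = rng
        self.table = {}       # v -> (h_v, rho_v, c_v), the "O_H1 list"

    def query(self, v):
        # Repeated queries must return the same answer, so consult the list first.
        if v not in self.table:
            rho = self.rng.randrange(1, Q)                  # rho_v in Z_q^*
            c = 1 if self.rng.random() < self.delta else 0  # biased coin c_v
            base = self.B if c == 1 else G                  # B^rho if c_v = 1, g^rho if c_v = 0
            self.table[v] = (pow(base, rho, P_MOD), rho, c)
        return self.table[v][0]


if __name__ == "__main__":
    sim = H1Simulator(B=pow(G, 7, P_MOD), delta=0.25, rng=random.Random(1))
    print(sim.query("file-a"), sim.query("file-b"), sim.query("file-a"))
```

Keeping \(\rho_v\) and \(\mathrm{c}_v\) next to each answer is what later lets the simulator build decryption shares for coin-0 entries and recognise the entry in which the challenge is embedded.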

It is easy to see that if \(\mathcal {A}\)’s answer is \('0'\), the \(\mathcal {O}_{\mathsf{DShare}}\) responses for \(m_*\) constitute properly structured decryption shares for \(m_*\). This can only be the case if \(W = g^{ab}\), so \(\mathcal {C}\) can give a positive answer to the SXDH challenger. Clearly, if \(\mathrm {c}_{m_*} = 1\) and \(\mathrm {c}_{m} = 0\) for all other queries to \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) such that \(m \ne m_*\), the execution environment is indistinguishable from the actual game \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\). This happens with probability \(\mathrm {Pr}[\mathrm {c}_{m_*} = 1\ \wedge \ (\forall m \ne m_*: \mathrm {c}_{m} = 0)] = \delta (1 - \delta )^{\mathcal {Q}_{H_1} - 1}, \) where \(\mathcal {Q}_{H_1}\) is the number of distinct \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) queries. By setting \(\delta \approx \frac{1}{\mathcal {Q}_{H_1} - 1}\), the above probability becomes approximately \(\frac{1}{e\cdot (\mathcal {Q}_{H_1}-1)}\) for large \(\mathcal {Q}_{H_1}\), and the success probability of the adversary can be bounded, up to this approximation, as \(\mathsf {Adv}_{{{\mathsf{DS }_{\mu }}}-{\mathsf{IND }}}^{\mathcal {A}} \le e \cdot (\mathcal {Q}_{H_1}-1) \cdot \mathsf {Adv}_{\mathsf {SXDH}}^\mathcal {C} \).
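
The size of the non-abort probability \(\delta (1 - \delta )^{\mathcal {Q}_{H_1} - 1}\) can be checked numerically. The sketch below is an illustration rather than part of the proof; the function names are hypothetical. It evaluates the expression at \(\delta = 1/(\mathcal {Q}_{H_1} - 1)\) and, as an alternative choice assumed here for comparison, at \(\delta = 1/\mathcal {Q}_{H_1}\), which maximises the expression.

```python
import math


def non_abort_probability(delta: float, q_h1: int) -> float:
    """Pr[c_{m*} = 1 and c_m = 0 for the other q_h1 - 1 distinct H_1 queries]."""
    return delta * (1.0 - delta) ** (q_h1 - 1)


def ratio_to_bound(q_h1: int) -> float:
    """Probability at delta = 1/(q_h1 - 1), divided by 1/(e * (q_h1 - 1))."""
    n = q_h1 - 1
    return non_abort_probability(1.0 / n, q_h1) * math.e * n


if __name__ == "__main__":
    for q_h1 in (11, 101, 1001):
        at_one_over_q = non_abort_probability(1.0 / q_h1, q_h1) * math.e * q_h1
        print(q_h1, ratio_to_bound(q_h1), at_one_over_q)
```

For finite \(\mathcal {Q}_{H_1}\) the first ratio stays slightly below 1 and tends to 1 as \(\mathcal {Q}_{H_1}\) grows, whereas the choice \(\delta = 1/\mathcal {Q}_{H_1}\) gives a probability of at least \(\frac{1}{e \cdot \mathcal {Q}_{H_1}}\) for every \(\mathcal {Q}_{H_1} \ge 2\). Either way the reduction loses a factor linear in the number of \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) queries.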

Appendix B: Proof of Lemma 2

Challenger \(\mathcal {C}\) is given an instance \(\langle q', \mathbb {G}_1 ', \mathbb {G}_2 ', \mathbb {G}_{T} ', \hat{e}', g ', \bar{g} ', A=(g ')^a, B=(g ')^b, C=(g ')^c, \bar{A}=(\bar{g} ')^a, \bar{B}=(\bar{g} ')^b, \bar{C}=(\bar{g} ')^c, W \rangle \) of the SXDH problem and wishes to use \(\mathcal {A}\) to decide if \(W = \hat{e}\left( g ', \bar{g} ' \right) ^{abc}\). \(\mathcal {C}\) simulates an environment in which \(\mathcal {A}\) operates, using \(\mathcal {A}\)'s advantage in the game \(\mathsf{IND }_{\mu }\)-\(\mathsf{CPA }\) to help compute the solution to the BDDH problem, as described before. \(\mathcal {C}\) interacts with \(\mathcal {A}\) within an \(\mathsf{IND }_{\mu }\)-\(\mathsf{CPA }\) game:

  • Setup Phase \(\mathcal {C}\) sets \(q \leftarrow q'\), \(\mathbb {G}_1 \leftarrow \mathbb {G}_1 '\), \(\mathbb {G}_2 \leftarrow \mathbb {G}_2 '\), \(\mathbb {G}_{T} \leftarrow \mathbb {G}_{T} '\), \(\hat{e} = \hat{e}'\), \(g \leftarrow g '\), \(\bar{g} \leftarrow \bar{g} '\), \(\bar{g} _{pub} = \bar{A}\). Notice that the secret key \(\mathsf{sk}= a\) is not known to \(\mathcal {C}\). \(\mathcal {C}\) also generates the list of user identities \(\mathsf{U}\). \(\mathcal {C}\) sends \(\mathsf{pk}= \{q, \mathbb {G}_1, \mathbb {G}_2, \mathbb {G}_{T}, \hat{e}, \mathcal {O}_{\mathsf{H}_\mathsf{1}}, \mathcal {O}_{\mathsf{H}_\mathsf{2}}, \bar{g}, \bar{g} _{pub}\}\) to \(\mathcal {A}\). At this point, \(\mathcal {A}\) declares the list of corrupted users \(\mathsf {U}_\mathcal {A} \) as in \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\). Let \(\mathsf {U}_\mathcal {A} =\{{\mathsf {U}}_i \}_{i=0}^{n_\mathcal {A}-1}\). To generate key-shares \(\{\mathsf{sk_{i}}\}_{i=0}^{n-1}\), \(\mathcal {C}\) picks a Lagrange polynomial \(\mathrm {P} ()\) of degree \(t-1\) assuming interpolation points \( \mathrm {I_P} = \{ (0, a) \} \cup \{(\mathsf{r_{i}}, y_{i})\}_{i=0}^{t-2},\) where \(\mathsf{r_{i}}, y_{i} \leftarrow _R \mathbb {Z}_q^*\). She then sets the key-shares to \(\mathsf{sk_{i}} \leftarrow y_i, i \in [0, n-1]\) and assigns \(\mathsf{sk_{i}}\) for \(i \in [0, n_\mathcal {A}-1]\) to corrupted users.

  • Access to Oracles \(\mathcal {C}\) simulates oracles \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\), \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\), \(\mathcal {O}_{\mathsf{Corrupt}}\) and \(\mathcal {O}_{\mathsf{DShare}}\) :

    • \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\), \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\), \(\mathcal {O}_{\mathsf{Corrupt}}\): \(\mathcal {C}\) responds to these queries as in \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\).

    • \(\mathcal {O}_{\mathsf{DShare}}\): \(\mathcal {C}\) keeps track of the submitted \(\mathcal {O}_{\mathsf{DShare}}\)-queries in her local memory. Let \(\langle m, {\mathsf {U}}_i \rangle \) be a decryption query submitted for message \(m\) and user identity \({\mathsf {U}}_i \). If there is no entry in the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) list for \(m\), then \(\mathcal {C}\) runs the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) algorithm for \(m\). Let \(\{m, h_{m}, \rho _{m}, \mathrm {c}_m\}\) be the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) entry in \(\mathcal {C}\)’s local memory for message \(m\). If \(\mathrm {c}_m = 1\) and \(\mathcal {A}\) has already submitted \(t-n_{\mathcal {A}}-1\) queries for \(m\), \(\mathcal {C}\) aborts. If \(\mathrm {c}_m = 1\) and the limit of \(t-n_{\mathcal {A}}-1\) queries has not been reached, \(\mathcal {C}\) responds with a random \({\mathsf {ds}}_{m,i} \in \mathbb {G}_1 \) and keeps a record of it. From Lemma 1, this step is legitimate as long as fewer than \(t\) decryption shares are available for \(m\). Let \(\mathrm {I_P} ' \leftarrow \mathrm {I_P} \setminus (0, a)\). If \(\mathrm {c}_m = 0\), \(\mathcal {C}\) responds with \({\mathsf {ds}}_{m,i} = \left( g^{\sum \limits _{(\mathsf{r_{j}}, \mathsf{sk_{j}}) \in \mathrm {I_P} '} \mathsf{sk_{j}} \lambda _{\mathsf{r_{i}},\mathsf{r_{j}}}^{\mathrm {I_P} '}} A^{\lambda _{\mathsf{r_{i}},0}^{\mathrm {I_P} '}}\right) ^{\rho _{m}}\).

  • Challenge Phase \(\mathcal {A}\) submits \(m_*\) to \(\mathcal {C}\). \(\mathcal {A}\) has not submitted \(\mathcal {O}_{\mathsf{DShare}}\) -queries for the challenge message with more than \(t-n_{\mathcal {A}}-1\) distinct user identities. Next, \(\mathcal {C}\) runs the algorithm for responding to \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\)-queries for \(m_*\) to recover the entry from the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\)-list. Let the entry be \(\{m_*, h_{m_*}, \rho _{m_*}, \mathrm {c}_{m_*}\}\). If \(\mathrm {c}_{m_*} = 0\), \(\mathcal {C}\) aborts. Otherwise, \(\mathcal {C}\) computes \(e_{*} \leftarrow W^{\rho _{m_*}}\), sets \(c_{*} \leftarrow \langle m_* \oplus \mathsf{H_2}(e_{*}), \bar{C}\rangle \) and returns \(c_{*}\) to \(\mathcal {A}\).

  • Guessing Phase \(\mathcal {A}\) outputs the guess \({\mathsf {b}}'\) for \({\mathsf {b}}\). \(\mathcal {C}\) forwards \({\mathsf {b}}'\) as its answer to the SXDH challenge.
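
For entries with \(\mathrm {c}_m = 0\), the \(\mathcal {O}_{\mathsf{DShare}}\) response above relies on Lagrange interpolation in the exponent: \(\mathcal {C}\) knows the points \((\mathsf{r_{j}}, y_j)\) and the group element \(A = g^a\), but not \(a\) itself. The sketch below checks this mechanism in a toy setting. It is a minimal illustration under stated assumptions, not the scheme's implementation: it replaces \(\mathbb {G}_1\) with the order-1019 subgroup of \(\mathbb{Z}_{2039}^*\) generated by 4, and the function names are hypothetical.

```python
import random

# Toy parameters (assumptions for illustration, far too small for real use):
# P_MOD = 2*Q + 1 is a safe prime and G = 4 generates the subgroup of order Q.
P_MOD, Q, G = 2039, 1019, 4


def lagrange_coeff(x, xj, xs):
    """Lagrange coefficient (mod Q) of abscissa xj when interpolating at x over xs."""
    num, den = 1, 1
    for xk in xs:
        if xk != xj:
            num = num * (x - xk) % Q
            den = den * (xj - xk) % Q
    return num * pow(den, -1, Q) % Q  # modular inverse; needs Python 3.8+


def simulated_share(x, known, A, rho):
    """Compute g^(rho * P(x)) from the known points and A = g^a, without knowing a."""
    xs = [0] + [r for r, _ in known]
    exp_known = sum(y * lagrange_coeff(x, r, xs) for r, y in known) % Q
    base = pow(G, exp_known, P_MOD) * pow(A, lagrange_coeff(x, 0, xs), P_MOD) % P_MOD
    return pow(base, rho, P_MOD)


def real_share(x, known, a, rho):
    """Reference value H_1(m)^(P(x)) with H_1(m) = g^rho, computed with a in the clear."""
    xs = [0] + [r for r, _ in known]
    p_x = (a * lagrange_coeff(x, 0, xs)
           + sum(y * lagrange_coeff(x, r, xs) for r, y in known)) % Q
    return pow(G, rho * p_x % Q, P_MOD)


def demo(seed=0, t=4):
    rng = random.Random(seed)
    xs = rng.sample(range(1, Q), t)          # t-1 abscissas r_0..r_{t-2}, plus one query point
    known = [(r, rng.randrange(1, Q)) for r in xs[:t - 1]]
    a, rho = rng.randrange(1, Q), rng.randrange(1, Q)
    A = pow(G, a, P_MOD)                     # the simulator only ever sees A
    return simulated_share(xs[-1], known, A, rho), real_share(xs[-1], known, a, rho)


if __name__ == "__main__":
    print(demo())
```

Both values returned by `demo` agree, which is the property that makes the simulated shares for coin-0 messages indistinguishable from real ones.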

If \({\mathcal {A}}\)’s answer is \({\mathsf {b}}' = 1\), it means that she has recognized the ciphertext \(c_*\) as the encryption of \(m_*\); \(\mathcal {C}\) can then give the positive answer to her SXDH challenge. Indeed, when \(W = \hat{e}\left( g, \bar{g} \right) ^{abc}\), \( W^{\rho _{m_*}} = \hat{e}\left( g, \bar{g} \right) ^{abc\rho _{m_*}} = \hat{e}\left( (B^{\rho _{m_*}})^a, \bar{g} ^c \right) = \hat{e}\left( H_1(m_*)^{\mathsf{sk}}, \bar{C} \right) . \) Clearly, if \(\mathrm {c}_{m_*} = 1\) and \(\mathrm {c}_{m} = 0\) for all other queries to \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) such that \(m \ne m_*\), then the execution environment is indistinguishable from the actual game \(\mathsf{IND }_{\mu }\)-\(\mathsf{CPA }\). This happens with probability \(\mathrm {Pr}[\mathrm {c}_{m_*} = 1\ \wedge \ (\forall m \ne m_*: \mathrm {c}_{m} = 0)] = \delta (1 - \delta )^{\mathcal {Q}_{H_1} - 1}\), where \(\mathcal {Q}_{H_1}\) is the number of distinct \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\)-queries. By setting \(\delta \approx \frac{1}{\mathcal {Q}_{H_1} - 1}\), the above probability becomes approximately \(\frac{1}{e\cdot (\mathcal {Q}_{H_1}-1)}\) for large \(\mathcal {Q}_{H_1}\), and the success probability of the adversary \(\mathsf {Adv}_{{{\mathsf{IND }_{\mu }}}-{\mathsf{CPA }}}^{\mathcal {A}}\) is bounded, up to this approximation, as \(\mathsf {Adv}_{{{\mathsf{IND }_{\mu }}}-{\mathsf{CPA }}}^{\mathcal {A}} \le e \cdot (\mathcal {Q}_{H_1}-1) \cdot \mathsf {Adv}_{\mathsf {SXDH}}^\mathcal {C} \).
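
The chain of equalities for \(W^{\rho _{m_*}}\) can be expanded one step at a time. The derivation below is a restatement under the assumptions already in force in the proof (\(W = \hat{e}(g, \bar{g})^{abc}\), \(\mathrm {c}_{m_*} = 1\) so that \(H_1(m_*) = B^{\rho _{m_*}}\), and \(\mathsf{sk} = a\)); it adds no new claim.

```latex
\begin{align*}
W^{\rho_{m_*}}
  &= \left(\hat{e}(g,\bar{g})^{abc}\right)^{\rho_{m_*}}
   = \hat{e}(g,\bar{g})^{abc\,\rho_{m_*}} \\
  &= \hat{e}\left(g^{ab\rho_{m_*}},\ \bar{g}^{c}\right)
   && \text{(bilinearity of } \hat{e}\text{)} \\
  &= \hat{e}\left((B^{\rho_{m_*}})^{a},\ \bar{C}\right)
   && \text{since } B = g^{b},\ \bar{C} = \bar{g}^{c} \\
  &= \hat{e}\left(H_1(m_*)^{\mathsf{sk}},\ \bar{C}\right)
   && \text{since } H_1(m_*) = B^{\rho_{m_*}},\ \mathsf{sk} = a .
\end{align*}
```

The last line is the value hashed by \(\mathsf{H_2}\) in an honestly generated ciphertext whose second component is \(\bar{C}\), which is why the challenge ciphertext is well formed exactly when \(W\) is the BDDH value.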

Copyright information

© 2014 International Financial Cryptography Association

About this paper

Cite this paper

Stanek, J., Sorniotti, A., Androulaki, E., Kencl, L. (2014). A Secure Data Deduplication Scheme for Cloud Storage. In: Christin, N., Safavi-Naini, R. (eds) Financial Cryptography and Data Security. FC 2014. Lecture Notes in Computer Science, vol 8437. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45472-5_8

  • DOI: https://doi.org/10.1007/978-3-662-45472-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45471-8

  • Online ISBN: 978-3-662-45472-5

  • eBook Packages: Computer Science (R0)