A Secure Data Deduplication Scheme for Cloud Storage

Stanek, Jan; Sorniotti, Alessandro; Androulaki, Elli; Kencl, Lukas

doi:10.1007/978-3-662-45472-5_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8437)

Included in the following conference series:

International Conference on Financial Cryptography and Data Security

Abstract

As more corporate and private users outsource their data to cloud storage providers, recent data breach incidents make end-to-end encryption an increasingly prominent requirement. Unfortunately, semantically secure encryption schemes render various cost-effective storage optimization techniques, such as data deduplication, ineffective. We present a novel idea that differentiates data according to their popularity. Based on this idea, we design an encryption scheme that guarantees semantic security for unpopular data and provides weaker security and better storage and bandwidth benefits for popular data. This way, data deduplication can be effective for popular data, whilst semantically secure encryption protects unpopular content. We show that our scheme is secure under the Symmetric External Decisional Diffie-Hellman Assumption in the random oracle model.

Notes

1. We have chosen to formalize this approach for the sake of readability. In practice, one would adopt a solution in which the file is encrypted only once with \(K\); this key, not the entire file, is in turn encrypted with a slightly modified version of \(\mathcal {E}_\mu \) that allows \(H_1(F_c)\) to be used as the \(H_1\)-hash for computing ciphertext and decryption shares for \(K\). This approach would require uploading and storing a single ciphertext of the file and not two as described above.

References

1. Open Security Foundation: DataLossDB. http://datalossdb.org/
2. Meister, D., Brinkmann, A.: Multi-level comparison of data deduplication in a backup scenario. In: SYSTOR ’09, pp. 8:1–8:12. ACM, New York (2009)
3. Mandagere, N., Zhou, P., Smith, M.A., Uttamchandani, S.: Demystifying data deduplication. In: Middleware ’08, pp. 12–17. ACM, New York (2008)
4. Aronovich, L., Asher, R., Bachmat, E., Bitner, H., Hirsch, M., Klein, S.T.: The design of a similarity based deduplication system. In: SYSTOR ’09, pp. 6:1–6:14 (2009)
5. Dutch, M., Freeman, L.: Understanding data de-duplication ratios. SNIA forum (2008). http://www.snia.org/sites/default/files/Understanding_Data_Deduplication_Ratios-20080718.pdf
6. Harnik, D., Margalit, O., Naor, D., Sotnikov, D., Vernik, G.: Estimation of deduplication ratios in large data sets. In: IEEE MSST ’12, pp. 1–11, April 2012
7. Harnik, D., Pinkas, B., Shulman-Peleg, A.: Side channels in cloud services: deduplication in cloud storage. IEEE Security & Privacy 8(6), 40–47 (2010)
8. Halevi, S., Harnik, D., Pinkas, B., Shulman-Peleg, A.: Proofs of ownership in remote storage systems. In: CCS ’11, pp. 491–500. ACM, New York (2011)
9. Di Pietro, R., Sorniotti, A.: Boosting efficiency and security in proof of ownership for deduplication. In: ASIACCS ’12, pp. 81–82. ACM, New York (2012)
10. Douceur, J.R., Adya, A., Bolosky, W.J., Simon, D., Theimer, M.: Reclaiming space from duplicate files in a serverless distributed file system. In: ICDCS ’02, pp. 617–632. IEEE Computer Society, Washington, DC (2002)
11. Storer, M.W., Greenan, K., Long, D.D., Miller, E.L.: Secure data deduplication. In: StorageSS ’08, pp. 1–10. ACM, New York (2008)
12. Bellare, M., Keelveedhi, S., Ristenpart, T.: Message-locked encryption and secure deduplication. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 296–312. Springer, Heidelberg (2013)
13. Xu, J., Chang, E.C., Zhou, J.: Weak leakage-resilient client-side deduplication of encrypted data in cloud storage. In: 8th ACM SIGSAC Symposium, pp. 195–206
14. Bellare, M., Keelveedhi, S., Ristenpart, T.: DupLESS: server-aided encryption for deduplicated storage. In: 22nd USENIX Conference on Security, pp. 179–194 (2013)
15. Douceur, J.R.: The sybil attack. In: Druschel, P., Kaashoek, M.F., Rowstron, A. (eds.) IPTPS 2002. LNCS, vol. 2429, pp. 251–260. Springer, Heidelberg (2002)
16. Goldwasser, S., Micali, S.: Probabilistic encryption. J. Comput. Syst. Sci. 28, 270–299 (1984)
17. Fahl, S., Harbach, M., Muders, T., Smith, M.: Confidentiality as a service-usable security for the cloud. In: TrustCom 2012, pp. 153–162 (2012)
18. Fahl, S., Harbach, M., Muders, T., Smith, M., Sander, U.: Helping johnny 2.0 to encrypt his facebook conversations. In: SOUPS 2012, pp. 11–28 (2012)
19. Ateniese, G., Blanton, M., Kirsch, J.: Secret handshakes with dynamic and fuzzy matching. In: NDSS ’07 (2007)
20. Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979)
21. Goyal, V., Pandey, O., Sahai, A., Waters, B.: Attribute-based encryption for fine-grained access control of encrypted data. In: ACM CCS ’06, pp. 89–98 (2006)
22. Canetti, R., Lindell, Y., Ostrovsky, R., Sahai, A.: Universally composable two-party and multi-party secure computation. In: STOC ’02 (2002)
23. Camenisch, J.L., Hohenberger, S., Lysyanskaya, A.: Balancing accountability and privacy using e-cash (extended abstract). In: De Prisco, R., Yung, M. (eds.) SCN 2006. LNCS, vol. 4116, pp. 141–155. Springer, Heidelberg (2006)
24. Lysyanskaya, A., Rivest, R.L., Sahai, A., Wolf, S.: Pseudonym systems (extended abstract). In: Heys, H.M., Adams, C.M. (eds.) SAC 1999. LNCS, vol. 1758, pp. 184–199. Springer, Heidelberg (2000)

Acknowledgements

This work was supported by the Grant Agency of the Czech Technical University in Prague, grant No. SGS13/139/OHK3/2T/13.

Author information

Authors and Affiliations

Czech Technical University in Prague, Prague, Czech Republic
Jan Stanek & Lukas Kencl
IBM Research - Zurich, Rüschlikon, Switzerland
Alessandro Sorniotti & Elli Androulaki

Corresponding author

Correspondence to Elli Androulaki.

Editor information

Editors and Affiliations

Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Nicolas Christin
University of Calgary, Calgary, Alberta, Canada
Reihaneh Safavi-Naini

Appendices

Appendix A: Proof of Lemma 1

SXDH assumes two groups of prime order \(q\), \(\mathbb {G}_1 \) and \(\mathbb {G}_2 \), such that there is no efficiently computable distortion map between the two; a target group \(\mathbb {G}_{T} \); and an efficient, non-degenerate bilinear map \(\hat{e}: \mathbb {G}_1 \times \mathbb {G}_2 \rightarrow \mathbb {G}_{T} \). In this setting, the Decisional Diffie-Hellman (DDH) assumption holds in both \(\mathbb {G}_1\) and \(\mathbb {G}_2\), and the bilinear decisional Diffie-Hellman (BDDH) assumption holds given the existence of \(\hat{e}\) [19].
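
For reference, the two decision problems used in the proofs can be written as follows. This is the standard formulation, with the BDDH instance taken in the asymmetric form that the instance in Appendix B uses; the exponents \(a, b, c \leftarrow _R \mathbb {Z}_q^*\) and the generators \(g \in \mathbb {G}_1\), \(\bar{g} \in \mathbb {G}_2\) are the only symbols involved.

\[
\text{DDH in } \mathbb {G}_1: \quad \text{given } \langle g, g^a, g^b, W \rangle , \text{ decide whether } W = g^{ab} \text{ or } W \leftarrow _R \mathbb {G}_1 .
\]

\[
\text{BDDH}: \quad \text{given } \langle g, \bar{g}, g^a, g^b, g^c, \bar{g}^a, \bar{g}^b, \bar{g}^c, W \rangle , \text{ decide whether } W = \hat{e}(g, \bar{g})^{abc} \text{ or } W \leftarrow _R \mathbb {G}_{T} .
\]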

Challenger \(\mathcal {C}\) is given an SXDH context \(\mathbb {G}_1 ', \mathbb {G}_2 ', \mathbb {G}_{T} ', \hat{e}'\) and an instance of the DDH problem \(\langle \mathbb {G}_1 ', g ', A=(g ')^a, B=(g ')^b, W \rangle \) in \(\mathbb {G}_1 '\). \(\mathcal {C}\) simulates an environment in which \(\mathcal {A}\) operates, using \(\mathcal {A}\)'s advantage in the game \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\) to decide whether \(W = (g ')^{ab}\). \(\mathcal {C}\) interacts with \(\mathcal {A}\) in the \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\) game as follows:

Setup Phase \(\mathcal {C}\) sets \(\mathbb {G}_1 \leftarrow \mathbb {G}_1 '\), \(\mathbb {G}_2 \leftarrow \mathbb {G}_2 '\), \(\mathbb {G}_{T} \leftarrow \mathbb {G}_{T} '\), \(\hat{e} = \hat{e}'\), \(g \leftarrow g '\); picks a random generator \(\bar{g} \) of \(\mathbb {G}_2\) and sets \(\bar{g} _{pub} = (\bar{g})^\mathsf{sk}\), where \(\mathsf{sk}\leftarrow _R \mathbb {Z}_q^*\). \(\mathcal {C}\) also generates the set of user identities \(\mathsf{U}= \{{\mathsf {U}}_i \}_{i=0}^{n-1}\). The public key \(\mathsf{pk}= \{q, \mathbb {G}_1, \mathbb {G}_2, \mathbb {G}_{T}, \hat{e}, \mathcal {O}_{\mathsf{H}_\mathsf{1}}, \mathcal {O}_{\mathsf{H}_\mathsf{2}}, \bar{g}, \bar{g} _{pub}\}\) and \(\mathsf{U}\) are forwarded to \(\mathcal {A}\). \(\mathcal {A}\) declares the list \(\mathsf {U}_\mathcal {A} \) of \(n_\mathcal {A} < t-1\) user identities that will later on be subject to \(\mathcal {O}_{\mathsf{Corrupt}}\) calls. Let \(\mathsf {U}_\mathcal {A} =\{{\mathsf {U}}_i \}_{i=0}^{n_\mathcal {A}-1}\). To generate key-shares \(\{\mathsf{sk_{i}}\}_{i=0}^{n-1}\), \(\mathcal {C}\) constructs a \((t-1)\)-degree Lagrange polynomial \(\mathrm {P} ()\) with interpolation points \(\mathrm {I_P} = \{ (0, \mathsf{sk}) \cup \{ (\mathsf{r_{i}}, y_i) \}_{i=0}^{t-2} \}, \) where \(\mathsf{r_{i}}, y_i\leftarrow _R \mathbb {Z}_q^*\), for \(i \in [0,t-3]\), and \(\mathsf{r_{t-2}} \leftarrow _R \mathbb {Z}_q^*\), \(y_{t-2} \leftarrow a\). Secret key-shares are set to \(\mathsf{sk_{i}} \leftarrow y_i, i \in [0,n-1]\). Since \(a\) is not known to \(\mathcal {C}\), \(\mathcal {C}\) sets the corrupted key-shares to be \(\mathsf{sk_{i}}\) for \(i \in [0, n_\mathcal {A}-1]\); because \(n_\mathcal {A} < t-1\), none of these indices reaches the unknown share \(y_{t-2}\).
Access to Oracles \(\mathcal {C}\) simulates oracles \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\), \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\), \(\mathcal {O}_{\mathsf{Corrupt}}\) and \(\mathcal {O}_{\mathsf{DShare}}\) :
- \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\): to respond to \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\)-queries, \(\mathcal {C}\) maintains a list of tuples \(\{v, h_{v}, \rho _{v}, \mathrm {c}_v\}\) as explained below. We refer to this list as the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) list, and it is initially empty. When \(\mathcal {A}\) submits an \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) query for \(v\), \(\mathcal {C}\) checks if \(v\) already appears in the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) list in a tuple \(\{v, h_{v}, \rho _{v}, \mathrm {c}_{v}\}\). If so, \(\mathcal {C}\) responds with \(\mathsf{H_1}(v) = h_{v}\). Otherwise, \(\mathcal {C}\) picks \(\rho _{v} \leftarrow _R \mathbb {Z}_q^*\), and flips a coin \(\mathrm {c}_{v}\); \(\mathrm {c}_{v}\) flips to \('1'\) with probability \(\delta \) for some \(\delta \) to be determined later, which is the bias under which the abort probability computed at the end of the proof holds. If \(\mathrm {c}_{v}\) equals \('0'\), \(\mathcal {C}\) responds \(\mathsf{H_1}(v) = h_{v} = g^{\rho _{v}}\) and stores \(\{v, h_{v}, \rho _{v}, \mathrm {c}_{v}\}\); otherwise, she returns \(\mathsf{H_1}(v) = h_{v} = B^{\rho _{v}}\) and stores \(\{v, h_{v}, \rho _{v}, \mathrm {c}_{v}\}\).
- \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\): The challenger \(\mathcal {C}\) responds to a newly submitted \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\) query for \(v\) with a randomly chosen \(h_{v} \in \mathbb {G}_{T} \). To be consistent in her \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\) responses, \(\mathcal {C}\) maintains the history of her responses in her local memory.
- \(\mathcal {O}_{\mathsf{Corrupt}}\): \(\mathcal {C}\) responds to a \(\mathcal {O}_{\mathsf{Corrupt}}\) query involving user \({\mathsf {U}}_i \in \mathsf {U}_\mathcal {A} \), by returning the coordinate \(y_i\) chosen in the Setup Phase.
- \(\mathcal {O}_{\mathsf{DShare}}\): simulation of \(\mathcal {O}_{\mathsf{DShare}}\) is performed as follows. As before, \(\mathcal {C}\) keeps track of the submitted \(\mathcal {O}_{\mathsf{DShare}}\) queries in her local memory. Let \(\langle m, {\mathsf {U}}_i \rangle \) be a decryption query submitted for message \(m\) and user identity \({\mathsf {U}}_i \). If there is no entry in \(\mathsf{H_1}\)-list for \(m\), then \(\mathcal {C}\) runs the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) algorithm for \(m\). Let \(\{m, h_{m}, \rho _{m}, \mathrm {c}_m\}\) be the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) entry in \(\mathcal {C}\)’s local memory for message \(m\). Let \(\mathrm {I_P} ' \leftarrow \mathrm {I_P} \setminus (\mathsf{r_{t-2}}, \mathsf{sk_{t-2}})\). \(\mathcal {C}\) responds with \({\mathsf {ds}}_{m,i} = \left( g^{\sum \limits _{(\mathsf{r_{j}}, \mathsf{sk_{j}}) \in \mathrm {I_P} '} \mathsf{sk_{j}} \lambda _{\mathsf{r_{i}},\mathsf{r_{j}}}^{\mathrm {I_P} '}} X^{\lambda _{\mathsf{r_{i}} ,\mathsf{r_{t-2}}}^{\mathrm {I_P} '}}\right) ^{\rho _m}\) where \(X \leftarrow A\) iff \(\mathrm {c}_m = 0\), and \(X \leftarrow W\) iff \(\mathrm {c}_m = 1\). In both cases, \(\mathcal {C}\) keeps a record of her response in her local memory.
Challenge Phase \(\mathcal {A}\) selects the challenge message \(m_*\). Let the corresponding entry in the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) list be \(\{m_*, h_{m_*}, \rho _{m_*}, \mathrm {c}_{m_*}\}\). If \(\mathrm {c}_{m_*} = 0\), then \(\mathcal {C}\) aborts.
Guessing Phase \(\mathcal {A}\) outputs one bit \({\mathsf {b}}'_{m_*}\) representing the guess for \({\mathsf {b}}_{m_*}\). \(\mathcal {C}\) responds positively to the DDH challenger if \({\mathsf {b}}_{m_*}'=0\), and negatively otherwise.
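
The key-share generation in the Setup Phase and the way decryption shares recombine can be sketched as follows. This is a minimal illustration only, not the paper's construction: it replaces the pairing groups with a toy prime-order subgroup of \(\mathbb {Z}_p^*\), uses the evaluation points \(1, \dots , n\) instead of random \(\mathsf{r_{i}}\), and all names (`make_shares`, `lagrange_at_zero`, `combine_in_exponent`) are hypothetical.

```python
import secrets

# Toy parameters (assumptions for illustration): P = 2*Q + 1 with P, Q prime,
# and G = 4 generating the order-Q subgroup of quadratic residues mod P.
P, Q, G = 2039, 1019, 4


def make_shares(secret, t, n):
    """Sample a degree-(t-1) polynomial with P(0) = secret; return {r_i: P(r_i)}."""
    coeffs = [secret] + [secrets.randbelow(Q) for _ in range(t - 1)]

    def poly(x):
        return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q

    # Assumption: evaluation points are 1..n rather than random elements of Z_Q^*.
    return {r: poly(r) for r in range(1, n + 1)}


def lagrange_at_zero(r_i, points):
    """Lagrange coefficient of point r_i for interpolating at x = 0 over Z_Q."""
    num, den = 1, 1
    for r_j in points:
        if r_j != r_i:
            num = num * (-r_j) % Q
            den = den * (r_i - r_j) % Q
    return num * pow(den, -1, Q) % Q


def combine_in_exponent(dshares):
    """Combine {r_i: h^{sk_i}} into h^{sk} without ever reconstructing sk."""
    result = 1
    points = list(dshares)
    for r_i, ds in dshares.items():
        result = result * pow(ds, lagrange_at_zero(r_i, points), P) % P
    return result


sk = secrets.randbelow(Q - 1) + 1            # secret key in Z_Q^*
shares = make_shares(sk, t=3, n=5)           # any 3 of the 5 shares suffice
h = pow(G, 123, P)                           # stand-in for H_1(m)
dshares = {r: pow(h, shares[r], P) for r in (1, 3, 5)}
print(combine_in_exponent(dshares) == pow(h, sk, P))
```

Interpolating in the exponent is what lets the simulator answer \(\mathcal {O}_{\mathsf{DShare}}\) queries for a share it knows only as a group element such as \(A = g^a\).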

It is easy to see that if \(\mathcal {A}\)’s answer is \('0'\), then the \(\mathcal {O}_{\mathsf{DShare}}\) responses for \(m_*\) constitute properly structured decryption shares for \(m_*\). This can only happen if \(W = g^{ab}\), in which case \(\mathcal {C}\) can give a positive answer to the SXDH challenger. Clearly, if \(\mathrm {c}_{m_*} = 1\) and \(\mathrm {c}_{m} = 0\) for all other queries to \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) such that \(m \ne m_*\), the execution environment is indistinguishable from the actual game \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\). This happens with probability \(\mathrm {Pr}[\mathrm {c}_{m_*} = 1\ \wedge \ (\forall m \ne m_*: \mathrm {c}_{m} = 0)] = \delta (1 - \delta )^{\mathcal {Q}_{H_1} - 1}, \) where \(\mathcal {Q}_{H_1}\) is the number of distinct \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) queries. By setting \(\delta \approx \frac{1}{\mathcal {Q}_{H_1} - 1}\), the above probability becomes greater than \(\frac{1}{e\cdot (\mathcal {Q}_{H_1}-1)}\), and the success probability of the adversary can be bounded as \(\mathsf {Adv}_{{{\mathsf{DS }_{\mu }}}-{\mathsf{IND }}}^{\mathcal {A}} \le e \cdot (\mathcal {Q}_{H_1}-1) \cdot \mathsf {Adv}_{\mathsf {SXDH}}^\mathcal {C} \).
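
The size of the non-abort probability can be checked numerically. The sketch below assumes the coins are independent and that each equals \('1'\) with probability \(\delta \), which is the reading under which the product formula \(\delta (1-\delta )^{\mathcal {Q}_{H_1}-1}\) holds; the function name is hypothetical.

```python
import math


def good_event_probability(delta, q_h1):
    """Pr[c_{m*} = 1 and c_m = 0 for the other q_h1 - 1 distinct H_1 queries]."""
    return delta * (1 - delta) ** (q_h1 - 1)


for q_h1 in (10, 100, 1000, 10000):
    delta = 1 / (q_h1 - 1)
    p = good_event_probability(delta, q_h1)
    target = 1 / (math.e * (q_h1 - 1))
    # The last column is the ratio between the probability and 1/(e*(Q-1)).
    print(q_h1, p, target, p / target)
```

For the exact choice \(\delta = \frac{1}{\mathcal {Q}_{H_1}-1}\) the ratio sits marginally below 1 and tends to 1 as \(\mathcal {Q}_{H_1}\) grows, so the stated bound is best read asymptotically, in line with the \(\approx \) in the text.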

Appendix B: Proof of Lemma 2

Challenger \(\mathcal {C}\) is given an instance \(\langle q', \mathbb {G}_1 ', \mathbb {G}_2 ', \mathbb {G}_{T} ', \hat{e}', g ', \bar{g} ', A=(g ')^a, B=(g ')^b, C=(g ')^c, \bar{A}=(\bar{g} ')^a, \bar{B}=(\bar{g} ')^b, \bar{C}=(\bar{g} ')^c, W \rangle \) of the SXDH problem and wishes to use \(\mathcal {A}\) to decide if \(W = \hat{e}\left( g ', \bar{g} ' \right) ^{abc}\). The algorithm \(\mathcal {C}\) simulates an environment in which \(\mathcal {A}\) operates, using \(\mathcal {A}\)'s advantage in the game \(\mathsf{IND }_{\mu }\)-\(\mathsf{CPA }\) to help solve the BDDH problem, as described before. \(\mathcal {C}\) interacts with \(\mathcal {A}\) within an \(\mathsf{IND }_{\mu }\)-\(\mathsf{CPA }\) game:

Setup Phase \(\mathcal {C}\) sets \(q \leftarrow q'\), \(\mathbb {G}_1 \leftarrow \mathbb {G}_1 '\), \(\mathbb {G}_2 \leftarrow \mathbb {G}_2 '\), \(\mathbb {G}_{T} \leftarrow \mathbb {G}_{T} '\), \(\hat{e} = \hat{e}'\), \(g \leftarrow g '\), \(\bar{g} \leftarrow \bar{g} '\), \(\bar{g} _{pub} = \bar{A}\). Notice that the secret key \(\mathsf{sk}= a\) is not known to \(\mathcal {C}\). \(\mathcal {C}\) also generates the list of user identities \(\mathsf{U}\). \(\mathcal {C}\) sends \(\mathsf{pk}= \{q, \mathbb {G}_1, \mathbb {G}_2, \mathbb {G}_{T}, \hat{e}, \mathcal {O}_{\mathsf{H}_\mathsf{1}}, \mathcal {O}_{\mathsf{H}_\mathsf{2}}, \bar{g}, \bar{g} _{pub}\}\) to \(\mathcal {A}\). At this point, \(\mathcal {A}\) declares the list of corrupted users \(\mathsf {U}_\mathcal {A} \) as in \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\). Let \(\mathsf {U}_\mathcal {A} =\{{\mathsf {U}}_i \}_{i=0}^{n_\mathcal {A}-1}\). To generate key-shares \(\{\mathsf{sk_{i}}\}_{i=0}^{n-1}\), \(\mathcal {C}\) picks a \((t-1)\)-degree Lagrange polynomial \(\mathrm {P} ()\) assuming interpolation points \( \mathrm {I_P} = \left\{ (0, a)\ \cup \ \{(\mathsf{r_{i}}, y_{i})\}_{i=0}^{t-2} \right\} ,\) where \(\mathsf{r_{i}}, y_{i} \leftarrow _R \mathbb {Z}_q^*\). She then sets the key-shares to \(\mathsf{sk_{i}} \leftarrow y_i, i \in [0, n-1]\) and assigns \(\mathsf{sk_{i}}\) for \(i \in [0, n_\mathcal {A}-1]\) to corrupted users.
Access to Oracles \(\mathcal {C}\) simulates oracles \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\), \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\), \(\mathcal {O}_{\mathsf{Corrupt}}\) and \(\mathcal {O}_{\mathsf{DShare}}\) :
- \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\), \(\mathcal {O}_{\mathsf{H}_\mathsf{2}}\), \(\mathcal {O}_{\mathsf{Corrupt}}\): \(\mathcal {C}\) responds to these queries as in \(\mathsf{DS }_{\mu }\)-\(\mathsf{IND }\).
- \(\mathcal {O}_{\mathsf{DShare}}\): \(\mathcal {C}\) keeps track of the submitted \(\mathcal {O}_{\mathsf{DShare}}\)-queries in her local memory. Let \(\langle m, {\mathsf {U}}_i \rangle \) be a decryption query submitted for message \(m\) and user identity \({\mathsf {U}}_i \). If there is no entry in the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) list for \(m\), then \(\mathcal {C}\) runs the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) algorithm for \(m\). Let \(\{m, h_{m}, \rho _{m}, \mathrm {c}_m\}\) be the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) entry in \(\mathcal {C}\)’s local memory for message \(m\). If \(\mathrm {c}_m = 1\), and \(\mathcal {A}\) has already submitted \(t-n_{\mathcal {A}}-1\) queries for \(m\), \(\mathcal {C}\) aborts. If the limit of \(t-n_{\mathcal {A}}-1\) queries has not been reached, \(\mathcal {C}\) responds with a random \({\mathsf {ds}}_{m,i} \in \mathbb {G}_1 \) and keeps a record of it. From Lemma 1, this step is legitimate as long as fewer than \(t\) decryption shares are available for \(m\). Let \(\mathrm {I_P} ' \leftarrow \mathrm {I_P} \setminus (0, a)\). If \(\mathrm {c}_m = 0\), \(\mathcal {C}\) responds with \({\mathsf {ds}}_{m,i} = \left( g^{\sum \limits _{(\mathsf{r_{j}}, \mathsf{sk_{j}}) \in \mathrm {I_P} '} \mathsf{sk_{j}} \lambda _{\mathsf{r_{i}},\mathsf{r_{j}}}^{\mathrm {I_P} '}} A^{\lambda _{\mathsf{r_{i}},0}^{\mathrm {I_P} '}}\right) ^{\rho _{m}}\).
Challenge Phase \(\mathcal {A}\) submits \(m_*\) to \(\mathcal {C}\). \(\mathcal {A}\) has not submitted \(\mathcal {O}_{\mathsf{DShare}}\) -queries for the challenge message with more than \(t-n_{\mathcal {A}}-1\) distinct user identities. Next, \(\mathcal {C}\) runs the algorithm for responding to \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\)-queries for \(m_*\) to recover the entry from the \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\)-list. Let the entry be \(\{m_*, h_{m_*}, \rho _{m_*}, \mathrm {c}_{m_*}\}\). If \(\mathrm {c}_{m_*} = 0\), \(\mathcal {C}\) aborts. Otherwise, \(\mathcal {C}\) computes \(e_{*} \leftarrow W^{\rho _{m_*}}\), sets \(c_{*} \leftarrow \langle m_* \oplus \mathsf{H_2}(e_{*}), \bar{C}\rangle \) and returns \(c_{*}\) to \(\mathcal {A}\).
Guessing Phase \(\mathcal {A}\) outputs the guess \({\mathsf {b}}'\) for \({\mathsf {b}}\). \(\mathcal {C}\) provides \({\mathsf {b}}'\) for its SXDH challenge.
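
The \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) simulation that both proofs rely on is a lazily sampled, programmable random oracle. A minimal sketch is given below, assuming a toy prime-order subgroup of \(\mathbb {Z}_p^*\) in place of \(\mathbb {G}_1\) and leaving the coin bias as a parameter; the class and parameter names are hypothetical, and Python's `random` module is used only for reproducibility, not for cryptographic strength.

```python
import random

# Toy group (assumption, for illustration only): the order-Q subgroup of Z_P^*.
P, Q, G = 2039, 1019, 4


class H1Oracle:
    """Lazily sampled, programmable stand-in for the simulated H_1 oracle."""

    def __init__(self, b_elem, prob_one, rng=None):
        self.b_elem = b_elem          # the challenge element B = g^b
        self.prob_one = prob_one      # Pr[c_v = 1], left as a parameter
        self.rng = rng or random.Random()
        self.table = {}               # v -> (h_v, rho_v, c_v)

    def query(self, v):
        if v not in self.table:
            rho = self.rng.randrange(1, Q)                       # rho_v in Z_Q^*
            coin = 1 if self.rng.random() < self.prob_one else 0
            base = self.b_elem if coin == 1 else G               # B^rho if c_v = 1, else g^rho
            self.table[v] = (pow(base, rho, P), rho, coin)
        return self.table[v][0]


oracle = H1Oracle(b_elem=pow(G, 42, P), prob_one=0.1, rng=random.Random(7))
first = oracle.query("file-A")
print(oracle.query("file-A") == first)   # repeated queries are answered consistently
```

Storing \(\rho _v\) and \(\mathrm {c}_v\) next to each answer is what later allows the simulator to raise known group elements to \(\rho _m\) when it answers \(\mathcal {O}_{\mathsf{DShare}}\) queries or builds the challenge.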

If \({\mathcal {A}}\)’s answer is \({\mathsf {b}}' = 1\), it means that she has recognized the ciphertext \(c_*\) as the encryption of \(m_*\); \(\mathcal {C}\) can then give the positive answer to her SXDH challenge. Indeed, \( W^{\rho _{m_*}} = \hat{e}\left( g, \bar{g} \right) ^{abc\rho _{m_*}} = \hat{e}\left( (B^{\rho _{m_*}})^a, \bar{g} ^c \right) = \hat{e}\left( H_1(m_*)^{\mathsf{sk}}, \bar{C} \right) . \) Clearly, if \(\mathrm {c}_{m_*} = 1\) and \(\mathrm {c}_{m} = 0\) for all other queries to \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\) such that \(m \ne m_*\), then the execution environment is indistinguishable from the actual game \(\mathsf{IND }_{\mu }\)-\(\mathsf{CPA }\). This happens with probability \(\mathrm {Pr}[\mathrm {c}_{m_*} = 1\ \wedge \ (\forall m \ne m_*: \mathrm {c}_{m} = 0)] = \delta (1 - \delta )^{\mathcal {Q}_{H_1} - 1}\), where \(\mathcal {Q}_{H_1}\) is the number of different \(\mathcal {O}_{\mathsf{H}_\mathsf{1}}\)-queries. By setting \(\delta \approx \frac{1}{\mathcal {Q}_{H_1} - 1}\), the above probability becomes greater than \(\frac{1}{e\cdot (\mathcal {Q}_{H_1}-1)}\), and the success probability of the adversary \(\mathsf {Adv}_{{{\mathsf{IND }_{\mu }}}-{\mathsf{CPA }}}^{\mathcal {A}}\) is bounded as \(\mathsf {Adv}_{{{\mathsf{IND }_{\mu }}}-{\mathsf{CPA }}}^{\mathcal {A}} \le e \cdot (\mathcal {Q}_{H_1}-1) \cdot \mathsf {Adv}_{\mathsf {SXDH}}^\mathcal {C} \).
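
The chain of equalities used above can be spelled out one step at a time. Under the simulation's assignments \(\mathsf{sk}= a\), \(H_1(m_*) = B^{\rho _{m_*}} = g^{b\rho _{m_*}}\) and \(\bar{C} = \bar{g} ^c\), and assuming \(W\) is the well-formed value \(\hat{e}\left( g, \bar{g} \right) ^{abc}\), only bilinearity of \(\hat{e}\) is needed:

\[
\hat{e}\left( H_1(m_*)^{\mathsf{sk}}, \bar{C} \right) = \hat{e}\left( g^{ab\rho _{m_*}}, \bar{g} ^{c} \right) = \hat{e}\left( g, \bar{g} \right) ^{abc\rho _{m_*}} = \left( \hat{e}\left( g, \bar{g} \right) ^{abc} \right) ^{\rho _{m_*}} = W^{\rho _{m_*}} .
\]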

Cite this paper

Stanek, J., Sorniotti, A., Androulaki, E., Kencl, L. (2014). A Secure Data Deduplication Scheme for Cloud Storage. In: Christin, N., Safavi-Naini, R. (eds) Financial Cryptography and Data Security. FC 2014. Lecture Notes in Computer Science, vol 8437. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45472-5_8

DOI: https://doi.org/10.1007/978-3-662-45472-5_8
Published: 09 November 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45471-8
Online ISBN: 978-3-662-45472-5
eBook Packages: Computer Science, Computer Science (R0)