1 Introduction

Since Proofs of Retrievability (POR [22]) and Provable Data Possession (PDP [4]) were proposed in 2007, the research community has devoted a lot of effort to constructing proofs of storage schemes with more advanced features. These new features include public key verifiability [30], support for dynamic operations [10, 16, 35] (i.e. inserting/deleting/editing a data block), support for multiple cloud servers [13], privacy preservation against the auditor [42], and support for data sharing [37], etc.

1.1 Drawbacks of Publicly Verifiable Proofs of Storage

Expensive Setup Preprocessing. We look back at the very first feature, public verifiability, and observe that all existing publicly verifiable POS schemes suffer from serious drawbacks: (1) The Merkle Hash Tree based method is not disk-IO-efficient and is not even a sub-linear memory authenticator [24]: every bit of the file has to be accessed by the cloud storage server in each remote integrity auditing process. (2) To our knowledge, all other publicly verifiable POS schemes employ many expensive operations (e.g. group exponentiations) to generate authentication tags for data blocks. As a result, it is prohibitively expensive to generate authentication tags for medium- or large-sized data files. For example, Wang et al. [38] achieve a data pre-processing (i.e. authentication tag generation) throughput of 17.2 KB/s on an Intel Core 2 1.86 GHz workstation CPU, which means it would take about 17 hours to generate authentication tags for a 1 GB file. Even with an 8-core CPU, more than 2 hours of heavy computation would still be required. Such an amount of computation is inappropriate for a laptop, not to mention a tablet computer (e.g. iPad) or smartphone. It would be strange to tell users that a mobile device should only be used to verify or download data files stored on the cloud storage server, and should not be used to upload (and thus pre-process) data files to the cloud. Unless a formal lower bound is proved showing that the existing POS literature has reached optimality, it is the responsibility of researchers to make both the pre-processing and the verification of (third party verifiable) POS practically efficient, although existing works have already achieved good amortized complexity. In this paper, we make our effort in this direction, improving pre-processing speed by several hundred times without sacrificing efficiency in other aspects.
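As a quick sanity check of the figures above, the following sketch recomputes the setup time from the 17.2 KB/s rate reported for [38] (assuming "KB" means \(2^{10}\) bytes and "GB" means \(2^{30}\) bytes):

```python
# Back-of-envelope check of the setup cost quoted above.
file_bytes = 2**30              # a 1 GB file
rate = 17.2 * 2**10             # tag-generation throughput, bytes per second

hours = file_bytes / rate / 3600
assert 16 < hours < 18          # about 17 hours on a single core
assert hours / 8 > 2            # still over 2 hours even with 8 cores
```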

In many publicly verifiable POS (POR/PDP) schemes (e.g. [4, 30, 38, 42]), a publicly verifiable authentication tag function, which is a variant of the signing algorithm of a digital signature scheme, is applied directly over every block of large user data. This is one of the few application scenarios in which a public key cryptography primitive is applied directly over large user data. In contrast, (1) a public key encryption scheme is typically employed to encrypt only a short symmetric cipher key, and the more efficient symmetric cipher (e.g. AES) encrypts the user data; (2) a digital signature scheme is typically applied over a short hash digest of the large user data, where the hash function (e.g. SHA256) is much more efficient (in terms of throughput) than the signing algorithm.

Lack of Control on Auditing. The benefit of publicly verifiable POS schemes is that anyone with the public key can audit the integrity of data in cloud storage, relieving the burden on the data owner. However, one should not allow arbitrary third parties to audit one's data at will; the delegation of auditing tasks should be done in a controlled and organized manner. Otherwise, we cannot prevent extreme cases: (1) on one hand, some data files could attract too much public attention and be audited unnecessarily frequently, which might actually result in a distributed denial of service attack against the cloud storage server; (2) on the other hand, some unpopular data files may be audited too rarely, so that a data loss event might be detected and reported to the data owner too late for any effective countermeasure to reduce the damage.

1.2 Existing Approaches to Mitigate Drawbacks

Outsourcing Expensive Operations. To reduce the computation burden of preprocessing on the data owner in the setup phase, the data owner could outsource expensive operations (e.g. group exponentiations) to some cloud computing server during authentication tag generation, using existing techniques (e.g. [12, 21]) as a black box, and verify the computation results.

However, this approach merely shifts the computation burden from the data owner to the cloud computing server, instead of reducing the amount of expensive operations. Furthermore, considering the data owner and the cloud computing server as a whole system, much more network communication and computation cost will be incurred: (1) uploading the (possibly transformed) data file to the cloud computing server, and downloading computation results from it; (2) extra computation on both the data owner side and the cloud computing server side, in order to allow the data owner to verify the computation results returned by the cloud computing server while maintaining data privacy against it.

One may argue that much of the above cost could be saved if the outsourcing of expensive operations and the proofs of storage scheme were integrated, letting the cloud storage server take the role of the cloud computing server. But in this case, a simple black-box combination of an existing proofs of storage scheme and an existing privacy-preserving, verifiable outsourcing scheme for expensive operations may not work. Thus, a new, sophisticated proofs of storage scheme would have to be constructed along this approach, which remains an open problem.

Dual Instantiations of Privately Verifiable Proof of Storage. The data owner could independently apply an existing privately verifiable POS scheme over an input file twice, generating two key pairs and two authentication tags per data block, where one key pair and tag (per data block) are used by the data owner to perform data integrity checks, and the other key pair and tag (per data block) are used by the auditor, via the interactive proof algorithm of the privately verifiable POS scheme. The limitation of this approach is that, in order to add an extra auditor or switch auditors, the data owner has to download the whole data file to refresh the auditor's key pair and authentication tags.

Recently, [2] gave an alternative solution. The data owner runs a privately verifiable POS scheme (i.e. Shacham-Waters' scheme [30], as in [2]) over a data file to get a key pair and an authentication tag per data block, and uploads the data file together with the newly generated authentication tags to the cloud storage server. Next, the auditor downloads the whole file from the cloud storage server, independently runs the same privately verifiable POS scheme over the downloaded file to get another key pair and another set of authentication tags, and uploads these tags to the cloud storage server. For each challenge query issued by the auditor, the cloud storage server computes two responses, one over the data owner's authentication tags and the other over the auditor's. The auditor then verifies the response generated over his/her own authentication tags, and keeps the other response available for the data owner.

Since [2] aims to resolve possible framing attacks among the data owner, cloud storage server and auditor, all communication messages are digitally signed by their senders, and the auditor has to prove to the data owner that his/her authentication tags are generated correctly. This proof method is very expensive, comparable to the tag generation complexity of publicly verifiable POS schemes (e.g. [4, 30, 38, 42]). Furthermore, in this scheme, when an auditor is revoked or added, the new auditor has to download the whole file, compute authentication tags, and prove to the data owner that these tags are correctly generated.

We remark that an early version of this work appeared as a private internal technical report in early 2014, before [2] became publicly available.

Program Obfuscation. Very recently, [19] proposed to construct publicly verifiable POR from privately verifiable POR using the indistinguishability obfuscation technique [17]. This technique embeds the data owner's secret key in a verifier program in such a way that it is hard to recover the secret key from the obfuscated verifier program. The obfuscated verifier program can therefore be treated as a public key and given to the auditor to perform data integrity checks. However, both [17, 19] admit that indistinguishability obfuscation is currently impractical. In particular, [1] implements the scheme of [17] and shows that it requires about 9 hours to obfuscate a simple function containing just 15 AND gates, and the resulting obfuscated program has size 31.1 GB. Furthermore, it requires around 3.3 hours to evaluate the obfuscated program on a single input.

1.3 Our Approach

To address the issues of existing publicly verifiable POS schemes, we propose a hybrid POS scheme, which on one hand supports delegation of the data auditing task and switching/adding/revoking an auditor, like publicly verifiable POS schemes, and on the other hand is as efficient as a privately verifiable POS scheme.

Unlike in a publicly verifiable POS scheme, the data owner delegates the auditing task to a particular semi-trusted third party auditor, and this auditor is responsible for auditing the data stored in the cloud on behalf of the data owner, in a controlled way and with proper frequency. We call such an exclusive auditor an Owner-Delegated-Auditor, or ODA for short. In real world applications, the ODA could be another server that provides free or paid auditing services to many cloud users.

Our bottom line is that, even if all auditors collude with the dishonest cloud storage server, our formulation and scheme should guarantee that the data owner retains the capability to perform POR auditing by herself.

Table 1. Performance Comparison of Proofs of Storage (POR,PDP) Schemes. In this table, publicly verifiable POS schemes appear above our scheme, and privately verifiable POS schemes appear below our scheme.

Overview of Our Scheme. Our scheme generates two pairs of public/private keys: (pk, sk) and (vpk, vsk). The verification public/private key pair (vpk, vsk) is delegated to the ODA. Our scheme proposes a novel linear homomorphic authentication tag function [5], which is extremely lightweight, without any expensive operations (e.g. group exponentiation or bilinear map). Our tag function generates two tags \((\sigma _i, t_i)\) for each data block, where tag \(\sigma _i\) is generated in a way similar to Shacham and Waters' privately verifiable POR scheme [30], and tag \(t_i\) is generated in a completely new way. Each of the tags \(\sigma _i\) and \(t_i\) has length equal to a 1/m-fraction of the length of a data block, where a data block is treated as a vector of dimension m. The ODA is able to verify data integrity remotely by checking consistency among the data blocks and both tags \(\{ (\sigma _i, t_i) \}\) stored in the cloud storage server, using the verification secret key vsk. The data owner retains the capability to verify data integrity by checking consistency between the data blocks and the tags \(\{ \sigma _i \}\), using the master secret key sk. When an ODA is revoked and replaced by a new ODA, the data owner updates all authentication tags \(\{ t_i \}\) and the verification key pair (vpk, vsk) without downloading the data file from the cloud, but keeps the tags \(\{ \sigma _i \}\) and the master key pair (pk, sk) unchanged.

Furthermore, we customize the polynomial commitment scheme proposed by Kate et al. [23] and integrate it into our homomorphic authentication tag scheme, in order to reduce the proof size from O(m) to O(1).

1.4 Contributions

Our main contributions can be summarized as below:

  • We propose a new formulation called "Delegatable Proofs of Storage" (\(\mathsf {DPOS}\)), as a relaxed variant of publicly verifiable POS. Our formulation allows the data owner to delegate the auditing task to a third party auditor, while retaining the capability to perform the auditing task by herself, even if the auditor colludes with the cloud storage server. Our formulation also supports revoking and switching auditors efficiently.

  • We design a new scheme under this formulation. Our scheme is as efficient as privately verifiable POS: the tag generation throughput is slightly above 10 MB/s per CPU core on a mobile CPU released in 2008. On the other hand, our scheme allows delegation of the auditing task to a semi-trusted third party auditor, and also supports switching and revoking an auditor at any time, like a publicly verifiable POS scheme. We compare the performance complexity of our scheme with the state of the art in Table 1, and experiments show that the tag generation speed of our scheme is more than a hundred times faster than the state of the art of publicly verifiable POS schemes.

  • We prove that our scheme is sound (Theorems 1 and 2) under the Bilinear Strong Diffie-Hellman assumption in the standard model.

2 Related Work

Recently, growing attention has been paid to integrity checking of data stored at untrusted servers [3–6, 8, 9, 11, 13–16, 20, 22, 28–33, 36–45, 47–53]. In CCS'07, Ateniese et al. [4] defined the provable data possession (PDP) model and proposed the first publicly verifiable PDP scheme. Their scheme uses RSA-based homomorphic authenticators and samples a number of data blocks, rather than the whole data file, to audit the outsourced data, which reduces the communication complexity significantly. However, in their scheme, a linear combination of sampled blocks is exposed to the third party auditor (TPA) at each audit, which may leak the data to the TPA. Meanwhile, Juels and Kaliski [22] described a similar but stronger model, proof of retrievability (POR), which enables auditing of not only the integrity but also the retrievability of remote data files, by employing spot-checking and error-correcting codes. Nevertheless, their proposed scheme allows only a bounded number of audits and does not support public verification.

Shacham and Waters [30, 31] proposed two POR schemes, one privately verifiable and the other publicly verifiable, and gave a rigorous security proof under the POR model [22]. Similar to [4], their scheme uses homomorphic authenticators, built from BLS signatures [7]. Subsequently, Zeng et al. [51] and Wang et al. [43, 44] proposed similar constructions for publicly verifiable remote data integrity checking, which adopt BLS-based homomorphic authenticators. For the same reason as [4], these protocols do not support data privacy. In [38, 42], Wang et al. extended their scheme to be privacy-preserving. The idea is to mask the linear combination of sampled blocks in the server's response with some random value. With a similar masking technique, Zhu et al. [53] introduced another privacy-preserving public auditing scheme. Later, Hao et al. [20] and Yang et al. [49] proposed two privacy-preserving public auditing schemes that do not apply the masking technique. Yuan et al. [50] gave a POR scheme with public verifiability and constant communication cost. Ren [26] designed a mutually verifiable public POS application.

However, to our knowledge, all of the publicly verifiable PDP/POR protocols require a large amount of exponentiation over big numbers to generate the authentication tags when preprocessing the data file. This makes these schemes impractical for files of medium or large size, and especially limits their usage on mobile devices.

Although delegable POS has been studied in [25, 27, 34], unfortunately these works share the same drawback as publicly verifiable POS, i.e., the cost of tag generation is extremely high.

3 Formulation

We propose a formulation called "Delegatable Proofs of Storage" (\(\mathsf {DPOS}\) for short), based on the existing POR [22, 30] and PDP [4] formulations. We provide the system model in Sect. 3.1 and the trust model in Sect. 3.2. We defer the security definition to Sect. 5, where the security analysis of our scheme is provided.

3.1 System Model

Definition 1

A Delegatable Proofs of Storage (\(\mathsf {DPOS}\)) scheme consists of algorithms (\(\mathsf {KeyGen}\), \(\mathsf {Tag}\), \(\mathsf {UpdVK}\), \(\mathsf {OwnerVerify}\)), and a pair of interactive algorithms \(\langle \mathsf {P}, \mathsf {V} \rangle \), where each algorithm is described below:

  • \({\mathsf {KeyGen}(1^\lambda )\rightarrow (pk, sk, vpk, vsk)}:\) Given a security parameter \(1^\lambda \), this randomized key generating algorithm generates a pair of public/private master keys (pk, sk) and a pair of public/private verification keys (vpk, vsk).

  • \(\mathsf {Tag}(sk, vsk, F) \rightarrow (\mathtt {Param}_F, \{ (\sigma _i, t_i) \}):\) Given the master secret key sk, the verification secret key vsk, and a data file F as input, the tag algorithm generates a file parameter \(\mathtt {Param}_F\) and authentication tags \(\{ (\sigma _i, t_i) \}\), where a unique file identifier \(\mathtt {id}_F\) is a part of \(\mathtt {Param}_F\).

  • \(\mathsf {UpdVK}(vpk, vsk, \{t_i\}) \rightarrow (vpk', vsk', \{t_i'\}) :\) Given the current verification key pair (vpk, vsk) and the current authentication tags \(\{t_i\}\), this updating algorithm generates the new verification key pair \((vpk',vsk')\) and the new authentication tags \(\{ t_i'\}\).

  • \(\langle \mathsf {P}(pk, vpk, \{ (\vec {\varvec{F}}_i, \sigma _i, t_i) \}_{i}), \mathsf {V}(vsk, vpk, pk, \mathtt {Param}_F) \rangle \rightarrow (b, {\mathtt {Context}}, {\mathtt {Evidence}})\): The verifier algorithm \(\mathsf {V}\) interacts with the prover algorithm \(\mathsf {P}\) to output a decision bit \(b \in \{ {\mathtt {1}}, {\mathtt {0}}\}\), a \(\mathtt {Context}\) and an \(\mathtt {Evidence}\), where the input of \(\mathsf {P}\) consists of the master public key pk, the verification public key vpk, the file blocks \(\{ \vec {\varvec{F}}_i \}\) and the authentication tags \(\{ (\sigma _i, t_i) \}\), and the input of \(\mathsf {V}\) consists of the verification secret key vsk, the verification public key vpk, the master public key pk, and the file information \(\mathtt {Param}_F\).

  • \(\mathsf {OwnerVerify}(sk, pk, {\mathtt {Context}}, {\mathtt {Evidence}}, \mathtt {Param}_F) \rightarrow (b_0, b_1):\) The owner verifier algorithm \(\mathsf {OwnerVerify}\) takes as input the master key pair (sk, pk), \(\mathtt {Context}\), \(\mathtt {Evidence}\) and \(\mathtt {Param}_F\), and outputs two decision bits \(b_0, b_1 \in \{ 0, 1 \}\), where \(b_0\) indicates accepting or rejecting the storage server, and \(b_1\) indicates accepting or rejecting the ODA.

Fig. 1. Illustration of the system model of \(\mathsf {DPOS}\).

A \(\mathsf {DPOS}\) system is described as below and illustrated in Fig. 1(a) and (b).

Definition 2

A \(\mathsf {DPOS}\) system among three parties (data owner, cloud storage server and auditor) can be implemented by running a \(\mathsf {DPOS}\) scheme (\(\mathsf {KeyGen}\), \(\mathsf {Tag}\), \(\mathsf {UpdVK}\), \(\langle \mathsf {P}, \mathsf {V} \rangle \), \(\mathsf {OwnerVerify}\)) in the following three phases, where the setup phase executes only once per file, at the very beginning; the proof phase and revoke phase can execute multiple times and in any (interleaved) order.

Setup phase. The data owner runs the key generating algorithm \(\mathsf {KeyGen}(1^{\lambda })\) only once across all files, to generate the per-user master key pair (pk, sk) and the verification key pair (vpk, vsk). For every input data file, the data owner runs the tag algorithm \(\mathsf {Tag}\) over the (possibly erasure encoded) file, to generate authentication tags \(\{ (\sigma _i, t_i) \}\) and file parameter \(\mathtt {Param}_F\). At the end of the setup phase, the data owner sends the file F, all authentication tags \(\{ (\sigma _i, t_i )\}\), file parameter \(\mathtt {Param}_F\), and public keys (pk, vpk) to the cloud storage server. The data owner also chooses an exclusive third party auditor, called the Owner-Delegated-Auditor (ODA for short), and delegates the verification key pair (vpk, vsk) and file parameter \(\mathtt {Param}_F\) to the ODA. After that, the data owner may keep only the keys (pk, sk, vpk, vsk) and the file parameter \(\mathtt {Param}_F\) in local storage, and delete everything else from local storage.

Proof phase. The proof phase consists of multiple proof sessions. In each proof session, the ODA, running algorithm \(\mathsf {V}\), interacts with the cloud storage server, running algorithm \(\mathsf {P}\), to audit the integrity of the data owner's file on behalf of the data owner. Accordingly, the ODA is also called the verifier, and the cloud storage server the prover. The ODA also keeps all outputs of algorithm \(\mathsf {V}\), i.e. the tuples \((b, {\mathtt {Context}}, {\mathtt {Evidence}})\), and allows the data owner to fetch and verify these tuples using algorithm \(\mathsf {OwnerVerify}\) at any time.

Revoke phase. In the revoke phase, the data owner downloads all tags \(\{ t_i \}\) from the cloud storage server, revokes the current verification key pair, and generates a fresh verification key pair and new tags \(\{ t_i' \}\) by running algorithm \(\mathsf {UpdVK}\). The data owner then chooses a new ODA, delegates the new verification key pair to this new ODA, and sends the updated tags \(\{ t_i' \}\) to the cloud storage server to replace the old tags \(\{ t_i \}\).

Definition 3

(Completeness). A \(\mathsf {DPOS}\) scheme (\(\mathsf {KeyGen}\), \(\mathsf {Tag}\), \(\mathsf {UpdVK}\), \(\langle \mathsf {P}, \mathsf {V} \rangle \), \(\mathsf {OwnerVerify}\)) is complete if the following condition holds: for any keys (pk, sk, vpk, vsk) generated by \({\mathsf {KeyGen}}\) and for any file F, if all parties follow our scheme exactly and the data stored in cloud storage is intact, then the interactive proof algorithms \(\langle \mathsf {P}, \mathsf {V} \rangle \) will always output \(({\mathtt {1}}, \ldots )\) and the \(\mathsf {OwnerVerify}\) algorithm will always output \(({\mathtt {1}}, {\mathtt {1}})\).

3.2 Trust Model

In this paper, we aim to protect the integrity of the data owner's file. The data owner is fully trusted, while the cloud storage server and ODA are semi-trusted in different senses: (1) The cloud storage server is trusted to maintain service availability but is not trusted to maintain data integrity (e.g. the server might delete some rarely accessed data for economic benefit, or hide data corruption events caused by server failures or attacks in order to maintain its reputation). (2) Before being revoked, the ODA is trusted to perform the delegated auditing task and to protect his/her verification secret key securely. A revoked ODA could potentially be malicious and might surrender his/her verification secret key to the cloud storage server.

4 Our Proposed Scheme

4.1 Preliminaries

Let \(\mathbb {G}\) and \(\mathbb {G}_T\) be two multiplicative cyclic groups of prime order p. Let g be a randomly chosen generator of group \(\mathbb {G}\). Let \(e:\mathbb {G} \times \mathbb {G} \rightarrow \mathbb {G}_T\) be a non-degenerate and efficiently computable bilinear map. For vectors \(\vec {\varvec{a}} = (a_1, \ldots , a_m)\) and \(\vec {\varvec{b}} = (b_1, \ldots , b_m)\), the notation \(\left\langle \vec {\varvec{a}},\ \vec {\varvec{b}} \right\rangle := \sum _{j=1}^{m} a_j b_j\) denotes the dot product (a.k.a. inner product) of the two vectors \(\vec {\varvec{a}}\) and \(\vec {\varvec{b}}\). For a vector \(\vec {\varvec{v}}= (v_0, \ldots , v_{m-1})\), the notation \({\mathsf {Poly}}_{\vec {\varvec{v}}}(x) := \sum _{j=0}^{m-1} v_j x^j\) denotes the polynomial in variable x with \(\vec {\varvec{v}}\) as its coefficient vector.

4.2 Construction of the Proposed \(\mathsf {DPOS}\) Scheme

We define our \(\mathsf {DPOS}\) scheme \((\mathsf {KeyGen}\), \(\mathsf {Tag}\), \(\mathsf {UpdVK}\), \(\langle \mathsf {P}, \mathsf {V} \rangle \), \(\mathsf {OwnerVerify})\) below; these algorithms run as specified in Definition 2. Note that in the following description, some equations have inline explanations highlighted in boxes; these are not part of the algorithm procedures, but may help in understanding the correctness of our algorithms.

\(\mathsf {KeyGen}(1^\lambda )\): Choose at random a \(\lambda \)-bit prime p and a bilinear map \(e: \mathbb {G} \times \mathbb {G} \rightarrow \mathbb {G}_T\), where \(\mathbb {G}\) and \(\mathbb {G}_T\) are both multiplicative cyclic groups of prime order p. Choose at random a generator \(g \in \mathbb {G}\). Choose at random \(\alpha , \gamma , \rho \in _R \mathbb {Z}_p^{*}\), and \((\beta _1, \beta _2, \ldots , \beta _m) \in _R \left( \mathbb {Z}_{p} \right) ^{m}\). For each \(j \in [1,m]\), define \(\alpha _j := \alpha ^j \mod p\), and compute \(g_j := g^{\alpha _j}\) and \(h_j := g^{\rho \cdot \beta _j}\). Let \(\alpha _0 := 1\), \(\beta _0 := 1\), \(g_0 := g^{\alpha ^{0}} = g\), \(h_0 := g^{\rho }\), vector \(\vec {\varvec{\alpha }} := (\alpha _1, \alpha _2, \ldots , \alpha _m)\), and \(\vec {\varvec{\beta }} := (\beta _1, \beta _2, \ldots , \beta _m)\). Choose two random seeds \(s_0, s_1\) for the pseudorandom function \(\mathcal {PRF}_\mathtt{seed}: \{0,1\}^{\lambda } \times \mathbb {N} \rightarrow \mathbb {Z}_p\).

The secret key is \(sk=(\alpha , \vec {\varvec{\beta }}, s_0)\) and the public key is \(pk=(g_0, g_1, \ldots , g_m)\). The verification secret key is \(vsk= (\rho , \gamma , s_1)\) and the verification public key is \(vpk = (h_0, h_1, \ldots , h_m)\).
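As a rough illustration, the key generation above can be sketched as follows. This is a toy sketch only: a small Schnorr group (the order-11 subgroup of \(\mathbb {Z}_{23}^{*}\)) stands in for the pairing-friendly group \(\mathbb {G}\), the bilinear map itself is not modeled, and all concrete parameters are illustrative assumptions, not the paper's.

```python
import secrets

# Toy sketch of KeyGen. Assumption: the order-p subgroup of Z_q^* (q = 23,
# p = 11, generator g = 4) stands in for the pairing group G; a real
# implementation would use a pairing-friendly curve of lambda-bit order.
q, p, g = 23, 11, 4
m = 4                                        # block dimension

def keygen():
    alpha = secrets.randbelow(p - 1) + 1     # alpha in Z_p^*
    gamma = secrets.randbelow(p - 1) + 1     # gamma in Z_p^*
    rho   = secrets.randbelow(p - 1) + 1     # rho   in Z_p^*
    beta  = [secrets.randbelow(p) for _ in range(m)]  # (beta_1, ..., beta_m)
    s0, s1 = secrets.token_bytes(16), secrets.token_bytes(16)  # PRF seeds
    # g_j = g^{alpha^j} for j = 0..m ;  h_0 = g^rho, h_j = g^{rho * beta_j}
    gs = [pow(g, pow(alpha, j, p), q) for j in range(m + 1)]
    hs = [pow(g, rho, q)] + [pow(g, (rho * b) % p, q) for b in beta]
    sk, pk = (alpha, beta, s0), gs
    vsk, vpk = (rho, gamma, s1), hs
    return sk, pk, vsk, vpk
```

Exponents are reduced modulo p since g has order p; only the list structure of pk and vpk matters for the sketch.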

\(\mathsf {Tag}(sk, vsk, F)\): Split file F into n blocks, where each block is treated as a vector of m elements of \(\mathbb {Z}_p\): \(\{ \vec {\varvec{F}}_i=(F_{i,0},\ldots ,F_{i,m-1}) \in \mathbb {Z}_p^m \}_{i \in [0, n-1]}\). Choose a unique identifier \(\mathtt {id}_F \in \{0,1\}^\lambda \) for file F. Define a customized pseudorandom function w.r.t. the file F: \({{\mathcal {R}}}_{s}(i) = \mathcal {PRF}_{s}(\mathtt {id}_F, i)\).

For each block \(\vec {\varvec{F_i}}\), \(0 \le i\le n-1\), compute

$$\begin{aligned}&\sigma _i := \left\langle \vec {\varvec{\alpha }} ,\ \vec {\varvec{F_i}} \right\rangle + {{\mathcal {R}}}_{s_0}(i) \ \ \boxed { = \alpha \cdot {\mathsf {Poly}}_{\vec {\varvec{F_i}}}(\alpha ) + {{\mathcal {R}}}_{s_0}(i) \mod p }&\end{aligned}$$
(1)
$$\begin{aligned}&t_i := \rho \left\langle \vec {\varvec{\beta }} ,\ \vec {\varvec{F_i}} \right\rangle + \gamma {{\mathcal {R}}}_{s_0}(i) + {{\mathcal {R}}}_{s_1}(i) \mod p&\end{aligned}$$
(2)

The general information of F is \(\mathtt {Param}_F := (\mathtt {id}_F, n)\).
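The tag computation in Eqs. (1)-(2) is pure \(\mathbb {Z}_p\) arithmetic, which is what makes setup cheap. A minimal sketch, assuming HMAC-SHA256 as the pseudorandom function \({\mathcal {R}}_{s}\) (the paper does not fix a concrete PRF) and an illustrative 61-bit prime:

```python
import hmac, hashlib

p = 2**61 - 1                    # illustrative prime; lambda-bit in the paper
m = 4                            # block dimension

def R(seed: bytes, fid: bytes, i: int) -> int:
    """PRF R_s(i) = PRF_s(id_F, i), instantiated here with HMAC-SHA256."""
    mac = hmac.new(seed, fid + i.to_bytes(8, "big"), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big") % p

def tag_block(F_i, alpha, beta, rho, gamma, s0, s1, fid, i):
    alphas = [pow(alpha, j + 1, p) for j in range(m)]   # (alpha^1..alpha^m)
    # Eq. (1): sigma_i = <alpha_vec, F_i> + R_{s0}(i) mod p
    sigma = (sum(a * f for a, f in zip(alphas, F_i)) + R(s0, fid, i)) % p
    # Eq. (2): t_i = rho*<beta, F_i> + gamma*R_{s0}(i) + R_{s1}(i) mod p
    t = (rho * sum(b * f for b, f in zip(beta, F_i))
         + gamma * R(s0, fid, i) + R(s1, fid, i)) % p
    return sigma, t
```

Note that \(\left\langle \vec {\varvec{\alpha }},\ \vec {\varvec{F_i}} \right\rangle = \alpha \cdot {\mathsf {Poly}}_{\vec {\varvec{F_i}}}(\alpha )\), matching the boxed identity in Eq. (1).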

\(\mathsf {UpdVK}(vpk, vsk, \{t_i\})\): Parse vpk as \((h_0, \ldots , h_m)\) and vsk as \((\rho , \gamma , s_1)\). Verify the integrity of all tags \(\{ t_i \}\) (we discuss how to do this verification in Sect. 4.3), and abort if the verification fails. Choose at random \(\gamma ' \in _R \mathbb {Z}_p^{*}\) and choose a random seed \(s_1'\) for the pseudorandom function \({{\mathcal {R}}}\). For each \(j \in [0,m]\), compute \(h_j' := h_j^{\gamma '} = g^{\left( \rho \cdot \gamma ' \right) \cdot \beta _j} \ \in \mathbb {G}.\) For each \(i \in [0, n-1]\), compute a new authentication tag

$$\begin{aligned} t_i' :=&\gamma ' \left( t_i - {{\mathcal {R}}}_{s_1}(i) \right) + {{\mathcal {R}}}_{s_1'}(i) \mod p.\\ =&\ \ \boxed { \gamma ' \cdot \rho \left\langle \vec {\varvec{\beta }} ,\ \vec {\varvec{F_i}} \right\rangle + \left( \gamma ' \cdot \gamma \right) {{\mathcal {R}}}_{s_0}(i) + {{\mathcal {R}}}_{s_1'}(i) \mod p } \end{aligned}$$

The new verification public key is \(vpk' := (h_0', \ldots , h_m')\) and the new verification secret key is \(vsk' := (\gamma ' \cdot \rho ,\ \gamma ' \cdot \gamma ,\ s_1')\).
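The \(\mathbb {Z}_p\) part of this update can be sketched as below (a sketch under the same assumed HMAC-based PRF as before; the group update \(h_j' := h_j^{\gamma '}\) is analogous and omitted). The update unwraps the old blinding term \({{\mathcal {R}}}_{s_1}(i)\), scales by \(\gamma '\), and re-blinds with the fresh seed \(s_1'\), all without touching the file blocks.

```python
import hmac, hashlib

p = 2**61 - 1                    # illustrative prime modulus

def R(seed: bytes, fid: bytes, i: int) -> int:
    """Assumed HMAC-SHA256 instantiation of the PRF R_s(i)."""
    mac = hmac.new(seed, fid + i.to_bytes(8, "big"), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big") % p

def update_tags(tags, gamma_new, s1_old, s1_new, fid):
    """t_i' := gamma' * (t_i - R_{s1}(i)) + R_{s1'}(i)  mod p."""
    return [(gamma_new * (t - R(s1_old, fid, i)) + R(s1_new, fid, i)) % p
            for i, t in enumerate(tags)]
```

Reconstructing \(t_i'\) from its components reproduces the boxed identity above.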

Interactive proof \(\langle \mathsf {P}, \mathsf {V} \rangle \):

V1: The verifier parses \(\mathtt {Param}_F\) as \((\mathtt {id}_F, n)\) and chooses a random subset \(\mathbf {C} = \{ i_1, i_2, \ldots , i_c \} \subset [0, n-1]\) of size c, where \(i_1< i_2< \ldots < i_c\). The verifier chooses at random \(w,\xi \in _R \mathbb {Z}_p^{*}\), and computes \(w_{i_\iota } := w^{\iota } \mod p\) for each \(\iota \in [1, c]\). The verifier sends \((\mathtt {id}_F, \{ (i, w_i): i \in \mathbf {C} \}, \xi )\) to the prover to initiate a proof session.

P1: The prover finds the file blocks and tags \(\{ (\vec {\varvec{F}}_i, \sigma _i, t_i) \}_i\) corresponding to \(\mathtt {id}_F\), and computes \(\vec {\varvec{\mathcal {F}}} \in \mathbb {Z}_p^{m}\) and \(\bar{\sigma }, \bar{t} \in \mathbb {Z}_p\) as below.

$$\begin{aligned} \vec {\varvec{\mathcal {F}}} :=&\left( \sum _{i \in \mathbf {C}} w_i \vec {\varvec{F}}_i \right) \ \mod p; \end{aligned}$$
(3)
$$\begin{aligned} \bar{\sigma } :=&\left( \sum _{i \in \mathbf {C}} w_i \sigma _i \right) \ \mod p; \end{aligned}$$
(4)
$$\begin{aligned} \bar{t} :=&\left( \sum _{i \in \mathbf {C}} w_i t_i \right) \ \mod p. \end{aligned}$$
(5)

Evaluate the polynomial \({\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(x)\) at the point \(x=\xi \) to obtain \(z:={\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(\xi ) \mod p\). Divide the polynomial (in variable x) \({\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(x) - {\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(\xi )\) by \((x-\xi )\) using polynomial long division, and denote the coefficient vector of the resulting quotient polynomial as \(\vec {\varvec{v}} = (v_0, \ldots , v_{m-2})\), that is, \({\mathsf {Poly}}_{\vec {\varvec{v}}}(x) \equiv \frac{{\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(x) - {\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(\xi )}{x-\xi } \mod p\). (Note: \((x-\xi )\) divides the polynomial \({\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(x) - {\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(\xi )\) exactly, since the latter polynomial evaluates to 0 at the point \(x=\xi \).)
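The evaluation and the long division can be done in a single Horner pass (synthetic division), since the intermediate Horner values are exactly the quotient coefficients. A sketch, with coefficients stored lowest degree first as in \(\vec {\varvec{v}} = (v_0, \ldots , v_{m-2})\):

```python
# Synthetic division: one pass computes z = Poly(xi) mod p and the quotient
# (Poly(x) - Poly(xi)) / (x - xi). Coefficients are lowest degree first.
def open_poly(coeffs, xi, p):
    acc, hi_first = 0, []
    for c in reversed(coeffs):          # Horner, from the leading coefficient
        acc = (acc * xi + c) % p
        hi_first.append(acc)
    z = hi_first.pop()                  # the final Horner value is Poly(xi)
    return z, list(reversed(hi_first))  # quotient v = (v_0, ..., v_{m-2})
```

One can check that \((x-\xi )\,{\mathsf {Poly}}_{\vec {\varvec{v}}}(x) + z \equiv {\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(x) \pmod p\).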

Compute \((\psi _{\alpha }, \psi _{\beta }, \phi _{\alpha }) \in \mathbb {G}^{3}\) as below

$$\begin{aligned}&\psi _{\alpha } := \prod \limits _{j=0}^{m-1} g_j^{\vec {\varvec{\mathcal {F}}}[j]} \ \ \boxed {= \prod \limits _{j=0}^{m-1} \left( g^{\alpha ^j} \right) ^{\vec {\varvec{\mathcal {F}}}[j]} = g^{ {\mathsf {Poly}}_{\vec {\varvec{\mathcal {F}}}}(\alpha ) } ; }&\end{aligned}$$
(6)
$$\begin{aligned}&\psi _{\beta } := \prod \limits _{j=0}^{m-1} h_{j+1}^{\vec {\varvec{\mathcal {F}}}[j]} \ \ \boxed { = \prod \limits _{j=0}^{m-1} \left( g^{\rho \cdot \beta _{j+1}} \right) ^{\vec {\varvec{\mathcal {F}}}[j]} = g^{ \rho \left\langle \vec {\varvec{\beta }} ,\ \vec {\varvec{\mathcal {F}}} \right\rangle } ; }&\end{aligned}$$
(7)
$$\begin{aligned}&\phi _{\alpha } := \prod \limits _{j=0}^{m-2} g_j^{v_j} \ \ \boxed {= \prod \limits _{j=0}^{m-2} \left( g^{\alpha ^j} \right) ^{v_j} = g^{ {\mathsf {Poly}}_{\vec {\varvec{v}}}(\alpha ) } . }&\end{aligned}$$
(8)

Prover sends \((z,\phi _{\alpha }, \bar{\sigma }, \bar{t}, \psi _{\alpha }, \psi _{\beta })\) to the verifier.

V2: Let \({\mathtt {Context}} := (\xi , \{ (i, w_i): i \in \mathbf {C} \})\) and \({\mathtt {Evidence}} := (z, \phi _{\alpha }, \bar{\sigma })\). The verifier sets \(b:={\mathtt {1}}\) if the following equalities hold, and sets \(b:={\mathtt {0}}\) otherwise.

$$\begin{aligned} e(\psi _{\alpha }, g)&\mathop {=}\limits ^{?}\ e(\phi _{\alpha },\ g^{\alpha }/g^{\xi }) \cdot e(g,g)^{z} \end{aligned}$$
(9)
$$\begin{aligned} \left( \frac{e(\psi _{\alpha },\ g^{\alpha }) }{e\left( g,\ g^{\bar{\sigma }} \right) } \right) ^\gamma&\mathop {=}\limits ^{?}\ \frac{e(\psi _{\beta },\ g)}{e\left( g,\ g^{\bar{t}} \cdot g^{ -\sum \limits _{i\in \mathbf {C}} w_i {{\mathcal {R}}}_{s_1}(i)} \right) } \end{aligned}$$
(10)

Output \((b, {\mathtt {Context}}, {\mathtt {Evidence}})\).

\(\mathsf {OwnerVerify}(sk, pk, {\mathtt {Context}}, {\mathtt {Evidence}}, \mathtt {Param}_F)\): Parse \(\mathtt {Context}\) as \((\xi , \{ (i, w_i): i \in \mathbf {C} \})\) and parse \(\mathtt {Evidence}\) as \((z, \phi _{\alpha }, \bar{\sigma })\). Set \(b_0 := {\mathtt {1}}\) if the following equality holds; otherwise set \(b_0 := {\mathtt {0}}\).

$$\begin{aligned} \Big ( e(\phi _{\alpha }, g^{\alpha }/g^{ \xi }) e(g,g)^{z} \Big )^{\alpha } \ \mathop {=}\limits ^{?}\ e(g,\ g^{\bar{\sigma }} ) \cdot e(g, g)^{ \left( - \sum \limits _{i \in \mathbf {C}} w_i {{\mathcal {R}}}_{s_0}(i) \right) } \end{aligned}$$
(11)

If ODA’s decision \(b\) equals \(b_0\), then set \(b_1 := {{\mathtt {1}}}\); otherwise set \(b_1 := {{\mathtt {0}}}\). Output \((b_0, b_1)\).

The completeness of the above scheme is proved in the full paper [46].

4.3 Discussion

How to verify the integrity of all tag values \(\{ t_i \}\) in algorithm \(\mathsf {UpdVK}\)? A straightforward method is the following: The data owner keeps track of a hash (e.g. SHA256) value of \(t_0 \Vert t_1 \Vert \ldots \Vert t_{n-1}\) in local storage, and updates this hash value when executing \(\mathsf {UpdVK}\).
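One way to realize this bookkeeping (a minimal sketch; the tag byte encodings and variable names are our own, since the paper does not fix a serialization) is:

```python
import hashlib

# Minimal sketch: the owner stores only SHA256(t_0 || t_1 || ... || t_{n-1})
# locally, checks it before trusting the tags, and recomputes it whenever
# UpdVK replaces the tags {t_i} with {t_i'}.

def tag_digest(tags):
    """Hash the concatenation of all tag values t_0 || ... || t_{n-1}."""
    h = hashlib.sha256()
    for t in tags:
        h.update(t)
    return h.hexdigest()

tags = [b"t0-bytes", b"t1-bytes", b"t2-bytes"]   # hypothetical encodings
memo = tag_digest(tags)                  # kept in the owner's local storage

# On UpdVK: first verify the current tags against the stored digest,
# then store the digest of the refreshed tags {t_i'}.
assert tag_digest(tags) == memo          # integrity check passes
new_tags = [t + b"-refreshed" for t in tags]
memo = tag_digest(new_tags)              # update local state
```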

How to reduce the size of the challenge \(\{ (i, w_i): i \in \mathbf {C} \}\)? Dodis et al. [15]’s result can be used to represent a challenge \(\{ (i, w_i): i \in \mathbf {C} \}\) compactly as below: Choose the subset \(\mathbf {C}\) using Goldreich [18]’s \((\delta , \epsilon )\)-hitter, where the subset \(\mathbf {C}\) can be represented compactly with only \(\log n + 3 \log (1/\epsilon )\) bits. Assume \(n < 2^{40}\) (sufficient for practical file sizes) and let \(\epsilon = 2^{-80}\). Then \(\mathbf {C}\) can be represented with 280 bits. Recall that \(\{ w_i: i \in \mathbf {C} \}\) is derived from a single value \(w \in \mathbb {Z}_p^{*}\).
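The 280-bit figure follows directly from plugging the stated parameters into the formula; a quick check:

```python
import math

# Arithmetic behind the 280-bit figure: a (delta, epsilon)-hitter
# describes the subset C with log2(n) + 3*log2(1/epsilon) bits.

n = 2 ** 40           # upper bound on the number of blocks
epsilon = 2.0 ** -80  # hitter failure probability

bits = math.log2(n) + 3 * math.log2(1 / epsilon)
assert bits == 40 + 3 * 80 == 280
```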

4.4 Experiment Result

We implement a prototype of our scheme in the C language using the GMP and PBC libraries. We run the prototype on a laptop PC with a 2.5 GHz Intel Core 2 Duo mobile CPU (model T9300, released in 2008). Our test files are randomly generated, with sizes from 128 MB to 1 GB. We achieve a data pre-processing throughput slightly above 10 megabytes per second, with \(\lambda =1024\).

In contrast, Ateniese et al. [3, 4] achieve a data pre-processing throughput of 0.05 megabytes per second with a 3.0 GHz desktop CPU [4]. Wang et al. [38] achieve data pre-processing throughputs of 9.0 KB/s and 17.2 KB/s with an Intel Core 2 1.86 GHz workstation CPU, when a data block is a vector of dimension \(m=1\) and \(m=10\), respectively. According to the pre-processing complexity of [38] shown in Table 1, the theoretical optimal throughput of [38] is twice the speed for dimension \(m=1\), which can be approached only as m tends to \(+\infty \).

Therefore, the data pre-processing in our scheme is about 200 times faster than Ateniese et al. [3, 4], and about 500 times faster than Wang et al. [38], using a single CPU core. We remark that all of these schemes (ours and [3, 4, 38]), among others, can be sped up by a factor of N using N CPU cores in parallel. However, a typical cloud user who runs the data pre-processing task is likely to have no more than 4 CPU cores.
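The speedup factors follow from the throughput numbers quoted above; a quick sanity check (our arithmetic, single core in every case):

```python
# Sanity check of the speedup factors, using the throughput numbers
# quoted in this section.

ours_mb_s      = 10.0          # our scheme, slightly above 10 MB/s
ateniese_mb_s  = 0.05          # Ateniese et al. [3, 4]
wang_kb_s_m1   = 9.0           # Wang et al. [38], dimension m = 1
wang_best_mb_s = 2 * wang_kb_s_m1 / 1024.0   # theoretical optimum: 2x the m=1 speed

# ~200x faster than Ateniese et al., ~500x faster than Wang et al.'s
# theoretical optimum.
assert abs(ours_mb_s / ateniese_mb_s - 200) < 1e-6
speedup_wang = ours_mb_s / wang_best_mb_s
assert 500 < speedup_wang < 600
```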

5 Security Analysis

5.1 Security Formulation

We will define soundness security in two layers. Intuitively, if a cloud storage server can pass the auditor’s verification, then there exists an efficient extractor algorithm that can output the challenged data blocks. Furthermore, if a cloud storage server with knowledge of the verification secret key can pass the data owner’s verification, then there exists an efficient extractor algorithm that can output the challenged data blocks. If the data file is erasure-encoded in advance, the whole data file can be decoded from a sufficient number of challenged data blocks.

5.1.1 Definition of Soundness w.r.t Verification of Auditor

Based on the existing Provable Data Possession formulation [4] and Proofs of Retrievability formulations [22, 30], we define the \(\mathsf {DPOS}\) soundness security game \(\mathsf {Game}_\mathtt{sound}\) between a probabilistic polynomial time (PPT) adversary \(\mathcal {A}\) (i.e. a dishonest prover/cloud storage server) and a PPT challenger \(\mathcal {C}\) w.r.t. a \(\mathsf {DPOS}\) scheme \(\mathcal {E} = (\mathsf {KeyGen}, \mathsf {Tag}, \mathsf {UpdVK}, \langle \mathsf {P}, \mathsf {V} \rangle , \mathsf {OwnerVerify})\) as below.

Setup: The challenger \(\mathcal {C}\) runs the key generation algorithm \(\mathsf {KeyGen}(1^{\lambda })\) to obtain two pairs of public-private keys \((pk, sk)\) and \((vpk, vsk)\). The challenger \(\mathcal {C}\) gives the public keys \((pk, vpk)\) to the adversary \(\mathcal {A}\) and keeps the private keys \((sk, vsk)\) securely.

Learning: The adversary \(\mathcal {A}\) adaptively makes polynomially many queries, where each query is one of the following:

  • Store-Query(\(\mathtt {F}\)): Given a data file \(\mathtt {F}\) chosen by \(\mathcal {A}\), the challenger \(\mathcal {C}\) runs the tagging algorithm \((\mathtt {Param}_F, \{ (\sigma _i, t_i) \}) \leftarrow \mathsf {Tag}(sk, vsk, \mathtt {F})\), where \(\mathtt {Param}_F = (\mathtt {id}_F, n)\), and sends the data file \({\mathtt {F}}\), authentication tags \(\{ (\sigma _i, t_i) \}\), public keys \((pk, vpk)\), and file parameter \(\mathtt {Param}_F\) to \(\mathcal {A}\).

  • Verify-Query \((\mathtt {id}_F)\): Given a file identifier \(\mathtt {id}_F\) chosen by \(\mathcal {A}\), if \(\mathtt {id}_F\) is not the (partial) output of some previous Store-Query that \(\mathcal {A}\) has made, ignore this query. Otherwise, the challenger \(\mathcal {C}\) initiates a proof session with \(\mathcal {A}\) w.r.t. the data file \({\mathtt {F}}\) associated with the identifier \(\mathtt {id}_F\) in this way: The challenger \(\mathcal {C}\), who runs the verifier algorithm \(\mathsf {V}(vsk, vpk, pk, \mathtt {Param}_F)\), interacts with the adversary \(\mathcal {A}\), who replaces the prover algorithm \(\mathsf {P}\) with any PPT algorithm of its choice, and obtains an output \((b, \mathtt {Context}, \mathtt {Evidence})\), where \(b \in \{ {\mathtt {1}}, {\mathtt {0}}\}\). The challenger runs the algorithm \(\mathsf {OwnerVerify}(b, \mathtt {Context}, \mathtt {Evidence})\) to obtain output \((b_0, b_1) \in \{ 0, 1 \}^{2}\). The challenger sends the two decision bits \((b, b_0)\) to the adversary as feedback.

  • RevokeVK-Query: To respond to this query, the challenger runs the verification key update algorithm to obtain a new pair of verification keys and new tags \((vpk', vsk', \{ t_i' \}) := \mathsf {UpdVK}(vpk, vsk, \{ t_i \})\), sends the revoked verification secret key \(vsk\), the new verification public key \(vpk'\), and the new authentication tags \(\{ t_i' \}\) to the adversary \(\mathcal {A}\), and keeps \(vsk'\) private.

Commit: Adversary \(\mathcal {A}\) outputs and commits on \((\mathtt {id}^{*}, \mathtt {Memo}, \tilde{\mathsf {P}})\), where each of them is described as below:

  • a file identifier \(\mathtt {id}^{*}\) among all file identifiers it obtains from \(\mathcal {C}\) by making Store-Queries in Learning phase;

  • a bit-string \(\mathtt {Memo}\);

  • a description of PPT prover algorithm \(\tilde{\mathsf {P}}\) (e.g. an executable binary file).

Challenge: The challenger randomly chooses a subset \(\mathbf {C}^{*}\) \(\subset \) \([0,\ n_{F^{*}}-1]\) of size \(c < \lambda ^{0.9}\), where \({\mathtt {F}}^{*}\) denotes the data file associated with identifier \(\mathtt {id}^{*}\), and \(n_{F^{*}}\) is the number of blocks in file \({\mathtt {F}}^{*}\).
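The Challenge step can be sketched as follows (toy parameters of our own choosing; the game only requires a uniformly random subset, not any particular sampler):

```python
import random

# Sketch of the Challenge step: the challenger samples a uniformly
# random subset C* of block indices of size c < lambda^0.9.

lam = 128                       # security parameter lambda
n_blocks = 4096                 # n_{F*}: number of blocks in file F*
c = int(lam ** 0.9)             # challenge size, bounded by lambda^0.9
C_star = sorted(random.sample(range(n_blocks), c))

assert c < lam ** 0.9           # size bound from the game definition
assert len(set(C_star)) == c    # c distinct indices
assert all(0 <= i < n_blocks for i in C_star)
```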

Extract: Let \(\mathcal {E}\) denote a knowledge-extractor algorithm with oracle access to the prover algorithm \(\tilde{\mathsf {P}}(\mathtt {Memo})\). More precisely, the extractor algorithm \(\mathcal {E}\) will invoke the verifier algorithm \(\mathsf {V}(vsk, vpk, pk, \mathtt {Param}_{F^*})\) to interact with \(\tilde{\mathsf {P}}(\mathtt {Memo})\), and observe all communication between the prover and verifier. It is worth pointing out that: (1) the extractor \(\mathcal {E}\) can feed input (including random coins) to the verifier \(\mathsf {V}\), but cannot access the internal states (e.g. random coins) of the prover \(\tilde{\mathsf {P}}(\mathtt {Memo})\), unless the prover \(\tilde{\mathsf {P}}\) sends its internal states to the verifier; (2) the extractor \(\mathcal {E}\) can rewind the algorithm \(\tilde{\mathsf {P}}\), as in the formulation of Shacham and Waters [30, 31]. The goal of this knowledge extractor is to output the data blocks \(\{(i, \mathtt {F}_i'): i \in \mathbf {C}^{*}\}\).

The adversary \(\mathcal {A}\) wins this \(\mathsf {DPOS}\) soundness security game \(\mathsf {Game}_\mathtt{Sound}\) if the verifier algorithm \(\mathsf {V}(vsk, vpk, pk, \mathtt {Param}_{F^*})\) accepts the prover algorithm \(\tilde{\mathsf {P}}(\mathtt {Memo})\) with some noticeable probability \(1/\lambda ^{\tau }\) for some positive integer \(\tau \), where the sampling set is fixed as \(\mathbf {C}^{*}\). More precisely,

$$\begin{aligned} \mathsf {Pr}\left[ b = {\mathtt {1}}\ :\ (b, \mathtt {Context}, \mathtt {Evidence}) \leftarrow \left\langle \tilde{\mathsf {P}}(\mathtt {Memo}),\ \mathsf {V}(vsk, vpk, pk, \mathtt {Param}_{F^*}) \right\rangle \ \Big |\ \mathbf {C} = \mathbf {C}^{*} \right] \ \ge \ \frac{1}{\lambda ^{\tau }} \end{aligned}$$
(12)

The challenger \(\mathcal {C}\) wins this game if there exists a PPT knowledge extractor algorithm \(\mathcal {E}\) such that the extracted blocks \(\{ (i, \mathtt {F}_i'): i \in \mathbf {C}^{*} \}\) are identical to the original blocks \(\{ (i, \mathtt {F}_i): i \in \mathbf {C}^{*} \}\) with overwhelmingly high probability. That is,

$$\begin{aligned} \mathsf {Pr}\left[ \{ (i, \mathtt {F}_i'): i \in \mathbf {C}^{*} \} = \{ (i, \mathtt {F}_i): i \in \mathbf {C}^{*} \}\ :\ \{ (i, \mathtt {F}_i') \} \leftarrow \mathcal {E}^{\tilde{\mathsf {P}}(\mathtt {Memo})}(vsk, vpk, pk, \mathtt {Param}_{F^*}, \mathbf {C}^{*}) \right] \ \ge \ 1 - negl(\lambda ) \end{aligned}$$
(13)

Definition 4

(Soundness-1). A \(\mathsf {DPOS}\) scheme is sound against a dishonest cloud storage server w.r.t. the auditor if, for any PPT adversary \(\mathcal {A}\), \(\mathcal {A}\) winning the above \(\mathsf {DPOS}\) security game \(\mathsf {Game}_\mathtt{Sound}\) implies that the challenger \(\mathcal {C}\) wins the same security game.

5.1.2 Definition of Soundness w.r.t Verification of Owner

We define \(\mathsf {Game2}_\mathtt{sound}\) by modifying the \(\mathsf {DPOS}\) soundness security game \(\mathsf {Game}_\mathtt{sound}\) as below: (1) In the Setup phase, the verification private key vsk is given to the adversary \(\mathcal {A}\); (2) in the Extract phase, the knowledge extractor has oracle access to \(\mathsf {\mathsf {OwnerVerify}}(sk,\ldots )\), additionally.

Definition 5

(Soundness-2). A \(\mathsf {DPOS}\) scheme is sound against a dishonest cloud storage server w.r.t. the owner if, for any PPT adversary \(\mathcal {A}\), \(\mathcal {A}\) winning the above \(\mathsf {DPOS}\) security game \(\mathsf {Game2}_\mathtt{sound}\), i.e.

$$\begin{aligned} \mathsf {Pr}\left[ b = {\mathtt {1}}\ :\ (b, \mathtt {Context}, \mathtt {Evidence}) \leftarrow \left\langle \tilde{\mathsf {P}}(\mathtt {Memo}),\ \mathsf {V}(vsk, vpk, pk, \mathtt {Param}_{F^*}) \right\rangle \ \Big |\ \mathbf {C} = \mathbf {C}^{*} \right] \ \ge \ \frac{1}{\lambda ^{\tau }} \end{aligned}$$
(14)

implies that the challenger \(\mathcal {C}\) wins the same security game, i.e. there exists a PPT knowledge extractor algorithm \(\mathcal {E}\) such that

$$\begin{aligned} \mathsf {Pr}\left[ \{ (i, \mathtt {F}_i'): i \in \mathbf {C}^{*} \} = \{ (i, \mathtt {F}_i): i \in \mathbf {C}^{*} \}\ :\ \{ (i, \mathtt {F}_i') \} \leftarrow \mathcal {E}^{\tilde{\mathsf {P}}(\mathtt {Memo}),\ \mathsf {OwnerVerify}(sk, \cdot )}(vsk, vpk, pk, \mathtt {Param}_{F^*}, \mathbf {C}^{*}) \right] \ \ge \ 1 - negl(\lambda ) \end{aligned}$$
(15)

Remarks

  • The two events “adversary \(\mathcal {A}\) wins” and “challenger \(\mathcal {C}\) wins” are not mutually exclusive.

  • The above knowledge extractor formulates the notion that “the data owner is capable of recovering the data file efficiently (i.e. in polynomial time) from the cloud storage server”, provided that the cloud storage server can pass verification with noticeable probability and its behavior no longer changes. The knowledge extractor might also serve as a contingency plan (or last resort) to recover the data file, when the file downloaded from the cloud is always corrupted but the cloud server can still pass the verification with high probability.

  • Unlike POR [30, 31], our formulation separates “error correcting code” out from POS scheme, since error correcting code is orthogonal to our design of homomorphic authentication tag function. If required, error correcting code can be straightforwardly combined with our \(\mathsf {DPOS}\) scheme, and the analysis of such combination is almost identical to previous works.

5.2 Security Claim

Definition 6

(m-Bilinear Strong Diffie-Hellman Assumption). Let \(e: \mathbb {G} \times \mathbb {G} \rightarrow \mathbb {G}_T\) be a bilinear map where \(\mathbb {G}\) and \(\mathbb {G}_T\) are both multiplicative cyclic groups of prime order p. Let g be a randomly chosen generator of group \(\mathbb {G}\). Let \(\varsigma \in _R \mathbb {Z}_p^{*}\) be chosen at random. Given as input an \((m+1)\)-tuple \(\mathbf {T}=(g, g^{\varsigma }, g^{\varsigma ^2}, \ldots , g^{\varsigma ^m}) \in \mathbb {G}^{m+1}\), for any PPT adversary \(\mathcal {A}\), the following probability is negligible:

$$\begin{aligned} \mathsf {Pr} \left[ d = e(g, g)^{1/(\varsigma +c)} \text { where } (c,d)=\mathcal {A}(\mathbf {T}) \right] \le negl(\log p). \end{aligned}$$

Theorem 1

Suppose the m-BSDH Assumption holds and \(\mathsf {PRF}\) is a secure pseudorandom function. Then the \(\mathsf {DPOS}\) scheme constructed in Sect. 4 is sound w.r.t. the auditor, according to Definition 4. (The proof is given in the full paper [46].)

Theorem 2

Suppose the m-BSDH Assumption holds and \(\mathsf {PRF}\) is a secure pseudorandom function. Then the \(\mathsf {DPOS}\) scheme constructed in Sect. 4 is sound w.r.t. the data owner, according to Definition 5. (The proof is given in the full paper [46].)

6 Conclusion

We proposed a novel and efficient POS scheme. On the one hand, the proposed scheme is as efficient as privately verifiable POS schemes, and is particularly efficient in authentication tag generation. On the other hand, the proposed scheme supports a third party auditor and can revoke an auditor at any time, coming close to the functionality of publicly verifiable POS schemes. Compared to existing publicly verifiable POS schemes, our scheme improves the authentication tag generation speed by more than 100 times. How to prevent data leakage to the ODA during the proof process, and how to enable dynamic operations (e.g. inserting/deleting a data block) in our scheme, are left as future work.