
1 Introduction

Cloud computing is growing rapidly and offers abundant advantages, including flexible resource management, quick deployment, decreased costs and easy access. Although data storage and computation are cheaper in the cloud, cloud clients lose direct control over sensitive data and face more privacy preservation problems. Privacy protection and data security are two key issues in cloud computing [1], as users often outsource sensitive information to honest-but-curious cloud storage providers (CSPs). Therefore, developing an efficient and secure searchable encryption scheme, which allows users to search securely over ciphertext through keywords and selectively retrieve files of interest, is of paramount importance.

Although much work has focused on improving efficiency and security, existing schemes concentrate on the reliability of a single CSP, which brings limitations such as data corruption, lack of availability and weak privacy protection. Traditional data replication across multiple clouds is the most straightforward but least cost-efficient approach. Therefore, the focus of searchable encryption has turned to multi-clouds, inter-clouds or cloud-of-clouds. In this paper, we explore fine-grained searchable encryption in multi-clouds [2, 3] and define two schemes, a basic scheme and an enhanced scheme. The basic scheme, built on Identity-Based Encryption (IBE) [4], supports single-keyword search in a multi-cloud environment. The enhanced scheme, built on Attribute-Based Encryption (ABE) [5], achieves fine-grained access control and allows expressive search over encrypted data. To avoid time delay and improve efficiency, both schemes can be further extended with Shamir’s (t,n) secret sharing scheme [6] to gain availability and robustness.

The rest of this article is organized as follows. Section 2 begins with the related works. Section 3 introduces some definitions associated with our schemes. Section 4 gives the system model and threat model of both schemes. Section 5 demonstrates our proposed schemes in detail. Section 6 presents the security and performance analysis. Section 7 draws a conclusion.

2 Related Work

Searchable encryption is a cryptographic primitive that allows a user to search securely over ciphertext through keywords and selectively retrieve files of interest. Since Song et al. [7] first proposed a symmetric key-based scheme for retrieving encrypted data according to the IBE scheme, many subsequent searchable encryption schemes [8, 9] have been proposed to improve efficiency and security using the vector space model, edit distance or multi-way trie-trees. However, these schemes cannot support expressive search such as boolean search or non-monotone search. The concept of attribute-based encryption (ABE) was first proposed by Sahai et al. [5]. According to the access policy, ABE schemes can be roughly categorized into KP-ABE (the key is associated with an access policy and the ciphertext is embedded with attributes) and CP-ABE (the converse of KP-ABE). Data users can decrypt encrypted data only if the attributes match the access policy. Building on the many existing ABE schemes [10], we can extend them to searchable encryption schemes that achieve fine-grained access control and support expressive search.

3 Preliminaries

In this section we give some definitions associated with our proposed schemes.

Definition 1

Composite order bilinear groups [11]. Given a security parameter k, the bilinear group generator \(\mathcal {G}(\cdot )\) outputs two cyclic groups \(G_{1},G_{2}\) of order q, where \(q=q_{1}q_{2}q_{3}q_{4}\) (\(q_{1}, q_{2}, q_{3}, q_{4}\) are distinct primes) and \(e: G_{1} \times G_{1} \rightarrow G_{2}\) is the bilinear map. \(G_{q_{1}}, G_{q_{2}}, G_{q_{3}}, G_{q_{4}}\) are the subgroups of \(G_{1}\) of order \(q_{1}, q_{2}, q_{3}, q_{4}\), respectively. The following properties are satisfied:

  1. Bilinear: \(\forall g,h \in G_{1}\), \(a,b \in \mathbb {Z}_{q}\), \(e(g^{a},h^{b})=e(g,h)^{ab}\).

  2. Non-degenerate: \(\exists g \in G_{1}\) such that \(e(g,g)\) has order q in \(G_{2}\).

  3. Orthogonality: \(\forall h_{i} \in G_{q_{i}}\) and \(\forall h_{j} \in G_{q_{j}}\), \(i \ne j\), \(e(h_{i},h_{j})=1\), where 1 is the identity element in \(G_{2}\).
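
As a brief check of why the orthogonality property follows from bilinearity, the sketch below assumes that \(G_{1}\) is cyclic with generator \(g\), so that every element of \(G_{q_{i}}\) is a power of \(g^{q/q_{i}}\). The exponents \(a, b\) are illustrative symbols and are not taken from [11].

```latex
\begin{aligned}
h_{i} &= g^{a\,q/q_{i}}, \qquad h_{j} = g^{b\,q/q_{j}}, \qquad i \ne j, \\
e(h_{i},h_{j}) &= e(g,g)^{ab\,q^{2}/(q_{i}q_{j})}
               = \left( e(g,g)^{q} \right)^{ab\,q/(q_{i}q_{j})} = 1 .
\end{aligned}
```

The last step uses two facts: \(q/(q_{i}q_{j})\) is an integer when \(i \ne j\), and any element of \(G_{2}\) raised to the group order q is the identity.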

4 System and Threat Models

We consider a cryptographic cloud storage system supporting both information retrieval and fine-grained access control over encrypted records.

Fig. 1. Framework of our scheme

These schemes involve four kinds of entities, namely the data owner, the data user, the key generator server (KGS) and N CSPs. The data owner uploads the ciphertext and index to each CSP, the KGS manages the master key and transmits secret keys to the data owner and data users, and each CSP stores the ciphertext and index and performs search operations for data users. The framework of our scheme is shown in Fig. 1.

In this paper, we suppose that the data owner and authorized data users are trusted, while the CSP is honest-but-curious. Specifically, the CSP honestly follows the established protocols but is still curious about sensitive or crucial information. The most frequent threat is that a malicious CSP may collude with other CSPs to analyze and deduce the plaintext of data files or keywords. To ensure the security and robustness of our schemes, we employ Shamir’s secret sharing scheme to enhance their availability and privacy.

5 Constructions of Our Schemes

With increasing attacks and intrusions, maintaining the security of a CSP becomes increasingly difficult. The natural solution is to encrypt outsourced data in order to reduce the exposure if a CSP is compromised. It is also necessary to ensure robustness even when a certain CSP crashes in the distributed system. In the multi-cloud model, a CSP may collude with other CSPs to analyze and deduce sensitive information. Our basic scheme preserves data privacy and security against the above threats and is appropriate for common applications. In actual scenarios, data users need to submit several keywords in order to locate the relevant files accurately and reduce unnecessary computation. Therefore, our enhanced scheme, based on KP-ABE, makes up for this limitation of the basic scheme and achieves expressive search.

5.1 Basic Scheme

In our basic scheme, the ciphertext is encrypted based on a single keyword, and collusion attacks can be effectively avoided in the multi-cloud model. The construction of the basic scheme is as follows:

  • Setup(\(1^{k}\)): Given a security parameter k, the Key Generator Server (KGS) outputs two multiplicative cyclic groups \(G_{1},G_{2}\) of order q and two hash functions \(H_{1}: \{0,1\}^{*} \rightarrow G_{1}, H_{2}: G_{2} \rightarrow \{0,1\}^{n}\). Let g be a generator of \(G_{1}\) and e be the bilinear map, e: \(G_{1}\) \(\times \) \(G_{1}\) \(\rightarrow \) \(G_{2}\). The KGS chooses random elements \(r_{j} \in \mathbb {Z}_{q }\) (\(1 \le j \le N\)) and a master secret \(\alpha \in \mathbb {Z}_{q }\), then publishes the system parameters as \(Params=(G_{1},G_{2},H_{1},H_{2},e,g,g_{1}=g^{\alpha })\). Here \(msk=\{\alpha \}\) is held only by the KGS, and \(\{r_{1},...,r_{N}\}\) are sent to the data owner and authorized data users.

  • KeyGen(\(w, \alpha , H_{1}\)): The KGS uses the master key \(\alpha \) and the hash function \(H_{1}\) to generate the secret key \(sk=H_{1}(w)^{\alpha }\) for the keyword w submitted by the data user.

  • Enc(\(f_{i},H_{1},g_{1},w_{i}\)): Before encrypting the plaintext \(f_{i} \in \{0,1\}^{n}\), the data owner first divides \(f_{i}\) into N chunks \(f_{i}=\{f_{i,1},...,f_{i,N}\}\), each of the same length as the file \(f_{i}\), such that \(f_{i}=f_{i,1} \oplus ...\oplus f_{i,N}\). Next, for each CSP, he or she uses the random value \(r_{j} \in \mathbb {Z}_{q }\) for that CSP and computes \(g_{w}=e(H_{1}(w_{i}),g_{1})\), \(c_{i,j}=\langle g^{r_{j}},f_{i,j} \oplus H_{2}(g_{w}^{r_{j}}) \rangle \) and \(I_{i,j}=H_{1}(w_{i})^{r_{j}}\), where \(w_{i}\) is the keyword extracted from file \(f_{i}\). Finally, \(c_{i,j}\) and \(I_{i,j}\) are outsourced to the j-th (\(1 \le j \le N\)) CSP.

  • Search(\(w',H_{1},r_{j},c_{i,j},I\)): To retrieve files containing keyword \(w'\), the data user first interacts with the KGS to obtain sk, then generates the trapdoor \(T_{w'}=sk^{r_{j}}\) for keyword \(w'\), and finally submits it to the j-th CSP. After receiving the trapdoor \(T_{w'}\), the j-th CSP verifies whether the equation \(e(T_{w'},g)=e(g_{1},I_{i,j})\) holds.

    If \(w_{i}=w'\), the equation holds and the CSP returns the ciphertext \(c_{i,j}=\langle A,B \rangle \) to the data user. Otherwise it returns \(\perp \).

  • Dec(\(c_{i,j},pk\)): The data user decrypts the ciphertext \(c_{i,j}=\langle A,B \rangle \) using the secret key sk: \(f_{i,j}=B \oplus H_{2}(e(sk,A))\).
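
The correctness of Search and Dec in the construction above follows from bilinearity. The derivation below is a short sketch, assuming that the trapdoor and the index are built with the same \(r_{j}\) and that the pairing is symmetric, which holds for a pairing \(e: G_{1} \times G_{1} \rightarrow G_{2}\) on a cyclic group \(G_{1}\).

```latex
\begin{aligned}
e(T_{w'},g) &= e(H_{1}(w')^{\alpha r_{j}},g) = e(H_{1}(w'),g)^{\alpha r_{j}}, \\
e(g_{1},I_{i,j}) &= e(g^{\alpha},H_{1}(w_{i})^{r_{j}}) = e(g,H_{1}(w_{i}))^{\alpha r_{j}}, \\
e(sk,A) &= e(H_{1}(w_{i})^{\alpha},g^{r_{j}}) = e(H_{1}(w_{i}),g^{\alpha})^{r_{j}} = g_{w}^{r_{j}} .
\end{aligned}
```

The first two lines coincide when \(w_{i}=w'\), so the test passes for a matching keyword. The third line shows that \(H_{2}(e(sk,A))\) equals the mask \(H_{2}(g_{w}^{r_{j}})\) used in Enc, hence \(B \oplus H_{2}(e(sk,A))=f_{i,j}\).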

Discussion 1: With the basic scheme, we can address the problems of single-point failure and collusion attack. As the ciphertext stored at every CSP uses a different parameter \(r_{j}\), several CSPs cannot decrypt the ciphertext even if they collude with each other. Therefore, the basic scheme ensures both security and availability. However, one shortcoming of the basic scheme is that it supports only single-keyword search and cannot achieve fine-grained access control, which limits its use in practical applications. To address these drawbacks, we define an enhanced scheme based on Key-Policy Attribute-Based Encryption (KP-ABE).
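
The chunking step of Enc in the basic scheme, where \(f_{i}=f_{i,1} \oplus ...\oplus f_{i,N}\) and every chunk has the length of the file, can be illustrated with a minimal Python sketch. The text does not say how the chunks are chosen, so the sketch assumes that the first N-1 chunks are uniformly random and the last one is fixed by the XOR constraint. The function names `split_file` and `combine_chunks` are hypothetical.

```python
import secrets
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def split_file(data: bytes, n: int) -> List[bytes]:
    """Split data into n chunks of len(data) bytes whose XOR equals data.

    Assumption (not stated in the paper): the first n-1 chunks are random.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    chunks = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    # The last chunk is data XOR all random chunks, so the XOR of all n is data.
    last = reduce(xor_bytes, chunks, data)
    return chunks + [last]


def combine_chunks(chunks: List[bytes]) -> bytes:
    """Recover the file by XOR-ing all chunks together."""
    return reduce(xor_bytes, chunks)


if __name__ == "__main__":
    parts = split_file(b"example file contents", 3)
    print(combine_chunks(parts) == b"example file contents")
```

Under this assumption, any N-1 chunks are jointly uniformly random, so they reveal nothing about the file, while a single missing chunk also prevents reconstruction. This is the availability limitation that the extension in Sect. 5.3 addresses.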

5.2 Enhanced Scheme

Though many searchable encryption schemes focus on multi-keyword or boolean search [12–14], the need for expressive search schemes is still urgent in practical applications. The ciphertext in the enhanced scheme is defined by a set of attributes, while the private key is described by an access matrix [15]. As a result, our enhanced scheme can effectively resist malicious collusion attacks and support expressive search even if the access structure [5] is leaked to malicious CSPs.

  • Setup(\(1^{k},\mathcal {U}\)): Given a security parameter \(k \), the KGS outputs parameters \(G_{1},G_{2},e, G_{q_{1}},G_{q_{2}},G_{q_{3}},G_{q_{4}}\), where \(G_{1},G_{2}\) are two cyclic groups of order q, e: \(G_{1}\) \(\times \) \(G_{1}\) \(\rightarrow \) \(G_{2}\), \(q=q_{1}q_{2}q_{3}q_{4}\), and \(G_{q_{i}}\) is the subgroup of order \(q_{i}\) in \(G_{1}\). Let \(\mathcal {U}=\{1,...,n\}\) be an attribute set. For each attribute \(i \in \mathcal {U}\), the KGS randomly chooses \(t_{i} \in \mathbb {Z}_{q}\). It also selects random \(\alpha \in \mathbb {Z}_{q}, g_{1},\nu _{1} \in G_{q_{1}}, g_{4},\nu _{4} \in G_{q_{4}}\), where \(g_{1}, g_{4}\) are generators of \(G_{q_{1}}, G_{q_{4}}\), respectively, and sets \(\mu =\nu _{1}\nu _{4}\). Then it chooses a random \(s_{j} \in \mathbb {Z}_{q}\) (\(1 \le j \le N\)) for each CSP. Finally, pk and msk are defined as \(pk=\{q,g_{1},g_{4},e(g_{1},g_{1})^{\alpha },\mu ,\beta _{i}=g_{1}^{t_{i}},\forall i\}, msk=\{\nu _{1},\alpha \}\), and \(\{s_{1},...,s_{N}\}\) are sent to the data owner and authorized data users.

  • Enc(\(f_{i},s_{j},pk,W\)): Unlike the file chunk in the basic scheme, each file chunk \(f_{i,j}\) contains m keyword fields, namely \(W=\{w_{1},...,w_{m}\}\). The data owner chooses \(\eta _{1},\eta _{2} \in G_{q_{4}}\), then takes the keyword set as attributes to encrypt the file chunk \(f_{i,j}\). The ciphertext is computed as \(c_{i,j}=\{c=f_{i,j}e(g_{1},g_{1})^{\alpha s_{j}}, c_{0}=g_{1}^{s_{j}}\eta _{1},c_{r}=(\mu \beta _{r})^{s_{j}}\eta _{2}, \forall w_{r} \in W, 1\le r \le m\}\). Then each ciphertext \(c_{i,j}\) (\(1\le j \le N\)) is sent to the j-th CSP.

  • Trapdoor(\(\mathbb {A},\rho ,pk,msk\)): Before the data user issues a search request, an access matrix is derived from the submitted keyword set \(W'=\{w_{1},...,w_{l}\}\), \(l \le m\). Let \(\mathbb {A}\) be the \(m \times n\) access matrix, in which each row \(\mathbb {A}_{\rho (i)}\) represents a keyword field, where \(\rho \) is a function from \(\{1,...,l\}\) to \(\{1,...,m\}\) and \(i \in \{1,...,l\}\). The data user first selects a random vector \({\varvec{V}} \in \mathbb {Z}_{q}^{n}\) such that \(\mathbf 1 \cdot {\varvec{V}}=\alpha \), where \(\mathbf 1 =(1,0,...,0)\). Then he or she randomly chooses \(\theta _{\rho (i)} \in \mathbb {Z}_{q},\varphi _{\rho (i),1},\varphi _{\rho (i),2} \in G_{q_{3}}\), and computes the trapdoor as \(T=\{T_{\rho (i)}^{1}=g_{1}^{\mathbb {A}_{\rho (i)}\mathbf V }(\nu _{1}\beta _{\rho (i)})^{\theta _{\rho (i)}}\varphi _{\rho (i),1},T_{\rho (i)}^{2}=g_{1}^{\theta _{\rho (i)}}\varphi _{\rho (i),2}\}\).

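  • Test(\(c_{i,j},pk,T\)): If the keyword set W embedded in the ciphertext satisfies the access matrix of the data user, the CSP chooses constants \(\omega _{\rho (i)}\) such that \(\sum _{\mathbb {A}_{\rho (i)} \in W'} \omega _{\rho (i)}\mathbb {A}_{\rho (i)}=(1,0,...,0)\). Then the CSP computes the following equation: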

    $$\begin{aligned} \prod _{\mathbb {A}_{\rho (i)} \in W'} \frac{e(c_{0},T_{\rho (i)}^{1})^{\omega _{\rho (i)}}}{e(c_{\rho (i)},T_{\rho (i)}^{2})^{\omega _{\rho (i)}}}= & {} \prod _{\mathbb {A}_{\rho (i)} \in W'}\frac{e(g_{1},g_{1})^{s_{j}\mathbb {A}_{\rho (i)}\mathbf V \omega _{\rho (i)}}e(g_{1},\nu _{1}\beta _{\rho (i)})^{s_{j}\theta _{\rho (i)}\omega _{\rho (i)}}}{e(\nu _{1}\beta _{\rho (i)},g_{1})^{s_{j}\theta _{\rho (i)}\omega _{\rho (i)}}} \\= & {} \prod _{\mathbb {A}_{\rho (i)} \in W'}e(g_{1},g_{1})^{s_{j}\mathbb {A}_{\rho (i)}\mathbf V \omega _{\rho (i)}} \\= & {} e(g_{1},g_{1})^{s_{j} \alpha } \end{aligned}$$

    Finally, the plaintext is returned as \(f_{i,j}=\frac{c}{e(g_{1},g_{1})^{s_{j} \alpha }}\).

5.3 Extension

In our proposed schemes, the data user can reconstruct the original file only after all N file slices have been obtained from the CSPs. However, in actual applications the data user cannot restore the whole file \(f_{i}\) if a single CSP fails. Furthermore, searching all file chunks inevitably leads to time delay, which seriously impacts the availability and robustness of our schemes. To settle this problem, Shamir’s secret sharing scheme can be adopted to improve the efficiency and reliability of our schemes. Namely, each file chunk is processed with a polynomial of degree t-1 before being encrypted by the data owner. The specific steps are as follows:

  • After file \(f_{i}\) has been divided into N chunks, the data owner chooses a random polynomial of degree t-1 for each file as follows:

    $$\begin{aligned} \mathcal {F}_{i}(x)= & {} c_{t-1}x^{t-1}+c_{t-2}x^{t-2}+...+c_{1}x^{1}+f_{i} \end{aligned}$$
  • Before encrypting file chunk \(f_{i,j}\), the data owner first preprocesses each file chunk as follows:

    $$\begin{aligned} \mathcal {F}_{i,1}(f_{i,1})= & {} c_{t-1}f_{i,1}^{t-1}+c_{t-2}f_{i,1}^{t-2}+...+c_{1}f_{i,1}^{1}+f_{i} \\ \mathcal {F}_{i,2}(f_{i,2})= & {} c_{t-1}f_{i,2}^{t-1}+c_{t-2}f_{i,2}^{t-2}+...+c_{1}f_{i,2}^{1}+f_{i} \\ \vdots&\\ \mathcal {F}_{i,N}(f_{i,N})= & {} c_{t-1}f_{i,N}^{t-1}+c_{t-2}f_{i,N}^{t-2}+...+c_{1}f_{i,N}^{1}+f_{i} \end{aligned}$$
  • The data user can employ any t plaintexts obtained from the N CSPs to reconstruct the original file.

Based on the efficient and secure multiple-CSP mechanism, the data user can reconstruct the original file even if \(N-t\) CSPs have been compromised, where t is the predefined threshold value. However, the search pattern may be leaked when more than t CSPs collude with each other. Therefore, the proposed schemes combined with Shamir’s secret sharing ensure the security and privacy of data to some extent.
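
The threshold reconstruction property used in this extension can be sketched in Python. This is the textbook form of Shamir's (t,n) secret sharing rather than the exact preprocessing above: it evaluates the polynomial at the public points x = 1..N instead of at the chunk values, treats the secret as an integer, and works over an assumed prime field. The modulus `PRIME` and the function names are illustrative assumptions, not taken from the paper.

```python
import secrets
from typing import List, Tuple

# Assumed field modulus (the Mersenne prime 2^127 - 1); not from the paper.
PRIME = 2**127 - 1


def make_shares(secret: int, t: int, n: int) -> List[Tuple[int, int]]:
    """Create n shares of secret such that any t of them reconstruct it."""
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    # Polynomial of degree t-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = make_shares(123456789, t=3, n=5)
    print(reconstruct(shares[:3]) == 123456789)
```

In this sketch any 3 of the 5 shares recover the secret, which mirrors the claim that the file can be rebuilt when up to \(N-t\) CSPs are unavailable.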

6 Security and Performance Analysis

6.1 Security Analysis

Table 1. Performance analysis

As security against chosen-plaintext attacks (IND-CPA) is the standard accepted notion of security for public key searchable encryption schemes, our basic scheme is required to satisfy this notion. We say the proposed basic scheme is semantically secure against IND-CPA attacks if no polynomially bounded adversary \(\mathcal {A}\) has a non-negligible advantage against the challenger in the following IND-CPA game.

  • \(\mathbf{Setup: }\) Given a security parameter k, the challenger runs the Setup algorithm and sends the resulting parameters to adversary \(\mathcal {A}\), while keeping msk to itself.

  • \(\mathbf{Phase 1: }\) The adversary issues queries \(q_{1},...,q_{n}\) where query \(q_{i}\) is: \(-\) KeyGen \(\langle w_{i} \rangle \). The challenger runs the KeyGen algorithm, generates the private key \(pk_{i}\) corresponding to \(w_{i}\), and sends it to the adversary. These queries may be asked adaptively, that is, each query \(q_{i}\) may depend on the replies to \(q_{1},...,q_{i-1}\).

  • \(\mathbf{Challenge: }\) Once Phase 1 is over, the adversary outputs two equal-length files \(f_{0},f_{1} \in \mathbf {F}\) and keywords \(w_{0},w_{1}\) on which he or she wishes to be challenged. The only constraint is that keywords \(w_{0},w_{1}\) did not appear in Phase 1. The challenger picks a random bit \(b \in \{0,1\}\) and sends \(c_{b} \in \mathbf {C}\) to the adversary.

  • \(\mathbf{Phase 2: }\) The adversary issues more queries \(q_{n+1},...,q_{m}\), where each \(q_{i}\) is: \(-\) KeyGen \(\langle w_{i} \rangle \) with \(w_{i} \ne w_{0},w_{1}\). The challenger responds as in Phase 1.

  • \(\mathbf{Guess: }\) Finally, the adversary outputs a guess \(b^{'} \in \{0,1\}\) and wins the game if \(b^{'}=b\).

We call such an adversary \(\mathcal {A}\) an IND-CPA adversary, and define its advantage in attacking the basic scheme (BS) as follows:

$$\begin{aligned} Adv_{BS,\mathcal {A}}(k)=|Pr[b=b^{'}]-\frac{1}{2}| \end{aligned}$$

Theorem 1:

Suppose the hash functions \(H_{1},H_{2}\) are random oracles. Then our basic scheme is IND-CPA secure if the BDH problem is hard.

Theorem 2:

If the assumptions in [15] hold, our enhanced scheme is secure and anonymous. Due to space constraints, the detailed proofs are deferred to the full paper.

6.2 Performance Analysis

The computational complexity mainly depends on the pairing operation (p), the exponentiation operation (e) and the hash operations (\(h_{1},h_{2}\)) in the Encryption (Enc), Trapdoor (Trap) and Test algorithms, where \(h_{1}\) maps a string to a point in \(G_{1}\) and \(h_{2}\) maps a point in \(G_{2}\) to a string.

Here m and N denote the number of keyword fields and the number of CSPs, respectively. The theoretical analysis of computational complexity is shown in Table 1.

Fig. 2. Performance analysis of different schemes

To illustrate the performance comparison in Fig. 2, we use the Type A curves defined in the PBC library. For the computational cost of the above operations, we set \(p=5.811ms,e=3.85ms,h_{1}=12.418ms,h_{2}=0.947ms\); the specific parameters are given in [16].

Fig. 3. An example of performance comparison

Considering practical applications, we set m,N \(\in \) [1,10]. From Fig. 2(a) we notice that the computational cost of the Enc algorithm in the basic scheme lies in the range (23.026ms, 66.199ms), and that our enhanced scheme has a lower computational cost than the other schemes when N=1. However, for N \(\ge 2\), the Enc algorithm of the enhanced scheme has the largest computational overhead, as it needs to compute N ciphertexts for N CSPs. In Fig. 2(b) the Trap algorithm of the enhanced scheme has the lowest computational cost, lower than that of the other schemes [15, 17], and the basic scheme still performs better than previous schemes within a certain range. In Fig. 2(c) the Test algorithm of our enhanced scheme has almost the lowest computational cost, and the basic scheme also has less computational overhead under certain conditions. Fig. 3 gives an example of the performance advantage of our schemes for m=5, N=3. In conclusion, the computational complexity of our proposed schemes is lower than that of previous schemes, especially in the single-CSP setting. Although the computational cost of the Enc operation in the enhanced scheme is larger than in other schemes, our enhanced scheme can be applied to the multiple-CSP setting, so that the problems of single-point failure and collusion attack can be avoided.
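
The reported range for the basic scheme's Enc cost can be checked against the per-operation timings given above. The cost model in the sketch below, one \(h_{1}\) and one pairing per file plus one exponentiation and one \(h_{2}\) per CSP, is an assumption inferred from the two reported endpoints. It is not copied from Table 1, and the function name is hypothetical.

```python
# Per-operation timings reported in the text (milliseconds).
PAIRING_MS = 5.811
EXP_MS = 3.85
H1_MS = 12.418
H2_MS = 0.947


def basic_enc_cost_ms(n_csp: int) -> float:
    """Assumed cost model: h1 + p once, then (e + h2) for each of the N CSPs."""
    if n_csp < 1:
        raise ValueError("the number of CSPs must be at least 1")
    return H1_MS + PAIRING_MS + n_csp * (EXP_MS + H2_MS)


if __name__ == "__main__":
    for n in (1, 10):
        print(n, round(basic_enc_cost_ms(n), 3))
```

Under this assumed model, N=1 and N=10 give 23.026 ms and 66.199 ms, matching the endpoints of the range quoted for Fig. 2(a), with the cost growing by 4.797 ms for each additional CSP.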

7 Conclusion

In this paper, we design two searchable encryption schemes for the multi-cloud model. Our proposed schemes can avoid the single-point failure threat and potential collusion attacks. However, searching all file chunks inevitably leads to time delay, which seriously impacts the availability and robustness of our schemes. By using Shamir’s secret sharing scheme we can improve the efficiency and reliability of our schemes. Compared with existing searchable encryption schemes in the single-cloud model, our schemes can achieve availability and security at the same time without increasing the computational burden.