1 Introduction

1.1 Background

As we all know, big data has three outstanding features: large volume, high velocity and high variety. Cloud storage is well designed for big data because of its excellent capability to store large volumes of data, to prepare for high velocity of data generation and to process high variety of data. Cloud computing provides great convenience to users and is one of the most popular technologies at present.

Meanwhile, cloud computing (data outsourcing) raises confidentiality and privacy concerns. Simple encryption can protect data confidentiality easily. However, it becomes a challenge when data users want to search with a specific keyword for the documents of interest among massive volumes of data: to search by a particular keyword, the data owner has to decrypt the data before the search can start. This is impractical, especially when the volume of data is large. Searchable encryption (SE) [5, 6, 7, 10, 11, 15, 20] is a cryptographic primitive that addresses search over ciphertexts. SE allows data users to retrieve documents from the cloud server according to some keywords. Searchable encryption has been studied intensively, and many keyword search schemes over encrypted cloud data have been proposed.

In the cloud environment, a data owner usually shares his documents with data users. In this paper, we focus on the single-owner/multi-user setting. In this setting, when a data user wants to search over the data owner’s documents, he usually needs to ask the data owner to produce the necessary trapdoor information to complete the search. That is to say, the data owner must be online all the time to perform the per-query interaction with data users. However, the primary goal of the data owner is to outsource his search services to the cloud server, so we remove the per-query interaction between the data owner and data users in our scheme.

In some practical settings, a search with only one keyword may return a large number of documents and therefore lacks precision, especially when the queried keyword does not accurately describe the documents that the data user wants. Multi-keyword searchable encryption emerged to address this. Data users can search with multiple keywords, namely, conjunctive keyword search, and thus obtain the documents that include all the queried keywords.

Most existing SE schemes assume that the cloud server is honest-but-curious. That is to say, the cloud server may try to find out which keyword the ciphertext is about. Besides, it may also be curious about which keywords data user wants to search. So it is necessary to ensure index privacy and token privacy.

In most of the literature, when data users want to search with specific keywords to get their target documents, they have to compute a search token consisting of many items and send it to the cloud server to complete the search process. This may consume a lot of bandwidth and impose a large computing overhead on data users. In contrast, by using multi-input inner-product functional encryption, the search token in our scheme consists of only two items. Furthermore, our scheme guarantees index privacy and token privacy simultaneously.

1.2 Design Goal

In this paper, we propose a secure and efficient single-owner/multi-user searchable encryption scheme supporting multi-keyword search. Our design goal is summarized as follows:

  1. Our scheme avoids the per-query interaction between the data owner and data users. Namely, the data owner does not need to stay online waiting for data users to search in his archives.

  2. By tactfully leveraging multi-input inner-product functional encryption, our scheme allows the cloud server to complete search processes with search tokens which consist of only two items.

  3. By using an inverted index structure and super-incremental sequence, our scheme achieves efficient multi-keyword search.

  4. Our scheme ensures the correctness of search phase.

  5. Our scheme achieves partial token privacy, index privacy and token privacy at the same time.

1.3 Organization

The structure of this paper is as follows:

In Sect. 2, we introduce some related work. Section 3 gives some preliminaries. We then propose our system model and describe our scheme in detail in Sect. 4 and Sect. 5, respectively. In Sect. 6, we show the correctness and security of our scheme, and in Sect. 7 we analyze its functionality and efficiency by comparing it with other schemes. In the last section, we summarize this paper.

2 Related Work

2.1 Index

The choice of index structure affects the efficiency and functionality of a scheme, and different index structures have different advantages and disadvantages. Curtmola et al. [6] proposed a searchable encryption scheme based on the inverted index because of its efficiency. Although the inverted index structure is efficient for searching, it is not convenient for updating files. Goh [11] proposed an index structure based on Bloom filters, while Chang et al. [5] proposed a vector index.

2.2 Searchable Encryption

The first searchable encryption scheme was proposed by Song et al. [20] in the symmetric-key setting. The security notion of searchable encryption was first introduced by Goh [11]. Curtmola et al. [6] then presented a stronger security notion, indistinguishability against adaptive chosen-keyword attacks (IND-CKA2). Boneh et al. [7] designed the first searchable encryption scheme with keyword search in the public-key setting, but its search is less efficient than that of symmetric searchable encryption. All the schemes mentioned above support only single-keyword search.

However, in some settings, a single keyword may not describe the target documents precisely. Therefore, multi-keyword searchable encryption [2, 4, 8, 12, 13, 14, 16, 17, 18, 19, 21, 22] has received increasing attention.

Golle et al. [12] first proposed the construction of conjunctive keyword searchable encryption in the single-owner/single-user setting and presented two schemes. In the first scheme, the size of the search token is linear in the number of encrypted documents. In the second scheme, the size of the search token is constant thanks to bilinear pairings, but the computational cost is still not low. In [14, 16, 19], conjunctive and disjunctive keyword search are proposed, which further extends the multi-keyword search semantics. Cash et al. [4] proposed the first sublinear symmetric searchable encryption scheme supporting boolean queries in the single-owner/single-user setting and implemented it on a large database [3]. Jarecki et al. [13] extended it to the single-owner/multi-user setting, in which the data owner needs to be online all the time. Besides, the search time mainly depends on the number of files that include the least frequent keyword among the keywords the data user wants to search. That is to say, search efficiency is not high when the queried keywords are all high-frequency ones.

3 Preliminary

3.1 Notation

We use \( s{\mathop {\longleftarrow }\limits ^{R}} S \) to denote the operation of uniformly sampling a random element s from a set S. We use PPT to denote a probabilistic polynomial-time algorithm. \({\lambda }\) represents the security parameter in this paper. We use lower case boldface italics to denote (column) vectors and upper case boldface italics to denote matrices. For a matrix \(\varvec{M}\) over \(\mathbb Z_q\), we have \([\varvec{M}]_1:=g_1^{\varvec{M}}\) and \([\varvec{M}]_2:=g_2^{\varvec{M}}\) and, analogously, \([\varvec{M}]_T:=g_T^{\varvec{M}}\), where exponentiation is carried out component-wise. Besides, we use [n] to denote the set of positive integers no larger than n, that is, \(\left\{ 1,2,...,n\right\} \), and we use \({<}\varvec{x},\varvec{y}{>}\) to denote the inner product of \(\varvec{x}\) and \(\varvec{y}\), where \(\varvec{x}\) and \(\varvec{y}\) are column vectors of the same dimension.

3.2 Asymmetric Bilinear Groups

Let \(\mathcal {PG}\) denote a group generator, an algorithm which takes a security parameter \(\lambda \) as input and outputs a description of prime order groups \({\mathbb {G}}_1\),\({\mathbb {G}}_2\),\({\mathbb {G}}_T\) with a bilinear map \(e:{\mathbb {G}}_1 \times {\mathbb {G}}_2 \rightarrow {\mathbb {G}}_T\). We define \(\mathcal {PG}\)’s output as \((q,g_1,g_2,{\mathbb {G}}_1,{\mathbb {G}}_2,{\mathbb {G}}_T,e)\), where q is a prime of \(\varTheta (\lambda )\) bits, \({\mathbb {G}}_1,{\mathbb {G}}_2,{\mathbb {G}}_T\) are cyclic groups of order q, and \(g_1,g_2,g_T\) are generators of \({\mathbb {G}}_1,{\mathbb {G}}_2\) and \({\mathbb {G}}_T\), respectively. \(e:{\mathbb {G}}_1 \times {\mathbb {G}}_2 \rightarrow {\mathbb {G}}_T\) is a map with the following properties:

  1. Bilinearity: \(\forall a,b \in {\mathbb {Z}}_q\), \(e(g_1^a,g_2^b)= e(g_1,g_2)^{ab}\)

  2. Non-degeneracy: \(e(g_1,g_2)\ne 1\)

  3. Computability: \(\forall u\in {\mathbb {G}}_1\), \(v\in \mathbb G_2\), \(e(u,v)\) can be efficiently computed.

3.3 Multi-input Inner-Product Encryption

In an inner-product encryption scheme, upon receiving the ciphertext of a vector \(\varvec{x}\), only the recipients who have the secret key \(k_{\varvec{y}}\) can obtain the inner product \({<}\varvec{x}, \varvec{y}{>}\) of \(\varvec{x}\) and \(\varvec{y}\). In multi-input inner-product encryption, only the recipients who have the secret key \(k_{\varvec{y}_1,\varvec{y}_2,...\varvec{y}_n}\) and the ciphertexts of the vectors \(\varvec{x}_1\),\(\varvec{x}_2\),...,\(\varvec{x}_n\) can obtain the sum of the inner products \({<}\varvec{x}_{i}, \varvec{y}_{i}{>}\), namely, \(\sum _{i=1}^{n}{<}\varvec{x}_{i}, \varvec{y}_{i}{>}\).
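To make the functionality concrete, the following minimal sketch computes, in the clear, the value that a key holder would learn. It performs no encryption at all; the modulus q = 101, the helper names and the sample vectors are illustrative assumptions and are not part of the scheme.

```python
from typing import Sequence

def inner_product(x: Sequence[int], y: Sequence[int], q: int) -> int:
    """Inner product of two equal-length vectors over Z_q."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(x, y)) % q

def multi_input_inner_product(xs, ys, q: int) -> int:
    """Sum of <x_i, y_i> over all input slots, reduced modulo q."""
    if len(xs) != len(ys):
        raise ValueError("need one key vector per input slot")
    return sum(inner_product(x, y, q) for x, y in zip(xs, ys)) % q

q = 101                       # toy modulus (assumption)
xs = [[1, 2, 3], [4, 5, 6]]   # plaintext vectors x_1, x_2
ys = [[7, 8, 9], [1, 0, 2]]   # key vectors y_1, y_2
result = multi_input_inner_product(xs, ys, q)
```

In the real primitive the decryptor obtains only this sum and nothing else about the individual vectors.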

We will use the definition of Matrix Decision Diffie-Hellman (MDDH) Assumption in [9].

3.4 Matrix Distribution

Let \(k,l\in {\mathbb {N}}\) with \(l>k\). We call \(\mathcal {D}_{l,k}\) a matrix distribution if it outputs matrices in \({\mathbb {Z}}_q^{l\times k}\) of full rank k in polynomial time. We write \(\mathcal {D}_k:=\mathcal {D}_{k+1,k}\). Without loss of generality, we assume the first k rows of \(\varvec{A}{\mathop {\longleftarrow }\limits ^{R}}\mathcal {D}_{l,k}\) form an invertible matrix. In particular, we use \(\mathcal {U}_{l,k}\) to denote the uniform distribution, and \(\mathcal {U}_k\) stands for \(\mathcal {U}_{k+1,k}\). In this work, we are mostly interested in the uniform matrix distribution \(\mathcal {U}_{l,k}\).

3.5 \(\mathcal {D}_{l,k}\)-Matrix Diffie-Hellman Assumption \(\mathcal {D}_{l,k}\)-MDDH

Let \(\mathcal {D}_{l,k}\) be a matrix distribution. We say that the \(\mathcal {D}_{l,k}\)-Matrix Diffie-Hellman (\(\mathcal {D}_{l,k}\)-MDDH) Assumption relative to \(\mathcal {PG}\) in \({\mathbb {G}}_s\) holds if, for all PPT adversaries \(\mathcal {A}\), the advantage of \(\mathcal {A}\) is negligible, namely \(Adv_{{\mathbb {G}}_s,\mathcal {A}}^{\mathcal {D}_{l,k}-MDDH}=|Pr[\mathcal {A}(\mathcal {PG},[\varvec{A}]_s, [\varvec{A}\varvec{w}]_s)=1]-Pr[\mathcal {A}(\mathcal {PG}, [\varvec{A}]_s,[\varvec{u}]_s)=1]|=negl(\lambda )\), where the probability is taken over \(\varvec{A}{\mathop {\longleftarrow }\limits ^{R}}\mathcal {D}_{l,k}\), \(\varvec{w}{\mathop {\longleftarrow }\limits ^{R}}{\mathbb {Z}}_q^k\), \(\varvec{u}{\mathop {\longleftarrow }\limits ^{R}}{\mathbb {Z}}_q^{l}\) and \(s\in \left\{ 1,2\right\} \).

Lemma 1

Among all possible matrix distributions \(\mathcal {D}_{l,k}\), the uniform matrix distribution \(\mathcal {U}_{l,k}\) is the hardest possible instance, that is, \(\mathcal {D}_{l,k}-MDDH \Rightarrow \mathcal {U}_{l,k}-MDDH\). For all PPT adversaries \(\mathcal {A}\), there exists an adversary \(\mathcal {B}\) such that \(Adv_{\mathbb G_s,\mathcal {A}}^{\mathcal {U}_{l,k}-MDDH} \le Adv_{\mathbb G_s,\mathcal {B}}^{\mathcal {U}_{k}-MDDH}\).

Lemma 2

For \(\varvec{A}{\mathop {\longleftarrow }\limits ^{R}}\mathcal {U}_{l,k}\), \(\varvec{W}{\mathop {\longleftarrow }\limits ^{R}}{\mathbb {Z}}_q^{k\times Q}\), \(\varvec{U}{\mathop {\longleftarrow }\limits ^{R}}{\mathbb {Z}}_q^{l\times Q}\) and \(s\in \left\{ 1,2\right\} \), define \(Adv_{{\mathbb {G}}_s,\mathcal {A}}^{\mathcal {Q-U}_{l,k}-MDDH}= |Pr[\mathcal {A}(\mathcal {PG},[\varvec{A}]_s, [\varvec{A}\varvec{W}]_s)=1]-Pr[\mathcal {A}(\mathcal {PG}, [\varvec{A}]_s,[\varvec{U}]_s)=1]|\). Then, for all PPT adversaries \(\mathcal {A}\), there exists an adversary \(\mathcal {B}\) such that \(Adv_{{\mathbb {G}}_s,\mathcal {A}}^{\mathcal {Q-U}_{l,k}-MDDH} \le Adv_{{\mathbb {G}}_s,\mathcal {B}}^{\mathcal {U}_{l,k}-MDDH}+\frac{1}{q-1}\).

Fig. 1. System model

4 System Model

In our single-owner/multi-user setting, there are three different kinds of entities: data owner, data user and cloud server. As shown in Fig. 1, the data owner has a collection of files and wants to outsource his search service to the cloud server. The data owner first extracts keywords from the files and constructs inverted indices. It is important to note that our scheme is mainly applicable to scenarios in which the number of keywords is limited but the number of files is huge, so the data owner extracts only the most relevant keywords.
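As a rough illustration of the index the data owner builds before encryption, the sketch below maps each dictionary keyword to the set of identifiers of the files containing it. The whitespace tokenizer, the function name `build_inverted_index` and the sample files are assumptions made for this example only; the paper does not specify how keywords are extracted.

```python
from typing import Dict, Iterable, Set

def build_inverted_index(files: Dict[str, str],
                         dictionary: Iterable[str]) -> Dict[str, Set[str]]:
    """Map every dictionary keyword to the identifiers of files that contain it."""
    vocabulary = set(dictionary)
    index: Dict[str, Set[str]] = {word: set() for word in vocabulary}
    for file_id, text in files.items():
        # Naive tokenization: lowercase and split on whitespace (assumption).
        for token in set(text.lower().split()):
            if token in vocabulary:
                index[token].add(file_id)
    return index

files = {
    "f1": "cloud storage for big data",
    "f2": "searchable encryption in the cloud",
    "f3": "big data analytics",
}
dictionary = ["cloud", "data", "encryption"]
index = build_inverted_index(files, dictionary)
```

With a small dictionary and many files, the index has few entries but long identifier sets, which matches the scenario described above.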

The data owner then outsources the encrypted indices and encrypted files to the cloud server. Besides, the data owner sends a partial token and a search-authorized secret key to each legitimate data user, with which the data user is able to generate a search token for the keywords he wants to search. When a data user performs a search query, he sends the search token to the cloud server. With the search token and the encrypted indices, the cloud server finally returns the target documents to the data user.

Formally, our multi-keyword searchable encryption is a tuple of six polynomial-time algorithms \(\pi =(Setup,Enc,PartialTokenGen,ClientKGen,TokenGen,Search)\):

  • \(Setup (1^{\lambda })\rightarrow (pp,msk)\): is a probabilistic algorithm that the data owner takes security parameter \(1^{\lambda }\) as input and generates system master key msk and public parameter pp.

  • \(Enc(pp,F,W,DU) \rightarrow \) (\(C_W\), \(C_F\), \(C_{Indices}\), \(C_{List}\)): is a probabilistic algorithm in which the data owner takes the public parameter pp, a document collection \(F=\left\{ {f_1,f_2,...,f_n}\right\} \), a public keyword dictionary \(W=\left\{ {w_1,w_2,...,w_m}\right\} \) and a set of legitimate data users DU as input and generates encrypted keywords \(C_W\), encrypted files \(C_F\) and encrypted indices \(C_{Indices}\). For efficiency, the files are encrypted with a symmetric encryption scheme. Besides, the data owner generates an encrypted list \(C_{List}\) of data users and their corresponding information.

  • \(PartialTokenGen (pp, msk,\xi ) \rightarrow pt\): is a probabilistic algorithm in which the data owner takes pp and msk as input and generates a partial token pt for each legitimate data user \(\xi \in DU\). With pt and his search-authorized private key, a data user can generate search tokens for the keywords he wants to search.

  • \(ClientKGen (pp, msk,\xi ) \rightarrow sk\): is a probabilistic algorithm that the data owner takes pp and msk as input and generates different search-authorized private key sk for each legitimate data user \(\xi \in DU\).

  • \(TokenGen (sk, pt, Q) \rightarrow token\): is a deterministic algorithm that the data users use their private key sk and partial-token pt to produce search tokens token for the keyword set Q they want to query.

  • \(Search(token, C_W, C_F, C_{Indices},C_{List})\rightarrow RST \): is a deterministic algorithm in which the cloud server uses the search token token to search over the encrypted indices \(C_{Indices}\). It then retrieves the matched encrypted files RST and returns them to the data user.

5 Construction

In this section, we will introduce our multi-keyword searchable encryption scheme in detail.

  • \(Setup(1^{\lambda })\): Given a bilinear group \(e:{\mathbb {G}}_1 \times {\mathbb {G}}_2 \rightarrow {\mathbb {G}}_T\), where q is a prime of \(\varTheta (\lambda )\) bits, \({\mathbb {G}}_1,{\mathbb {G}}_2,{\mathbb {G}}_T\) are cyclic groups of order q. \(g_1,g_2,g_T\) are generators of \(\mathbb G_1,{\mathbb {G}}_2\) and \({\mathbb {G}}_T\) respectively. Randomly select a matrix \(\varvec{A}\) from \({\mathbb {Z}}_q^{3\times 2}\) of full rank, namely randomly select a matrix \(\varvec{A}\) from \(\mathcal {U}_2\), randomly choose a matrix \(\varvec{M}\) from \({\mathbb {Z}}_q^{3\times 3}\), \(\varvec{V}\) from \({\mathbb {Z}}_q^{2\times 3}\), and randomly select m vectors \(\varvec{z}_1,\varvec{z}_2,...,\varvec{z}_m\) from \(\mathbb Z_q^{2}\). Let \(\varepsilon =(Setup,Enc,KGen,Dec)\) be a public-key encryption scheme, where Setup is a public key generation algorithm, Enc is an encryption algorithm, KGen is a secret key generation algorithm and Dec is a decryption algorithm. \(pk_{server}\leftarrow \varepsilon .Setup(1^{\lambda })\) is public key of the cloud server. Then, output public parameter pp=\((g_1,g_2,g_T,q,[\varvec{A}]_1,pk_{server})\) and master secret key \(msk=(\varvec{M},\varvec{V},\left\{ \varvec{z}_i\right\} _{i\in [m]})\)

  • \(Enc(pp,F,W,DU)\): Choose a super-incremental sequence \(\alpha _1,\alpha _2,...,\alpha _m \in (0, \frac{{\mathrm {log}}_{{g}_{T}}q}{2})\), that is, for \(i \in [m]\),

    $$\begin{aligned} \alpha _i>\alpha _1+\alpha _2+...+\alpha _{i-1} \end{aligned}$$

    For each keyword \({w_i} \in W\), use a pseudorandom permutation to map i to j and let \({\varvec{x}_i}=(w_i,1,r) \in {\mathbb {Z}}_q^3\), where r is randomly chosen from \({\mathbb {Z}}_q\). Choose \(\varvec{y}_i=(y_{i_1},y_{i_2},1)\in {\mathbb {Z}}_q^3\) such that \(\alpha _j= {<}\varvec{x}_i,\varvec{y}_i{>}\). Besides, choose a different \(r_{\xi }\) for each data user \(\xi \in DU\), where \(r_{\xi }\) is randomly chosen from \(\mathbb Z_q\). Record \(\left\{ \varvec{x}_i\right\} _{i\in [m]}\), \(\left\{ \varvec{y}_i\right\} _{i\in [m]}\) and \(List =\left\{ \xi ,r_{\xi }\right\} _{\xi \in DU}\). Then, we compute \(C_W=\left\{ {g_T}^{\alpha _1},{g_T}^{\alpha _2},...,{g_T}^{\alpha _m}\right\} \) as the ciphertext of the keywords W, compute \(C_F\) as the ciphertext of the files F with some symmetric encryption algorithm and generate the encrypted indices \(C_{Indices}=\left\{ {g_T}^{\alpha _j},Id_{w_i}\right\} _{j \in [m]}\), where \(Id_{w_i}\) denotes the set of identifiers of the files that include the keyword \(w_i\). Besides, we use \(pk_{server}\) to compute \(C_{List} = \varepsilon .Enc(List)\). Finally, the data owner sends (\(C_W\), \(C_F\), \(C_{Indices}\), \(C_{List}\)) to the cloud server.

  • \(PartialTokenGen(pp, msk, \left\{ \varvec{x}_i\right\} _{i\in [m]}, \xi )\): For each legitimate data user \({\xi \in DU}\), the data owner randomly chooses different \(\varvec{s}_{\xi ,i} \in {\mathbb {Z}}_q^2\) and \(r_{pt} \in {\mathbb {Z}}_q\). Let \(\varvec{r}_{\xi _{pt}}=(0,0,r_{pt})\), and compute partial-token as follows.

    $$\begin{aligned} {[\varvec{c}_i]}_1=[\varvec{A} \varvec{s}_{\xi ,i}]_1 \quad \end{aligned}$$
    (1)
    $$\begin{aligned}{}[{\varvec{c}_i}^{'}]_1=[\varvec{M}\varvec{A} \varvec{s}_{\xi ,i}+\varvec{x}_i+\varvec{r}_{\xi _{pt}}]_1 \quad \end{aligned}$$
    (2)
    $$\begin{aligned}{}[{\varvec{c}_i}^{''}]_1=[\varvec{V} \varvec{A} \varvec{s}_{\xi ,i}+\varvec{z}_i]_1 \quad \end{aligned}$$
    (3)

    Send the partial-token \(pt= \left( [\varvec{c}_i]_1,[{\varvec{c}_i}^{'}]_1,[{\varvec{c}_i}^{''}]_1\right) _{i\in [m]}\) to the data user \(\xi \) by a secure channel.

  • \(ClientKGen(pp, msk,\left\{ \varvec{y}_i\right\} _{i\in [m]},\xi ,r_{\xi })\): For each legitimate user \({\xi \in DU}\), the data owner randomly chooses different \(\varvec{r}_{\xi _1} \in \mathbb Z_q^2\), let \(\varvec{r}_{\xi _{sk}}=(0,r_\xi -r_{pt},0) \in {\mathbb {Z}}_q^3\) and compute secret key as follows.

    $$\begin{aligned} \varvec{d}_i = {\varvec{M}}^{T}(\varvec{y}_i+\varvec{r}_{\xi _{sk}}) +{\varvec{V}}^{T}\varvec{r}_{\xi _1} \quad \end{aligned}$$
    (4)
    $$\begin{aligned} Z_i={<}\varvec{z}_i, \varvec{r}_{\xi _1}{>} \quad \end{aligned}$$
    (5)

    Send the secret key \(sk=\left( \left\{ [\varvec{d}_i]_2,[ Z_i]_T,[\varvec{y}_i+\varvec{r}_{\xi _{sk}}]_2\right\} _{i\in [m]},[\varvec{r}_{\xi _1}]_2 \right) \) to the data user \(\xi \) by a secure channel.

  • \(TokenGen(sk,pt,Q)\): With the partial-token pt, the secret key sk and the keyword set \(Q=\left\{ w_{q_1}, w_{q_2},..., w_{q_t}\right\} \subseteq W\) to be searched, the data user computes the search token as follows. We use \(e([\varvec{X}]_1,[\varvec{Y}]_2)\) to denote \([\varvec{X}^T \varvec{Y}]_T\).

    $$\begin{aligned} st=\prod _{i=1}^t \frac{e([\varvec{c}_{q_i}^{'}]_1,[\varvec{y}_{q_i}+\varvec{r}_{\xi _{sk}}]_2)\cdot e([\varvec{c}_{q_i}^{''}]_1,[\varvec{r}_{\xi _1}]_2)/e([\varvec{c}_{q_i}]_1,[\varvec{d}_{q_i}]_2)}{[Z_{q_i}]_T} \quad \end{aligned}$$
    (6)

    Data users send the search tokens \(token=(st,\varepsilon .Enc(t))\) corresponding to the keywords they want to search to the cloud server.

  • \(Search(token,C_W,C_F,C_{Indices},C_{List})\): When the cloud server receives a search token, it first decrypts \(\varepsilon .Enc(t)\) to get t and recovers the real search token \(rst={{g}_{T}}^{\sum _{i=1}^{t}{<}{\varvec{x}}_{{q}_{i}},{\varvec{y}}_{{q}_{i}}{>}}\) from st by using t and the value \([r_{\xi }]_{T}\) corresponding to the user identity \(\xi \). It then determines whether \({g_T}^{\alpha _m}\) is less than the real search token rst. If so, it returns \(\bot \), which means that the search token is illegal and there are no corresponding keywords. Otherwise, by using binary search, the cloud server determines whether there is a k satisfying \({g_T}^{\alpha _k}\le rst < {g_T}^{\alpha _{k+1}}\). If so, the keyword corresponding to \({g_T}^{\alpha _k}\) is one of the keywords the data user wants to search. Then, it calculates \(rst = rst/{g_T}^{\alpha _k}\) and repeats the above steps until rst equals one. The pseudocode is shown in Algorithm 1. Finally, the cloud server collects the identifiers of all the files that contain the queried keywords and returns the ciphertexts of the corresponding files to the data user.

Algorithm 1. Search (pseudocode figure)
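The decoding loop described in the Search step can be sketched as follows. This is a simplified model under stated assumptions, not the authors' Algorithm 1: it works directly on the exponents (the \(\alpha \) values and the exponent of rst) instead of on elements of \({\mathbb {G}}_T\), it rejects a target only when no subset of the sequence sums to it, and it intersects the identifier sets because the search is conjunctive. The names `decode_query` and `search` and the sample data are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Set

def decode_query(alphas: List[int], target: int) -> Optional[List[int]]:
    """Return the positions k whose alphas sum to target, or None if none exist.

    alphas must be sorted ascending and super-incremental
    (each term larger than the sum of all previous terms).
    """
    if target <= 0 or target > sum(alphas):
        return None
    matched: List[int] = []
    remaining = target
    upper = len(alphas)
    while remaining > 0:
        # Binary search for the largest alpha_k <= remaining among unused terms.
        k = bisect_right(alphas, remaining, 0, upper) - 1
        if k < 0:
            return None          # remainder is smaller than every unused term
        matched.append(k)
        remaining -= alphas[k]   # exponent analogue of rst = rst / g_T^alpha_k
        upper = k                # every term may be used at most once
    return sorted(matched)

def search(alphas: List[int], index: Dict[int, Set[str]],
           target: int) -> Optional[Set[str]]:
    """Files containing all keywords encoded in target (conjunctive search)."""
    matched = decode_query(alphas, target)
    if matched is None:
        return None
    result: Optional[Set[str]] = None
    for k in matched:
        ids = index[alphas[k]]
        result = set(ids) if result is None else result & ids
    return result

alphas = [2, 3, 7, 15]
index = {2: {"f1", "f2"}, 3: {"f2"}, 7: {"f1", "f2", "f3"}, 15: {"f3"}}
result = search(alphas, index, 2 + 7)   # query encodes the 1st and 3rd keywords
```

Each iteration costs one binary search over at most m terms, so decoding a query with t keywords takes on the order of t log m comparisons in this model.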

6 Correctness and Security

6.1 Correctness

We now show the correctness of the search phase.

The data user first calculates search token as follows:

$$\begin{aligned} st&=\prod _{i=1}^t \frac{e([\varvec{c}_{q_i}^{'}]_1,[\varvec{y}_{q_i}+\varvec{r}_{\xi _{sk}}]_2)\cdot e([\varvec{c}_{q_i}^{''}]_1,[\varvec{r}_{\xi _1}]_2)/e([\varvec{c}_{q_i}]_1,[\varvec{d}_{q_i}]_2)}{[Z_{q_i}]_T} \nonumber \\&= \prod _{i=1}^{t}\frac{{g_T}^{{<}\varvec{c}_{q_i}^{'},\varvec{y}_{q_i}+\varvec{r}_{\xi _{sk}}{>}}\cdot {g_T}^{{<}\varvec{c}_{q_i}^{''},\varvec{r}_{\xi _1}{>}}/{g_T}^{{<}\varvec{c}_{q_i},\varvec{d}_{q_i}{>}}}{[Z_{q_i}]_T}\nonumber \\&= \prod _{i=1}^{t}\frac{{g_T}^{{<}\varvec{M}\varvec{A}\varvec{s}_{\xi ,q_i}+\varvec{x}_{q_i}+\varvec{r}_{\xi _{pt}},\varvec{y}_{q_i}+\varvec{r}_{\xi _{sk}}{>}+{<}\varvec{V}\varvec{A}\varvec{s}_{\xi ,q_i}+\varvec{z}_{q_i},\varvec{r}_{\xi _1}{>}}}{{g_T}^{{<}\varvec{z}_{q_i},\varvec{r}_{\xi _1}{>}+{<}\varvec{A}\varvec{s}_{\xi ,q_i},\varvec{M}^{T}(\varvec{y}_{q_i}+\varvec{r}_{\xi _{sk}})+\varvec{V}^{T}\varvec{r}_{\xi _1}{>}}}\nonumber \\&= \prod _{i=1}^{t}{g_T}^{{<}\varvec{x}_{q_i}+\varvec{r}_{\xi _{pt}},\varvec{y}_{q_i}+\varvec{r}_{\xi _{sk}}{>}}\nonumber \\&= {g_T}^{\sum _{i=1}^{t}{<}\varvec{x}_{q_i},\varvec{y}_{q_i}{>}\,+\,t\cdot r_{\xi }} \end{aligned}$$
(7)
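The cancellation in this derivation can be checked numerically. The sketch below is a sanity check under simplifying assumptions, not an implementation of the scheme: it works only with exponents modulo a toy prime q, so a pairing \(e([\varvec{X}]_1,[\varvec{Y}]_2)\) becomes an inner product modulo q, and it does not enforce that \(\varvec{A}\) has full rank, which the identity does not need. All helper names are hypothetical. Each of the t factors contributes one copy of the blinding value \(r_{\xi }\), which is why the server needs t to remove it.

```python
import random

q = 2**61 - 1  # toy prime modulus standing in for the group order (assumption)

def rand_vec(n):
    return [random.randrange(q) for _ in range(n)]

def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def add(u, v):
    return [(a + b) % q for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % q

def keyword_vectors(alpha):
    """Sample x = (w, 1, r) and y = (y1, y2, 1) with <x, y> = alpha mod q."""
    w, r, y1 = rand_vec(3)
    y2 = (alpha - w * y1 - r) % q
    return [w, 1, r], [y1, y2, 1]

def token_exponent(xs, ys, r_xi, query):
    """Exponent of st from Eq. (6) for the keyword positions in query."""
    A, M, V = rand_mat(3, 2), rand_mat(3, 3), rand_mat(2, 3)
    zs = [rand_vec(2) for _ in xs]
    r_pt = random.randrange(q)
    r_pt_vec = [0, 0, r_pt]
    r_sk_vec = [0, (r_xi - r_pt) % q, 0]
    r1 = rand_vec(2)
    total = 0
    for i in query:
        As = mat_vec(A, rand_vec(2))
        c = As                                                   # Eq. (1)
        c1 = add(add(mat_vec(M, As), xs[i]), r_pt_vec)           # Eq. (2)
        c2 = add(mat_vec(V, As), zs[i])                          # Eq. (3)
        y_blind = add(ys[i], r_sk_vec)
        d = add(mat_vec(transpose(M), y_blind),
                mat_vec(transpose(V), r1))                       # Eq. (4)
        Z = dot(zs[i], r1)                                       # Eq. (5)
        total += dot(c1, y_blind) + dot(c2, r1) - dot(c, d) - Z  # Eq. (6)
    return total % q

alphas = [2, 3, 7, 15]
pairs = [keyword_vectors(a) for a in alphas]
xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
r_xi = random.randrange(q)
query = [0, 2]
exponent = token_exponent(xs, ys, r_xi, query)
expected = (alphas[0] + alphas[2] + len(query) * r_xi) % q
```

Here `exponent` and `expected` agree for any choice of the random values, since the terms involving \(\varvec{M}\), \(\varvec{V}\) and \(\varvec{z}\) cancel exactly.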

When the cloud server receives a search token, it first decrypts \(\varepsilon .Enc(t)\) to get t and recovers the real search token \(rst={{g}_{T}}^{\sum _{i=1}^{t}{<}{\varvec{x}}_{{q}_{i}},{\varvec{y}}_{{q}_{i}}{>}}\) by using t and \([r_{\xi }]_T\) according to the user’s identity \(\xi \). Because \(\alpha _1,\alpha _2,...,\alpha _m \in (0, \frac{{\mathrm {log}}_{{g}_{T}}q}{2})\) is a super-incremental sequence, we have \(\alpha _i>\alpha _1+\alpha _2+...+\alpha _{i-1}\) and thus \({g_T}^{\alpha _i}> {g_T}^{\alpha _1+\alpha _2+...+\alpha _{i-1}}\). Since the ciphertext of each queried keyword is \({g_T}^{\alpha _j}={g_T}^{{<}\varvec{x}_{q_i},\varvec{y}_{q_i}{>}}\), the product of the ciphertexts of the keywords that the data user wants to search equals the real search token. When the cloud server obtains \({g_T}^{\sum _{i=1}^t {<}\varvec{x}_{q_i},\varvec{y}_{q_i}{>}}\), it determines whether there is a k satisfying \({g_T}^{\alpha _{k}} \le rst < {g_T}^{\alpha _{k+1}}\). Obviously, the keywords corresponding to \({g_T}^{\alpha _{k+1}},...,{g_T}^{\alpha _m}\) cannot be target keywords. Suppose the keyword corresponding to \({g_T}^{\alpha _k}\) is not a target keyword. Then all the target keywords must be among those corresponding to \({g_T}^{\alpha _1},...,{g_T}^{\alpha _{k-1}}\), so rst is at most \({g_T}^{\alpha _1+...+\alpha _{k-1}}\). However, by the super-incremental property, \({g_T}^{\alpha _1+...+\alpha _{k-1}} < {g_T}^{\alpha _k} \le rst\), which is a contradiction. Therefore, the keyword corresponding to \({g_T}^{\alpha _k}\) is one of the target keywords.
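As a small worked instance of this argument, consider the following values, which are illustrative assumptions rather than parameters from the scheme. For readability the comparison is written on the exponents, and keyword \(w_i\) is assumed to map to \(\alpha _i\).

$$\begin{aligned} &(\alpha _1,\alpha _2,\alpha _3,\alpha _4)=(2,3,7,15), \quad Q=\left\{ w_1,w_3\right\} , \quad \log _{g_T} rst=\alpha _1+\alpha _3=9 \\ &\alpha _3=7\le 9<15=\alpha _4 \;\Rightarrow \; k=3 \\ &\alpha _1+\alpha _2=5<7\le 9 \;\Rightarrow \; w_3 \text { must be a target keyword} \\ &9-7=2=\alpha _1 \;\Rightarrow \; w_1 \text { is a target keyword and the remainder is } 0 \end{aligned}$$

The third line is the contradiction step: the keywords below \(w_3\) can contribute at most 5, so they cannot account for an exponent of 9 on their own.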

6.2 Security

For the files, the ciphertexts \(C_F\) are semantically secure because they are produced by a symmetric encryption scheme such as AES. Hence a probabilistic polynomial-time adversary cannot get any useful information from \(C_F\) with non-negligible probability. For the keywords, we have:

Partial-Token Privacy. Partial-Token Privacy means that a probabilistic polynomial-time adversary cannot get any useful information from the partial-token. That is to say, assuming that an adversary gets one item of the partial-token from a legitimate data user \(\xi \), he could not know which keywords the item is about.

  • Setup: The challenger plays a role as the system and runs Setup(), then it keeps master key msk.

  • Init: The challenger runs Enc() to get different reasonable \(\left\{ \varvec{x}_i\right\} _{i\in [m]}\), \(\left\{ \varvec{y}_i\right\} _{i\in [m]}\) and \(\left\{ \xi ,r_{\xi }\right\} _{\xi \in DU}\), with which it can run ClientKGen() and PartialTokenGen().

  • Query Phase1: The adversary adaptively queries sk and pt for different \(\xi \) polynomially many times. The challenger runs the PartialTokenGen() and ClientKGen() algorithms and returns \(pt \longleftarrow PartialTokenGen(pp, msk,\left\{ \varvec{x}_i\right\} _{i\in [m]}, \xi )\) and \(sk \longleftarrow ClientKGen(pp,msk,\left\{ \varvec{y}_i\right\} _{i\in [m]}, \xi , r_{\xi })\).

  • Challenge phase: The adversary randomly selects two keywords \(w_{i_0}\) and \(w_{i_1}\) and submits the identity \(\xi \) he wants to challenge, with the restriction that \(\xi \) has not been queried before. The challenger flips a coin to select \(\beta \longleftarrow \left\{ 0,1\right\} \) and then runs the PartialTokenGen() algorithm. The challenger returns \(PartialTokenGen(pp,msk,\varvec{x}_{i\beta },\xi ,r_{\xi })\) to the adversary.

  • Query Phase 2: The adversary executes queries as Phase1 did.

  • Guess: Finally, the adversary gives a guess \(\beta ^{'}\) of \(\beta \) and wins the game if \(\beta ^{'}=\beta \). We define the advantage of the adversary in the game as \(|Pr [\beta ^{'}=\beta ]-\frac{1}{2}|\).

Theorem 1

If an adversary wins the game mentioned above with a non-negligible advantage, there is an adversary \(\mathcal {B}\) that can break the MDDH assumption.

Proof

The detailed proof is given in Appendix A. \(\square \)

Index Privacy. Index privacy means that a probabilistic polynomial-time adversary cannot get any useful information from encrypted keyword \(C_w\). In other words, the cloud server cannot determine which keyword the ciphertext is for.

  • Setup: The challenger plays a role as the system and runs Setup(), then it keeps master key msk.

  • Query Phase1: The adversary adaptively queries the ciphertext of a keyword w polynomially many times and gets \(C_{w} \longleftarrow Enc(pp,w)\).

  • Challenge phase: The adversary randomly selects two keywords \(w_{i_0}\) and \(w_{i_1}\) which have not been queried before and sends them to the challenger. The challenger flips a coin to select \(\beta \longleftarrow \left\{ 0,1\right\} \) and then returns \(C_{w} \longleftarrow Enc(pp,w_{i\beta })\) to the adversary.

  • Query Phase 2: The adversary continues to query the ciphertext of keyword w as Phase1 did with the restriction that w is neither \(w_{i_0}\) nor \( w_{i_1}\).

  • Guess: Finally, the adversary gives a guess \(\beta ^{'}\) of \(\beta \) and wins the game if \(\beta ^{'}=\beta \). We define the advantage of the adversary in the game as \(|Pr [\beta ^{'}=\beta ]-\frac{1}{2}|\).

Theorem 2

No probabilistic polynomial-time adversary wins the game mentioned above with a non-negligible advantage; that is, our scheme achieves index privacy.

Proof

If the adversary wants to know whether \(C_w\) is about \(w_{i_0}\) or \(w_{i_1}\), he will analyze \(C_w={g_T}^{{<}\varvec{x}_{i\beta },\varvec{y}_{i\beta }{>}}={g_T}^{\alpha _{j}}\). Because \(\alpha _{j}\) is less than \(\frac{{\mathrm {log}}_{{g}_{T}}q}{2}\), he could get \({<}\varvec{x}_{i\beta },\varvec{y}_{i\beta }{>}\) by a logarithmic operation. However, \(\varvec{y}_{i\beta }\) is kept secret by the challenger, so it is impossible for the adversary to get \(\varvec{x}_{i\beta }\), and he has no chance to get the keyword \(w_{i\beta }\). Therefore, the adversary cannot know whether \(\beta \) equals 0 or 1. \(\square \)
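The "logarithmic operation" mentioned in the proof is feasible only because the exponent is small. As a toy illustration, the sketch below recovers a bounded exponent by exhaustive search in the multiplicative group modulo a small prime, which merely stands in for \({\mathbb {G}}_T\). The parameters p = 1019 and g = 2, the bound of 100 and the function name are assumptions for this example.

```python
from typing import Optional

def small_discrete_log(g: int, target: int, p: int, bound: int) -> Optional[int]:
    """Return the smallest e in [0, bound) with g**e % p == target, or None."""
    value = 1
    for e in range(bound):
        if value == target:
            return e
        value = (value * g) % p
    return None

p, g = 1019, 2          # toy parameters (assumption); the order of 2 exceeds the bound
alpha = 37              # a small secret exponent
recovered = small_discrete_log(g, pow(g, alpha, p), p, bound=100)
```

The search takes time linear in the bound, so it says nothing about exponents drawn from the whole of \({\mathbb {Z}}_q\); the privacy argument above relies on \(\varvec{y}_{i\beta }\) staying secret, not on the exponent being hard to recover.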

Token Privacy. Token privacy means that given a search token, a probabilistic polynomial-time adversary cannot learn which keyword the search token is for. Namely, the cloud server cannot know which keyword the data user queries.

  • Setup: The challenger plays a role as the system and runs Setup(), then it keeps master key msk.

  • Init: The challenger runs Enc() to get reasonable \(\left\{ \varvec{x}_i\right\} _{i\in [m]}\), \(\left\{ \varvec{y}_i\right\} _{i\in [m]}\) and \(\left\{ \xi ,r_{\xi }\right\} _{\xi \in DU}\), with which it can run ClientKGen() to obtain secret key sk. Besides, it runs PartialTokenGen() to get partial token pt.

  • Query Phase1: The adversary with \(\xi \) adaptively queries the search token of a keyword w polynomially many times and gets \(st \longleftarrow TokenGen (sk,pt,w)\).

  • Challenge phase: The adversary randomly selects two keywords \(w_{i_0}\) and \(w_{i_1}\) which have not been queried before and sends them to the challenger. The challenger flips a coin to select \(\beta \longleftarrow \left\{ 0,1\right\} \) and then runs the TokenGen() algorithm. The challenger returns \(st \longleftarrow TokenGen (sk,pt,w_{i\beta })\) to the adversary.

  • Query Phase 2: The adversary continues to query search token of keyword w as Phase1 did with the restriction that w is neither \(w_{i_0}\) nor \(w_{i_1}\).

  • Guess: Finally, the adversary gives a guess \(\beta ^{'}\) of \(\beta \) and wins the game if \(\beta ^{'}=\beta \). We define the advantage of the adversary in the game as \(|Pr [\beta ^{'}=\beta ]-\frac{1}{2}|\).

Theorem 3

No probabilistic polynomial-time adversary wins the game mentioned above with a non-negligible advantage; that is, our scheme achieves token privacy.

Proof

If the adversary wants to know whether st is about \(w_{i_0}\) or \(w_{i_1}\), he will analyze \(st={g_T}^{{<}\varvec{x}_{i\beta },\varvec{y}_{i\beta }{>}+r_{\xi }}\). Although the cloud server can calculate \({g_T}^{{<}\varvec{x}_{i\beta },\varvec{y}_{i\beta }{>}}\) and get \({<}\varvec{x}_{i\beta },\varvec{y}_{i\beta }{>}\) by a logarithmic operation, it has no way to get \(\varvec{y}_{i\beta }\). Therefore, it is incapable of getting \(\varvec{x}_{i\beta }\) and unable to get the keyword \(w_{i\beta }\). \(\square \)

Table 1. Calculation overhead

7 Functionality and Efficiency

We compare our scheme with the work in [13, 22] and the first scheme of [12] in Table 1. From the table, we can see that the size of the ciphertext is linear in the number of keywords m in both [13] and our scheme, while in [22] the ciphertext size is linear in the product of the number of keywords m and the number of files n. In [12], the ciphertext size is linear in the number of keywords the data owner extracts. Among the compared schemes, only ours has a search token of constant size. Our scheme can therefore significantly reduce the communication overhead, especially when the number of data users is large and the query frequency is high. Besides, by using an inverted index structure and a super-incremental sequence, our scheme achieves efficient multi-keyword search, as illustrated in Table 1. In addition, our scheme avoids the per-query interaction between the data owner and data users. That is to say, the data owner does not need to stay online waiting for data users to search in his archives. Furthermore, our scheme supports multi-keyword search in the single-owner/multi-user setting.

8 Conclusion

In our scheme, search tokens consist of only two items thanks to multi-input inner-product functional encryption, which significantly reduces the communication overhead. The use of an inverted index structure and a super-incremental sequence makes the multi-keyword search process efficient. In addition, our scheme avoids the per-query interaction between the data owner and data users. That is to say, the data owner does not need to stay online waiting for data users to search in his archives. Moreover, our scheme ensures the correctness of the search process and protects the privacy of keywords and plaintext files.