1 Introduction

Cloud computing is being adopted by organizations and individuals to address various types of computation needs, including file storage, archiving, etc. However, there have been several incidents of data leaks at popular cloud storage service providers [1, 25]. To ensure the security of sensitive data and prevent any unauthorized access, users may need to encrypt data before uploading it to the cloud. If data uploaded to the cloud is encrypted using traditional encryption techniques, executing search queries on the stored data becomes infeasible. To alleviate this situation, many searchable encryption techniques have been proposed [46, 9, 11, 13, 16, 19, 20]. Among those approaches, searchable symmetric encryption (SSE) [46, 9, 13, 16, 19] emerges as an efficient alternative for cloud-based storage systems due to its minimal storage overhead, low performance overhead, and relatively good security.

However, almost all searchable encryption techniques require executing some code on the cloud servers to enable efficient processing. On the other hand, popular commercial personal cloud storage providers only support basic file operations such as reading and writing files, which makes it infeasible to apply traditional SSE techniques. Furthermore, complex queries on multimedia data may require running different and expensive cryptographic operations. These limitations create a significant problem for the wide adoption of SSE techniques. Therefore, developing SSE schemes that can run on existing cloud storage systems without requiring the cloud service providers' cooperation emerges as an important and urgent need. To our knowledge, only [19] considered a setup without computational support from the cloud storage, but the proposed solution does not support efficient complex querying over encrypted data.

Even though one may hope that SSE as a service will be offered in the near future by cloud service providers, due to network effects many existing users may not want to switch their cloud service providers. Therefore, any new “secure” cloud storage provider with SSE may have a hard time gaining significant traction. So supporting SSE on existing cloud storage platforms, without requiring any support from the cloud storage service providers, is a critical need.

In addition, the use of multimedia data (e.g., images, music, video, etc.) for social communication is increasing day by day. KPCB analyst Mary Meeker’s 2014 annual Internet Trends report states that 1.8 billion photos are shared each day. However, indexing multimedia data is harder than indexing text data. Significant pre-processing is required to convert raw multimedia data into a searchable format, and queries made on multimedia data are complex as well. So building an efficient cryptographic storage system that can easily handle multimedia content is a very important problem.

To address these challenges, in this paper we propose an efficient searchable encryption framework that can work on existing cloud storage services and can easily handle multimedia data. Our proposed framework only requires file storage and retrieval support from cloud storage services. Furthermore, by leveraging the extensible extract, transform, and load operations provided by our framework, very complex queries can be executed on the encrypted data. As an example, we show how our framework can be used to run face recognition queries on encrypted images. To our knowledge, this is the first system that can support complex queries on encrypted multimedia data without significant computational support from the cloud service provider (i.e., without running customized code in the cloud). The main contributions of this work can be summarized as follows:

  • We propose a generic outsourcing framework that enables secure and efficient querying on any data. Our framework supports complex querying on any encrypted data by allowing queries to be represented as a series of simple equality queries using the features extracted from the data. These extracted features are then transformed into encrypted indexes, which are loaded to the cloud and leveraged for efficient encrypted query processing.

  • We prove that our system satisfies adaptive semantic security for dynamic SSE.

  • We show the applicability of our framework by applying it to state-of-the-art image querying algorithms (e.g., face recognition) on encrypted data.

  • We implement a prototype of our system and empirically evaluate the efficiency under various query types using real world cloud services. Our results show that our system introduces very little overhead, which makes it remarkably efficient and applicable to real-world problems.

The rest of the paper is organized as follows: Sect. 2 discusses related work, Sect. 3 provides the general setup and threat model of our system, Sect. 4 describes the internal details of each phase, Sect. 5 extends our initial framework by making it dynamic, Sect. 6 discusses the security of our system, Sect. 7 shows an application of our proposed framework, Sect. 8 presents the experimental evaluation, and Sect. 9 concludes our work.

2 Related Work

Currently there are a few ways to build encrypted cloud storage with content-based search. Searchable symmetric encryption (SSE) is one of them; it allows users to encrypt data in a fashion that can be searched later on. Different aspects of SSE have been studied extensively, as shown in the extensive survey of provably secure searchable encryption by Bösch et al. [4]. Curtmola et al. [9] provided simple constructions for SSE with practical security definitions, which were then adopted and extended by several others in subsequent work. A few works also looked into dynamic constructions of SSE [5, 13, 14, 16] so that new documents can be added after SSE construction.

Another branch of study related to SSE is supporting conjunctive boolean queries. Cash et al. [6] proposed such a construction, where the authors used a multi-round protocol for executing boolean queries with reasonable information leakage. In the process they also claimed to build the most efficient SSE in terms of time and storage. Kuzu et al. [15] proposed an efficient SSE construction for similarity search, where they used locality sensitive hashing to convert similarity search into equality search. There is also work towards supporting efficient range queries, substring matching queries, etc. [10], where a rich query is converted into an exact matching query. However, these constructions require a specialized server. Importantly, we can easily adopt such conversion techniques in our framework.

Naveed et al. [19] proposed a dynamic searchable encryption scheme with a simple storage server similar to our setup. The system also hides access patterns to a certain level. However, the authors did not consider the complex query problem in their work, which is one of the major challenges that we solve in this work.

Another way of querying an encrypted database is oblivious RAM (ORAM) [11, 20], which also hides the search access pattern and is more secure. Despite recent developments [21], traditional ORAM remains inefficient for practical usage in cloud storage systems, as described in [3]. Furthermore, our proposed system converts complex operations into a sequence of key-value read and write operations, which can easily be combined with ORAM techniques to hide the access pattern.

Qin et al. [22] proposed an efficient privacy preserving cloud-based secure image feature extraction and comparison technique. Similar constructions for ranked image retrieval are proposed in [17, 23, 29]. These systems depend on a highly capable cloud server for performing image similarity queries.

Finally, there are a few commercial secure cloud storage systems, e.g., SpiderOak, BoxCryptor, Wuala, etc. Even though these systems are easy to use and provide reliable security, they provide neither server-based search nor complex query support. All these systems depend on either the operating system or local indices to provide search functionality. As a result, to provide search functionality these systems need to download and decrypt all the data stored in the cloud server, which might not be an efficient solution in all circumstances.

3 Background and Threat Model

Searchable Symmetric Encryption (SSE) is one of the many mechanisms that enable search over encrypted data. In an SSE scheme, we not only encrypt the input dataset, but we also create an encrypted inverted index. The index contains a mapping of encrypted versions of keywords (called trapdoors) to lists of document ids that contain the corresponding plaintext keywords. Formally, an SSE scheme is defined as a collection of five algorithms \(SSE = (Gen, Enc, Trpdr, Search, Dec)\). Given a security parameter, Gen generates a master symmetric key, and Enc generates the encrypted inverted index and encrypted dataset from the input dataset. The Trpdr algorithm takes keywords as input and outputs the trapdoor, which is used by the Search algorithm to find the list of documents associated with the input keywords. Finally, the Dec algorithm decrypts an encrypted document given its id and the proper key. We refer the reader to [9] for further discussion of SSE. Furthermore, in a typical SSE setting, Gen, Enc, Trpdr, and Dec are performed on a client device and the Search algorithm is performed on a cloud server. For this reason, we need a server with custom computational support to run an SSE-based system. Here, we focus on building a framework that enables us to build an SSE-like scheme with complex query processing capabilities using file storage servers that do not have custom computation support.
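The following sketch shows how these five algorithms might be expressed as a Java interface; the type names and signatures are our own illustration of the informal definition above, not part of any particular SSE library.

```java
import java.util.List;
import javax.crypto.SecretKey;

// Illustrative sketch of the five SSE algorithms; names and types are hypothetical.
interface SSEScheme {
    // Gen: derive a master symmetric key from a security parameter.
    SecretKey gen(int securityParameter);

    // Enc: build the encrypted inverted index and encrypted documents.
    EncryptedStore enc(SecretKey key, List<Document> dataset);

    // Trpdr: compute the trapdoor for a plaintext keyword.
    byte[] trpdr(SecretKey key, String keyword);

    // Search: executed by the server; returns ids of matching documents.
    List<String> search(EncryptedStore store, byte[] trapdoor);

    // Dec: decrypt a document given its id and the proper key.
    Document dec(SecretKey key, EncryptedStore store, String documentId);
}

// Minimal placeholder types so the interface is self-contained.
class Document { String id; byte[] content; }
class EncryptedStore { /* encrypted index + encrypted documents */ }
```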

Threat Model. In this study, we consider a setup where a user owns a set of documents, which includes multimedia documents. The user wants to store these documents on a cloud storage server in encrypted form. The user also wants to perform complex search queries over the encrypted data. Most importantly, the user wants to utilize an existing cloud storage service, which is not capable of executing any custom code provided by the user. Formally, the cloud storage server \(\mathcal {Z}\) can only perform read and write operations. This simple requirement on the cloud storage server makes the system easily adoptable in several real-world scenarios. On the other hand, the user has devices with sufficient computation power to perform modern symmetric cryptography algorithms; these are called clients.

In our system, the communication between server and client is done over an encrypted channel, such as HTTPS. So eavesdroppers cannot learn any meaningful information about the documents by capturing the communication, apart from the existence of such communication. We also assume that the cloud storage server \(\mathcal {Z}\) is managed by Bob, who is semi-honest. As such, he follows the protocol as it is defined, but he may try to infer private information about the documents he hosts. Furthermore, the system does not hide the search access pattern, meaning Bob can observe the trapdoors in search queries. Based on the encrypted file accesses after subsequent search queries, Bob can also figure out trapdoor-to-document-id assignments. However, Bob cannot observe the plaintext keywords of the trapdoors.

4 The Proposed System

Our main motivation is to build encrypted cloud storage that can support complex search queries with the support of a simple file storage server. We generalize the required computations into a five-phase Extract, Transform, Load, Query, Post-Process (ETLQP) framework. These five phases represent the chronological order of the operations required to create and store an encrypted index and to perform complex queries. Figure 1(a) and (b) illustrates an overview of the different phases of our system.

Fig. 1. Overall workflow of our proposed system and important data structures. (a) Index creation consists of the extract, transform, and load phases. (b) Search consists of the query and post-process phases. (c) \(\mathcal {P}\), the output of the extract phase, which maps document ids to feature-value pairs. (d) The inverted index \(\mathcal {I}\), which maps search signatures to document ids.

4.1 Extract

In this phase we extract the necessary features from a dataset. Let \(\mathcal {D} = \{d_1, d_2, ..., d_n\}\) be a set of documents, \(id(d_i)\) be the identifier of document \(d_i\), and \(\varTheta = \{\theta _1, \theta _2, ..., \theta _m\}\) be a set of m feature extractor functions. Functions in \(\varTheta \) extract sets of feature-value pairs (f, v) from documents. We build a list \(U_i\) with all the feature-value pairs extracted from \(d_i\). For all the feature extractors \(\theta _j \in \varTheta \) we compute \((f, v) \leftarrow {\theta _j}(d_i)\) and store (f, v) in \(U_i\). Finally, we organize the results in \(\mathcal {P}\), such that \(\mathcal {P}[id(d_i)] \leftarrow U_i\). An example \(\mathcal {P}\) is illustrated in Fig. 1(c). Here, we have four documents \(\{D_1, .. D_4\}\). \(D_1\) has feature-value pairs \(U_1=\){\((f_a, v_\alpha )\),\((f_b, v_\beta )\), \((f_b, v_\gamma )\)}, etc.
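Assuming simple Java types for documents and feature-value pairs (the FeatureExtractor interface and names below are illustrative, not code from our implementation), the extract phase can be sketched as follows.

```java
import java.util.*;

// Hypothetical sketch of the extract phase: apply every extractor in Theta
// to every document and collect the (feature, value) pairs per document id.
interface FeatureExtractor {
    List<Map.Entry<String, String>> extract(byte[] document);
}

class ExtractPhase {
    static Map<String, List<Map.Entry<String, String>>> run(
            Map<String, byte[]> documents,            // id -> raw content
            List<FeatureExtractor> extractors) {      // Theta
        Map<String, List<Map.Entry<String, String>>> p = new HashMap<>();
        for (Map.Entry<String, byte[]> doc : documents.entrySet()) {
            List<Map.Entry<String, String>> u = new ArrayList<>();
            for (FeatureExtractor theta : extractors) {
                u.addAll(theta.extract(doc.getValue()));   // (f, v) pairs from theta_j(d_i)
            }
            p.put(doc.getKey(), u);                        // P[id(d_i)] <- U_i
        }
        return p;
    }
}
```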

To clarify further, let us assume that we want to build an encrypted image storage application that can perform location-based queries over the encrypted images. In other words, the system is capable of answering queries such as: find images taken in Italy. To support such a query, we implement a feature extractor function \(\theta _{l}\) that extracts location information from image metadata. The output of \(\theta _{l}\) is defined as a feature-value pair (“LOCATION”, “longitude and latitude of image”). We define as many feature extractors as necessary based on application needs. However, all feature extractor functions return values in the same format. In Sect. 7 we discuss in detail how we define more feature extractors and use them to answer much more complicated queries.

4.2 Transform

In this phase we transform the extracted feature values into a much simpler form so that complex search operations can be expressed as a series of equality searches. We compute search signatures s from feature-value pairs and associate the corresponding documents with s. At query time this association can be used to infer the existence of a feature-value pair in a document. Essentially, we define a set of transform functions \(\mathcal {T} = \{t_1, .., t_p\}\), where each transform function is designed to generate search signatures from a feature-value pair (f, v), and \(\mathcal {T}_f\) denotes the subset of transformation functions that can be applied to feature f.

With these transform functions \(\mathcal {T}\), we generate an inverted index \(\mathcal {I}\) that is indexed by search signatures and contains lists of document ids. For all the feature-value pairs in \(\mathcal {P}\), we generate search signatures \(s_{f,v}^t \leftarrow t(f, v)\) where \(t \in \mathcal {T}_f\). We build a document id list \(V_s\) for every unique search signature s that contains \(id(D_i)\) if and only if there exists a feature-value pair (f, v) in \(U_i\) and at least one transformation function t that generates search signature s. Finally, we fill the inverted index \(\mathcal {I}\) such that \(\mathcal {I}[s] \leftarrow V_s\). In Fig. 1(d) we show an example \(\mathcal {I}\), which is created from the \(\mathcal {P}\) of Fig. 1(c). Here, search signatures \(s_1\), \(s_2\), \(s_3\), \(s_4\), \(s_5\) are generated from feature-value pairs \((f_a, v_\alpha )\), \((f_b, v_\beta )\), \((f_b, v_\gamma )\), \((f_a, v_\sigma )\), \((f_a, v_\delta )\), respectively.
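A minimal sketch of this step, using the same illustrative types as the extract sketch above, could look as follows; \(\mathcal {T}_f\) is modeled as a map from feature names to the transforms applicable to that feature.

```java
import java.util.*;

// Hypothetical sketch of the transform phase: every applicable transform turns a
// (feature, value) pair into a search signature, and the inverted index maps each
// signature to the set of document ids that produced it.
interface Transform {
    String signature(String feature, String value);
}

class TransformPhase {
    static Map<String, Set<String>> run(
            Map<String, List<Map.Entry<String, String>>> p,     // output of extract
            Map<String, List<Transform>> transformsByFeature) { // T_f
        Map<String, Set<String>> index = new HashMap<>();       // I[s] -> V_s
        for (Map.Entry<String, List<Map.Entry<String, String>>> e : p.entrySet()) {
            String docId = e.getKey();
            for (Map.Entry<String, String> fv : e.getValue()) {
                List<Transform> ts = transformsByFeature
                        .getOrDefault(fv.getKey(), Collections.emptyList());
                for (Transform t : ts) {
                    String s = t.signature(fv.getKey(), fv.getValue());
                    index.computeIfAbsent(s, k -> new HashSet<>()).add(docId);
                }
            }
        }
        return index;
    }
}
```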

Similarly, in our encrypted image storage application example, we define a transform function \(t_l\) that takes a geographic location and a document id as input, converts the location information to a mailing address using a reverse address lookup service, and uses the country information and the document id to construct a search signature with a collision-resistant hash function.

Using such an extract-transform model has several benefits over an ad hoc model. The proposed model helps us organize the necessary computation into modules, which in turn increases development efficiency. The feature extractor functions can also be reused in other projects.

4.3 Load

In this phase we set up our encryption scheme, encrypt the inverted index, and upload the encrypted version to a file storage server \(\mathcal {Z}\). We initialize a master encryption key K, three random constants \(C_1\), \(C_2\), \(C_3\), a secure pseudo-random permutation function \(\varphi \), and a keyed pseudo-random function H. Given a key, \(\varphi \) encrypts data, \(\varphi ^{-1}\) decrypts the corresponding result, and H generates authentication codes of messages. In addition, we define a small synchronized cache \(\mathcal {C}\) and an encryption key \(K_C\) for encrypting the cache. \(\mathcal {C}\) is always synchronized with the storage server \(\mathcal {Z}\): the server's version is updated after any change to the client's version, and before updating the cache locally the most recent version is downloaded from the server first. In \(\mathcal {C}\), we store the document id list sizes of all search signatures of \(\mathcal {I}\), denoted by \(\mathcal {C}.freq\). Later, we also use this cache to store information related to individual files to make the query phase easier.

Algorithm 1. The load phase.

We divide all the document id lists in \(\mathcal {I}\) into blocks of length b and add padding to the last block if needed. The value of b is determined by defining and minimizing a cost function (described in Subsect. 4.6). We generate trapdoors \(T_j^s\) and \(K_j^s\) for the \(j^{th}\) block of the document list of \(\mathcal {I}[s]\). We use \(K_j^s\) to encrypt the block contents and \(T_j^s\) as the key of the encrypted inverted index \(\mathcal {E}\). To query the inverted index later on, our system regenerates these two trapdoors and performs the inverse operations to rebuild the original document id list. In addition, we store the number of documents associated with a signature s in \(\mathcal {C}.freq[s]\), then encrypt and upload the cache. Algorithm 1 describes the operations necessary for the load phase.
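The following sketch gives a rough, illustrative reading of this load procedure in Java. The choice of HMAC-SHA256 for the keyed PRF H, AES-CBC for \(\varphi \), the string-based block serialization, and the fixed IV are simplifying assumptions made only for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.*;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of the load phase. HMAC-SHA256 stands in for the keyed PRF H
// and AES-CBC for the permutation phi; the constants C1 and C2 separate derivation
// domains for T_j^s and K_j^s. Padding and serialization are deliberately simplified.
class LoadPhase {
    static Map<String, byte[]> load(byte[] masterKey,
                                    Map<String, List<String>> invertedIndex,   // I
                                    Map<String, Integer> cacheFreq,            // C.freq
                                    int blockSize) throws GeneralSecurityException {
        Map<String, byte[]> encryptedIndex = new HashMap<>();                  // E[T_j^s] -> ciphertext
        for (Map.Entry<String, List<String>> entry : invertedIndex.entrySet()) {
            String s = entry.getKey();
            List<String> ids = entry.getValue();
            cacheFreq.put(s, ids.size());                                      // C.freq[s]
            int blocks = (ids.size() + blockSize - 1) / blockSize;
            for (int j = 0; j < blocks; j++) {
                List<String> block = new ArrayList<>(
                        ids.subList(j * blockSize, Math.min((j + 1) * blockSize, ids.size())));
                while (block.size() < blockSize) block.add("PAD");             // pad last block
                byte[] t = prf(masterKey, "C1|" + s + "|" + j);                // T_j^s: index key
                byte[] k = prf(masterKey, "C2|" + s + "|" + j);                // K_j^s: block key
                encryptedIndex.put(Base64.getEncoder().encodeToString(t),
                        encrypt(k, String.join(",", block)));
            }
        }
        return encryptedIndex;   // uploaded to the storage server Z as key-value pairs
    }

    static byte[] prf(byte[] key, String msg) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(msg.getBytes(StandardCharsets.UTF_8));
    }

    static byte[] encrypt(byte[] key, String plaintext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(Arrays.copyOf(key, 16), "AES"),
                new IvParameterSpec(new byte[16]));     // fixed IV for brevity only
        return cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
    }
}
```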

4.4 Query

In the previous phases we created an encrypted inverted index and uploaded it to the file storage server \(\mathcal {Z}\). The query and post-process phases are dedicated to querying the index and returning the proper output to the user. First, given a user query q, we extract and transform it into a set of search signatures \(\mathcal {Q}\). We use the number of document ids per signature, stored in \(\mathcal {C}.freq\), to compute block counts, which in turn are used to compute the trapdoors \(K_j^s\) and \(T_j^s\) for each block of each search signature. Using these trapdoors we retrieve and decrypt the document ids. Finally, the result is organized into a hash table \(\mathcal {R}\) such that \(\mathcal {R}[s] = \mathcal {I}[s]\) for all \(s \in \mathcal {Q}\). Algorithm 2 contains the detailed operations of the query phase.

Algorithm 2. The query phase.
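A matching query-phase sketch, reusing the hypothetical key derivation from the load sketch above, could look as follows.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.*;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical query-phase sketch: the client recomputes T_j^s and K_j^s for every
// block of every search signature in Q, fetches the ciphertext stored under T_j^s,
// decrypts it, and drops the padding entries.
class QueryPhase {
    static Map<String, List<String>> query(byte[] masterKey,
                                           Set<String> signatures,           // Q
                                           Map<String, Integer> cacheFreq,   // C.freq
                                           Map<String, byte[]> encryptedIndex,
                                           int blockSize) throws GeneralSecurityException {
        Map<String, List<String>> result = new HashMap<>();                  // R
        for (String s : signatures) {
            int total = cacheFreq.getOrDefault(s, 0);
            int blocks = (total + blockSize - 1) / blockSize;
            List<String> ids = new ArrayList<>();
            for (int j = 0; j < blocks; j++) {
                byte[] t = LoadPhase.prf(masterKey, "C1|" + s + "|" + j);    // T_j^s
                byte[] k = LoadPhase.prf(masterKey, "C2|" + s + "|" + j);    // K_j^s
                byte[] ct = encryptedIndex.get(Base64.getEncoder().encodeToString(t));
                if (ct == null) continue;
                for (String id : decrypt(k, ct).split(",")) {
                    if (!id.equals("PAD")) ids.add(id);                      // drop padding
                }
            }
            result.put(s, ids);                                              // R[s] = I[s]
        }
        return result;
    }

    static String decrypt(byte[] key, byte[] ciphertext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(Arrays.copyOf(key, 16), "AES"),
                new IvParameterSpec(new byte[16]));
        return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
    }
}
```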

4.5 Post-Process

In this step we further process the result of the query phase to remove false positive entries. Given the result set \(\mathcal {R}\) from the query phase for query q, we remove the ids of documents that do not match the original query. That is, \(\mathcal {R}.remove(id(d)) \) if \(q(d) \ne True\). For queries that only contain exact search features, this phase is optional.

4.6 Optimal Block Size Analysis

Block size has a direct impact on the performance of our proposed system. A larger block size implies wasted space for padding and a smaller block size implies many blocks to process. So we need to find an optimal value of the block size b that keeps the overall cost minimal. In our construction, each block has a fixed cost and a dynamic cost related to the block length. We denote the fixed cost by \(\alpha \) and the coefficient of the dynamic cost by \(\beta \). Cost can be in terms of time or size; both depend linearly on the block size in our construction. So the cost of a block of length b is \((\alpha + \beta \times b)\). Let \(\mathcal {J}(s) = |\mathcal {I}[s]|\) be the document id list size for search signature s; the total cost \(\mathcal {G}(b)\) for blocking and encrypting the given inverted index \(\mathcal {I}\) with block length b is then \(\mathcal {G}(b) = \sum _{s \in \mathcal {I}} \left\lceil \frac{\mathcal {J}(s)}{b} \right\rceil (\alpha + \beta \times b) \). We want to minimize this function over b. However, it contains a ceiling function, which cannot be minimized by taking derivatives and equating them to zero. So we approximate the probability distribution of \(\mathcal {J}\), i.e., the lengths of the document id lists in \(\mathcal {I}\). We assume the distribution is a Pareto distribution, which is defined by the probability density function (PDF) \(f(x|\gamma , x_m) = \frac{\gamma x_m^\gamma }{x^{(\gamma + 1)}}\), where x is the random variable, \(\gamma \) is the distribution parameter, and \(x_m\) is the minimum value of x. After several algebraic simplifications (explained in detail in the full version [24]), we find the first order derivative

$$\begin{aligned} \begin{aligned} \mathcal {G}'(b) = \beta - x_m^\gamma \beta b ^{-\gamma } + (\alpha + \beta b) x_m^\gamma \gamma b^{-\gamma - 1} - \gamma x_m^{\gamma } b^{-\gamma }(\frac{\alpha }{b} + \beta ) - \frac{\gamma x_m^\gamma }{\gamma - 1} b ^ {-\gamma - 1} \alpha \end{aligned} \end{aligned}$$

Now we minimize \(\mathcal {G}(b)\) by setting \(\mathcal {G}'(b) = 0\) and solving the equation for b. In our experiments we observe that method of moments estimation for \(x_m\) and \(\gamma \) gives almost correct values.
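To make the trade-off concrete, the following sketch simply evaluates the exact cost function \(\mathcal {G}(b)\) over candidate block sizes and returns the minimizer; this brute-force scan is an illustrative alternative to solving \(\mathcal {G}'(b) = 0\) under the Pareto approximation, not the method used in the paper.

```java
import java.util.Collection;

// Sketch: evaluate G(b) = sum_s ceil(J(s)/b) * (alpha + beta*b) over candidate
// block sizes and return the minimizer.
class BlockSizeOptimizer {
    static int optimalBlockSize(Collection<Integer> listSizes,   // J(s) for all s in I
                                double alpha, double beta, int maxB) {
        int bestB = 1;
        double bestCost = Double.MAX_VALUE;
        for (int b = 1; b <= maxB; b++) {
            double cost = 0;
            for (int j : listSizes) {
                int blocks = (j + b - 1) / b;                    // ceil(J(s)/b)
                cost += blocks * (alpha + beta * b);
            }
            if (cost < bestCost) { bestCost = cost; bestB = b; }
        }
        return bestB;
    }
}
```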

5 Dynamic Document Addition

Here we improve our algorithms to support dynamic addition of documents. Given a new document set for addition, we first perform extract and transform to build an inverted index. Next, for each signature we compute the number of blocks and the number of empty spaces in the last block using the cache \(\mathcal {C}\). If empty space exists we fill it first, then create new blocks as needed. On the other hand, if a new signature is observed in the new inverted index, we perform exactly the same steps as in the load phase. Due to space limitations we defer further discussion on dynamic document addition to the full version of the paper [24].
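A small illustrative helper for this bookkeeping, under the same block layout assumed in the load sketch of Sect. 4.3, is shown below.

```java
// Rough sketch of the bookkeeping for dynamic addition: from C.freq[s] and the block
// size we can tell how many padded slots remain in the last block of a signature;
// new document ids fill those slots first, then new blocks are created.
class DynamicAddition {
    static int freeSlotsInLastBlock(int currentCount, int blockSize) {
        if (currentCount == 0) return 0;             // unseen signature: run load as usual
        int remainder = currentCount % blockSize;
        return remainder == 0 ? 0 : blockSize - remainder;
    }
}
```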

6 Security

Over the years, many security definitions have been proposed for searchable encryption in the semi-honest model. Among those, the simulation-based adaptive semantic security definition by Curtmola et al. [9] is widely used in the literature. Later it was adapted to the random oracle model in [14]. We adapt this definition to prove the security of our scheme. In short, we define the history \(\mathcal {H}_\eta \), the trace \(\lambda \) (the maximum amount of information that a data owner allows to leak to an adversary), and the view \(v\) (the information that is accessible to an adversary) of our system and show the existence of a polynomial size simulator \(\mathcal {S}\) such that the simulated view \(v_S(\mathcal {H}_\eta )\) and the real view \(v_R(\mathcal {H}_\eta )\) of history \(\mathcal {H}_\eta \) are computationally indistinguishable. Due to space limitations we defer the formal security proof to the full version [24].

7 Application of ETLQP Framework

As an application of our ETLQP framework we built an image storage system that saves encrypted images in cloud storage and builds an encrypted index to search later on. Before going into further detail of our ETLQP framework implementation, we briefly describe Fuzzy Color and Texture Histogram (FCTH) [7], Eigenface [27], Locality Sensitive Hashing (LSH) [12], and the range query to exact query conversion mechanism [10].

Fuzzy Color and Texture Histogram (FCTH) [7] is an image histogram that combines texture and color information. It is widely used in content based image retrieval (CBIR) systems, e.g., [8, 18]. The FCTH of an image can be considered a vector with 192 dimensions, and the distance between the FCTH vectors of two images can be used to determine their similarity.

Eigenface [27] is a very well studied, effective yet simple technique for face recognition using static 2D face images. In summary, face images are considered points in a high dimensional space. An eigenspace consisting of a few significant eigenvectors is computed to approximate the faces in a training face dataset. Next, test face images are projected into the computed eigenspace. The distances between a test face image and all training face images are computed. If any distance is below a pre-determined threshold, then those faces are considered a match for the associated test face. A detailed formal explanation of the Eigenface scheme is presented in the full version [24].
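The following sketch illustrates the projection and matching step on flattened face vectors; the array-based representation and method names are assumptions made for illustration only.

```java
// Illustrative sketch of Eigenface matching: project a (flattened, mean-subtracted)
// face image onto the top eigenfaces to obtain its weight vector, then compare it
// against stored training weight vectors by Euclidean distance and a threshold.
class EigenfaceMatcher {
    static double[] weights(double[] face, double[] meanFace, double[][] eigenfaces) {
        double[] w = new double[eigenfaces.length];
        for (int i = 0; i < eigenfaces.length; i++) {
            for (int p = 0; p < face.length; p++) {
                w[i] += eigenfaces[i][p] * (face[p] - meanFace[p]);   // projection onto eigenface i
            }
        }
        return w;
    }

    static boolean matches(double[] testWeights, double[] trainWeights, double threshold) {
        double d = 0;
        for (int i = 0; i < testWeights.length; i++) {
            double diff = testWeights[i] - trainWeights[i];
            d += diff * diff;
        }
        return Math.sqrt(d) < threshold;   // below threshold => considered a match
    }
}
```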

Locality sensitive hashing is a technique widely used to reduce dimensions. The core concept of LSH is to define a family of hash functions such that similar items fall into the same bucket with high probability. More specifically, we utilize LSH in Euclidean space and adopt the widely accepted projection-over-random-line technique described in [2]. Let r be a random projection vector, v be an input vector, o be a random number used as an offset, and w be a bucket length parameter fixed by the user. The bucket id is computed by the function \(Round(\frac{v . r + o}{ w })\). Finally, several such projection vectors are used to generate several bucket ids for a single input vector. In this setting, nearby items will share at least one bucket with very high probability.
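A small sketch of this bucket computation follows; the Gaussian projection vectors and the class layout are illustrative choices consistent with the scheme of [2], not code from our implementation.

```java
import java.util.Random;

// Sketch of the Euclidean LSH described above: bucketId = round((v . r + o) / w)
// for each of several random projection vectors; the resulting bucket ids are
// later hashed into search signatures. Parameter names follow the text.
class EuclideanLsh {
    final double[][] r;      // one random projection vector per hash function
    final double[] o;        // random offsets
    final double w;          // bucket length fixed by the user

    EuclideanLsh(int numHashes, int dim, double w, long seed) {
        Random rnd = new Random(seed);
        this.w = w;
        r = new double[numHashes][dim];
        o = new double[numHashes];
        for (int i = 0; i < numHashes; i++) {
            for (int d = 0; d < dim; d++) r[i][d] = rnd.nextGaussian();
            o[i] = rnd.nextDouble() * w;
        }
    }

    long[] bucketIds(double[] v) {
        long[] ids = new long[r.length];
        for (int i = 0; i < r.length; i++) {
            double dot = 0;
            for (int d = 0; d < v.length; d++) dot += v[d] * r[i][d];
            ids[i] = Math.round((dot + o[i]) / w);
        }
        return ids;
    }
}
```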

Range query to exact query conversion. We adopt the range query mechanism described in [10]. Let a be a discrete feature with values ranging from 0 to \(2^{t}-1\), meaning it requires t bits to represent in binary. We first create a binary tree of depth t representing the complete range. Each leaf node (at depth t) represents an element of the range, and we label every left edge as 0 and every right edge as 1. So the path from the root to a leaf node represents the binary encoding of that leaf. In the transform phase, we convert an input value of the range into t feature-value tuples, where the feature is the concatenation of the field name and depth i, and the value is the binary encoding of the inner node at depth i. During the query phase, given a range, we first find the cover as described in [10], create the corresponding search signatures, and perform the query.
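The transform-side encoding can be sketched as follows; the field-name/depth concatenation format is an illustrative choice and not the exact encoding of [10].

```java
import java.util.*;

// Sketch of the range-to-exact conversion described above: the path from the root of
// a depth-t binary tree to the leaf for x is the binary encoding of x, so we emit one
// (fieldName|depth, prefix) tuple per inner node on that path.
class RangeEncoder {
    static List<Map.Entry<String, String>> encode(String fieldName, long x, int t) {
        List<Map.Entry<String, String>> tuples = new ArrayList<>();
        String bits = String.format("%" + t + "s", Long.toBinaryString(x)).replace(' ', '0');
        for (int depth = 1; depth <= t; depth++) {
            tuples.add(new AbstractMap.SimpleEntry<>(
                    fieldName + "|" + depth, bits.substring(0, depth)));
        }
        return tuples;
    }
}
```

For example, with the date feature of Sect. 7.1, x would be the number of days since the Unix epoch and t = 20.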

7.1 ETLQP for Image Storage

To build an application using the ETLQP framework described in Sect. 4, a programmer has to define proper extract and transformation functions; the load, query, and post-process phases remain the same. For our image storage software we consider four features: location (where the picture was taken), time (when the picture was taken), texture and color (for searching similar pictures), and faces (for face recognition). In our implemented system, queries on the first two features are equality searches and queries on the latter two are similarity searches. Similarity searches are difficult to perform since the result contains not only exact matches but also results that are similar, so we need a similarity measure for the feature in question. To accomplish such similarity queries we utilize LSH, which essentially helps us convert the query into a sequence of equality searches. In addition, LSH results can contain false positives, which require extra post-processing to remove.

Extract. Location and time data are extracted from Exif metadata. Exif is a very popular standard for attaching metadata to images, used by all popular camera manufacturers. Cameras with a Global Positioning System (GPS) module can store the longitude and latitude of a picture in the Exif data, which can be extracted easily using available libraries. We use FCTH for similarity analysis and use an open source implementation of the FCTH analyzer [18]. Finally, for face recognition using Eigenface, we extract frontal faces from images using a Haar cascade frontal face pattern classifier [28].

Transform. Now we define appropriate transformations for the extracted features. The main idea behind the definition of the transformation functions is to make querying easier later on, so the definitions are mainly guided by the query demands.

  •  Location. Location information in terms of longitude and latitude is difficult to use in practice. We use OpenStreetMap's reverse geolocation service to determine the address corresponding to the latitude and longitude associated with the image. To make queries easier later, we generate search signatures for six sub-features of the address: full address, city, county, country, state, and zip.

  •  Time. Similarly, we break the creation date of an image into five sub-features: complete date, year, month, day of month, and day of week. We generate search signatures based on these sub-features. In addition, to support range queries based on date, we convert the time into a Unix timestamp, which represents the seconds elapsed since \(1\ January\ 1970\) without considering leap seconds. We then divide the timestamp by the number of seconds in a day (86400), which gives us the number of days passed since the epoch. Finally, we build the range query binary tree with depth 20, which is capable of covering dates up to the year 4840. Then we create the feature-value list as described earlier.

  •  Texture and Color. In the extract phase we extracted the FCTH of the provided image, which is a 192-dimensional vector. We could treat each dimension as a different sub-feature, but that would make it difficult to perform similarity search later on. Instead, we define a Euclidean LSH scheme that puts nearby elements into the same bucket and use the bucket ids to generate search signatures.

  •  Face. We build an Eigenface scheme with the extracted face images. Again, to preserve similarity, we build a Euclidean LSH scheme over the weight vectors of the faces and store the eigenspace related information in the synchronized cache \(\mathcal {C}\). In particular, we store the average face, the selected top eigenfaces, and the weights of all faces. Storing such information is the major reason for defining the cache \(\mathcal {C}\).

Query and Post-Process. With the previously defined extract and transform functions, the client can perform time queries, such as finding images taken in a specific year, month, day of week, day of month, or within a range of dates. The client can also perform location queries, such as finding pictures taken in a given country, state, or city. In both of these cases, we transform a query into encrypted search signatures and retrieve the associated encrypted document ids from the cloud storage server. Finally, we decrypt and display the result directly to the user. On the other hand, for face recognition and image-similarity queries, we extract the appropriate feature values from a query image and transform these values into LSH bucket ids of the previously defined LSH schemes. We generate the encrypted search signatures, retrieve the encrypted document ids, and decrypt the result as for date and time queries. However, before showing the results to the user, we remove the false positive results introduced by the LSH scheme.

8 Experimental Evaluation

Setup. Our proposed design has two components: a client and a server. The client processes images, performs cryptographic operations, and produces the encrypted inverted index that is stored on the server. In the query phase, the client retrieves a partial index from the server based on the user query.

The ETLQP client is written in Java using several other libraries for image feature extraction. Cryptographic operations are performed using the Java Cryptography Extension (JCE) implementation. During our experimentation, we execute the client program on a computer with an Intel(R) Core(TM) i7-4770 3.40 GHz CPU and 16 GB RAM running Ubuntu 14.04.4 LTS. Our implemented client can store the encrypted inverted index on different types of servers.

- File storage server in a local network. We developed a very simple web based storage service that has two endpoints: file read and file write. Our server is written in Python (v2.7.6) using the Flask (v0.10.1) microframework, and files are stored in MongoDB (v3.2.0). We deployed our local storage server on a machine with an Intel(R) Xeon(R) E5420 2.50 GHz CPU and 30 GB of RAM running CentOS 6.4. In addition, our client computer is in the same network.

- Amazon S3 is a very popular commercial object and file storage system, which provides an easy-to-use representational state transfer (REST) application programming interface (API) for storing, retrieving, and managing arbitrary binary data or files. Amazon also provides a very extensive software development kit (SDK) for building applications that utilize its services. In our implementation, the search signatures of the encrypted inverted index \(\mathcal {E}\) are the keys of S3 objects and the contents of the objects are the associated encrypted document id lists.

- Personal file storage services. In our implementation the client is capable of using popular commercial file storage services, for example Dropbox, Box, and Google Drive. However, due to the rate limits of these services we could not perform extensive analysis. Details are presented in the full version [24].

Dataset. We randomly selected 20109 images from the Yahoo Flickr Creative Commons 100 Million dataset (YFCC100M) [26], which contains basic information about 100 million media objects. The size of this random dataset is 42.3 GB, the average file size is 2.15 MB, the number of faces detected is 7027, and 4102 images have latitude and longitude embedded in Exif data.

Fig. 2. Index and cache sizes.

Experiments. We measure the performance of the different phases of our framework for varying numbers of randomly selected images from the above dataset. The horizontal axis of most of the reported graphs is the number of randomly selected images used to build the index, and the vertical axis is the observed quantity. We repeat each experiment at least 3 times and report the average observed value. We extracted the four image features discussed in Sect. 7. Figure 2 illustrates the size growth of the unencrypted inverted index, the encrypted inverted index, and the synchronized cache. The growth is linear, which implies the index size increase is proportional to the number of files added. Moreover, in our experiment we observed that for 20000 images the encrypted inverted index size is only 7.05 MB, which is about the size of four average images in our dataset. So the size overhead of our proposed system is very low.

Fig. 3. Time required in different phases of building and uploading the index for different numbers of images.

Fig. 4. Time required for different types of queries vs. number of files.

We also observe that feature extraction is the most time consuming phase of our system. Figure 3(a) illustrates the time required for extracting features. We observe that face detection and extraction time is the dominating factor in this phase. It requires 464.54 min to detect and extract faces from 20000 images in a sequential manner, averaging about 1.39 s per image. In addition, extracting the other three features takes 85.87 min for 20000 images, averaging 0.26 s per image. Even though this looks like a long time for a large number of images, the time required per image is very small. Furthermore, these experiments are done sequentially; a multi-threaded implementation would certainly reduce the overall time. In addition, in this prototype we implemented a separate program to call the native OpenCV API to detect faces and communicate the results back to the main process, which added extra overhead. In contrast, the transform phase is one of the fastest phases in our implementation. Here, the extracted feature values are transformed into an inverted index of search signatures and document ids. We observe that the growth is almost linear, and for 20000 images it only requires 696 ms, as shown in Fig. 3(b).

Next we encrypt and load the inverted index into a cloud storage server. In our experiments, we load the encrypted index into (1) a local server and (2) Amazon S3. Figure 3(c) shows the time required for encrypting and loading the inverted index into the local and Amazon S3 servers. For 20000 images it requires 20.52 s to encrypt and load the entire inverted index, split into 1 MB blocks, into the local storage server, and 5.65 min to complete in the Amazon S3 server. Furthermore, the time growth is linear due to the linear growth of the index size.

Once the encrypted inverted index is uploaded to the storage server, we perform queries with the different extracted features. In each case we randomly select five values of the respective feature, perform the query, and report the average. Figure 4 illustrates the performance of the location, date, FCTH, and face queries. Among the location queries we observe that query by full address is the fastest, query by state takes the longest to finish, and query by city performs in between. This is because the time required to finish a query is proportional to the number of blocks fetched and processed: very few images are likely to have the same full address, whereas more images are likely to share a common state or city. Among the date queries, the range query and the query by year-month-date (YMD) combination are the fastest for a similar reason. Finally, FCTH and face queries require the longest time due to their long extract, transform, and post-processing steps. We keep our discussion short here because of space limitations; a detailed analysis of these results and a few more experimental results are presented in the full version [24].

9 Conclusion

In this study, we addressed the problem of searchable encryption with a simple server that can support complex queries on multimedia data types. We made several contributions, including an extensible general framework with a security proof and its implementation. Our extract, transform, load, query, and post-process (ETLQP) framework can build efficient searchable encryption schemes for complex data types (e.g., images). With this framework we can perform very sophisticated queries, such as face recognition, without needing cryptographic computational support from the server. Our implementation shows small overhead for building the encrypted search index and performing such complex queries. In addition, we also show that the overhead of general cryptographic operations is negligible compared to the other necessary operations of a cloud based file storage system.