# Secure Database Using Order-Preserving Encryption Scheme Based on Arithmetic Coding and Noise Function

## Abstract

Order-preserving symmetric encryption (OPE) is a deterministic encryption scheme whose encryption function preserves the numerical order of the plaintexts. This allows comparison operations to be applied directly to encrypted data when, for example, decryption takes too much time or the cryptographic key is unknown. For this reason it is used in cloud databases, where range queries can be performed efficiently on the encrypted data. This paper presents an order-preserving encryption scheme based on arithmetic coding. In the first part we review the principles of arithmetic coding, which form the basis of the algorithm, as well as the changes that were made. Then we describe the noise function approach, which makes the algorithm cryptographically stronger, and show the modifications that can be made to obtain an order-preserving hash function. Finally, we analyze the resulting vulnerability to chosen-plaintext attack.

## Keywords

Cloud computing security, Order-preserving encryption, Symmetric-key cryptosystems, Order-preserving hash functions

## 1 Introduction

The amount of information stored in databases is steadily increasing. To store and manage large amounts of data effectively, an organization must increase its storage capacity and allocate funds for its administration. An alternative chosen by many companies is to hand database management over to a third party. Such a service is managed by a cloud operator and is called Database as a Service (DBaaS).

This approach has its own flaws, the most important of which is security. Data can be stolen by the service provider itself or by someone else from its storage. This problem can be solved by encryption. However, if we simply encrypt the whole database with a conventional encryption algorithm, we have to decrypt and re-encrypt it each time we need something, so all the advantages are lost. That is why special encryption schemes, such as homomorphic encryption and order-preserving encryption, have been developed. The first allows us to process encrypted data, and the second allows us to sort encrypted data and select the desired records.

All known order-preserving schemes have significant problems, such as a low level of security (polynomial monotonic functions [1], spline approximation [2], linear functions with random noise [3]), low performance (summation of random numbers [4], B-trees [5]) or the need to process very large numbers (the scheme by Boldyreva [6]). The proposed scheme does not have these disadvantages and, furthermore, unlike all the others, can be used to encrypt real numbers. It can also be used to obtain an order-preserving hash function.

This algorithm combines the two main ideas that the majority of OPE schemes rely on: the design of monotonic functions and elements of coding theory (implicit design of monotonic functions). The scheme is described as being based on arithmetic coding and a noise function, but this article considers only the case of a binary alphabet. In theory, nothing prevents the use of an arbitrary alphabet.

First, let us define order-preserving encryption. Assume there are two sets A and B with an order relation \( < \). A function \( {\text{f}}: {\text{A}} \to {\text{B}} \) is strictly increasing if \( \forall {\text{x}},{\text{y}} \in {\text{A}}: {\text{x}} < {\text{y}} \Leftrightarrow {\text{f}}\left( {\text{x}} \right) < {\text{f}}\left( {\text{y}} \right) \). Order-preserving encryption is a deterministic symmetric encryption based on a strictly increasing function.
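The definition can be checked mechanically on a finite domain. The following is a minimal sketch; the helper name `is_order_preserving` and the sample functions are illustrative assumptions, not part of the scheme.

```python
from itertools import permutations


def is_order_preserving(f, domain):
    """Return True if x < y <=> f(x) < f(y) holds for every ordered pair.

    Both orderings of each pair are tested, so a function that maps two
    different inputs to the same output is correctly rejected.
    """
    return all((x < y) == (f(x) < f(y)) for x, y in permutations(domain, 2))


# A strictly increasing map passes; a map with collisions does not.
print(is_order_preserving(lambda x: 3 * x + 7, range(16)))
print(is_order_preserving(lambda x: x // 2, range(16)))
```

The equivalence (rather than a one-way implication) is what rules out collisions: a strictly increasing function is necessarily injective.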

The described order-preserving encryption scheme was developed in the Laboratory of Modern Computer Technologies of the Novosibirsk State University Research Department as a part of the “Protected Database” project^{1} and is based on arithmetic coding and a noise function. Let us consider them in detail.

## 2 Splitting Procedure of Arithmetic Coding

Suppose \( \upgamma = \frac{\text{p}}{{\text{p}} + {\text{q}}}, \upmu = \frac{\text{q}}{{\text{p}} + {\text{q}}} \), where \( {\text{p}},{\text{q}} \) are random natural numbers. Obviously, \( \upgamma + \upmu = 1 \). Let us split the interval \( \left[ 0, 1 \right) \) into two parts \( \left[ 0, \frac{\text{p}}{{\text{p}} + {\text{q}}} \right), \left[ \frac{\text{p}}{{\text{p}} + {\text{q}}}, 1 \right) \). If \( {\text{G}}\left( \frac{\text{p}}{{\text{p}} + {\text{q}}} \right) > 0 \), the interval \( \left[ 0, \frac{\text{p}}{{\text{p}} + {\text{q}}} \right) \) is selected, and the output bit is 0 (\( \upbeta_{1} = 0 \)). If \( {\text{G}}\left( \frac{\text{p}}{{\text{p}} + {\text{q}}} \right) < 0 \), the interval \( \left[ \frac{\text{p}}{{\text{p}} + {\text{q}}}, 1 \right) \) is selected, and \( \upbeta_{1} = 1 \). Let us denote by \( \left[ {\text{a}}_{1}, {\text{b}}_{1} \right) \) the interval that was selected.

This interval is again split into parts in the ratio \( \upgamma:\upmu \). According to the sign of the function \( {\text{G}}({\text{x}}) \) at the splitting point, one of the segments is selected. Proceeding by induction, the interval \( \left[ {\text{a}}_{\text{k}}, {\text{b}}_{\text{k}} \right) \) can be calculated for any \( {\text{k}} \). Its length is \( \upgamma^{\text{r}}\upmu^{{\text{k}} - {\text{r}}} \), where \( {\text{r}} \) is the number of zeros in the string \( \upbeta \). If \( \forall {\text{r}}: \upgamma^{\text{r}}\upmu^{{\text{k}} - {\text{r}}} < \frac{1}{2^{\text{n}}} \), the interval is shorter than the spacing between admissible values, so \( {\text{s}} \in \left[ {\text{a}}_{\text{k}}, {\text{b}}_{\text{k}} \right) \) and \( {\text{c}} = 2^{\text{n}} {\text{s}} \) are uniquely defined by \( \upbeta = (\upbeta_{1}, \ldots, \upbeta_{\text{k}}) \). This mapping also preserves order.
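The splitting procedure can be sketched in a few lines. The example below is an illustration under assumptions: the function name `interval_for_bits` is hypothetical, the bits are supplied directly instead of being derived from the sign of G, and a single fixed ratio p:q is used. Each 0-bit shrinks the interval by the factor γ and each 1-bit by the factor μ.

```python
from fractions import Fraction
from itertools import product


def interval_for_bits(bits, p, q):
    """Return the interval [a, b) reached after following `bits`.

    At each step the current interval is split in the ratio p:q;
    bit 0 keeps the left part, bit 1 keeps the right part.
    """
    gamma = Fraction(p, p + q)
    a, b = Fraction(0), Fraction(1)
    for bit in bits:
        x = a + (b - a) * gamma  # splitting point of the current interval
        if bit == 0:
            b = x
        else:
            a = x
    return a, b


# Bit strings in lexicographic order map to adjacent intervals in increasing order.
for bits in product((0, 1), repeat=3):
    a, b = interval_for_bits(bits, 1, 3)
    print(bits, a, b, b - a)
```

Exact rational arithmetic (`Fraction`) is used so that interval lengths can be compared with the product formula without rounding effects.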

The generalization used in adaptive arithmetic coding, as well as in the proposed algorithm, is that a different ratio may be used at each step. This allows us to achieve stronger security of the encryption.

## 3 Noise Function

It is known that the composition of two strictly increasing functions is strictly increasing. Therefore, to strengthen the security of the cryptographic algorithm, a special random strictly increasing function is used in addition to the splitting procedure. In fact, we use the inverse of the function that was generated.
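The composition property follows in one line. For strictly increasing functions \( {\text{g}} \) and \( {\text{h}} \) (symbols used here only for illustration):

$$ {\text{x}} < {\text{y}} \;\Rightarrow\; {\text{g}}({\text{x}}) < {\text{g}}({\text{y}}) \;\Rightarrow\; {\text{h}}({\text{g}}({\text{x}})) < {\text{h}}({\text{g}}({\text{y}})) $$

The inverse of a strictly increasing function is also strictly increasing: if \( {\text{u}} < {\text{v}} \) but \( {\text{g}}^{-1}({\text{u}}) \ge {\text{g}}^{-1}({\text{v}}) \), applying \( {\text{g}} \) to both sides would give \( {\text{u}} \ge {\text{v}} \), a contradiction.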

It was proved [6] that OPE schemes cannot satisfy the standard notions of security, such as indistinguishability against chosen-plaintext attack (IND-CPA) [7], since they leak the ordering information of the plaintexts. If an adversary knows plaintexts \( {\text{p}}_{1} , {\text{p}}_{2} \), the corresponding ciphertexts \( {\text{c}}_{1} , {\text{c}}_{2} \), and a ciphertext \( {\text{c}} \) such that \( {\text{c}}_{1} < {\text{c}} < {\text{c}}_{2} \), it is obvious that the plaintext for \( {\text{c}} \) lies in the interval \( ({\text{p}}_{1} , {\text{p}}_{2} ) \). In addition, the adversary can always approximate the decryption function, for instance, using linear interpolation.
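The interpolation attack can be illustrated with a short sketch. The toy cipher `toy_encrypt` below is a stand-in strictly increasing function chosen only for the demonstration, not the scheme of this paper, and `interpolate_plaintext` is a hypothetical helper name.

```python
from bisect import bisect_left


def toy_encrypt(p):
    """Stand-in order-preserving map (assumption): strictly increasing for p >= 0."""
    return p * p + 3 * p


def interpolate_plaintext(known, c):
    """Estimate the plaintext of ciphertext `c` by linear interpolation.

    `known` is a list of (ciphertext, plaintext) pairs available to the adversary.
    """
    pairs = sorted(known)
    ciphertexts = [ci for ci, _ in pairs]
    i = bisect_left(ciphertexts, c)
    if i < len(pairs) and ciphertexts[i] == c:
        return float(pairs[i][1])  # exact match among the known pairs
    if i == 0 or i == len(pairs):
        raise ValueError("ciphertext lies outside the known range")
    (c1, p1), (c2, p2) = pairs[i - 1], pairs[i]
    return p1 + (p2 - p1) * (c - c1) / (c2 - c1)


known = [(toy_encrypt(10), 10), (toy_encrypt(20), 20)]
estimate = interpolate_plaintext(known, toy_encrypt(15))
print(estimate)  # lies strictly between 10 and 20
```

With only two known pairs the estimate is already bracketed by the neighbouring plaintexts; more known pairs narrow the bracket further.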

Thus, the adversary can obtain \( \left( {\text{a}}_{0} , \ldots , {\text{a}}_{\text{n}} \right) \) and, correspondingly, the encryption function \( {\text{f}}({\text{x}}) \).

## 4 Cryptographic Scheme

### 4.1 Key Generation

As the private key of the encryption algorithm we consider the noise function \( {\text{f}}\left( {\text{x}} \right) = \int_{\text{c}}^{\text{x}} {\left( {{\text{a}}_{0} + {\text{a}}_{1} {\text{t}} + {\text{a}}_{2} {\text{t}}^{2} } \right)({\text{a}}_{3} + {\text{a}}_{4} \sin \left( {{\text{a}}_{5} + {\text{a}}_{6} {\text{t}}} \right) + {\text{a}}_{7} \cos ({\text{a}}_{8} + {\text{a}}_{9} {\text{t}})){\text{dt}}} \) and a set of ratios \( \left( {{\text{p}}_{\text{i}} , {\text{q}}_{\text{i}} } \right) \).

- 1. Generate random ratios \( {\text{p}}_{\text{i}} ,{\text{q}}_{\text{i}} \).

- 2. Check the condition

$$ \prod \limits_{\text{i}} \frac{\max \left( {\text{p}}_{\text{i}} ,{\text{q}}_{\text{i}} \right)}{{\text{p}}_{\text{i}} + {\text{q}}_{\text{i}}} \, {\text{f}}_{\max}^{\,'} \left( {\text{x}} \right) < \frac{1}{2^{\text{n}}} $$

If this condition is satisfied, go to step 3; otherwise go back to step 1.

- 3. Output the set of ratios \( \left( {{\text{p}}_{1} ,{\text{q}}_{1} } \right),\left( {{\text{p}}_{2} ,{\text{q}}_{2} } \right), \ldots ,\left( {{\text{p}}_{\text{k}} ,{\text{q}}_{\text{k}} } \right). \)

The key is the set \( {\text{K}} = [\left( {{\text{a}}_{0} , \ldots ,{\text{a}}_{9} } \right),\left( {{\text{p}}_{1} ,{\text{q}}_{1} } \right),\left( {{\text{p}}_{2} ,{\text{q}}_{2} } \right), \ldots ,\left( {{\text{p}}_{\text{k}} ,{\text{q}}_{\text{k}} } \right)]. \)
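The key generation steps can be sketched as follows. Several points are assumptions made for this illustration and are not stated in the source: the coefficient ranges, the sufficient positivity condition (a_0 > 0, a_1, a_2 ≥ 0, a_3 > |a_4| + |a_7|, with c = 0), the reading of "go back to step 1" as "append one more random ratio", and the use of a simple upper bound on the integrand in place of f'_max. All function names are hypothetical.

```python
import math
import random
from fractions import Fraction


def generate_noise_coefficients(rng):
    """Draw (a_0..a_9) so that the integrand is positive for every t >= 0."""
    a0 = rng.uniform(0.5, 2.0)
    a1, a2 = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    a4, a7 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    a3 = abs(a4) + abs(a7) + rng.uniform(0.1, 1.0)  # keeps the trig factor positive
    a5, a6, a8, a9 = (rng.uniform(-3.0, 3.0) for _ in range(4))
    return (a0, a1, a2, a3, a4, a5, a6, a7, a8, a9)


def integrand(a, t):
    """f'(t): the polynomial factor times the trigonometric factor."""
    poly = a[0] + a[1] * t + a[2] * t * t
    trig = a[3] + a[4] * math.sin(a[5] + a[6] * t) + a[7] * math.cos(a[8] + a[9] * t)
    return poly * trig


def derivative_bound(a, x_max):
    """Upper bound on f'(t) over [0, x_max], used here in place of f'_max."""
    return (a[0] + a[1] * x_max + a[2] * x_max ** 2) * (a[3] + abs(a[4]) + abs(a[7]))


def generate_ratios(n_bits, f_prime_max, rng, max_part=16):
    """Append random (p, q) pairs until prod max(p,q)/(p+q) * f'_max < 2^-n."""
    ratios = []
    bound = Fraction(f_prime_max)
    target = Fraction(1, 2 ** n_bits)
    while bound >= target:
        p, q = rng.randint(1, max_part), rng.randint(1, max_part)
        ratios.append((p, q))
        bound *= Fraction(max(p, q), p + q)
    return ratios


rng = random.Random(2024)
coeffs = generate_noise_coefficients(rng)
ratios = generate_ratios(8, derivative_bound(coeffs, 10.0), rng)
print(len(ratios), "ratios generated")
```

Because every factor max(p, q)/(p + q) is strictly below 1, the loop always terminates; the number of ratios k therefore grows with n and with the derivative bound.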

### 4.2 Encryption

Assume we need to encrypt an n-bit integer s with the key \( {\text{K}} = [{\text{f}}({\text{x}}),\left( {{\text{p}}_{1} ,{\text{q}}_{1} } \right),\left( {{\text{p}}_{2} ,{\text{q}}_{2} } \right), \ldots ,\left( {{\text{p}}_{\text{k}} ,{\text{q}}_{\text{k}} } \right)] \), where \( {\text{f}}({\text{x}}) \) is a noise function with \( {\text{f}}\left( {{\text{a}}_{0} } \right) = 0 \) and \( {\text{f}}({\text{b}}_{0} ) = 2^{\text{n}} \), and \( ({\text{p}}_{\text{i}} ,{\text{q}}_{\text{i}} ) \) is a set of ratios. Consider the \( {\text{i}} \)-th iteration of the algorithm.

If \( {\text{f}}({\text{x}}) > s \), then \( \upbeta_{\text{i}} = 0 \), \( {\text{a}}_{\text{i}} = {\text{a}}_{{{\text{i}} - 1}} , {\text{b}}_{\text{i}} = {\text{x}} \). Otherwise, \( \upbeta_{\text{i}} = 1, {\text{a}}_{\text{i}} = {\text{x}}, {\text{b}}_{\text{i}} = {\text{b}}_{{{\text{i}} - 1}} \).

Notice that \( \forall {\text{i}},{\text{f}}^{ - 1} ({\text{s}}) \in \left[ {\left. {{\text{a}}_{\text{i}} ,{\text{b}}_{\text{i}} } \right)} \right. \) according to the selection of \( {\text{a}}_{\text{i}} \) and \( {\text{b}}_{\text{i}} \). After performing k iterations (where k is the size of the key, i.e. the number of ratios), we obtain the bit sequence \( \upbeta = \left( {\upbeta_{1} , \ldots ,\upbeta_{\text{k}} } \right),\upbeta_{\text{i}} \in \left\{ {0,1} \right\} \), which is the ciphertext for \( {\text{s}} \).
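The encryption loop can be sketched as below. This is an illustration under stated assumptions: the splitting point is taken as x = a_{i-1} + (b_{i-1} - a_{i-1}) p_i/(p_i + q_i), following the splitting procedure of Sect. 2; the noise function is replaced by a simple closed-form stand-in `noise(x) = 2^n (x + x^2)/2` on [0, 1] (so a_0 = 0, b_0 = 1) instead of the integral from the key; the names `noise`, `encrypt`, `N_BITS` and `RATIOS` are hypothetical.

```python
from fractions import Fraction

N_BITS = 4  # plaintexts are integers in [0, 2**N_BITS)

# Illustrative ratios; enough of them that distinct plaintexts get distinct ciphertexts.
RATIOS = [(1, 2), (3, 1), (2, 3), (1, 1)] * 3


def noise(x):
    """Stand-in strictly increasing noise function with noise(0)=0, noise(1)=2^n."""
    return (2 ** N_BITS) * (x + x * x) / 2


def encrypt(s, ratios=RATIOS):
    """Return the ciphertext bits (beta_1..beta_k) for the integer s."""
    a, b = Fraction(0), Fraction(1)
    bits = []
    for p, q in ratios:
        x = a + (b - a) * Fraction(p, p + q)  # splitting point (assumed form)
        if noise(x) > s:
            bits.append(0)
            b = x
        else:
            bits.append(1)
            a = x
    return tuple(bits)


ciphertexts = [encrypt(s) for s in range(2 ** N_BITS)]
print(ciphertexts == sorted(ciphertexts))  # bit strings compare lexicographically
```

Bit tuples are compared lexicographically, which coincides with numeric comparison of equal-length bit strings, so sorting ciphertexts sorts the plaintexts.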

### 4.3 Decryption

Suppose there is a bit sequence \( \upbeta = \left( {\upbeta_{1} , \ldots ,\upbeta_{\text{k}} } \right),\upbeta_{\text{i}} \in \left\{ {0,1} \right\} \), which is the ciphertext for \( {\text{s}} \), encrypted with some key \( {\text{K}} \). Let us consider the i-th iteration of the algorithm.

If \( \upbeta_{\text{i}} = 0 \), then \( {\text{a}}_{\text{i}} = {\text{a}}_{{{\text{i}} - 1}} , {\text{b}}_{\text{i}} = {\text{x}} \). Otherwise, \( {\text{a}}_{\text{i}} = {\text{x}} \), \( {\text{b}}_{\text{i}} = {\text{b}}_{{{\text{i}} - 1}} \).
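A round trip through encryption and decryption can be sketched as follows. The final recovery step is an assumption of this sketch: since encryption maintains f(a_i) ≤ s < f(b_i), and the key condition makes f(b_k) − f(a_k) smaller than 1, the plaintext is taken as the only integer in [f(a_k), f(b_k)), i.e. the ceiling of f(a_k). The stand-in noise function, the splitting-point formula and all names are likewise illustrative assumptions.

```python
import math
from fractions import Fraction

N_BITS = 4
RATIOS = [(1, 2), (3, 1), (2, 3), (1, 1)] * 3  # illustrative key ratios


def noise(x):
    """Stand-in strictly increasing noise function on [0, 1]."""
    return (2 ** N_BITS) * (x + x * x) / 2


def split_point(a, b, p, q):
    """Splitting point of [a, b) in the ratio p:q (assumed form)."""
    return a + (b - a) * Fraction(p, p + q)


def encrypt(s, ratios=RATIOS):
    a, b = Fraction(0), Fraction(1)
    bits = []
    for p, q in ratios:
        x = split_point(a, b, p, q)
        if noise(x) > s:
            bits.append(0)
            b = x
        else:
            bits.append(1)
            a = x
    return tuple(bits)


def decrypt(bits, ratios=RATIOS):
    """Replay the interval selection from the bits, then recover s."""
    a, b = Fraction(0), Fraction(1)
    for bit, (p, q) in zip(bits, ratios):
        x = split_point(a, b, p, q)
        if bit == 0:
            b = x
        else:
            a = x
    # Assumed final step: the unique integer s with noise(a) <= s < noise(b).
    return math.ceil(noise(a))


print(all(decrypt(encrypt(s)) == s for s in range(2 ** N_BITS)))
```

Decryption needs no comparisons against the plaintext: the ciphertext bits alone determine which half is kept at every step.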

## 5 Scheme Modifications

### 5.1 Application of the Scheme for Fixed-Point Arithmetic

After key generation, the number \( {\text{l}} \) cannot be modified and is a part of the key. So, the secret key \( {\text{K}} \) is now the set \( [{\text{l}},\left( {{\text{a}}_{0} , \ldots ,{\text{a}}_{9} } \right),\left( {{\text{p}}_{1} ,{\text{q}}_{1} } \right),\left( {{\text{p}}_{2} ,{\text{q}}_{2} } \right), \ldots ,\left( {{\text{p}}_{\text{k}} ,{\text{q}}_{\text{k}} } \right)] \).

### 5.2 Strictly Increasing Hash Function

This algorithm can also be modified to produce a strictly increasing hash function. It can be used, for example, in an encrypted database that stores two entities for each data item: a ciphertext obtained from a cryptographically strong algorithm and a hash value returned by the hash function. This both ensures that the data cannot be decrypted by an adversary (the first entity is secure and the second cannot be decrypted at all) and allows comparison operations to be applied to the encrypted data to some extent.

To begin, we note that the output has the same size in bits as the number of ratios \( {\text{p}}_{\text{i}} ,{\text{q}}_{\text{i}} \) in the secret key. So, in order to obtain a hash function, it is enough to change the key generation procedure or, more precisely, its ratio generation part.

Instead of checking the condition in step 2, whose satisfaction guaranteed that the data can be decrypted, we now perform step 1 (generation of a pair \( {\text{p}}_{\text{i}} ,{\text{q}}_{\text{i}} \)) a fixed number of times. This number is equal to the number of bits that the hash function returns.

- 1. Select a strictly increasing noise function \( {\text{f}}({\text{x}}) \). To do this, generate \( \left( {{\text{a}}_{0} , \ldots , {\text{a}}_{9} } \right) \) so that

$$ \left( {{\text{a}}_{0} + {\text{a}}_{1} {\text{t}} + {\text{a}}_{2} {\text{t}}^{2} } \right)({\text{a}}_{3} + {\text{a}}_{4} \sin \left( {{\text{a}}_{5} + {\text{a}}_{6} {\text{t}}} \right) + {\text{a}}_{7} \cos ({\text{a}}_{8} + {\text{a}}_{9} {\text{t}})) > 0 $$

for \( \forall {\text{t}} \in ({\text{c}}; {\text{x}}_{\max} ) \), where \( {\text{c}} \) is a fixed constant.

- 2. Generate a random set of ratios \( \left( {{\text{p}}_{1} ,{\text{q}}_{1} } \right),\left( {{\text{p}}_{2} ,{\text{q}}_{2} } \right), \ldots ,\left( {{\text{p}}_{\text{m}} ,{\text{q}}_{\text{m}} } \right) \).

- 3. The key is the set \( {\text{K}} = [\left( {{\text{a}}_{0} , \ldots ,{\text{a}}_{9} } \right),\left( {{\text{p}}_{1} ,{\text{q}}_{1} } \right),\left( {{\text{p}}_{2} ,{\text{q}}_{2} } \right), \ldots ,\left( {{\text{p}}_{\text{m}} ,{\text{q}}_{\text{m}} } \right)] \).
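The hash variant can be sketched by running the same iteration with only m ratios and no decryptability check. As before, the stand-in noise function, the splitting-point formula and the names `order_hash`, `N_BITS` and `HASH_RATIOS` are assumptions for illustration. With m < n distinct inputs necessarily collide, so the check below only verifies that the order of two inputs is never reversed.

```python
from fractions import Fraction

N_BITS = 8                                        # inputs are 8-bit integers
HASH_RATIOS = [(2, 3), (1, 1), (3, 2), (1, 2)]    # m = 4 output bits


def noise(x):
    """Stand-in strictly increasing noise function on [0, 1]."""
    return (2 ** N_BITS) * (x + x * x) / 2


def order_hash(s, ratios=HASH_RATIOS):
    """Return an m-bit hash of s as an integer (first bit most significant)."""
    a, b = Fraction(0), Fraction(1)
    h = 0
    for p, q in ratios:
        x = a + (b - a) * Fraction(p, p + q)  # splitting point (assumed form)
        if noise(x) > s:
            h = h << 1
            b = x
        else:
            h = (h << 1) | 1
            a = x
    return h


hashes = [order_hash(s) for s in range(2 ** N_BITS)]
print(hashes == sorted(hashes))  # order is never reversed
```

The output size is fixed by the number of ratios alone, independent of the input size n.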

To avoid processing very large numbers, for instance when hashing a large file, the input data can be split into parts of acceptable size and a hash can be calculated for each of them. The hash value of the whole file is then their concatenation. This approach allows us to hash data of any predetermined size.

So, there are three parameters that we can select arbitrarily depending on our purpose: \( {\text{s}}_{1} \), the size of the processed parts; \( {\text{s}}_{2} \), the hash size for each of them (\( {\text{s}}_{2} < {\text{s}}_{1} \)); and \( {\text{s}}_{3} \), the maximum file size. The final hash then has \( \frac{{\text{s}}_{2} {\text{s}}_{3}}{{\text{s}}_{1}} \) bits.

Since the encryption algorithm remains the same, the running time of the hash function depends linearly on its output size (which is equal to the number of algorithm iterations). Therefore, choosing a very large \( {\text{s}}_{2} \) is not recommended.

In order to process files smaller than the maximum size, they can be padded with zeros on the left. In this case, order is still preserved. Since this is a hash function algorithm, decryption is no longer possible.
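The size bookkeeping and the left zero-padding described above can be illustrated with a small sketch. The function names and the sample parameter values are hypothetical; the sketch assumes that s_3 is a multiple of s_1 and covers only the splitting and the size computation, not the per-part hash itself.

```python
def final_hash_bits(s1, s2, s3):
    """Size of the concatenated hash: (s3 / s1) parts of s2 bits each."""
    if s3 % s1 != 0:
        raise ValueError("s3 must be a multiple of s1 in this sketch")
    return (s3 // s1) * s2


def split_into_parts(value, s1, s3):
    """Split an integer of at most s3 bits into s3/s1 parts of s1 bits.

    Parts are returned most significant first; values shorter than s3 bits
    are implicitly padded with zeros on the left.
    """
    count = s3 // s1
    mask = (1 << s1) - 1
    return [(value >> (s1 * (count - 1 - i))) & mask for i in range(count)]


print(final_hash_bits(64, 16, 1024))      # 16 parts of 16 bits each
print(split_into_parts(0xABCD, 8, 32))    # two leading zero parts, then 0xAB, 0xCD
```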

## 6 Encryption Security

As we have seen (see Sect. 3), OPE schemes cannot satisfy the standard notions of security against chosen-plaintext attack. Different methods of cryptanalysis have been considered to define the notion of order-preserving encryption security [2, 8, 9, 10]. Generally, the security of such schemes rests on the requirement that the monotonic function underlying the scheme be indistinguishable from a truly random monotonic function. This means that only access to the private key allows accurate decryption of the data.

Let us check whether the algorithm meets this condition in practice. To do that, we encrypted all 16-bit numbers (from 0 to 65535) with the same random key and analyzed the results.
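An experiment of this kind can be sketched as follows. The original figures are not reproduced here, so the statistic computed (the gaps between ciphertexts of consecutive plaintexts) is an assumption about what was analyzed; the stand-in noise function, the splitting-point formula, the ratios and the function names are also illustrative. Floating-point arithmetic is used for speed.

```python
from collections import Counter


def make_encryptor(n_bits, ratios):
    """Build an encryption function that returns the ciphertext as an integer."""
    scale = 2 ** n_bits

    def encrypt(s):
        a, b, c = 0.0, 1.0, 0
        for p, q in ratios:
            x = a + (b - a) * p / (p + q)        # splitting point (assumed form)
            if scale * (x + x * x) / 2 > s:      # stand-in noise function
                c = c << 1
                b = x
            else:
                c = (c << 1) | 1
                a = x
        return c

    return encrypt


def ciphertext_gaps(n_bits, ratios):
    """Differences between the ciphertexts of consecutive plaintexts."""
    encrypt = make_encryptor(n_bits, ratios)
    cts = [encrypt(s) for s in range(2 ** n_bits)]
    return [c2 - c1 for c1, c2 in zip(cts, cts[1:])]


gaps = ciphertext_gaps(10, [(1, 2), (3, 1), (2, 3), (1, 1)] * 6)
print(len(gaps), "gaps,", len(Counter(gaps)), "distinct gap values")
```

Calling `ciphertext_gaps(16, ratios)` with a sufficiently long list of ratios covers the full 16-bit range mentioned above; a histogram of the returned gaps can then be plotted with any charting tool.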

The resulting chart resembles the rectangular hyperbola \( {\text{y}} = \frac{1}{\text{x}} \). This is typical for monotonic functions that were generated randomly and indicates that the maximum available security of the algorithm was achieved.

We can see that the differences are distributed very irregularly. Since this is a feature of secure encryption, we can claim that the proposed algorithm is cryptographically strong.

## Footnotes

- 1. This research is performed in Novosibirsk State University under support of Ministry of Education and Science of Russia (contract no. 02.G25.31.0054).

## References

- 1. Ozsoyoglu, G., Singer, D.A., Chung, S.S.: Anti-tamper databases: querying encrypted databases. EECS Department, Math Department, Case Western Reserve University, Cleveland, OH 44106. doi:http://dx.doi.org/10.1109/ICDEW.2006.30
- 2. Agrawal, R., Kiernan, G.G., Srikant, R., Xu, Y.: System and method for order-preserving encryption for numeric data
- 3. Kerschbaum, F.: Commutative order-preserving encryption. Karlsruhe, United States Patent 20120121080, DE, 17 May 2012
- 4. Bebek, G.: Anti-tamper database research: Inference control techniques. Technical Report EECS433 Final Report, Case Western Reserve University (2002)
- 5. Popa, R.A., Li, F.H., Zeldovich, N.: An ideal-security protocol for order-preserving encoding, MIT CSAIL (2011). doi:http://dx.doi.org/10.1109/CISS.2012.6310814
- 6. Boldyreva, A., Chenette, N., Lee, Y., O’Neill, A.: Order-preserving symmetric encryption. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 224–241. Springer, Heidelberg (2009)
- 7. Shnayder, B.: Applied Cryptology, 816 pp. Triumph, Moscow (2002)
- 8. Boldyreva, A., Chenette, N., O’Neill, A.: Order-preserving encryption revisited: improved security analysis and alternative solutions. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 578–595. Springer, Heidelberg (2011)
- 9. Martinez, S., Miret, J.M., Tomas, R., Valls, M.: Security analysis of order preserving symmetric cryptography. Appl. Math. Inf. Sci. **7**(4), 1285–1295 (2013)
- 10. Xiao, L., Bastani, O., Yen, I.: Security analysis for order preserving encryption schemes. Technical Report UTDCS-01-12 (2012). https://utd.edu/~ilyen/techrep/OPE-proof1.pdf