
1 Introduction

1.1 Simulating Correlated Information

Informal Problem Statement. Let \((X,Z)\in \mathcal {X}\times \mathcal {Z}\) be a pair of correlated random variables. We can think of Z as a randomized function of X. More precisely, consider the randomized function \(h:\mathcal {X}\rightarrow \mathcal {Z}\), which for every x outputs z with probability \(\Pr [Z=z|X=x]\). By definition it satisfies

$$\begin{aligned} (X,h(X)) \overset{d}{=} (X,Z) \end{aligned}$$
(1)

However, the function h is inefficient, as we need to hardcode the conditional probability table of Z|X (see the sketch below). It is natural to ask whether this limitation can be overcome:

Q1: Can we represent Z as an efficient function of X?
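To see what is at stake, here is a minimal Python sketch of the naive table-based simulator h from Eq. (1). The table and its numbers are illustrative, not from the paper; in general the table has \(2^n\) rows, which is exactly what makes h inefficient.

```python
import random

# Hypothetical toy table: cond_table[x][z] = Pr[Z = z | X = x].
# For X in {0,1}^n this table has 2^n rows -- the source of inefficiency.
cond_table = {
    0b00: {0: 0.9, 1: 0.1},
    0b01: {0: 0.5, 1: 0.5},
    0b10: {0: 0.2, 1: 0.8},
    0b11: {0: 0.7, 1: 0.3},
}

def h(x: int) -> int:
    """The naive simulator: output z with probability Pr[Z=z|X=x]."""
    zs, ps = zip(*cond_table[x].items())
    return random.choices(zs, weights=ps)[0]
```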

Not surprisingly, it turns out that a positive answer may be given only in computational settings. Note that replacing the equality in Eq. (1) by closeness in the total variation distance (allowing the function h to make some mistakes with small probability) is not enoughFootnote 1. This discussion leads to the following reformulated question:

Q1’: Can we efficiently simulate Z as a function of X?

Why Does It Matter? Aside from being foundational, this question is relevant to many areas of computer science. We will not discuss these applications in detail, as they are well explained in [JP14]. Below we only mention where such a generic simulator can be applied, to show that this problem is indeed well-motivated.

(a) Complexity Theory. From the simulator one can derive the Dense Model Theorem [RTTV08], Impagliazzo's hardcore lemma [Imp95] and a version of Szemerédi's Regularity Lemma [FK99].

(b) Cryptography. The simulator can be applied in settings where Z models short leakage from a secret state X. It provides tools for improving and simplifying proofs in leakage-resilient cryptography, in particular for leakage-resilient stream ciphers [JP14].

(c) Pseudorandomness. Using the simulator one can derive results called chain rules [GW11], which quantify pseudorandomness in conditioned distributions. They can also be applied to leakage-resilient cryptography.

(d) Zero-knowledge. The simulator can be applied to simulate the transcript Z of verifier-prover interactions from the common input X [CLP15].

Thus, the simulator may be used as a tool to unify, simplify and improve many results. Having briefly explained the motivation, we now turn to answering the posed question, leaving a more detailed discussion of some applications to Sect. 1.6.

1.2 Problem Statement

The problem of simulating auxiliary inputs in the computational setting can be defined precisely as follows:

Given random variables \(X\in \{0,1\}^n\) and correlated \(Z\in \{0,1\}^{\ell }\), what is the minimal complexity \(s_h\) of a (randomized) function h such that the distributions of h(X) and Z are \((\epsilon ,s)\)-indistinguishable given X, that is,

$$\begin{aligned} |{{\mathrm{\mathbb { E }}}}\textsc {D}(X,{h(X)})-{{\mathrm{\mathbb { E }}}}\textsc {D}(X,Z) | < \epsilon \end{aligned}$$

holds for all (deterministic) circuits \(\textsc {D}\) of size s?

The indistinguishability above is understood with respect to deterministic circuits. However, this doesn't really matter: for the task of distinguishing two fixed distributions, randomized and deterministic distinguishers are equally powerfulFootnote 2.

It turns out that it is relatively easyFootnote 3 to construct a simulator h with a polynomial blowup in complexity, that is when

$$\begin{aligned} s_h =\mathrm {poly}\left( s,\epsilon ^{-1}, 2^{\ell } \right) . \end{aligned}$$

However, it is more challenging to minimize the dependency on \(\epsilon ^{-1}\). This problem is especially important for cryptography, where security definitions require the advantage \(\epsilon \) to be as small as possible. Indeed, for meaningful security levels such as \(\epsilon =2^{-80}\) or at least \(\epsilon = 2^{-40}\), it makes a difference whether we lose \(\epsilon ^{-2}\) or \(\epsilon ^{-4}\). We will see later how inefficient bounds here affect the provable security of stream ciphers.
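For concreteness, compare the two losses at the 40-bit level (a back-of-the-envelope calculation, not a bound from the paper):

$$\begin{aligned} \epsilon = 2^{-40}:\qquad \epsilon ^{-2} = 2^{80} \quad \text {versus}\quad \epsilon ^{-4} = 2^{160}, \end{aligned}$$

a multiplicative gap of \(2^{80}\) in the resulting simulator complexity \(s_h\).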

1.3 Related Works

Original Work of Jetchev and Pietrzak (TCC’14).

The authors showed that Z can be “approximately” computed from X by an “efficient” function \(\mathsf {h}\).

Theorem 1

([JP14], corrected). For every distribution (X, Z) on \(\{0,1\}^n\times \{0,1\}^{\ell }\) and every \(\epsilon \), s, there exists a "simulator" \(h:\{0,1\}^n\rightarrow \{0,1\}^\ell \) such that

(a) (X, h(X)) and (X, Z) are \((\epsilon ,s)\)-indistinguishable

(b) h is of complexity \(s_{\mathsf {h}}=O\left( s\cdot 2^{4\ell }\epsilon ^{-4} \right) \)

The proof uses the standard min-max theorem. In the statement above we correct two flaws: one is a missing factor of \(2^{\ell }\); the second (and more serious) one is the factor \(\epsilon ^{-4}\), claimed incorrectly to be \(\epsilon ^{-2}\). The flaws are discussed in Appendix A.

Vadhan and Zheng (CRYPTO’13).

The authors derived a version of Theorem 1, with incomparable bounds:

Theorem 2

([VZ13]). For every distribution (X, Z) on \(\{0,1\}^n\times \{0,1\}^{\ell }\) and every \(\epsilon \), s, there exists a "simulator" \(h:\{0,1\}^n\rightarrow \{0,1\}^\ell \) such that

(a) (X, h(X)) and (X, Z) are \((s,\epsilon )\)-indistinguishable

(b) h is of complexity \(s_{\mathsf {h}}=O\left( s\cdot 2^{\ell }\epsilon ^{-2} + 2^{\ell }\epsilon ^{-4}\right) \)

The proof follows from a general regularity theorem, which is in turn based on their uniform min-max theorem. The additive loss of \(O\left( 2^\ell \epsilon ^{-4}\right) \) appears as a consequence of a sophisticated weight-updating procedure. This error term is quite large and may dominate the main term in many settings (whenever \(s \ll \epsilon ^{-2}\)).

As we show later, Theorems 1 and 2 in fact give comparable security bounds when applied to leakage-resilient stream ciphers (see Sect. 1.6).

1.4 Our Results

We reduce the dependency of the simulator complexity \(s_h\) on the advantage \(\epsilon \) from a factor of \(\epsilon ^{-4}\) to only \(\epsilon ^{-2}\).

Theorem 3

(Our Simulator). For every distribution (X, Z) on \(\{0,1\}^n\times \{0,1\}^{\ell }\) and every \(\epsilon \), s, there exists a "simulator" \(h:\{0,1\}^n\rightarrow \{0,1\}^\ell \) such that

(a) (X, h(X)) and (X, Z) are \((s,\epsilon )\)-indistinguishable

(b) h is of complexity \(s_{\mathsf {h}}=O\left( s\cdot 2^{5\ell }\log (1/\epsilon )\epsilon ^{-2}\right) \)

Below in Table 1 we compare our result to previous works.

Table 1. The complexity of simulating \(\ell \)-bit auxiliary information given required indistinguishability strength, depending on the proof technique. For simplicity, terms \(\mathrm {polylog}(1/\epsilon )\) are omitted.

Our result is slightly worse in terms of the dependency on \(\ell \), but outperforms previous results in terms of the dependency on \(\epsilon ^{-1}\); the latter dependency is more crucial for cryptographic applications. Note that the typical choice is sub-logarithmic leakage, that is, \(\ell = o\left( \log \epsilon ^{-1}\right) \) in asymptotic settingsFootnote 4 (see for example [CLP15]). Stated in non-asymptotic settings, this assumption translates to \(\ell < c\log \epsilon ^{-1}\) where c is a small constant (for example \(c= \frac{1}{12}\), see [Pie09]). In these settings, we outperform previous results.

To illustrate this, suppose we want to achieve security \(\epsilon = 2^{-60}\) while simulating just one bit of leakage from a 256-bit input. As follows from Table 1, previous bounds are useless, as they give a complexity bigger than \(2^{256}\), the worst-case complexity of boolean functions over the chosen domain. In settings like this, only our bound can be applied to conclude meaningful results. For more concrete examples of settings where only our bounds are meaningful, we refer to Table 2 in Sect. 1.6.

1.5 Our Techniques

Our approach utilizes a simple boosting technique: as long as the condition (a) in Theorem 3 fails, we can use the distinguisher to improve the simulator. This makes our algorithm constructive with respect to distinguishers obtained from an oracleFootnote 5, similarly to other boosting proofs [JP14, VZ13]. In short, if for a “candidate” solution h there exists \(\textsc {D}\) such that

$$\begin{aligned} {{\mathrm{\mathbb { E }}}}\textsc {D}(X,Z) - {{\mathrm{\mathbb { E }}}}\textsc {D}(X,h(X)) > \epsilon \end{aligned}$$

then we construct a new solution \(h'\) using \(\textsc {D}\) and h, according to the equationFootnote 6

$$\begin{aligned} \Pr [h' (x)= z] = \Pr [h(x)=z]+\gamma \cdot \mathsf {Shift}\left( \textsc {D}(x,z)\right) + \mathsf {Corr}(x,z) \end{aligned}$$

where

(a) The parameter \(\gamma \) is a fixed step size chosen in advance (its optimal value depends on \(\epsilon \) and \(\ell \) and is calculated in the proof).

(b) \( \mathsf {Shift}\left( \textsc {D}(x,z)\right) \) is a shifted version of \(\textsc {D}\), chosen so that \(\sum _{z} \mathsf {Shift}\left( \textsc {D}(x,z)\right) = 0\). This restriction corresponds to the fact that we want to preserve the constraint \(\sum _{z}h(x,z)=1\). More precisely, \(\mathsf {Shift}\left( \textsc {D}(x,z)\right) =\textsc {D}(x,z)-{{\mathrm{\mathbb { E }}}}_{z'\leftarrow U_{\ell }}\textsc {D}(x,z')\).

(c) \(\mathsf {Corr}(x,z)\) is a correction term used to fix (some of) the possibly negative weights.

The procedure is repeated in a loop. The main technical difficulty is to show that it eventually stops, after not too many iterations.

Note that in every such step the complexity cost of the shifting term is \(O\left( 2^{\ell }\cdot \mathrm {size}(\textsc {D}) \right) \)Footnote 7. The correction term, in our approach, searches over z for the biggest negative mass and redistributes it over the remaining points. Intuitively, this works because the total negative mass gets smaller with every step. See Algorithm 1 for a pseudo-code description (a short numerical illustration of the shift step appears below) and the rest of Sect. 3 for a proof.
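As a toy illustration of the shift step (hypothetical numbers, for one fixed x), note how subtracting the mean over z makes the update mass-preserving:

```python
import numpy as np

D_x = np.array([0.9, 0.2, 0.1, 0.4])   # hypothetical values D(x, z) for 4 z's
shift = D_x - D_x.mean()               # Shift(D(x, .)): [0.5, -0.2, -0.3, 0.0]
h_x = np.full(4, 0.25)                 # current simulator row h(x, .)
h_new = h_x + 0.1 * shift              # one update with step gamma = 0.1
print(shift.sum(), h_new.sum())        # 0.0 and 1.0: total mass is preserved
# Corr(x, .) is only needed when some entry of h_new becomes negative.
```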

1.6 Applications

Better Security for the EUROCRYPT'09 Stream Cipher. The first construction of a leakage-resilient stream cipher was proposed by Dziembowski and Pietrzak in [DP08]. In Fig. 1 below we present a simplified version of this cipher [Pie09], based on a weak pseudorandom function (wPRF).

Fig. 1. The EUROCRYPT'09 stream cipher (adaptive leakage). F denotes a weak pseudorandom function. By \(K_i\) and \(x_i\) we denote, respectively, the values of the secret state and the keystream bits. Leakages are denoted in gray with \(L_i\).

Jetchev and Pietrzak in [JP14] showed how to use the simulator theorem to simplify the security analysis of the EUROCRYPT'09 cipher. The security of the cipher depends on the complexity of the simulator, as explained in Theorem 1 and Remark 2. We consider the following setting:

  • number of rounds \(q=16\),

  • F instantiated with \(\mathsf {AES}256\) (as in [JP14]),

  • target cipher security \(\epsilon '=2^{-40}\),

  • \(\lambda = 3\) bits of leakage per round.

The concrete bounds for \((q,\epsilon ',s')\)-security of the cipher (which, roughly speaking, means that q consecutive outputs are \((s',\epsilon ')\)-pseudorandom; see Sect. 2 for a formal definition) are given in Table 2 below. We omit the calculations, as they amount to plugging the parameters from Theorems 1, 2 and 3 into Remark 2 and assuming that AES as a weak PRF is \((\epsilon ,s)\)-secure for any pair with \(s/\epsilon \approx 2^k\) (following a similar example in [JP14]).

Table 2. The security of the EUROCRYPT'09 stream cipher, instantiated with AES256 as a weak PRF with roughly \(k=256\) bits of security. In this setting only our new bounds are non-trivial.

More generally, we can give the following comparison of security bounds for different wPRF-based stream ciphers, in terms of the time-success ratio. The bounds in Table 3 follow from the simple lemma in Sect. 4, which shows how the time-success ratio changes under explicit reduction formulas.

Table 3. Different bounds for wPRF-based leakage-resilient stream ciphers. k is the security level of the underlying wPRF. The value \(k'\) is the security level of the cipher, understood in terms of the time-success ratio. The numbers denote: (1) the EUROCRYPT'09 cipher, (2) the CCS'10/CHES'12 cipher, (3) the CT-RSA'13 cipher.

1.7 Organization

In Sect. 2 we discuss basic notions and definitions. The proof of Theorem 3 appears in Sect. 3.

2 Preliminaries

2.1 Notation

By \(\mathbb {E}_{y\leftarrow Y} f(y)\) we denote the expectation of f(y) where y is sampled according to the distribution Y.

2.2 Basic Notions

Indistinguishability. Let \(\mathcal {V}\) be a finite set, and \(\mathcal {D}\) be a class of deterministic [0, 1]-valued functions on \(\mathcal {V}\). For any two real functions \(f_1,f_2\) on \(\mathcal {V}\), we say that \(f_1,f_2\) are \((\mathcal {D},\epsilon )\)-indistinguishable if

$$\begin{aligned} \forall \textsc {D}\in \mathcal {D}:\quad \left| \sum _{x\in \mathcal {V}} \textsc {D}(x)\cdot f_1(x)-\sum _{x\in \mathcal {V}} \textsc {D}(x)\cdot f_2(x) \right| \leqslant \epsilon \end{aligned}$$

Note that the domain \(\mathcal {V}\) depends on the context. If \(X_1,X_2\) are two probability distributions, we say that they are \((\mathcal {D},\epsilon )\)-indistinguishable if their probability mass functions are indistinguishable, that is, when

$$\begin{aligned} \left| \sum _{x\in V} \textsc {D}(x)\cdot \Pr [X_1=x]-\sum _{x\in V} \textsc {D}(x)\cdot \Pr [X_2=x] \right| \leqslant \epsilon \end{aligned}$$

for all \(\textsc {D}\in \mathcal {D}\). If \(\mathcal {D}\) consists of all circuits of size s we say that \(f_1,f_2\) are \((s,\epsilon )\)-indistinguishable.

Remark 1

This is an extended notion of indistinguishability, borrowed from [TTV09], which captures not only probability measures but also real-valued functions. A good intuition is provided by the following observation [TTV09]: if we think of functions over \(\mathcal {V}\) as \(|\mathcal {V}|\)-dimensional vectors, then \(\epsilon \geqslant | \sum _{x\in V} \textsc {D}(x)\cdot f_1(x)-\sum _{x\in V} \textsc {D}(x)\cdot f_2(x)| =| \langle f_1-f_2, \textsc {D}\rangle |\) means that \(f_1\) and \(f_2\) are nearly orthogonal to all test functions in \(\mathcal {D}\).

Distinguishers. In the definition above we consider deterministic distinguishers, as this is required by our algorithm. However, randomization doesn't help in distinguishing: any randomized distinguisher achieving advantage \(\epsilon \) when run on two fixed distributions can be converted into a deterministic distinguisher of the same size and advantage (by fixing one choice of coins). Moreover, any real-valued distinguisher can be converted, by a boolean threshold, into a boolean one with at least the same advantage [FR12].

Relative Complexity. We say that a function h has complexity at most T relative to the set of functions \(\mathcal {D}\) if there are functions \(\textsc {D}_1,\ldots ,\textsc {D}_{T}\) such that h can be computed by combining them using at most T of the following operations: (a) multiplication by a constant, (b) application of a boolean threshold function, (c) sum, (d) product.

2.3 Stream Ciphers Definitions

We start with the definition of weak pseudorandom functions, which are computationally indistinguishable from random functions when queried on random inputs and fed with a uniform secret key.

Definition 1

(Weak pseudorandom functions). A function \(\textsc {F}: \{0, 1\}^{k} \times \{0, 1\}^{n} \rightarrow \{0, 1\}^{m}\) is an \((\epsilon , s, q)\)-secure weak PRF if its outputs on q random inputs are indistinguishable from random by any distinguisher of size s, that is

$$\begin{aligned} \left| \Pr \left[ \textsc {D}\left( \left( X_i \right) _{i=1}^{q},\left( \textsc {F}(K,X_i)\right) _{i=1}^{q} \right) =1\right] - \Pr \left[ \textsc {D}\left( \left( X_i\right) _{i=1}^{q},\left( R_i\right) _{i=1}^{q}\right) =1 \right] \right| \leqslant \epsilon \end{aligned}$$

where the probability is over the choice of the random inputs \(X_i \leftarrow \{0,1\}^n\), the choice of a random key \(K \leftarrow \{0,1\}^k\), and \(R_i \leftarrow \{ 0,1\}^m\) conditioned on \(R_i = R_j\) if \(X_i = X_j\) for some \(j < i\).
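For concreteness, here is a minimal Monte Carlo sketch of this distinguishing experiment (our illustration, with byte strings in place of bit strings; F and D are whatever candidate wPRF and distinguisher one wants to test):

```python
import secrets
from typing import Callable

def wprf_advantage(F: Callable[[bytes, bytes], bytes],
                   D: Callable[[list], int],
                   k: int, n: int, m: int, q: int, trials: int) -> float:
    """Estimate the advantage from Definition 1: D gets q pairs (x_i, y_i),
    where y_i = F(key, x_i) in the real world and y_i is an independent
    random m-byte string in the ideal world (consistent on repeated x_i)."""
    real = ideal = 0
    for _ in range(trials):
        key = secrets.token_bytes(k)
        xs = [secrets.token_bytes(n) for _ in range(q)]
        real += D([(x, F(key, x)) for x in xs])
        table: dict = {}  # lazily sampled random function
        ideal += D([(x, table.setdefault(x, secrets.token_bytes(m)))
                    for x in xs])
    return abs(real - ideal) / trials
```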

Stream ciphers generate a keystream in a recursive manner. Security requires that the output stream be indistinguishable from uniformFootnote 8.

Definition 2

(Stream ciphers). A stream cipher \(\mathsf {SC} : \{0, 1\}^k \rightarrow \{0, 1\}^k \times \{0, 1\}^n\) is a function that, when initialized with a secret state \(S_0 \in \{0, 1\}^k\), produces a sequence of output blocks \(X_1, X_2, \ldots \) computed as

$$\begin{aligned} (S_i, X_i) := \mathsf {SC}(S_{i-1}). \end{aligned}$$

A stream cipher \(\mathsf {SC}\) is \((\epsilon ,s,q)\)-secure if for all \(1 \leqslant i \leqslant q\), the random variable \(X_i\) is \((s,\epsilon )\)-pseudorandom given \(X_1, \ldots , X_{i-1}\) (the probability is also over the choice of the initial random key \(S_0\)).
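A minimal sketch of this recursion (our illustration; SC is any state-update function of the right type):

```python
from typing import Callable, List, Tuple

def keystream(SC: Callable[[bytes], Tuple[bytes, bytes]],
              S0: bytes, q: int) -> List[bytes]:
    """Produce q output blocks via (S_i, X_i) := SC(S_{i-1}) (Definition 2)."""
    S, blocks = S0, []
    for _ in range(q):
        S, X = SC(S)
        blocks.append(X)
    return blocks
```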

Now we define leakage resilient stream ciphers, following the “only computation leaks” assumption.

Definition 3

(Leakage-resilient stream ciphers). A leakage-resilient stream cipher is \((\epsilon ,s,q,\lambda )\)-secure if it is \((\epsilon ,s,q)\)-secure as defined above, but where the distinguisher in the j-th round gets \(\lambda \) bits of arbitrary, adaptively chosen leakage about the secret state accessed during this round. More precisely, before \((S_j,X_j) := \mathsf {SC}(S_{j-1})\) is computed, the distinguisher can choose any leakage function \(f_j\) with range \(\{0,1\}^{\lambda }\), and then get not only \(X_j\) but also \(\Lambda _j := f_j(\hat{S}_{j-1})\), where \(\hat{S}_{j-1}\) denotes the part of the secret state that was accessed (i.e., read and/or overwritten) in the computation of \(\mathsf {SC}(S_{j-1})\).

2.4 Security of Leakage-Resilient Stream Ciphers

The best provably secure constructions of leakage-resilient stream ciphers are based on so-called weak PRFs, primitives which look random when queried on random inputs [Pie09, FPS12, JP14, DP10, YS13]. The most recent (TCC'14) analysis is based on a version of Theorem 1.

Theorem 4

(Proving Security of Stream Ciphers [JP14]). If F is an \((\epsilon _F, s_F, 2)\)-secure weak PRF, then \(\mathsf {SC}^F\) is an \((\epsilon ', s', q, \lambda )\)-secure leakage-resilient stream cipher, where

$$\begin{aligned} \epsilon ' = 4q \sqrt{\epsilon _F2^{\lambda }},\quad s' = \Theta (1) \cdot \frac{s_{F} \epsilon '^4}{2^{4\lambda }}. \end{aligned}$$

Remark 2

(The exact complexity loss). An inspection of the proof in [JP14] shows that \(s_{F}\) equals the complexity of the simulator h in Theorem 1, with circuits of size \(s'\) as distinguishers and \(\epsilon \) replaced by \(\epsilon '\).
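To see how these formulas behave numerically, the following sketch evaluates the Theorem 4 bounds. The hidden \(\Theta (1)\) constant is taken to be 1, and the sample parameters are illustrative, chosen so that \(s_F/\epsilon _F \approx 2^{256}\):

```python
import math

def theorem4_bounds(eps_F: float, s_F: float, q: int, lam: int):
    """Theorem 4: eps' = 4*q*sqrt(eps_F * 2**lam) and
    s' = Theta(1) * s_F * eps'**4 / 2**(4*lam), with Theta(1) := 1."""
    eps_prime = 4 * q * math.sqrt(eps_F * 2**lam)
    s_prime = s_F * eps_prime**4 / 2**(4 * lam)
    return eps_prime, s_prime

# Illustrative setting from Sect. 1.6: q = 16, lam = 3, and a wPRF with
# eps_F = 2**-95 and s_F = 2**161 (so s_F / eps_F = 2**256).
eps_p, s_p = theorem4_bounds(2**-95, 2**161, 16, 3)
print(f"eps' = 2^{math.log2(eps_p):.0f}, s' = 2^{math.log2(s_p):.0f}")
# Prints eps' = 2^-40, s' = 2^-11: the size bound is vacuous, consistent
# with the claim that only the new bounds are non-trivial in this setting.
```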

2.5 Time-Success Ratio

The running time (circuit size) s and success probability \(\epsilon \) of attacks (practical and theoretical) against a particular primitive or protocol may vary. For this reason Luby [LM94] introduced the time-success ratio \(\frac{t}{\epsilon }\) as a universal measure of security. This model is widely used to analyze provable security, cf. [BL13] and related works.

Definition 4

(Security by Time-Success Ratio [LM94]). A primitive P is said to be \(2^{k}\)-secure if for every adversary with time resources (circuit size in the nonuniform model) s, the success probability (advantage) in breaking P satisfies \(\epsilon < s\cdot 2^{-k}\). We also say that the time-success ratio of P is \(2^{k}\), or that it has k bits of security.

For example, \(\mathsf {AES}\) with a 256-bit random key is believed to have 256 bits of security as a weak PRFFootnote 9.
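As a sanity check, the definition is a one-liner (illustrative attack data, not from the paper):

```python
def is_2k_secure(attacks, k: int) -> bool:
    """Definition 4: every attack of size s must have advantage eps < s * 2**-k."""
    return all(eps < s * 2**-k for s, eps in attacks)

# An attack of size 2^80 with advantage 2^-180 is consistent with 256-bit
# security, since 2^-180 < 2^80 * 2^-256 = 2^-176.
print(is_2k_secure([(2**80, 2**-180)], 256))   # True
```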

3 Proof of Theorem 3

For technical convenience, we attempt to efficiently approximate the conditional probability function \(g(x,z) = \Pr [Z=z|X=x]\) rather than building the sampler directly. Once we have built an efficient approximation h(x, z), we transform it into a sampler \(h_{\mathsf {sim}}\) which outputs z with probability h(x, z) (this transformation yields only a loss of \(2^{\ell }\log (1/\epsilon )\)). We are going to prove the following fact:

For every function g on \(\mathcal {X}\times \mathcal {Z}\) which is an \(\mathcal {X}\)-conditional probability mass function over Z (that is, \(g(x,z)\geqslant 0\) for all x, z and \(\sum _{z}g(x,z)=1\) for every x), and for every class \(\mathcal {D}\) closed under complementsFootnote 10, there exists h such that

(a) h is an \(\mathcal {X}\)-conditional probability mass function over Z

(b) h is of complexity \(s_h = O(2^{4\ell }\epsilon ^{-2})\) with respect to \(\mathcal {D}\)

(c) (X, Z) and \((X, h_{\mathsf {sim}}(X))\) are indistinguishable, which in terms of g and h means

    $$\begin{aligned} \left| \sum _{z} {{\mathrm{\mathbb { E }}}}_{x\sim X} \left[ \textsc {D}(x,z)\cdot ( g(x,z)-h(x,z) ) \right] \right| \leqslant \epsilon \end{aligned}$$
    (2)

A sketch of the construction is shown in Algorithm 1. Here we would like to point out two things. First, we stress that we do not produce a strictly positive function; what our algorithm guarantees is that the total negative mass is small. We will see later that this is enough. Second, our algorithm performs essentially the same operations for every x, which is why its complexity depends only on \(|\mathcal {Z}|\).

We denote for shortness \(\overline{\textsc {D}}(x,z)=\textsc {D}(x,z)-{{\mathrm{\mathbb { E }}}}_{z'\leftarrow U_{\mathcal {Z}}}\textsc {D}(x,z')\) for any \(\textsc {D}\) (the “shift” transformation).

Algorithm 1 (pseudo-code)
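Since the pseudo-code figure is not reproduced here, the following numpy sketch is our reconstruction of the loop from Eqs. (3)-(5); it is not the authors' exact pseudo-code. The oracle find_distinguisher is an assumption standing in for the quantifier "there exists a circuit \(\textsc {D}\) of size s with advantage more than \(\epsilon \)"; the real algorithm manipulates circuits, while we use dense matrices for illustration.

```python
import numpy as np

def simulate_pmf(g: np.ndarray, find_distinguisher, gamma: float) -> np.ndarray:
    """Sketch of Algorithm 1; g is the |X| x |Z| matrix g[x,z] = Pr[Z=z|X=x].
    find_distinguisher(h) returns a [0,1]-valued |X| x |Z| matrix D whose
    advantage E_x sum_z D*(g - h) exceeds eps, or None when h is already
    (s, eps)-indistinguishable from g."""
    nX, nZ = g.shape
    h = np.full((nX, nZ), 1.0 / nZ)               # h^0: uniform for every x
    while (D := find_distinguisher(h)) is not None:
        Dbar = D - D.mean(axis=1, keepdims=True)  # shift: rows sum to 0
        ht = h + gamma * Dbar                     # tentative update
        # Correction theta^{t+1} (Eq. 4): lift the most negative point
        # mass to 0 and subtract the same total from the other |Z|-1 points.
        rows = np.arange(nX)
        zmin = ht.argmin(axis=1)
        m = np.minimum(ht[rows, zmin], 0.0)       # most negative mass (<= 0)
        theta = np.repeat(m[:, None] / (nZ - 1), nZ, axis=1)
        theta[rows, zmin] = -m
        h = ht + theta                            # rows still sum to 1
    return h   # a signed measure with small negative mass; see Claim 6
```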

Proof

Consider the functions \({h}^{t}\). Define \(\tilde{h}^{t+1}(x,z) \overset{def}{=} h^{t}(x,z)+\gamma \cdot \overline{\textsc {D}}^{t+1}(x,z)\). According to Algorithm 1, we have

$$\begin{aligned} {h}^{t+1}(x,z) = {h}^{t}(x,z) + \gamma \cdot \overline{\textsc {D}}^{t+1}(x,z) + \theta ^{t+1}(x,z) \end{aligned}$$
(3)

with the correction term \(\theta ^{t}(x,z)\) that can be computed recursively as (see Line 13 in Algorithm 1)

$$\begin{aligned} \begin{array}{rl} \theta ^{0}(x,z) & = 0 \\ \theta ^{t+1}(x,z) & = \left\{ \begin{array}{rl} -\min \left( h^{t}(x,z) + \gamma \cdot \overline{\textsc {D}}^{t+1}(x,z),0\right) , &{} \text { if } z= z_{\text {min}}^{t}(x)\\ \frac{\min \left( h^{t}(x,z_{\text {min}}^{t}(x))+\gamma \cdot \overline{\textsc {D}}^{t+1}(x,z_{\text {min}}^{t}(x)),0\right) }{\#\mathcal {Z}-1}, &{} \text { if }z \not = z_{\text {min}}^{t}(x) \end{array} \right. \quad t=0,1,\ldots \end{array} \end{aligned}$$
(4)

where \(z_{\text {min}}^{t}(x)\) is one of the points z minimizing \( h^{t}(x,z)+\gamma \cdot \overline{\textsc {D}}^{t+1}(x,z) \) (chosen and fixed for every t). In particular,

$$\begin{aligned} h^{t}(x,z_{\text {min}}^{t}(x)))+\gamma \cdot \overline{\textsc {D}}^{t+1}(x,z_{\text {min}}^{t}(x))< 0 \Longleftrightarrow \exists z: \ h^{t}(x,z)+\gamma \cdot \overline{\textsc {D}}^{t+1}(x,z)<0 \end{aligned}$$
(5)

Notation: for notational convenience, we identify the functions \(\textsc {D}^{t}(x,z)\), \(\overline{\textsc {D}}^{t}(x,z)\), \(\theta ^{t}(x,z)\), \(\tilde{h}^{t}(x,z)\) and \(h^{t}(x,z)\) with matrices, where x indexes columns and z indexes rows. That is, \(h^{t}_x\) denotes the \(|\mathcal {Z}|\)-dimensional vector with entries \(h^{t}(x,z)\) for \(z\in \mathcal {Z}\), and similarly for the other functions.

Claim 1

(Complexity of Algorithm 1). T executions of the "while" loop can be realized in time \(O\left( T\cdot |\mathcal {Z}| \cdot \mathrm {size}(\mathcal {D})\right) \) and memory \(O(|\mathcal {Z}|)\).Footnote 11

This claim describes precisely the resources required to compute the function \(h^{T}\) for every T. In order to bound T, we define the energy function as follows:

Claim 2

(Energy function). Define the auxiliary function

$$\begin{aligned} \varDelta ^{t} = \sum _{i=0}^{t-1} {{\mathrm{\mathbb { E }}}}_{x\sim X} \left[ \overline{\textsc {D}}^{i+1}_x\cdot \left( g_x-h^{i}_x\right) \right] . \end{aligned}$$
(6)

Then we have \( \varDelta ^{t} = E_1 + E_2 \) where

$$\begin{aligned} \begin{array}{rl} E_1 & =\frac{1}{\gamma }{{\mathrm{\mathbb { E }}}}_{x\sim X}\left[ \left( h^{t}_x-h^{0}_x\right) \cdot g_x + \frac{1}{2}\sum _{i=0}^{t-1}\left( h^{i+1}_x - h^{i}_x\right) ^2-\frac{1}{2}\left( \left( h^{t}_x\right) ^2-\left( h^{0}_x\right) ^2 \right) \right] \\ E_2 & = \frac{1}{\gamma }{{\mathrm{\mathbb { E }}}}_{x\sim X}\left[ -\sum _{i=0}^{t-1} \theta ^{i+1}_x\cdot \left( g_x-h^{i+1}_x\right) - \sum _{i=0}^{t-1} \theta ^{i+1}_x\cdot \left( h^{i+1}_x-h^{i}_x\right) \right] \end{array} \end{aligned}$$
(7)

Note that all the symbols represent vectors, and multiplications (including squares) should be understood as scalar products. The proof is based on simple algebraic manipulations and appears in Appendix B.

Remark 3

(Technical issues and intuitions). To upper-bound the formulas in Eq. (7), we need the following important properties

(a) Boundedness of the correction terms, that is, ideally \(|\theta ^{i}(x,z)| = O(\mathrm {poly}(|\mathcal {Z}|)\cdot \gamma )\).

(b) An acute angle between the correction and the error, that is, \(\theta ^{i}_x\cdot (g_x-h^{i}_x) \geqslant 0\).

Below we present an outline of the proof, discussing more technical parts in the appendix.

Proof Outline. Indeed, with these assumptions we prove an upper bound on the energy function, namely

$$\begin{aligned} E_1 + E_2 \leqslant O\left( \mathrm {poly}( |\mathcal {Z}|)\cdot \left( t \gamma + \gamma ^{-1}\right) \right) , \end{aligned}$$
(8)

which follows from properties (a) and (b) above (they are proved in Claims 4 and 3 below, and the inequality on \(E_1+E_2\) is derived in Claim 5). Note that, except for a factor \(\mathrm {poly}(|\mathcal {Z}|)\), our formula (though not the proof) is identical to the bound used in [TTV09] (see Claim 3.4 in the eprint version). Indeed, our theorem is, to some extent, an extension of the main result of [TTV09] to the conditional case, where \(|\mathcal {X}|>1\). The main difference is that we show how to simulate a short leakage Z given X, whereas [TTV09] shows how to simulate Z alone, under the assumption that the distribution of Z is dense in the uniform distribution (the min-entropy gap being small)Footnote 12.

Since the bound above is valid for any step t, and since on the other hand we have \(t\epsilon \leqslant \varDelta ^{t} \) after t steps of the algorithm, we reach a contradiction (with the number of steps) by setting \(\gamma = \epsilon /\mathrm {poly}(|\mathcal {Z}|)\). Indeed, suppose that \(t\epsilon \leqslant A |\mathcal {Z}|^B \left( \gamma ^{-1} + t\gamma \right) \) for some positive constants A, B. Since the step size \(\gamma \) can be chosen arbitrarily, we can set \(\gamma = \frac{\epsilon }{2A|\mathcal {Z}|^B}\), which yields \( \frac{t\epsilon }{2} \leqslant \frac{2A^2|\mathcal {Z}|^{2B}}{\epsilon }\), or \(t \leqslant 4A^2 |\mathcal {Z}|^{2B}\epsilon ^{-2}\); this means that the algorithm terminates after at most \(T = \mathrm {poly}(|\mathcal {Z}|)\epsilon ^{-2}\) steps. Our proof goes exactly this way, except for some extra optimization to obtain a better exponent of \(|\mathcal {Z}|\).

We stress that the algorithm outputs only a signed measure, not yet a probability distribution. However, because of property (a) the negative mass is only of order \(\mathrm {poly}(|\mathcal {Z}|)\epsilon \), and the function we end up with can simply be rescaled (we replace negative masses by 0 and normalize, dividing by the remaining total mass \(1+m\), where m is the total negative mass). With this transformation, we keep the expected advantage \(O(\epsilon )\) and lose an extra factor \(O(|\mathcal {Z}|)\) in the complexity. Finally, we need to remember that we construct only a probability distribution function, not a sampler; transforming it into a sampler yields an overhead of \(O(|\mathcal {Z}|\log (1/\epsilon ))\). This discussion shows that it is possible to build a sampler of complexity \( \mathrm {poly}(|\mathcal {Z}|)\epsilon ^{-2}\) with respect to \(\mathcal {D}\). A more careful inspection of the proof shows that we can actually achieve the claimed bound \(|\mathcal {Z}|^5\epsilon ^{-2}\) (see Remark 4 at the end of the proof).

Technical Discussion. We note that condition (b) essentially means that the mass cuts go in the right direction: it is much simpler to prove that Algorithm 1 terminates when there are no correction terms \(\theta ^{t}\), so we do not want the corrections to go in a wrong direction and ruin the energy gain. Concrete bounds for properties (a) and (b) are given in Claims 3 and 4.

In every round of Algorithm 1 we shift only one negative point mass (see Line 13). However, since this point mass is chosen to be as big as possible, and since \(h^{t+1}\) and \(h^{t}\) differ only by the small term \(\gamma \cdot \overline{\textsc {D}}^{t+1}\) plus the mass shift \(\theta ^{t+1}\), one can expect that the negative mass stays under control. Indeed, this is stated precisely in Claim 3 below.

Claim 3

(The total negative mass is small). Let

$$\begin{aligned} \textsf {NegativeMass}(h^{t}(x,\cdot )) = -\sum _{z} \min ( h^{t}(x,z) ,0) \end{aligned}$$

be the total negative mass in \(h^{t}(x,z)\) as the function of z. Then we have

$$\begin{aligned} \textsf {NegativeMass}(h^{t}(x,\cdot )) < |\mathcal {Z}|^3\gamma . \end{aligned}$$
(9)

for every x and every t. In fact, for all x and t we have the following stronger bound:

$$\begin{aligned} \max _{z}\left| \min (h^{t}(x,z),0)\right| < |\mathcal {Z}|\gamma . \end{aligned}$$

The proof is based on a recurrence relation that links \(\textsf {NegativeMass}(h^{t+1}(x,\cdot ))\) with \(\textsf {NegativeMass}(h^{t}(x,\cdot ))\), and appears in Appendix C.

Claim 4

(The angle formed by the correction and the difference vector is acute). For every x, t we have \(\textsf {Angle}\left( \theta ^{t+1}_x,g_x-{h}^{t+1}_x\right) \in \left[ -\frac{\pi }{2},\frac{\pi }{2}\right] \).

The proof appears in Appendix D.

Having established Claims 3 and 4, we are now in a position to prove the concrete bound in Eq. (8). To this end, we bound \(E_1\) and \(E_2\), defined in Eq. (7), separately.

Claim 5

(Algorithm 1 terminates after a small number of steps). The energy function in Claim 2 can be bounded as follows

$$\begin{aligned} E_1 \leqslant \gamma ^{-1}\left( 1 + 2|\mathcal {Z}|^2\gamma + |\mathcal {Z}|t\gamma ^2 + |\mathcal {Z}|^3 t \gamma ^2 \right) ,\quad E_2 \leqslant 2|\mathcal {Z}|^2 t \gamma . \end{aligned}$$

In particular, we conclude that with \(\gamma = \frac{\epsilon }{8|\mathcal {Z}|^4}\) the algorithm terminates after at most \(t = O( |\mathcal {Z}|^3) \epsilon ^{-2}\) steps.

First, note that by Claim 4 we have \( -\sum _{i=0}^{t-1} \theta ^{i+1}_x\cdot \left( g_x-h^{i+1}_x\right) \leqslant 0\). Second, by definition of the sequence \((h^{i})_i\) we have \(- \sum _{i=0}^{t-1} \theta ^{i+1}_x\cdot \left( h^{i+1}_x-h^{i}_x\right) = -\sum _{i=0}^{t-1} \theta ^{i+1}_x\cdot \theta ^{i+1}_x - \sum _{i=0}^{t-1}\gamma \theta ^{i+1}_x\cdot \overline{\textsc {D}}^{i+1}_x\) which is at most \(2|\mathcal {Z}|^3 t \gamma ^2\), because of Eq. (9) (the sum of absolute correction terms \(\sum _{z}|\theta ^{i+1}(x,z)|\) is, by definition, twice the total negative mass, and \(|\overline{\textsc {D}}^{i+1}(x,z)| \leqslant 1\)). This proves that

$$\begin{aligned} E_2 \leqslant \frac{1}{\gamma }\cdot 2 |\mathcal {Z}|^3 t \gamma ^2 = 2|\mathcal {Z}|^3 t \gamma . \end{aligned}$$

To bound \(E_1\), note that we have to bound two non-negative terms, namely \(\frac{1}{2}\sum _{i}\left( h^{i+1}_x-h^{i}_x\right) ^2\) and \(\left( h^{t}_x-h^{0}_x\right) \cdot g_x\). As for the first one, we have

$$\begin{aligned} \left( h^{i+1}_x-h^{i}_x\right) ^2 = \left( \gamma \overline{\textsc {D}}^{i+1}_x + \theta ^{i+1}_x\right) ^2 \leqslant 2(\gamma \overline{\textsc {D}}^{i+1}_x)^2 + 2\left( \theta ^{i+1}_x\right) ^2, \end{aligned}$$

where the inequality follows by the Cauchy-Schwarz inequalityFootnote 13. We trivially have \(\left( \overline{\textsc {D}}^{i+1}_x\right) ^2 \leqslant |\mathcal {Z}|\) (because \(|\overline{\textsc {D}}(x,z) | \leqslant 1\)). By the definition of the correction terms in Eq. (4) we have \(\left( \theta ^{i+1}_x\right) ^2 = \sum _{z}( \theta ^{i+1}(x,z))^2 < 2(\theta ^{i+1}(x,z_0))^2\), where \(z_0\) is the point carrying the biggest negative mass; by Eq. (9) we have \((\theta ^{i+1}(x,z_0))^2 \leqslant (2|\mathcal {Z}|^3\gamma )^2\). Thus, we have \(\left( h^{i+1}_x-h^{i}_x\right) ^2 \leqslant 2 |\mathcal {Z}|\gamma ^2 + 8|\mathcal {Z}|^6\gamma ^2\). To bound \(\left( h^{t}_x-h^{0}_x\right) \cdot g_x\), note that \(-h^{0}_x\cdot g_x \leqslant 0\) and that \(h^{t}_x\cdot g_x \leqslant \max _{z} |h^{t}(x,z)|\) (because \(g(x,z) \geqslant 0\) and \(\sum _{z}g(x,z) = 1\)), which means \(h^{t}_x\cdot g_x \leqslant 1+2\,\textsf {NegativeMass}(h^{t}_x)\) (as \(\sum _{z}\max ( h^{t}(x,z),0) = 1-\sum _{z}\min ( h^{t}(x,z),0) = 1+\textsf {NegativeMass}(h^{t}_x)\), which follows from \(\sum _{z}h^{t}(x,z) = 1\) and the definition of the total negative mass). This allows us to estimate \(E_1\) as follows:

$$\begin{aligned} E_1 \leqslant \gamma ^{-1}\left( 1+2 |\mathcal {Z}|^3\gamma + |\mathcal {Z}| t \gamma ^2 + 4|\mathcal {Z}|^6 t\gamma ^2\right) \end{aligned}$$

After t steps, the energy is at least \(t\epsilon \). On the other hand, it is at most \(E_1+E_2\). Since \(|\mathcal {Z}|, |\mathcal {Z}|^3 \leqslant |\mathcal {Z}|^6\), we obtain

$$\begin{aligned} t\epsilon < \gamma ^{-1} +2 |\mathcal {Z}| ^3+ 7 |\mathcal {Z}|^6 t \gamma \end{aligned}$$

Since this is true for any positive \(\gamma \), we choose \(\gamma = \frac{\epsilon }{14|\mathcal {Z}|^6}\), which gives us (a bound slightly weaker than claimed)

$$\begin{aligned} t < 32|\mathcal {Z}|^6\epsilon ^{-2}. \end{aligned}$$

Remark 4

(Optimized bounds). By the second part of Claim 3 we have \(|\theta ^{i}(x,z)| < |\mathcal {Z}|\gamma \) for every x, z and i. An inspection of the discussion above shows that this allows us to improve the bounds on \(E_1,E_2\):

$$\begin{aligned} E_1 \leqslant \gamma ^{-1}\left( 1 + 2|\mathcal {Z}|^2\gamma + |\mathcal {Z}|t\gamma ^2 + |\mathcal {Z}|^2 t \gamma ^2 \right) ,\quad E_2 \leqslant 2|\mathcal {Z}|^2 t \gamma \end{aligned}$$

Setting \(\gamma = \frac{\epsilon }{8|\mathcal {Z}|^2} \) we get \(E_1 + E_2 \leqslant 20 |\mathcal {Z}|^2 \epsilon ^{-1}\) and \(t \leqslant 20 |\mathcal {Z}|^2\epsilon ^{-2}\).

This finishes the proof of the claim.

From Claim 5 we conclude that after \(t = O\left( |\mathcal {Z}|^2\epsilon ^{-2} \right) \) steps we end up with a function \(h = h^{t}\) that is \((s,\epsilon )\)-indistinguishable from g, because the algorithm has terminated; clearly, h has complexity at most \( O\left( |\mathcal {Z}|^3\epsilon ^{-2} \right) \) relative to circuits of size s (including an overhead of \(O(|\mathcal {Z}|)\) to compute \(\overline{\textsc {D}}\) from \(\textsc {D}\)). To finish the proof, we need to resolve two issues.

Claim 6

(From the signed measure to the probability measure). Let \( h^{t}\) be the output of the algorithm. Define the probability distribution

$$\begin{aligned} h(x,z) = \frac{\max (h^{t}(x,z),0)}{ \sum _{z'}{ \max (h^{t}(x,z'),0)}} \end{aligned}$$

for every xz. Then \(h^{t}(x,\cdot )\) and \(h(x,\cdot )\) are \(O(\epsilon )\)-statistically close for every x.

To prove the claim, we note that \( \sum _{z'}{ \max (h^{t}(x,z'),0)}\) equals \(1+\beta \), where \(\beta = \textsf {NegativeMass}(h^{t}(x,\cdot ))\). Thus we have \(|h(x,z) - h^{t}(x,z)| \leqslant | h^{t}(x,z)|\cdot \frac{\beta }{1+\beta }\). Since \(\sum _{z'}| h^{t}(x,z')| = \sum _{z'}{ \max (h^{t}(x,z'),0)} - \sum _{z'}{ \min (h^{t}(x,z'),0)} = 1 + 2\beta \), we get \(\sum _{z} |h(x,z) - h^{t}(x,z)| = O(\beta )\), which is \(O(\epsilon )\) by Claim 3, with \(\gamma \) defined as in Claim 5.
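In code, the rescaling of Claim 6 is a direct transcription of the formula above (a numpy sketch, matching the output convention of the Algorithm 1 sketch):

```python
import numpy as np

def rescale(ht: np.ndarray) -> np.ndarray:
    """Claim 6: clip negative masses to 0, then renormalize each row
    h^t(x, .) by its remaining positive mass 1 + beta."""
    pos = np.maximum(ht, 0.0)
    return pos / pos.sum(axis=1, keepdims=True)
```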

Recall that we have constructed an approximating probability measure h for the probability mass function g, which is not a sampler yet. However, we can fix this by rejection sampling, as shown below.

Claim 7

(From the pmf to the sampler). There exists a (probabilistic) function \(h_{\mathsf {sim}}:\mathcal {X}\rightarrow \mathcal {Z}\) which calls h(x, z) (defined as above) at most \(O(|\mathcal {Z}|\log (1/\epsilon ))\) times and such that, for every x, the distribution of its output is \(\epsilon \)-close to \(h(x,\cdot )\).

The proof goes by a simple rejection sampling argument: we sample a point \(z\leftarrow \mathcal {Z}\) uniformly at random and accept it with probability h(x, z). The acceptance probability in one round is \(\frac{1}{|\mathcal {Z}|}\). If we repeat the experiment \(|\mathcal {Z}|\log (1/\epsilon )\) times, the probability that every round rejects is only \(\epsilon \). On the other hand, conditioned on acceptance, we get a distribution identical to \(h(x,\cdot )\). So the distance is at most \(\epsilon \), as claimed.
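A sketch of this rejection sampler (using numpy's Generator API; h_row is one row \(h(x,\cdot )\), e.g. as produced by rescale above):

```python
import numpy as np

def sample_z(h_row: np.ndarray, rng: np.random.Generator, eps: float) -> int:
    """Claim 7: draw z uniformly and accept with probability h(x, z).
    Per-round acceptance is 1/|Z|, so after |Z|*log(1/eps) rounds the
    probability that no round accepted is at most eps."""
    nZ = len(h_row)
    for _ in range(int(np.ceil(nZ * np.log(1 / eps)))):
        z = int(rng.integers(nZ))
        if rng.random() < h_row[z]:
            return z
    return int(rng.integers(nZ))  # failure case, probability <= eps
```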

The last two claims prove that the distribution of \(h_{\mathsf {sim}}(x)\) is \((s,O(\epsilon ))\)-close to \(h^{t}_x = h^{t}(x,\cdot )\) for every x. Since \(h^{t}\), as a function of x, z, is \((s,\epsilon )\)-close to g, and g is the conditional distribution of Z|X, we obtain

$$\begin{aligned} X, h_{sim}(X) \approx ^{s,O(\epsilon )} X,Z \end{aligned}$$

and the complexity of the final sampler \(h_{\mathsf {sim}}\) is \(O(|\mathcal {Z}|^5\log (1/\epsilon )\epsilon ^{-2})\), which matches the bound claimed in Theorem 3.

4 Time-Success Ratio Under Algebraic Transformations

In Lemma 1 below we provide a quantitative analysis of how the time-success ratio changes under concrete formulas in security reductions.

Lemma 1

(Time-success ratio for algebraic transformations). Let a, b, c and A, B, C be positive constants. Suppose that \(P'\) is secure against \((s',\epsilon ')\)-adversaries whenever P is secure against \((s,\epsilon )\)-adversaries, where

$$\begin{aligned} \begin{array}{rl} s' & = s\cdot c\epsilon ^{C} - b\epsilon ^{-B} \\ \epsilon ' & = a\epsilon ^A. \end{array} \end{aligned}$$
(10)

In addition, suppose that the following condition is satisfied

$$\begin{aligned} A \leqslant C+1. \end{aligned}$$
(11)

Then the following is true: if P is \(2^{k}\)-secure, then \(P'\) is \(2^{k'}\)-secure (in the sense of Definition 4) where

$$\begin{aligned} k' = \left\{ \begin{array}{rl} \frac{A}{B+C+1} k + \frac{A}{B+C+1}(\log c - \log b)-\log a, & \quad b\geqslant 1 \\ \frac{A}{C+1} k + \frac{A}{C+1}\log c -\log a, & \quad b = 0 \end{array} \right. \end{aligned}$$
(12)

The proof is elementary though not immediate. It can be found in [Skó15].
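A small sketch evaluating Eq. (12) (our helper function, with the constants of the reduction passed in directly; the case \(0<b<1\) is not covered by the lemma):

```python
import math

def ratio_security(k: float, a: float, A: float, b: float,
                   B: float, c: float, C: float) -> float:
    """Bits of security k' of P' from Eq. (12), given k bits for P."""
    if b >= 1:
        f = A / (B + C + 1)
        return f * k + f * (math.log2(c) - math.log2(b)) - math.log2(a)
    f = A / (C + 1)            # the b = 0 case: no additive complexity loss
    return f * k + f * math.log2(c) - math.log2(a)
```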

Remark 5

(On the technical condition (11)). This condition is satisfied in almost all applications, as in a reduction proof \(\epsilon '\) typically cannot be better (meaning a higher exponent) than \(\epsilon \). Thus, quite often we have \(A\leqslant 1\).