1 Introduction

The problem of constructing pseudorandom functions (PRFs) from pseudorandom permutations (PRPs) is called “Luby-Rackoff Backwards” [BKR98] (referring to the well-known work of Luby and Rackoff, who showed how to construct a PRP from a PRF [LR88]). In [BKR98], the authors considered two sequential block cipher calls, where the output of the first call is the key input to the second one. However, this construction achieves security only up to the birthday bound on the output size. Achieving security beyond the birthday bound is somewhat non-trivial. Xoring the outputs of two independent n-bit random permutations is a very simple way to construct random functions from random permutations. We call it the XOR construction and denote it as \(\mathsf {XORP}\). We also consider a generalized version of the XOR construction in which we xor k independent n-bit random permutations, and denote it as \(\mathsf {XORP}[k]\). Lucks [Luc00] showed beyond the birthday bound security for \(\mathsf {XORP}[k]\) for all \(k \ge 2\). In particular, he showed that the construction achieves at least \(\frac{kn}{k+1}\)-bit security. This bound was further improved in a sequence of papers [BI99, CLP14, Pat10, Pat08b]. Very recently, Dai et al. [DHT17] have shown n-bit security for \(\mathsf {XORP}\). Earlier, Mennink et al. [MP15] showed a reduction proving that the security of \(\mathsf {XORP}[k]\) can be reduced to that of \(\mathsf {XORP}\) for any \(k \ge 3\). Hence, \(\mathsf {XORP}[k]\) also achieves n-bit security. The \(\mathsf {XORP}\) construction (or its general version \(\mathsf {XORP}[k]\)) is important since it has been used to obtain constructions achieving beyond the birthday bound (or sometimes almost full) security (e.g., CENC [Iwa06, IMV16], PMAC_Plus [Yas11], and ZMAC [IMPS17]).

Moving from secret to public random permutations. While to a certain degree it is possible to view the permutations as secret, there are many reasons to consider the setting where they are public. For example, we sometimes instantiate block ciphers with fixed keys. Moreover, many unkeyed permutations are designed as underlying primitives of encryption schemes [BDPVA11a], MACs [BDPVA11b], hash functions [BDP+13, RAB+08, Wu11, GKM+09], etc. The CAESAR competition [CAE] received various permutation-based authenticated encryption schemes, and these constructions have been analyzed in the public permutation model.

The security game in this setting is clearly different from the standard indistinguishability model due to the adversary's public access to the underlying permutations. An appropriate notion is the indifferentiability framework, introduced by Maurer et al. [MRH04]. Informally, it gives a sufficient condition under which an ideal functionality can be replaced by an indifferentiably secure construction based on ideal, publicly available underlying primitives. We note that the security game for indifferentiability is also an indistinguishability game, in which one has to design a simulator whose aim is to simulate the underlying primitive. In the past, many constructions have been analyzed (e.g., [AMP10, BDPVA08, BMN10]) under this security notion.

Known indifferentiable security bounds of \(\mathsf {XORP}\) and \(\mathsf {XORP}[k]\). In this indifferentiability model, Mandal et al. [MPN10] proved \(\frac{2n}{3}\)-bit security for \(\mathsf {XORP}\). Later, Mennink et al. [MP15] pointed out a subtle but non-negligible flaw in their proof and fixed it. Recently, Lee [Lee17] has shown improved security for the general construction \(\mathsf {XORP}[k]\). In particular, he proved \(\frac{(k-1)n}{k}\)-bit security for \(\mathsf {XORP}[k]\) when k is an even integer. Table 1 summarizes the state-of-the-art for \(\mathsf {XORP}\) and \(\mathsf {XORP}[k]\) in the public permutation setting.

Table 1. A brief comparison of known bounds and our bounds for the constructions \(\mathsf {XORP}\) and \(\mathsf {XORP}[k]\). Here q denotes the total number of queries made by the adversary to all oracles.

Mirror theory and its limitation. Patarin introduced a combinatorial problem motivated by the PRF-security of \(\mathsf {XORP}[k]\) type constructions. Informally, mirror theory (see [Pat10]) provides a suitable lower bound on the number of solutions to a system of linear equations involving exactly two variables at a time. Together with the H-coefficient technique [Vau03, Pat08a, IMV16], this leads to a bound on the PRF-distinguishing advantage of \(\mathsf {XORP}\). Mirror theory seems to be very powerful, as it can be applied to prove optimal security for many constructions such as EDM, EWCDM, etc. [MN17a, MN17b]. However, its proof is quite complex, with some of its steps lacking necessary details. Later, Patarin [CLP14] himself provided a simpler but sub-optimal alternative proof for \(\mathsf {XORP}[k]\) (which is a trivial corollary of the mirror theory).

One may wonder whether the same technique can be applied in the indifferentiability setup. Here, we note that mirror theory puts a constraint on the system of equations so that no equation in a single variable can be generated through linear combinations of equations from the system. On the other hand, in the indifferentiable security game, the adversary can make public permutation calls and observe the responses. So, along with the two-variable linear equations, we also have to consider several single-variable equations. This shows the limitation of mirror theory in this setup.

Our contribution and the proof technique. Proving full security of \(\mathsf {XORP}\) in the public permutation model has been an open problem so far. The original simulator [MPN10], used in the security proof of \(\mathsf {XORP}\), is conjectured to allow for security up to \(2^n\) queries. However, the authors of [MP15] described establishing this as a highly non-trivial exercise. In this paper, we resolve this open problem and prove n-bit indifferentiable security of \(\mathsf {XORP}\) and of \(\mathsf {XORP}[k]\) for all \(k \ge 3\). Full indifferentiable security of \(\mathsf {XORP}\) is our main result, which we state and prove in Theorem 2. Subsequently, in Theorem 3, we show full indifferentiability of \(\mathsf {XORP}[k]\); for this, we reduce the security of \(\mathsf {XORP}[k]\) (\(k\ge 3\)) to the security of \(\mathsf {XORP}\), and then apply our main result.

The simulator (described in Sect. 3) that we consider in the security proof of \(\mathsf {XORP}\) follows the same steps as the simulator of [MPN10, Lee17] in the case of forward queries. However, it differs in the responses to the backward queries. In the case of backward queries, the simulator queries the ideal random function repeatedly (up to n times) until it succeeds in its goal.

We follow the recently introduced \(\chi ^2\) method [DHT17] to prove our claim. This method was implicitly used by Stam [Sta78], in a purely statistical context, while proving a bound on the total variation distance between a truncated random permutation and a random function. To the best of our knowledge, Stam's work can be viewed as the origin of the \(\chi ^2\) method; it yields a bound on the PRF-security of the truncated random permutation construction (see [GG16, GGM17] for recent results and discussion on this construction). In [DHT17], the authors used this method to obtain bounds on the PRF-security of \(\mathsf {XORP}\) and the EDM construction [CS16a, MN17b]. Also, using this method, full PRF-security of variable output length XOR pseudorandom functions has been shown in [BN18].

In this paper, we show another application of the \(\chi ^2\) method in (symmetric-key) cryptography, in the context of \(\mathsf {XORP}[k]\) type constructions. Our main result demonstrates the power of this method, as proving full security of \(\mathsf {XORP}\) in the indifferentiability setup appears very hard with the existing methods. However, our proof using the \(\chi ^2\) method is not a straightforward extension of the proof in the indistinguishability framework due to Dai et al.; it is more involved since, unlike in the indistinguishability framework, we need to consider the primitive queries (i.e., outputs of the individual permutations). Moreover, we have to handle the backward queries, whose analysis is somewhat involved.

Outline of the paper. In the next section, we cover the preliminaries, where we discuss the notion of indifferentiability and the \(\chi ^2\) method. In Sect. 3, we describe the simulator that we consider in the proof of our main result (Theorem 2). In Sect. 4, we state and prove Theorem 2. Some auxiliary proofs, used in the proof of Theorem 2, are given in Sect. 5. Finally, in Sect. 6, we show full indifferentiability of \(\mathsf {XORP}[k]\).

2 Preliminaries

In this section, we cover the technical preliminaries required to understand our results. We begin with the notational setup. Then we recall the preliminary security notions related to an adversary and its advantage in the context of an indistinguishability game. This is to motivate our subsequent discussion of the notion of indifferentiability. Finally, we briefly describe the \(\chi ^2\) method, which is our main tool.

Notational convention. We will use upper case letters to denote random variables and their corresponding lower case letters to denote particular realizations of the variables. Given an integer s, we will use the notation \(X^s\) to denote the tuple \((X_1, \ldots , X_s)\) of random variables and use \(x^s\) to denote the tuple \((x_1, \ldots , x_s)\) of corresponding realizations. Moreover, we write \(\{X^s\}\) to denote the set \(\{X_i : 1 \le i \le s\}\). Given a set \(\mathscr {X}\), we will write \(X \xleftarrow {\$} \mathscr {X}\) to mean that X is sampled uniformly at random from the set \(\mathscr {X}\).

2.1 Adversary and Advantage

Here, we recall the notion of adversarial advantage in the context of a generic indistinguishability game. An adversary \({\mathscr {A}}\) is an oracle algorithm that interacts with an oracle \(\mathscr {O}\) through queries and responses. Finally, it returns a bit \(b \in \{0,1\}\). We express this as \({\mathscr {A}}^{\mathscr {O}} \rightarrow b\).

In an indistinguishability game, \({\mathscr {A}}\) interacts with two oracles \(\mathscr {O}_1\) and \(\mathscr {O}_2\). The goal of \({\mathscr {A}}\) is to distinguish between \(\mathscr {O}_1\) and \(\mathscr {O}_2\) only from the corresponding queries and responses. The advantage of the adversary in this game, denoted \(\mathsf {Adv}^{\mathrm{dist}}_{\mathscr {O}_1, \mathscr {O}_2}({\mathscr {A}})\), is given by

$$\begin{aligned} \mathsf {Adv}^{\mathrm{dist}}_{\mathscr {O}_1, \mathscr {O}_2}({\mathscr {A}}) := \vert {{\mathbf {P}}}{{\mathbf {r}}}[{\mathscr {A}}^{\mathscr {O}_1} \rightarrow 1]-{{\mathbf {P}}}{{\mathbf {r}}}[{\mathscr {A}}^{\mathscr {O}_2} \rightarrow 1]\vert , \end{aligned}$$

where the probabilities are taken over the random coins of \({\mathscr {A}}, \mathscr {O}_1,\) and \(\mathscr {O}_2\).

In this work, we will focus on the information theoretic security of the constructions (\(\mathsf {XORP}\) and \(\mathsf {XORP}[k]\)). So, we allow \({\mathscr {A}}\) to be computationally unbounded. Therefore, without loss of generality, we assume \({\mathscr {A}}\) to be deterministic (it can always fix its internal coin tosses to those which maximize its advantage). However, we restrict \({\mathscr {A}}\) to at most q queries. Let the corresponding replies from \(\mathscr {O}_1\) and \(\mathscr {O}_2\) be \(X_{1}^{q} = (X_{1,1}, \ldots , X_{1,q})\) and \(X_{2}^{q} = (X_{2,1}, \ldots , X_{2,q})\) respectively. Note that \(X_{1}^{q}\) and \(X_{2}^{q}\) are random variables that capture the randomness of the oracles \(\mathscr {O}_1\) and \(\mathscr {O}_2\) respectively. Both \(X_{1}^{q}\) and \(X_{2}^{q}\) are distributed over the output alphabet \(\varOmega ^q = \varOmega \times \cdots \times \varOmega \) of the oracles. In this setting, it is not difficult to see that

$$\begin{aligned} \mathsf {Adv}^{\mathrm{dist}}_{\mathscr {O}_1, \mathscr {O}_2}({\mathscr {A}})&= \vert {{\mathbf {P}}}{{\mathbf {r}}}[{\mathscr {A}}^{\mathscr {O}_1} \rightarrow 1]-{{\mathbf {P}}}{{\mathbf {r}}}[{\mathscr {A}}^{\mathscr {O}_2} \rightarrow 1]\vert \nonumber \\&\le \max _{\mathscr {E}\subseteq \varOmega ^q} \sum _{x^q \in \mathscr {E}} ({{\mathbf {P}}}{{\mathbf {r}}}[X_{1}^{q}=x^q]- {{\mathbf {P}}}{{\mathbf {r}}}[X_{2}^{q}=x^q]). \end{aligned}$$
(1)

The quantity on the r.h.s. of (1) is the statistical distance or the total variation distance between \(X_{1}^{q}\) and \(X_{2}^{q}\). We will consider it slightly more formally in Sect. 2.3. We denote by \(\mathsf {Adv}^{\mathrm{dist}}_{\mathscr {O}_1, \mathscr {O}_2}(q)\) the maximum of the distinguishing advantages \(\mathsf {Adv}^{\mathrm{dist}}_{\mathscr {O}_1, \mathscr {O}_2}({\mathscr {A}})\) among all the adversaries \({\mathscr {A}}\) making at most q queries.

2.2 Indifferentiability

The notion of indifferentiability was introduced by Maurer et al. in [MRH04]. It is a stronger security notion than indistinguishability in the following sense. Informally, let a construction \({\mathsf {T}}\) have oracle access to an ideal primitive \({\mathsf {F}}\). In an indistinguishability game, when \({\mathsf {T}}\) is presented as an oracle to the adversary \({\mathscr {A}}\), \({\mathscr {A}}\) can only query \({\mathsf {T}}\) in a black-box manner, i.e., \({\mathscr {A}}\) cannot query \({\mathsf {F}}\). In the indifferentiability game, on the other hand, \({\mathscr {A}}\) can query both \({\mathsf {T}}\) and \({\mathsf {F}}\).

As shown in Fig. 1, in the indifferentiability game, in the real world, a construction \({\mathsf {T}}\) has oracle access to an ideal primitive \({\mathsf {F}}\). On the other hand, in the ideal world, the simulator \({\mathsf {S}}\) has access to another ideal primitive \({\mathsf {G}}\). \({\mathscr {A}}\) can query any of these four entities with the goal of distinguishing between the two worlds. In this case, \({\mathscr {A}}\)’s advantage can be written as

$$\begin{aligned} \mathsf {Adv}^{\mathrm{diff}}_{{\mathsf {T}}^{{\mathsf {F}}}, {\mathsf {G}}^{{\mathsf {S}}}}({{\mathscr {A}}}) = \vert {{\mathbf {P}}}{{\mathbf {r}}}[{\mathscr {A}}^{{\mathsf {T}}, {\mathsf {F}}} \rightarrow 1]-{{\mathbf {P}}}{{\mathbf {r}}}[{\mathscr {A}}^{{\mathsf {G}}, {\mathsf {S}}} \rightarrow 1]\vert . \end{aligned}$$

In order to prove indifferentiability of \({\mathsf {T}}\) from \({\mathsf {G}}\), it is sufficient to construct a simulator \({\mathsf {S}}\) in such a way that \(\mathsf {Adv}^{\mathrm{diff}}_{{\mathsf {T}}^{{\mathsf {F}}}, {\mathsf {G}}^{{\mathsf {S}}}}({{\mathscr {A}}})\) becomes negligible for any adversary \({\mathscr {A}}\). The following definition captures this idea more formally.

Fig. 1. Indifferentiability game

Definition 1

(Indifferentiability [MRH04]). A Turing machine \({\mathsf {T}}\) with oracle access to an ideal primitive \({\mathsf {F}}\) is said to be \((t, q_{{\mathsf {T}}}, q_{{\mathsf {F}}}, \varepsilon )\)-indifferentiable from an ideal primitive \({\mathsf {G}}\) if there exists a simulator \({\mathsf {S}}\) with oracle access to \({\mathsf {G}}\) and running time at most t, such that for any adversary \({\mathscr {A}}\), it holds that

$$\begin{aligned} \mathsf {Adv}^{\mathrm{diff}}_{{\mathsf {T}}^{{\mathsf {F}}}, {\mathsf {G}}^{{\mathsf {S}}}}({{\mathscr {A}}})< \varepsilon . \end{aligned}$$

\({\mathscr {A}}\) makes at most \(q_{{\mathsf {T}}}\) queries to \({\mathsf {T}}\) or \({\mathsf {G}}\) and at most \(q_{{\mathsf {F}}}\) queries to \({\mathsf {F}}\) or \({\mathsf {S}}\). Similarly, \({\mathsf {T}}^{{\mathsf {F}}}\) is said to be computationally indifferentiable from \({\mathsf {G}}\) if the running time of \({\mathscr {A}}\) is bounded above by a polynomial in the security parameter and \(\varepsilon \) is a negligible function of the security parameter.

Remark 1

For our purpose, we will not consider the parameter t. Also, we will not consider \( q_{{\mathsf {T}}}\) and \(q_{{\mathsf {F}}}\) separately; instead, we focus on their sum \(q =q_{{\mathsf {T}}}+q_{{\mathsf {F}}}\), which is the total number of queries made by \({\mathscr {A}}\). Moreover, when \({\mathsf {F}}\) and \({\mathsf {S}}\) are clear from the context, we will write the advantage term as \(\mathsf {Adv}^{\mathrm{diff}}_{{\mathsf {T}}, {\mathsf {G}}}({{\mathscr {A}}})\).

We write \(\mathsf {Adv}^{\mathrm{diff}}_{{\mathsf {T}}, {\mathsf {G}}}(q) = \max _{{\mathscr {A}}} \mathsf {Adv}^{\mathrm{diff}}_{{\mathsf {T}}, {\mathsf {G}}}({{\mathscr {A}}})\), where the maximum is taken over all adversaries making at most q queries to their oracles.

Indifferentiable security of \(\mathsf {XORP}\) and \(\mathsf {XORP}[k]\). We first describe the \(\mathsf {XORP}\) and \(\mathsf {XORP}[k]\) constructions. Let \(\mathsf {Perm}\) denote the set of all permutations over the set \(\{0,1\}^n\). Let \(\mathsf {\Pi _0}\) and \(\mathsf {\Pi _1}\) be two independent random permutations, i.e., \(\mathsf {\Pi _0}, \mathsf {\Pi _1} \xleftarrow {\$} \mathsf {Perm}\). The \(\mathsf {XORP}\) construction takes an input x from \(\{0,1\}^n\) and returns the element \(\mathsf {\Pi _0}(x) \oplus \mathsf {\Pi _1}(x)\). This construction can be further generalized to k permutations. Let \(\mathsf {\Pi _0}, \ldots , \mathsf {\Pi }_{k-1}\) be k independent random permutations. We define

$$\begin{aligned} \mathsf {XORP}[k](x) = \bigoplus _{i=0}^{k-1} \mathsf {\Pi }_i(x). \end{aligned}$$
(2)

So, \(\mathsf {XORP}[2]\) is the same as \(\mathsf {XORP}\). Now, we describe the setting of indifferentiable security in our context.
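To fix ideas, the following minimal Python sketch (ours, not part of the original text) instantiates \(\mathsf {XORP}[k]\) with independent, uniformly sampled permutations of \(\{0,1\}^n\) stored as lookup tables; the names sample_permutation and xorp are hypothetical.

```python
import random
from functools import reduce

def sample_permutation(n, rng=random):
    """Sample a uniformly random permutation of {0,1}^n as a lookup table."""
    table = list(range(2 ** n))
    rng.shuffle(table)
    return table

def xorp(perms, x):
    """XORP[k](x) = P_0(x) xor ... xor P_{k-1}(x) for independent permutations P_i."""
    return reduce(lambda acc, p: acc ^ p[x], perms, 0)

# Example: XORP = XORP[2] on 8-bit inputs.
n = 8
perms = [sample_permutation(n) for _ in range(2)]
print(xorp(perms, 0x3a))
```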

In the real world, the construction \(\mathsf {XORP}\) has oracle access to the random permutations \(\mathsf {\Pi _0}\) and \(\mathsf {\Pi _1}\). When the adversary \({\mathscr {A}}\) queries the construction \(\mathsf {XORP}\) with a value \(x \in \{0,1\}^n\), \(\mathsf {XORP}\) queries the oracles \(\mathsf {\Pi _0}\) and \(\mathsf {\Pi _1}\) with x and receives back \(\mathsf {\Pi _0}(x)\) and \(\mathsf {\Pi _1}(x)\) respectively. Finally, it computes \(\mathsf {\Pi _0}(x) \oplus \mathsf {\Pi _1}(x)\) and returns it to \({\mathscr {A}}\). In addition to querying the \(\mathsf {XORP}\) construction, \({\mathscr {A}}\) can directly query the oracles \(\mathsf {\Pi _0}\) and \(\mathsf {\Pi _1}\) and obtain the values of \(\mathsf {\Pi _0}(y),\mathsf {\Pi _1}(y), \mathsf {\Pi _{0}^{-1}}(y),\) and \(\mathsf {\Pi _{1}^{-1}}(y)\) for any \(y \in \{0,1\}^n\). The queries for \(\mathsf {\Pi _0}(y)\) and \(\mathsf {\Pi _1}(y)\) are forward queries and the queries \(\mathsf {\Pi _{0}^{-1}}(y)\) and \(\mathsf {\Pi _{1}^{-1}}(y)\) are backward queries.

In the ideal world, \({\mathscr {A}}\) queries the random function \(\mathsf {\$}\) and the simulator \({\mathsf {S}}\). \({\mathsf {S}}\) has oracle access to \(\mathsf {\$}\). However, \({\mathsf {S}}\) does not have access to (the transcripts of) the interactions between \({\mathscr {A}}\) and \(\mathsf {\$}\). The purpose of \({\mathsf {S}}\) is to simulate the output behavior of the oracles \(\mathsf {\Pi _0}\) and \(\mathsf {\Pi _1}\). That is, for \(b \in \{0,1\}\), when \({\mathscr {A}}\) makes a forward query (x, b) with \(x \in \{0,1\}^n\), \({\mathsf {S}}\) returns a random variable \(V_b\in \{0,1\}^n\). So, for \(b \in \{0,1\}\), \(V_b\) simulates \(\mathsf {\Pi _b}(x)\). Similarly, when \({\mathscr {A}}\) makes a backward query (y, b) (with \(y \in \{0,1\}^n\) and \(b \in \{0,1\}\)), \({\mathsf {S}}\) returns a random variable \(V_b \in \{0,1\}^n \cup \{\perp \}\). \(V_b \in \{0,1\}^n\) simulates \(\mathsf {\Pi _b}^{-1}(y)\). The output \(V_b = \perp \) indicates that \({\mathsf {S}}\) aborted after a fixed number of iterations. This will become clearer when we describe the simulator \({\mathsf {S}}\) in Sect. 3. In order to prove that \(\mathsf {XORP}\) is indifferentiable from \(\mathsf {\$}\), it is enough to construct a simulator \({\mathsf {S}}\) in such a way that no adversary \({\mathscr {A}}\) can distinguish between the distributions of \(V_b\) and \(\mathsf {\Pi _b}\). In other words, the advantage of any adversary \({\mathscr {A}}\), which in this case can be written as below,

$$\begin{aligned} \mathsf {Adv}^{\mathrm{diff}}_{\mathsf {XORP}, \mathsf {\$}}({\mathscr {A}}) = \vert {{\mathbf {P}}}{{\mathbf {r}}}[{\mathscr {A}}^{\mathsf {XORP}, (\mathsf {\Pi _0}, \mathsf {\Pi _1}, \mathsf {\Pi _{0}^{-1}}, \mathsf {\Pi _{1}^{-1}})} \rightarrow 1 ] - {{\mathbf {P}}}{{\mathbf {r}}}[{\mathscr {A}}^{\mathsf {\$}, {\mathsf {S}}} \rightarrow 1]\vert \end{aligned}$$

becomes negligible. In our case, we will restrict \({\mathscr {A}}\) to q queries and obtain a concrete upper bound on \(\mathsf {Adv}^{\mathrm{diff}}_{\mathsf {XORP}, \mathsf {\$}}({\mathscr {A}})\) (in terms of the parameters q and n). This will be sufficient to show indifferentiability of \(\mathsf {XORP}\) from \(\mathsf {\$}\). For the \(\mathsf {XORP}[k]\) construction, we obtain a similar upper bound on \(\mathsf {Adv}^{\mathrm{diff}}_{\mathsf {XORP}[k], \mathsf {\$}}({\mathscr {A}})\).

2.3 \(\chi ^2\) Method for Bounding Total Variation

Here, we provide a brief description of the \(\chi ^2\) method. Given a set \(\varOmega \), let \(X^q := (X_1, \ldots , X_q)\) and \(Z^q := (Z_1, \ldots , Z_q)\) be two random vectors distributed over \(\varOmega ^q = \varOmega \times \cdots \times \varOmega \) (q times) according to the distributions \({\mathbf {P}}_{\varvec{0}}\) and \({\mathbf {P}}_{\varvec{1}}\) respectively. Then the total variation distance or statistical distance between the distributions \({\mathbf {P}}_{\varvec{0}}\) and \({\mathbf {P}}_{\varvec{1}}\) is defined as

$$\begin{aligned} \Vert {\mathbf {P}}_{\varvec{0}}-{\mathbf {P}}_{\varvec{1}}\Vert := {1 \over 2}\sum _{x^q \in \varOmega ^q} \vert {\mathbf {P}}_{\varvec{0}}(x^q) - {\mathbf {P}}_{\varvec{1}}(x^q)\vert = \max _{\mathscr {E}\subseteq \varOmega ^q}\left( \sum _{x^q \in \mathscr {E}} {\mathbf {P}}_{\varvec{0}}(x^q)- {\mathbf {P}}_{\varvec{1}}(x^q)\right) . \end{aligned}$$

In what follows, we will require the following conditional distributions.

$$\begin{aligned} {\mathbf {P}}_{\varvec{0}|x^{i-1}}(x_i)&: = {{\mathbf {P}}}{{\mathbf {r}}}[X_i= x_i \mid X_1 = x_1, \ldots , X_{i-1} = x_{i-1}], \\ {\mathbf {P}}_{\varvec{1}|x^{i-1}}(x_i)&: = {{\mathbf {P}}}{{\mathbf {r}}}[Z_i= x_i \mid Z_1 = x_1, \ldots , Z_{i-1} = x_{i-1}]. \end{aligned}$$

When \(i =1\), \({\mathbf {P}}_{\varvec{0}|x^{i-1}}(x_1)\) represents \({{\mathbf {P}}}{{\mathbf {r}}}[X_1 = x_1]\); similarly for \({\mathbf {P}}_{\varvec{1}|x^{i-1}}(x_1)\). Let \(x^{i-1} \in \varOmega ^{i-1}\), \(i \ge 1\). The \(\chi ^2\)-distance between these two conditional probability distributions is defined as

$$\begin{aligned} \chi ^2({\mathbf {P}}_{\varvec{0}|x^{i-1}}, {\mathbf {P}}_{\varvec{1}|x^{i-1}}) := \sum _{x_i \in \varOmega } {({\mathbf {P}}_{\varvec{0}|x^{i-1}}(x_i)- {\mathbf {P}}_{\varvec{1}|x^{i-1}}(x_i))^2 \over {\mathbf {P}}_{\varvec{1}|x^{i-1}}(x_i) }. \end{aligned}$$
(3)

Note that for the above definition to work, it is required that the support of the distribution \({\mathbf {P}}_{\varvec{0}|x^{i-1}}\) be contained within the support of the distribution \({\mathbf {P}}_{\varvec{1}|x^{i-1}}\). Further, when the distributions \({\mathbf {P}}_{\varvec{0}|x^{i-1}}\) and \({\mathbf {P}}_{\varvec{1}|x^{i-1}}\) are clear from the context we will use the notation \(\chi ^2(x^{i-1})\) for \(\chi ^2({\mathbf {P}}_{\varvec{0}|x^{i-1}}, {\mathbf {P}}_{\varvec{1}|x^{i-1}})\).
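For concreteness, the following small Python sketch (ours) computes the total variation distance defined above and the \(\chi ^2\)-distance of (3) for distributions represented as dictionaries over a finite set; as required above, it assumes that the support of p0 is contained in the support of p1.

```python
def total_variation(p0, p1):
    """Total variation distance between two distributions given as dicts value -> probability."""
    support = set(p0) | set(p1)
    return 0.5 * sum(abs(p0.get(x, 0.0) - p1.get(x, 0.0)) for x in support)

def chi2_distance(p0, p1):
    """Chi^2-distance of (3); assumes support(p0) is a subset of support(p1)."""
    total = 0.0
    for x, q in p1.items():
        if q > 0.0:
            d = p0.get(x, 0.0) - q
            total += d * d / q
    return total

# Toy example over a two-element set.
print(total_variation({0: 0.5, 1: 0.5}, {0: 0.75, 1: 0.25}))  # 0.25
print(chi2_distance({0: 0.5, 1: 0.5}, {0: 0.75, 1: 0.25}))    # ~0.3333
```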

In a very recent work [DHT17], Dai et al. introduced a new method, which they term the \(\chi ^2\) method (Chi-squared method), to bound the statistical distance between two joint distributions in terms of the expectations of the \(\chi ^2\)-distances of the corresponding conditional distributions. At the heart of the \(\chi ^2\) method is the following theorem, stated in our notation and setting.

Theorem 1

([DHT17]). With the notation as above, suppose that the support of the distribution \({\mathbf {P}}_{\varvec{0}|x^{i-1}}\) is contained within the support of the distribution \({\mathbf {P}}_{\varvec{1}|x^{i-1}}\) for all \(x^{i-1}\). Then

$$\begin{aligned} \Vert {\mathbf {P}}_{\varvec{0}} - {\mathbf {P}}_{\varvec{1}} \Vert \le \left( {1\over 2} \sum _{i=1}^{q} {{\mathbf {E}}}{{\mathbf {x}}}[{\chi }^2(X^{i-1})]\right) ^{1\over 2}, \end{aligned}$$
(4)

where for each i, the expectation is over the \((i-1)\)-th marginal distribution of \({\mathbf {P}}_{\varvec{0}}\).

As an aside, we mention that the main ingredients of the proof of Theorem 1 are (i) Pinsker's inequality, (ii) the chain rule of the Kullback-Leibler divergence (KL divergence), and (iii) Jensen's inequality: Pinsker's inequality upper bounds the statistical distance between the two distributions by their KL divergence, the chain rule of the KL divergence expresses the KL divergence of two joint distributions as the sum of the KL divergences between the corresponding conditional distributions, and finally, Jensen's inequality is used to upper bound the KL divergence between two distributions by their \(\chi ^2\)-divergence (\(\chi ^2\)-distance).
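Schematically, writing \(\mathrm {KL}\) for the Kullback-Leibler divergence, this argument can be summarized by the following chain (recorded here only as an informal reminder of the standard inequalities):

$$\begin{aligned} \Vert {\mathbf {P}}_{\varvec{0}} - {\mathbf {P}}_{\varvec{1}} \Vert&\le \sqrt{{1\over 2}\,\mathrm {KL}({\mathbf {P}}_{\varvec{0}} \,\Vert \, {\mathbf {P}}_{\varvec{1}})} \qquad \text{(Pinsker's inequality)}\\&= \sqrt{{1\over 2} \sum _{i=1}^{q} {{\mathbf {E}}}{{\mathbf {x}}}\left[ \mathrm {KL}({\mathbf {P}}_{\varvec{0}|X^{i-1}} \,\Vert \, {\mathbf {P}}_{\varvec{1}|X^{i-1}})\right] } \qquad \text{(chain rule)}\\&\le \left( {1\over 2} \sum _{i=1}^{q} {{\mathbf {E}}}{{\mathbf {x}}}[{\chi }^2(X^{i-1})]\right) ^{1\over 2} \qquad (\mathrm {KL} \le \ln (1+\chi ^2) \le \chi ^2), \end{aligned}$$

where, for each i, the expectation is over \(X^{i-1}\) distributed according to \({\mathbf {P}}_{\varvec{0}}\).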

In [DHT17], Dai et al. have applied Theorem 1 to show PRF-security of two well known constructions, namely the xor of two random permutations [Pat08b, Pat10, BI99, Luc00] and the encrypted Davies-Meyer (EDM) construction [CS16a, MN17a]. Subsequently, in [BN18], this method has been applied to prove full PRF-security of the variable output length XOR pseudorandom function construction. This method seems to have potential for further application in obtaining better bounds (and simplified proofs) on the PRF-security of other constructions which have so far evaded more classical methods, such as the H-coefficient method [Pat08a]. In fact, much earlier, Stam [Sta78] used this method, implicitly and in a purely statistical context, to obtain a PRF-security bound for the truncated random permutation construction.

3 Simulator and Transcripts

3.1 Description of the Simulator

Here, we describe the simulator \({\mathsf {S}}\) used in the proof of Theorem 2. The goal of the simulator \({\mathsf {S}}\) is to mimic the permutations \(\mathsf {\Pi } := (\mathsf {\Pi _0}(.), \mathsf {\Pi _1}(.), \mathsf {\Pi _0}^{-1}(.), \mathsf {\Pi _1}^{-1}(.))\) in such a way that \((\mathsf {XORP},\mathsf {\Pi })\) and \((\$,{\mathsf {S}})\) look indistinguishable. So, \({\mathsf {S}}\) has interfaces corresponding to the forward and backward queries of the random permutations \(\mathsf {\Pi _0}\) and \(\mathsf {\Pi _1}\). Formally, \({\mathsf {S}}\) consists of a pair of stateful randomized algorithms: \(\mathsf {SIM_{\mathsf {FWD}}}\) (which is invoked for the responses to the forward queries) and \(\mathsf {SIM}_{\mathsf {BCK}}\) (which is invoked for the responses to the backward queries). More precisely, for \(x \in \{0,1\}^n\) and \(b \in \{0,1\}\), when an adversary \({\mathscr {A}}\) makes a forward query (x, b) to \({\mathsf {S}}\), \({\mathsf {S}}\) runs the algorithm \(\mathsf {SIM_{\mathsf {FWD}}}\) and returns a random variable \(V_b\in \{0,1\}^n\). So, for \(b \in \{0,1\}\), \(V_b\) simulates \(\mathsf {\Pi _b}(x)\). Similarly, when \({\mathscr {A}}\) makes a backward query (y, b) (with \(y \in \{0,1\}^n\) and \(b \in \{0,1\}\)) to \({\mathsf {S}}\), \({\mathsf {S}}\) runs the algorithm \(\mathsf {SIM}_{\mathsf {BCK}}\) and returns a random variable \(V_b \in \{0,1\}^n \cup \{\bot \}\). \(V_b \in \{0,1\}^n\) simulates \(\mathsf {\Pi _b}^{-1}(y)\). Note that \({\mathsf {S}}\) has access to the random function $, which returns a uniformly random element of \(\{0,1\}^n\) on every fresh query. The goal of the simulator \({\mathsf {S}}\) is to simulate the output behavior of \(\mathsf {\Pi _0}(.), \mathsf {\Pi _1}(.), \mathsf {\Pi _0}^{-1}(.)\), and \(\mathsf {\Pi _1}^{-1}(.)\) in the ideal world in such a way that it remains consistent with the \(\mathsf {XORP}\) construction, which is given by the condition

$$\begin{aligned} \$(x) = \mathsf {SIM_{\mathsf {FWD}}}(x,0) \oplus \mathsf {SIM_{\mathsf {FWD}}}(x,1) ~\text{ for }~ x \in \{0,1\}^n. \end{aligned}$$

However, \({\mathsf {S}}\) may fail to maintain this condition. A failure can happen only for backward queries: before returning \(\bot \), \(\mathsf {SIM}_{\mathsf {BCK}}\) makes several attempts in which it interacts with \(\mathsf {\$}\), and if after n attempts it still fails to maintain the condition (which, as we will show, happens with very low probability), it aborts. The output \(V_b = \bot \) indicates that event.

Fig. 2. Description of the simulator for all forward queries.

Description of the internal state. In order to be consistent with its replies, i.e., to output the same \(V_b\) corresponding to the same queries (forward or backward), \({\mathsf {S}}\) is stateful, i.e., it maintains a history of all the previous interactions (i.e., queries and responses) with \({\mathscr {A}}\). To do this, \({\mathsf {S}}\) internally maintains three sets \(\mathscr {D}, \mathscr {R}_0\), and \(\mathscr {R}_1\), and two lists \(\mathscr {L}_0, \mathscr {L}_1\) (indexed by elements of \(\mathscr {D}\)).

The set \(\mathscr {D}\) contains all \(x \in \{0,1\}^n\) belonging to the forward queries (x, b) made by \({\mathscr {A}}\) and all \(V_b \in \{0,1\}^n\) that the simulator output on a backward query made by \({\mathscr {A}}\). For \(b \in \{0,1\}\), the set \(\mathscr {R}_b\) contains all \(y \in \{0,1\}^n\) belonging to the backward queries (y, b) made by \({\mathscr {A}}\) along with all \(V_b\) output by \({\mathsf {S}}\) on a forward query made by \({\mathscr {A}}\). The lists \(\mathscr {L}_0, \mathscr {L}_1\) capture the input-output mapping of \({\mathsf {S}}\). More precisely, for \(b \in \{0,1\}, x \in \mathscr {D}, y \in \mathscr {R}_b\), \(\mathscr {L}_b(x) =y\) means that either \(V_b =y\) was output on a forward query (x, b) or \(V_b = x\) was output on a backward query (y, b). More importantly, for all \(x \in \mathscr {D}\), the relationship \(\mathscr {L}_0(x) \oplus \mathscr {L}_1(x) = \mathsf {\$}(x)\) is always satisfied.

Now, we describe how the simulator works via the algorithms \(\mathsf {SIM_{\mathsf {FWD}}}\) and \(\mathsf {SIM}_{\mathsf {BCK}}\). Details of these algorithms are given in Figs. 2 and 3. In the following, we assume that \({\mathscr {A}}\) always makes fresh queries, since otherwise the simulator can repeat the previous responses (as it maintains internal state keeping all previous queries and responses).

Algorithm \(\mathsf {SIM_{\mathsf {FWD}}}\) (see Fig. 2). On an input \((x \in \{0,1\}^n, b \in \{0,1\})\), \({\mathsf {S}}\) queries \(\mathsf {\$}\) to obtain \(Z = \$(x)\). Then, \({\mathsf {S}}\) samples \(V_b\) randomly from the set \(\{0,1\}^n \setminus \{\mathscr {R}_b \cup \{Z \oplus \mathscr {R}_{1- b} \}\}\), where \(Z \oplus \mathscr {R}_{1-b}\) denotes the set \(\{Z \oplus y| y \in \mathscr {R}_{1-b}\}\). Here, it can be observed that the set \(\{0,1\}^n \setminus \{\mathscr {R}_b~\cup ~\{Z~\oplus ~\mathscr {R}_{1- b} \}\}\) is non-empty, provided \(q < 2^{n-1}\). Therefore, for \(q < 2^{n-1}\), the sampling is always possible. Subsequently, \({\mathsf {S}}\) sets \(V_b\) and \(Z \oplus V_b\) as outputs of \(\mathsf {SIM_{\mathsf {FWD}}}(x, b)\) and \(\mathsf {SIM_{\mathsf {FWD}}}(x, 1-b)\) respectively (and hence \(\mathsf {SIM_{\mathsf {FWD}}}(x, 0)~\oplus ~\mathsf {SIM_{\mathsf {FWD}}}(x, 1) = \$(x)\)). Before \({\mathsf {S}}\) returns \(V_b\) to the adversary, it updates all internal sets accordingly.
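The following Python sketch (ours; the authors' formal description is in Fig. 2) illustrates one possible implementation of the forward interface together with the internal state \((\mathscr {D}, \mathscr {R}_0, \mathscr {R}_1, \mathscr {L}_0, \mathscr {L}_1)\); the class and attribute names are hypothetical, and rf stands for oracle access to \(\mathsf {\$}\).

```python
import random

class Simulator:
    """Sketch of the simulator state; rf is a callable giving oracle access to $."""
    def __init__(self, n, rf, rng=random):
        self.n, self.rf, self.rng = n, rf, rng
        self.D = set()             # domain points fixed so far
        self.R = [set(), set()]    # R[b]: range points of the simulated permutation Pi_b
        self.L = [dict(), dict()]  # L[b]: domain point -> simulated value Pi_b(x)

    def sim_fwd(self, x, b):
        """Respond to the forward query (x, b); maintains L[0][x] ^ L[1][x] = $(x)."""
        if x in self.D:            # repeated query: replay the stored answer
            return self.L[b][x]
        z = self.rf(x)             # Z = $(x)
        forbidden = self.R[b] | {z ^ y for y in self.R[1 - b]}
        v_b = self.rng.choice([v for v in range(2 ** self.n) if v not in forbidden])
        self.D.add(x)
        self.L[b][x], self.L[1 - b][x] = v_b, z ^ v_b
        self.R[b].add(v_b)
        self.R[1 - b].add(z ^ v_b)
        return v_b
```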

Algorithm \(\mathsf {SIM}_{\mathsf {BCK}}\) (see Fig. 3). Next, we present the algorithm \(\mathsf {SIM}_{\mathsf {BCK}}\). On an input (\(y \in \{0,1\}^n, b \in \{0,1\}\)), \({\mathsf {S}}\) samples an element \(V_b\) randomly from outside the set \(\mathscr {D}\) and then obtains \({\$}(V_b)\) by querying $. Now, there is a certain chance that \(\$(V_b) \oplus y\) is in the range set \(\mathscr {R}_{1-b}\), which would then violate the permutation property of \(\mathsf {\Pi }_{1-b}\) that \({\mathsf {S}}\) is simulating. So, \({\mathsf {S}}\) continues with similar attempts until it samples a \(V_b\) such that \(\$(V_b)~\oplus ~y \notin \mathscr {R}_{1-b}\). It makes at most n such attempts. If it fails after all these n attempts, it returns \(\bot \). In all these attempts, \({\mathsf {S}}\) maintains an auxiliary set \(\mathscr {D}'\), which is not a part of its state and is only used locally during an execution. At the beginning of the algorithm, \(\mathscr {D}'\) is initialized to the current domain \(\mathscr {D}\). At the start of each iteration, a fresh \(V_b\) is sampled from the set \(\{0,1\}^n \setminus \mathscr {D}'\). If the condition \(y \oplus \$(V_b) \in \mathscr {R}_{1-b}\) is satisfied (i.e., the sampled \(V_b\) turns out to be a bad choice), then \(V_b\) is appended to \(\mathscr {D}'\) and the next iteration begins. Note that if \(q+n < 2^n\) then the set \(\{0,1\}^n \setminus \mathscr {D}'\) is always non-empty, so that the sampling of \(V_b\) (in Step 6) is possible in every iteration. But \(q+n < 2^n\) is trivially satisfied for \(n \ge 3\) and \(q < 2^{n-1}\). When the condition is not satisfied (i.e., when \(y \oplus \$(V_b) \notin \mathscr {R}_{1-b}\)), \({\mathsf {S}}\) returns \(V_b\) after appropriately updating all the internal sets.
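Continuing the sketch above, the backward interface could be written as an additional method of the same hypothetical Simulator class (the authors' formal description is in Fig. 3); here None plays the role of \(\bot \).

```python
    def sim_bck(self, y, b):
        """Respond to the backward query (y, b); returns None to signal an abort."""
        if y in self.R[b]:       # known range point: replay the matching domain point
            return next(x for x, v in self.L[b].items() if v == y)
        D_prime = set(self.D)    # auxiliary set D', local to this execution
        for _ in range(self.n):  # at most n attempts
            v_b = self.rng.choice([v for v in range(2 ** self.n) if v not in D_prime])
            z = self.rf(v_b)     # $(V_b)
            if z ^ y in self.R[1 - b]:   # bad choice: would clash with R_{1-b}
                D_prime.add(v_b)
                continue
            self.D.add(v_b)              # success: update the internal state
            self.L[b][v_b], self.L[1 - b][v_b] = y, z ^ y
            self.R[b].add(y)
            self.R[1 - b].add(z ^ y)
            return v_b
        return None                      # abort after n failed attempts
```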

Remark 2

Here, as an aside, one may notice that there is a chance of collision due to two backward queries made to the two random permutations in the real world or two interfaces in the ideal world. We explain this with the following example. Assume that \({\mathscr {A}}\) makes backward queries (y, 0) and \((y', 1)\) in the real world. Then it is easy to see that there is a positive probability of getting the same output in both the cases (as \(\mathsf {\Pi _0}\) and \(\mathsf {\Pi _1}\) are sampled independently from the set \(\mathsf {Perm}\)). On the other hand, in the ideal world, when \({\mathscr {A}}\) makes the query (y, 0), then if \(y \oplus \$(V_0) \notin \mathscr {R}_{1}\) (which has positive probability) then at Step 13 of \(\mathsf {SIM}_{\mathsf {BCK}}\) \(\mathscr {L}_1(V_0)\) is set to \(\$(V_0)\oplus y\) and \(V_0\) is returned to \({\mathscr {A}}\). Now, if \({\mathscr {A}}\) makes the query \((\$(V_0)\oplus y,1)\) then due to the check at Step 2, \(V_0\) is again returned to \({\mathscr {A}}\). Therefore, there is a positive probability of collision for the queries (y, 0) and \((y', 1)\) in the ideal world as well (as was to be expected since the simulator is simulating the permutations \(\mathsf {\Pi _0}\) and \(\mathsf {\Pi _1}\)), where \(y' =\$(V_0)\oplus y\) in this case.

Fig. 3. Description of the simulator for all backward queries.

3.2 Additional Information to the Adversary

After the adversary \({\mathscr {A}}\) has finished its interaction in the real/ideal world, i.e., when it has made q queries and received corresponding replies, it is provided with the following additional information. Note that the additional information does not degrade \({\mathscr {A}}\)’s advantage as it is always possible to discard it. Below we assume \(x, x_i, y\) are from \(\{0,1\}^n\) and b is from \(\{0,1\}\).

1. For each query x made to the construction \(\mathsf {XORP}\), \({\mathscr {A}}\) is given the values \(\mathsf {\Pi _0}(x)\) and \(\mathsf {\Pi _1}(x)\). Similarly, for each query x made to the random function \(\mathsf {\$}\), \({\mathscr {A}}\) is given the outputs of the simulator \({\mathsf {S}}\) corresponding to the forward queries (x, 0) and (x, 1).

2. For each forward query (x, b) made to \(\mathsf {\Pi _b}\) (i.e., for each value of \(\mathsf {\Pi _b}(x)\)), it is also given \(\mathsf{\Pi _{1-b}}(x)\). Similarly, for each forward query (x, b) made to \({\mathsf {S}}\), \({\mathscr {A}}\) is also given the value corresponding to the forward query \((x, 1-b)\).

3. For each backward query (y, b) made to \(\mathsf {\Pi _b}\) (i.e., for each value of \(\mathsf {\Pi _{b}^{-1}}(y)\)), it is also given \(\mathsf{\Pi _{1-b}}(\mathsf {\Pi _{b}^{-1}}(y))\). For each backward query (y, b) made to \({\mathsf {S}}\), \({\mathscr {A}}\) is also given the value corresponding to the forward query \((x, 1-b)\), where x is the value returned by \({\mathsf {S}}\) on the backward query (y, b).

With access to this extra information, \({\mathscr {A}}\) knows the tuple \((x_i, \mathsf {\Pi _0}(x_i), \mathsf {\Pi _1}(x_i))\) corresponding to its i-th query in the real world. Note that from \(\mathsf {\Pi _0}(x_i)\) and \(\mathsf {\Pi _1}(x_i)\), \({\mathscr {A}}\) can always obtain \(\mathsf {\Pi _0}(x_i) \oplus \mathsf {\Pi _1}(x_i)\) (which is, in fact, the output of \(\mathsf {XORP}\) when queried with \(x_i\)). Therefore, we do not include this redundant information in the tuple. When \(\mathsf {\Pi _0}(x_i)\) and \(\mathsf {\Pi _1}(x_i)\) are treated as random variables, we will denote \( \mathsf {\Pi _0}(x_i)\) by \(U_{0, i}\) and \( \mathsf {\Pi _1}(x_i)\) by \(U_{1,i}\). So, the tuple \((x_i, U_{0, i}, U_{1,i})\) is a random variable and an arbitrary but fixed value of this random variable will be denoted by \((x_i, u_{0,i}, u_{1,i})\). Similarly, in the ideal world, corresponding to the i-th query, \({\mathscr {A}}\) knows the tuple \((x_i, V_{0, i}, V_{1,i})\), where for \(b \in \{0,1\}\), \(V_{b, i}\) is the reply of \({\mathsf {S}}\) to the forward query \((x_i, b)\). Similar to the previous case, we will denote a fixed value of the random variable \((x_i, V_{0, i}, V_{1,i})\) by \((x_i, v_{0,i}, v_{1,i})\). In the case where the backward query resulted in an abort, i.e., the output was \(\bot \), we take \(x_i = \bot \) and \(v_{0,i}\) and \(v_{1,i}\) can be arbitrary (but fixed). In fact, in this case, \(v_{0,i}\) and \(v_{1,i}\) are purely included to maintain uniformity of presentation and will be disregarded in subsequent calculations. Further, slightly abusing the notation for its simplicity, we will denote any such tuple (i.e., a tuple with \(x_i = \bot \)) by \(\bot \). Note that we did not include the query type (i.e., forward or backward) information in the tuple as, in our calculation, we will consider both the possibilities for a tuple. However, for the sake of completeness, one can assume that \({\mathscr {A}}\) has this information.

Without loss of generality, we will assume that \({\mathscr {A}}\) does not repeat its queries, as the response will be the same for a repeated query. Also, we will discard any duplicate copy of a tuple that may have occurred due to the extra information supplied to \({\mathscr {A}}\).

(Extended) transcript of the adversary. In the real world, the sequence of random variables \((x_i, U_{0, i}, U_{1,i})\), with \(1 \le i \le q\), is supported on the set \(\mathscr {T}_u\) of sequences \((x_i, u_{0,i}, u_{1,i}), 1 \le i \le q,\) \(x_i,u_{0,i},u_{1,i} \in \{0,1\}^n\) and \(x_i \ne x_j, u_{0,i} \ne u_{0,j}, u_{1,i} \ne u_{1,j}\) for \(1\le i < j \le q\). Whereas in the ideal world the sequence of random variables \((x_i, V_{0, i}, V_{1,i})\), with \(1 \le i \le q\), is supported on the set \(\mathscr {T}_v\) of sequences \((x_i, v_{0,i}, v_{1,i}), 1 \le i \le q,\) \(x_i \in \{0,1\}^n \cup \{\bot \}\), \(v_{0,i},v_{1,i} \in \{0,1\}^n\) and \(x_i \ne x_j, v_{0,i} \ne v_{0,j}, v_{1,i} \ne v_{1,j}\) for each \(1\le i < j \le q\) such that \(x_i \ne \bot \ne x_j\). So, we have \(\mathscr {T}_u \subset \mathscr {T}_v\). We term elements of \(\mathscr {T}_u\) and \(\mathscr {T}_v\) views of the adversary. In our subsequent treatment, we will solely work with the views from the real and the ideal world, and the fact that \(\mathscr {T}_u \subset \mathscr {T}_v\) will be essential for the application of the \(\chi ^2\) method.

4 Main Result

In this section, we state and prove our main result. We continue in the setup of the previous section. To simplify the presentation we denote \(2^n\) by N. Our main result is the following.

Theorem 2

Let \(N\ge 16\) and \(q < {N\over 2}\). Then

$$\begin{aligned} \mathsf {Adv}^{\mathrm{diff}}_{\mathsf {XORP}, \mathsf {\$}}(q) \le \sqrt{1.25q \over N} \end{aligned}$$

Proof

Before presenting the technical details we will provide a brief outline of the proof to help the reader follow the underlying idea and the flow of the proof. But before that we will describe our notational setup for the proof.

 To simplify the notation, we will denote the random variable \((x_i, U_{0, i}, U_{1,i})\) by \(S_i\) and \((x_i, V_{0, i}, V_{1,i})\) by \(T_i\). So, \(S_i\) (resp. \(T_i\)) follows the distribution of the real (resp. ideal) world, which we denote by \(\mathsf {{p}_{0}^{fwd}}(.)\) (resp. \(\mathsf {{p}_{1}^{fwd}}\)) when \(S_i\) (resp. \(T_i\)) is a forward query and by \(\mathsf {{p}_{0}^{bck}}(.)\) (resp. \(\mathsf {{p}_{1}^{bck}}\)) when \(S_i\) (resp. \(T_i\)) is a backward query. Hence, we denote \({{\mathbf {P}}}{{\mathbf {r}}}[S_i = s_i]\) by \(\mathsf {{p}_{0}^{fwd}}(s_i)\) and \({{\mathbf {P}}}{{\mathbf {r}}}[T_i = t_i]\) by \(\mathsf {{p}_{1}^{fwd}}(t_i)\) when \(S_i\) and \(T_i\) are forward queries, and likewise for backward queries. Further, we will abuse the notation to denote the joint distribution of \(S^{i-1}\) by \(\mathsf {{p}_{0}^{fwd}}\) when \(S^{i-1}\) corresponds to \(i-1\) forward queries and by \(\mathsf {{p}_{0}^{bck}}\) when \(S^{i-1}\) corresponds to \(i-1\) backward queries. Moreover, for fixed \(s_i\) and \(s^{i-1}\), we denote \({{\mathbf {P}}}{{\mathbf {r}}}[S_i=s_i \mid S_1= s_1, \ldots , S_{i-1}=s_{i-1}]\) by \(\mathsf {{p}_{0}^{fwd}}(s_i \mid s^{i-1})\) when \(S_i\) corresponds to a forward query; likewise for the other cases.

 The main tool we use in our proof is Theorem 1. Our goal is to evaluate the r.h.s. of (4). In doing so, we calculate \({{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{i-1})]\) over the real world distributions (\(\mathsf {{p}_{0}^{fwd}}\) and \(\mathsf {{p}_{0}^{bck}}\)). More precisely, we consider two cases: (i) when \(s_i\) is a forward query, and (ii) when \(s_i\) is a backward query. For the forward query case, we first calculate \(\chi ^2(s^{i-1})\) for fixed \(s^{i-1}\), which is given by the sum of

$$\begin{aligned} {(\mathsf {{p}_{0}^{fwd}}(s_i \mid s^{i-1})-\mathsf {{p}_{1}^{fwd}}(s_i \mid s^{i-1}))^2 \over \mathsf {{p}_{1}^{fwd}}(s_i\mid s^{i-1}) } \end{aligned}$$

taken over all possible \(s_i\) given \(s^{i-1}\). Here, we note that the support \(\mathscr {T}_u\) of the real world distributions (\(\mathsf {{p}_{0}^{fwd}}\) and \(\mathsf {{p}_{0}^{bck}}\)) is included in the support \(\mathscr {T}_v\) of the ideal world distributions (\(\mathsf {{p}_{1}^{fwd}}\) and \(\mathsf {{p}_{1}^{bck}}\)). Hence, \(\chi ^2(s^{i-1})\) is well defined. Next, we consider the random variable \(S^{i-1}\) in the real world. Each \(S_j \in \{S^{i-1}\}\) may correspond to a forward query or a backward query. However, since the distributions \(\mathsf {{p}_{0}^{fwd}}\) and \(\mathsf {{p}_{0}^{bck}}\) are identical, the distribution of \(S^{i-1}\) does not depend on the query type of each individual \(S_j\). So, we treat \(\chi ^2(S^{i-1})\) as a random variable and take its expectation under the distribution of \(S^{i-1}\). Finally, we take the sum of \({{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{i-1})]\) over all i in the range \(1\le i \le q\), which turns out to be at most \({8q^3\over N^3}\).

The corresponding steps for the backward query case are exactly similar to the forward query case when \(s_i \ne \perp \). The case \(s_i = \perp \) is treated separately. Summing the expectations \({{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{i-1})]\) for the two subcases (i.e., for \(s_i \ne \perp \) and \(s_i=\perp \)), we obtain that the final sum (taken over all i in the range \(1\le i \le q\)) for the backward query case is at most \({2.5q\over N}\). Finally, we get the upper bound on \(\mathsf {Adv}^{\mathrm{diff}}_{\mathsf {XORP}, \mathsf {\$}}(q)\) by applying Theorem 1, where we upper bound the r.h.s. of (4) by taking, for each of the q queries, the maximum of the forward and backward query contributions (which in this case turns out to be the backward query contribution).

 Following the above discussion we first consider the case when \(s_i\) is a forward query, and then consider the case when it is a backward query. To simplify notation, from here on, we will denote \(i-1\) by r.

Forward query

First, we calculate \(\mathsf {{p}_{0}^{fwd}}(s_i \mid s^{r})\) and \(\mathsf {{p}_{1}^{fwd}}(s_i \mid s^{r})\) for fixed \(s_i\) and \(s^r\), where \(s_i = (x_i, u_{0,i}, u_{1,i})\). \(\mathsf {{p}_{0}^{fwd}}(s_i \mid s^{r})\) is straightforward to calculate. Since \(x_i \notin \{x^{r}\}\), \(\mathsf {\Pi _0}(x_i)\) and \(\mathsf {\Pi _1}(x_i)\) are two independent random samples drawn from outside the sets \(\{u_{0}^{r}\}\) and \(\{u_{1}^{r}\}\) respectively. Thus

$$\begin{aligned} \mathsf {{p}_{0}^{fwd}}(s_i \mid s^{r}) = {1 \over N-r} \times {1 \over N-r} = {1 \over (N-r)^2}. \end{aligned}$$
(5)

To calculate \(\mathsf {{p}_{1}^{fwd}}(s_i \mid s^{r})\), we consider, without loss of any generality, the execution of the algorithm \(\mathsf {SIM_{FWD}}\) on the forward query \((x_{i}, 0)\) (the case when the forward query is \((x_{i}, 1)\) is identical). In this case, \( \mathscr {D}= \{x^{r}\},\mathscr {R}_{0}= \{u_{0}^{r}\}, \mathscr {R}_{1}= \{u_{1}^{r}\}\). Then we have

$$\begin{aligned} \mathsf {{p}_{1}^{fwd}}(s_i \mid s^{r})&= {{\mathbf {P}}}{{\mathbf {r}}}[T_i= (x_i, v_{0,i}, v_{1,i}) \mid T_{1}= (x_{1}, v_{0,1}, v_{1,1}), \ldots , T_{r}= (x_{r}, v_{0,r}, v_{1,r})] \nonumber \\&= {{\mathbf {P}}}{{\mathbf {r}}}[\mathsf {\$}(x_i)=v_{0,i} \oplus v_{1,i} \wedge V_0 = v_{0, i} \mid \mathscr {D}= \{x^{r}\},\mathscr {R}_{0}= \{u_{0}^{r}\}, \mathscr {R}_{1}= \{u_{1}^{r}\}] \nonumber \\&= {1\over N}\times {1 \over N-\vert \mathscr {W}_{x_i}\vert }, \end{aligned}$$
(6)

where \(\mathscr {W}_{x_i} = \mathscr {R}_0 \cup \{\mathsf {\$}(x_i)\oplus \mathscr {R}_{1}\}\). From (5) and (6) we derive the expression for \(\chi ^2(s^r)\) below.

$$\begin{aligned} \chi ^2(s^{r}) = \sum _{s_i} {(\mathsf {{p}_{0}^{fwd}}(s_i \mid s^{r})-\mathsf {{p}_{1}^{fwd}}(s_i \mid s^{r}))^2 \over \mathsf {{p}_{1}^{fwd}}(s_i \mid s^{r})} = \sum _{s_i} {\left( {1 \over (N-r)^2} - {1 \over N(N-\vert \mathscr {W}_{x_i}\vert )}\right) ^2 \over {1 \over N(N-\vert \mathscr {W}_{x_i}\vert )}}. \end{aligned}$$
(7)

The sum in (7) is over all possible \(s_i\)'s given \(s^{r}\). The number of such \(s_i\)'s is \((N-r)(N-\vert \mathscr {W}_{x_i}\vert )\). Therefore,

$$\begin{aligned} \chi ^2(s^{r})&= {N\left( \vert \mathscr {W}_{x_i}\vert - {2rN-r^2 \over N}\right) ^2\over (N-r)^3} \end{aligned}$$
(8)

Let \(S^{r}\) be chosen according to the distribution \(\mathsf {{p}_{0}^{fwd}}\). Then \(\mathscr {D}, \mathscr {R}_0, \mathscr {R}_1\) are random variables. This, in turn, means \(\mathscr {W}_{x_i}\) and \(\chi ^2(S^{r})\) are also random variables (as functions of \(\mathscr {D}, \mathscr {R}_0, \mathscr {R}_1\)). Our goal is to calculate the expectation of \(\chi ^2(S^r)\) under the distribution \(\mathsf {{p}_{0}^{fwd}}\). For notational simplicity, we denote the random variable \(\vert \mathscr {W}_{x_i}\vert \) by \(\mathtt{W}\) (mildly violating our notational convention). So, from (8) we have

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{r})]&= {{\mathbf {E}}}{{\mathbf {x}}}\left[ {N\left( \mathtt{W}- {2rN-r^2 \over N}\right) ^2\over (N-r)^3}\right] \nonumber \\&= {N \over (N-r)^3} \times {{\mathbf {E}}}{{\mathbf {x}}}\left[ \left( \mathtt{W}- {2rN-r^2 \over N}\right) ^2\right] . \end{aligned}$$
(9)

In the next lemma, whose proof is postponed to Sect. 5, we calculate \({{\mathbf {E}}}{{\mathbf {x}}}[\mathtt{W}]\) and bound \(\mathbf {Var}[\mathtt{W}]\).

Lemma 1

With the above notation

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}[\mathtt{W}] = {2rN-r^2 \over N},~ \text{ and }~ \mathbf {Var}[\mathtt{W}] \le {r^2 \over N}. \end{aligned}$$
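As a quick numerical sanity check of Lemma 1 (ours, not part of the proof), one can estimate \({{\mathbf {E}}}{{\mathbf {x}}}[\mathtt{W}]\) and \(\mathbf {Var}[\mathtt{W}]\) by directly sampling the two r-element range sets, as in the following Python snippet.

```python
import random

def sample_W(N, r, x_i, rng=random):
    """One sample of W = |R0 ∪ (x_i ⊕ R1)| for independent random r-subsets R0, R1 of {0,...,N-1}."""
    R0 = set(rng.sample(range(N), r))
    R1 = set(rng.sample(range(N), r))
    return len(R0 | {x_i ^ u for u in R1})

N, r, x_i, trials = 64, 20, 5, 100_000
samples = [sample_W(N, r, x_i) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((w - mean) ** 2 for w in samples) / trials
print(mean, (2 * r * N - r * r) / N)  # empirical mean vs predicted E[W]
print(var, r * r / N)                 # empirical variance vs the bound r^2/N
```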

Using Lemma 1, (9) can be written as

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{r})]&={N \over (N-r)^3} \times {{\mathbf {E}}}{{\mathbf {x}}}\left[ \left( \mathtt{W}- {{\mathbf {E}}}{{\mathbf {x}}}[\mathtt{W}]\right) ^2\right] \\&= {N \over (N-r)^3} \times \mathbf {Var}[\mathtt{W}]. \end{aligned}$$

In Lemma 1, we also showed that \(\mathbf {Var}[\mathtt{W}] \le {r^2 \over N}\). This leads to the following final expression for the forward query case.

$$\begin{aligned} \sum _{i=1}^{q}{{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{r})]&\le \sum _{i=1}^{q}{r^2 \over (N-r)^3} \nonumber \\&\le {8q^3\over N^3}. \end{aligned}$$
(10)

In (10), we used the fact \(r < q\) and \(q < {N \over 2}\).

Backward query

Let \(\mathscr {Z}\) be the set of all possible \(s_i\)’s which are not ‘abort’, i.e., \(s_i \ne \bot \). Then for backward queries we have the following split.

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{r})] =\,&{{\mathbf {E}}}{{\mathbf {x}}}\left[ \sum _{s_i \in \mathscr {Z}}{(\mathsf {{p}_{0}^{bck}}(s_i\mid S^{r})-\mathsf {{p}_{1}^{bck}}(s_i \mid S^{r}))^2 \over \mathsf {{p}_{1}^{bck}}(s_i\mid S^{r})}\right] \nonumber \\&+ {{\mathbf {E}}}{{\mathbf {x}}}\left[ {(\mathsf {{p}_{0}^{bck}}(\bot \mid S^{r})-\mathsf {{p}_{1}^{bck}}(\bot \mid S^{r}))^2 \over \mathsf {{p}_{1}^{bck}}(\bot \mid S^{r})}\right] \end{aligned}$$
(11)

We evaluate the two expectations on the r.h.s. of (11) in the following two cases.

First, consider the case \(s_i \ne \perp \), i.e., \(s_i \in \mathscr {Z}\). In this case, we have for fixed \(s^r\),

$$\begin{aligned} \mathsf {{p}_{0}^{bck}}(s_i \mid s^{r}) = \mathsf {{p}_{0}^{fwd}}(s_i \mid s^{r}) = {1 \over (N-r)^2}. \end{aligned}$$

Next, we calculate \(\mathsf {{p}_{1}^{bck}}(s_i \mid s^{r})\). For this, we need to consider the execution of the algorithm \(\mathsf {SIM_{BCK}}\). Let the backward query, without loss of any generality, be \((v_{0,i}, 0)\). Further, let us denote by \(V_{0}^{j}\) the \(V_0\) sampled by the algorithm \(\mathsf {SIM_{BCK}}\) at the j-th iteration, where by the j-th iteration we mean the j-th repeated execution of Steps 6 to 19 of \(\mathsf {SIM_{BCK}}\). Let us assume that \(\mathsf {SIM_{BCK}}\) succeeds at the \(\ell \)-th iteration for some \(1 \le \ell \le n\), i.e., for \(1 \le j \le \ell -1\), \( \mathsf {\$}(V_{0}^{j})\oplus v_{0,i} \in \mathscr {R}_{1}\), and \(V_{0}^{\ell } = x_i\), where \(\mathscr {R}_{1} = \{v_{1}^{r}\}\) (also \(\mathscr {R}_0 = \{v_{0}^{r}\}, \mathscr {D}= \{x^{r}\}\)). Let us denote by \(\mathsf {BAD}_{\ell -1}\) the event \(\mathsf {\$}(V_{0}^{1})\oplus v_{0,i},\ldots ,\mathsf {\$}(V_{0}^{\ell -1})\oplus v_{0,i} \in \mathscr {R}_{1}\) and by \({\mathsf {E}}\) the event \(\mathscr {D}= \{x^{r}\} \wedge \mathscr {R}_0 = \{v_{0}^{r}\}\wedge \mathscr {R}_{1} = \{v_{1}^{r}\}\). Then

$$\begin{aligned} \mathsf {{p}_{1}^{bck}}(s_i\mid s^{r})&= {{\mathbf {P}}}{{\mathbf {r}}}[T_i= (x_i, v_{0,i}, v_{1,i})\mid T_{r}= (x_{r}, v_{0,r}, v_{1,r}), \ldots ,T_{1}= (x_{1}, v_{0,1}, v_{1,1})]\\&= \sum _{\ell = 1}^{n}{{\mathbf {P}}}{{\mathbf {r}}}[\mathsf {BAD}_{\ell -1}\wedge V_{0}^{\ell } = x_i \wedge \mathsf {\$}(x_i)=v_{0,i} \oplus v_{1,i}\mid {\mathsf {E}}]\\&= \sum _{\ell = 1}^{n}{{\mathbf {P}}}{{\mathbf {r}}}[V_{0}^{\ell } = x_i \wedge \mathsf {\$}(x_i)=v_{0,i} \oplus v_{1,i} \mid \mathsf {BAD}_{\ell -1},{\mathsf {E}}] \times {{\mathbf {P}}}{{\mathbf {r}}}[\mathsf {BAD}_{\ell -1}\mid {\mathsf {E}}] \end{aligned}$$

Now, \({{\mathbf {P}}}{{\mathbf {r}}}[\mathsf {BAD}_{\ell -1}\mid {\mathsf {E}}]\) can be calculated as

$$\begin{aligned} {{\mathbf {P}}}{{\mathbf {r}}}[\mathsf {BAD}_{\ell -1}\mid {\mathsf {E}}]&= \prod _{j=1}^{\ell -1}{{\mathbf {P}}}{{\mathbf {r}}}[\mathsf {\$}(V_{0}^{j})\oplus v_{0,i} \in \mathscr {R}_{1} = \{v_{1}^{r}\} \mid {\mathsf {E}}] \nonumber \\&= \left( {r\over N}\right) ^{\ell -1}. \end{aligned}$$
(12)

To justify (12) we first note that the distribution \(\mathsf {{p}_{0}^{bck}}(.)\) is supported on the set of tuples \(s^r = (s_1, \ldots , s_{r})\) such that none of the \(s_j\), with \(1 \le j \le r\), is \(\bot \). So, in the \(\mathsf {SIM_{BCK}}\) algorithm the set \(\mathscr {R}_{1}\) has size r. Also, at the j-th iteration, with \(1 \le j \le \ell -1\) a fresh \(V_{0}^{j}\) (sampled from outside the set \(\mathscr {D}'\)) is given to \(\mathsf {\$}\). Therefore, \(\mathsf {BAD}_{\ell -1}\) occurs when \(\ell -1\) independent events each with probability \({r\over N}\) occur, leading to the expression in (12).

Next, at the \(\ell \)-th iteration the set \(\mathscr {D}'\) has size \(r+\ell -1\). Since \( V_{0}^{\ell }\) is sampled at random from the set \(\{0,1\}^n \setminus \mathscr {D}'\), we immediately have

$$\begin{aligned} {{\mathbf {P}}}{{\mathbf {r}}}[V_{0}^{\ell } = x_i \wedge \mathsf {\$}(x_i)=v_{0,i} \oplus v_{1,i} \mid \mathsf {BAD}_{\ell -1}, {\mathsf {E}}] = {1 \over N} \times {1 \over N-r-\ell +1}. \end{aligned}$$
(13)

By combining (12) and (13) we get

$$\begin{aligned} \mathsf {{p}_{1}^{bck}}(s_i \mid s^{r})&= \sum _{\ell = 1}^{n} {1 \over N} \times {1 \over N-r-\ell +1} \times \left( {r \over N}\right) ^{\ell -1}. \end{aligned}$$
(14)

In the following lemma, we derive a lower and an upper bound on \(\mathsf {{p}_{1}^{bck}}(s_i \mid s^{r})\). The proof of the lemma is given in Sect. 5.

Lemma 2

With the above notation, the following bounds hold for \(\mathsf {{p}_{1}^{bck}}(s_i \mid s^{r})\).

$$\begin{aligned} {1 \over (N-r)^2} \times \left( 1-\left( {r\over N}\right) ^n\right) \le \mathsf {{p}_{1}^{bck}}(s_i\mid s^{r}) \le {4 \over N(N-r)}. \end{aligned}$$
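As a sanity check (ours, not part of the paper), one can evaluate the sum (14) numerically and verify that it lies between the two bounds of Lemma 2 for sample parameters:

```python
def p1_bck(N, r, n):
    """The exact value of the sum in (14)."""
    return sum((1 / N) * (1 / (N - r - l + 1)) * (r / N) ** (l - 1) for l in range(1, n + 1))

n = 8
N = 2 ** n
for r in (0, 10, 100, N // 2 - 1):
    lower = (1 - (r / N) ** n) / (N - r) ** 2
    upper = 4 / (N * (N - r))
    assert lower <= p1_bck(N, r, n) <= upper, r
print("Lemma 2 bounds hold for the tested parameters")
```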

Let us denote the lower and upper bounds in Lemma 2 by \(\mathtt{L}\) and \(\mathtt{U}\) respectively. Then

$$\begin{aligned} {\left( \mathsf {{p}_{0}^{bck}}(s_i \mid s^{r})-\mathsf {{p}_{1}^{bck}}(s_i \mid s^{r})\right) ^2 \over \mathsf {{p}_{1}^{bck}}(s_i \mid s^{r})}&\le \max \left\{ {\left( \mathtt{U}- {1 \over (N-r)^2}\right) ^2 \over \mathtt{U}}, {\left( \mathtt{L}- {1 \over (N-r)^2}\right) ^2 \over \mathtt{L}}\right\} . \end{aligned}$$
(15)

(15) is justified because the function \({\left( y - {1 \over (N-r)^2}\right) ^2 \over y}\) attains its minimum (\(=0\)) at \(y = {1 \over (N-r)^2}\) and is strictly increasing for \(y \ge {1 \over (N-r)^2}\) and strictly decreasing for \(y \le {1 \over (N-r)^2}\). Now,

$$\begin{aligned} {\left( \mathtt{U}- {1 \over (N-r)^2}\right) ^2 \over \mathtt{U}}&= {3N -4r \over 4N(N-r)^3}, \end{aligned}$$

and

$$\begin{aligned} {\left( \mathtt{L}- {1 \over (N-r)^2}\right) ^2 \over \mathtt{L}}&= {\left( {r \over N}\right) ^{2n}\over (N-r)^2 \times \left( 1 - \left( {r \over N}\right) ^n\right) }. \end{aligned}$$

Further, considering that \(|\mathscr {Z}|\) is at most \((N - \vert \mathscr {D}\vert )(N - \vert \mathscr {R}_1 \vert )=(N-r)^2\), we get

$$\begin{aligned} \sum _{s_i \in \mathscr {Z}} {\left( \mathsf {{p}_{0}^{bck}}(s_i \mid s^{r})-\mathsf {{p}_{1}^{bck}}(s_i \mid s^{r})\right) ^2 \over \mathsf {{p}_{1}^{bck}}(s_i \mid s^{r})}&\le \max \left\{ {3N -4r \over 4N(N-r)}, {\left( {r \over N}\right) ^{2n}\over \left( 1 - \left( {r \over N}\right) ^n\right) }\right\} . \end{aligned}$$

Therefore, when \(S^r\) is a random variable that follows the distribution \(\mathsf {{p}_{0}^{bck}}\), we obtain the following bound on the expectation (taken under the distribution \(\mathsf {{p}_{0}^{bck}}\)).

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}\left[ \sum _{s_i \in \mathscr {Z}} {\left( \mathsf {{p}_{0}^{bck}}(s_i \mid S^{r})-\mathsf {{p}_{1}^{bck}}(s_i \mid S^{r})\right) ^2 \over \mathsf {{p}_{1}^{bck}}(s_i \mid S^{r})}\right]&\le \max \left\{ {3N -4r \over 4N(N-r)}, {\left( {r \over N}\right) ^{2n}\over \left( 1 - \left( {r \over N}\right) ^n\right) }\right\} . \end{aligned}$$
(16)

Next, consider the case \(s_i = \perp \). In the real world, there is no abort, so \(\mathsf {{p}_{0}^{bck}}(\bot \mid S^{r}) =0 \). Therefore, similarly to (12),

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}\left[ {\left( \mathsf {{p}_{0}^{bck}}(\bot \mid S^{r})-\mathsf {{p}_{1}^{bck}}(\bot \mid S^{r})\right) ^2 \over \mathsf {{p}_{1}^{bck}}(\bot \mid S^{r})}\right]&= {{\mathbf {E}}}{{\mathbf {x}}}\left[ \mathsf {{p}_{1}^{bck}}(\bot \mid S^{r})\right] \nonumber \\&= \mathsf {{p}_{1}^{bck}}(\bot )\nonumber \\&= \left( {r \over N}\right) ^n. \end{aligned}$$
(17)

From (11), (16), and (17) we derive

$$\begin{aligned} \sum _{r=0}^{q-1}{{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{r})]&\le \sum _{r=0}^{q-1} \max \left\{ {3N -4r \over 4N(N-r)}, {\left( {r \over N}\right) ^{2n}\over \left( 1 - \left( {r \over N}\right) ^n\right) }\right\} + \left( {r \over N}\right) ^n\\&\le q \times \left( \max \left\{ {3 \over 4(N-q)}, {\left( {q \over N}\right) ^{2n}\over \left( 1 - \left( {q \over N}\right) ^n\right) }\right\} + \left( {q \over N}\right) ^n \right) .\\ \end{aligned}$$

For \(q < {N \over 2}\) we have the following bounds,

$$\begin{aligned} {3 \over 4(N-q)}~<~{3 \over 2N},~~ {\left( {q \over N}\right) ^{2n}\over \left( 1 - \left( {q \over N}\right) ^n\right) }~<~{1 \over N(N-1)},~\text{ and }~ \left( {q \over N}\right) ^n~<~{1 \over N}. \end{aligned}$$
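These estimates follow from \(q < {N \over 2}\) (so that \(N-q > {N \over 2}\) and \({q \over N} < {1 \over 2}\)) together with \(N = 2^n\); for completeness, we record the short verification (ours) below.

$$\begin{aligned} {3 \over 4(N-q)}< {3 \over 4 \cdot {N \over 2}} = {3 \over 2N}, \qquad \left( {q \over N}\right) ^n< \left( {1 \over 2}\right) ^n = {1 \over N}, \qquad {\left( {q \over N}\right) ^{2n}\over 1 - \left( {q \over N}\right) ^n}< {{1 \over N^2} \over 1 - {1 \over N}} = {1 \over N(N-1)}. \end{aligned}$$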

Hence, we have for the backward query

$$\begin{aligned} \sum _{r=0}^{q-1}{{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{r})]&\le {2.5q \over N}. \end{aligned}$$
(18)

Finally, we get the following upper bound on \(\mathsf {Adv}^{\mathrm{diff}}_{\mathsf {XORP}, \mathsf {\$}}(q)\).

$$\begin{aligned} \mathsf {Adv}^{\mathrm{diff}}_{\mathsf {XORP}, \mathsf {\$}}(q)&= \Vert S^q - T^q \Vert \end{aligned}$$
(19)
$$\begin{aligned}&\le \sqrt{{1\over 2} \sum _{r=0}^{q-1}{{\mathbf {E}}}{{\mathbf {x}}}[\chi ^2(S^{r})]}\end{aligned}$$
(20)
$$\begin{aligned}&\le \sqrt{1.25 q \over N}, \end{aligned}$$
(21)

where (19) follows from the definition of \(\mathsf {Adv}^{\mathrm{diff}}_{\mathsf {XORP}, \mathsf {\$}}(q)\), (20) follows from (4), and (21) follows by bounding the contribution of each of the q queries by the larger of the two bounds (10) and (18), namely (18).    \(\square \)

5 Auxiliary Proofs

In this section we state and prove Lemmas 1 and 2. We begin with Lemma 1, for which we use the same notation and setting as in the Forward Query part of the proof of Theorem 2.

5.1 Proof of Lemma 1

Lemma 1

With the notation of Theorem 2,

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}[\mathtt{W}] = {2rN-r^2 \over N},~ \text{ and }~ \mathbf {Var}[\mathtt{W}] \le {r^2 \over N}. \end{aligned}$$

Proof

When \(Z^r\) is chosen according to the distribution \(\mathsf {{p}_{0}^{fwd}}\), the sets \(\{U_{0}^{r}\}\) and \(\{U_{1}^{r}\}\) are two independently sampled random subsets of \(\{0,1\}^n\), each of cardinality r. Also, in keeping with the notation of Theorem 2, \(x_i\) is a fixed element of \(\{0,1\}^n\). Now, for each \(g \in \{0,1\}^n\), we define an indicator random variable \(I_g\) as follows.

$$\begin{aligned} I_g= {\left\{ \begin{array}{ll} 1 ~\text{ if }~ g \in \{U_{0}^{r}\} ~ \text{ and }~ g\oplus x_i \in \{U_{1}^{r}\}\\ 0 ~ \text{ otherwise. } \end{array}\right. } \end{aligned}$$

Therefore,

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}[I_g] = {{\mathbf {P}}}{{\mathbf {r}}}[I_g =1]&= {{\mathbf {P}}}{{\mathbf {r}}}[g \in \{U_{0}^{r}\} \wedge g\oplus x_i \in \{U_{1}^{r}\}]= {r\over N} \times {r \over N} = {r^2 \over N^2}. \end{aligned}$$
(22)

Also,

$$\begin{aligned} \mathtt{W}= 2r - \sum _{g\in \{0,1\}^n} I_g. \end{aligned}$$

Thus,

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}[\mathtt{W}]&= 2r - {{\mathbf {E}}}{{\mathbf {x}}}\left[ \sum _{g\in \{0,1\}^n} I_g\right] = 2r - \sum _{g\in \{0,1\}^n} {{\mathbf {E}}}{{\mathbf {x}}}[I_g]= {2rN - r^2 \over N}. \end{aligned}$$

Next, to calculate \(\mathbf {Var}[\mathtt{W}]\) we use the following relationship.

$$\begin{aligned} \mathbf {Var}[\mathtt{W}] = \mathbf {Var}\left[ \sum _{g\in \{0,1\}^n} I_g\right] = \sum _{g\in \{0,1\}^n} \mathbf {Var}[I_g]+ \sum _{g \ne h \in \{0,1\}^n} \mathbf {Cov}[I_g, I_h]. \end{aligned}$$

\(\mathbf {Var}[I_g]\) is straightforward to calculate from the definition:

$$\begin{aligned} \mathbf {Var}[I_g] = {{\mathbf {E}}}{{\mathbf {x}}}[I_{g}^{2}]- {{\mathbf {E}}}{{\mathbf {x}}}[I_g]^2&= {{\mathbf {E}}}{{\mathbf {x}}}[I_g](1 - {{\mathbf {E}}}{{\mathbf {x}}}[I_g])\\&= {r^2 \over N^2} \times \left( 1 - {r^2 \over N^2}\right) \\&< {r^2 \over N^2}. \end{aligned}$$

From the definition, \(\mathbf {Cov}[I_g, I_h] = {{\mathbf {E}}}{{\mathbf {x}}}[I_gI_h]- {{\mathbf {E}}}{{\mathbf {x}}}[I_g]{{\mathbf {E}}}{{\mathbf {x}}}[I_h]\). Since \({{\mathbf {E}}}{{\mathbf {x}}}[I_g]={{\mathbf {E}}}{{\mathbf {x}}}[I_h] = {r^2 \over N^2}\) by (22), the task reduces to calculating \( {{\mathbf {E}}}{{\mathbf {x}}}[I_gI_h]\), which we do below.

$$\begin{aligned} {{\mathbf {E}}}{{\mathbf {x}}}[I_gI_h]&= {{\mathbf {P}}}{{\mathbf {r}}}[I_g=1 \wedge I_h =1]\\&= {{\mathbf {P}}}{{\mathbf {r}}}[g \in \{U_{0}^{r}\} \wedge h \in \{U_{0}^{r}\} \wedge g \oplus x_i \in \{U_{1}^{r}\} \wedge h \oplus x_i \in \{U_{1}^{r}\} ]\\&= {{\mathbf {P}}}{{\mathbf {r}}}[g \in \{U_{0}^{r}\} \wedge h \in \{U_{0}^{r}\}] \times {{\mathbf {P}}}{{\mathbf {r}}}[g \oplus x_i \in \{U_{1}^{r}\} \wedge h \oplus x_i \in \{U_{1}^{r}\}]\\&= {r(r-1) \over N(N-1)} \times {r(r-1) \over N(N-1)}\\&= \left( {r(r-1) \over N(N-1)}\right) ^2 \end{aligned}$$

Therefore, \(\mathbf {Cov}[I_g, I_h] = \left( {r(r-1) \over N(N-1)}\right) ^2 - \left( {r^2 \over N^2}\right) ^2 < 0\). This implies that

$$\begin{aligned} \mathbf {Var}[\mathtt{W}] < N \times {r^2 \over N^2} = {r^2 \over N}. \end{aligned}$$

This finishes the proof of the lemma.    \(\square \)
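For readers who would like a quick empirical sanity check of Lemma 1, the following small Monte Carlo sketch (in Python) estimates \({{\mathbf {E}}}{{\mathbf {x}}}[\mathtt{W}]\) and \(\mathbf {Var}[\mathtt{W}]\). It is an illustration only; the values of n, r, \(x_i\), and the number of trials are arbitrary toy choices and are not parameters of the analysis above.

```python
# Monte Carlo sanity check of Lemma 1 (illustrative sketch, not part of the proof).
# Sample two independent random r-subsets U0, U1 of {0,1}^n, fix an arbitrary x_i,
# and estimate E[W] and Var[W] for W = 2r - |{g : g in U0 and g XOR x_i in U1}|.
import random
import statistics

n = 8                 # toy block size, so N = 256
N = 2 ** n
r = 20                # number of previous queries, r < N/2
x_i = 0x5A            # an arbitrary fixed element of {0,1}^n
trials = 20000

samples = []
for _ in range(trials):
    U0 = set(random.sample(range(N), r))
    U1 = set(random.sample(range(N), r))
    overlap = sum(1 for g in U0 if (g ^ x_i) in U1)   # sum of the indicators I_g
    samples.append(2 * r - overlap)

print("empirical  E[W]  =", statistics.fmean(samples))
print("predicted  E[W]  =", (2 * r * N - r ** 2) / N)        # (2rN - r^2)/N = 38.4375
print("empirical Var[W] =", statistics.pvariance(samples))
print("bound      r^2/N =", r ** 2 / N)                       # 1.5625
```

With these toy parameters the empirical mean should be close to 38.44 and the empirical variance should stay below 1.56, in agreement with the lemma.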

5.2 Proof of Lemma 2

Lemma 2

With the notation of Theorem 2, the following bounds hold for \(\mathsf {{p}_{1}^{bck}}(s_i\mid s^{r})\).

$$\begin{aligned} {1 \over (N-r)^2} \times \left( 1-\left( {r\over N}\right) ^n\right) \le \mathsf {{p}_{1}^{bck}}(s_i\mid s^{r}) \le {4 \over N(N-r)}. \end{aligned}$$

Proof

The lower bound is justified as follows.

$$\begin{aligned} \mathsf {{p}_{1}^{bck}}(s_i\mid s^{r})&=\sum _{\ell = 1}^{n} {1 \over N} \times {1 \over N-r-\ell +1} \times \left( {r \over N}\right) ^{\ell -1} \\&\ge {1 \over N(N-r)}\times \sum _{\ell = 1}^{n} \left( {r \over N}\right) ^{\ell -1}\\&= {1 \over N(N-r)}\times {1 - \left( {r \over N}\right) ^n \over 1 - \left( {r \over N}\right) }\\&= {1 \over (N-r)^2} \times \left( 1-\left( {r\over N}\right) ^n\right) . \end{aligned}$$

For the upper bound, we get

$$\begin{aligned} \mathsf {{p}_{1}^{bck}}(s_i\mid s^{r})=\sum _{\ell = 1}^{n} {1 \over N} \times {1 \over N-r-\ell +1} \times \left( {r \over N}\right) ^{\ell -1}&\le {4 \over N^2} \times \sum _{\ell = 1}^{\infty } \left( {r \over N}\right) ^{\ell -1} \\&= {4 \over N(N-r)}. \nonumber \end{aligned}$$
(23)

The first inequality in (23) follows by noting that \(r< q < {N \over 2}\) and \(\ell \le n = \log N \le {N \over 4}\) for \(N \ge 16\), so that \(N-r-\ell +1 > {N \over 4}\) and hence \({1 \over N(N-r-\ell +1)} \le {4 \over N^2}\).    \(\square \)
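As an optional numeric check of Lemma 2 (again only an illustration, not part of the proof), the following Python sketch evaluates the sum defining \(\mathsf {{p}_{1}^{bck}}(s_i\mid s^{r})\) from the proof above and compares it with the two claimed bounds; the choice \(N = 2^6 = 64\) is an arbitrary toy value satisfying \(N \ge 16\).

```python
# Numeric check of the bounds in Lemma 2 for a toy parameter (illustrative sketch).
n = 6
N = 2 ** n            # N = 64 >= 16 and n = log N <= N/4

for r in range(N // 2):                       # r < q < N/2
    p1 = sum((1 / N) * (1 / (N - r - l + 1)) * (r / N) ** (l - 1)
             for l in range(1, n + 1))
    lower = (1 / (N - r) ** 2) * (1 - (r / N) ** n)
    upper = 4 / (N * (N - r))
    assert lower <= p1 <= upper, (r, lower, p1, upper)

print("bounds of Lemma 2 hold for all r <", N // 2, "when N =", N)
```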

6 Extension to the Xor of k Permutations

In this section, we apply our main result (Theorem 2) to show full indifferentiable security of the \(\mathsf {XORP}[k]\) construction for any k. Since Theorem 2 already covers the case \(k=2\), it suffices to consider \(\mathsf {XORP}[k]\) with \(k \ge 3\). In particular, our result is the following.

Theorem 3

Let \(N\ge 16\) and \(q < {N\over 2}\). Then, there exists a simulator \({\mathsf {S}}'\) for \(\mathsf {XORP}[k], k\ge 3,\) such that for any adversary \({\mathscr {A}}'\), there exists an adversary \({\mathscr {A}}\) with

$$\begin{aligned}\mathsf {Adv}^{\mathrm{diff}}_\mathsf{{XORP}[k], \mathsf {\$}}({\mathscr {A}}') = \mathsf {Adv}^{\mathrm{diff}}_\mathsf{{XORP}, \mathsf {\$}}({\mathscr {A}}) \end{aligned}$$

and hence, \(\mathsf {Adv}^{\mathrm{diff}}_\mathsf{{XORP}[k], \mathsf {\$}}(q) \le \sqrt{1.25 q \over N}\).

Proof

The indifferentiable security analysis of \(\mathsf {XORP}[k]\) follows a reduction similar to the one used in [MP15] to prove the PRF-security of \(\mathsf {XORP}[k]\) in the indistinguishability setting. However, in our case we additionally need to consider the simulator \({\mathsf {S}}'\).

\({\mathsf {S}}'\). First, we recall the simulator \({\mathsf {S}}\) for \(\mathsf {XORP}\) from Sect. 3. The simulator \({\mathsf {S}}'\) works in almost the same way as \({\mathsf {S}}\). It first samples \((k-2)\) independent random permutations \(\mathsf {\Pi }_2, \ldots , \mathsf {\Pi }_{k-1}\). Note that this sampling can be done lazily, and hence efficiently, instead of sampling the whole permutations at once; a sketch of such lazy sampling is given below.
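To make the lazy-sampling remark concrete, the following minimal Python sketch lazily samples a random permutation of \(\{0,1\}^n\) while supporting both forward and inverse queries, as \({\mathsf {S}}'\) needs for \(\mathsf {\Pi }_2, \ldots , \mathsf {\Pi }_{k-1}\). The class and method names are illustrative only.

```python
# Lazily sampled random permutation of {0,1}^n with forward and inverse access
# (an illustration of the lazy-sampling idea, not the simulator itself).
import random

class LazyPermutation:
    def __init__(self, n):
        self.N = 2 ** n
        self.fwd = {}          # x -> Pi(x), defined so far
        self.inv = {}          # y -> Pi^{-1}(y), defined so far

    def forward(self, x):
        if x not in self.fwd:
            y = random.randrange(self.N)       # sample an unused output uniformly
            while y in self.inv:
                y = random.randrange(self.N)
            self.fwd[x], self.inv[y] = y, x
        return self.fwd[x]

    def backward(self, y):
        if y not in self.inv:
            x = random.randrange(self.N)       # sample an unused input uniformly
            while x in self.fwd:
                x = random.randrange(self.N)
            self.fwd[x], self.inv[y] = y, x
        return self.inv[y]
```

Rejection sampling keeps each response uniform over the values still consistent with a permutation, so the lazily defined table has the same distribution as a permutation sampled in full.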

In case of a forward or backward query \((x, i)\) with \(i \ge 2\), \({\mathsf {S}}'\) responds honestly (i.e., it uses its own sampled random permutation as mentioned above). When \(i \in \{0, 1\}\), it behaves exactly in the same way as \({\mathsf {S}}\), except that it computes \(\$'(x) = \$(x) \oplus \mathsf {\Pi }_2(x) \oplus \cdots \oplus \mathsf {\Pi }_{k-1}(x)\) and then applies Step 5 of \(\mathsf {SIM_{\mathsf {FWD}}}\) (see Fig. 2) in case of a forward query, or Step 7 of \(\mathsf {SIM}_{\mathsf {BCK}}\) (see Fig. 3) in case of a backward query.

Next, we describe the reduction between the adversaries. Suppose there is an adversary \({\mathscr {A}}'\) against \(\mathsf {XORP}[k]\) with the simulator \({\mathsf {S}}'\) defined above. From \({\mathscr {A}}'\) we construct an adversary \({\mathscr {A}}\) against \(\mathsf {XORP}\) and the simulator \({\mathsf {S}}\). The adversary \({\mathscr {A}}\) first samples and stores the permutations \( \mathsf {\Pi }_2, \ldots , \mathsf {\Pi }_{k-1}\) (again using lazy sampling for efficiency). Next, \({\mathscr {A}}\) runs the algorithm \({\mathscr {A}}'\), which can make two types of queries, namely (a) primitive or simulator queries and (b) construction or random function queries. Below, we consider these two types of queries; a schematic code sketch of both cases follows case (b).

(a):

In case of a primitive or simulator query \((x, i)\) (either forward or backward), \({\mathscr {A}}\) first checks whether \(i \ge 2\). If \(i \ge 2\), then \({\mathscr {A}}\) can simulate the response on its own, i.e., it computes \(\mathsf {\Pi }_i(x)\) or \(\mathsf {\Pi }_{i}^{-1}(x)\), where \( \mathsf {\Pi }_i \in \{\mathsf {\Pi }_2, \ldots , \mathsf {\Pi }_{k-1}\}\), and sends the output back to \({\mathscr {A}}'\). If \(i = 0\) or 1, then \({\mathscr {A}}\) forwards the query to its own simulator/primitive oracle and forwards whatever response it receives back to \({\mathscr {A}}'\); that is, it simply relays the queries and responses.

(b):

In case of a construction or random function query x, \({\mathscr {A}}\) forwards the query to its own construction/random function oracle. Suppose \({\mathscr {A}}\) receives Z as the response. Then it computes \(Z' = Z \oplus \bigoplus _{i=2}^{k-1} \mathsf {\Pi }_i(x)\) and sends \(Z'\) back to \({\mathscr {A}}'\).
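The following Python sketch renders the two cases (a) and (b) schematically, reusing the LazyPermutation class from the sketch given with the description of \({\mathsf {S}}'\). The oracle interfaces prim_query and constr_query, as well as the wrapper names, are hypothetical and only illustrate how \({\mathscr {A}}\) wraps its own oracles for \({\mathscr {A}}'\).

```python
# Schematic rendering of cases (a) and (b) of the reduction (illustrative sketch).
# prim_query(x, i, direction) and constr_query(x) stand for A's own oracles;
# LazyPermutation is the lazily sampled permutation sketched earlier.
def make_wrapped_oracles(prim_query, constr_query, k, n):
    pis = {i: LazyPermutation(n) for i in range(2, k)}    # Pi_2, ..., Pi_{k-1}

    def prim(x, i, direction="fwd"):
        if i >= 2:
            # case (a), i >= 2: A answers from its own lazily sampled permutation
            p = pis[i]
            return p.forward(x) if direction == "fwd" else p.backward(x)
        # case (a), i in {0, 1}: relay the query to A's simulator/primitive oracle
        return prim_query(x, i, direction)

    def constr(x):
        # case (b): query A's construction/random-function oracle, then xor in
        # the outputs of the extra permutations to obtain Z'
        z = constr_query(x)
        for i in range(2, k):
            z ^= pis[i].forward(x)
        return z

    return prim, constr       # these are the oracles exposed to A'
```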

Note that when \({\mathscr {A}}\) is interacting with the real world \((\mathsf {XORP}, ( \mathsf {\Pi }_0, \mathsf {\Pi }_1, \mathsf {\Pi }_0^{-1}, \mathsf {\Pi }_1^{-1}))\), the interaction interface of \({\mathscr {A}}'\) is equivalent to

$$\begin{aligned} (\mathsf {XORP}[k], (\mathsf {\Pi }_0,\ldots , \mathsf {\Pi }_{k-1}, \mathsf {\Pi }_0^{-1},\ldots , \mathsf {\Pi }_{k-1}^{-1})). \end{aligned}$$

Now assume that \({\mathscr {A}}\) is interacting with \((\$, {\mathsf {S}})\); the interaction interface of \({\mathscr {A}}'\) is then equivalent to \((\$ \oplus \mathsf {XORP}[k-2], {\mathsf {S}}')\). The correctness of the first oracle is easy to see, as \({\mathscr {A}}\) xors its own computation of \(\mathsf {XORP}[k-2]\) with the output of \(\$\). Similarly, one can show that the simulator interface of \({\mathscr {A}}'\) is \({\mathsf {S}}'\). Note that \(\$ \oplus \mathsf {XORP}[k-2]\) is completely independent of \(\mathsf {XORP}[k-2]\), and we can consider it as another independent random function \(\$'\). Thus, the interface of \({\mathscr {A}}'\) is equivalent to \((\$', {\mathsf {S}}')\). So, \({\mathscr {A}}\) perfectly simulates both the real and the ideal world for \({\mathscr {A}}'\). Therefore, \(\mathsf {Adv}^{\mathrm{diff}}_\mathsf{{XORP}[k], \mathsf {\$}}({\mathscr {A}}') = \mathsf {Adv}^{\mathrm{diff}}_\mathsf{{XORP}, \mathsf {\$}}({\mathscr {A}}) \). By Theorem 2, we finally have

$$\begin{aligned} \mathsf {Adv}^{\mathrm{diff}}_\mathsf{{XORP}[k], \mathsf {\$}}(q) \le \sqrt{1.25q \over N}. \end{aligned}$$

   \(\square \)

7 Conclusion

Proving full security of the \(\mathsf {XORP}\) construction in the secret permutation model or in the public permutation model (i.e., indifferentiable security) is a challenging problem. Recently, Dai et al. introduced a method, called the \(\chi ^2\) method, using which they obtained full PRF-security of \(\mathsf {XORP}\) in the secret random permutation model. Full security of this construction in the public permutation model remained an open problem. In this paper, we apply the \(\chi ^2\) method to the \(\mathsf {XORP}\) construction and prove its full indifferentiable security. We believe this method can also be used for other cryptographic constructions whose full security is not yet known.

We also remark that although our bound shows full (i.e., n-bit) indifferentiable security of \(\mathsf {XORP}\) and \(\mathsf {XORP}[k]\), in practice (i.e., for realistic settings of the parameters) it does not lead to full n-bit security, mainly due to the presence of the square root in the bound; a small numeric illustration is given below. As an immediate goal, it will be interesting to investigate whether a more sophisticated application of the \(\chi ^2\) method can remove the square root from our bound.
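As a rough illustration of the effect of the square root (with example parameters, not values mandated by our results), the following Python snippet evaluates the bound \(\sqrt{1.25q/N}\) for \(n = 128\) and several query budgets; for instance, guaranteeing an advantage below \(2^{-32}\) requires roughly \(q \lesssim 2^{64}\) here.

```python
# Evaluate the bound sqrt(1.25 q / N) of Theorems 2 and 3 for example parameters.
import math

n = 128
N = 2 ** n
for log_q in (64, 96, 120, 126):
    q = 2 ** log_q
    bound = math.sqrt(1.25 * q / N)
    print(f"n = {n}, q = 2^{log_q}: advantage bound ~ 2^{math.log2(bound):.1f}")
```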