
1 Introduction

Proving security of protocols where an adversary can make queries and/or corrupt players adaptively is a notoriously hard problem. Selective security, where the adversary must commit to its queries before the protocol starts, often allows for an easy proof, but in general does not imply (the practically relevant) adaptive security notion [CFGN96].

Panjwani [Pan07] argues that the two common approaches to achieving adaptive security, namely requiring that all parties erase past data [BH93], or using non-committing encryption [CFGN96] are not satisfactory. He introduces the generalized selective decryption (GSD) problem and uses it as an abstraction of security requirements of multicast encryption protocols [WGL00, MP06]. GSD is defined by a very simple game that captures the difficulty of proving adaptive security of some interesting protocols.

The Generalized Selective Decryption (GSD) Game. In the GSD game we consider a symmetric encryption scheme \(\mathsf{Enc}\) and a parameter \(n\in \mathbb {N}\). Initially, we sample n random keys \(k_1,\ldots ,k_n\) and a bit \(b\in \{0,1\}\). During the game the adversary \(\mathsf{A}\) can make two types of queries. Encryption query: on input (i, j) she receives \(c=\mathsf{Enc}_{k_i}(k_j)\); corruption query: on input i, she receives \(k_i\). At some point, \(\mathsf{A}\) chooses some i to be challenged on. If \(b=0\), she gets the key \(k_i\); if \(b=1\), she gets a uniformly random \(r_i\).Footnote 1 Finally, \(\mathsf{A}\) outputs a guess bit \(b'\). The goal is to prove that for any efficient \(\mathsf{A}\), \(|\Pr [b=b']-1/2|\) is negligible (or, equivalently, \(k_i\) is pseudorandom) assuming only that \(\mathsf{Enc}\) is a secure encryption scheme. We only allow one challenge query, but this notion is equivalent to allowing any number of challenge queries by a standard hybrid argument (losing a factor that is only the number of challenge queries).

It is convenient to think of the GSD game as dynamically building a graph, which we call the key graph. We start with a graph with n vertices labeled \(1,\ldots ,n\), where we associate vertex i with key \(k_i\). On an encryption query \(\mathsf{Enc}_{k_i}(k_j)\) we add a directed edge \(i\rightarrow j\). On a corruption query i we label the vertex i as corrupted. Note that if i is corrupted then \(\mathsf{A}\) also learns all keys \(k_j\) for which there is a path from i to j in the key graph by simply decrypting the keys along that path. To make the game non-trivial, challenge queries are thus only allowed for keys that are not reachable from any corrupted key. Another restriction we must make is to disallow encryption cycles, i.e., directed cycles in the graph. Otherwise we cannot hope to prove security assuming only standard security (in our case IND-CPA) of the underlying encryption scheme, as this would require circular (or key-dependent-message) security [BRS03], which is stronger than IND-CPA [ABBC10]. Finally, we require that the challenge query is a leaf in the graph; this restriction too is necessary unless we make additional assumptions on the underlying encryption scheme (cf. Footnote 9).

Selective security of GSD. In order to prove security of the GSD game, one must turn an adversary \(\mathsf{A}\) that breaks the GSD game with some advantage \(\epsilon \) into an adversary \(\mathsf{B}\) that breaks the security of \(\mathsf{Enc}\) with some advantage \(\epsilon '=\epsilon '(\epsilon )\). The security notion we consider is the standard notion of indistinguishability under chosen plaintext attacks (IND-CPA). Recall that in the IND-CPA game an adversary \(\mathsf{B}\) is given access to an encryption oracle \(\mathsf{Enc}_k(\cdot )\). At some point \(\mathsf{B}\) chooses a pair of messages \((m_0,m_1)\), then gets a challenge ciphertext \(c=\mathsf{Enc}_k(m_b)\) for a random bit b, and must output a guess \(b'\). The advantage of \(\mathsf{B}\) is \(|\Pr [b=b']-1/2|\).

It is not at all clear how to construct an adversary \(\mathsf{B}\) that breaks IND-CPA from an \(\mathsf{A}\) that breaks GSD. This problem becomes much easier if we assume that \(\mathsf{A}\) breaks the selective security of GSD, where \(\mathsf{A}\) must choose all its encryption, corruption and challenge queries before the experiment starts.

Fig. 1. Hybrids for the selective security proof. Green nodes correspond to keys, dark nodes are random values. The adversary \(\mathsf{A}\) commits to encryption queries (1, 3), (2, 3), (3, 5) and challenge 5. (Encryption query (4, 6) is outside the connected component containing the challenge and thus not relevant for the hybrids. \(\mathsf{A}\) could also corrupt keys 4 and 6, which are also outside.) Hybrid \(H_0\) is the real game, hybrid \(H_5\) is the random game, where instead of an encryption of the challenge key \(\mathsf{Enc}_{k_3}(k_5)\), the adversary gets an encryption of the random value \(\mathsf{Enc}_{k_3}(r_5)\). If an adversary \(\mathsf{A}\) can distinguish any two consecutive hybrids \(H_i\) and \(H_{i+1}\) with some advantage \(\delta \), we can use \(\mathsf{A}\) to construct \(\mathsf{B}\) which breaks the IND-CPA security of \(\mathsf{Enc}\) with the same advantage \(\delta \). For example, assume \(\mathsf{B}\) is given an IND-CPA challenge \(C=\mathsf{Enc}_k(z)\) where z is one of two messages (which we call \(k_5\) and \(r_5\)). Now \(\mathsf{B}\) can simulate game \(H_2\) for \(\mathsf{A}\), but when \(\mathsf{A}\) makes the encryption query (3, 5), \(\mathsf{B}\) answers with C. If \(z=k_5\) then \(\mathsf{B}\) simulates game \(H_2\); but if \(z=r_5\), it simulates game \(H_3\). Note that \(\mathsf{B}\) can simulate the games because \(k_3\), which in the simulation is the key of \(\mathsf{B}\)’s challenger, is not used anywhere else. Thus, \(\mathsf{B}\) has the same advantage in the IND-CPA game as \(\mathsf{A}\) has in distinguishing \(H_2\) from \(H_3\).

In fact, it is sufficient to know the topology of the connected component in the key graph that contains the challenge node. Let \(\alpha \) denote the number of edges in this component. One can now define a sequence of \(2\alpha \) hybrid games \(H_0,\ldots ,H_{2\alpha -1}\), where the first game is the real game (i.e., the GSD game with \(b=0\) where the adversary gets the key), the last hybrid is the random game (\(b=1\)), and moreover, from any adversary that distinguishes \(H_i\) from \(H_{i+1}\) with some advantage \(\epsilon '\), we get an adversary against the IND-CPA security of \(\mathsf{Enc}\) with the same advantage. Thus, given an \(\mathsf{A}\) breaking GSD with advantage \(\epsilon \), we can break the IND-CPA security with advantage \(\epsilon '\ge \epsilon /(2\alpha -1) \ge \epsilon /n^2\) (as an n vertex graph has \(\le n^2\) edges). We illustrate this reduction in Fig. 1.

Adaptive security of GSD. In the selective security proof for GSD we crucially relied on the fact that we knew the topology of the underlying key graph. Proving adaptive security, where the adversary decides what queries to ask adaptively during the experiment, is much more difficult. A generic trick to prove adaptive security is “complexity leveraging”, where one simply turns an adaptive adversary into a selective one by initially guessing the adaptive adversary’s choices and committing to those (as required by the selective security game). If during the security game the adaptive choices by the adversary disagree with the guessed ones, we simply abort. The problem with this approach is that, assuming the adaptive adversary has advantage \(\epsilon \), the constructed selective adversary only has advantage \(\epsilon /P\), where \(1/P\) is the probability that our guess is correct, which is typically exponentially small. Concretely, in the GSD game we need to guess the nodes in the connected component containing the challenge, and as the number of such choices is exponential in the number of keys n, this probability is \(2^{-\varTheta (n)}\).

No proofs for the adaptive security of GSD with a subexponential (in n) security loss are known in general. But remember that the GSD problem abstracts problems we encounter in proving adaptive security of many real-world applications where the underlying key graph is typically not completely arbitrary, but often has some special structure. Motivated by this, Panjwani [Pan07] investigated better reductions assuming some special structure of the key graph. He gives a proof where the security degradation is only exponential in the depth of the key graph, as opposed to its size. Concretely, he proves that if the encryption scheme is \(\epsilon \)-IND-CPA secure then the adaptive GSD game with n keys where the adversary is restricted to key graphs of depth \(\ell \) is \(\epsilon '\)-secure where

$$ \epsilon '=\epsilon \cdot O(n\cdot (2n)^\ell ). $$

To date, Panjwani’s bound is the only non-trivial improvement over the \(2^{\varTheta (n)}\) loss for GSD.

Our Result. The main result of this paper is Theorem 2, which states that GSD restricted to trees can be proven secure with only a quasi-polynomial loss

$$ \epsilon '=\epsilon \cdot n^{3\log (n)+5}. $$

Our bound is actually even stronger as the entire key graph need not be a tree; it is sufficient that the subgraph containing only the nodes from which the challenge node can be reached is a tree (when ignoring edge directions).

The bound above is derived from a more fine-grained bound: assuming that the longest path in the key graph is of length \(\ell \), the in-degree of every node is at most d and the challenge node can be reached from at most s sources (i.e., nodes with in-degree 0) we get

$$ \epsilon '=\epsilon \cdot dn((2d+1)n)^{\lceil \log s \rceil }\,(3n)^{\lceil \log \ell \rceil }. $$

Note that \(\ell ,d\) and s are at most n and the previous bound was derived from this by setting \(\ell =d=s=n\). Panjwani [Pan07] uses his bound to give a quasi-polynomial reduction of the Logical Key Hierarchy (LKH) protocol [WGL00]. Panjwani first fixes a flaw in LKH, and calls the new protocol rLKH with “r” for repaired. rLKH is basically the GSD game restricted to a binary tree.Footnote 2

The users correspond to the leaves of this tree, and their keys consist of all the nodes on the path from the root to their leaf. Thus, if the tree is almost full and balanced, then it has only depth \(\ell \approx \log n\) and Panjwani’s bound loses only a quasi-polynomial factor \(n^{\log (n)+2}\) (if \(\ell =\log n\)). As here \(d=2,\ell =\log n,s=n\), our bound gives a slightly worse bound \(n^{\log (n)+\log \log (n)+4}\) for this particular problem, but this is only the case if a large fraction of the keys are actually used, and the adversary gets to see almost all of them. If \(\ell \) is significantly larger than \(\log n\) (e.g., because only few of the keys are active, or because the tree is constructed in an unbalanced way, as proposed for example in [SS00]), our bounds degrade only marginally, as opposed to exponentially fast in \(\ell \) as in [Pan07].
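To make this comparison concrete, the following sketch evaluates the base-2 logarithm of the two loss factors. It is only an illustration: the function names are hypothetical, the constant hidden in the \(O(\cdot )\) of Panjwani's bound is ignored, and all logarithms are taken to base 2.

```python
import math

def ceil_log2(x: int) -> int:
    """Exact ceil(log2(x)) for an integer x >= 1."""
    return (x - 1).bit_length()

def log2_loss_panjwani(n: int, depth: int) -> float:
    """log2 of n * (2n)^depth (constant in the O(.) ignored; an assumption)."""
    return math.log2(n) + depth * math.log2(2 * n)

def log2_loss_tree(n: int, d: int, s: int, ell: int) -> float:
    """log2 of d*n * ((2d+1)n)^ceil(log s) * (3n)^ceil(log ell)."""
    return (math.log2(d) + math.log2(n)
            + ceil_log2(s) * math.log2((2 * d + 1) * n)
            + ceil_log2(ell) * math.log2(3 * n))

# Balanced binary tree with n = 1024 keys: depth log n = 10.
balanced_pan = log2_loss_panjwani(1024, 10)          # (log n + 2) * log n
balanced_ours = log2_loss_tree(1024, 2, 1024, 10)
# Unbalanced tree with longest path 100 on the same number of keys.
deep_pan = log2_loss_panjwani(1024, 100)
deep_ours = log2_loss_tree(1024, 2, 1024, 100)
```

With these parameters the depth-based bound is smaller for the balanced tree, while the fine-grained bound grows only logarithmically in the path length once the tree becomes deep.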

Graphs with Small Cut-Width. The reason our result is restricted to trees is that in the process of generating the hybrids, we have to guess nodes such that removing this node splits the tree in a “nice” way (this has to be done \(\log n\) times, losing a factor n in the distinguishing advantage every time).

One can generalize this technique (but we do not work out the details in this paper) to graphs with small “cut-width”, where we say that a graph has cut-width w if for any two vertices u, v that are not connected by an edge, there exists a set of at most w vertices such that removing those disconnects u from v (a tree has cut-width \(w=1\)). For graphs with cut-width w we get

$$ \epsilon '=\epsilon \cdot n^{(2w+1)\log (n)+4}, $$

which is subexponential in n, and thus beats the existing exponential bound whenever \(w=o(n/\log ^2(n))\). Whether there exists a subexponential reduction which works for any graph is an intriguing open problem.

Shorter Keys from Better Reduction. An exponential security loss (as via complexity leveraging) means that, even when assuming exponential hardness of \(\mathsf{Enc}\) (which is a typical assumption for symmetric encryption schemes like AES), one needs to use keys for \(\mathsf{Enc}\) whose length is at least linear in n to get any security guarantee for the hardness of GSD at all. In contrast, our bound for trees means that a key of length polylog(n) is sufficient to get asymptotically overwhelming security (again assuming \(\mathsf{Enc}\) is exponentially hard).
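As a rough worked calculation, suppose (as simplifying assumptions that are not part of the formal statements) that \(\mathsf{Enc}\) with key length \(\lambda \) is \(\epsilon \)-IND-CPA secure with \(\epsilon =2^{-c\lambda }\) for some constant \(c>0\), that running times are ignored, and that \(2^{-\kappa }\) is a hypothetical target bound on the GSD advantage. With logarithms to base 2, the bound for trees then gives

$$ \epsilon '=2^{-c\lambda }\cdot n^{3\log (n)+5}\le 2^{-\kappa }\ \Longleftrightarrow \ \lambda \ge \frac{(3\log (n)+5)\log (n)+\kappa }{c}, $$

so \(\lambda =O(\log ^2 (n)+\kappa )\) suffices, whereas the same calculation with a loss of \(2^{\varTheta (n)}\) requires \(\lambda =\varOmega (n+\kappa )\).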

Nested Hybrids. In a classical paper [GGM86] Goldreich, Goldwasser and Micali constructed a pseudorandom function (PRF) from a pseudorandom generator (PRG). More recently, three papers independently [BW13, KPTZ13, BGI14] observed that this construction is also a so-called constrained PRF, where for every string x one can compute a constrained key \(k_x\) that allows evaluation of the PRF on all inputs with prefix x. Informally, the security requirement is that an adversary that can ask for constrained keys cannot distinguish the output of the PRF on some challenge input from random.
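The prefix-constrained evaluation of the GGM construction can be sketched as follows. This is a minimal illustration, not the construction as analyzed in the cited papers: the length-doubling PRG is replaced by a heuristic stand-in built from SHA-256, and the function names are hypothetical.

```python
import hashlib

def prg(seed: bytes) -> tuple[bytes, bytes]:
    """Length-doubling PRG stand-in (heuristic assumption): two
    domain-separated SHA-256 calls give the left and right halves."""
    left = hashlib.sha256(b"\x00" + seed).digest()
    right = hashlib.sha256(b"\x01" + seed).digest()
    return left, right

def ggm_walk(key: bytes, bits: str) -> bytes:
    """Walk down the GGM tree from `key`, taking the left or right
    PRG output according to each bit of `bits`."""
    for bit in bits:
        key = prg(key)[int(bit)]
    return key

def prf(key: bytes, x: str) -> bytes:
    """The GGM PRF evaluated on the bit string x."""
    return ggm_walk(key, x)

def constrain(key: bytes, prefix: str) -> bytes:
    """Constrained key k_x: the inner tree node reached via `prefix`."""
    return ggm_walk(key, prefix)

def eval_constrained(k_x: bytes, suffix: str) -> bytes:
    """Evaluate the PRF on prefix||suffix using only k_x."""
    return ggm_walk(k_x, suffix)
```

A constrained key is simply an inner node of the tree, so whoever holds \(k_x\) can evaluate the PRF on every input that extends the prefix x, and on no other input without inverting the PRG.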

All three papers [BW13, KPTZ13, BGI14] only prove selective security of this constrained PRF, where before any queries the adversary must commit to the input on which it wants to be challenged. This proof is a hybrid argument losing a factor 2m in the distinguishing advantage, where m is the PRF input length. One can then get adaptive security losing a huge exponential factor \(2^m\) via complexity leveraging. Subsequently, Fuchsbauer et al. [FKPR14] gave a reduction that only loses a quasi-polynomial factor \((3q)^{\log m}\), where q denotes the number of queries made by the adversary. Our proof borrows ideas from their work.

Very informally, the idea behind their proof is the following. In the standard proof for adaptive security using leveraging one first guesses the challenge query (losing a huge factor \(2^m\)), which basically turns the adaptive attacker into a selective one, followed by a simple hybrid argument (losing a small factor 2m) to prove selective security. The proof from [FKPR14] also first makes a guessing step, but a much simpler one, namely which of the q queries made by the adversary is the first to coincide with the challenge query on the first \(m/2\) bits. This is followed by a hybrid argument losing a factor 3, so both steps together lose a factor 3q. At this point the reduction is not finished yet, but intuitively the problem was reduced to itself but on inputs of only half the size \(m/2\). These two steps can be iterated \(\log m\) times (losing a total factor of \((3q)^{\log m}\)) to get a reduction to the security of the underlying PRG.

Proof Outline for Paths. Our proof for GSD uses an approach similar to the one just explained, iterating fairly simple guessing steps with hybrid arguments, but the analogy ends here, as the actual steps are very different.

We first outline the proof for the adaptive security of the GSD game for a special case where the adversary is restricted in the sense that the connected component in the key graph containing the challenge must be a path. Even for this very special case, currently the best reduction [Pan07] loses an exponential factor \(2^{\varTheta (n)}\). We will now outline a reduction losing only a quasi-polynomial \(n^{\log n}\) factor.Footnote 3 Recall that the standard way to prove adaptive security is to first guess the entire connected component containing the challenge, and then prove selective security as illustrated in Fig. 1.

Fig. 2. Illustration of our adaptive security proof for paths.

Our approach is not to guess the entire path, but in a first step only the node in the middle of the path (as we make a uniform guess, it will be correct with probability \(1/n\)). This reduces the adaptive security game to a “slightly selective” game where the adversary must commit initially to this middle node, at the price of losing a factor n in the distinguishing advantage.Footnote 4

Let \(H_0\) and \(H_3\) denote these “slightly selective” real and random GSD games (we also assume that the adversary initially commits to the challenge query, which costs another factor of n). We illustrate this with a small example featuring a path of length 4 in Fig. 2. The correct guess for the middle node for the particular run of the experiment illustrated in the figure is \(i=5\). As now we know the middle vertex is \(i=5\), we can define new games \(H_1\) and \(H_2\) which are derived from \(H_0\) and \(H_3\), respectively, by replacing the ciphertext \(\mathsf{Enc}_{k_j}(k_i)\) with an encryption \(\mathsf{Enc}_{k_j}(r_i)\) of a random value (in the figure this is illustrated by replacing the edge \(k_j\rightarrow k_i\) with \(k_j\rightarrow r_i\)).

So, what have we gained? If our adaptive adversary has advantage \(\epsilon \) in distinguishing the real and random games then she has advantage at least \(\epsilon /n\) to distinguish the “slightly selective” real and random games \(H_0\) and \(H_3\), and thus for some \(i\in \{0,1,2\}\) she can distinguish the games \(H_i\) and \(H_{i+1}\) with advantage \(\epsilon /(3n)\). Looking at two consecutive games \(H_i\) and \(H_{i+1}\), we see that they only differ in one edge (e.g., in \(H_2\) we answer the query (3, 5) with \(\mathsf{Enc}_{k_3}(r_5)\), in \(H_3\) with \(\mathsf{Enc}_{k_3}(k_5)\)), and moreover this edge will be at the end of a path that now has only length 2, that is, half the length of the path in our original real and random games.

We can now continue this process, constructing new games where the path length is halved, paying a factor 3n in distinguishing advantage. For example, as illustrated in Fig. 2, we can guess the node that halves the path leading to the differing query in games \(H_2\) and \(H_3\) (for the illustrated path this would be \(i=3\)), then define new games where we assume the adversary commits to this node (paying a factor n), and then define two new games \(H'_2\) and \(H'_3\), which are derived from games \(H_2\) and \(H_3\) (which now are augmented by our new guess), respectively, by answering the query (j, i) that asks for an encryption of this node (in the figure \((j,i)=(1,3)\)) with an encryption \(\mathsf{Enc}_{k_1}(r_3)\) instead of \(\mathsf{Enc}_{k_1}(k_3)\).

If we start with a path of length \(\ell \le n\) then after \(\log \ell \le \log n\) iterations of this process we have proved the existence of two consecutive games (call them \(G_0\) and \(G_1\)) that differ only in a single edge \(j\rightarrow i\), where the vertex j has in-degree 0. That is, both games are identical, except that in one game the encryption query (j, i) is answered with \(\mathsf{Enc}_{k_j}(k_i)\) and in the other with \(\mathsf{Enc}_{k_j}(r_i)\). Moreover, the key \(k_j\) is not used anywhere else in the experiment and we know exactly when this query is made during the experiment (as the adversary committed to i).
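The accounting of this iterated halving can be sketched in a few lines. This is only an illustration of the counting argument, with path length measured in edges and hypothetical function names; it tracks the factor 3n per iteration and omits the one-time factor for committing to the challenge.

```python
def halving_schedule(ell: int) -> list[int]:
    """Path lengths (in edges) seen by successive guessing/hybrid steps,
    ending once the differing edge starts at a node of in-degree 0."""
    lengths = [ell]
    while lengths[-1] > 1:
        lengths.append((lengths[-1] + 1) // 2)  # halve, rounding up
    return lengths

def halving_loss(ell: int, n: int) -> int:
    """Total loss (3n)^rounds: each round guesses one node (factor n)
    and interpolates with a 3-step hybrid argument (factor 3)."""
    rounds = len(halving_schedule(ell)) - 1
    return (3 * n) ** rounds
```

For the path of length 4 used in the running example, `halving_schedule(4)` gives the lengths 4, 2, 1, that is, two iterations.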

Given a distinguisher \(\mathsf{A}\) for \(G_0\) and \(G_1\), we can now construct an attacker \(\mathsf{B}\) that breaks the IND-CPA security of the underlying encryption scheme with the same advantage: in the IND-CPA game \(\mathsf{B}\) chooses two random messages \(m_0,m_1\) and asks to be challenged on them.Footnote 5 The game samples a random bit b and returns the challenge \(C=\mathsf{Enc}_k(m_b)\) to \(\mathsf{B}\), which must then output a guess \(b'\) for b. At this point, \(\mathsf{B}\) invokes \(\mathsf{A}\) and simulates the game \(G_0\) for it, choosing all keys at random, except that it uses C to answer the encryption query (j, i).Footnote 6 Finally, \(\mathsf{B}\) forwards \(\mathsf{A}\)’s guess \(b'\). Identifying \((k,m_0,m_1)\) with \((k_j,k_i,r_i)\), we see that depending on whether \(b=0\) or \(b=1\), \(\mathsf{B}\) simulates either \(G_0\) or \(G_1\). Thus, whatever advantage \(\mathsf{A}\) has in distinguishing \(G_0\) from \(G_1\), \(\mathsf{B}\) will break the IND-CPA security of \(\mathsf{Enc}\) with the same advantage.

Proof Outline for Trees. We will now outline our reduction of the adaptive security of GSD to the IND-CPA security of \(\mathsf{Enc}\) for a more general case. Namely, the adversary is only restricted in that the key graph resulting from its queries is such that the connected component containing the challenge is a tree. (Recall that we already disallowed cycles in the key graph as this would require circular security. Being a tree means that we also have no cycles in the key graph when ignoring edge directions). Note that paths as discussed in the previous section are very special trees. The GSD problem on trees is particularly interesting, as it captures some multicast encryption protocols like the Logical Key Hierarchy (LKH) protocol [WGL00]. We refer the reader to [Pan07] for details.

Trees with in-degrees \(\le 1\). Let us first consider the case where the connected component containing the challenge is a tree, and moreover all its vertices have in-degree 0 or 1. It turns out that the proof outlined for paths goes through with only minor changes for such trees. Note that such a tree has exactly one vertex with in-degree 0, which we call the root, and there is a unique path from the root to the challenge node. We can basically ignore all the edges not on this path and do a reduction as the one outlined above. The only difference is that now, when simulating the game \(G_b\) (where b is 0 or 1 depending on whether the challenge C with which we answer the encryption query (j, i) is \(\mathsf{Enc}_{k_j}(k_i)\) or \(\mathsf{Enc}_{k_j}(r_i)\)), the adversary can also ask for encryption queries (j, x) for any x. This might seem like a problem as we do not know \(k_j\) (we identified \(k_j\) with the key used by the IND-CPA challenger). But recall that in the IND-CPA game there is an encryption oracle \(\mathsf{Enc}_{k_j}(\cdot )\), which we can query for the answer \(\mathsf{Enc}_{k_j}(k_x)\) to such encryption queries.

General Trees. For general trees, where nodes can have in-degree greater than 1, we need to work more. The proof for paths does not directly generalize, as now nodes (in particular, the challenge) can be reached from more than one node with in-degree 0. We call these the sources of this node; for example in the tree \(H_0\) in Fig. 3, the (challenge) node \(k_7\) has 4 sources \(k_1,k_2,k_3\) and \(k_{12}\).

Fig. 3. Illustration of our adaptive security proof for general trees.

On a high level, our proof strategy will be to start with a tree where the challenge node c has s sources (more precisely, we have two games that differ in one edge that points to \(k_i\) in one game, and to \(r_i\) in the other, like games \(H_0\) and \(H_7\) in Fig. 3). We then guess a node v that “splits” the tree in a nice way, by which we mean the following: Assume v has in-degree d and we divert every edge going into v to a freshly generated node; let’s call them \(v_1,\ldots ,v_d\). Then this splits the tree into a forest consisting of \(d+1\) trees (the component containing the challenge and one component for every \(v_i\)). The node v “well-divides” the tree if after the split the node c and all of \(v_1,\ldots ,v_d\) have at most \(\lceil s/2 \rceil \) sources.

As an example, consider again the tree \(H_0\) in Fig. 3, where the challenge node \(k_7\) has 4 sources. The node \(k_9\) would be a good guess, as it well-divides the tree: consider the forest after splitting at this node as described above (creating new nodes \(v_1,v_2,v_3\) and diverting the edges going into \(k_9\) to them, i.e., replacing \(k_5\rightarrow k_9\) by \(k_5\rightarrow v_1\), \(k_6\rightarrow k_9\) by \(k_6\rightarrow v_2\), and \(k_{12}\rightarrow k_9\) by \(k_{12}\rightarrow v_3\)). Then we obtain 4 trees, where now \(c=k_7\) has only one source (\(k_9\)) and the new nodes \(v_1,v_2,v_3\) have 2, 1 and 1 sources, respectively.
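The well-dividing condition can be checked mechanically. The sketch below uses a hypothetical edge list that is consistent with the source counts stated in this example (it is not claimed to reproduce Fig. 3 exactly), and all function names are illustrative.

```python
import math

def sources_of(edges, node):
    """Nodes of in-degree 0 from which `node` is reachable
    (a node of in-degree 0 counts as its own source)."""
    preds = {}
    for i, j in edges:
        preds.setdefault(j, set()).add(i)
    seen, stack, found = {node}, [node], set()
    while stack:
        u = stack.pop()
        if not preds.get(u):
            found.add(u)
        for p in preds.get(u, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return found

def split_at(edges, v):
    """Divert every edge into v to its own freshly generated node."""
    new_edges, fresh = [], []
    for i, j in edges:
        if j == v:
            name = ("fresh", v, len(fresh) + 1)
            fresh.append(name)
            new_edges.append((i, name))
        else:
            new_edges.append((i, j))
    return new_edges, fresh

def well_divides(edges, c, v):
    """True if, after splitting at v, the challenge c and every fresh
    node have at most ceil(s/2) sources, where s = #sources of c."""
    s = len(sources_of(edges, c))
    split, fresh = split_at(edges, v)
    bound = math.ceil(s / 2)
    return all(len(sources_of(split, x)) <= bound for x in [c] + fresh)

# Hypothetical tree matching the counts in the text: challenge 7 has
# sources 1, 2, 3, 12, and node 9 has in-edges from 5, 6 and 12.
EDGES = [(1, 5), (2, 5), (3, 6), (5, 9), (6, 9), (12, 9), (9, 7)]
```

On this edge list, splitting at node 9 leaves the challenge with one source and the three fresh nodes with 2, 1 and 1 sources, whereas splitting at node 5 leaves the challenge with three sources and is therefore not well-dividing.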

Once we have guessed a well-dividing node v (or equivalently, the adversary has committed to such a node), we define 2d hybrid games (where d is the in-degree of the well-dividing node) between the two initial games, which we call \(H_0\) and \(H_{2d+1}\), as follows. \(H_1\) is derived from \(H_0\) by diverting the first encryption query that asks for an encryption of v (i.e., that is of the form (j, v) for some j) from real to random; that is, we answer with \(\mathsf{Enc}_{k_j}(r_v)\) instead of \(\mathsf{Enc}_{k_j}(k_v)\). For \(i\le d\), \(H_i\) is derived from \(H_0\) by diverting the first i encryption queries. \(H_{d+1}\) is derived from \(H_d\) by diverting the encryption query that asks for an encryption of the challenge c from real to random. The final \(d-1\) hybrid games are used to switch the encryption of v back from random to real, one edge at a time. This process is illustrated in the games \(H_0\) to \(H_7\) in Fig. 3.
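The shape of this hybrid sequence can be enumerated explicitly. In the sketch below each game is represented by the set of in-edges of v that are answered with \(r_v\) and by a flag for the edge into the challenge; the representation, the function name, and the order in which edges are switched back (last diverted first) are assumptions made for illustration.

```python
def hybrid_sequence(d: int):
    """Games H_0, ..., H_{2d+1} as pairs
    (indices of v's in-edges answered with r_v, challenge edge random?)."""
    games = [(frozenset(), False)]                        # H_0: all real
    for i in range(1, d + 1):                             # H_1 .. H_d
        games.append((frozenset(range(1, i + 1)), False))
    games.append((frozenset(range(1, d + 1)), True))      # H_{d+1}
    for i in range(d - 1, -1, -1):                        # switch v back
        games.append((frozenset(range(1, i + 1)), True))
    return games

def differing_edges(g0, g1) -> int:
    """Number of edges on which two games answer differently."""
    return len(g0[0] ^ g1[0]) + int(g0[1] != g1[1])
```

For \(d=3\) this yields the eight games \(H_0,\ldots ,H_7\), and any two consecutive games differ in exactly one edge.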

Because v was well-dividing (and we show in the full version that such a node always exists), we can prove the following property for any two consecutive games \(H_i\) and \(H_{i+1}\): they differ in exactly one edge, which for some j, v is \(k_j\rightarrow k_v\) in one game and \(k_j\rightarrow r_v\) in the other, and moreover, \(k_j\) has at most \(\lceil s/2 \rceil \) sources.

If an adversary can distinguish \(H_0\) and \(H_{2d+1}\) with advantage \(\epsilon \) then it must distinguish two hybrids \(H_i\) and \(H_{i+1}\) with advantage \(\epsilon /((2d+1)n)\) (where n accounts for guessing the well-dividing node). But any such two hybrids now only have at most \(\lceil s/2 \rceil \) sources. If we repeat these guessing/hybrid steps \(\log s\) times, we end up with two games \(G_0\) and \(G_1\) which differ in one edge that has only one source. At this point we can then use our reduction for trees with only one source outlined above.

Analyzing the Security Loss. To halve the number of sources, we guess a well-dividing vertex (which costs a factor n in the reduction), and then must add up to 2d intermediate hybrids (where d is the maximum in-degree of any node), costing another factor \(2d+1\). Assuming that the number of sources is bounded by s, we have to iterate the process at most \(\log s\) times. Finally, we lose another factor d (but only once) because our final node can have more than one ingoing edge. Overall, assuming the adversary breaks the GSD game with advantage \(\epsilon \) on trees with at most s sources and in-degree at most d, our reduction yields an attacker against the IND-CPA security of \(\mathsf{Enc}\) with advantage

$$\epsilon \,\big /\,\big (dn((2d+1)n)^{\lceil \log s \rceil }\,(3n)^{\lceil \log \ell \rceil }\big )\ .$$

For general trees, since \(s,d,\ell \le n\), this advantage is at least \(\epsilon /\,n^{3\log n+5}\).

2 Preliminaries

For \(a\in {\mathbb {N}}\), we let \([a]=\left\{ 1,2, \ldots , a \right\} \) and \([a]_0=[a]\cup \left\{ 0 \right\} \). We say adversary (or distinguisher) D is t-bounded if D runs in time t.

Definition 1 (Indistinguishability)

Two distributions X and Y are \((\epsilon ,t)\)-indistinguishable, denoted \(Y \sim _{(\epsilon ,t)} X\) or \(\varDelta _{t}(Y,X)\le \epsilon \), if no t-bounded distinguisher D can distinguish them with advantage greater than \(\epsilon \), i.e.,

$$\begin{aligned} \varDelta _{t}(Y,X)\le \epsilon \ \Longleftrightarrow \ \forall \,\mathsf{D}_t : \big | \hbox {Pr}\left[ \mathsf{D}_t(X)=1 \right] -\hbox {Pr}\left[ \mathsf{D}_t(Y)=1 \right] \big |\le \epsilon . \end{aligned}$$

Symmetric Encryption. A pair of algorithms \((\mathsf{Enc},\mathsf{Dec})\) with input \(k\in \{0,1\}^\lambda \), where \(\lambda \) is the security parameter, and a message m (or a ciphertext) from \(\{0,1\}^*\) is a symmetric-key encryption scheme if for all k, m we have \({\mathsf{Dec}_k}({\mathsf{Enc}_k}(m))=m\). Consider the game \(\mathbf{Exp}_{\mathsf{Enc},\,\mathsf{D}}^{{\textsc {ind-cpa}}-b}\) between a challenger C and a distinguisher D: C chooses a uniformly random key \(k\in \{0,1\}^\lambda \) and a bit \(b\in \{0,1\}\); D can make encryption queries for messages m and receives \({\mathsf{Enc}_k}(m)\); finally, \(\mathsf{D}\) outputs a pair \((m_0,m_1)\), is given \({\mathsf{Enc}_k}({m_b})\) and outputs a bit \(b'\in \{0,1\}\), which is also the output of \(\mathbf{Exp}^{{\textsc {ind-cpa}}-b}_{\mathsf{Enc},\,\mathsf{D}}\).Footnote 7

Definition 2

Let \(t \in {\mathbb {N}}^+\) and \(0<\epsilon <1\). An encryption scheme \((\mathsf{Enc},\mathsf{Dec})\) is \((t,\epsilon )\)-IND-CPA secure if for any t-bounded distinguisher D, we have

$$\big |\hbox {Pr}\big [\mathbf{Exp}^{{\textsc {ind-cpa}}-1}_{\mathsf{Enc},\,\mathsf{D}}=1\big ] - \hbox {Pr}\big [\mathbf{Exp}^{{\textsc {ind-cpa}}-0}_{\mathsf{Enc},\,\mathsf{D}}=1\big ]\big |\le \epsilon .$$
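The experiment \(\mathbf{Exp}^{{\textsc {ind-cpa}}-b}_{\mathsf{Enc},\,\mathsf{D}}\) can be sketched as a short program. Everything concrete here is an assumption made for illustration: `toy_enc` is a hypothetical, unvetted randomized cipher standing in for \(\mathsf{Enc}\), the key length is fixed to 16 bytes, and the distinguisher interface (`choose`, `guess`) is invented for this sketch.

```python
import hashlib
import secrets

KEY_LEN = 16  # toy key length in bytes, standing in for lambda (assumption)

def toy_enc(key: bytes, msg: bytes) -> bytes:
    """Hypothetical randomized cipher: fresh nonce, SHA-256 pad, XOR."""
    assert len(msg) <= 32, "the toy pad covers at most 32 bytes"
    nonce = secrets.token_bytes(16)
    pad = hashlib.sha256(key + nonce).digest()
    return nonce + bytes(m ^ p for m, p in zip(msg, pad))

def toy_dec(key: bytes, ct: bytes) -> bytes:
    """Inverse of toy_enc: recompute the pad from the stored nonce."""
    nonce, body = ct[:16], ct[16:]
    pad = hashlib.sha256(key + nonce).digest()
    return bytes(c ^ p for c, p in zip(body, pad))

class ConstantDistinguisher:
    """Trivial distinguisher that ignores the challenge (advantage 0)."""
    def choose(self, enc_oracle):
        enc_oracle(b"warm-up query")      # D may query Enc_k(.) freely
        return b"\x00" * KEY_LEN, b"\xff" * KEY_LEN
    def guess(self, challenge: bytes) -> int:
        return 0

def ind_cpa_experiment(distinguisher, b: int) -> int:
    """Run the experiment with bit b and return the output bit b'."""
    k = secrets.token_bytes(KEY_LEN)
    m0, m1 = distinguisher.choose(lambda m: toy_enc(k, m))
    assert len(m0) == len(m1)
    challenge = toy_enc(k, (m0, m1)[b])
    return distinguisher.guess(challenge)
```

The quantity bounded in Definition 2 is the difference between the probabilities that `ind_cpa_experiment(D, 1)` and `ind_cpa_experiment(D, 0)` return 1, taken over the randomness of the key, the cipher and the distinguisher.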

3 The GSD Game

In this section we describe the generalized selective decryption game as defined in [Pan07] and give our main theorem. Consider the following game, \(\mathbf{Exp}^{{\textsc {gsd}}-(n,b)}_{\mathsf{Enc},\,\mathsf{A}}\) called the generalized selective decryption (GSD) game, parameterized by an encryption scheme \(\mathsf{Enc}\),Footnote 8 an integer n and a bit b. It is played by the adversary A and the challenger B. First B samples n keys \(k_1, k_2, \ldots , k_n\) uniformly at random from \(\{0,1\}^\lambda \). A can make three types of queries during the game:

  • encrypt: A query of the form \(\mathsf{encrypt}(i,j)\) is answered with \(c\leftarrow \mathsf{Enc}_{k_i}(k_j)\).

  • corrupt: A query of the form \(\mathsf{corrupt}(i)\) is answered with \(k_i\).

  • challenge: The response to \(\mathsf{challenge}(i)\) depends on the bit b: if \(b = 0\), the answer is \(k_i\); if \(b=1\), the answer is a random value \(r_i\in \{0,1\}^\lambda \).

\(\mathsf{A}\) can make multiple queries of each type, adaptively and in any order. It can also make several challenge queries at any point in the game. Allowing multiple challenge queries models the fact that the respective keys are jointly pseudorandom (as opposed to individual keys being pseudorandom by themselves). Allowing challenges to be interleaved with other queries models that they remain pseudorandom even after corrupting more keys or seeing further ciphertexts.
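The challenger's side of the game can be sketched as a small class. This is an illustration under stated assumptions: `toy_enc` is a hypothetical, unvetted cipher standing in for \(\mathsf{Enc}\), the class and attribute names are invented, a repeated challenge on the same index returns the same value, and no legitimacy restriction on the adversary is enforced.

```python
import hashlib
import secrets

KEY_LEN = 16  # toy key length in bytes, standing in for lambda (assumption)

def toy_enc(key: bytes, msg: bytes) -> bytes:
    """Hypothetical randomized cipher: fresh nonce, SHA-256 pad, XOR."""
    nonce = secrets.token_bytes(16)
    pad = hashlib.sha256(key + nonce).digest()
    return nonce + bytes(m ^ p for m, p in zip(msg, pad))

def toy_dec(key: bytes, ct: bytes) -> bytes:
    nonce, body = ct[:16], ct[16:]
    pad = hashlib.sha256(key + nonce).digest()
    return bytes(c ^ p for c, p in zip(body, pad))

class GSDGame:
    """Challenger B of the GSD game with parameters (n, b)."""
    def __init__(self, n: int, b: int):
        self.b = b
        self.keys = {i: secrets.token_bytes(KEY_LEN) for i in range(1, n + 1)}
        self.edges = []          # key graph: one edge per encrypt query
        self.corrupted = set()   # corrupt vertices
        self.challenged = {}     # challenge vertices and their answers

    def encrypt(self, i: int, j: int) -> bytes:
        self.edges.append((i, j))
        return toy_enc(self.keys[i], self.keys[j])

    def corrupt(self, i: int) -> bytes:
        self.corrupted.add(i)
        return self.keys[i]

    def challenge(self, i: int) -> bytes:
        if i not in self.challenged:
            real = self.keys[i]
            rand = secrets.token_bytes(KEY_LEN)
            self.challenged[i] = real if self.b == 0 else rand
        return self.challenged[i]
```

For instance, an adversary that calls `corrupt(1)` after `encrypt(1, 2)` can decrypt the returned ciphertext and learn \(k_2\), which is exactly the situation the reachability restriction on challenge nodes rules out.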

We can think of the n keys that \(\mathsf{B}\) creates as n vertices, labeled \(1,2,\ldots ,n\), in a graph. In the beginning of the game there are no edges, but every time \(\mathsf{A}\) queries \(\mathsf{encrypt}(i,j)\), we add the edge \(i\rightarrow j\) to the graph. When \(\mathsf{A}\) queries \(\mathsf{corrupt}(i)\) for some \(i\in [n]\), we mark i as a corrupt vertex; when \(\mathsf{A}\) queries \(\mathsf{challenge}(i)\), we mark it as a challenge vertex. For an adversary \(\mathsf{A}\) we call this graph the key graph, denoted \(G(\mathsf{A})\) and we write \(V^\text {corr}(\mathsf{A})\) and \(V^\text {chal}(\mathsf{A})\) for the sets of corrupt and challenge nodes, respectively. (Note that \(G(\mathsf{A})\) is a random variable depending on the randomness used by \(\mathsf{A}\) and its challenger.)

Legitimate Adversaries. Consider an adversary that corrupts a node i in \(G(\mathsf{A})\) and queries \(\mathsf{challenge}(j)\) for some j which is reachable from i. Then \(\mathsf{A}\) can successively decrypt the keys on the path from i to j, in particular \(k_j\), and thus deduce the bit b. We only consider non-trivial breaks and require that no challenge node is reachable from a corrupt node in \(G(\mathsf{A})\).

Two more restrictions must be imposed on \(G(\mathsf{A})\) if we only want to assume that \(\mathsf{Enc}\) satisfies IND-CPA. First, we do not allow key cycles, that is, queries yielding

$$\mathsf{Enc}_{k_{i_1}}(k_{i_2}), \mathsf{Enc}_{k_{i_2}}(k_{i_3}),\ldots , \mathsf{Enc}_{k_{i_{s-1}}}(k_{i_s}), \mathsf{Enc}_{k_{i_s}}(k_{i_1}),$$

as this would require the scheme to satisfy key-dependent-message (a.k.a. circular) security [BRS03, CL01].

Second, IND-CPA security does not imply that keys under which one has seen encryptions of random messages remain pseudorandom.Footnote 9 Pseudorandomness of keys (assuming only IND-CPA security of the underlying scheme) can thus only hold if their corresponding node does not have any outgoing edges. We thus require that all challenge nodes in the key graph are sinks (i.e., their out-degree is 0). The requirements (as formalized also in [Pan07]) are summarized in the following.

Definition 3

An adversary \(\mathsf{A}\) is legitimate if in any execution of \(\mathsf{A}\) in the GSD game the values of \(G(\mathsf{A})\), \(V^{corr}(\mathsf{A})\) and \(V^{chal}(\mathsf{A})\) are such that:

  • For all \(i \in V^{corr}(\mathsf{A})\) and \(j \in V^{chal}(\mathsf{A})\): j is unreachable from i in \(G(\mathsf{A})\).

  • \(G(\mathsf{A})\) is a directed acyclic graph (DAG) and every node in \(V^{chal}(\mathsf{A})\) is a sink.

Let \(n\in {\mathbb {N}}^+\) and \(\mathcal{G}\) be a class of DAGs with n vertices. We say that a legitimate adversary \(\mathsf{A}\) is a \(\mathcal{G}\)-adversary if in any execution the key graph belongs to \(\mathcal{G}\), i.e., \(G(\mathsf{A})\in \mathcal{G}\).
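The conditions of Definition 3 can be checked mechanically once the key graph is complete. The following is a minimal sketch, assuming the graph is given as a list of `(i, j)` pairs (one per \(\mathsf{encrypt}(i,j)\) query) together with the sets of corrupt and challenge nodes; the function name `is_legitimate` and this representation are illustrative choices, not part of the formal definition.

```python
from collections import defaultdict

def is_legitimate(n, edges, corrupt, challenge):
    """Check the conditions of Definition 3 on a finished key graph.

    n         -- number of keys; nodes are labelled 1..n
    edges     -- iterable of (i, j) pairs, one per encrypt(i, j) query
    corrupt   -- set of corrupted nodes
    challenge -- set of challenge nodes
    """
    succ = defaultdict(set)
    for i, j in edges:
        succ[i].add(j)

    # Every challenge node must be a sink (out-degree 0).
    if any(succ[z] for z in challenge):
        return False

    # The graph must be acyclic: iterative three-colour depth-first search.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in range(1, n + 1)}
    for root in range(1, n + 1):
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(succ[root]))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                colour[node] = BLACK
                stack.pop()
            elif colour[nxt] == GREY:
                return False  # back edge, i.e. a key cycle
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(succ[nxt])))

    # No challenge node may be reachable from a corrupt node.
    seen = set(corrupt)
    frontier = list(corrupt)
    while frontier:
        v = frontier.pop()
        for w in succ[v]:
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return not (seen & set(challenge))
```

For example, with edges \(1\rightarrow 2\rightarrow 3\) and challenge node 3, corrupting the isolated node 4 is legitimate, whereas corrupting node 1 is not.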

Definition 4

Let \(t \in {\mathbb {N}}^+\), \(0 < \epsilon < 1\). An encryption scheme \(\mathsf{Enc}\) is called \((n,t,\epsilon ,\mathcal{G})\)-GSD secure if for every \(\mathcal{G}\)-adversary A running in time t, we have

$$\big |\hbox {Pr}\big [ \mathbf{Exp}^{{\textsc {gsd}}-(n,1)}_{\mathsf{Enc},\,\mathsf{A}}=1 \big ]-\hbox {Pr}\big [ \mathbf{Exp}^{{\textsc {gsd}}-(n,0)}_{\mathsf{Enc},\,\mathsf{A}}=1 \big ] \big |\le \epsilon .$$

Assuming One Challenge Query is Enough. Although the definition of GSD allows the adversary to make any number of challenge queries, Panjwani [Pan07] observes that by a standard hybrid argument one can turn any adversary with advantage \(\epsilon \) (which makes at most \(q\le n\) challenge queries) into an adversary that makes only one challenge query, but still has advantage at least \(\epsilon /q\). From now on we therefore only consider adversaries that make exactly one challenge query (keeping in mind that we have to pay an extra factor n in the final distinguishing advantage for statements about general adversaries).

4 Single Source

In this section we will analyze the GSD game for key graphs in which the challenge node is only reachable from one source node. That is, for some \(q\le n\) there is a path \(p_1\rightarrow p_2\rightarrow \ldots \rightarrow p_q\) where \(p_1\) has in-degree 0, all nodes \(p_i\), \(2\le i \le q\) have in-degree 1 (but arbitrary out-degree) and the (single) challenge query is \(\mathsf{challenge}(p_q)\) (recall that the challenge has out-degree 0). Let \(\mathcal{G}_1\) be the set of all such graphs, and \(\mathcal{G}_1^\ell \subseteq \mathcal{G}_1\) be the subset where this path has length at most \(\ell \).

Theorem 1

(GSD on Trees with One Path to Challenge). Let \(t \in \mathbb {N}\), \(0<\epsilon <1\) and \(\mathcal{G}_{1}\) be the class of key graphs just defined. If an encryption scheme is \((t,\epsilon )\)-IND-CPA secure then it is also \((n,t',\epsilon ',\mathcal{G}_1)\)-GSD secure for

$$\epsilon '=\epsilon \cdot n\,(3n)^{\lceil \log n\rceil } \qquad \text { and }\qquad t'=t-{Q_{{\mathsf {Adv}}}}{T_{\mathsf{Enc}}}-\tilde{O}({Q_{{\mathsf {Adv}}}}), $$

where \({T_{\mathsf{Enc}}}\) denotes the time required to encrypt a key, and \({Q_{{\mathsf {Adv}}}}\) denotes an upper bound on the number of queries made by the adversary (see Footnote 10). More generally, if we replace \(\mathcal{G}_1\) with \(\mathcal{G}^\ell _1\), we get

$$\epsilon '=\epsilon \cdot n\,(3n)^{\lceil \log \ell \rceil }\qquad \text { and }\qquad t'=t-{Q_{{\mathsf {Adv}}}}{T_{\mathsf{Enc}}}-\tilde{O}({Q_{{\mathsf {Adv}}}}).$$

GSD on Single-Source Graphs. For \(b\in \{0,1\}\), we consider the GSD game \(\mathbf{Exp}^{{\textsc {gsd}}-(n,b)}_{\mathsf{Enc}}\) on \(\mathcal{G}_1\) between B and an adversary A. Challenger B first samples n random keys \(k_1,k_2,\ldots ,k_n\) and we assume that already at this point \(\mathsf{B}\) samples fake keys \(r_1,\ldots ,r_n\). On all \(\mathsf{encrypt}(i,j)\) queries B returns real responses \(\mathsf{Enc}_{k_i}(k_j)\). If \(b=0\), the response to \(\mathsf{challenge}(z)\) is \(k_{z}\); if \(b=1\), the response is \(r_{z}\).

We require that the key graph is in \(\mathcal{G}_1\), that is, the connected component of the key graph which contains the challenge z has a path \(p_1\rightarrow p_2\rightarrow \ldots \rightarrow p_q=z\) with \(p_1\) having in-degree 0, all other \(p_i\) having in-degree 1 and \(p_q=z\) having out-degree 0 (this means \(\mathsf{A}\) made queries \(\mathsf{encrypt}(p_{i-1},p_i)\), but no queries \(\mathsf{encrypt}(x,p_i)\) for \(x\ne p_{i-1}\)).
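Membership in \(\mathcal{G}_1\) amounts to walking backwards from the challenge and checking in-degrees. The sketch below assumes the same edge-list representation of the key graph as a list of `(i, j)` pairs; the helper name `challenge_path` is hypothetical. It returns the path \(p_1,\ldots ,p_q\) if it exists and `None` otherwise.

```python
def challenge_path(edges, z):
    """Return [p_1, ..., p_q] with p_q = z if the key graph is in G_1, else None.

    edges -- iterable of (i, j) pairs, one per encrypt(i, j) query
    z     -- the single challenge node
    """
    edges = set(edges)  # a repeated query adds the same edge only once
    preds = {}
    for i, j in edges:
        preds.setdefault(j, set()).add(i)

    # The challenge node must have out-degree 0.
    if any(i == z for i, _ in edges):
        return None

    path = [z]
    node = z
    while node in preds:
        if len(preds[node]) != 1:
            return None  # some node on the path has in-degree greater than 1
        (node,) = preds[node]
        if node in path:
            return None  # defensive check: a cycle, so the graph is not a DAG
        path.append(node)
    path.reverse()  # the walk ended at p_1, which has in-degree 0
    return path
```

To test membership in \(\mathcal{G}_1^\ell \) one would additionally compare the length of the returned path with \(\ell \); whether length counts nodes or edges is a convention this sketch leaves open.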

Eventually, A outputs a bit \(b'\in \{0,1\}\), which is also the output of the game. If the encryption scheme \(\mathsf{Enc}\) is not \((n,t',\epsilon ',\mathcal{G}_1)\)-GSD secure then there exists a \(\mathcal{G}_1\)-adversary A running in time \(t'\) such that

$$\begin{aligned} \big | \hbox {Pr}\big [ \mathbf{Exp}^{{\textsc {gsd}}-(n,0)}_{\mathsf{Enc},\,\mathsf{A}}=1 \big ]-\hbox {Pr}\big [ \mathbf{Exp}^{{\textsc {gsd}}-(n,1)}_{\mathsf{Enc},\,\mathsf{A}}=1 \big ] \big |> \epsilon '. \end{aligned}$$
(1)

Our Goal. Suppose we knew that our GSD adversary \(\mathsf{A}\) wants to be challenged on a fixed node \(z^*\) and that it will make a query \(\mathsf{encrypt}(y,z^*)\) for some y which it will not use in any other query. Then we could use \(\mathsf{A}\) directly to construct a distinguisher \(\mathsf{D}\) as in Definition 2: \(\mathsf{D}\) sets up all keys \(k_x\), \(x\in [n]\), samples a value \(r_{z^*}\) and runs \(\mathsf{A}\), answering \(\mathsf{A}\)’s queries using its keys; except when \(\mathsf{encrypt}(y,z^*)\) is queried for any \(y\in [n]\), D queries its own challenger on \((k_{z^*},r_{z^*})\) and forwards the answer to A. Moreover, \(\mathsf{challenge}(z^*)\) is answered with \(k_{z^*}\). If \(\mathsf{D}\)’s challenger \(\mathsf{C}\) chose \(b=0\), this perfectly simulates the real game for \(\mathsf{A}\). If \(b=1\) then \(\mathsf{A}\) gets an encryption of \(r_{z^*}\) and the challenge query is answered with \(k_{z^*}\), although in the random GSD game \(\mathsf{A}\) expects an encryption of \(k_{z^*}\) and \(\mathsf{challenge}(z^*)\) to be answered with \(r_{z^*}\). However, these two games are distributed identically, since both \(k_{z^*}\) and \(r_{z^*}\) are uniformly random values that do not occur anywhere else in the game. Thus D simulates the real game when \(b=0\) and the random game when \(b=1\). Note that D implicitly set \(k_y\) to the key that C chose, but that’s fine, since we assumed that \(k_y\) is not used anywhere else in the game and thus not needed by \(\mathsf{D}\) for the simulation.

Finally, suppose that, in addition to the challenge \(z^*\), we knew \(y^*\) for which \(\mathsf{A}\) will query \(\mathsf{encrypt}(y^*,z^*)\). Then we could also allow \(\mathsf{A}\) to issue queries of the form \(\mathsf{encrypt}(y^*,x)\), for x other than \(z^*\). \(\mathsf{D}\) could easily simulate any such query by querying \(k_x\) to its encryption oracle.
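The simulation just described can be sketched in code. The following is an illustration only: the hash-based `enc`/`dec` pair is a hypothetical stand-in for \(\mathsf{Enc}\), chosen so that the sketch runs, and the class names `IndCpaChallenger` and `Distinguisher` as well as the shape of the challenger's interface (an encryption oracle plus a two-message challenge oracle) are assumptions rather than the formal objects of Definition 2.

```python
import hashlib
import secrets

KEY_LEN = 16

def enc(key, msg):
    """Toy randomized encryption (illustration only): nonce || msg XOR H(key || nonce)."""
    nonce = secrets.token_bytes(KEY_LEN)
    pad = hashlib.sha256(key + nonce).digest()[:KEY_LEN]
    return nonce + bytes(a ^ b for a, b in zip(msg, pad))

def dec(key, ct):
    """Inverse of enc, used here only to inspect what was encrypted."""
    nonce, body = ct[:KEY_LEN], ct[KEY_LEN:]
    pad = hashlib.sha256(key + nonce).digest()[:KEY_LEN]
    return bytes(a ^ b for a, b in zip(body, pad))

class IndCpaChallenger:
    """Hypothetical challenger C holding a hidden key and a bit b."""
    def __init__(self, b):
        self.key = secrets.token_bytes(KEY_LEN)
        self.b = b

    def encrypt(self, msg):
        return enc(self.key, msg)

    def challenge(self, m0, m1):
        return enc(self.key, (m0, m1)[self.b])

class Distinguisher:
    """D for a known challenge node z_star and its known parent y_star."""
    def __init__(self, n, y_star, z_star, challenger):
        self.y, self.z, self.C = y_star, z_star, challenger
        # D samples all keys itself; its own k[y_star] is never used, because
        # D implicitly sets k_{y*} to the key held by C.
        self.k = {x: secrets.token_bytes(KEY_LEN) for x in range(1, n + 1)}
        self.r_z = secrets.token_bytes(KEY_LEN)

    def encrypt(self, i, j):
        if j == self.y:
            raise ValueError("k_{y*} must not be encrypted under any key")
        if i == self.y and j == self.z:
            # The one query forwarded to C's challenge oracle.
            return self.C.challenge(self.k[self.z], self.r_z)
        if i == self.y:
            # Other encryptions under k_{y*} go to C's encryption oracle.
            return self.C.encrypt(self.k[j])
        return enc(self.k[i], self.k[j])

    def challenge(self, i):
        if i != self.z:
            raise ValueError("A was assumed to challenge z*")
        return self.k[self.z]  # always k_{z*}, regardless of C's bit
```

An adversary \(\mathsf{A}\) would be run against the `encrypt` and `challenge` methods of `Distinguisher`, and \(\mathsf{D}\) would output whatever \(\mathsf{A}\) outputs. Corruption queries are omitted for brevity; they would simply return `k[i]` for any node other than \(y^*\).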

Unfortunately, general GSD adversaries can decide adaptively on which node they want to be challenged, and worse, they can make queries \(\mathsf{encrypt}(x,y)\), where y is a key that encrypts the challenge.

We will construct a series of hybrids where any two consecutive games \({\mathbf{Game}}\) and \({\mathbf{Game}}'\) are such that from a distinguisher \(\mathsf{A}\) for them, we can construct an adversary D against the encryption scheme with the same advantage. For this, the two games should only differ in the response of one encryption query on the path to the challenge, say \(\mathsf{encrypt}(y,z)\), which is responded to with a real ciphertext \(\mathsf{Enc}_{k_y}(k_z)\) in \({\mathbf{Game}}\) and with a fake ciphertext \(\mathsf{Enc}_{k_y}(r_z)\) in \({\mathbf{Game}}'\).

Moreover, the key \(k_y\) must not be encrypted anywhere else in the game, as our distinguisher D will implicitly set \(k_y\) to be the key of its IND-CPA challenger C. Thus, in \({\mathbf{Game}}\) and \({\mathbf{Game}}'\) all queries \(\mathsf{encrypt}(x,y)\), for any x, are responded to with a fake ciphertext \(\mathsf{Enc}_{k_x}(r_y)\). Summing up, we need the two games to have the following properties for some y:

  • Property 1. \({\mathbf{Game}}\) and \({\mathbf{Game}}'\) are identical except for the response to one query \(\mathsf{encrypt}(y,z)\), which is replied to with a real ciphertext in \({\mathbf{Game}}\) and a fake one in \({\mathbf{Game}}'\).

  • Property 2. Queries \(\mathsf{encrypt}(x,y)\) are replied to with a fake response in both games.

If we knew the entire key graph \(G(\mathsf{A})\) before answering \(\mathsf{A}\)’s queries then we could define a series of \(2q-1\) games as in Fig. 1 where we consecutively replace edges from the source to the challenge by fake edges and then go back replacing fake edges with real ones starting with \(p_{q-2}\rightarrow p_{q-1}\). Any two consecutive games in such a sequence would satisfy the two properties, so we could use them to break IND-CPA.

The problem is that in general the probability of guessing the connected component containing the challenge is exponentially small in n and consequently from a GSD adversary’s advantage \(\epsilon '\) we will obtain a distinguisher \(\mathsf{D}\) with advantage \(\epsilon = \epsilon '/O(n!)\). To avoid an exponential loss, we thus must avoid guessing the entire component at once.
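To get a feeling for the gap, the factor \(n\,(3n)^{\lceil \log n\rceil }\) from Theorem 1 can be compared numerically with \(n!\). This is a small sketch that assumes logarithms are to base 2 (which matches the halving argument that follows) and uses \(n!\) as a representative of the \(O(n!)\) loss mentioned above; the function names are illustrative.

```python
import math

def ceil_log2(m):
    """Exact ceil(log2(m)) for an integer m >= 1."""
    return (m - 1).bit_length()

def nested_hybrid_loss(n, ell=None):
    """Factor n * (3n)^ceil(log ell) from Theorem 1 (ell defaults to n)."""
    ell = n if ell is None else ell
    return n * (3 * n) ** ceil_log2(ell)

for n in (8, 16, 32, 64):
    # Columns: n, quasipolynomial factor, n!
    print(n, nested_hybrid_loss(n), math.factorial(n))
```

For very small n the factorial is still the smaller of the two (for \(n=8\) the factor is \(8\cdot 24^3\)), but it overtakes the quasipolynomial factor quickly as n grows.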

The First Step. Our first step is to define two new games \(\mathsf{Game}_{{}^{\emptyset }}^{{}_{\{q\}}}\) and \(\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q \right\} }}\), which are modifications of \(\mathbf{Exp}^{{\textsc {gsd}}-0}\) and \(\mathbf{Exp}^{{\textsc {gsd}}-1}\), respectively. Both new games have an extra step at the beginning of the game: \(\mathsf{B}\) guesses which key is going to be the challenge key, and at the end of the game the output is \(\mathsf{A}\)’s output if the guess was correct and 0 otherwise. Clearly \(\mathsf{B}\)’s guess is correct with probability \(1/n\). Aside from this guessing step, \(\mathsf{Game}_{{}^{\emptyset }}^{{}_{\{q\}}}\) is identical to \(\mathbf{Exp}^{{\textsc {gsd}}-0}\); all responses are real. We therefore have \(\hbox {Pr}\big [\mathsf{Game}_{{}^{\emptyset }}^{{}_{\{q\}}}=1\big ]=\frac{1}{n}\cdot \hbox {Pr}\big [\mathbf{Exp}^{{\textsc {gsd}}-0}=1\big ]\).

Analogously, we define an auxiliary game, \(\mathsf{Game}_{{}^{1}}^{{}_{\{q\}}}\), which is identical to \(\mathbf{Exp}^{{\textsc {gsd}}-1}\), except for the guessing step. Again we have \(\hbox {Pr}\big [\mathsf{Game}_{{}^{1}}^{{}_{\{q\}}}=1\big ]=\frac{1}{n}\cdot \hbox {Pr}\big [\mathbf{Exp}^{{\textsc {gsd}}-1}=1\big ]\). We then define \(\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q \right\} }}\) exactly as \(\mathsf{Game}_{{}^{1}}^{{}_{\{q\}}}\), except for a syntactical change: Let z be the guessed value for the challenge node. Then any query \(\mathsf{encrypt}(x,z)\) is replied to with \(\mathsf{Enc}_{k_x}(r_z)\), that is, an encryption of the fake key \(r_z\). (Note that this game can be simulated, since we “know” z when guessing correctly.) On the other hand, the query \(\mathsf{challenge}(z)\) is answered with \(k_z\) (rather than \(r_z\) in \(\mathbf{Exp}^{{\textsc {gsd}}-1}\)). Since the difference between \(\mathsf{Game}_{{}^{1}}^{{}_{\{q\}}}\) and \(\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q \right\} }}\) is that we have replaced all occurrences of \(k_z\) by \(r_z\) and all occurrences of \(r_z\) by \(k_z\), which are distributed identically (thus we’ve merely swapped the names of \(k_z\) and \(r_z\)), we have \(\hbox {Pr}\big [\mathsf{Game}_{{}^{1}}^{{}_{\{q\}}}=1\big ]=\hbox {Pr}\big [\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q \right\} }}=1\big ]\).
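The effect of the guessing step can be verified exactly on a toy model. In the sketch below the adversary's behaviour is abstracted as a joint distribution over (challenge node, output bit); this abstraction and the concrete numbers are hypothetical and serve only to illustrate that an independent uniform guess scales the probability of output 1 by exactly \(1/n\).

```python
from fractions import Fraction

def pr_exp_outputs_1(dist):
    """dist maps (challenge_node, output_bit) to its probability in the original experiment."""
    return sum(p for (_, bit), p in dist.items() if bit == 1)

def pr_game_outputs_1(dist, n):
    """Same experiment, except that B first guesses g uniformly in 1..n,
    independently of A; the output is A's bit if g is the challenge node, else 0."""
    total = Fraction(0)
    for g in range(1, n + 1):
        for (z, bit), p in dist.items():
            if g == z and bit == 1:
                total += Fraction(1, n) * p
    return total

n = 4
# Hypothetical adversary behaviour: an arbitrary joint distribution summing to 1.
dist = {
    (1, 1): Fraction(1, 8), (1, 0): Fraction(1, 8),
    (3, 1): Fraction(1, 2), (4, 0): Fraction(1, 4),
}
assert pr_game_outputs_1(dist, n) == pr_exp_outputs_1(dist) / n
```

The identity holds for any such distribution because the guess is sampled before, and independently of, everything the adversary sees.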

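Together with Eq. (1), we have thus

$$\big |\hbox {Pr}\big [\mathsf{Game}_{\emptyset }^{\left\{ q \right\} }=1\big ]-\hbox {Pr}\big [\mathsf{Game}_{\left\{ q \right\} }^{\left\{ q \right\} }=1\big ]\big |=\frac{1}{n}\cdot \big |\hbox {Pr}\big [\mathbf{Exp}^{{\textsc {gsd}}-0}=1\big ]-\hbox {Pr}\big [\mathbf{Exp}^{{\textsc {gsd}}-1}=1\big ]\big |>\frac{\epsilon '}{n}.$$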

We continue to use the notational convention that for sets \(I\subseteq P\subseteq [n]\), the game \(\mathsf{Game}^P _I\) is derived from the real game by additionally guessing the nodes corresponding to P and answering encryptions of the nodes in I with fake keys. This is made formal in Fig. 4 below.

The Second Step. Assume q is a power of 2 and consider \(\mathsf{Game}_{{}^{\emptyset }}^{{}_{\{q/2,\,q\}}}\), which is identical to \(\mathsf{Game}_{{}^{\emptyset }}^{{}_{\{q\}}}\), except that in addition to the challenge node, B also guesses which node \(x\in [n]\) is going to be the node in the middle of the path to the challenge, i.e. \(p_{q/2}=x\). The output of \(\mathsf{Game}_{{}^{\emptyset }}^{{}_{\{q/2,\,q\}}}\) is A’s output if the guess was correct and 0 otherwise. Since B guesses correctly with probability \(1/n\), we have

$$\hbox {Pr}\big [\mathsf{Game}_{\emptyset }^{\left\{ q/2,\,q \right\} }=1\big ]=\frac{1}{n}\cdot \hbox {Pr}\big [\mathsf{Game}_{\emptyset }^{\left\{ q \right\} }=1\big ].$$

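By guessing the middle node, we can assume the middle node is known and this will enable us to define a hybrid game, \(\mathsf{Game}_{{}^{\left\{ q/2 \right\} }}^{{}_{\left\{ q/2,\,q \right\} }}\), in which the query for the encryption of \(k_{p_{q/2}}\) is responded to with a fake answer. In addition, we consider games \(\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }}\) and \(\mathsf{Game}_{{}^{\left\{ q/2,\,q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }}\) which are similarly defined by making the same changes to game \(\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q \right\} }}\), i.e. guessing the middle node and replying to the encryption query of the guessed key with a real and a fake ciphertext, respectively. Again, we have \(\hbox {Pr}\big [\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }}=1\big ]=\frac{1}{n}\cdot \hbox {Pr}\big [\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q \right\} }}=1\big ]\). Therefore \((t',\epsilon '/n)\)-distinguishability of \(\mathsf{Game}_{{}^{\emptyset }}^{{}_{\{q\}}}\) and \(\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q \right\} }}\) implies that \(\mathsf{Game}_{{}^{\emptyset }}^{{}_{\left\{ q/2,\,q \right\} }}\) and \(\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }}\) are \((t',\epsilon '/n^2)\)-distinguishable, i.e. \({\varDelta }_{t}\big ({\mathsf{Game}_{{}^{\emptyset }}^{{}_{\left\{ q/2,\,q \right\} }},\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }}}\big ) > \epsilon '/{n^2}\), and therefore by the triangle inequality

$$\begin{aligned} {\varDelta }_{t}\Big ({\mathsf{Game}_{\emptyset }^{\left\{ q/2,\,q \right\} },\mathsf{Game}_{\left\{ q/2 \right\} }^{\left\{ q/2,\,q \right\} }}\Big )+{\varDelta }_{t}\Big ({\mathsf{Game}_{\left\{ q/2 \right\} }^{\left\{ q/2,\,q \right\} },\mathsf{Game}_{\left\{ q/2,\,q \right\} }^{\left\{ q/2,\,q \right\} }}\Big )+{\varDelta }_{t}\Big ({\mathsf{Game}_{\left\{ q/2,\,q \right\} }^{\left\{ q/2,\,q \right\} },\mathsf{Game}_{\left\{ q \right\} }^{\left\{ q/2,\,q \right\} }}\Big )\ >\ \frac{\epsilon '}{n^2}. \end{aligned}$$
(2)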

By Eq. (2), at least one of the pairs of games on the left-hand side must be \((t',\epsilon '/(3n^2))\)-distinguishable. The two games of every pair differ in exactly one point, as determined by the subscript of each game. For instance, the difference between the last pair \(\mathsf{Game}_{{}^{\left\{ q/2,\,q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }}\) and \(\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }}\) is the encryption of node \(q/2\).

Recall that our goal is to construct a pair of hybrids where the differing query \(\mathsf{encrypt}(y,z)\) is such that all queries \(\mathsf{encrypt}(x,y)\) are replied to with \(\mathsf{Enc}_{k_x}(r_y)\), as formalized in Property 2. Games \(\mathsf{Game}_{{}^{\emptyset }}^{{}_{\{q\}}}\) and \(\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q \right\} }}\) differed in the last query on the path and the only key above it that is not encrypted anywhere is the start of the path. What we have achieved with our games above is to halve that distance: the first pair, \((\mathsf{Game}_{{}^{\emptyset }}^{{}_{\left\{ q/2,\,q \right\} }}, \mathsf{Game}_{{}^{\left\{ q/2 \right\} }}^{{}_{\left\{ q/2,\,q \right\} }})\), and the last pair, \((\mathsf{Game}_{{}^{\left\{ q/2,\,q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }},\mathsf{Game}_{{}^{\left\{ q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }})\), differ in a node that is only half way down the path; and the middle pair, \((\mathsf{Game}_{{}^{\left\{ q/2 \right\} }}^{{}_{\left\{ q/2,\,q \right\} }},\mathsf{Game}_{{}^{\left\{ q/2,\,q \right\} }}^{{}_{\left\{ q/2,\,q \right\} }})\), differ in the last node, but half way up the path there is a key, namely \(k_{q/2}\), which is not encrypted anywhere, as all queries \(\mathsf{encrypt}(x,q/2)\) are answered with \(\mathsf{Enc}_{k_x}(r_{q/2})\).

The Remaining Steps. For any of the three pairs that is \((t',\epsilon '/3n^2)\)-distinguishable (and by Eq. (2) there must exist one), we can repeat the same process on the half of the path which ends with the query that is different in the two games. For example, assume this holds for the last pair, that is

$$\begin{aligned} {\varDelta }_{t}\Big ({\mathsf{Game}_{\left\{ q/2,\,q \right\} }^{\left\{ q/2,\,q \right\} },\mathsf{Game}_{\left\{ q \right\} }^{\left\{ q/2,\,q \right\} }}\Big )\ >\ \frac{\epsilon '}{3n^2}. \end{aligned}$$
(3)

We repeat the process of guessing the middle node between the differing node and the random node above (in this case the root of the path), which is thus node q/4, and obtain a new pair which satisfies

$$\begin{aligned} {\varDelta }_{t}\Big ({\mathsf{Game}_{\left\{ q/2,\,q \right\} }^{\left\{ q/4,\,q/2,\,q \right\} },\mathsf{Game}_{\left\{ q \right\} }^{\left\{ q/4,\,q/2,\,q \right\} }}\Big )\ >\ \frac{\epsilon '}{3n^3}, \end{aligned}$$
(4)

by Eq. (3) and the fact that the guess is correct with probability \(1/n\). We can now define two intermediate games

$$\begin{aligned} \mathsf{Game}_{\left\{ q/4,\,q/2,\,q \right\} }^{\left\{ q/4,\,q/2,\,q \right\} }\quad \text { and }\quad \mathsf{Game}_{\left\{ q/4,\,q \right\} }^{\left\{ q/4,\,q/2,\,q \right\} } \end{aligned}$$
(5)

where we replaced the encryption of \(k_{p_{q/4}}\) by one of \(r_{p_{q/4}}\). As in Eq. (2), we can again define a sequence of games by putting the games in Eq.  (5) between the ones in Eq. (4) and argue that by Eq. (4), two consecutive hybrids must be \((t', {\epsilon '}/(3^2 n^3))\)-distinguishable. What we have gained is that any pair in this sequence differs by exactly one edge and the closest fake answer above is only a fourth of the path length away.

Fig. 4. Definition of \(\mathsf{Game}_{I}^{P}\) for the single-source case.

Repeating these two steps a maximum number of \(\lceil \log q\rceil \) times, we arrive at two consecutive games, where the distance from the differing node to the closest “fake” node above is 1. We have thus found two games that satisfy Properties 1 and 2, meaning we can use a distinguisher \(\mathsf{A}\) to construct an adversary \(\mathsf{D}\) against the encryption scheme.

Since a path has at most n nodes, after at most \(\lceil \log n\rceil \) steps we end up with two games that are \((t',{\epsilon '}/(n(3n)^{\lceil \log n\rceil }))\)-distinguishable and which can be used to break the encryption scheme. If the adversary is restricted to paths of length at most \(\ell \) (i.e., graphs in \(\mathcal{G}_1^\ell \)), this improves to \((t',{\epsilon '}/(n(3n)^{\lceil \log \ell \rceil }))\).

Proof of Theorem  1. We formalize our method to give a proof of the theorem. In Fig. 4 we describe game \(\mathsf{Game}_{I}^{P}\), which is defined by the nodes on the path that are guessed (represented by the set P) and the nodes where an encryption of a key is replaced with an encryption of a value r (represented by \(I\subseteq P\)).

Lemma 1

Let \(I \subseteq P\subseteq [n]\) and \(z\in P\setminus I\). Also let y be the largest number in I such that \(y < z\), and \(y=0\) if z is smaller than all elements in I. If \(\mathsf{Game}_{I}^{P}\) and \(\mathsf{Game}_{I\cup \{z\}}^{P}\) are \((t,\epsilon )\)-distinguishable then the following holds.

  • If \(z=y+1\) then \(\mathsf{Enc}\) is not \((t+{Q_{{\mathsf {Adv}}}}{T_{\mathsf{Enc}}}+\tilde{O}({Q_{{\mathsf {Adv}}}}),\, \epsilon )\)-IND-CPA-secure.

  • If \(z>y+1\), define \(z'=y+\lfloor {(z-y)/2}\rfloor \), \(P'=P\cup \{z'\}\) and

    $$\begin{aligned} I_1&=I,&I_2&=I\cup \{z'\},&I_3&=I\cup \{z',z\},&I_4&=I\cup \{z\}. \end{aligned}$$

Then for some \(i\in \left\{ 1,2,3 \right\} \), games \(\mathsf{Game}_{I_i}^{P'}\) and \(\mathsf{Game}_{I_{i+1}}^{P'}\) are \((t, \epsilon /3n)\)-distinguishable.

The proof of this lemma can be found in the full version. Applying Lemma 1 repeatedly \(\lceil \log n\rceil \) times (or \(\lceil \log \ell \rceil \) if we know an upper bound on the path length \(\ell \)), we obtain the proof of Theorem 1.
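The recursion behind Lemma 1 can be traced in a few lines. The sketch below is an illustration under two assumptions: nodes are identified with their positions on the path (so the initial pair has \(y=0\) and \(z=q\)), and logarithms are to base 2. For each refinement it lists the pairs (closest fake node above, differing node) of the three consecutive games and follows the worst case; the function names are hypothetical.

```python
def ceil_log2(m):
    """Exact ceil(log2(m)) for an integer m >= 1."""
    return (m - 1).bit_length()

def refine(y, z):
    """One application of the second case of Lemma 1 (requires z > y + 1).

    Returns z' and, for the game pairs (I_1, I_2), (I_2, I_3), (I_3, I_4),
    the pair (closest fake node above, differing node).
    """
    z_mid = y + (z - y) // 2
    return z_mid, [(y, z_mid), (z_mid, z), (y, z_mid)]

def worst_case_depth(y, z):
    """Largest number of refinements needed before some pair has z = y + 1."""
    if z == y + 1:
        return 0
    _, pairs = refine(y, z)
    return 1 + max(worst_case_depth(a, b) for a, b in set(pairs))

def advantage_loss(n, q):
    """Factor n for guessing the challenge, then 3n per refinement."""
    return n * (3 * n) ** worst_case_depth(0, q)

# The worst-case depth matches ceil(log q) for every path length tried here.
assert all(worst_case_depth(0, q) == ceil_log2(q) for q in range(1, 65))
```

With \(q=n\) the value of `advantage_loss(n, n)` coincides with the factor \(n\,(3n)^{\lceil \log n\rceil }\) of Theorem 1.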

5 General Trees

For a node v in a directed graph G let \({\mathsf T}_v\) denote the subgraph of G we get when only keeping the edges on paths that lead to v. In this section we prove bounds for GSD if the underlying key graph is a tree. Concretely, let \(\mathcal{G}_\tau \) be the class of key graphs that contain one designated “challenge node” z and where the graph \({\mathsf T}_z\) is a tree (when ignoring edge directions).

To give more fine-grained bounds we define a subset \(\mathcal{G}^{s,d,\ell }_\tau \subseteq \mathcal{G}_\tau \) as follows. For \(G\in \mathcal{G}_\tau \), let z be the challenge node and \({\mathsf T}_z\) as above. Then \(G\in \mathcal{G}^{s,d,\ell }_\tau \) if the challenge node has at most s sources (i.e., there are at most s nodes u of in-degree 0 s.t. there is a directed path from u to z), every node in \({\mathsf T}_z\) has in-degree at most d and the longest path in \({\mathsf T}_z\) has length at most \(\ell \). Note that as \(d<n,s<n\) and \(\ell \le n\) any \(G\in \mathcal{G}_\tau \) with n nodes is trivially in \(\mathcal{G}_\tau ^{n-1,n-1,n}\).

Theorem 2

(Security of GSD on Trees). Let \(n,t \in \mathbb {N}\), \(0<\epsilon <1\) and \(\mathcal{G}_\tau \) be the class of key graphs just defined. If an encryption scheme is \((t,\epsilon )\)-IND-CPA secure then it is also \((n,t',\epsilon ',\mathcal{G}_\tau )\)-GSD secure for

$$\begin{aligned} \epsilon ' = \epsilon \cdot n^2 (6n^3)^{{\lceil \log n \rceil }} \le \epsilon \cdot n^{3{\lceil \log n \rceil }+5} \qquad \text {and} \qquad t'=t-{Q_{{\mathsf {Adv}}}}{T_{\mathsf{Enc}}}-\tilde{O}({Q_{{\mathsf {Adv}}}}) \end{aligned}$$

(with \({Q_{{\mathsf {Adv}}}},{T_{\mathsf{Enc}}}\) as in Theorem 1). If we replace \(\mathcal{G}_\tau \) with \(\mathcal{G}_\tau ^{s,d,\ell }\) then

$$\begin{aligned} \epsilon ' = \epsilon \cdot dn((2d+1)n)^{\lceil \log s \rceil }\,(3n)^{\lceil \log \ell \rceil } \qquad \text {and} \qquad t'=t-{Q_{{\mathsf {Adv}}}}{T_{\mathsf{Enc}}}-\tilde{O}({Q_{{\mathsf {Adv}}}}). \end{aligned}$$

For space reasons, the proof of this theorem is moved to the full version.
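To see how the parameters s, d and \(\ell \) influence the bound of Theorem 2, the two loss factors can be evaluated directly. This is a small sketch assuming base-2 logarithms; the function names are illustrative.

```python
def ceil_log2(m):
    """Exact ceil(log2(m)) for an integer m >= 1."""
    return (m - 1).bit_length()

def tree_loss(n, s, d, ell):
    """Factor d*n * ((2d+1)*n)^ceil(log s) * (3n)^ceil(log ell) for G_tau^{s,d,ell}."""
    return d * n * ((2 * d + 1) * n) ** ceil_log2(s) * (3 * n) ** ceil_log2(ell)

def general_tree_loss(n):
    """Factor n^2 * (6 n^3)^ceil(log n) for the whole class G_tau."""
    return n ** 2 * (6 * n ** 3) ** ceil_log2(n)

n = 64
print(tree_loss(n, 1, 1, n))           # one source, in-degree 1
print(tree_loss(n, 4, 2, 8))           # few sources, short paths
print(tree_loss(n, n - 1, n - 1, n))   # trivial parameters
print(general_tree_loss(n))
```

For \(s=d=1\) the expression reduces to \(n\,(3n)^{\lceil \log \ell \rceil }\), the factor of Theorem 1, and for the trivial parameters \((n-1,n-1,n)\) it stays below the general factor.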

6 Conclusions and Open Problems

We showed a quasipolynomial reduction of the GSD game on trees to the security of the underlying symmetric encryption scheme. As already discussed in the introduction, it is an interesting open problem to extend our reduction to general (directed, acyclic) graphs or to understand why this is not possible. This is the second result using the “nested hybrids” technique (after its introduction in [FKPR14] to prove the security of constrained PRFs), and given that it found applications for two seemingly unrelated problems, we believe that there will be further applications in the future.

One candidate is the problem of proving security under selective opening attacks [DNRS99, FHKW10, BHY09], where one wants to prove security when correlated messages are encrypted under different keys. Here, the adversary may adaptively choose to corrupt some keys after seeing all ciphertexts, and one requires that the messages in the unopened ciphertexts are indistinguishable from random messages (sampled so they are consistent with the already opened ciphertexts). This problem is notoriously hard, and no reduction avoiding complexity leveraging to IND-CPA security of the underlying scheme is known.