Fixing Cracks in the Concrete: Random Oracles with Auxiliary Input, Revisited
Abstract
We revisit the security of cryptographic primitives in the random-oracle model against attackers having a bounded amount of auxiliary information about the random oracle. This situation arises most naturally when an attacker carries out offline preprocessing to generate state (namely, auxiliary information) that is later used as part of an online attack, with perhaps the best-known example being the use of rainbow tables for function inversion. The resulting model is also critical for obtaining accurate bounds against non-uniform attackers when the random oracle is instantiated by a concrete hash function.
Unruh (Crypto 2007) introduced a generic technique, called presampling, for analyzing security in this model: a random oracle about which S bits of arbitrary auxiliary information are available can be replaced by a random oracle whose value is fixed in some adversarial way on P points; the two are distinguishable with probability at most \(O(\sqrt{ST/P})\) by attackers making at most T oracle queries. Unruh conjectured that the distinguishing advantage could be made negligible for a sufficiently large polynomial P. We show that Unruh’s conjecture is false by proving that the distinguishing probability is at least \(\varOmega (ST/P)\).
Faced with this negative general result, we establish new security bounds, which are nearly optimal and beat the presampling bounds, for specific applications of random oracles, including one-way functions, pseudorandom functions/generators, and message authentication codes. We also explore the effectiveness of salting as a mechanism to defend against offline preprocessing, and give quantitative bounds demonstrating that salting provably helps in the context of one-wayness, collision-resistance, pseudorandom generators/functions, and message authentication codes. In each case, using (at most) n bits of salt, where n is the length of the secret key, we get the same security \(O(T/2^n)\) in the random-oracle model with auxiliary input as we get without auxiliary input.
At the heart of our results is the compression technique of Gennaro and Trevisan, and its extensions by De, Trevisan and Tulsiani.
Keywords
Hash Function · Random Oracle · Auxiliary Information · Message Authentication Code · Random Oracle Model
1 Introduction
The random-oracle model [4] often provides a simple and elegant way of analyzing the concrete security of cryptographic schemes based on hash functions. To take a canonical example, consider (naïve) password hashing, where a password pw is stored as H(pw) for H a cryptographic hash function, and we are interested in the difficulty of recovering pw from H(pw) (i.e., we are interested in understanding the one-wayness of H). It seems difficult to formalize a concrete assumption about H that would imply the difficulty of recovering pw for all high-entropy distributions on pw; it would be harder still to come up with a natural assumption implying that for all distributions on pw with min-entropy k, recovering pw requires \(\varOmega (2^k)\) work. If we model H as a random oracle, however, then both these statements can be proven easily, and this matches the best known attacks for many cryptographic hash functions.
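To make this concrete, here is a minimal sketch of naïve password hashing and its exhaustive-search inversion; SHA-256 stands in for H, and the function names and candidate set are illustrative, not part of the paper’s formal model:

```python
import hashlib

# Naive (unsalted) password hashing: only H(pw) is stored.
def store(pw: bytes) -> bytes:
    return hashlib.sha256(pw).digest()

# Inverting H(pw) by exhaustive search over a candidate distribution; for a
# distribution with min-entropy k this costs on the order of 2^k evaluations.
def brute_force(digest: bytes, candidates):
    for pw in candidates:
        if hashlib.sha256(pw).digest() == digest:
            return pw
    return None
```

For example, `brute_force(store(b"pw123"), (b"pw%d" % i for i in range(10**4)))` recovers `b"pw123"` after scanning the candidate list; the point of the paper’s model is that preprocessing can do much better than this online scan.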
Importantly, the above discussion assumes that no preprocessing is done. That is, we imagine an attacker who does no work prior to being given H(pw) or, more formally, we imagine that the attacker is fixed before the random oracle H is chosen. In that case, the only way an attacker can learn information about H is by making explicit queries to an oracle for H, and the above-mentioned bounds hold. In practice, however, H is typically a standardized hash function that is known in advance, and offline preprocessing attacks, during which the attacker can query and store arbitrary information about H, can be a significant threat.
Concretely, let \(H: [N] \rightarrow [N]\) and assume that pw is uniform in [N]. The obvious attack to recover pw from H(pw) is an exhaustive-search attack which uses time \(T=N\) in the online phase (equating time with the number of queries to H). But an attacker could also generate the entire function table for H during an offline preprocessing phase; then, given H(pw) in the online phase, the attacker can recover pw in O(1) time using a table lookup. The data structure generated during the offline phase requires \(S = O(N)\) space (ignoring \(\log N\) factors), but Hellman [11] showed a more clever construction of a data structure which, in particular, gives an attack using \(S=T=O(N^{2/3})\) (see [12, Sect. 5.4.3] for a self-contained description). Rainbow tables implementing this approach, along with later improvements (most notably by Oechslin [14]), are widely used in practice and must be taken into account in any practical analysis of password security. Further work has explored improving these results and proving rigorous versions of them, as well as showing bounds on how well such attacks can perform [2, 6, 8, 9, 14, 18].
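The time/space tradeoff behind such attacks can be illustrated with a toy, single-table Hellman-style construction (all parameters and names here are ours, chosen for illustration; the full attack uses many tables with different “flavors” to boost coverage):

```python
import random

random.seed(1)
N, m, t = 4096, 64, 8          # illustrative sizes; Hellman's regime has m*t^2 ~ N

# A fixed random function standing in for H (stored only so the toy is runnable;
# a real attacker has oracle access instead).
table_H = [random.randrange(N) for _ in range(N)]
def H(x):
    return table_H[x]

# Offline phase: m chains of length t; store only (endpoint -> start point).
# Chains that merge overwrite one another, which is why coverage degrades.
starts = random.sample(range(N), m)
endpoint = {}
for s in starts:
    x = s
    for _ in range(t):
        x = H(x)
    endpoint[x] = s

# Online phase: to invert y = H(x), walk forward up to t steps looking for a
# stored endpoint, then re-walk that chain from its start to find a preimage.
def invert(y):
    z = y
    for i in range(t):
        if z in endpoint:
            cand = endpoint[z]
            for _ in range(t - 1 - i):
                cand = H(cand)
            if H(cand) == y:       # guard against false alarms from merged chains
                return cand
        z = H(z)
    return None

# Fraction of random challenges inverted: a single table covers only about
# m*t/N of the space, so this succeeds on a small but noticeable fraction.
hits = sum(invert(H(random.randrange(N))) is not None for _ in range(400))
```

The stored state is only the m endpoint/start pairs, while each inversion attempt costs at most about \(t^2\) evaluations of H; stacking many such tables yields the \(S=T=O(N^{2/3})\) tradeoff mentioned above.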
The above discussion in the context of function inversion gives a practical example of where auxiliary information about a random oracle (in this case, in the form of rainbow tables generated using the random oracle) can quantitatively change the security of a given application that uses the random oracle. For a more dramatic (but less practical) example, consider the case of collision finding. Given a random function \(H: [N] \rightarrow [N]\), one can show that \(O(\sqrt{N})\) queries are needed in order to find a collision in H (i.e., distinct points \(x, x'\) with \(H(x)=H(x')\)). But clearly we can find a collision in H during an offline preprocessing phase and store that collision using O(1) space, after which it is trivial to output that collision in an online phase in O(1) time. The conclusion is that in settings where offline preprocessing is a possibility, security proofs in the randomoracle model must be interpreted carefully. (We refer the reader to [5, 16], as well as many of the references below, for further discussion).
From a different viewpoint, another motivation for studying auxiliary information comes from the desire to obtain accurate security bounds against non-uniform attackers when instantiating the random oracle with a concrete hash function. Indeed, non-uniform attackers are allowed some arbitrary ‘advice’ before attacking the system. Translated to the random-oracle model, this would require the attacker to be able to compute some arbitrary function of the entire random oracle, which cannot be done using only a bounded number T of oracle queries. This mismatch has already led to considerable confusion among both theoreticians and practitioners. We refer to [5, 15] for some in-depth discussion, here mentioning only the two most well-known examples. (1) In the standard (non-uniform) model, no single function can be collision-resistant, while a single random oracle is trivially collision-resistant (without preprocessing); this is why in the standard model one considers a family of CRHFs, whose public key (which we call salt) is chosen after the attacker gets its non-uniform advice. To the best of our knowledge, prior to our work no meaningful CRHF bound was given for a salted random oracle when (salt-independent) preprocessing was allowed. (2) In the standard (non-uniform) model, it is well known [1, 5, 7] that no pseudorandom generator (PRG) H(x) can have security better than \(2^{n/2}\) even against linear-time attackers, where n is the seed length of x. In contrast, an expanding random oracle can trivially be shown to be a \((T/2^n)\)-secure PRG in the traditional random-oracle model, easily surpassing the \(2^{n/2}\) barrier in the standard model (even for huge T up to \(2^{n/2}\), let alone polynomial T).
Random Oracle with Auxiliary Input. While somewhat different, the two motivating applications above effectively reduce to the same extension of the traditional random-oracle model (ROM). A (computationally unbounded) attacker A can compute arbitrary S bits of information \(z=z({\mathcal O})\) about the random oracle \({\mathcal O}\) before attacking the system, and can then make T additional oracle queries to \({\mathcal O}\) during the attack. Following Unruh [16], we call this the Random Oracle Model with Auxiliary Input (ROM-AI), and this is the model we thoroughly study in this work. While the traditional ROM is parameterized by the single parameter T, the ROM-AI is parameterized by two parameters, S and T, which roughly correspond to space (during offline preprocessing) and time (during the online attack). For the application to non-uniform security, one can also use the ROM-AI to get good estimates for security against (non-uniform) circuits of size C by setting \(S=T=C\).^{1}
1.1 Handling Random Oracles with Auxiliary Input
Broadly speaking, there are three ways one can address the issue of preprocessing/auxiliary input in the random-oracle model: (1) by using a generic approach to analyze existing or proposed schemes, (2) by using an application-specific approach to analyze an existing or proposed scheme, or (3) by modifying existing schemes in an attempt to defeat preprocessing/non-uniform attacks. We discuss the limited prior work on these three approaches below, before stating our results.
A generic approach. Unruh [16] was the first to propose a generic approach for dealing with auxiliary input in the random-oracle model. We give an informal overview of his results (a formal statement is given in Sect. 2). Say we wish to bound the success probability \(\epsilon \) (in some experiment) of an online attacker making T random-oracle queries and relying on S bits of (arbitrary) auxiliary information about the random oracle. Unruh showed that it suffices to analyze the success probability \(\epsilon '(P)\) of the attack in the presence of a “presampled” random oracle that is chosen uniformly subject to its values being fixed in some adversarial way on P points (where P is a parameter), with no other auxiliary information given; \(\epsilon \) is then bounded by \(\epsilon '(P)+O(\sqrt{ST/P})\), where P is chosen optimally so as to balance the resulting two terms (see an example below).
This is an impressive result, but it falls short of what one might hope for. In particular, P must be superpolynomial in order to make the “security loss” \(O(\sqrt{ST/P})\) negligible, but in many applications if P is too large then the bound \(\epsilon '(P)\) one can prove on an attacker’s success probability in the presence of a “presampled” random oracle with P fixed points becomes too high. Unruh conjectured that his bound was not tight, and that it might be possible to bound the “security loss” by a negligible quantity for P a sufficiently large polynomial.
An application-specific approach. Given that the generic approach might lead to very suboptimal bounds, one might hope to develop a much tighter application-specific approach to get concrete bounds. To the best of our knowledge, no such work was done for the random-oracle model with preprocessing. Indirectly, however, De et al. [6] adapted the beautiful “compression paradigm” introduced by Gennaro and Trevisan [9, 10] to show nearly tight security bounds for inverting one-way permutations as well as for specific PRGs (based on one-way permutations and hard-core bits). This was done not for the sake of analyzing the security of these constructions,^{2} but rather to show limitations of generic inversion/distinguishing attacks against all one-way functions or PRGs. Still, this elegant theoretical approach suggests that application-specific techniques, such as the compression paradigm, might be useful in the analysis of schemes based on real-world hash functions, such as SHA.
“Salting.” Even with optimal application-specific techniques, preprocessing attacks can be effective: we have already discussed function inversion and collision finding, and there are also nontrivial distinguishing attacks against pseudorandom generators/functions.
A natural defense against preprocessing attacks, which has been explicitly suggested [13] and is widely used to defeat such attacks in the context of password hashing, is to use salting. Roughly, this involves choosing a random but public value a and including it in the input to the hash function. Thus, in the context of password hashing we would choose a uniform salt a and store (a, H(a, pw)); in the context of collisionresistant hashing we would choose and publish a and then look at the hardness of finding collisions in the function \(H(a, \cdot )\); and in the context of pseudorandom generators we would choose a and then look at the pseudorandomness of H(a, x) (for uniform x) given a.
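A minimal sketch of the salted password-hashing variant just described, with SHA-256 standing in for the random oracle (the 16-byte salt length and function names are illustrative choices of ours):

```python
import hashlib, os

# Salted password hashing: choose a random public salt a, store (a, H(a, pw)).
def hash_password(pw: bytes):
    a = os.urandom(16)
    return a, hashlib.sha256(a + pw).digest()

def verify(pw: bytes, a: bytes, digest: bytes) -> bool:
    return hashlib.sha256(a + pw).digest() == digest
```

Because a is chosen only at hashing time, a table precomputed for H no longer applies directly: the attacker faces the “reduced” function \(H(a,\cdot )\), which was unknown during preprocessing.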
De et al. [6] briefly study the effect of salting for inverting one-way permutations as well as for specific PRGs (based on one-way permutations and hard-core bits), but beyond that we are aware of no analysis of the effectiveness of salting for defeating preprocessing in any other context, including the use of hash functions that are not permutations.^{3} We highlight that although it may appear “obvious” that salting defeats, say, rainbow tables, it is not at all clear what the quantitative security benefit of salting is, nor whether rainbow tables can be adapted to give a (possibly different) online/offline tradeoff when salting is used.
1.2 Our Results
We address all three approaches outlined in the previous section. First, we investigate the generic approach to proving security in the random-oracle model with auxiliary input, and specifically explore the extent to which Unruh’s presampling technique can be improved. Here, our result is largely negative: disproving Unruh’s conjecture, we show that there is an attack for which the “security loss” stemming from Unruh’s approach is at least \(\varOmega (ST/P)\). Although there remains a gap between our lower bound and Unruh’s upper bound that it would be interesting to close, the upshot, as we discuss next, is that Unruh’s technique is not sufficient (in general) for proving strong concrete-security bounds in the random-oracle model when preprocessing is a possibility.
Consider, e.g., the case of function inversion. One can show that the probability of inverting a random oracle \(H: [N] \rightarrow [N]\) for which P points have been “presampled” is \(O(P/N + T/N)\). Combined with the security loss of \(O(\sqrt{ST/2P})\) resulting from Unruh’s technique and plugging in the optimal value of P, we obtain a security bound of \(O((ST/N)^{1/3}+T/N)\) for algorithms making T oracle queries and using S bits of auxiliary input about H. Our negative result shows that the best bound one could hope to achieve by using Unruh’s approach is \(O((ST/N)^{1/2} + T/N)\). Both bounds fall short of the best known attacks, which succeed with probability \(\varOmega \left( \min \left\{ \frac{ST}{N},\left( \frac{S^2T}{N^2}\right) ^{1/3}\right\} + \frac{T}{N}\right) \). Similar gaps exist for other cryptographic primitives.
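The optimal choice of P in this calculation can be made explicit by balancing the two P-dependent terms:

```latex
\epsilon \;\le\; O\!\left(\frac{P}{N} + \frac{T}{N}\right) + O\!\left(\sqrt{\frac{ST}{P}}\right),
\qquad
\frac{P}{N} = \sqrt{\frac{ST}{P}} \;\Longrightarrow\; P = (ST)^{1/3}\,N^{2/3},
```

so that \(P/N = (ST/N)^{1/3}\), which gives the bound \(O((ST/N)^{1/3}+T/N)\) above; repeating the same balancing with a hypothetical loss of \(O(ST/P)\) instead gives \(P=(STN)^{1/2}\) and the bound \(O((ST/N)^{1/2}+T/N)\).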
Faced with this, we turn to studying a more direct approach for proving tighter bounds for specific important applications of hash functions, such as their use as one-way functions, pseudorandom generators/functions (PRGs/PRFs), or message authentication codes (MACs).^{4} Here we show much tighter, and in many cases optimal, bounds for all of these primitives, which always beat the provable version of Unruh’s presampling (see Table 1 with value \(K=1\)). Not surprisingly, our bounds are not as good as what is possible to show without preprocessing, since those bounds are no longer true once preprocessing is allowed. In particular, setting \(S=T=C\) we now get meaningful non-uniform security bounds against circuits of size C for all of the above primitives, which often match the existing limitations known for non-uniform attacks. (For example, when \(C=S=T\) is polynomial in n, we get that the optimal non-uniform PRG/PRF security is lower bounded by \(2^{n/2}\), matching existing attacks.)
Given these inherent limitations as compared to the traditional ROM without preprocessing, we formally examine the effects of “salting” as a way of mitigating or even defeating the effects of preprocessing/non-uniformity. As before, we look at the natural, “salted” constructions of one-way functions, PRGs, PRFs, and MACs, but now can also examine collision-resistant hash functions (CRHFs), which can potentially be secure against preprocessing once the salt is long enough. In all these cases we analyze the security of these constructions in the presence of auxiliary information about the random oracle. In fact, the “unsalted” results for one-way functions, PRGs, PRFs, and MACs mentioned above are simply special cases of the salted results with the cardinality K of the salt space set to \(K=1\).
Table 1. Security bounds and best known attacks using space S and time T for “salted” constructions of primitives based on a random oracle. The first three (unkeyed) primitives are constructed from a random oracle \({\mathcal O}:[K]\times [N]\rightarrow [M]\), where [K] is the domain of the salt and [N] is the domain of the secret; the final two (keyed) primitives are constructed from a random oracle \({\mathcal O}:[K]\times [N]\times [L]\rightarrow [M]\), where [L] is the domain of the input. For simplicity, logarithmic factors and constant terms are omitted.
Primitive | Security bound (here) | Best known attack
OWFs | \(\frac{ST}{KN} + \frac{T}{N}\) | \(\min \left\{ \frac{ST}{KN},(\frac{S^2T}{K^2N^2})^{1/3}\right\} + \frac{T}{N}\)
CRHFs | \(\frac{S}{K} + \frac{T^2}{M}\) | \(\frac{S}{K} + \frac{T^2}{M}\)
PRGs | \((\frac{ST}{KN})^{1/2}+\frac{T}{N}\) | \((\frac{S}{KN})^{1/2}+\frac{T}{N}\)
PRFs | \((\frac{ST}{KN})^{1/2}+\frac{T}{N}\) | \((\frac{S}{KN})^{1/2} + \frac{T}{N}\)
MACs | \(\frac{ST}{KN} + \frac{T}{N} + \frac{T}{M} \) | \(\min \left\{ \frac{ST}{KN},(\frac{S^2T}{K^2N^2})^{1/3}\right\} + \frac{T}{N}+\frac{1}{M}\)
All our new bounds are proven using the “compression paradigm” introduced by Gennaro and Trevisan [9, 10]. The main idea is to argue that if some attacker succeeds with “high” probability, then that attacker can be used to reversibly encode (i.e., compress) a random oracle beyond what is possible from an information-theoretic point of view. Since we are considering attackers who perform preprocessing, our encoding must include the S-bit auxiliary information produced by the attacker. Thus, the main technical challenge we face is to ensure that our encoding compresses by (significantly) more than S bits.
Outlook. In this work we thoroughly revisited the ROM with auxiliary input, as we believe it has not received enough attention from the cryptographic community, despite being simultaneously important for the variety of reasons detailed above, and also much more interesting than the traditional ROM from a technical point of view. Indeed, even the most trivial one-line proof in the traditional ROM either becomes completely false once preprocessing is allowed (e.g., for CRHFs), or becomes an interesting technical challenge (OWFs, PRGs, MACs) that requires new techniques, and usually teaches us something new about the primitive in question in relation to preprocessing.
Of course, given the abundance of works using random oracles, we hope our work will generate a lot of follow-up research analyzing the effects of preprocessing and non-uniformity for many other important uses of hash functions, as well as other idealized primitives (e.g., ideal ciphers).
2 Limits on the Power of Preprocessing
For two distributions \(D_1,D_2\) over a universe \(\varOmega \), we use \(\varDelta (D_1,D_2)\) to denote their statistical distance \(\varDelta (D_1,D_2) := \frac{1}{2}\sum _{\omega \in \varOmega }\bigl |\Pr [D_1=\omega ]-\Pr [D_2=\omega ]\bigr |\).
In this section, we revisit the result of Unruh [16] that allows one to replace arbitrary (bounded-length) auxiliary information about a random oracle \({\mathcal O}\) with a (bounded-size) set fixing the value of the random oracle on some fraction of points. For a set of tuples \(Z=\{(x_1, y_1), \ldots \}\), we let \({\mathcal O}'[Z]\) denote a random oracle chosen uniformly subject to the constraints \({\mathcal O}'(x_i)=y_i\).
Theorem 1
(Unruh [16], informal). Let \((A_0,A_1)\) be an attacker where \(A_0^{\mathcal O}\) outputs S bits of auxiliary information and \(A_1\) makes at most T oracle queries. Then for every P there exists a set Z of at most P input/output pairs such that the statistical distance between the outputs of \(A_1^{\mathcal O}(A_0^{\mathcal O})\) and \(A_1^{{\mathcal O}'[Z]}(A_0^{\mathcal O})\) is at most \(\sqrt{ST/2P}\).
This theorem enables proving various results in the random-oracle model even in the presence of auxiliary input, by first replacing the auxiliary input with a fixed set of input/output pairs and then using standard lazy-sampling techniques for the value of the random oracle at other points. However, applying this theorem incurs a cost of \(\sqrt{ST/2P}\), and so superpolynomial P is required in order to obtain negligible advantage overall. It is open whether one can improve the bound in Theorem 1; Unruh conjectures [16, Conjecture 14] that for all polynomials S, T there is a polynomial P such that the statistical distance above is negligible. We disprove this conjecture by showing that the bound in the theorem cannot (in general) be improved below \(O(ST/P)\). That is,
Theorem 2
(informal). There is an attacker \((A_0,A_1)\), with \(A_0\) outputting S bits and \(A_1\) making at most T oracle queries, such that for every set Z of at most P points, the statistical distance between the outputs of \(A_1\) when run with \({\mathcal O}\) and with \({\mathcal O}'[Z]\) is \(\varOmega (ST/P)\).
Proof
Pick S disjoint sets \(X_1,\dots , X_S\subset [N]\), where each set is of size \(t=T\cdot (4(P/ST)^2+1)\). Partition each set \(X_i\) into \(t/T=4(P/ST)^2+1\) disjoint blocks \(X_{i,1},\dots ,X_{i,t/T}\), each of size T. Algorithm \(A_0^{\mathcal O}\) outputs an S-bit state whose ith bit is equal to \(\mathrm {maj}(\oplus _{x\in X_{i,1}}{\mathcal O}(x),\dots , \oplus _{x\in X_{i,t/T}}{\mathcal O}(x))\), where \(\mathrm {maj}\) is the majority function. Algorithm \(A_1^{\mathcal O}(b_1, \ldots , b_S)\) chooses a uniform block \(X_{i,j}\) and outputs 1 iff \(\oplus _{x\in X_{i,j}}{\mathcal O}(x)=b_i\).
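A toy simulation of this attacker (with illustrative parameters of our choosing and a Boolean-valued oracle; the presampled oracle \({\mathcal O}'[Z]\) is not simulated, since on a block it has not fixed, \(A_1\) outputs 1 with probability exactly 1/2):

```python
import random
from functools import reduce

random.seed(0)
N, S, T, BLOCKS = 4096, 32, 4, 9   # BLOCKS plays the role of t/T; sizes illustrative
O = [random.randrange(2) for _ in range(N)]   # Boolean oracle O: [N] -> {0,1}

# S disjoint sets X_1..X_S, each partitioned into BLOCKS blocks of size T.
sets = [[list(range((i * BLOCKS + j) * T, (i * BLOCKS + j + 1) * T))
         for j in range(BLOCKS)] for i in range(S)]

def block_xor(block):
    return reduce(lambda acc, x: acc ^ O[x], block, 0)

# Offline phase (A_0): bit i is the majority of the block-XORs of X_i.
state = [int(2 * sum(block_xor(b) for b in sets[i]) > BLOCKS) for i in range(S)]

# Online phase (A_1): pick a uniform block X_{i,j}; output 1 iff its XOR equals bit i.
def A1():
    i, j = random.randrange(S), random.randrange(BLOCKS)
    return int(block_xor(sets[i][j]) == state[i])

# Exact success probability of A_1 on the real oracle.
p = (sum(block_xor(sets[i][j]) == state[i]
         for i in range(S) for j in range(BLOCKS)) / (S * BLOCKS))
```

Since each state bit is a majority over an odd number of blocks, at least 5 of the 9 blocks in each set agree with it, so \(p \ge 5/9\): a constant advantage over the 1/2 achieved against an oracle that is fresh on the chosen block.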
3 Function Inversion
For a natural number n, we define \([n]=\{1, \ldots , n\}\). In this section, we prove bounds on the hardness of inverting “salted” random oracles in the presence of preprocessing. That is, consider choosing a random function \({\mathcal O}: [K] \times [N] \rightarrow [M]\) and then allowing an attacker \(A_0\) (with oracle access to \({\mathcal O}\)) to perform arbitrary preprocessing to generate an S-bit state \(\mathsf{st}\). We then look at the hardness of inverting \({\mathcal O}(a, x)\), given \(\mathsf{st}\) and \(a\), for algorithms \(A_1\) making up to T oracle queries, where \(a\in [K]\) and \(x \in [N]\) are uniform. We consider two notions of inversion: computing x itself, or the weaker goal of finding any \(x'\) such that \({\mathcal O}(a, x')={\mathcal O}(a, x)\). Assuming \(N=M\) for simplicity in the present discussion, we show that in either case the probability of successful inversion is \(O(\frac{ST}{KN} + \frac{T\log N}{N})\). We remark that the best bound one could hope to prove via a generic approach (i.e., using Theorem 1 with the best-possible bound \(O(ST/P)\)) is^{5} \(O(\sqrt{ST/KN} + T/N)\).
By way of comparison, rainbow tables [2, 6, 8, 11, 14] address the case \(K=1\) (i.e., no salt), and give success probability \(O(\min \{ST/N, (S^2T/N^2)^{1/3}\}+T/N)\). One natural way to adapt rainbow tables to handle salt is to compute K independent rainbow tables, each using space S/K, for the K reduced functions \({\mathcal O}(a, \cdot )\). Using this approach gives success probability \(O(\min \{ST/KN, (S^2T/K^2N^2)^{1/3}\}+T/N)\). This shows that our bound is tight when \(ST^2<KN\).
We begin with some preliminary lemmas that we will rely on in this and the following sections.
Lemma 1
Say there exist encoding and decoding procedures \((\mathsf{Enc}, \mathsf{Dec})\) such that for all \(m \in M\) we have \(\mathsf{Dec}(\mathsf{Enc}(m))=m\). Then \(\mathop {\mathbb {E}}_{m \leftarrow M}\bigl [|\mathsf{Enc}(m)|\bigr ] \ge \log |M|\).
Proof
For \(m \in M\), let \(s_m = |\mathsf{Enc}(m)|\). Define \(C=\sum _m 2^{-s_m}\), and for \(m\in M\) let \(q_m = 2^{-s_m}/C\). Then \(\mathop {\mathbb {E}}_{m \leftarrow M}[s_m] = \frac{1}{|M|}\sum _m \left( \log \frac{1}{q_m} - \log C\right) \). By Jensen’s inequality, \(\frac{1}{|M|}\sum _m \log \frac{1}{q_m} \ge \log |M|\), and by Kraft’s inequality \(C\le 1\). The lemma follows. \(\blacksquare \)
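As a quick sanity check of Lemma 1: any injective prefix-free code for 8 messages has average length at least \(\log 8 = 3\). The two toy codes below are examples of ours, not from the paper:

```python
import math

def avg_length(code):          # code: message -> bitstring
    return sum(len(w) for w in code.values()) / len(code)

M = range(8)
fixed = {m: format(m, "03b") for m in M}   # the optimal fixed-length code
lopsided = {0: "0", 1: "10", 2: "110", 3: "1110", 4: "11110",
            5: "111110", 6: "1111110", 7: "1111111"}  # prefix-free, Kraft sum = 1

# Making some codewords short forces others to be long: the average never
# drops below log|M|, exactly as Lemma 1 predicts.
assert avg_length(fixed) == 3.0
assert avg_length(lopsided) >= math.log2(len(M))   # 4.375 >= 3
```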
Lemma 2
([6]). Suppose there exist randomized encoding and decoding procedures \((\mathsf{Enc}, \mathsf{Dec})\) for a set M with recovery probability \(\delta \). Then the encoding length of \((\mathsf{Enc}, \mathsf{Dec})\) is at least \(\log |M| - \log (1/\delta )\).
Proof
By a standard averaging argument, there exists an r and a set \(M' \subseteq M\) with \(|M'| \ge \delta \cdot |M|\) such that \(\mathsf{Dec}(\mathsf{Enc}(m,r),r)=m\) for all \(m\in M'\). Let \(\mathsf{Enc}', \mathsf{Dec}'\) be the deterministic algorithms obtained by fixing the randomness to r. By Lemma 1, \(\mathop {\mathbb {E}}_{m \leftarrow M'}\bigl [|\mathsf{Enc}'(m)|\bigr ] \ge \log |M'|\), and hence there exists an \(m'\) with \(|\mathsf{Enc}'(m')| \ge \log |M| - \log (1/\delta )\). \(\blacksquare \)
We now state and prove the main results of this section. Let \(\mathsf{Func}(A, B)\) denote the set of all functions from A to B.
Theorem 3
(informal). For any attacker \((A_0,A_1)\), where \(A_0^{\mathcal O}\) outputs an S-bit state \(\mathsf{st}\) and \(A_1\) makes at most T oracle queries, the probability (over \({\mathcal O}\), a, and x) that \(A_1^{\mathcal O}(\mathsf{st},a,{\mathcal O}(a,x))\) outputs x is \(O\left( \frac{ST}{KN} + \frac{T\log N}{N}\right) \) (stated here for \(N=M\)).
Theorem 4
(informal). For any \((A_0,A_1)\) as above, the probability that \(A_1^{\mathcal O}(\mathsf{st},a,{\mathcal O}(a,x))\) outputs some \(x'\) with \({\mathcal O}(a,x')={\mathcal O}(a,x)\) is likewise \(O\left( \frac{ST}{KN} + \frac{T\log N}{N}\right) \) (stated here for \(N=M\)).
To prove Theorem 3, we first prove the following lemma:
Lemma 3
Proof
Specifically, the encoder uses randomness r to pick a set \(R \subseteq [K] \times [N]\), where each \((a,x)\in [K]\times [N]\) is included in R independently with probability 1/10T. For \(a\in [K]\), let \(G_a \subseteq R\) be the set of \((a,x) \in R\) such that \(A_1^{\mathcal O}(\mathsf{st}_{\mathcal O}, a, {\mathcal O}(a,x))=x\) and moreover \(A_1\) does not query \({\mathcal O}\) on any \((a', x') \in R\) (except possibly (a, x) itself). Let \(G=\bigcup _a G_a\). Define \(V_a = \{{\mathcal O}(a, x)\}_{x \in G_a}\), and note that \(|V_a|=|G_a|\).
As in De et al. [6], with probability at least 0.9 the size of G is at least \(\varepsilon KN/100T\). To see this, note that by a Chernoff bound, R has at least \(\varepsilon KN/40T\) points with probability at least 0.95. The expected number of points \((a, x) \in R\) for which \(A_1^{\mathcal O}(\mathsf{st}_{\mathcal O}, a, {\mathcal O}(a,x))=x\) but \(A_1\) queries \({\mathcal O}\) on some point \((a', x') \in R\) (besides (a, x) itself) is at most \(\frac{\varepsilon KN}{2}\cdot \frac{1}{10T} \cdot \left( 1-(1-1/10T)^T\right) \le \frac{\varepsilon KN}{2000T}\). By Markov’s inequality, with probability at least 0.95 the number of such points is at most \(\frac{\varepsilon KN}{100T}\). So with probability at least 0.9, we have \(|G| \ge \frac{3\varepsilon KN}{200T}\ge \frac{\varepsilon KN}{100T}\).
The encoding of \({\mathcal O}\) is defined as follows:
 1. Include \(\mathsf{st}_{\mathcal O}\) and, for each \(a \in [K]\), include \(|V_a|\) and a description of \(V_a\). This uses a total of \(S + K\log N + \sum _{a\in [K]}\log \left( {\begin{array}{c}M\\ |G_a|\end{array}}\right) \) bits.
 2. For each a and \(y \in V_a\) (in lexicographic order), run \(A^{\mathcal O}_1(\mathsf{st}_{\mathcal O},a,y)\) and include in the encoding the answers to all the oracle queries made by \(A_1\) that have not been included in the encoding so far, except for any queries in R. (By definition of \(G_a\), there will be at most one such query and, if so, it will be the query (a, x) such that \({\mathcal O}(a, x)=y\).)
 3. For each \((a,x)\in ([K]\times [N])\setminus G\) (in lexicographic order) for which \({\mathcal O}(a, x)\) has not been included in the encoding so far, add \({\mathcal O}(a,x)\) to the encoding.
Steps 2 and 3 explicitly include in the encoding the value of \({\mathcal O}(a, x)\) for each \((a, x) \in ([K] \times [N]) \setminus G\). Thus, the total number of bits added to the encoding by those steps is \(\left( KN-\sum _{a}|G_a|\right) \log M\).
The decoding of \({\mathcal O}\) proceeds as follows:
 1. Recover \(\mathsf{st}_{\mathcal O}\), \(\{|V_a|\}_{a \in [K]}\), and \(\{V_a\}_{a \in [K]}\).
 2. For each a and \(y \in V_a\) (in lexicographic order), run \(A_1(\mathsf{st}_{\mathcal O},a,y)\) while answering the oracle queries of \(A_1\) using the values stored in the encoding. The only exception is if \(A_1\) ever makes a query \((a, x) \in R\), in which case y itself is returned as the answer. The output x of \(A_1\) will be such that \({\mathcal O}(a,x)=y\).
 3. For each \((a,x)\in [K]\times [N]\) (in lexicographic order) for which \({\mathcal O}(a, x)\) is not yet defined, recover the value of \({\mathcal O}(a, x)\) from the remainder of the encoding.
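To see where the compression comes from, compare with the trivial \(S + KN\log M\)-bit encoding (a back-of-the-envelope calculation of ours, using \(\binom{M}{g}\le (eM/g)^g\) and suppressing lower-order terms; the formal accounting is in the proof):

```latex
|G|\log M \;-\; \sum_{a\in[K]}\log\binom{M}{|G_a|}
\;\ge\; \sum_{a\in[K]} |G_a|\log\frac{|G_a|}{e},
```

so the savings from omitting the \(|G|\) inverted values exceed the cost of describing the sets \(V_a\) by roughly \(|G|\) times a logarithmic factor. By Lemma 2, a correct encoding of a random \({\mathcal O}\) cannot save more than a few bits, so these savings must be offset by the \(S+K\log N\) bits of step 1; with \(|G| = \varOmega (\varepsilon KN/T)\) this forces \(\varepsilon = O(ST/KN)\) up to logarithmic factors.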
 1. The set \(X_{{\mathcal O},a}\) (along with its size), using \(\log N + \log \left( {\begin{array}{c}N\\ |X_{{\mathcal O},a}|\end{array}}\right) \) bits.
 2. The set \(Y_{{\mathcal O},a}\) (along with its size), using \(\log M + \log \left( {\begin{array}{c}M\\ |Y_{{\mathcal O},a}|\end{array}}\right) \) bits.
 3. For each \(x \in X_{{\mathcal O},a}\), the value \({\mathcal O}(a,x) \in Y_{{\mathcal O},a}\), encoded using \(\log |Y_{{\mathcal O},a}|\) bits.
 4. For each \(x \not \in X_{{\mathcal O},a}\), the value \({\mathcal O}(a,x)\), encoded using \(\log M\) bits.
4 CollisionResistant Hash Functions
In this section, we prove the following theorem.
Theorem 5
(informal). For any attacker \((A_0,A_1)\), where \(A_0^{\mathcal O}\) outputs an S-bit state \(\mathsf{st}\) and \(A_1\) makes at most T oracle queries, the probability (over \({\mathcal O}\) and a) that \(A_1^{\mathcal O}(\mathsf{st},a)\) outputs a collision in \({\mathcal O}(a,\cdot )\) is \(O\left( \frac{S+\log K}{K} + \frac{T^2}{M}\right) \).
The bound in the above theorem matches (up to the \(K^{-1} \log K\) term) the parameters achieved by the following attack: \(A_0\) outputs collisions in \({\mathcal O}(a_i, \cdot )\) for each of \(a_1, \ldots , a_S \in [K]\). Then \(A_1\) outputs the appropriate collision if \(a=a_i\) for some i, and otherwise performs a birthday attack in an attempt to find a collision.
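The online birthday attack in this comparison can be sketched as follows (SHA-256 truncated to 20 bits stands in for a salted random oracle with a small range [M]; names and parameters are ours):

```python
import hashlib, itertools

M_BITS = 20   # small output range so a collision appears after ~2^10 steps

def H(a: int, x: int) -> int:
    d = hashlib.sha256(a.to_bytes(4, "big") + x.to_bytes(8, "big")).digest()
    return int.from_bytes(d, "big") >> (256 - M_BITS)

# Birthday attack on the reduced function H(a, .): record outputs until one
# repeats; by the birthday bound this takes O(sqrt(M)) evaluations.
def birthday_collision(a: int):
    seen = {}
    for x in itertools.count():
        y = H(a, x)
        if y in seen:
            return seen[y], x   # distinct inputs with H(a, x1) == H(a, x2)
        seen[y] = x

x1, x2 = birthday_collision(7)
```

Since the salt a is only revealed online, the preprocessing phase can prepare collisions for at most S salts, which is exactly the \(S/K + T^2/M\) tradeoff in Table 1.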
To prove Theorem 5, we first prove the following lemma:
Lemma 4
Proof
Fix \({\mathcal O}:[K]\times [N]\rightarrow [M]\), and let \(\mathsf{st}_{\mathcal O}\) denote the output of \(A_0^{\mathcal O}\). Let \(G_{\mathcal O}\) be the set of \(a \in [K]\) such that \(A_1^{\mathcal O}(\mathsf{st}_{\mathcal O}, a)\) outputs a collision in \({\mathcal O}(a, \cdot )\). We assume, without loss of generality, that if \(A^{\mathcal O}_1(\mathsf{st}_{\mathcal O}, a)\) outputs \(x, x'\), then it must have queried \({\mathcal O}(a,x)\) and \({\mathcal O}(a,x')\) at some point during its execution. The basic observation is that we can use this to compress \({\mathcal O}(a, \cdot )\) for \(a \in G_{\mathcal O}\). Specifically, rather than store both \({\mathcal O}(a, x)\) and \({\mathcal O}(a, x')\) (using \(2 \log M\) bits), where \(x, x'\) is the collision in \({\mathcal O}(a, \cdot )\) output by \(A_1\), we instead store the value \({\mathcal O}(a, x)={\mathcal O}(a,x')\) once, along with the indices i, j of the oracle queries \({\mathcal O}(a, x)\) and \({\mathcal O}(a, x')\) made by \(A_1\) (using a total of \(\log M + 2 \log T\) bits). This is a net savings whenever \(2 \log T < \log M\). Details follow.
The encoding of \({\mathcal O}\) is as follows:
 1. Encode \(\mathsf{st}_{\mathcal O}\), \(|G_{\mathcal O}|\), and \(G_{\mathcal O}\). This requires \(S + \log K + \log \left( {\begin{array}{c}K\\ |G_{\mathcal O}|\end{array}}\right) \) bits.
 2. For each \(a\in G_{\mathcal O}\) (in lexicographic order), run \(A_1^{\mathcal O}(\mathsf{st}_{\mathcal O}, a)\) and let the second components of the oracle queries of \(A_1\) be \(x_1, \ldots , x_T\). (We assume without loss of generality that these are all distinct.) If \(x, x'\) are the output of \(A_1\), let \(i< j\) be such that \(\{x,x'\}=\{x_i, x_j\}\). Encode i and j, along with the answers to each of \(A_1\)’s oracle queries (in order) except for the jth. Furthermore, encode \({\mathcal O}(a, x)\) for all \(x \in [N] \setminus \{x_1, \ldots , x_T\}\) (in lexicographic order). This requires \((N-1) \cdot \log M + 2 \log T\) bits for each \(a \in G_{\mathcal O}\).
 3. For each \(a \not \in G_{\mathcal O}\) and \(x \in [N]\) (in lexicographic order), store \({\mathcal O}(a, x)\). This uses \(N \log M\) bits for each \(a \not \in G_{\mathcal O}\).
Decoding is done in the obvious way.
The general case. In the general case, we need to take into account the fact that \(A_1\) may make arbitrary queries to \({\mathcal O}\). This affects the previous approach because \(A_1(\mathsf{st}_{\mathcal O}, a)\) may query \({\mathcal O}(a', x)\) for a value x that is output as part of a collision by \(A_1(\mathsf{st}_{\mathcal O}, a')\).
To deal with this, consider running \(A_1^{\mathcal O}(\mathsf{st}_{\mathcal O}, a)\) for all \(a \in G_{\mathcal O}\). There are at most \(T \cdot |G_{\mathcal O}|\) distinct oracle queries made overall. Although several of them may share the same prefix \(a \in [K]\), there are at most \(|G_{\mathcal O}|/2\) values of a that are used as a prefix in more than 2T queries. In other words, there is a set \(G'_{\mathcal O}\subseteq G_{\mathcal O}\) of size at least \(|G_{\mathcal O}|/2\) such that each \(a \in G'_{\mathcal O}\) is used in at most 2T queries when running \(A_1^{\mathcal O}(\mathsf{st}_{\mathcal O}, a)\) for all \(a \in G'_{\mathcal O}\).
To encode \({\mathcal O}\) we now proceed in a manner similar to before, but using \(G'_{\mathcal O}\) in place of \(G_{\mathcal O}\). Moreover, we run \(A_1^{\mathcal O}(\mathsf{st}_{\mathcal O}, a)\) for all \(a \in G'_{\mathcal O}\) (in lexicographic order) and consider all the distinct oracle queries made. For each \(a \in G'_{\mathcal O}\), let \(i_a < j_a \le 2T\) be such that the \(i_a\)th and \(j_a\)th oracle queries that use prefix a are distinct but yield the same output. (There must exist such indices by assumption on \(A_1\).) We encode \((i_a, j_a)\) for all \(a \in G'_{\mathcal O}\), along with the answers to all the (distinct) oracle queries made with the exception of the \(j_a\)th oracle query made using prefix a for all \(a \in G'_{\mathcal O}\). The remainder of \({\mathcal O}(\cdot , \cdot )\) is then encoded in the trivial way as before. Decoding is done in the natural way.
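To make the saving in the general case explicit (a tally of the costs just stated, nothing more): for each \(a \in G'_{\mathcal O}\), encoding the pair \(i_a < j_a \le 2T\) costs at most \(2\log (2T)\) bits, while one \(\log M\)-bit oracle answer is omitted, for a net saving of

```latex
\log M - 2\log(2T) \;=\; \log M - 2\log T - 2 \quad \text{bits per } a \in G'_{\mathcal O}
```

relative to the trivial encoding, which is what drives the compression bound.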
5 Pseudorandom Generators and Functions
In this section, we prove the following theorems.
Theorem 6
Theorem 7
Note that in both cases, an exhaustive-search attack (with \(S=0\)) achieves distinguishing advantage \(\varTheta (T/N)\). With regard to pseudorandom generators (Theorem 6), De et al. [6] show an attack with \(T=0\) that achieves distinguishing advantage \(\varOmega (\sqrt{\frac{S}{KN}})\). Their attack can be extended to the case of pseudorandom functions (assuming \(q> \log {KN}\)) to obtain distinguishing advantage \(\varOmega (\sqrt{\frac{S}{KN}})\) in that case as well.
In proving the above, we rely on the following [6, Lemma 8.4]:
Lemma 5
We now prove Theorem 6.
Proof

\(C_1\) makes at most T oracle queries, and never queries its own input;

\(C_0\) runs \(B_0\) and also outputs as part of its state the truth table of a function mapping \([K] \times [N]\) to outputs of length at most \((\log M - 1)\) bits;
As intuition for the proof of Theorem 7, note that we may view a pseudorandom function as a pseudorandom generator mapping a key to the truth table for a function, with the main difference being that the distinguisher is not given the entire truth table as input but instead may only access parts of the truth table via queries it makes. We may thus apply the same idea as in the proof of Theorem 6, with the output length (i.e., \(\log M\)) replaced by the number of queries the distinguisher makes. However, in this case, Lemma 5 cannot be directly applied and a slightly more involved compression argument is required.
With this in mind, we turn to the proof of Theorem 7:
Proof
 1.
Include \(B^{\mathcal O}_0\). This uses at most \(S + 1\) bits.
 2.
For each \((a,k)\in ([K]\times [N])\backslash R\) (in lexicographic order), include the truth table of \({\mathcal O}(a,k,\cdot )\). Then for each \((a,k)\in R\backslash G\) (in lexicographic order), include the truth table of \({\mathcal O}(a,k,\cdot )\). This uses a total of \((KN-|G|)\cdot L\) bits.
 3.
Include a description of \(G_0\). This uses \(\log \binom{|G|}{|G_0|}\) bits.
 4.
For each \((a,k)\in G\) (in lexicographic order), include in the encoding the answers to all the oracle queries made by \(C_1\) to the second oracle \({\mathcal O}(a,k,\cdot )\); then, for every x such that (a, k, x) is not queried by \(C_1\) to \({\mathcal O}(a,k,\cdot )\) and x is not the output of \(C_1\), add \({\mathcal O}(a,k,x)\) to the encoding. This uses a total of \(|G|(L-1)\) bits.
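Summing the stated costs of the four encoding steps (a bookkeeping step only):

```latex
(S+1) \;+\; (KN-|G|)\cdot L \;+\; \log\binom{|G|}{|G_0|} \;+\; |G|\,(L-1)
\;=\; KN\cdot L \;+\; S + 1 + \log\binom{|G|}{|G_0|} - |G|,
```

so the encoding is shorter than the trivial \(KN\cdot L\)-bit truth table exactly when \(|G| > S + 1 + \log\binom{|G|}{|G_0|}\).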
 1.
Recover \(B_0^{\mathcal O}\).
 2.
For each \((a,k)\in ([K]\times [N])\setminus R\), recover the truth table of \({\mathcal O}(a,k,\cdot )\). Identify the set G by running \(C_1\) with \(B^{\mathcal O}_0\) on each \((a,k)\in R\): if \(C_1\) on (a, k) makes queries only outside R, then \((a,k)\in G\). Then go over each \((a,k)\in R\setminus G\) and recover the truth table of \({\mathcal O}(a,k,\cdot )\).
 3.
Recover \(G_0\).
 4.
For each \((a,k)\in G\), run \(C_1(B_0^{\mathcal O},a)\) while answering its queries to the first oracle using the recovered values and its queries to the second oracle using the values stored in the encoding. Suppose \(C_1\) outputs (x, b). If \((a,k)\in G_0\), recover \({\mathcal O}(a,k,x)=b\); otherwise, \({\mathcal O}(a,k,x)=1-b\). Finally, for each x for which \({\mathcal O}(a,k,x)\) is not yet defined, recover the value of \({\mathcal O}(a, k, x)\) from the remainder of the encoding.
6 Message Authentication Codes (MACs)
In this section, we prove the following theorem.
Theorem 8
Note that any generic inversion attack can be used to attack the above construction of a MAC by fixing some \(m \in [L]\) and then inverting the function \({\mathcal O}(a, \cdot , m)\) given a; in this sense, it is perhaps not surprising that the bound above contains the terms \(O\left( \frac{ST}{KN} + \frac{T \log N}{N}\right) \) as in Theorem 3. There is, of course, also a trivial guessing attack that achieves advantage 1/M.
Proof
Fix \({\mathcal O}:[K]\times [N]\times [L]\rightarrow [M]\). Let \(U_{{\mathcal O}}\) be the set of (a, k) such that \(B_1\) succeeds on (a, k). Let \(G_{{\mathcal O}}\) be the subset of \(U_{{\mathcal O}}\) such that for every \((a,k)\in G_{{\mathcal O}}\), \(B_1^{{\mathcal O},{\mathcal O}(a,k,\cdot )}\) does not query its first oracle with any query with prefix \((a',k')\in G_{{\mathcal O}}\). Because \(B_1\) makes at most T queries, there exists \(G_{{\mathcal O}}\) of size at least \(|U_{{\mathcal O}}|/(T+1)\).
We can encode \({\mathcal O}\) as follows.
 1.
Include \(A^{\mathcal O}_0\), \(|G_{{\mathcal O}}|\), and a description of \(G_{{\mathcal O}}\). This uses a total of \(S+\log (KN)+\log \binom{NK}{|G_{{\mathcal O}}|}\) bits.
 2.
For each \((a,k)\in ([K]\times [N])\setminus G_{{\mathcal O}}\) (in lexicographic order), include the truth table of \({\mathcal O}(a,k,\cdot )\). This uses a total of \((KN-|G_{{\mathcal O}}|)\cdot L\log M\) bits.
 3.
For each \((a,k)\in G_{{\mathcal O}}\) (in lexicographic order), include in the encoding the answers to all the oracle queries made by \(B_1\) to the second oracle \({\mathcal O}(a,k,\cdot )\); then, for every m such that (a, k, m) is not queried by \(B_1\) to \({\mathcal O}(a,k,\cdot )\) and m is not the output of \(B_1\), add \({\mathcal O}(a,k,m)\) to the encoding. This uses a total of \(|G_{{\mathcal O}}|(L-1)\log M\) bits.
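Summing the stated costs of the three steps (again, only totaling what is written above):

```latex
S + \log(KN) + \log\binom{NK}{|G_{\mathcal O}|}
+ (KN - |G_{\mathcal O}|)\cdot L\log M
+ |G_{\mathcal O}|\,(L-1)\log M
= KN\cdot L\log M + S + \log(KN) + \log\binom{NK}{|G_{\mathcal O}|} - |G_{\mathcal O}|\log M,
```

a saving of \(|G_{\mathcal O}|\log M - S - \log(KN) - \log\binom{NK}{|G_{\mathcal O}|}\) bits over the trivial \(KN\cdot L\log M\)-bit encoding.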
Footnotes
 1.
Since a circuit of size C can encode up to \(S=\varOmega (C)\) bits of information about a given hash function H, as well as evaluate it close to \(T=\varOmega (C)\) times, assuming H is efficient.
 2.
For which we currently have no real-world candidates, since we do not have any candidates for efficient uninvertible “random permutations”.
 3.
Bellare et al. [3] study the security of salting for the purposes of multi-instance security, but they do not address the issue of preprocessing.
 4.
As we mentioned, collision-resistance is impossible without salting, which we discuss shortly.
 5.
Any such bound would take the form \(O(ST/P + P/KN + T/N)\), where the first term comes from applying the theorem, the second is the probability that the input to \(\mathcal A_1\) lies in the set of fixed points, and the third is the success probability of a trivial brute-force search. Setting \(P=\sqrt{ST\cdot KN}\), which balances the first two terms, optimizes this bound to \(O(\sqrt{ST/KN}+T/N)\).
 6.
The log-sum inequality states that for nonnegative \(t_1,\ldots ,t_n\) and \(w_1,\ldots ,w_n\), it holds that \(\sum _{i=1}^n t_i\log (w_i/t_i)\le \left( \sum _{i=1}^n t_i\right) \cdot \log \left(\sum _{i=1}^nw_i/\sum _{i=1}^nt_i\right)\). It also implies that the average of \(t_1\log (w_1/t_1),\dots ,t_n\log (w_n/t_n)\) is at most \(\overline{t}\log (\overline{w}/\overline{t})\), where \(\overline{t}\) is the average of \(t_1,\dots ,t_n\) and \(\overline{w}\) is the average of \(w_1,\dots ,w_n\).
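The footnote's inequality is easy to sanity-check numerically; the sketch below (function names and sampling ranges are ours, purely illustrative) verifies both the inequality and its averaged form on random inputs:

```python
import math
import random

def lhs(ts, ws):
    # sum_i t_i * log(w_i / t_i); terms with t_i = 0 contribute 0
    return sum(t * math.log(w / t) for t, w in zip(ts, ws) if t > 0)

def rhs(ts, ws):
    # (sum_i t_i) * log(sum_i w_i / sum_i t_i)
    T, W = sum(ts), sum(ws)
    return T * math.log(W / T)

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 8)
    ts = [random.uniform(0.01, 10.0) for _ in range(n)]
    ws = [random.uniform(0.01, 10.0) for _ in range(n)]
    # log-sum inequality, in the orientation used in the footnote
    assert lhs(ts, ws) <= rhs(ts, ws) + 1e-9
    # averaged form: (1/n) * lhs <= tbar * log(wbar / tbar)
    tbar, wbar = sum(ts) / n, sum(ws) / n
    assert lhs(ts, ws) / n <= tbar * math.log(wbar / tbar) + 1e-9
```

Both assertions hold for every sampled instance; equality occurs when all the ratios \(w_i/t_i\) coincide.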
Notes
Acknowledgments
Jonathan Katz thanks Christine Evangelista, Aaron Lowe, Jordan Schneider, Lynesia Taylor, Aishwarya Thiruvengadam, and Ellen Vitercik, who explored problems related to salting and rainbow tables as part of an NSF REU program in the summer of 2014.
References
 1. Alon, N., Goldreich, O., Håstad, J., Peralta, R.: Simple constructions of almost \(k\)-wise independent random variables. Random Struct. Algorithms 3(3), 289–304 (1992)
 2. Barkan, E., Biham, E., Shamir, A.: Rigorous bounds on cryptanalytic time/memory tradeoffs. In: Dwork, C. (ed.) CRYPTO 2006. LNCS, vol. 4117, pp. 1–21. Springer, Heidelberg (2006). doi:10.1007/11818175_1
 3. Bellare, M., Ristenpart, T., Tessaro, S.: Multi-instance security and its application to password-based cryptography. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 312–329. Springer, Heidelberg (2012). doi:10.1007/978-3-642-32009-5_19
 4. Bellare, M., Rogaway, P.: Random oracles are practical: a paradigm for designing efficient protocols. In: 1st ACM Conference on Computer and Communications Security, pp. 62–73. ACM Press (1993)
 5. Bernstein, D.J., Lange, T.: Non-uniform cracks in the concrete: the power of free precomputation. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013. LNCS, vol. 8270, pp. 321–340. Springer, Heidelberg (2013). doi:10.1007/978-3-642-42045-0_17
 6. De, A., Trevisan, L., Tulsiani, M.: Time space tradeoffs for attacks against one-way functions and PRGs. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 649–665. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14623-7_35
 7. Dodis, Y., Steinberger, J.: Message authentication codes from unpredictable block ciphers. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 267–285. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03356-8_16
 8. Fiat, A., Naor, M.: Rigorous time/space tradeoffs for inverting functions. SIAM J. Comput. 29(3), 790–803 (1999)
 9. Gennaro, R., Gertner, Y., Katz, J., Trevisan, L.: Bounds on the efficiency of generic cryptographic constructions. SIAM J. Comput. 35(1), 217–246 (2005)
 10. Gennaro, R., Trevisan, L.: Lower bounds on the efficiency of generic cryptographic constructions. In: 41st Annual Symposium on Foundations of Computer Science (FOCS), pp. 305–313. IEEE (2000)
 11. Hellman, M.: A cryptanalytic time-memory trade-off. IEEE Trans. Inf. Theory 26(4), 401–406 (1980)
 12. Katz, J., Lindell, Y.: Introduction to Modern Cryptography, 2nd edn. Chapman & Hall/CRC Press (2014)
 13. Morris, R., Thompson, K.: Password security: a case history. Commun. ACM 22(11), 594–597 (1979)
 14. Oechslin, P.: Making a faster cryptanalytic time-memory trade-off. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 617–630. Springer, Heidelberg (2003). doi:10.1007/978-3-540-45146-4_36
 15. Rogaway, P.: Formalizing human ignorance. In: Nguyen, P.Q. (ed.) VIETCRYPT 2006. LNCS, vol. 4341, pp. 211–228. Springer, Heidelberg (2006). doi:10.1007/11958239_14
 16. Unruh, D.: Random oracles and auxiliary input. In: Menezes, A. (ed.) CRYPTO 2007. LNCS, vol. 4622, pp. 205–223. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74143-5_12
 17. Yao, A.C.: Theory and applications of trapdoor functions. In: 23rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 80–91. IEEE (1982)
 18. Yao, A.C.C.: Coherent functions and program checkers. In: 22nd Annual ACM Symposium on Theory of Computing (STOC), pp. 84–94. ACM Press (1990)