Refinements of the k-tree Algorithm for the Generalized Birthday Problem
Abstract
We study two open problems proposed by Wagner in his seminal work on the generalized birthday problem. First, with the use of multicollisions, we improve Wagner's k-tree algorithm, which solves the generalized birthday problem for the cases when k is not a power of two. The new k-tree only slightly outperforms Wagner's k-tree. However, in some applications this suffices, and as a proof of concept, we apply the new 3-tree algorithm to slightly reduce the security of two CAESAR proposals. Next, with the use of multiple collisions based on Hellman's table, we improve the best known time-memory tradeoffs for the k-tree. As a result, we obtain a new tradeoff curve \(T^2 \cdot M^{\lg k - 1} = k \cdot N\). For instance, when \(k=4\), the tradeoff has the form \(T^2 M = 4 \cdot N\).
Keywords
Generalized birthday problem · k-list problem · k-tree algorithm · Time-memory tradeoff
1 Introduction
Arguably, the most popular problem in private-key cryptography is the collision search problem. It appears frequently not only in its classical usage, e.g. finding collisions for hash functions, but also as an intermediate subproblem of a wider cryptographic problem. Collision search has been widely studied and is well understood. Besides this problem, and along with the search for multicollisions and multiple collisions, perhaps the next most popular is the generalized birthday problem (GBP).
The GBP is defined as follows: given k lists of random elements, choose a single element from each list such that all the chosen elements sum to a predefined value. Wagner is the first to investigate the GBP for all values of k and as an independent problem. In his seminal paper [31], he proposes an algorithm that solves the GBP for all values of k and shows a wide variety of applications ranging from blind signatures to incremental hashing, low-weight parity checks, and cryptanalysis of various hash functions.
Prior to Wagner, the GBP had mostly been studied in the context of its applications and only for a concrete number of lists (usually four lists, i.e. \(k=4\)). Schroeppel and Shamir [28] find all solutions to the 4-list problem. Bernstein [4] uses a similar algorithm to enumerate all solutions to a particular equation. Boneh, Joux, and Nguyen [10] use Schroeppel and Shamir's algorithm for solving integer knapsacks, as does Bleichenbacher [8] in his attack on DSA. Chose, Joux, and Mitton [11] use it to speed up the search for parity checks in stream cipher cryptanalysis. Joux and Lercier [19] use related ideas in point-counting algorithms for elliptic curves. Blum, Kalai, and Wasserman [9] apply it to obtain the first known subexponential algorithm for the learning parity with noise problem. Ajtai, Kumar, and Sivakumar [1] build on Blum, Kalai, and Wasserman's algorithm, using it as a subroutine to speed up the search for the shortest lattice vector.
To solve the k-list problem, Wagner proposes a so-called k-tree algorithm. In a nutshell, the k-tree is a divide-and-conquer approach that at each step operates on only two lists. The step operations are based on a simple collision search. When the k lists are composed of n-bit words, Wagner's k-tree algorithm solves the GBP in \(\mathcal {O}(k\cdot 2^{\frac{n}{\lfloor \lg k \rfloor + 1}})\) time and memory and requires lists of around \(2^{\frac{n}{\lfloor \lg k \rfloor + 1}}\) elements^{1}.
Even though the GBP has been shown to be very important for many problems in cryptography, more than a decade after its publication neither a significant improvement to the k-tree algorithm nor another dedicated algorithm has emerged. However, moderate improvements and refinements have been published. As the most important, we single out the extended k-tree algorithm by Minder and Sinclair [21], which solves the GBP when the lists have smaller sizes, and the time-memory tradeoffs by Bernstein et al. [5, 6].
Our Contribution. Wagner points out a few open problems related to the GBP and the k-tree algorithm. Two of these problems, namely, improving the efficiency of the k-tree when k is not a power of two and reducing the memory of the k-tree, are in fact the main research topics of our paper.
The k-tree algorithm discards part of the lists when k is not a power of two (note how the complexity of the k-tree takes the floor of \(\lg k\)). For instance, the 7-tree works only with 4 lists, while the remaining 3 lists are not processed. Our first improvement to the k-tree is to work with the discarded lists (we call them passive lists) by creating multicollisions from them. From each of the passive lists we create a multicollision set of values that coincide on certain l bits, where \(l<n\). Then, we produce several solutions with the k-tree from the other (active) lists, for the same l bits. Finally, the remaining \(n-l\) bits are absorbed by combining the multicollisions from the passive lists with the solutions from the active lists. The advantage of our approach over the classical k-tree is limited by the size of the multicollision sets, which in turn is bounded by the value of n. The speedup factor can be approximated as \(a\cdot n^c / \lg (b\cdot n)\), where a, b, c are constants that depend on k. The speedup is sufficient to break the \(\mathcal {O}(2^{\frac{n}{2}})\) complexity bound for the 3-list problem and to show that in applications this can matter. As an example, we show a security reduction of two authenticated encryption CAESAR [3] proposals, Deoxys [16] and KIASU [18], based on the latest results of Nandi [22]. He shows that a forgery attack for COPA-based candidates can be reduced to the 3-list problem. We apply our improved 3-tree algorithm to this problem and reduce the security bound of the candidates by 2 bits.
Our second contribution is a set of time-memory tradeoffs for the k-tree algorithm. This research topic has been investigated by Bernstein et al. Their best tradeoffs are described by the curves \(T\cdot M^{\lg k} = k\cdot N\) and \(T^2 \cdot M^{\lg k - 2} = \frac{k^2}{4} \cdot N\), where M and T are the memory and time complexity, respectively, and N is the size of the space of elements. To achieve a better tradeoff, we exploit the idea of producing multiple collisions in a memory-constrained environment with the use of Hellman's tables^{2}. It allows us to significantly reduce the memory complexity of the first level of the k-tree algorithm and to achieve better tradeoffs. As a result, we obtain the tradeoff \(T^2M^{\lg k - 1} = k\cdot N\). This translates to \(T^2M = 4\cdot N\) for \(k=4\), and \(T^2M^2 = 8\cdot N\) for \(k=8\) (cf. the \(TM^2 = 4\cdot N\) and \(TM^3 = 8\cdot N\) curves of Bernstein et al.). As illustrated further in Fig. 6, for a given amount of memory, the new tradeoff always leads to a lower time complexity than the previous tradeoffs. The improvement can be seen in the case of the generalized birthday problem for the hash function SHA-1 (with \(n=160\)) and \(k=8\). Our new tradeoff requires around \(2^{50}\) SHA-1 computations and \(2^{30}\) memory on 8 cores (with the use of van Oorschot and Wiener's parallel collision search [30]), while with the same memory, the old tradeoff needs around \(2^{65}\) SHA-1 computations.
2 The Generalized Birthday Problem
Wagner introduced the generalized birthday problem (GBP) as a multidimensional generalization of the birthday problem. The GBP is also called the k-list problem, and is formalized as follows:
Problem 1
Given k lists \(L_1,\ldots ,L_k\) of elements drawn uniformly and independently at random from \(\{0,1\}^n\), find \(x_1 \in L_1, \ldots , x_k \in L_k\) such that \(x_1\oplus x_2\oplus \ldots \oplus x_k = 0\).
Obviously, if \(|L_1| \cdot |L_2| \cdots |L_k| \ge 2^n\), then with high probability a solution to the problem exists. The real challenge, however, is to find it efficiently.
Wagner proposed the k-tree algorithm, which solves the GBP (the k-list problem) faster under the assumption that the lists are large enough. Below we describe the case \(k=4\); refer to Fig. 1. Let us define \(S \bowtie T\) as a list of elements common to both S and T, and let \(low_l(x)\) stand for the l least significant bits of x. Furthermore, let us define \(S \bowtie _l T\) as a set that contains all pairs from \(S\times T\) that agree in their l least significant bits (the xor of the least significant bits is zero). Assume \(L_1,L_2,L_3,L_4\) are four lists, each containing \(2^l\) elements (l will be defined below). First we create a list \(L_{12}\) of values \(x_1\oplus x_2\), where \(x_1\in L_1,x_2\in L_2\), such that \(low_l(x_1\oplus x_2) = 0\). Similarly, we create a list \(L_{34}\) of values \(x_3\oplus x_4\), where \(x_3\in L_3, x_4\in L_4\), such that \(low_l(x_3\oplus x_4) = 0\). Finally, we search for a collision between \(L_{12}\) and \(L_{34}\). It is easy to see that such a collision reveals the required solution, i.e. \(x_1\oplus x_2\oplus x_3\oplus x_4 = 0\).
The main advantage of the k-tree algorithm lies in the way the solution is found – at each of the two levels, only a simple collision search algorithm is used, and only a certain number of bits is made to fulfill the final goal (the xor is zero on all bits). At the first level, the lists \(L_{12},L_{34}\) contain words that have zeros in the l least significant bits, thus the xor of any two words from the lists must have zeros in these bits. At the second level, the xor of the words from the two lists will result in zeros on the remaining \(n-l\) bits, provided there are enough pairs. To get the sufficient number of pairs, the value of l is set to \(l=n/3\). Then each of \(L_{12},L_{34}\) will have \(2^{n/3}\cdot 2^{n/3}/2^{n/3} = 2^{n/3}\) words, and thus at the second level there will be \(2^{n/3}\cdot 2^{n/3}=2^{2n/3}\) possible xors, one of which will have zeros on the remaining \(n-n/3=2n/3\) bits. It is important to note that l is chosen so as to balance the complexity of the two levels. Obviously, the total memory and time complexities of the 4-tree algorithm are \(\mathcal {O}(2^{n/3})\) each.
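As an illustration, the two-level procedure can be sketched in a few lines of Python (a hedged sketch; the helper names `low`, `join`, and `four_tree` are ours, and the joins here keep only the XOR values, so a full implementation would also store the contributing pairs in order to recover \(x_1,\ldots,x_4\)):

```python
import random
from collections import defaultdict

def low(x, l):
    """The l least significant bits of x."""
    return x & ((1 << l) - 1)

def join(A, B, l):
    """S join_l T: all XORs a ^ b whose l least significant bits are zero."""
    buckets = defaultdict(list)
    for a in A:
        buckets[low(a, l)].append(a)
    return [a ^ b for b in B for a in buckets.get(low(b, l), [])]

def four_tree(L1, L2, L3, L4, n):
    """Wagner's 4-tree: zero out l = n/3 bits per level, then the rest."""
    l = n // 3
    L12 = join(L1, L2, l)      # values x1 ^ x2 with l zero LSBs
    L34 = join(L3, L4, l)      # values x3 ^ x4 with l zero LSBs
    return join(L12, L34, n)   # full n-bit match: x1^x2^x3^x4 = 0

# Small demo: four lists of size 2^l over n = 24 bits
random.seed(1)
n = 24
lists = [[random.getrandbits(n) for _ in range(1 << (n // 3))] for _ in range(4)]
solutions = four_tree(*lists, n)   # each surviving entry is the zero XOR of a solution
```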
The very same idea is used to tackle any k-list problem where k is a power of two. The only difference is in the choice of l and in the number of levels. In general, the number of levels equals \(\lg k\), and at each level except the final one, an additional l bits are set to zero. At the final level, the remaining 2l bits are zeroed. Hence, \(l \cdot \lg k + l = n\), and thus \(l=n/(\lg k + 1)\). The algorithm works in \(\mathcal {O}(k\cdot 2^{\frac{n}{\lg k +1}})\) time and memory and requires lists of size \(2^{\frac{n}{\lg k + 1}}\). As an example, let us focus on the 8-list problem, i.e. we have lists \(L_1,\ldots , L_8\), \(\lg 8 = 3\) levels, and \(l=n/4\). At the first level we build \(L_{12},L_{34}, L_{56}, L_{78}\) by combining two lists \(L_i,L_j\), each with \(2^{l}=2^{n/4}\) elements that have zeros in the \(n/4\) least significant bits. At the second level, we build \(L_{1234}\) and \(L_{5678}\), which again have \(2^{n/4}\) elements with zeros in the next \(n/4\) bits. Finally, at the third level, we find the solution, which will have zeros on the remaining \(n/2\) bits.
Wagner's algorithm works similarly when k is not a power of two. The trick is to make some lists passive, i.e. to choose one element from each of the passive lists, and then continue with the algorithm as in the power-of-two case with the remaining lists. For instance, to solve the 6-list problem for lists \(L_1,\ldots ,L_6\), we take any elements \(v_5 \in L_5\) and \(v_6\in L_6\), and then solve the 4-list problem \(x_1\oplus x_2 \oplus x_3\oplus x_4 = v_5 \oplus v_6\) for the lists \(L_1,\ldots ,L_4\). We can easily remove the non-zero condition \(v_5\oplus v_6\) on the right side by adding this value to each element of the list \(L_1\). Hence, the complexity of the k-list problem when k is not a power of two equals the complexity of the closest smaller power-of-two case. Thus, for any value of k, the k-tree algorithm works in \(\mathcal {O}(k\cdot 2^{\frac{n}{\lfloor \lg k \rfloor + 1}})\) time and memory.
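A minimal sketch of this reduction, with a brute-force stand-in for the 4-tree (the helper names are ours and the brute force is for demonstration only; in practice the shifted 4-list instance would be handed to the 4-tree algorithm):

```python
from itertools import product

def solve_4list_bruteforce(lists):
    """Tiny brute-force stand-in for the 4-tree (demonstration only)."""
    for x1, x2, x3, x4 in product(*lists):
        if x1 ^ x2 ^ x3 ^ x4 == 0:
            return x1, x2, x3, x4
    return None

def six_list_via_passive(L1, L2, L3, L4, L5, L6):
    """Reduce the 6-list problem to a 4-list problem: fix one element of
    each passive list and fold their XOR into every element of L1."""
    v5, v6 = L5[0], L6[0]
    c = v5 ^ v6
    sol = solve_4list_bruteforce([[x ^ c for x in L1], L2, L3, L4])
    if sol is None:
        return None
    x1s, x2, x3, x4 = sol
    return x1s ^ c, x2, x3, x4, v5, v6   # undo the shift on x1

# Planted example: 31 ^ 1 ^ 2 ^ 4 ^ 8 ^ 16 == 0
sol = six_list_via_passive([31], [1], [2], [4], [8], [16])
```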
3 Improved Algorithm for the 3-List Problem
We focus on the 3-list problem and show how to improve the complexity of Wagner's 3-tree algorithm. Our improvement is based on the idea of multicollisions. The technique mimics the approach developed by Nikolić et al. [24] and further generalized by Dinur et al. [12]. We exploit the k-tree algorithm, but we also work with the passive lists and make them more active. Namely, instead of simply taking one element from the passive lists, we find in them partial multicollisions – sets of words that share the same value on particular bits. We then force the active lists on these bits to have a specific value (which is the xor of all the values of the partial multicollisions), and at the final step, merge the results of the active and passive lists to obtain zero on the remaining bits. Let us take a closer look at this idea.
Definition 1
The set of n-bit words \(S=\{x_1,\ldots ,x_p\}\) forms a p-partial multicollision on the s least significant bits, if \(low_s(x_1) = low_s(x_2)=\ldots =low_s(x_p)\).
This is to say that all p words are equal on the s least significant bits. Note, the choice to work with the least significant bits is not crucial but is introduced to simplify the presentation. Given an arbitrary set, we can create a p-partial multicollision from it, i.e. we can find a subset that is a p-partial multicollision. The maximal value of p depends on the size of the initial set and will be analyzed later in the section.
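Such a subset can be extracted with a single bucketing pass over the list; a hedged sketch (the function name is ours):

```python
import random
from collections import defaultdict

def partial_multicollision(values, s):
    """Return the largest subset of `values` agreeing on the s least
    significant bits -- a p-partial multicollision, with p = len(result)."""
    mask = (1 << s) - 1
    buckets = defaultdict(list)
    for x in values:
        buckets[x & mask].append(x)
    return max(buckets.values(), key=len)

# Demo: from 2^10 random 16-bit words, extract a multicollision on s = 8 bits
random.seed(0)
words = [random.getrandbits(16) for _ in range(1 << 10)]
S = partial_multicollision(words, 8)
p = len(S)   # size of the multicollision set
```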
The complexity of our algorithm depends on the complexity of the two join operators and of producing multicollisions. The join operators (which are indeed simple collision search algorithms) work in \(\mathcal {O}(2^l)\), as in each of the cases the sizes of the lists are not larger than \(2^l\). Furthermore, the partial multicollisions from \(L_3\) (of size \(2^l\)) can be produced in \(\mathcal {O}(2^l)\) time and memory^{3}. Hence the multicollision technique solves the 3-list problem in \(\mathcal {O}(2^l)\) time and memory.
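Putting the pieces together, the strategy described above can be sketched as follows (a hedged illustration with our own helper names; the multicollision set S from the passive list gives each pair from the join of \(L_1\) and \(L_2\) p chances to cancel the remaining bits):

```python
from collections import defaultdict

def low(x, s):
    """The s least significant bits of x."""
    return x & ((1 << s) - 1)

def improved_3tree(L1, L2, L3, s):
    """Sketch of the multicollision-based 3-tree: (1) extract the largest
    s-bit multicollision S from the passive list L3; (2) join L1 and L2 so
    that low_s(x1 ^ x2) equals the common value v of S; (3) each surviving
    pair gets |S| = p chances to cancel the remaining bits."""
    buckets = defaultdict(list)
    for x3 in L3:
        buckets[low(x3, s)].append(x3)
    v = max(buckets, key=lambda key: len(buckets[key]))
    S = buckets[v]                         # p-partial multicollision

    idx = defaultdict(list)
    for x1 in L1:
        idx[low(x1, s) ^ v].append(x1)     # want low_s(x1 ^ x2) == v
    for x2 in L2:
        for x1 in idx.get(low(x2, s), []):
            for x3 in S:                   # p chances per pair
                if x1 ^ x2 ^ x3 == 0:
                    return x1, x2, x3
    return None

# Planted example on n = 16, s = 8: 0x5678 ^ 0x474C ^ 0x1134 == 0
sol = improved_3tree([0x5678], [0x474C], [0x1134, 0x2234, 0x4334], s=8)
```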
Table 1. Experimental search of the number of multicollisions.

l  Average size  \(\frac{l}{\ln l}\)  c

10  5.80  4.34  1.34 
11  5.85  4.59  1.27 
12  6.10  4.83  1.26 
13  6.45  5.07  1.27 
14  7.00  5.30  1.32 
15  7.15  5.54  1.29 
16  7.55  5.77  1.31 
17  7.90  6.00  1.32 
18  8.15  6.23  1.31 
19  8.50  6.45  1.32 
20  8.70  6.68  1.30 
21  9.05  6.89  1.31 
22  9.50  7.12  1.33 
23  9.65  7.34  1.31 
24  10.30  7.55  1.36 
25  10.40  7.77  1.34 
26  10.60  7.98  1.33 
27  11.05  8.19  1.35 
28  11.15  8.40  1.33 
Table 2. A comparison of the time complexities of Wagner's 3-tree with our new approach.
n  Speedup (\(\sqrt{p}\))  l 

64  3.43  31 
128  4.42  62 
256  5.82  126 
512  7.71  253 
The above strategy is in line with the multicollision approach of Nikolić et al. used in the analysis of the lightweight block cipher LED [14]. The more advanced approach of Dinur et al., however, cannot be used for further improvements. One of their main ideas is to work simultaneously with several multicollisions instead of only one. In the case of the k-tree algorithm, this would mean producing from \(L_3\) several, say s, p-partial multicollision sets. However, each such set will collide on a different value of the l LSBs, i.e. the elements of the first p-partial multicollision set will have the value \(v_1\) on the l LSBs, the elements of the second set will have the value \(v_2\), etc. The different values will increase the complexity of the later stage of the k-tree by a factor of s. When using the join operator on l bits of \(L_1\) and \(L_2\), there will be s targets (whereas previously we had only one), thus a simple collision search will have to be repeated s times. Therefore, in this particular case, the approach of Dinur et al. cannot be used.
Applications. The improvement of the 3-tree algorithm can be used for cryptanalysis of authenticated encryption schemes proposed to the ongoing CAESAR competition [3]. Some of these schemes use a construction called XLS, proposed by Ristenpart and Rogaway [27], to process the final incomplete blocks of messages. Initially, XLS was proven to be secure; however, Nandi [22] points out flaws in the security proof and shows a very simple attack that requires three queries to break the construction. The CAESAR candidates that rely on XLS do not allow this trivial attack, as the required decryption queries are not permitted by the schemes. To cope with this limitation, Nandi proposes another attack [23] that requires only encryption queries. He is able to reduce the design flaw of XLS to the 3-list problem. Therefore, Nandi can attack schemes that claim birthday-bound query complexity because with only \(2^{\frac{n}{3}}\) queries (equivalent to the size of the lists in the 3-list problem), he can find a solution to the 3-list problem (in \(2^{\frac{2n}{3}}\) time). However, Nandi cannot break the schemes that claim birthday-bound time complexity, as he cannot solve the 3-list problem faster than \(2^{\frac{n}{2}}\). Note, Nandi constructs the 3-list problem from only \(2^{\frac{n}{3}}\) queries, rather than from \(3 \cdot 2^{\frac{n}{3}}\), as the elements of all three lists depend on the same \(2^{\frac{n}{3}}\) ciphertexts.
The CAESAR schemes based on XLS, such as Deoxys [16], Joltik [17], KIASU [18], Marble [13], and SHELL [32], claim only birthday-bound time complexity, thus Nandi's findings do not break the security claims of these candidates. However, our improved 3-tree algorithm goes below the birthday bound and thus can be used to show a slight weakness in some of these candidates.
Let us focus on the 128-bit CAESAR candidates Deoxys and KIASU. The 3-list problem for XLS in these candidates has the parameter \(n=128\). According to Table 2, we can take \(\sqrt{p}=4.42\) and \(l=62\). Consequently, the complexity of a forgery based on the XLS weakness is \(C \cdot 2^{62}\), where C is a constant factor. The value of C is 1 because: 1) as mentioned above, the 3 lists can be produced from the same \(2^{62}\) ciphertexts, and 2) all of the operations required by the improved 3-tree algorithm are significantly less expensive than one encryption of the analyzed schemes. As a result, we obtain a forgery on the COPA modes of Deoxys and KIASU in \(2^{62}\) encryptions, and thus the security level of these schemes is reduced by 2 bits from the claimed 64 bits.
4 Improved Time-Memory Tradeoffs
In applications, the elements of the lists \(L_i\) are usually outputs of functions \(f_i\); thus the GBP is often formulated as:
Problem 2
Given non-invertible functions \(f_1,\ldots ,f_k:\{0,1\}^{n'}\rightarrow \{0,1\}^n\), \(n'\ge n\), find \(y_1, \ldots , y_k \in \{0,1\}^{n'}\) such that \(f_1(y_1)\oplus f_2(y_2)\oplus \ldots \oplus f_k(y_k) = 0\).
In some applications, all the functions are identical, and the problem is to find distinct inputs:
Problem 3
Given a non-invertible function \(f:\{0,1\}^{n'}\rightarrow \{0,1\}^n, n'\ge n\), find distinct \(y_1, \ldots , y_k \in \{0,1\}^{n'}\) such that \(f(y_1)\oplus f(y_2)\oplus \ldots \oplus f(y_k) = 0\).
Both of the above problems give rise to the possibility of time-memory tradeoffs, i.e., reducing the memory complexity of the k-tree algorithm at the expense of time. We will investigate time-memory tradeoffs for the GBP as defined in Problem 3. Recall that the k-tree in its current form assumes that time and memory are of equal magnitude, i.e. \(T=M=\mathcal{O}(k\cdot 2^{\frac{n}{\lg k + 1}})\).
The k-tree relies on producing multiple collisions. For instance, at the first level of the 4-tree, \(2^{\frac{n}{3}}\) colliding pairs on \(\frac{n}{3}\) bits are produced. Producing these pairs is trivial when the amount of available memory is \(2^{\frac{n}{3}}\). However, once the memory is reduced to \(2^m\), \(m < \frac{n}{3}\), the trivial collision search does not work.
The fact that the k-tree requires multiple collisions opens the door to the following technique based on Hellman's tables [15]^{5}.
Fact 1
(Hellman's table) Let \(f:\{0,1\}^*\rightarrow \{0,1\}^n\) be an arbitrary-size input and n-bit output function, \(N=2^n\), and let \(M=2^m\) be the amount of available memory. Once a precomputation equivalent to MX evaluations of f is performed, the cost of generating new collisions for f is \(\frac{N}{MX}\) per collision.
Joux and Lucks [20] use this technique to produce 3-collisions. They set \(M=X=2^{\frac{n}{3}}\) to generate \(2^{\frac{n}{3}}\) ordinary collisions with time \(T=2^{\frac{2n}{3}}\) and memory \(M=2^{\frac{n}{3}}\). Then, they find another collision between the \(2^{\frac{n}{3}}\) ordinary collisions and \(2^{\frac{2n}{3}}\) single values. When generating the \(2^{\frac{n}{3}}\) ordinary collisions, Hellman's table plays the important role of keeping the memory at M rather than MX.
Below, we will use Hellman's table to produce multiple collisions for the first level of the k-tree, but only on certain l bits (where \(l<n\)).
4.1 Improved Time-Memory Tradeoffs for the 4-List Problem
We present a more efficient time-memory tradeoff for the GBP. Our tradeoff curve depends on the number of available lists, which is parameterized by k. For a better understanding, we first explain our algorithm for \(k=4\).
The original 4-tree algorithm consists of two levels of collision searches (the parameter l used below will be determined later).

Level 1. Construct two lists, \(L_{12}\) and \(L_{34}\), each containing \(2^{\frac{n-l}{2}}\) partial collisions on l bits.

Level 2. Find a collision between the elements of \(L_{12}\) and \(L_{34}\) on the remaining \(n-l\) bits.
Our new 4-tree algorithm works similarly, with the exception of Level 1. At this level, we first construct Hellman's table, and then we use it to find \(2^{\frac{n-l}{2}}\) collisions. As a result, our algorithm decomposes Level 1 into two parts. Its complexity depends on the available memory M, which in turn determines the length X of the chains. The updated 4-tree is illustrated in Fig. 5 and is specified as follows.

Level 1a. Construct Hellman's table containing M chains, each of length X.

Level 1b. With the use of Hellman's table, find \(2 \cdot 2^{\frac{n-l}{2}}\) partial collisions on l bits. Store one half (\(2^{\frac{n-l}{2}}\)) of them in a list \(L_{12}\) and the other half in \(L_{34}\).

Level 2. Find a collision between the elements of \(L_{12}\) and \(L_{34}\) on the remaining \(n-l\) bits.
Construction of Hellman's Table. For Level 1a, our algorithm first constructs Hellman's table, which contains M chains of length X. However, unlike in [20], we face the following technical obstacle. The function f takes an n-bit input and produces an n-bit output, and for such a function only full n-bit collisions can be identified. In other words, the classical Hellman's table cannot be used to find partial collisions.
To solve this problem, we define a reduction function \(f_l:\{0,1\}^l \rightarrow \{0,1\}^l\) so that only l bits are meaningful in the chain. To generate chains with \(f_l\), \(n-l\) zero bits are concatenated to the l-bit input, and this value is processed with \(f:\{0,1\}^n \rightarrow \{0,1\}^n\). Finally, the n-bit output is truncated to l bits and used as the input to the next chain step. That is, \(f_l(x) = Trunc_l\bigl (f(0^{n-l}\Vert x)\bigr )\), where \(Trunc_l(\cdot )\) truncates the n-bit input to the l least significant bits.
To summarize, we choose M distinct l-bit values \(v_i^0\) for \(i = 1,2,\ldots ,M\), and for each of them generate a chain of length X by computing \(v_i^{j+1}=f_l(v_i^j)\) for \(j=0,1,\ldots ,X-1\). In total, the chains contain MX values, and only the first and the last point of each chain are stored in \(T_{pre}\). Thus Hellman's table requires around MX time and M memory.
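A hedged sketch of the table construction (the stand-in function `f`, instantiated here with a truncated hash, and the helper names are ours):

```python
import hashlib

def f(x, n):
    """Stand-in for the one-way function f: {0,1}^n -> {0,1}^n."""
    h = hashlib.sha256(x.to_bytes((n + 7) // 8, "big")).digest()
    return int.from_bytes(h, "big") & ((1 << n) - 1)

def f_l(x, n, l):
    """Reduction function: f_l(x) = Trunc_l(f(0^{n-l} || x)).
    Since x < 2^l, the integer x already encodes the zero-padded input."""
    return f(x, n) & ((1 << l) - 1)

def build_table(M, X, n, l):
    """M chains of length X over f_l; store only (endpoint -> start point)."""
    table = {}
    for start in range(M):       # M distinct l-bit starting values
        v = start
        for _ in range(X):
            v = f_l(v, n, l)
        table[v] = start         # the endpoint identifies the whole chain
    return table

T_pre = build_table(M=16, X=8, n=32, l=20)
```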
Generation of l-bit Collisions. According to Fact 1, once Hellman's table \(T_{pre}\) is constructed, the complexity of producing l-bit collisions is reduced significantly. Considering that the size of the values in the chains is l bits and the length of each chain is X, Fact 1 shows that the cost is \(\frac{2^l}{MX}\) per collision.
To generate an l-bit collision, we choose a random l-bit value and, with the function \(f_l\), compute from it a chain of length \(\frac{2^l}{MX}+X\). On average, one collision against the MX values covered by the chains of \(T_{pre}\) will occur among the first \(\frac{2^l}{MX}\) values of this new chain. The computation of the additional X values in the chain ensures that the chain will reach one of the ending points stored in \(T_{pre}\). The exact colliding pair is detected by recomputing the chains from \(v_i^0\) and from the chosen l-bit value.
From the definition of \(f_l\), the two inputs colliding on f always have the form \((0^{n-l}\Vert l_1, 0^{n-l}\Vert l_2)\), where \(0^{n-l}\) is a sequence of \(n-l\) zero bits and \(l_1\) and \(l_2\) are some l-bit values. A collision of the two chains means that \(Trunc_l\bigl (f(0^{n-l}\Vert l_1)\bigr ) = Trunc_l\bigl (f(0^{n-l}\Vert l_2)\bigr )\). Therefore, \(f(0^{n-l}\Vert l_1)\) and \(f(0^{n-l}\Vert l_2)\) collide only in the l least significant bits, while behaving randomly on the remaining \(n-l\) bits.
The collision generation process is iterated \(2^{\frac{n-l}{2}}\) times, and the inputs and outputs of the pairs are stored in \(L_{12}\). Similarly, the process is iterated an additional \(2^{\frac{n-l}{2}}\) times and the results are stored in \(L_{34}\). Therefore the complexity of this step is around \(2\cdot 2^{\frac{n-l}{2}} \cdot \frac{2^l}{MX}=2\cdot \frac{N^{\frac{1}{2}}\cdot 2^{\frac{l}{2}}}{MX}\) time and \(2 \cdot 2^{\frac{n-l}{2}} = 2 \cdot \frac{N^{\frac{1}{2}}}{2^{\frac{l}{2}}}\) memory.
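The chain walk and the recomputation step can be sketched as follows (a demo-sized, hedged illustration; `f_l` is again instantiated with a truncated hash, and all helper names are ours):

```python
import hashlib

L = 12          # l = 12 bits, demo-sized
M, X = 16, 8

def f_l(x):
    """Demo reduction function on l-bit values (SHA-256, truncated)."""
    h = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(h, "big") & ((1 << L) - 1)

def chain(start, length):
    """The chain [start, f_l(start), f_l(f_l(start)), ...]."""
    vals = [start]
    for _ in range(length):
        vals.append(f_l(vals[-1]))
    return vals

# Hellman's table: endpoint -> start point, M chains of length X
table = {chain(s, X)[-1]: s for s in range(M)}

def find_l_bit_collision(fresh_start, limit=4096):
    """Walk a fresh chain; once it hits a stored endpoint it has merged
    with a precomputed chain. Re-walk both chains and return distinct
    inputs a != b with f_l(a) == f_l(b), or None on a false alarm."""
    trail = [fresh_start]
    for _ in range(limit):
        trail.append(f_l(trail[-1]))
        if trail[-1] in table:
            stored = chain(table[trail[-1]], X)
            pos = {x: i for i, x in enumerate(stored)}
            for i in range(1, len(trail)):
                j = pos.get(trail[i])
                if j is not None and j > 0 and trail[i - 1] != stored[j - 1]:
                    return trail[i - 1], stored[j - 1]
            return None          # fresh chain started on the stored chain
    return None
```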
Finding a Solution to the 4-list Problem. From the two lists \(L_{12}\) and \(L_{34}\), each containing \(2^{\frac{n-l}{2}}\) partial collisions on l bits, we find a collision on the remaining \(n-l\) bits. This procedure is straightforward and requires \(2^{\frac{n-l}{2}} = \frac{N^{\frac{1}{2}}}{2^{\frac{l}{2}}}\) time and no additional memory.
During the analysis, we implicitly relied on several facts. First, we assumed that Hellman's table can contain an arbitrary number of points. In order to avoid collisions between the chains, however, the values of M and X cannot be arbitrary, but should depend on l. That is, during the construction of Hellman's table, the number of chains and their length are bounded by the value of l. Biryukov and Shamir [7] call this the matrix stopping rule and define it as \(MX^2 \le 2^l\). It is easy to see that this inequality holds in our case, as \(MX^2 = M \cdot \frac{4N}{M^3} = \frac{4N}{M^2} = \frac{4N}{ (2N^{\frac{1}{2}}/2^{\frac{l}{2}})^2 } = 2^l\). For instance, when \(M=2^{\frac{n}{4}}\), then \(l\approx \frac{n}{2}\), \(T=2^{\frac{3n}{8}}\), \(X=2^{\frac{n}{8}}\), and indeed \(MX^2 = 2^{\frac{n}{2}} = 2^l\). We also assumed that the tradeoff applies only to Problem 3. However, a close inspection of our algorithm reveals that it can be applied to the case of pairwise identical functions, i.e., \(f_1=f_2, f_3=f_4\). That is, the area of application of the tradeoff is wider, and is similar to that of the tradeoff given by Bernstein et al. in (11). To deal with the extended case, we have to create two Hellman tables at the initial stage, one for each pair of functions. Thus the time and memory complexities will increase by a factor of two at Level 1a, and will stay the same at Levels 1b and 2.
4.2 Improved Time-Memory Tradeoff for the k-list Problem
In this section, we generalize the time-memory tradeoff to the k-tree algorithm, where \(k = 2^d\). Overall, we replace the collision generation at Level 1 of the k-tree algorithm with a generation based on Hellman's table. Hereafter, we call the bits whose sum is fixed to zero clamped bits.
The ordinary k-tree algorithm initially starts from \(2^d\) lists containing \(M=2^m\) elements. At Level 1, \(2^{d-1}\) lists containing M elements are generated with m bits clamped. At Level i, for \(i=2,3,\ldots ,d-1\), \(2^{d-i}\) lists containing M elements are generated with im bits clamped. At the last Level d there are two lists containing M elements with \((d-1)m\) bits clamped. As M collisions are no longer required, but rather only one, the sum on up to \((d+1)m\) bits can be made 0 by setting \((d+1)m=n\), and thus the k-tree algorithm will find the solution to the k-list problem. However, if the memory size is restricted, i.e. \(m \ll \frac{n}{d+1}\), the k-tree algorithm can enforce the sum of only \((d+1)m\) bits to zero.
A comparison of the number of clamped bits between the k-tree and our algorithm.

Level  #Lists  #Clamped bits (k-tree algorithm)  #Clamped bits (our algorithm)
Level 1  \(2^{d-1}\)  m  l
Level i, \((i=2,\ldots ,d-1)\)  \(2^{d-i}\)  im  \(l+(i-1)m\)
Level d  1  \((d+1)m\)  \(l+dm\)
From the condition \(l=n-dm\) and the parameters k and m, we can determine the reduction function \(f_l\) for Hellman's table. We create M chains of length X, and store only the first and last values of the chains in Hellman's table \(T_{pre}\). Once \(T_{pre}\) is constructed, we can find an l-bit partial collision at a cost of \(\frac{2^l}{MX}\) per collision, which is equivalent to \(\frac{N}{M^{d+1}X}\). At Level 1, we produce in total \(2^{d-1} \cdot M\) l-bit collisions and store them in \(2^{d-1}\) lists, each with M elements. The total cost of producing the partial collisions, and thus the complexity of Level 1, is \(2^{d-1} \cdot \frac{N}{M^dX}\).
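For intuition, and ignoring constant factors, the Level 1 cost together with the matrix stopping rule already yields the tradeoff curve; a sketch of the derivation (with \(d = \lg k\)):

```latex
\begin{aligned}
\frac{2^l}{MX} &= \frac{MX^2}{MX} = X
  && \text{(cost per collision, using the stopping rule } MX^2 = 2^l\text{)},\\
T &\approx 2^{d-1} M \cdot X
  && \text{(Level 1 dominates: } 2^{d-1}M \text{ collisions)},\\
2^l &= \frac{N}{M^{d}} \;\Longrightarrow\; X = \frac{N^{1/2}}{M^{(d+1)/2}}
  && \text{(from } l = n - dm\text{)},\\
T^2 M^{\,d-1} &\approx 4^{\,d-1}\, N,
\end{aligned}
```

which, up to constant factors depending only on k, is the curve \(T^2 M^{\lg k - 1} = k \cdot N\).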
Comparison of tradeoffs. For simplicity, constant multiplicative factors in front of N are ignored.

Method  M  T  Other parameters
\(k=4\)
Bernstein et al. Eq. (9) \((T \cdot M^2 = N)\)  \(2^{\frac{n}{4}}\)  \(2^{\frac{n}{2}}\)
  \(2^{\frac{n}{6}}\)  \(2^{\frac{2n}{3}}\)
Ours \((T^2 \cdot M = N)\)  \(2^{\frac{n}{4}}\)  \(2^{\frac{3n}{8}}\)  \(X=2^{\frac{n}{8}},l=\frac{n}{2}\)
  \(2^{\frac{n}{6}}\)  \(2^{\frac{5n}{12}}\)  \(X=2^{\frac{n}{4}},l=\frac{2n}{3}\)
\(k=8\)
Bernstein et al. Eq. (9) \((T \cdot M^3 = N)\)  \(2^{\frac{n}{5}}\)  \(2^{\frac{2n}{5}}\)
  \(2^{\frac{n}{6}}\)  \(2^{\frac{n}{2}}\)
Bernstein et al. Eq. (11) \((T^2\cdot M= N)\)  \(2^{\frac{n}{5}}\)  \(2^{\frac{2n}{5}}\)
  \(2^{\frac{n}{6}}\)  \(2^{\frac{5n}{12}}\)
Ours \((T^2 \cdot M^2 = N)\)  \(2^{\frac{n}{5}}\)  \(2^{\frac{3n}{10}}\)  \(X=2^{\frac{n}{10}},l=\frac{2n}{5}\)
  \(2^{\frac{n}{6}}\)  \(2^{\frac{n}{3}}\)  \(X=2^{\frac{n}{6}},l=\frac{n}{2}\)
The previous curve given in (9) achieves the same time complexity as the k-tree algorithm when sufficient memory is available, while the time complexity approaches \(2^n\) when the available amount of memory is very limited. The previous curve given in (11) cannot reach the time complexity of the k-tree algorithm even when sufficient memory is available, while the time complexity is at most \(2^{\frac{n}{2}}\) for a very limited amount of memory. It is easy to see that our tradeoff combines the advantages of those two curves, i.e. it has the same complexity as the k-tree algorithm when sufficient memory is available and requires only \(2^{\frac{n}{2}}\) time complexity when the available amount of memory is limited. Therefore, our tradeoff always allows a lower time complexity than both of the previous tradeoffs. It improves the time complexity and simplifies the situation, as it is the best for any value of m (unlike the previous two tradeoffs, which outperformed each other for different values of m).
5 Conclusion
We have shown improvements to Wagner's k-tree algorithm for the case when k is not a power of two, and for the case when the available memory is restricted. For the former case, our findings indicate that the passive lists can be used to reduce the complexity of the k-tree (in the case of the 3-tree, by a factor of \(\sqrt{p}\)). Rather than discarding the passive lists, we have produced multicollision sets from them, and later, we have used the sets to decrease the size and thus the complexity of the k-tree algorithm. In the case of a memory-restricted k-list problem, we have provided a new time-memory tradeoff based on the idea of Hellman's table. The precomputed table has allowed us to efficiently produce a large number of collisions at the very first level of the k-tree algorithm, and thus to reduce the memory requirement of the whole algorithm. As a result, we have achieved an improved tradeoff that follows the curve \(T^2 M^{\lg k - 1} = k \cdot N\).
We point out that we have run a series of experiments to confirm parts of the analysis. In particular, we have verified the predicted number of multicollisions, and we have fully implemented the tradeoff for \(k=4\), \(n=60\) and various sizes of available memory, e.g. \(m=8,10,14\). The outcome of the experiments has confirmed the tradeoff.
The 3-list problem appears frequently in the literature, and as our improved 3-tree algorithm is the first to solve this problem with complexity below the birthday bound, we expect future applications of the algorithm. However, although our improved 3-tree asymptotically outperforms Wagner’s 3-tree algorithm, the speed-up factor is lower for smaller values of n. Thus we urge careful analysis when applying the improved 3-tree.
Bernstein [5] argues that the large memory requirement of Wagner’s k-tree algorithm makes it impractical. He assumes that memory access is far more expensive than computation, and thus that the actual cost of the algorithm is miscalculated. He introduces tradeoffs (discussed in Sect. 4) to reduce the memory requirement and to obtain algorithms of lower complexity (measured by the new metric). We note that as our tradeoffs are more memory-efficient, by the new metric they lead to better algorithms for the k-list problem with pairwise identical functions.
There are several future research directions. One is to consider restrictions on the amount of available data. The functions \(f_i\) in the k-list problem are often assumed to be public, i.e. the attacker can evaluate them offline. When the \(f_i\) are not public, the data needs to be collected by making online queries; developing new time-memory-data tradeoffs for this scenario is thus an interesting open problem. Another direction is to take into account the weight of each function in the total cost of the algorithm, which leads to the case of an unbalanced GBP: in specific applications, some of the functions may be more costly to compute than others. The algorithm that solves an unbalanced GBP will differ from the one for the balanced GBP.
Footnotes
 1.
Note that we use \(\lg\) for \(\log_2\).
 2.
Joux and Lucks [20] use this technique to generate multiple collisions, which later lead to multicollisions.
 3.
The idea is to initialize a counter for each possible value of the colliding bits; then, for each \(x_3\in L_3\), increase the counter \(low_l(x_3)\). After all elements have been processed, the counter with the highest value corresponds to the largest multicollision set.
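A minimal sketch of this counting procedure (list contents and the bit-length are toy values of our choosing):

```python
from collections import Counter

def largest_multicollision(L3, l):
    """Find the value of the l low bits shared by the most elements of L3.
    Returns (low-bits value, elements in that multicollision set)."""
    mask = (1 << l) - 1
    counts = Counter(x & mask for x in L3)        # one counter per low-bits value
    best, _ = counts.most_common(1)[0]            # highest counter wins
    return best, [x for x in L3 if (x & mask) == best]

# Toy example: 4 low bits; four of the six elements end in the nibble 0x3.
L3 = [0x13, 0x23, 0x33, 0x47, 0x58, 0x63]
low, members = largest_multicollision(L3, 4)
print(hex(low), [hex(x) for x in members])  # 0x3 and the four elements ending in 3
```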
 4.
It is pointed out in [6] that \(n-\bar{n}\) bits can have an arbitrary value as long as the sum of all lists is zero. The technique is called clamping through precomputation.
 5.
Note that we could not exploit the more advanced techniques of Rivest’s distinguished points and Oechslin’s rainbow tables [25] to improve the analysis.
Acknowledgements
The authors would like to thank the anonymous reviewers of ASIACRYPT’15 for very helpful comments and suggestions (in particular, for pointing out the balls-into-bins problem). Ivica Nikolić is supported by the Singapore National Research Foundation Fellowship 2012 (NRF-NRFF2012-06).
References
 1. Ajtai, M., Kumar, R., Sivakumar, D.: A sieve algorithm for the shortest lattice vector problem. In: Vitter, J.S., Spirakis, P.G., Yannakakis, M. (eds.) Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, 6–8 July 2001, pp. 601–610. ACM, Heraklion (2001)
 2. Bellare, M., Micciancio, D.: A new paradigm for collision-free hashing: incrementality at reduced cost. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 163–192. Springer, Heidelberg (1997)
 3. Bernstein, D.: CAESAR Competition (2013). http://competitions.cr.yp.to/caesar.html
 4. Bernstein, D.J.: Enumerating solutions to p(a) + q(b) = r(c) + s(d). Math. Comput. 70(233), 389–394 (2001)
 5. Bernstein, D.J.: Better price-performance ratios for generalized birthday attacks. In: Workshop Record of SHARCS 2007: Special-purpose Hardware for Attacking Cryptographic Systems (2007). http://cr.yp.to/rumba20/genbday20070719.pdf
 6. Bernstein, D.J., Lange, T., Niederhagen, R., Peters, C., Schwabe, P.: FSBday. In: Roy, B., Sendrier, N. (eds.) INDOCRYPT 2009. LNCS, vol. 5922, pp. 18–38. Springer, Heidelberg (2009)
 7. Biryukov, A., Shamir, A.: Cryptanalytic time/memory/data tradeoffs for stream ciphers. In: Okamoto, T. (ed.) ASIACRYPT 2000. LNCS, vol. 1976, pp. 1–13. Springer, Heidelberg (2000)
 8. Bleichenbacher, D.: On the generation of DSA one-time keys. In: The 6th Workshop on Elliptic Curve Cryptography (ECC 2002) (2002)
 9. Blum, A., Kalai, A., Wasserman, H.: Noise-tolerant learning, the parity problem, and the statistical query model. In: Yao, F.F., Luks, E.M. (eds.) Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, 21–23 May 2000, pp. 435–440. ACM, Portland (2000)
 10. Boneh, D., Joux, A., Nguyen, P.Q.: Why textbook ElGamal and RSA encryption are insecure. In: Okamoto, T. (ed.) ASIACRYPT 2000. LNCS, vol. 1976, pp. 30–43. Springer, Heidelberg (2000)
 11. Chose, P., Joux, A., Mitton, M.: Fast correlation attacks: an algorithmic point of view. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, p. 209. Springer, Heidelberg (2002)
 12. Dinur, I., Dunkelman, O., Keller, N., Shamir, A.: Key recovery attacks on 3-round Even-Mansour, 8-step LED-128, and full AES2. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013, Part I. LNCS, vol. 8269, pp. 337–356. Springer, Heidelberg (2013)
 13. Guo, J.: Marble v1. Submitted to CAESAR (2014)
 14. Guo, J., Peyrin, T., Poschmann, A., Robshaw, M.: The LED block cipher. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 326–341. Springer, Heidelberg (2011)
 15. Hellman, M.E.: A cryptanalytic time-memory trade-off. IEEE Trans. Inf. Theory 26(4), 401–406 (1980)
 16. Jean, J., Nikolić, I., Peyrin, T.: Deoxys v1. Submitted to CAESAR (2014)
 17. Jean, J., Nikolić, I., Peyrin, T.: Joltik v1. Submitted to CAESAR (2014)
 18. Jean, J., Nikolić, I., Peyrin, T.: KIASU v1. Submitted to CAESAR (2014)
 19. Joux, A., Lercier, R.: “Chinese and Match”, an alternative to Atkin’s “Match and Sort” method used in the SEA algorithm. Math. Comput. 70(234), 827–836 (2001)
 20. Joux, A., Lucks, S.: Improved generic algorithms for 3-collisions. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 347–363. Springer, Heidelberg (2009)
 21. Minder, L., Sinclair, A.: The extended k-tree algorithm. J. Cryptol. 25(2), 349–382 (2012)
 22. Nandi, M.: XLS is not a strong pseudorandom permutation. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8873, pp. 478–490. Springer, Heidelberg (2014)
 23. Nandi, M.: Revisiting security claims of XLS and COPA. Cryptology ePrint Archive, Report 2015/444 (2015). http://eprint.iacr.org/
 24. Nikolić, I., Wang, L., Wu, S.: Cryptanalysis of round-reduced LED. In: Moriai, S. (ed.) FSE 2013. LNCS, vol. 8424, pp. 112–130. Springer, Heidelberg (2014)
 25. Oechslin, P.: Making a faster cryptanalytic time-memory trade-off. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 617–630. Springer, Heidelberg (2003)
 26. Okamoto, T. (ed.): ASIACRYPT 2000. LNCS, vol. 1976. Springer, Heidelberg (2000)
 27. Ristenpart, T., Rogaway, P.: How to enrich the message space of a cipher. In: Biryukov, A. (ed.) FSE 2007. LNCS, vol. 4593, pp. 101–118. Springer, Heidelberg (2007)
 28. Schroeppel, R., Shamir, A.: A \(T=O(2^{n/2})\), \(S=O(2^{n/4})\) algorithm for certain NP-complete problems. SIAM J. Comput. 10(3), 456–464 (1981)
 29. Suzuki, K., Tonien, D., Kurosawa, K., Toyota, K.: Birthday paradox for multi-collisions. In: Rhee, M.S., Lee, B. (eds.) ICISC 2006. LNCS, vol. 4296, pp. 29–40. Springer, Heidelberg (2006)
 30. van Oorschot, P.C., Wiener, M.J.: Parallel collision search with cryptanalytic applications. J. Cryptol. 12(1), 1–28 (1999)
 31. Wagner, D.: A generalized birthday problem. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, p. 288. Springer, Heidelberg (2002)
 32. Wang, L.: SHELL v1. Submitted to CAESAR (2014)