
1 Introduction

1.1 Background

In the search for cryptographic algorithms suitable for the post-quantum era, multivariate cryptography has emerged as one of the potential candidates. This branch of cryptography comprises schemes based on the difficulty of solving multivariate polynomial systems over finite fields. While solving generic multivariate polynomial systems is known to be NP-hard, a number of polynomial systems arising from cryptographic constructions have been efficiently solved. A well-known example is the polynomial system from the HFE cryptosystem [1]. As such, it is important to have a better understanding of the existing methods for solving multivariate polynomial systems constructed from cryptosystems.

Various methods to solve multivariate polynomial systems exist. The most common approach for generic polynomial systems over finite fields is Gröbner basis algorithms [2,3,4]. Typically, a Gröbner basis with respect to the degree reverse lexicographical ordering is first computed via the algorithms \(F_{4}\) or \(F_{5}\) [3, 4]. It is then converted to a Gröbner basis with respect to the lexicographical ordering by algorithms such as the FGLM algorithm [5]; the resulting basis contains equations in which variables are successively eliminated. This enables the variables to be solved for one at a time. See [6] for more details.

Another approach to solve multivariate polynomial systems is the XL algorithm and its variants [7,8,9,10]. This class of algorithms performs well when the system under consideration is overdetermined, that is, the number of equations far exceeds the number of variables. Briefly, this method works by considering multiples of the generating polynomials up to some bounded degree at which the number of linearly independent equations exceeds the number of monomials. Common linear algebra techniques can then be applied to solve the resulting system.

In both the Gröbner basis and XL type approaches, new polynomials of degree larger than that of the generating polynomials are gradually added. This often causes the number of monomials to increase very rapidly, thereby resulting in excessive memory requirements. The ElimLin algorithm was first proposed by Courtois to attack DES [11, 12]. Essentially, the ElimLin algorithm seeks to successively find linear equations in the vector space generated by the original equations and subsequently to eliminate variables using these linear equations. Observe that this process constructs new polynomial systems with fewer variables without an increase in the degree. In fact, an “ElimLin” subroutine is incorporated into most Gröbner basis implementations.

All the above methods have been exploited in algebraic cryptanalysis of ciphers to solve for the unknown key variables in the polynomial system representing the cipher. In general, block ciphers are less vulnerable to algebraic cryptanalysis as the polynomials usually have very high degrees or involve a large number of variables. Nonetheless, some block ciphers have been broken by algebraic attacks, including Keeloq [13]. As another example, ElimLin was used in [14] to solve 5 rounds of Present with half of the keybits fixed and 5 known plaintext/ciphertext pairs.

1.2 The Characteristic Set Algorithm

The characteristic set algorithm (or CSA for short) is well established for analyzing algebraic properties of polynomial systems over algebraically-closed fields of characteristic 0, see [15, 16]. In [17, 18], the authors adapted this approach to solve polynomial systems over any finite field. Specifically, CSA seeks to decompose a polynomial system into monic triangular sets of polynomials with disjoint solution sets. The problem then boils down to solving monic triangular sets of polynomials, which can be easily accomplished. In the case of a binary field, they further refined the decomposition algorithm so that the degrees of newly generated polynomials do not increase. The authors of [17] provided experimental evidence suggesting that CSA is more effective on sparse polynomial systems, particularly those exhibiting a block triangular structure. Examples of such systems include polynomials arising from linear feedback shift registers with nonlinear combining functions used in stream ciphers.

1.3 Our Contributions

In this paper, we propose some novel techniques to further enhance the binary CSA. The main features can be summarized as follows:

  • we incorporate ElimLin into the binary CSA.

  • we apply some statistical learning techniques to choose the next “splitting polynomial”.

  • we also adaptively choose the next polynomial set to process.

  • we propose a preprocessing phase where the variables can be sorted.

  • we allow for some flexibility in choosing the combination of features suitable for each polynomial system.

We call our enhanced version the enhanced binary characteristic set algorithm (EBCSA for short). In this paper, we present our experimental results on the block ciphers Present and Prince, as well as the stream ciphers Canfil and Toyocrypt. First, we benchmark EBCSA against prior versions of the characteristic set algorithm on the Canfil ciphers. For the Present cipher, we show that EBCSA outperforms previous algebraic methods as 5 rounds can be solved with fewer fixed keybits and fewer known plaintext/ciphertext pairs. Moreover, we provide the first algebraic attack on 6 rounds of Present. As for the block cipher Prince, we carry out the first algebraic attack and our results show that EBCSA is more effective than brute force search for 6 rounds of Prince-core (i.e. without whitening). Finally, we show that the stream cipher Toyocrypt can be solved by EBCSA in a more realistic setting than previous attacks, namely, using re-synchronization with sufficiently many random IV's and a small number of keystream bits.

1.4 Organization of This Paper

This paper is organized as follows. In Sect. 2, we set out the notations and terminologies. We then provide a general description of the binary characteristic set algorithm [17] and its variant [18]. In Sect. 3, we describe our new techniques and present our enhanced binary characteristic set algorithm. Our experimental results are then provided in Sect. 4. Finally, we wrap up with some concluding remarks and suggestions for future research.

2 A Description of the Characteristic Set Algorithm

2.1 Notations and the Set-Up

Let \(F=\mathbb {F}_{2}\) denote the binary field. For a positive integer n, let \(R=F[x_{0},x_{1},\ldots ,x_{n-1}]\) be the multivariate polynomial ring in n variables. Since \(x^{2}=x\) for all \(x\in F\), we are interested in the quotient ring \(\overline{R}=R/\langle x_{i}^{2}+x_{i}:i=0,1,\ldots ,n-1\rangle \). In particular, every polynomial in R is equivalent to a representative \(p\in \overline{R}\) in which the degree of each variable is at most 1. In the following, we will represent all polynomials in this form.

Definition 1

Fix \(\sigma \) to be a permutation on \(\{0,1,\ldots ,n-1\}\). Let P be a polynomial in \(\overline{R}\). We define the class of P with respect to \(\sigma \) to be the smallest i such that \(x_{\sigma (i)}\) occurs in P. We denote the class of P by \(\mathrm{cls}_{\sigma }(P)\). If \(\sigma \) is clear in the context, we will simply denote it by \(\mathrm{cls}(P)\).

Remark 1

  • For a constant polynomial \(P=0\) or 1, we define \(\mathrm{cls}(P) =-\infty \).

  • Let \(\sigma \) be the identity permutation, that is, \(\sigma (i)=i\) for \(i=0,1,\ldots ,n-1\). Then \(\mathrm{cls}_{\sigma }(P)\) is the smallest i such that \(x_{i}\) occurs in P. On the other hand, suppose that \(\sigma (i)=n-1-i\) for \(i=0,1,\ldots ,n-1\). Then \(\mathrm{cls}_{\sigma }(P)\) is the largest i such that \(x_{i}\) occurs in P. This coincides with the definition in [17].

For a polynomial \(P\in \overline{R}\), let c be its class with respect to a permutation \(\sigma \). Then P can be expressed as \(P=Ix_{\sigma (c)}+U\), where \(x_{\sigma (c)}\) occurs in neither I nor U. We term I and U the left child and right child of P, respectively. By the definition of c, both \(\mathrm{cls}_{\sigma }(I)\) and \(\mathrm{cls}_{\sigma }(U)\) are strictly greater than c. A monic polynomial is one in which \(I=1\).
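For concreteness, the definitions above can be sketched in a few lines of Python (a minimal illustration of ours, not the implementation of [17]; a polynomial in \(\overline{R}\) is represented as a set of monomials, each monomial a frozenset of variable indices, and \(\sigma \) as a list with \(\sigma [i]=\sigma (i)\)):

```python
# Representation: the constant 1 is the empty monomial frozenset();
# the zero polynomial is the empty set of monomials.

def cls(P, sigma):
    """Class of P w.r.t. sigma: the smallest i such that x_{sigma(i)}
    occurs in P; -infinity for the constant polynomials 0 and 1."""
    occurring = set().union(*P) if P else set()
    if not occurring:
        return float('-inf')
    rank = {sigma[i]: i for i in range(len(sigma))}
    return min(rank[v] for v in occurring)

def children(P, sigma):
    """Write P = I*x_{sigma(c)} + U (P not constant) and return the left
    child I and the right child U; x_{sigma(c)} occurs in neither."""
    xc = sigma[cls(P, sigma)]
    I = {m - {xc} for m in P if xc in m}
    U = {m for m in P if xc not in m}
    return I, U

# Example: P = x0*x2 + x1 + 1 with sigma the identity permutation gives
# I = x2 and U = x1 + 1.
P = {frozenset({0, 2}), frozenset({1}), frozenset()}
I, U = children(P, [0, 1, 2])
```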

Definition 2

Let \({\mathcal A}\) be a finite set of polynomials in \(\overline{R}\). \({\mathcal A}\) is called triangular if the classes of the polynomials in \({\mathcal A}\) are all distinct. Moreover, \({\mathcal A}\) is monic triangular if every polynomial in \({\mathcal A}\) is monic.

The Main Problem: Let \({\mathcal P}\) be a finite set of boolean polynomials from \(\overline{R}\). We wish to find the zero set of \({\mathcal P}\), that is, to find \(Z({\mathcal P})=\{(x_{0},x_{1},\ldots ,x_{n-1})\in F^{n}:P(x_{0},x_{1},\ldots ,x_{n-1})=0\text { for all }P\in {\mathcal P}\}\).

2.2 The Basic Structure of Binary CSA

In this section, we describe the main principles underlying binary CSA for solving the main problem stated above. Our description is more generic than that in [17] in the sense that we allow for an arbitrary ordering on the variables. Note that we will focus on the multiplication-free CSA, or MFCS, presented in [17] as it was already shown to give the best performance in the binary case. As such, CSA will refer to MFCS from now on. Throughout this section, we fix a permutation \(\sigma \) and simply write \(\mathrm{cls}(P)\) for the class of P. Essentially, the approach of binary CSA rests on the following lemma.

Lemma 1

Let P be a polynomial of \(\overline{R}\) with class c and write \(P=Ix_{\sigma (c)}+U\). Then \(Z(\{P\})\) can be split into \(Z(\{P\})=Z({\mathcal P}_{1})\cup Z({\mathcal P}_{2})\) such that: (1) \(Z({\mathcal P}_{1})\) and \(Z({\mathcal P}_{2})\) are disjoint; (2) One of \({\mathcal P}_{1}\) or \({\mathcal P}_{2}\) contains a monic polynomial with class c.

Proof

As we are in the binary setting, the left child of P, namely I, can only take values 0 or 1 and \(Z(\{I\})\) and \(Z(\{I+1\})\) are disjoint. When \(I=0\), we have \(U=P+Ix_{\sigma (c)}=0\). On the other hand, if \(I=1\), we have \(x_{\sigma (c)}+U=Ix_{\sigma (c)}+U=P=0\). Let \({\mathcal P}_{1}=\{I,U\}\) and \({\mathcal P}_{2}=\{I+1,x_{\sigma (c)}+U\}\). The lemma now follows.

Suppose that \({\mathcal A}\) is a monic triangular set of polynomials. Clearly, \({\mathcal A}\) cannot contain more than n polynomials. Suppose that \(0\le c_{1}<c_{2}<\ldots <c_{t}\le n-1\) are the classes of the polynomials in \({\mathcal A}\). Then \({\mathcal A}\) contains the polynomials \(P_{i}=x_{\sigma (c_{i})}+U_{i}\), \(i=1,2,\cdots ,t\), where \(U_{1},U_{2},\ldots ,U_{t}\) represent the right children of \(P_{1},\ldots ,P_{t}\), respectively. Observe that since \(c_{1}<c_{2}<\ldots <c_{t}\), \(U_{t}\) does not contain the variables \(x_{\sigma (c_{1})},\ldots ,x_{\sigma (c_{t})}\). It follows that once the variables of \(U_{t}\) are fixed to some values, \(x_{\sigma (c_{t})}\) is determined. Similarly, \(U_{t-1}\) does not contain the variables \(x_{\sigma (c_{1})},\ldots ,x_{\sigma (c_{t-1})}\), which allows one to compute \(x_{\sigma (c_{t-1})}\). In this way, we can determine the zeros of \({\mathcal A}\) by letting each \(x_{i}\), \(i\not \in \{\sigma (c_{1}),\ldots ,\sigma (c_{t})\}\), take the value 0 or 1 and computing the corresponding values of \(x_{\sigma (c_{1})},\ldots ,x_{\sigma (c_{t})}\) from the polynomials in \({\mathcal A}\).
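Continuing the sketch above, solving a monic triangular set is plain back-substitution; here \({\mathcal A}\) is encoded (our convention) as a dict mapping each class c to the right child \(U_{c}\) of \(x_{\sigma (c)}+U_{c}\):

```python
from itertools import product

def evaluate(P, assign):
    """Evaluate a boolean polynomial (set of monomials) at a 0/1 assignment."""
    return sum(all(assign[v] for v in m) for m in P) % 2

def zeros_of_monic_triangular(A, sigma, n):
    """Enumerate Z(A) for a monic triangular set A = {c: U_c}: assign the
    free variables, then back-substitute from the largest class down."""
    bound = {sigma[c] for c in A}
    free = [i for i in range(n) if i not in bound]
    for bits in product((0, 1), repeat=len(free)):
        assign = dict(zip(free, bits))
        for c in sorted(A, reverse=True):   # U_c only involves variables
            assign[sigma[c]] = evaluate(A[c], assign)   # already assigned
        yield tuple(assign[i] for i in range(n))
```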

In view of the above, the main goal of binary CSA is to decompose \(Z({\mathcal P})\) into disjoint sets based on Lemma 1. Specifically, we have the following theorem:

Theorem 1

[17, Theorems 4.1, 4.2] Let \({\mathcal P}\) be a finite set of boolean polynomials of \(\overline{R}\). The binary CSA decomposes \(Z({\mathcal P})\), in a finite number of steps, into a disjoint union of zero sets, namely,

$$\begin{aligned} Z({\mathcal P})=\bigcup _{j=1}^{s}Z({\mathcal A}_{j}), \end{aligned}$$

where \({\mathcal A}_{1},\ldots ,{\mathcal A}_{s}\) are all monic triangular sets of boolean polynomials.

The main structure of CSA is described in Algorithm 1 in the Appendix. In order to achieve the goal of Theorem 1, the main function of binary CSA is the triset algorithm which takes in as input a finite set of boolean polynomials and outputs a monic triangular set \({\mathcal A}\) (possibly empty) as well as a set \({\mathcal P}^*\) of sets of boolean polynomials.

Remark 2

Beginning with the set \({\mathcal P}\), the whole CSA can be thought of as building a binary tree of sets: the sets \({\mathcal P}^*\) output by each call of the triset function are added as branches of the corresponding input set. The process stops when all the sets in the tree are processed, and the result is a set \({\mathcal A}^*\) of monic triangular sets. Since a polynomial splits by letting \(I=0\) or 1, the binary CSA is essentially a generalization of brute force search, which splits by letting individual variables take the value 0 or 1.

Remark 3

Notice that we iteratively split polynomials to obtain sets with disjoint solutions. In this basic version of the triset function, we work on polynomials according to their class, namely, we pick the polynomial with the smallest class. In addition, the next input set to be fed into the triset function is generally taken to be the last generated set in \({\mathcal P}^*\).

2.3 BCSA

In [18], the authors proposed some additional features to improve the efficiency of the binary CSA in [17]. We call it BCSA and briefly describe the techniques here.

Linearization. The authors claimed that this technique was already implemented in the experiments of [17] but it was not explicitly mentioned. This technique applies to the triset function. Suppose that at a certain step, the polynomial \(x_{\sigma (c)}+L\) is generated, where \(\deg (L)\le 1\). Then we substitute \(x_{\sigma (c)}\) with L in the remaining polynomials and add \(x_{\sigma (c)}+L\) to \({\mathcal A}\). Observe that this forms part of the ElimLin process (only the substitution part).
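In the set-of-monomials representation of Sect. 2.1, the substitution step can be sketched as follows (our illustration, not the code of [17, 18]):

```python
def substitute(polys, x, L):
    """Apply the relation x = L (deg(L) <= 1) to every polynomial: each
    monomial containing x expands into one monomial per term of L."""
    out = []
    for P in polys:
        Q = set()
        for m in P:
            for t in ([m] if x not in m else [(m - {x}) | l for l in L]):
                Q ^= {t}    # symmetric difference = addition over F_2
        out.append(Q)
    return out
```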

The Choose Function. As discussed in Remark 2, we continue to pick polynomials to split until a monic triangular set is obtained or an inconsistency is reached. In the triset function of CSA, we simply pick the next polynomial to be one with the smallest class in the set. To improve efficiency, it was suggested in [18] to sort the polynomials according to some complexity metric and to choose the “simplest” among the remaining polynomials to split. The rationale is to obtain more linear polynomials, which help to simplify the other polynomials via the linearization described above. The complexity metric proposed in BCSA orders the polynomials as follows: a polynomial P is simpler than a polynomial \(P'\) if \((\deg (P),\#I(P),\#U(P),n-1-\mathrm{cls}_{\sigma }(P))<(\deg (P'),\#I(P'),\#U(P'),n-1-\mathrm{cls}_{\sigma }(P'))\) in the lexicographical sense, where I(P) and \(I(P')\) (resp. U(P) and \(U(P')\)) denote the left children (resp. right children) of P and \(P'\). Note that the description of the choose function in [18] is in reverse order with respect to the class of the polynomials, due to the difference in the definition of the class of a polynomial.
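In code, this choose function is a lexicographic comparison of tuples; a sketch reusing cls and children from the sketch in Sect. 2.1:

```python
def deg(P):
    """Total degree of a boolean polynomial in set-of-monomials form."""
    return max((len(m) for m in P), default=0)

def bcsa_key(P, sigma, n):
    """BCSA's layered comparison key; Python compares tuples
    lexicographically, so min() over these keys picks the polynomial
    to split next."""
    I, U = children(P, sigma)
    return (deg(P), len(I), len(U), n - 1 - cls(P, sigma))

# next_to_split = min(polys, key=lambda P: bcsa_key(P, sigma, n))
```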

The Threshold Value. By Lemma 1, whenever we split a polynomial, we obtain a monic polynomial with the same class. In the triset function of CSA, we first split all the polynomials of the same class c to obtain all the monic polynomials with class c. We choose one of the monic polynomials \(P_{0}\) to be in \({\mathcal A}\) and then add \(P_{0}\) to all the other monic polynomials to eliminate the class c from them (the adding step).

Since BCSA splits polynomials in order of their complexity instead of their classes, it is not efficient to first obtain all the monic polynomials before adding them. As such, the authors of [18] introduced a threshold value t such that whenever t monic polynomials of the same class are obtained, the adding process will be executed to eliminate the class from \(t-1\) of these polynomials.

3 The Enhanced Binary CSA

In this paper, we propose additional techniques to enhance BCSA. Our version will be called EBCSA. Like BCSA, we keep the core structure: EBCSA comprises the EBCSA-triset function and the main EBCSA function, as shown in Fig. 1, where the new features introduced in EBCSA are marked in bold. Its basic structure is described in Algorithm 2 in the Appendix. In the subsequent sections, we describe the main features of EBCSA.

Fig. 1. Principles of the enhanced binary characteristic set algorithm

3.1 The ElimLin Technique

The first main feature of EBCSA is the integration of ElimLin [11, 12] into EBCSA-triset. Specifically, whenever a new polynomial is generated in \({\mathcal P}\), we perform the following steps for as long as the generated set contains a linear polynomial. First, order the monomials in decreasing order of degree (for example, by the Grevlex ordering). With respect to this monomial ordering, represent each polynomial as a vector of coefficients. Next, perform Gaussian elimination on the matrix formed by these vectors. If the polynomial 1 is obtained, terminate EBCSA-triset with \({\mathcal A}=\emptyset \) and the current \({\mathcal P}^*\). Otherwise, collect all the linear polynomials \(x_{i}+L_{i}\) and substitute \(x_{i}\) with \(L_{i}\) in the remaining polynomials of \({\mathcal P}\) for all such i.

A key merit of this process is that it allows us to detect all the linear polynomials in the vector space spanned by the polynomials [12], thereby simplifying the remaining polynomials more quickly. At the same time, it detects an inconsistent set more rapidly. However, the main drawback is that it generally requires more memory to construct the matrix and may not be feasible when our polynomial set contains too many polynomials.
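The following sketch shows one such pass on boolean polynomials in the set-of-monomials representation of Sect. 2.1, using Python integers as \(F_{2}\) row vectors (our illustration; an actual implementation would use sparse linear algebra):

```python
def elimlin_pass(polys):
    """One ElimLin pass: Gaussian elimination over F_2 with monomials
    ordered degree-first, so that every linear polynomial in the span
    surfaces. Returns (inconsistent, linear_polys, other_polys)."""
    monos = sorted({m for P in polys for m in P},
                   key=lambda m: (len(m), sorted(m)))
    index = {m: j for j, m in enumerate(monos)}
    rows = [sum(1 << index[m] for m in P) for P in polys]

    basis = {}                          # leading bit -> reduced row
    for r in rows:
        while r:
            lead = r.bit_length() - 1   # highest bit = largest monomial
            if lead not in basis:
                basis[lead] = r
                break
            r ^= basis[lead]            # eliminate the leading monomial

    linear, others, inconsistent = [], [], False
    for lead, r in basis.items():
        P = {monos[j] for j in range(lead + 1) if (r >> j) & 1}
        if P == {frozenset()}:
            inconsistent = True         # the constant 1 lies in the span
        elif len(monos[lead]) <= 1:
            linear.append(P)            # all its monomials are linear
        else:
            others.append(P)
    return inconsistent, linear, others
```

Each linear polynomial \(x_{i}+L_{i}\) found is then substituted into the remaining polynomials (as in the substitution sketch of Sect. 2.3), and the pass is repeated until no new linear polynomial appears.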

3.2 An Improved Complexity Metric

In BCSA, a complexity metric was introduced in order to sort the polynomials. With respect to this metric, the simplest remaining polynomial is picked for the next split. The BCSA metric compares the degree of a polynomial, the numbers of terms in its left and right children, and its class in a layered linear manner. We propose a better metric based on the Sigmoid function.

For a child \(\mathbf {p}\) of a polynomial \(\mathbf {P}\), we define its complexity as

$$\begin{aligned} complex\left( x_{1},x_{2}\right) =x_{1}+Sigmoid\left( x_{2},K\right) =x_{1}+\frac{1-e^{\tfrac{x_{2}}{K}}}{1+e^{\tfrac{x_{2}}{K}}} \end{aligned}$$
(1)

where \(x_{1}=\deg \left( \mathbf {p}\right) \) determines the overall complexity and \(x_{2}=\#\left( \mathbf {p}\right) \) counts the number of terms in \(\mathbf {p}\), normalized by the Sigmoid function to the interval \(\left( 0,1\right) \). Here \(K=-100\) is a scaling parameter which is optimized experimentally. The complexity of the polynomial \(\mathbf {P}=I\cdot x_{\sigma (c)}+U\) is then defined as

$$\begin{aligned} Complex\left( \mathbf {P}\right) = Complex\left( \mathbf {I}\right) +\alpha \left( \mathbf {P}\right) \cdot Complex\left( \mathbf {U}\right) \end{aligned}$$
(2)

and

$$\begin{aligned} \alpha \left( \mathbf {P}\right) =c^{a\cdot \deg \left( \mathbf {P}\right) -b}=10^{3\cdot \deg \left( \mathbf {P}\right) -8} \end{aligned}$$
(3)

where \(Complex\left( \mathbf {p}\right) = complex\left( \deg \left( \mathbf {p}\right) ,\#\left( \mathbf {p}\right) \right) \) for \(\mathbf {p}\) equal to either \(\mathbf {I}\) or \(\mathbf {U}\), and a, b and c are set to 3, 8 and 10, respectively. Note that as the degree of the polynomial increases, the complexity of the child \(\mathbf {U}\) is emphasized more heavily. Thus the coefficient \(\alpha \left( \mathbf {P}\right) \) is defined as an exponential function of the degree. For instance, when \(\deg \left( \mathbf {P}\right) \) is 2, 3 and 4, the weighting coefficient \(\alpha \left( \mathbf {P}\right) \) is 0.01, 10 and 10000, respectively.
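A direct transcription of Eqs. 1, 2 and 3 (reusing deg and children from the earlier sketches; the function names are ours):

```python
import math

K, A, B, C = -100, 3, 8, 10    # parameters fitted by cross-validation

def complex_child(p):
    """Eq. 1: the modified Sigmoid squashes the term count #(p) into (0, 1),
    so the degree dominates and the term count breaks ties."""
    x1, x2 = deg(p), len(p)
    return x1 + (1 - math.exp(x2 / K)) / (1 + math.exp(x2 / K))

def complex_poly(P, sigma):
    """Eqs. 2-3: Complex(P) = Complex(I) + alpha(P) * Complex(U), where
    alpha(P) = 10**(3*deg(P) - 8) gives 0.01, 10, 10000 at degrees 2, 3, 4."""
    I, U = children(P, sigma)
    alpha = C ** (A * deg(P) - B)
    return complex_child(I) + alpha * complex_child(U)

# next_to_split = min(polys, key=lambda P: complex_poly(P, sigma))
```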

In our model, the Sigmoid function is chosen as a candidate because it has good properties, namely, it is a bounded differentiable real-valued function taking values in \(\left( 0,1\right) \) that is defined for all real input values and has a positive derivative everywhere. The standard Sigmoid model is \(Sigmoid\left( x\right) =\frac{1}{1+e^{\tfrac{x-x_{0}}{K}}}\), and we use the modified version \(Sigmoid\left( x\right) =\frac{1-e^{\tfrac{x}{K}}}{1+e^{\tfrac{x}{K}}}\). The complexity model in Eq. 2 is a combination of a linear model and the modified Sigmoid model, and it preserves the good properties of the standard model. Similar logistic functions have been used effectively in population growth modelling, artificial neural networks and distance measures.

To select the best model, we tried many possible parameters for each candidate model. Preliminary experimental results show that the modified Sigmoid model \(Sigmoid\left( x\right) =\frac{1-e^{\tfrac{x}{K}}}{1+e^{\tfrac{x}{K}}}\) exhibits the best overall performance on our training sets. The training sets comprise 120 randomly selected polynomial sets out of more than 300 available polynomial sets generated from many different ciphers, with degrees ranging from 2 to 4; the remaining sets form the testing sets. To determine the optimal parameters of each model, 10-fold cross-validation [19] is applied. Hence, the parameters depend on the specific cipher as well as on the selected features (degree and number of terms).

In order to reduce the computational cost of selecting the optimal parameters, the parameters are confined to a limited range by preliminary observations. The parameter setting \(\left( K,a,b,c\right) \) in Eqs. 1, 2 and 3 is then determined by the following cross-validation procedure: we split the training sets into 10 equal-sized parts; using 9 parts we fit the parameters, that is, we record the parameter setting that produces the best performance; we then measure its performance on the remaining part, the validation set. We repeat these steps with every part serving as the validation set, and we repeat the whole procedure 3 times. Finally, we average the 3 solving times obtained for each parameter setting and choose the setting with the minimum average solving time. We repeat this procedure for each of the models mentioned above. As a result, the optimal parameter setting is determined to be \(\left( K,a,b,c\right) =\left( -100,3,8,10\right) \). The parameters are not very sensitive: when they are modified by around \(10\%\), the solving time stays within a \(10\%\) change. This parameter setting is not guaranteed to be effective for every individual polynomial set, but it is optimal in the sense of overall performance.
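The selection loop can be pictured as a grid search with k-fold evaluation. The sketch below compresses the procedure just described; parameter_grid and solve_time (which would run EBCSA on one polynomial set with the given parameters and return the solving time) are hypothetical stand-ins:

```python
import random
from statistics import mean

def tune(parameter_grid, training_sets, solve_time, folds=10, repeats=3):
    """Return the parameter setting (K, a, b, c) with the smallest average
    solving time over the validation folds, averaged over `repeats` runs."""
    def avg_time(params):
        times = []
        for _ in range(repeats):
            sets = random.sample(training_sets, len(training_sets))  # shuffle
            size = len(sets) // folds
            for f in range(folds):
                for s in sets[f * size:(f + 1) * size]:  # validation fold
                    times.append(solve_time(params, s))
        return mean(times)
    return min(parameter_grid, key=avg_time)
```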

By contrast, the choose function in BCSA uses a layered linear model with respect to the essential parameters. This model suffers from the following drawbacks: (1) there are no parameters to tune, limiting the possibility of optimizing the performance; (2) the “layered” comparison sometimes fails to give reasonable results, e.g. a polynomial with a higher degree but very few terms is not necessarily more complex than a polynomial with a lower degree but a huge number of terms. As such, the proposed EBCSA complexity model has more flexibility to produce better performance via optimization over a number of polynomial sets. Indeed, we have carried out extensive experiments using different functions and the above function works well in general. In Subsect. 4.1, we present experimental results on the Canfil functions comparing the modified Sigmoid function with the choose function of BCSA.

3.3 The Set Metric

We introduce a set metric on the sets in the binary tree \({\mathcal P}^*\) to help us find a set with a solution more quickly. This feature is particularly useful if the system admits a known number of solutions, or if only a fixed number of solutions is required.

Observe that polynomial sets in the same branch may contain many identical polynomials. Roughly speaking, once many sets in a certain branch terminate without a solution, we would like to choose sets from a different branch to increase the chances of finding a solution. In other words, we look for a set which is most dissimilar to the previously terminated sets.

For the purpose of practical implementations, we propose the following statistical feature for a set \({\mathcal P}\) of polynomials: \(S_{{\mathcal P}}=(m_{1},s_{1},m_{2},s_{2})\), where \(m_{1},m_{2}\) (respectively \(s_{1},s_{2}\)) are the means (respectively the standard deviations) of the degrees and of the numbers of terms of the polynomials in \({\mathcal P}\). Suppose that \({\mathcal P}_{1},{\mathcal P}_{2},\ldots ,{\mathcal P}_{l}\) are inconsistent, that is, they terminate without any solution. Let \(a({\mathcal P}_{1},\ldots ,{\mathcal P}_{l})=\frac{S_{{\mathcal P}_{1}}+\ldots +S_{{\mathcal P}_{l}}}{l}\) be the average of the statistical features of the terminated sets. We then choose, on the fly, a set \({\mathcal P}\) such that the Euclidean distance between \(S_{{\mathcal P}}\) and \(a({\mathcal P}_{1},\ldots ,{\mathcal P}_{l})\) is the largest.
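A sketch of this set metric (reusing deg from the sketch in Sect. 2.3; math.dist requires Python 3.8 or later):

```python
import math
from statistics import mean, pstdev

def set_feature(polys):
    """S_P = (m1, s1, m2, s2): means and standard deviations of the degrees
    and of the term counts of the polynomials in the set."""
    degs = [deg(P) for P in polys]
    sizes = [len(P) for P in polys]
    return (mean(degs), pstdev(degs), mean(sizes), pstdev(sizes))

def most_dissimilar(candidates, terminated):
    """Pick the unprocessed set whose feature vector lies farthest, in
    Euclidean distance, from the average feature of the sets that
    terminated without a solution."""
    avg = [mean(col) for col in zip(*map(set_feature, terminated))]
    return max(candidates, key=lambda s: math.dist(set_feature(s), avg))
```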

3.4 Sorting the Variables

In our description of binary CSA, we introduced the permutation \(\sigma \) to allow the variables to be rearranged. In general, CSA tends to be more efficient if \(\sigma \) is chosen so that the variables \(x_{\sigma (0)},\ldots ,x_{\sigma (n-1)}\) occur in increasing order of frequency in the polynomial system. The underlying reason is as follows: recall that when performing linearization, we substitute a variable \(x_{i}\) with a linear polynomial \(L_{i}\). While this reduces the number of variables, it may make the polynomial system more complex, particularly when \(L_{i}\) contains many terms. Hence, it is advisable to substitute those \(x_{i}\) which do not occur too frequently. In terms of implementation, we simply rename the variables so that \(x_{0},x_{1},\ldots ,x_{n-1}\) occur in ascending order of frequency in the polynomial system.
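In code, the preprocessing is a counting pass followed by a renaming (a sketch in the set-of-monomials representation of Sect. 2.1):

```python
from collections import Counter

def sort_by_frequency(polys, n):
    """Rename variables so that x_0, ..., x_{n-1} occur in ascending order
    of frequency; returns the renamed system and the map old -> new."""
    freq = Counter(v for P in polys for m in P for v in m)
    order = sorted(range(n), key=lambda v: freq[v])  # least frequent first
    rank = {old: new for new, old in enumerate(order)}
    renamed = [{frozenset(rank[v] for v in m) for m in P} for P in polys]
    return renamed, rank
```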

3.5 Combining the Various Features

We observe from our experiments that not all polynomial systems behave equally well with the same set of features. According to our experimental results, some systems are solved more efficiently with linearization alone while others perform much better with the full ElimLin technique. In addition, set selection may not be necessary if the system has many solutions and we seek to find all of them. In view of the above, EBCSA allows for different combinations of features to yield the best performance. Our current implementation of EBCSA provides the following choices: (1) linearization or ElimLin; (2) set selection based on the statistical feature or on the set index; and (3) termination with one solution or with all solutions.

Remark 4

Observe that unlike BCSA, we do not need a threshold value here: whenever we obtain two monic polynomials with the same class, we add them together and choose the simpler of the two, according to our complexity metric, to be in the set \({\mathcal A}\). In addition, when the ElimLin option is turned on, polynomials with the same leading term are automatically added.

4 Experimental Results

We tested EBCSA with many different polynomial systems, specifically those from block and stream ciphers. Our experiments show significant improvements over existing algebraic methods. In this paper, we present three such experiments, namely, the block ciphers Present and Prince as well as the stream cipher Toyocrypt. Experiments on Canfil ciphers were also carried out to compare the efficiency of EBCSA against CSA and BCSA. Since our goal is to find the unknown key which is likely unique, we use the following configurations of EBCSA for all the experiments: Linearization/ElimLin, set selection, and termination when one solution is obtained.

The platform for testing is a personal computer (Ubuntu 14.04, Intel Xeon CPU E5640 2.67 GHz, 24 GB RAM). Only a single CPU core is used. In the following, we further distinguish between EBCSA and EBCSA-GE (short for EBCSA with Gaussian Elimination), where the former refers to EBCSA with linearization and the latter to EBCSA with the ElimLin feature switched on.

4.1 Experiments on Our Complexity and Set Metrics

In order to benchmark EBCSA against BCSA and CSA, we performed experiments on the Canfil functions similar to those carried out in [17, 18]. The Canfil polynomial sets are boolean polynomial systems arising from linear feedback shift registers filtered by nonlinear Canfil functions. In [17, Table 4], comparisons were made between CSA and the Gröbner basis algorithm for 8 such functions, and the results showed that CSA outperformed the Gröbner basis method on these polynomial sets. In [18, Table 2], the authors recorded the number of splits (or branches) produced by both CSA and BCSA for the same functions and concluded that BCSA was more effective on these sets.

In this paper, we carry out the same experiments using EBCSA and the number of splits (number of new non-terminating sets produced) for each of the 8 Canfil functions is recorded in Table 1. In addition, the timing on the Canfil functions using the approach of EBCSA with set selection is also listed.

Table 1. Comparison between EBCSA and BCSA in terms of the number of splits

Observe that EBCSA performs better than BCSA in 5 out of the 8 Canfil experiments, with significant improvements for Canfil1, Canfil4, Canfil5 and Canfil6. For Canfil7, CSA outperforms BCSA but is less effective than EBCSA (with set selection). Note that we used a random key in our experiments; it was not stated whether a random or a specific key was used in the experiments of [17, 18].

The improvements of EBCSA suggest that our choice of a combined continuous Sigmoid function for polynomial complexity is more effective than the layered linear model used in BCSA. In addition, the use of set selection results in finding the unique solution more efficiently.

4.2 Experiments on Present Cipher

Present is a 31-round lightweight block cipher designed by Bogdanov et al. and presented at CHES 2007 [20]. It has a block length of 64 bits and a key length of either 80 or 128 bits. It is based on a substitution-permutation network with 16 4-bit S-boxes in each round. In [14], algebraic cryptanalysis was applied to Present with the 80-bit master key. This involves representing the intermediate texts by variables and each S-box by 21 quadratic equations. The authors claimed that with 5 known plaintext/ciphertext (KP) pairs, 40 key bits can be recovered by the ElimLin method.

In the report [21], results on algebraic attacks on Present with ElimLin and the Polybori Gröbner basis implementation were presented. Even though ElimLin is implemented as a subroutine in Polybori, it was demonstrated that the stand-alone ElimLin implementation was more effective at solving Present. In addition, it was reported that 6 rounds of Present remained resistant to algebraic cryptanalysis using either ElimLin or Polybori.

As EBCSA-GE incorporates ElimLin into EBCSA, we perform experiments on round-reduced Present with EBCSA-GE. Once again, we use the implicit representation of each S-box by 21 quadratic equations. Note that we place all the text variables in front while the 80 key variables are placed at the back. We perform our experiments with either 1 or 2 KP's and consider r rounds of Present for \(r=5\) and \(r=6\). For the experiments with one KP, we did some preprocessing on the polynomial systems before running EBCSA-GE. Specifically, we rename the variables so that the text variables and the key variables are each sorted in ascending order of frequency. In each experiment, we fix \(x_{n-c},x_{n-c+1},\ldots ,x_{n-1}\) (that is, the c most frequent key variables) and record the time taken as well as the number of splits as c varies.

Table 2. 5-round and 6-round present with 1 KP on EBCSA-GE
Table 3. 5-round present with 2 KP on EBCSA-GE
Table 4. A comparison of algebraic attacks on 5-round present

One sees that EBCSA-GE outperforms both ElimLin and the Polybori implementation of [21] in the following sense. From Table 4, with 40 keybits fixed, EBCSA-GE requires only 1 KP to solve the system in 4 min, which is of comparable magnitude to the timings of ElimLin and Polybori based on 5 KP's. With just 2 KP's, EBCSA-GE already outperforms ElimLin and Polybori, with a solving time of 9.5 s. Next, with 36 keybits fixed, EBCSA-GE can solve the system in a few minutes based on just 2 KP's, while ElimLin and Polybori both take longer to solve (3 to 4 h) and need more plaintexts (16 KP's). From Table 3, with 2 KP's, EBCSA-GE can solve up to 52 keybits (28 fixed keybits) for 5 rounds of Present, which is more keybits than ElimLin and Polybori can solve in [21]. Finally, from Table 2, EBCSA-GE can solve 6 rounds of Present with 1 KP and 52 fixed keybits, while the ElimLin and Polybori experiments of [21] were only conducted up to 5 rounds.

Recall that ElimLin essentially seeks to find linear polynomials in the space spanned by the polynomials and then eliminates their leading variables from the remaining polynomials via substitution. This method can only succeed if there are sufficiently many linear polynomials in the span. Having more polynomials involving the key variables in the original set increases the chance of finding enough linear polynomials; hence the need for a greater number of KP's, as observed. When there are insufficient linear polynomials, Gröbner basis methods may not be effective either, as an increase in the degree of polynomials in a large number of variables causes the memory requirements to explode. On the other hand, EBCSA-GE executes the splitting process without a rapid increase in memory and continues to find linear polynomials via ElimLin. By alternating between the splitting and ElimLin processes, EBCSA-GE is more effective on the Present cipher.

4.3 Experiments on Prince Cipher

Prince is another lightweight block cipher, published at Asiacrypt 2012 [22]. It is a 12-round block cipher with a 64-bit block length and a 128-bit key length, optimized for low latency in hardware implementations. Its construction satisfies the reflection property and is symmetric about the middle layer. The cryptanalysis of Prince has attracted much attention, especially after a challenge was posed by Ruhr-Universität Bochum [23] to find a practical attack on round-reduced Prince. To date, round-reduced Prince has been shown to be vulnerable to various attacks, including differential and integral cryptanalysis as well as time-memory trade-off and meet-in-the-middle attacks [24,25,26,27,28]. To the best of our knowledge, no algebraic attack has been published on Prince.

We carry out the first algebraic attack on Prince. Specifically, we consider Prince-core, that is, Prince without whitening; hence only the key \(K_{1}\) is used. As with the other block ciphers, we use the implicit representation of the S-boxes, resulting in 21 quadratic equations for each S-box. In addition, we introduce 64 intermediate variables for each round. Since whitening is not used, the variables for the first round and the last two rounds can be omitted.

Now, consider 6-round Prince-core. This system has 384 variables (320 text variables and 64 key variables) and 2016 quadratic equations. We place the key variables at the back (\(x_{320},\ldots ,x_{383}\)). We generate 2 different systems using different random keys. For each key, we fix some key bits and run the system with our EBCSA solver. Table 5 records the results for both EBCSA and EBCSA-GE.

Table 5. 6-round Prince-core: EBCSA vs EBCSA-GE

Observe that when more bits are fixed, EBCSA and EBCSA-GE should be executed in parallel to find the key, as it is not certain which of the two versions will be more efficient. However, when fewer bits are fixed, EBCSA-GE tends to require much more memory to build the large matrices, thereby slowing down the process significantly. In the following tables, we show the results of EBCSA for two random keys as the number of fixed keybits varies. By fitting a suitable curve to the results, we extrapolate to estimate the complexity when no key bits are fixed. We conclude that the number of splits for Prince-core generally grows more slowly than the brute force count \(f=2^{x}\), where x denotes the number of unknown keybits, as shown in Fig. 2. The results are also listed in Table 6; the complexity of EBCSA is expected to improve on brute force search by more than half the key size in the exponent. We remark that this approach can be applied to estimate the complexity of other block ciphers as well.
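The extrapolation itself is an ordinary least-squares fit of \(\log _{2}\) of the split counts against the number of unknown keybits x; a sketch using NumPy (the data arrays stand for the measurements summarized in Table 6):

```python
import numpy as np

def fit_split_curve(x, splits):
    """Fit splits ~ A * 2**(b*x) by linear regression on log2(splits);
    an exponent b < 1 indicates an improvement over the brute-force
    curve f = 2**x."""
    b, log2_A = np.polyfit(x, np.log2(splits), 1)
    return 2.0 ** log2_A, b    # (A, b)
```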

Fig. 2. Splits of Prince-core. (a) \(f=324.19 \times 2^{0.494x}\) (b) \(f=3287.44\times 2^{0.337x}\)

Table 6. Splits of Prince-core (a) \(f=324.19\times 2^{0.494x}\) (b) \(f=3287.44\times 2^{0.337x}\)

The comparison with the existing attack on 6-round Prince-core from [24] is provided in Table 7. The proposed EBCSA algorithm is clearly more practical in terms of the required number of chosen plaintexts.

Table 7. Comparison of existing attacks on 6-round Prince-core

Remark 5

Data: number of chosen plaintexts; Time: number of equivalent Prince operations (the complexity of attacks on Prince is measured by \(D\times T<2^{128}\)); Memory: number of Prince 64-bit words.

4.4 Experiments on the Stream Cipher Toyocrypt

Toyocrypt [29] is a stream cipher comprising a 128-bit Galois linear feedback shift register (GLFSR) filtered by a nonlinear function f of degree 63. Obviously, a naive application of algebraic cryptanalysis will not work due to the high degree of f. In [30], the authors discovered that \((x_{23}+1)f\) and \((x_{42}+1)f\) are in fact cubic polynomials. Further, in [31], the authors showed that with re-synchronization and 4 IV's, the degrees of the equations can be reduced further to 1. Note that in this case, the 4 IV's must span a 2-dimensional vector space.

Several algebraic attacks on Toyocrypt have been proposed. Table 8 summarizes the assumptions and requirements needed to carry out existing algebraic attacks on Toyocrypt.

Table 8. Comparison of assumptions on algebraic attacks on Toyocrypt

The attack of [30] requires \(2^{18}\) keystream bits from one session, while the attack of [31] requires 128 keystream bits from 4 chosen-IV resynchronizations. These attack scenarios may not be easy to achieve in practice. Here we consider a more realistic scenario where only a few keystream bits can be deduced from some known information about the plaintext. In addition, we allow the IV's in different sessions to be selected randomly. Finally, we apply our attack to Toyocrypt with the GLFSR of the original specification of the cipher, unlike previous attacks which only considered Toyocrypt with an LFSR as an approximation.

To be more precise, we fix a GLFSR with feedback taps 0, 1, 2, 7, 128. For any 2 IV's, it follows from [31, Theorem 1] that the sum of \((x_{23}+1)f\) (or \((x_{42}+1)f\)) evaluated at these two IV's results in an equation of degree 2. Instead of choosing fixed IV's, we randomly choose k IV's and construct the corresponding equations for \((x_{23}+1)f+z(x_{23}+1)\) and \((x_{42}+1)f+z(x_{42}+1)\), where z denotes the keystream bit. Notice that these are degree 3 equations but the sum of any two of them gives a degree 2 equation.

We perform our experiments with at most 6 keystream bits using EBCSA-GE and record the results in Table 9. In the preprocessing stage, we once again arrange the 128 variables in ascending order of their frequencies.

Table 9. Performance of Toyocrypt with EBCSA-GE

5 Conclusions and Future Work

In this paper, we enhanced the binary characteristic set algorithm to work more effectively on both block and stream ciphers. Using EBCSA, we improve algebraic attacks on various well-known ciphers, including Prince, Present and Toyocrypt. EBCSA uses CSA as its core and incorporates ElimLin and some statistical features to improve its performance. In the current version, ElimLin can be turned on or off for each input polynomial system. For future research, we will seek to identify characteristics of a polynomial system that indicate whether EBCSA-GE or EBCSA should be chosen. We could further integrate such criteria into the main algorithm so that ElimLin can be turned on or off for each individual set produced by EBCSA.