
1 Introduction

Cryptographic algorithms are widely used in embedded computing devices, including SmartCards, to form the backbone of their security mechanisms. In general, security is established by assuming that the adversary has access to the input and output, but not internals, of the implementation. Unfortunately, in practice, attackers may recover cryptographic keys by analyzing physical information leaked through side channels. These so-called side-channel attacks exploit the statistical dependence between secret data and non-functional properties of a computing device such as the execution time [38], power consumption [39] and electromagnetic radiation [49]. Among them, differential power analysis (DPA) is an extremely popular and effective class of attacks [30, 42].

To thwart DPA attacks, masking has been proposed to break the statistical dependence between secret data and side-channel leaks through randomization. Although various masked implementations have been proposed, e.g., for AES or its non-linear components (S-boxes) [15, 37, 51, 52], checking if they are correct is tedious and error-prone. Indeed, there are published implementations [51, 52] that were later shown to be incorrect [21, 22]. Therefore, formally verifying these countermeasures is important.
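To illustrate the core idea on one bit (a sketch of our own, not taken from any cited implementation): masking XORs the secret with a uniformly random bit, so the observable share has the same distribution whichever value the secret takes.

```python
def share_distribution(k: int) -> dict:
    """Distribution of the masked share k XOR r when r is a uniform random bit."""
    counts = {0: 0, 1: 0}
    for r in (0, 1):
        counts[k ^ r] += 1
    return {v: n / 2 for v, n in counts.items()}

# {0: 0.5, 1: 0.5} for both secrets: the share alone leaks nothing about k.
assert share_distribution(0) == share_distribution(1)
```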

Previously, there were two types of verification methods for masking countermeasures [54]: one is type inference based [10, 44] and the other is model counting based [26, 27]. Type inference based methods [10, 44] are fast and sound, meaning they can quickly prove that a computation is leakage free, e.g., if the result is syntactically independent of the secret data or has been masked by random variables not used elsewhere. However, syntactic type inference is not complete, in that it may report false positives. In contrast, model counting based methods [26, 27] are sound and complete: they check whether the computation is statistically independent of the secret [15]. However, due to the inherent complexity of model counting, they can be extremely slow in practice.

Fig. 1. Overview of SCInfer, where “ICR” denotes the intermediate computation result.

The aforementioned gap, in terms of both accuracy and scalability, has not been bridged by more recent approaches [6, 13, 47]. For example, Barthe et al. [6] proposed inference rules to prove masking countermeasures based on the observation that certain operators (e.g., XOR) are invertible: in the absence of such operators, purely algebraic laws can be used to normalize the expressions of computation results so as to apply the rules for invertible functions. This normalization is applied to each expression only once, as it is costly. Ouahma et al. [47] introduced a linear-time algorithm based on finer-grained syntactical inference rules. A similar idea was explored by Bisi et al. [13] for analyzing higher-order masking; like [6, 47], however, the method is not complete and does not consider non-linear operators, which are common in cryptographic software.

Our Contribution. We propose a refinement based approach, named SCInfer, to bridge the gap between prior techniques which are either fast but inaccurate or accurate but slow. Figure 1 depicts the overall flow, where the input consists of the program and a set of variables marked as public, private, or random. We first transform the program to an intermediate representation: the data dependency graph (DDG). Then, we traverse the DDG in a topological order to infer a distribution type for each intermediate computation result. Next, we check if all intermediate computation results are perfectly masked according to their types. If any of them cannot be resolved in this way, we invoke an SMT solver based refinement procedure, which leverages either satisfiability (SAT) solving or model counting (SAT#) to prove leakage freedom. In both cases, the result is fed back to improve the type system. Finally, based on the refined type inference rules, we continue to analyze other intermediate computation results.

Thus, SCInfer can be viewed as a synergistic integration of a semantic rule based approach for inferring distribution types and an SMT solver based approach for refining these inference rules. Our type inference rules (Sect. 3) are inspired by Barthe et al. [6] and Ouahma et al. [47] in that they are designed to infer distribution types of intermediate computation results. However, there is a crucial difference: their inference rules are syntactic with fixed accuracy, i.e., relying solely on structural information of the program, whereas ours are semantic and the accuracy can be gradually improved with the aid of our SMT solver based analysis. At a high level, our semantic type inference rules subsume their syntactic type inference rules.

The main advantage of using type inference is the ability to quickly obtain sound proofs: when there is no leak in the computation, the type system can often produce a proof quickly, and the result is always conclusive. However, if type inference fails to produce a proof, the verification problem remains unresolved. Thus, to be complete, we propose to leverage SMT solvers to resolve these left-over verification problems. Here, solvers are used either to check the satisfiability (SAT) of a logical formula or to count its satisfying solutions (SAT#), the latter of which, although expensive, is powerful enough to completely decide whether the computation is perfectly masked. Finally, by feeding solver results back to the type inference system, we can gradually improve its accuracy. Thus, overall, the method is both sound and complete.

We have implemented our method in a software tool named SCInfer and evaluated it on publicly available benchmarks [26, 27], which implement various cryptographic algorithms such as AES and MAC-Keccak. Our experiments show that SCInfer is both effective in obtaining proofs quickly and scalable enough to handle realistic applications. Specifically, it can resolve most of the verification subproblems using type inference; as a result, satisfiability (SAT) based analysis needs to be applied to only a few left-over cases. Only in rare cases does the most heavyweight analysis (SAT#) need to be invoked.

To sum up, the main contributions of this work are as follows:

  • We propose a new semantic type inference approach for verifying masking countermeasures. It is sound and efficient for obtaining proofs.

  • We propose a method for gradually refining the type inference system using SMT solver based analysis, to ensure the overall method is complete.

  • We implement the proposed techniques in a tool named SCInfer and demonstrate its efficiency and effectiveness on cryptographic benchmarks.

The remainder of this paper is organized as follows. After reviewing the basics in Sect. 2, we present our semantic type inference system in Sect. 3 and our refinement method in Sect. 4. Then, we present our experimental results in Sect. 5 and comparison with related work in Sect. 6. We give our conclusions in Sect. 7.

2 Preliminaries

In this section, we define the class of programs considered in this work and then review the basics of side-channel attacks and masking countermeasures.

2.1 Probabilistic Boolean Programs

Following the notation used in [15, 26, 27], we assume that the program P implements a cryptographic function, e.g., \(\varvec{c}\leftarrow P(\varvec{p},\varvec{k})\) where \(\varvec{p}\) is the plaintext, \(\varvec{k}\) is the secret key and \(\varvec{c}\) is the ciphertext. Inside P, random variables \(\varvec{r}\) may be used to mask the secret key while maintaining the input-output behavior of P. Therefore, P may be viewed as a probabilistic program. Since loops, function calls, and branches may be removed via automated rewriting [26, 27] and integer variables may be converted to bits, for verification purposes, we assume that P is a straight-line probabilistic Boolean program, where each instruction has a unique label and at most two operands.

Fig. 2. An example of a masking countermeasure.

Let \(\varvec{k}\) (resp. \(\varvec{r}\)) be the set of secret (resp. random) bits, \(\varvec{p}\) the public bits, and \(\varvec{c}\) the variables storing intermediate results. Thus, the set of variables is \(V=\varvec{k}\cup \varvec{r}\cup \varvec{p}\cup \varvec{c}\). In addition, the program uses a set \(\textsf {op}\) of operators including negation (\(\lnot \)), conjunction (\(\wedge \)), disjunction (\(\vee \)), and exclusive-or (\(\oplus \)). A computation of P is a sequence \(c_1\leftarrow \mathbf i _1(\varvec{p},\varvec{k},\varvec{r});\cdots ; c_n\leftarrow \mathbf i _n(\varvec{p},\varvec{k},\varvec{r})\) where, for each \(1\le i \le n\), the value of \(\mathbf i _i\) is expressed in terms of \(\varvec{p}\), \(\varvec{k}\) and \(\varvec{r}\). Each random bit in \(\varvec{r}\) is uniformly distributed in \(\{0,1\}\); their sole purpose in P is to ensure that \(c_1,\cdots ,c_n\) are statistically independent of the secret \(\varvec{k}\).

Data Dependency Graph (DDG). Our internal representation of P is a graph \(\mathcal {G}_P=(N,E,\lambda )\), where N is the set of nodes, E is the set of edges, and \(\lambda \) is a labeling function.

  • \(N=L\uplus L_V\), where L is the set of instructions in P and \(L_V\) is the set of terminal nodes: \(l_v\in L_V\) corresponds to a variable or constant \(v\in \varvec{k}\cup \varvec{r}\cup \varvec{p}\cup \{0,1\}\).

  • \(E\subseteq N\times N\) contains edge \((l,l')\) if and only if \(l: c= x \circ y\), where either x or y is defined by \(l'\); or \(l: c= \lnot x\), where x is defined by \(l'\);

  • \(\lambda \) maps each \(l\in N\) to a pair (val, op): \(\lambda (l)=(c,\circ )\) for \(l: c= x \circ y\); \(\lambda (l)=(c,\lnot )\) for \(l: c= \lnot x\); and \(\lambda (l)=(v,\bot )\) for each terminal node \(l_v\).

We may use \(\lambda _1(l) = c\) and \(\lambda _2(l)=\circ \) to denote the first and second elements of the pair \(\lambda (l)=(c,\circ )\), respectively. We may also use \(l.{\texttt {lft}}\) to denote the left child of l, and \(l.{\texttt {rgt}}\) to denote the right child if it exists. A subtree rooted at node l corresponds to an intermediate computation result. When the context is clear, we may use the following terms interchangeably: a node l, the subtree T rooted at l, and the intermediate computation result \(c=\lambda _1(l)\). Let |P| denote the total number of nodes in the DDG.

Figure 2 shows an example where \(\varvec{k}=\{k\}\), \(\varvec{r}=\{r_1,r_2,r_3\}\), \(\varvec{c}=\{c_1,c_2,c_3,c_4,c_5,c_6\}\) and \(\varvec{p}=\emptyset \). On the left is a program written in a C-like language except that \(\oplus \) denotes XOR and \(\wedge \) denotes AND. On the right is the DDG, where

$$\begin{array}{rl} c_3 &= c_2\oplus c_1 = (r_1 \oplus r_2)\oplus (k \oplus r_2) = k\oplus r_1 \\ c_4 &= c_3\oplus c_2 = ((r_1 \oplus r_2)\oplus (k \oplus r_2))\oplus (r_1 \oplus r_2) = k\oplus r_2 \\ c_5 &= c_4 \oplus r_1 = (((r_1 \oplus r_2)\oplus (k \oplus r_2))\oplus (r_1 \oplus r_2))\oplus r_1 = k\oplus r_1\oplus r_2 \\ c_6 &= c_5 \wedge r_3 = ((((r_1 \oplus r_2)\oplus (k \oplus r_2))\oplus (r_1 \oplus r_2))\oplus r_1)\wedge r_3 = (k\oplus r_1\oplus r_2)\wedge r_3 \end{array}$$
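To make the representation concrete, the following sketch (ours, not SCInfer's actual data structure) builds the DDG of Fig. 2 with a small node class; the node and variable names mirror the figure:

```python
class Node:
    """A DDG node: terminals carry a variable name or constant, internal
    nodes carry an operator and one or two children."""
    def __init__(self, val, op=None, lft=None, rgt=None):
        self.val, self.op, self.lft, self.rgt = val, op, lft, rgt

    def eval(self, env):
        if self.op is None:                      # terminal: variable or 0/1
            return env[self.val] if isinstance(self.val, str) else self.val
        if self.op == 'not':
            return 1 - self.lft.eval(env)
        x, y = self.lft.eval(env), self.rgt.eval(env)
        return {'xor': x ^ y, 'and': x & y, 'or': x | y}[self.op]

# The running example of Fig. 2.
k, r1, r2, r3 = Node('k'), Node('r1'), Node('r2'), Node('r3')
c1 = Node('c1', 'xor', k,  r2)    # c1 = k  ^ r2
c2 = Node('c2', 'xor', r1, r2)    # c2 = r1 ^ r2
c3 = Node('c3', 'xor', c2, c1)    # c3 = k  ^ r1
c4 = Node('c4', 'xor', c3, c2)    # c4 = k  ^ r2
c5 = Node('c5', 'xor', c4, r1)    # c5 = k  ^ r1 ^ r2
c6 = Node('c6', 'and', c5, r3)    # c6 = (k ^ r1 ^ r2) & r3
```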

Let \({\texttt {supp}}:N\rightarrow 2^{\varvec{k}\cup \varvec{r}\cup \varvec{p}}\) be a function mapping each node l to its set of support variables. That is, \({\texttt {supp}}(l)=\emptyset \) if \(\lambda _1(l)\in \{0,1\}\); \({\texttt {supp}}(l)=\{x\}\) if \(\lambda _1(l)=x\in \varvec{k}\cup \varvec{r}\cup \varvec{p}\); and \({\texttt {supp}}(l)={\texttt {supp}}(l.{\texttt {lft}})\cup {\texttt {supp}}(l.{\texttt {rgt}})\) otherwise. Thus, the function returns the set of variables that \(\lambda _1(l)\) depends upon structurally.

Given a node l whose corresponding expression e is defined in terms of variables in V, we say that e is semantically dependent on a variable \(r\in V\) if and only if there exist two assignments, \(\pi _1\) and \(\pi _2\), such that \(\pi _1(r) \ne \pi _2(r)\) and \(\pi _1(x)=\pi _2(x)\) for every \(x\in V\setminus \{r\}\), and the values of e differ under \(\pi _1\) and \(\pi _2\).

Let \(\texttt {semd}:N\rightarrow 2^{\varvec{r}}\) be a function such that \(\texttt {semd}(l)\) denotes the set of random variables upon which the expression e of l semantically depends. Thus, \(\texttt {semd}(l)\subseteq {\texttt {supp}}(l)\); and for each \(r\in {\texttt {supp}}(l)\setminus \texttt {semd}(l)\), we know \(\lambda _1(l)\) is semantically independent of r. More importantly, there is often a gap between \({\texttt {supp}}(l)\cap \varvec{r}\) and \(\texttt {semd}(l)\), i.e., the inclusion \(\texttt {semd}(l)\subseteq {\texttt {supp}}(l)\cap \varvec{r}\) is often strict, which is why our gradual refinement of semantic type inference rules can outperform methods based solely on syntactic type inference.

Consider the node \(l_{c_4}\) in Fig. 2: we have \({\texttt {supp}}(l_{c_4})=\{r_1,r_2,k\}\), \(\texttt {semd}(l_{c_4})=\{r_2\}\), and \({\texttt {supp}}(l_{c_4})\cap \varvec{r} = \{r_1,r_2\}\). Furthermore, if the random bits are uniformly distributed in \(\{0,1\}\), then \(c_4\) is both uniformly distributed and secret independent (Sect. 2.2).
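For intuition, \(\texttt {semd}\) can be computed by brute-force enumeration, reusing the Node objects from the sketch above (Sect. 3.2 shows how SCInfer instead reduces this check to SAT):

```python
from itertools import product

def semd(node, rand_vars, all_vars):
    """Random variables the node's expression semantically depends on,
    found by flipping each candidate under every assignment of the rest."""
    dep = set()
    for r in rand_vars:
        others = [v for v in all_vars if v != r]
        for vals in product((0, 1), repeat=len(others)):
            env = dict(zip(others, vals))
            if node.eval({**env, r: 0}) != node.eval({**env, r: 1}):
                dep.add(r)
                break
    return dep

# c4 = k ^ r2 is semantically independent of r1, although r1 is in supp(c4).
assert semd(c4, ['r1', 'r2', 'r3'], ['k', 'r1', 'r2', 'r3']) == {'r2'}
```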

2.2 Side-Channel Attacks and Masking

We assume the adversary has access to the public input \(\varvec{p}\) and output \(\varvec{c}\), but not the secret \(\varvec{k}\) and random variables \(\varvec{r}\), of the program \(\varvec{c}\leftarrow P(\varvec{p},\varvec{k})\). However, the adversary may have access to side-channel leaks that reveal the joint distribution of at most d intermediate computation results \(c_1,\cdots ,c_d\) (e.g., via differential power analysis [39]). Under these assumptions, the goal of the adversary is to deduce information about \(\varvec{k}\). To model the leakage of each instruction, we consider a widely used, value-based model, called the Hamming Weight (HW) model; other power leakage models, such as the transition-based model [5], can be handled similarly [6].

Let [n] denote the set \(\{1,\cdots , n\}\) of natural numbers where \(n\ge 1\). We call a set with d elements a d-set. Given values (p, k) for \((\varvec{p},\varvec{k})\) and a d-set \(\{c_1,\cdots ,c_d\}\) of intermediate computation results, we use \(\textsf {D}_{p,k}(c_1,\cdots ,c_d)\) to denote their joint distribution induced by instantiating \(\varvec{p}\) and \(\varvec{k}\) with p and k, respectively. Formally, for each vector of values \(v_1, \cdots ,v_d\) in the probability space \(\{0,1\}^d\), we have \(\textsf {D}_{p,k}(c_1,\cdots ,c_d)(v_1, \cdots , v_d) =\)

$$\begin{aligned} \frac{ |\{ r \in \{0,1\}^{|\varvec{r}|} \mid v_1= \mathbf i _1(\varvec{p}=p,\varvec{k}=k,\varvec{r}=r), \cdots , v_d= \mathbf i _d(\varvec{p}=p,\varvec{k}=k,\varvec{r}=r)\}| }{2^{|\varvec{r}|}}. \end{aligned}$$

Definition 1

We say a d-set \(\{c_1,\cdots ,c_d\}\) of intermediate computation results is

  • uniformly distributed if \(\textsf {D}_{p,k}(c_1,\cdots ,c_d)\) is a uniform distribution for any p and k.

  • secret independent if \(\textsf {D}_{p,k}(c_1,\cdots ,c_d)=\textsf {D}_{p,k'}(c_1,\cdots ,c_d)\) for any (p, k) and \((p,k')\).

Note that there is a difference between the two notions: a uniformly distributed d-set is always secret independent, but a secret independent d-set is not always uniformly distributed.

Definition 2

A program P is order-d perfectly masked if every k-set \(\{c_1,\cdots ,c_k\}\) of P such that \(k\le d\) is secret independent. When P is (order-1) perfectly masked, we may simply say it is perfectly masked.

To decide whether P is order-d perfectly masked, it suffices to check whether there exist a d-set and two pairs (p, k) and \((p,k')\) such that \(\textsf {D}_{p,k}(c_1,\cdots ,c_d)\ne \textsf {D}_{p,k'}(c_1,\cdots ,c_d)\); P is perfectly masked iff no such witness exists. In this context, the main challenge is computing \(\textsf {D}_{p,k}(c_1,\cdots ,c_d)\), which is essentially a model-counting (SAT#) problem. In the remainder of this paper, we focus on developing an efficient method for verifying (order-1) perfect masking, although our method can be extended to higher-order masking as well.
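As a naive reference point (a sketch only, exponential in \(|\varvec{r}|\)), the definitions above can be checked by enumerating all random assignments, which is exactly the model-counting view of the problem:

```python
from itertools import product

def dist(f, p, k, rand_vars):
    """D_{p,k}(c): probability that c = f(p, k, r) is 1 when the random
    bits in r are uniform over {0,1}^|r| (enumerates all 2^|r| cases)."""
    ones = sum(f(p, k, dict(zip(rand_vars, bits)))
               for bits in product((0, 1), repeat=len(rand_vars)))
    return ones / 2 ** len(rand_vars)

def perfectly_masked(f, pub_vals, key_vals, rand_vars):
    """Order-1 check of Definition 2: D_{p,k} must not depend on the key."""
    return all(dist(f, p, k1, rand_vars) == dist(f, p, k2, rand_vars)
               for p in pub_vals for k1 in key_vals for k2 in key_vals)

# k ^ r is perfectly masked; k & r is not (Pr[1] is 0 for k=0, 1/2 for k=1).
assert perfectly_masked(lambda p, k, r: k ^ r['r1'], [None], [0, 1], ['r1'])
assert not perfectly_masked(lambda p, k, r: k & r['r1'], [None], [0, 1], ['r1'])
```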

Gap in Current State of Knowledge. Existing methods for verifying masking countermeasures are either fast but inaccurate, e.g., when they rely solely on syntactic type inference (structural information provided by \({\texttt {supp}}\) in Sect. 2.1) or accurate but slow, e.g., when they rely solely on model-counting. In contrast, our method gradually refines a set of semantic type-inference rules (i.e., using \(\texttt {semd}\) instead of \({\texttt {supp}}\) as defined in Sect. 2.1) where constraint solvers (SAT and SAT#) are used on demand to resolve ambiguity and improve the accuracy of type inference. As a result, it can achieve the best of both worlds.

3 The Semantic Type Inference System

We first introduce our distribution types, which are inspired by prior work in [6, 13, 47], together with some auxiliary data structures; then, we present our inference rules.

3.1 The Type System

Let \({\texttt {T}}=\{\texttt {CST},\texttt {RUD},\texttt {SID},\texttt {NPM},\texttt {UKD}\}\) be the set of distribution types for intermediate computation results, where \({\llbracket {c}\rrbracket }\) denotes the type of \(c\leftarrow \mathbf i (\varvec{p},\varvec{k},\varvec{r})\). Specifically,

  • \({\llbracket {c}\rrbracket }=\texttt {CST}\) means c is a constant, which implies that it is side-channel leak-free;

  • \({\llbracket {c}\rrbracket }=\texttt {RUD}\) means c is randomized to uniform distribution, and hence leak-free;

  • \({\llbracket {c}\rrbracket }=\texttt {SID}\) means c is secret independent, i.e., perfectly masked;

  • \({\llbracket {c}\rrbracket }=\texttt {NPM}\) means c is not perfectly masked and thus has leaks; and

  • \({\llbracket {c}\rrbracket }=\texttt {UKD}\) means c has an unknown distribution.

Definition 3

Let \(\texttt {unq}: N \rightarrow 2^{\varvec{r}}\) and \(\texttt {dom}: N \rightarrow 2^{\varvec{r}}\) be two functions (made concrete in the sketch after this definition) such that (i) for each terminal node \(l\in L_V\), if \(\lambda _1(l)\in \varvec{r}\), then \(\texttt {unq}(l)=\texttt {dom}(l)=\{\lambda _1(l)\}\); otherwise \(\texttt {unq}(l)=\texttt {dom}(l)=\emptyset \); and (ii) for each internal node \(l\in L\), we have

  • \(\texttt {unq}(l)=(\texttt {unq}(l.{\texttt {lft}})\cup \texttt {unq}(l.{\texttt {rgt}}))\setminus ({\texttt {supp}}(l.{\texttt {lft}})\cap {\texttt {supp}}(l.{\texttt {rgt}}))\);

  • \(\texttt {dom}(l)= (\texttt {dom}(l.{\texttt {lft}})\cup \texttt {dom}(l.{\texttt {rgt}}))\cap \texttt {unq}(l) \ \text{ if } \ \lambda _2(l)=\oplus \); but \(\texttt {dom}(l)= \emptyset \) otherwise.
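Definition 3 translates into a single bottom-up pass over the DDG. The following sketch (assuming the Node class from the example in Sect. 2.1) computes supp, unq, and dom together:

```python
def analyze(node, rand_vars, memo=None):
    """One bottom-up pass computing (supp, unq, dom) per Definition 3;
    memo keeps the pass linear in |P| even when subtrees are shared."""
    memo = {} if memo is None else memo
    if id(node) in memo:
        return memo[id(node)]
    if node.op is None:                       # terminal: variable or constant
        if isinstance(node.val, str):
            supp = {node.val}
            unq = dom = {node.val} if node.val in rand_vars else set()
        else:
            supp, unq, dom = set(), set(), set()
    elif node.op == 'not':                    # unary: inherit from the child
        supp, unq, _ = analyze(node.lft, rand_vars, memo)
        dom = set()                           # dom is emptied for non-XOR ops
    else:
        sl, ul, dl = analyze(node.lft, rand_vars, memo)
        sr, ur, dr = analyze(node.rgt, rand_vars, memo)
        supp = sl | sr
        unq = (ul | ur) - (sl & sr)
        dom = (dl | dr) & unq if node.op == 'xor' else set()
    memo[id(node)] = (supp, unq, dom)
    return memo[id(node)]

# In Fig. 2: analyze(c3, {'r1','r2','r3'}) yields dom(c3) = {'r1'} (RUD),
# while analyze(c4, ...) yields dom(c4) = unq(c4) = {} (left as UKD here).
```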

Both \(\texttt {unq}(l)\) and \(\texttt {dom}(l)\) are computable in time that is linear in |P| [47]. Following the proofs in [6, 47], it is easy to establish the following observation: given an intermediate computation result \(c\leftarrow \mathbf i (\varvec{p},\varvec{k},\varvec{r})\) labeled by l, the following statements hold:

  1. if \(\texttt {dom}(l)\ne \emptyset \), then \({\llbracket {c}\rrbracket }=\texttt {RUD}\);

  2. if \({\llbracket {c}\rrbracket }=\texttt {RUD}\), then \({\llbracket {\lnot c}\rrbracket }=\texttt {RUD}\); if \({\llbracket {c}\rrbracket }=\texttt {SID}\), then \({\llbracket {\lnot c}\rrbracket }=\texttt {SID}\);

  3. if \(r\not \in \texttt {semd}(l)\) for a random bit \(r\in \varvec{r}\), then \({\llbracket {r\oplus c}\rrbracket }=\texttt {RUD}\);

  4. for every \(c'\leftarrow \mathbf i '(\varvec{p},\varvec{k},\varvec{r})\) labeled by \(l'\), if \(\texttt {semd}(l)\cap \texttt {semd}(l')=\emptyset \) and \({\llbracket {c}\rrbracket }={\llbracket {c'}\rrbracket }=\texttt {SID}\), then \({\llbracket {c \circ c'}\rrbracket }=\texttt {SID}\).

Figure 3 shows our type inference rules, which concretize these observations. When multiple rules could be applied to a node \(l\in N\), we always choose the rule that leads to \({\llbracket {l}\rrbracket }=\texttt {RUD}\). If no rule is applicable at l, we set \({\llbracket {l}\rrbracket }=\texttt {UKD}\). When the context is clear, we may use \({\llbracket {l}\rrbracket }\) and \({\llbracket {c}\rrbracket }\) interchangeably for \(\lambda _1(l)=c\). The correctness of these inference rules follows directly from the definitions.

Fig. 3. Our semantic type-inference rules. The \(\texttt {NPM}\) type is not yet used here; its inference rules will be added in Fig. 5, since they rely on the SMT solver based analyses.

Theorem 1

For every intermediate computation result \(c\leftarrow \mathbf i (\varvec{p},\varvec{k},\varvec{r})\) labeled by l,

  • if \({\llbracket {c}\rrbracket }=\texttt {RUD}\), then c is uniformly distributed, and hence perfectly masked;

  • if \({\llbracket {c}\rrbracket }=\texttt {SID}\), then c is guaranteed to be perfectly masked.

To improve efficiency, our inference rules may be applied twice, first using the \({\texttt {supp}}\) function, which extracts structural information from the program (cf. Sect. 2.1) and then using the \(\texttt {semd}\) function, which is slower to compute but also significantly more accurate. Since \(\texttt {semd}(l)\subseteq {\texttt {supp}}(l)\) for all \(l\in N\), this is always sound. Moreover, type inference is invoked for the second time only if, after the first time, \({\llbracket {l}\rrbracket }\) remains \(\texttt {UKD}\).

Example 1

When using type inference with \({\texttt {supp}}\) on the running example, we have

$$\begin{aligned} {\llbracket {r_1}\rrbracket }={\llbracket {r_2}\rrbracket }={\llbracket {r_3}\rrbracket }={\llbracket {c_1}\rrbracket }={\llbracket {c_2}\rrbracket }={\llbracket {c_3}\rrbracket }=\texttt {RUD}, \ {\llbracket {k}\rrbracket }={\llbracket {c_4}\rrbracket }={\llbracket {c_5}\rrbracket }={\llbracket {c_6}\rrbracket }=\texttt {UKD}\end{aligned}$$

When using type inference with \(\texttt {semd}\) (for the second time), we have

$$\begin{aligned} {\llbracket {r_1}\rrbracket }={\llbracket {r_2}\rrbracket }={\llbracket {r_3}\rrbracket }= {\llbracket {c_1}\rrbracket }={\llbracket {c_2}\rrbracket }={\llbracket {c_3}\rrbracket }={\llbracket {c_4}\rrbracket }={\llbracket {c_5}\rrbracket }=\texttt {RUD}, \ {\llbracket {k}\rrbracket }=\texttt {UKD}, \ {\llbracket {c_6}\rrbracket }=\texttt {SID}\end{aligned}$$

3.2 Checking Semantic Independence

Unlike \({\texttt {supp}}(l)\), which only extracts structural information from the program and hence can be computed syntactically, \(\texttt {semd}(l)\) is more expensive to compute. In this subsection, we present a method that leverages an SMT solver to check, for any intermediate computation result \(c\leftarrow \mathbf i (\varvec{p},\varvec{k},\varvec{r})\) and any random bit \(r\in \varvec{r}\), whether c semantically depends on r. Specifically, we formulate it as a satisfiability (SAT) problem over the formula \(\varPhi _s\) defined as follows:

$$\begin{aligned} \varTheta _{s}^{r=0}(c_0, \varvec{p},\varvec{k},\varvec{r}\setminus \{r\})\wedge \varTheta _{s}^{r=1}(c_1, \varvec{p},\varvec{k}, \varvec{r}\setminus \{r\})\wedge \varTheta _{s}^{\ne }(c_0,c_1), \end{aligned}$$

where \(\varTheta _{s}^{r=0}\) (resp. \(\varTheta _{s}^{r=1}\)) encodes the relation \(\mathbf i (\varvec{p},\varvec{k},\varvec{r})\) with r replaced by 0 (resp. 1), \(c_0\) and \(c_1\) are copies of c, and \(\varTheta _{s}^{\ne }\) asserts that the two outputs differ under the same remaining inputs.

In logic synthesis and optimization, when \(r\not \in \texttt {semd}(l)\), r is called a don't-care variable [36]. It is therefore easy to see why the following theorem holds.

Theorem 2

\(\varPhi _{s}\) is unsatisfiable iff the value of r does not affect the value of c, i.e., c is semantically independent of r. Moreover, the formula size of \(\varPhi _{s}\) is linear in |P|.
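Concretely, such a check can be scripted with an off-the-shelf solver. The following sketch uses Z3's Python API (our own illustration, not SCInfer's exact encoding) to show that \(c_4\) in Fig. 2 is semantically independent of \(r_1\):

```python
from z3 import Bools, BoolVal, Solver, Xor, unsat

def c4_of(k, r1, r2):
    c1, c2 = Xor(k, r2), Xor(r1, r2)
    return Xor(Xor(c2, c1), c2)              # c4 = c3 XOR c2 (Fig. 2)

k, r2 = Bools('k r2')
s = Solver()
# Two copies of c4, one with r1 = 0 and one with r1 = 1, sharing k and r2;
# Phi_s asserts that the copies differ.
s.add(c4_of(k, BoolVal(False), r2) != c4_of(k, BoolVal(True), r2))
# unsat: no assignment makes them differ, so c4 is semantically
# independent of r1 (r1 is a don't-care variable).
assert s.check() == unsat
```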

Fig. 4. Our composition rules for handling sets of intermediate computation results.

3.3 Verifying Higher-Order Masking

The type system so far targets first-order masking. We now outline how it extends to verify higher-order masking. Generally speaking, we have to check, for every k-set \(\{c_1,\cdots ,c_k\}\) of intermediate computation results with \(k\le d\), that the joint distribution is either randomized to uniform (\(\texttt {RUD}\)) or secret independent (\(\texttt {SID}\)).

To tackle this problem, we lift \({\texttt {supp}}\), \(\texttt {semd}\), \(\texttt {unq}\), and \(\texttt {dom}\) to sets of computation results as follows (a short sketch after the list makes the lifting concrete): for each k-set \(\{c_1,\cdots ,c_k\}\),

  • \({\texttt {supp}}(c_1,\cdots ,c_k) =\bigcup _{i\in [k]}{\texttt {supp}}(c_i)\);

  • \(\texttt {semd}(c_1,\cdots ,c_k) =\bigcup _{i\in [k]}\texttt {semd}(c_i)\);

  • \(\texttt {unq}(c_1,\cdots ,c_k) =\big (\bigcup _{i\in [k]}\texttt {unq}(c_i)\big )\setminus \bigcup _{i,j\in [k],\, i\ne j}\big ({\texttt {supp}}(c_i)\cap {\texttt {supp}}(c_j)\big )\); and

  • \(\texttt {dom}(c_1,\cdots ,c_k) =\big (\bigcup _{i\in [k]}\texttt {dom}(c_i)\big )\cap \texttt {unq}(c_1,\cdots ,c_k)\).
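Under the same Node representation, the lifted functions reuse the per-node analyze pass from Sect. 3.1 (again a sketch; the i ≠ j restriction implements the definition above, and the semd lifting is the analogous union, omitted here):

```python
def lift(nodes, rand_vars):
    """Lifted supp/unq/dom for a k-set of nodes, per the definitions above."""
    triples = [analyze(n, rand_vars) for n in nodes]
    supp = set().union(*(s for s, _, _ in triples))
    # variables shared by the supports of two *different* results (i != j)
    shared = {v for i, (si, _, _) in enumerate(triples)
                for j, (sj, _, _) in enumerate(triples)
                if i != j for v in si & sj}
    unq = set().union(*(u for _, u, _ in triples)) - shared
    dom = set().union(*(d for _, _, d in triples)) & unq
    return supp, unq, dom
```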

Our inference rules are extended by adding the composition rules shown in Fig. 4.

Theorem 3

For every k-set \(\{c_1,\cdots ,c_k\}\) of intermediate computation results,

  • if \({\llbracket {c_1,\cdots ,c_k}\rrbracket }=\texttt {RUD}\), then \(\{c_1,\cdots ,c_k\}\) is guaranteed to be uniformly distributed, and hence perfectly masked;

  • if \({\llbracket {c_1,\cdots ,c_k}\rrbracket }=\texttt {SID}\), then \(\{c_1,\cdots ,c_k\}\) is guaranteed to be perfectly masked.

We remark that the \(\texttt {semd}\) function in these composition rules could also be safely replaced by the \({\texttt {supp}}\) function, just as before. Furthermore, to more efficiently verify that a program P is perfectly masked against order-d attacks, we can incrementally apply the type inference for each k-set, where \(k=1,2,\ldots , d\).

4 The Gradual Refinement Approach

In this section, we present our method for gradually refining the type inference system by leveraging SMT solver based techniques. Adding solvers to the sound type system makes it complete as well, thus allowing it to detect side-channel leaks whenever they exist, in addition to proving the absence of such leaks.

4.1 SMT-Based Approach

For a given computation \(c\leftarrow \mathbf i (\varvec{p},\varvec{k},\varvec{r})\), the verification of perfect masking (Definition 2) can be reduced to the satisfiability of the logical formula \(\varPsi \) defined as follows:

$$\begin{aligned} \exists \varvec{p}.\exists \varvec{k}.\exists \varvec{k}'. \big ( \mathop {\sum }\nolimits _{\varvec{v}_r\in \{0,1\}^{|\varvec{r}|}}\mathbf i (\varvec{p},\varvec{k},\varvec{v}_r) \ne \mathop {\sum }\nolimits _{\varvec{v}_r\in \{0,1\}^{|\varvec{r}|}}\mathbf i (\varvec{p},\varvec{k}',\varvec{v}_r) \big ). \end{aligned}$$

Intuitively, given values \((\varvec{v}_p,\varvec{v}_k)\) of \((\varvec{p},\varvec{k})\), \(count=\sum _{\varvec{v}_r\in \{0,1\}^{|\varvec{r}|}}\mathbf i (\varvec{v}_p,\varvec{v}_k,\varvec{v}_r)\) denotes the number of assignments of the random variables \(\varvec{r}\) under which \(\mathbf i (\varvec{v}_p,\varvec{v}_k,\varvec{r})\) evaluates to logical 1. When the random bits in \(\varvec{r}\) are uniformly distributed over \(\{0,1\}\), \(\frac{count}{2^{|\varvec{r}|}}\) is the probability of \(\mathbf i (\varvec{v}_p,\varvec{v}_k,\varvec{r})\) being logical 1 for the given pair \((\varvec{v}_p,\varvec{v}_k)\). Therefore, \(\varPsi \) is unsatisfiable if and only if c is perfectly masked.

Following Eldib et al. [26, 27], we encode the formula \(\varPsi \) as a quantifier-free first-order logic formula to be solved by an off-the-shelf SMT solver (e.g., Z3); a solver-level sketch follows the list below:

$$\begin{aligned} (\mathop {\bigwedge }\nolimits _{r=0}^{2^{|\varvec{r}|}-1}\varTheta _k^r)\wedge (\mathop {\bigwedge }\nolimits _{r=0}^{2^{|\varvec{r}|}-1}\varTheta _{k'}^r)\wedge \varTheta _{b2i}\wedge \varTheta _{\ne } \end{aligned}$$
  • \(\varTheta _k^r\) (resp. \(\varTheta _{k'}^r\)) for each \(r\in \{0,\cdots , 2^{|\varvec{r}|}-1\}\): encodes a copy of the input-output relation of \(\mathbf i (\varvec{p},\varvec{k},r)\) (resp. \(\mathbf i (\varvec{p},\varvec{k'},r)\)), obtained by replacing \(\varvec{r}\) with the concrete value r. There are \(2^{|\varvec{r}|}\) distinct copies, all of which share the same plaintext \(\varvec{p}\).

  • \(\varTheta _{b2i}\): converts Boolean outputs of these copies to integers (true becomes 1 and false becomes 0) so that the number of assignments can be counted.

  • \(\varTheta _{\ne }\): asserts the two summations, for \(\varvec{k}\) and \(\varvec{k'}\), differ.
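The following sketch shows this encoding for a single one-bit instruction using Z3's Python API (public inputs are omitted for brevity; this is an illustration of the encoding, not SCInfer's implementation):

```python
from itertools import product
from z3 import And, Bool, BoolVal, If, Solver, Sum, Xor, sat

def leak_detected(f, n_rand):
    """Sat-check of Psi for a one-bit result c = f(k, r): satisfiable iff
    some keys k, k' give different counts over all 2^|r| values of r."""
    k1, k2 = Bool('k1'), Bool('k2')
    assignments = list(product([False, True], repeat=n_rand))
    to_int = lambda key: Sum([If(f(key, [BoolVal(b) for b in bits]), 1, 0)
                              for bits in assignments])      # Theta_b2i
    s = Solver()
    s.add(to_int(k1) != to_int(k2))                          # Theta_neq
    return s.check() == sat

assert not leak_detected(lambda k, r: Xor(k, r[0]), 1)  # k ^ r: masked
assert leak_detected(lambda k, r: And(k, r[0]), 1)      # k & r: leaky
```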

Example 2

In the running example, verifying whether node \(c_4\) is perfectly masked requires the SMT-based analysis. For brevity, we omit the detailed logical formula but point out that, by invoking the SMT solver six times, one obtains the following result: \({\llbracket {c_1}\rrbracket }={\llbracket {c_2}\rrbracket }={\llbracket {c_3}\rrbracket } ={\llbracket {c_4}\rrbracket }={\llbracket {c_5}\rrbracket }={\llbracket {c_6}\rrbracket }=\texttt {SID}\).

Although the SMT formula size is linear in |P|, the number of distinct copies is exponential in the number of random bits used in the computation. Thus, the approach cannot be applied to large programs. To overcome this problem, incremental algorithms [26, 27] were proposed to reduce the formula size using partitioning and heuristic reduction.

Incremental SMT-Based Algorithm. Given a computation \(c\leftarrow \mathbf i (\varvec{p},\varvec{k},\varvec{r})\) that corresponds to a subtree T rooted at l in the DDG, we search for an internal node \(l_s\) in T (a cut-point) such that \(\texttt {dom}(l_s)\cap \texttt {unq}(l)\ne \emptyset \). A cut-point is maximal if there is no other cut-point from l to \(l_s\). Let \(\widehat{T}\) be the simplified tree obtained from T by replacing every subtree rooted by a maximal cut-point with a random variable from \(\texttt {dom}(l_s)\cap \texttt {unq}(l)\). Then, \(\widehat{T}\) is \(\texttt {SID}\) iff T is \(\texttt {SID}\).

The main observation is that if \(l_s\) is a cut-point, there is a random variable \(r\in \texttt {dom}(l_s)\cap \texttt {unq}(l)\), which implies that \(\lambda _1(l_s)\) is \(\texttt {RUD}\). Here, \(r\in \texttt {unq}(l)\) implies that \(\lambda _1(l_s)\) can be seen as a fresh random variable when we evaluate l. Consider the node \(c_3\) in our running example: it is easy to see that \(r_1\in \texttt {dom}(c_2)\cap \texttt {unq}(c_3)\). Therefore, for the purpose of verifying \(c_3\), the entire subtree rooted at \(c_2\) can be replaced by the random variable \(r_1\).
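A sketch of the cut-point search, reusing the analyze pass from Sect. 3.1 (ours, for illustration): the traversal stops at the first qualifying node on each path from the root, which is exactly maximality.

```python
def find_cut_points(root, rand_vars):
    """Maximal cut-points of the subtree rooted at root: internal nodes l_s
    with dom(l_s) & unq(root) nonempty. Descendants of a qualifying node
    are not explored, so the reported cut-points are maximal."""
    _, unq_root, _ = analyze(root, rand_vars)
    cuts = []
    def visit(node):
        if node is None or node.op is None:      # skip terminals
            return
        _, _, dom = analyze(node, rand_vars)
        witness = dom & unq_root
        if node is not root and witness:
            cuts.append((node, min(witness)))    # replace subtree by this r
        else:
            visit(node.lft)
            visit(node.rgt)
    visit(root)
    return cuts

# For c3 in Fig. 2, find_cut_points(c3, {'r1','r2','r3'}) reports (c2, 'r1'):
# the subtree rooted at c2 may be replaced by the random variable r1.
```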

In addition to partitioning, heuristic rules [26, 27] can be used to simplify SMT solving. (1) When constructing the SMT formula for c, all random variables in \({\texttt {supp}}(l)\setminus \texttt {semd}(l)\), which are don't cares, can be replaced by constant 1 or 0. (2) The No-Key and Sid rules in Fig. 3 with the \({\texttt {supp}}\) function are used to skip some SMT checks.

Example 3

When applying the incremental SMT-based approach to our running example, \(c_1\) has to be decided by SMT, but \(c_2\) is skipped due to the No-Key rule.

As for \(c_3\), since \(r_1\in \texttt {dom}(c_2)\cap \texttt {unq}(c_3)\), \(c_2\) is a cut-point and the subtree rooted at \(c_2\) can be replaced by \(r_1\), leading to the simplified computation \(r_1\oplus (r_2\oplus k)\), which is subsequently skipped by the Sid rule with \({\texttt {supp}}\). Note that this Sid rule is not applicable to the original subtree, because \(r_2\) occurs in the support of both children of \(c_3\).

There is no cut-point for \(c_4\), so it is checked using the SMT solver. But since \(c_4\) is semantically independent of \(r_1\) (a don't-care variable), we replace \(r_1\) by 1 (or 0) when constructing the SMT formula, to reduce its size.

Fig. 5. Complementary rules used during refinement of the type inference (Fig. 3).

Fig. 6. Example of feeding solver results back to the type system.

4.2 Feeding SMT-Based Analysis Results Back to Type System

Consider a scenario where initially the type system (cf. Sect. 3) failed to resolve a node l, i.e., \({\llbracket {l}\rrbracket }=\texttt {UKD}\), but the SMT-based approach resolved it as either \(\texttt {NPM}\) or \(\texttt {SID}\). Such results should be fed back to improve the type system, which may lead to the following two favorable outcomes: (1) marking more nodes as perfectly masked (\(\texttt {RUD}\) or \(\texttt {SID}\)) and (2) marking more nodes as leaky (\(\texttt {NPM}\)), which means we can avoid expensive SMT calls for these nodes. More specifically, if SMT-based analysis shows that l is perfectly masked, the type of l can be refined to \({\llbracket {l}\rrbracket }=\texttt {SID}\); feeding it back to the type system allows us to infer more types for nodes that structurally depend on l.

On the other hand, if SMT-based analysis shows l is not perfectly masked, the type of l can be refined to \({\llbracket {l}\rrbracket }=\texttt {NPM}\); feeding it back allows the type system to infer that other nodes may be \(\texttt {NPM}\) as well. To achieve this, we add the \(\texttt {NPM}\)-related type inference rules shown in Fig. 5. When they are added to the type system of Fig. 3, more \(\texttt {NPM}\) type nodes will be deduced, which allows our method to skip the (more expensive) SMT-based checks for these nodes.

Example 4

Consider the example DDG in Fig. 6. By applying the original type inference approach with either \({\texttt {supp}}\) or \(\texttt {semd}\), we have

$$\begin{aligned} {\llbracket {c_1}\rrbracket }={\llbracket {c_4}\rrbracket }=\texttt {RUD}, \ {\llbracket {c_2}\rrbracket }={\llbracket {c_3}\rrbracket }={\llbracket {c_6}\rrbracket }=\texttt {SID}, \ {\llbracket {c_5}\rrbracket }={\llbracket {c_7}\rrbracket }=\texttt {UKD}. \end{aligned}$$

In contrast, by applying SMT-based analysis to \(c_5\), we can deduce \({\llbracket {c_5}\rrbracket }=\texttt {SID}\). Feeding \({\llbracket {c_5}\rrbracket }=\texttt {SID}\) back to the type system and then applying the Sid rule to \(c_7=c_5\oplus c_6\), we are able to deduce \({\llbracket {c_7}\rrbracket }=\texttt {SID}\). Without refinement, this would not have been possible.

4.3 The Overall Algorithm

Algorithm 1. The overall SCInfer procedure (pseudo code).

Having presented all the components, we now describe the overall procedure, which integrates the semantic type system and the SMT-based method for gradual refinement. Algorithm 1 shows the pseudo code. Given the program P, the sets of public (\(\varvec{p}\)), secret (\(\varvec{k}\)), and random (\(\varvec{r}\)) variables, and an empty map \(\pi \), it invokes SCInfer(\(P,\varvec{p},\varvec{k},\varvec{r},\pi \)) to traverse the DDG in a topological order and annotate every node l with a distribution type from \({\texttt {T}}\). The subroutine TypeInfer implements the type inference rules outlined in Figs. 3 and 5, where the parameter f can be either \({\texttt {supp}}\) or \(\texttt {semd}\).

SCInfer first deduces the type of each node \(l\in N\) by invoking TypeInfer with \(f={\texttt {supp}}\). Once a node l is annotated as \(\texttt {UKD}\), a simplified subtree \(\widehat{P}\) of the subtree rooted at l is constructed. Next, TypeInfer with \(f=\texttt {semd}\) is invoked to resolve the \(\texttt {UKD}\) node in \(\widehat{P}\). If \(\pi (l)\) becomes non-\(\texttt {UKD}\) afterward, TypeInfer with \(f={\texttt {supp}}\) is invoked again to quickly deduce the types of the fan-out nodes in P. But if \(\pi (l)\) remains \(\texttt {UKD}\), SCInfer invokes the incremental SMT-based approach to decide whether l is either \(\texttt {SID}\) or \(\texttt {NPM}\). This is sound and complete, unless the SMT solver runs out of time/memory, in which case \(\texttt {UKD}\) is assigned to l.
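Since the pseudo code appears as Algorithm 1, the following Python sketch captures its control flow; type_infer, simplify, and smt_perfect_masking are assumed helpers corresponding to the rules of Figs. 3 and 5, the cut-point reduction of Sect. 4.1, and the model-counting check, respectively.

```python
def scinfer(ddg_nodes, type_infer, simplify, smt_perfect_masking):
    """Control-flow sketch of Algorithm 1 (helper functions are assumed):
    type_infer(l, pi, mode) applies the rules of Figs. 3 and 5 with mode
    'supp' or 'semd'; simplify performs the cut-point reduction of
    Sect. 4.1; smt_perfect_masking is the model-counting check."""
    pi = {}
    for l in ddg_nodes:                      # topological order over the DDG
        pi[l] = type_infer(l, pi, 'supp')    # fast syntactic pass
        if pi[l] == 'UKD':
            sub = simplify(l)                # cut-point reduction
            pi[l] = type_infer(sub, pi, 'semd')  # semantic pass (SAT queries)
            if pi[l] == 'UKD':
                # last resort: model counting (SAT#); on timeout keep UKD
                pi[l] = 'SID' if smt_perfect_masking(sub) else 'NPM'
        # pi[l] is now visible to type_infer at later nodes, realizing
        # the feedback loop of Sect. 4.2
    return pi
```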

Theorem 4

For every intermediate computation result \(c\leftarrow \mathbf i (\varvec{p},\varvec{k},\varvec{r})\) labeled by l, SCInfer is guaranteed to return sound and complete results:

  • \(\pi (l)=\texttt {RUD}\) iff c is uniformly distributed, and hence perfectly masked;

  • \(\pi (l)=\texttt {SID}\) iff c is secret independent, i.e., perfectly masked;

  • \(\pi (l)=\texttt {NPM}\) iff c is not perfectly masked (leaky).

If a timeout or memory limit is used to bound the execution of the SMT solver, it is also possible that \(\pi (l)=\texttt {UKD}\), meaning c has an unknown distribution (it may or may not be perfectly masked). It is interesting to note that, if we regard \(\texttt {UKD}\) as a potential leak and at the same time bound (or even disable) the SMT-based analysis, Algorithm 1 degenerates to a sound type system that is fast, albeit potentially less accurate.

Table 1. Benchmark statistics.

5 Experiments

We have implemented our method in a verification tool named SCInfer, which uses Z3 [23] as the underlying SMT solver. We also implemented the syntactic type inference approach [47] and the incremental SMT-based approach [26, 27] in the same tool for experimental comparison purposes. We conducted experiments on publicly available cryptographic software implementations, including fragments of AES and MAC-Keccak [26, 27]. Our experiments were conducted on a machine with 64-bit Ubuntu 12.04 LTS, Intel Xeon(R) CPU E5-2603 v4, and 32 GB RAM.

Overall, the results of our experiments show that (1) SCInfer is significantly more accurate than the prior syntactic type inference method [47]; indeed, it resolved tens of thousands of \(\texttt {UKD}\) cases reported by the prior technique; and (2) SCInfer is at least twice as fast as the prior SMT-based verification method [26, 27] on the large programs while maintaining the same accuracy; for example, SCInfer verified the benchmark named P12 in a few seconds, whereas the prior SMT-based method took more than an hour.

5.1 Benchmarks

Table 1 shows the detailed statistics of the benchmarks, including seventeen examples (P1–P17), all of which have nonlinear operations. Columns 1 and 2 show the name of the program and a short description. Column 3 shows the number of instructions in the probabilistic Boolean program. Column 4 shows the number of DDG nodes denoting intermediate computation results. The remaining columns show the number of bits in the secret, public, and random variables, respectively. Note that the number of random variables used in each individual computation is far smaller than the total number in the program. All these programs are transformed into Boolean programs where each instruction has at most two operands. Since the statistics were collected from the transformed code, they may differ slightly from the statistics reported in prior work [26, 27].

In particular, P1–P5 are masking examples originating from [10], P6–P7 originate from [15], and P8–P9 are the MAC-Keccak computation-reordered examples originating from [11]; P10–P11 are two experimental masking schemes for the Chi function in MAC-Keccak. Among the larger programs, P12–P17 are regenerated versions of the MAC-Keccak reference code submitted to the SHA-3 competition held by NIST, where P13–P16 implement the masking of the Chi function using different masking schemes and P17 implements the de-masking of the Pi function.

Table 2. Experimental results: comparison of three approaches.

5.2 Experimental Results

We compared the performance of SCInfer, the purely syntactic type inference method (denoted Syn. Infer) and the incremental SMT-based method (denoted by SMT App). Table 2 shows the results. Column 1 shows the name of each benchmark. Column 2 shows whether it is perfectly masked (ground truth). Columns 3–4 show the results of the purely syntactic type inference method, including the number of nodes inferred as \(\texttt {UKD}\) type and the time in seconds. Columns 5–7 (resp. Columns 8–10) show the results of the incremental SMT-based method (resp. our method SCInfer), including the number of leaky nodes (\(\texttt {NPM}\) type), the number of nodes actually checked by SMT, and the time.

Compared with the syntactic type inference method, our approach is significantly more accurate (e.g., see P4, P5 and P15). Furthermore, the time taken by the two methods is comparable on small programs. On the large programs that are not perfectly masked (i.e., P13–P17), our method is slower, since SCInfer has to resolve, using SMT, the \(\texttt {UKD}\) nodes reported by syntactic inference. However, it is interesting to note that, on the perfectly masked large program (P12), our method is faster.

Moreover, the \(\texttt {UKD}\) type nodes in P4, reported by the purely syntactic type inference method, are all proved to be perfectly masked by our semantic type inference system, without calling the SMT solver at all. As for the three \(\texttt {UKD}\) type nodes in P5, our method proves them all by invoking the SMT solver only twice; that is, feeding the newly discovered \(\texttt {SID}\) types back to our type system improves its accuracy, which turns the third \(\texttt {UKD}\) node into \(\texttt {SID}\).

Finally, compared with the original SMT-based approach, our method is at least twice as fast on the large programs (e.g., P12–P17). Furthermore, the number of nodes actually checked by invoking the SMT solver is also lower than in the original SMT-based approach (e.g., P4–P5 and P17). In particular, there are 3200 \(\texttt {UKD}\) type nodes in P17 that are refined into the \(\texttt {NPM}\) type by our new inference rules (cf. Fig. 5), thus avoiding the more expensive SMT calls.

To sum up, the results of our experiments show that SCInfer is fast in obtaining proofs for perfectly masked programs, retains the ability to detect real leaks in imperfectly masked programs, and is scalable enough to handle realistic applications.

Table 3. Detailed statistics of our new method.

5.3 Detailed Statistics

Table 3 shows more detailed statistics of our approach. Specifically, Columns 2–5 show the number of nodes of each distribution type deduced by our method. Column 6 shows the number of nodes actually checked by SMT, with the corresponding time shown in Column 9. Column 7 shows the time spent on computing the \(\texttt {semd}\) function, which involves solving SAT problems. Column 8 shows the time spent on computing the don't-care variables. The last column shows the total time taken by SCInfer.

The results in Table 3 indicate that most of the DDG nodes in these benchmark programs are either \(\texttt {RUD}\) or \(\texttt {SID}\), and almost all of them can be quickly deduced by our type system. This explains why our new method is more efficient than the original SMT-based approach. Indeed, the original SMT-based approach spent a large amount of time on the static analysis part, which performs code partitioning and applies the heuristic rules (cf. Sect. 4.1), whereas our method spent more of its time on computing the \(\texttt {semd}\) function.

Column 4 shows that, at least in these benchmark programs, Boolean constants are rare. Columns 5–6 show that, when our refined type system fails to prove perfect masking, the computation is usually indeed not perfectly masked. Columns 7–9 show that, in our integrated method, most of the time is actually spent computing \(\texttt {semd}\) and the don't-care variables (SAT), while the time taken by the SMT solver to conduct model counting (SAT#) is relatively small.

6 Related Work

Many masking countermeasures [15, 17, 34, 37, 41, 43, 46, 48, 50, 51, 52] have been published over the years: although they differ in adversary models, cryptographic algorithms and compactness, a common problem is the lack of efficient tools to formally prove their correctness [21, 22]. Our work aims to bridge this gap. It differs from simulation-based techniques [3, 33, 53], which aim to detect leaks only, as opposed to proving their absence. It also differs from techniques designed for other types of side channels such as timing [2, 38], faults [12, 29] and caches [24, 35, 40], or from computing security bounds for probabilistic countermeasures against remote attacks [45].

Although some verification tools have been developed for this application [6, 7, 10, 13, 14, 20, 26, 27, 47], they are either fast but inaccurate (e.g., type-inference techniques) or accurate but slow (e.g., model-counting techniques). For example, Bayrak et al. [10] developed a leak detector that checks whether a computation result is logically dependent on the secret and, at the same time, logically independent of any random variable. It is fast but not accurate, in that many leaky nodes could be incorrectly proved [26, 27]. In contrast, the model-counting based method proposed by Eldib et al. [26, 27, 28] is accurate, but also significantly less scalable, because the size of the logical formulas it needs to build is exponential in the number of random variables. Moreover, for higher-order masking, their method is still not complete.

Our gradual refinement of a set of semantic type inference rules was inspired by recent work on proving probabilistic non-interference [6, 47], which exploits the unique characteristics of invertible operations. Similar ideas were explored in [7, 14, 20] as well. However, these prior techniques differ significantly from our method because their type-inference rules are syntactic and fixed, whereas ours are semantic and refined based on SMT solver based analysis (SAT and SAT#). In terms of accuracy, numerous unknowns occurred in the experimental results of [47], and two obviously perfectly masked cases were not proved in [6]. Finally, although higher-order masking was addressed by prior techniques [13], they were limited to linear operations, whereas our method can handle both first-order and higher-order masking with non-linear operations.

An alternative way to address the model-counting problem [4, 18, 19, 32] is to use satisfiability modulo counting, a generalization of SMT extended with counting constraints [31]. Toward this end, Fredrikson and Jha [31] developed an efficient decision procedure for linear integer arithmetic (LIA) based on Barvinok's algorithm [8] and also applied their approach to differential privacy.

Another related line of research is automatically synthesizing countermeasures [1, 7, 9, 16, 25, 44, 54] as opposed to verifying them. While methods in [1, 7, 9, 44] rely on compiler-like pattern matching, the ones in [16, 25, 54] use inductive program synthesis based on the SMT approach. These emerging techniques, however, are orthogonal to our work reported in this paper. It would be interesting to investigate whether our approach could aid in the synthesis of masking countermeasures.

7 Conclusions and Future Work

We have presented a refinement based method for proving that a piece of cryptographic software code is free of power side-channel leaks. Our method relies on a set of semantic inference rules to reason about distribution types of intermediate computation results, coupled with an SMT solver based procedure for gradually refining these types to increase accuracy. We have implemented our method and demonstrated its efficiency and effectiveness on cryptographic benchmarks. Our results show that it outperforms state-of-the-art techniques in terms of both efficiency and accuracy.

For future work, we plan to evaluate our type inference system on higher-order masking, extend it to handle integer programs as opposed to bit-blasting them to Boolean programs, e.g., using satisfiability modulo counting [31], and investigate the synthesis of masking countermeasures based on our new verification method.