1 Introduction

1.1 Helper Data Systems

The past decade has seen considerable interest in the field of security with noisy data. In several security applications it is necessary to reproducibly extract secret data from noisy measurements on a physical system. One such application is read-proof storage of cryptographic keys using physical unclonable functions (PUFs) [5, 16, 18, 19, 20]. Another application is the privacy-preserving storage of biometric data.

Storage of keys in nonvolatile digital memory can often be considered insecure because of the vulnerability to physical attacks. (For instance, fuses can be optically inspected with a microscope; flash memory may be removed and read out.) PUFs provide an alternative way to store keys, namely in analog form, which allows the designer to exploit the inscrutability of analog physical behavior. Keys stored in this way are sometimes referred to as Physically Obfuscated Keys (POKs) [12]. In both the biometric case and the PUF/POK case, one faces the problem that some form of error correction has to be performed, but under the constraint that the redundancy data, which are visible to attackers, do not endanger the secret extracted from the physical measurement.

This problem is solved by a special security primitive, the Helper Data System (HDS). An HDS in its most general form is shown in Fig. 1. The Gen procedure takes as input a measurement X. It outputs a secret S and (public) Helper Data W. The helper data are stored. In the reconstruction phase, a fresh measurement \(X'\) is obtained. Typically, \(X'\) is a noisy version of X, close to X (in terms of, e.g., Euclidean distance or Hamming distance) but not necessarily identical. The Rec (reconstruction) procedure takes \(X'\) and W as input. It outputs \(\hat{S}\), an estimate of S. If \(X'\) is not too noisy then \(\hat{S}=S\).

Two special cases of the general HDS are the Secure Sketch (SS) and the fuzzy extractor (FE) [10]. The Secure Sketch has \(S=X\) (and \(\hat{S}=\hat{X}\), an estimator for X). If X is not uniformly distributed, then S is not uniform. The SS is suitable for privacy-preserving biometrics, where high entropy of S (given W) is required, but not uniformity. The fuzzy extractor is required to have a (nearly) uniform S given W. The FE is typically used for extracting keys from PUFs and POKs.

1.2 The code offset method (COM)

By way of example we briefly present the code offset method [4, 9, 10, 14, 21], the oldest and best-known HDS. The COM makes use of a linear code \({\mathcal {C}}\) with k-bit messages and n-bit codewords. The syndrome function is denoted as \(\mathtt{Syn}: \{0,1\} ^n\rightarrow \{0,1\} ^{n-k}\). The code supports a syndrome decoding algorithm \(\mathtt{SynDec}: \{0,1\} ^{n-k}\rightarrow \{0,1\} ^n\) that maps syndromes to error patterns. Use is also made of a key derivation function \(\mathtt{KeyDeriv}: \{0,1\} ^n\times \{0,1\} ^*\rightarrow \{0,1\} ^\ell \), with \(\ell \le k\). Below we show how the COM can be used as an FE for either a uniform or a non-uniform source X.

Fig. 1 Data flow in a generic Helper Data System

Enrollment:

The enrollment measurement gives \(X\in \{0,1\} ^n\). The helper data are computed as \(W=\mathtt{Syn}\,X\). The key is computed as \(S=\mathtt{KeyDeriv}(X,R)\), where R is (optional) public randomness. The W and R are stored.

Reconstruction:

A fresh measurement of the PUF gives \(X'\in \{0,1\} ^n\). The estimator for X is computed as

$$\begin{aligned} \hat{X}=X'\oplus \mathtt{SynDec}(W\oplus \mathtt{Syn}\,X'), \end{aligned}$$
(1)

and the reconstructed key is \(\hat{S}=\mathtt{KeyDeriv}(\hat{X},R)\).
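As an illustration, the following minimal Python sketch instantiates the COM enrollment and reconstruction above. A toy blockwise [3,1] repetition code stands in for the linear code \({\mathcal {C}}\) and SHA-256 stands in for \(\mathtt{KeyDeriv}\); these choices and all parameter values are assumptions made for the example, not part of the scheme description.

```python
import hashlib
import os
import random

# Toy [3,1] repetition code applied blockwise to an n-bit string (n divisible by 3).
def syn(x):
    """Syndrome Syn(x): two parity bits per 3-bit block."""
    w = []
    for i in range(0, len(x), 3):
        b = x[i:i+3]
        w += [b[0] ^ b[1], b[0] ^ b[2]]
    return w

def syn_dec(s):
    """SynDec: map a syndrome to the minimum-weight error pattern (1 error per block)."""
    table = {(0, 0): [0, 0, 0], (1, 0): [0, 1, 0], (0, 1): [0, 0, 1], (1, 1): [1, 0, 0]}
    e = []
    for i in range(0, len(s), 2):
        e += table[(s[i], s[i+1])]
    return e

def key_deriv(x, r):
    """Toy KeyDeriv: hash of X and the public randomness R."""
    return hashlib.sha256(bytes(x) + r).hexdigest()

def com_enroll(x):
    r = os.urandom(16)                 # optional public randomness R
    w = syn(x)                         # helper data W = Syn X
    return key_deriv(x, r), (w, r)     # key S and stored data (W, R)

def com_reconstruct(x_prime, stored):
    w, r = stored
    e = syn_dec([a ^ b for a, b in zip(w, syn(x_prime))])   # SynDec(W xor Syn X')
    x_hat = [a ^ b for a, b in zip(x_prime, e)]              # Eq. (1)
    return key_deriv(x_hat, r)

# demo: 30-bit X, one bit error in X'
x = [random.randint(0, 1) for _ in range(30)]
x_prime = x.copy()
x_prime[4] ^= 1
s, stored = com_enroll(x)
assert com_reconstruct(x_prime, stored) == s
```

The sketch corrects at most one bit error per 3-bit block; a real deployment would of course use a code matched to the expected error rate.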

1.3 The problem of bias

After seeing the helper data W as specified above, an attacker has uncertainty \(\mathsf{H}(X|W)\) about X. As W is a function of X, we have \(\mathsf{H}(X|W)=\mathsf{H}(X|\mathtt{Syn}\, X)=\mathsf{H}(X)-\mathsf{H}(\mathtt{Syn}\,X)\). Let us consider the simplest possible noise model, the binary symmetric channel (BSC). In the case of a BSC with bit error rate \(\beta \), the code’s redundancy has to satisfy \(n-k\ge n h(\beta )\) in order for the error correction to work. (Here, h denotes the binary entropy function, \(h(\beta )= -\beta \log \beta -(1-\beta )\log (1-\beta )\), with ‘log’ the base-2 logarithm.) More generally, \(\mathsf{H}(\mathtt{Syn}\,X)\) has to be at least equal to the entropy of the noise. Hence, the COM helper data in the BSC case leak at least \(nh(\beta )\) bits of information about X. This becomes a problem when X itself does not have much entropy, which occurs for instance if the bits in X are highly biased [13, 15]. Note that the problem is not fundamental: if the bias parameter (see Sect. 2) is denoted as p, the secrecy capacity is \(nh(p+\beta -2p\beta )-nh(\beta )\), which is positive.
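To see the orders of magnitude, the snippet below evaluates both quantities for an illustrative choice of \(n,p,\beta\) (the numbers are assumptions, not taken from the paper). With \(p=\beta\) the plain COM leaks as much as the source contains, yet the secrecy capacity is still positive.

```python
from math import log2

def h(x):
    """Binary entropy function (base-2 logarithm)."""
    return 0.0 if x in (0.0, 1.0) else -x*log2(x) - (1-x)*log2(1-x)

n, p, beta = 640, 0.1, 0.1              # illustrative bias p and bit error rate beta
entropy  = n * h(p)                     # entropy of the biased source
leakage  = n * h(beta)                  # minimum leakage of the COM helper data under a BSC
capacity = n * h(p + beta - 2*p*beta) - n * h(beta)

print(f"H(X) = {entropy:.0f} bits, COM leakage >= {leakage:.0f} bits")
print(f"secrecy capacity = {capacity:.0f} bits (positive)")
```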

A solution was proposed by Maes et al. [17]. Their approach is to combine debiasing with error correction. For the debiasing they use the von Neumann algorithm in a single-pass or multi-pass manner. Their helper data comprise a selection ‘mask’ that helps the reconstruction algorithm to identify the locations where von Neumann should yield output.

In this paper we follow a simpler approach similar to the Index-Based Syndrome (IBS) method [22]. In IBS the helper data consist of an ordered list of pointers to locations in X; the content of X in those locations together forms a codeword.

1.4 Contributions and outline

We introduce an alternative solution to the bias problem in helper data systems. We follow the condense-then-fuzzy-extract philosophy proposed by Canetti et al. [7] as one of the available options when faced with ‘more noise than entropy’ scenarios. Condensing means mapping the source variable X to a smaller space such that most of the entropy of X is retained, but the noise entropy is greatly reduced. Our way of condensing the source is to restrict X to the bit positions \({\mathcal {U}}\), with \({\mathcal {U}}\subset \{1,\ldots ,n\}\) a random subset containing all the rare symbols. The set \({\mathcal {U}}\) becomes part of the helper data.

Our \({\mathcal {U}}\) bears some similarity to the von Neumann mask in [17], but there are important differences. (i) The size of \({\mathcal {U}}\) is tunable. (ii) We can extract source information based on the legitimate party’s ability to distinguish \({\mathcal {U}}\) from fake instances of \({\mathcal {U}}\) when a ‘spamming’ technique similar to [21] is applied.

The outline of this paper is as follows. Section 2 gives the details of the scheme. Section 3 analyzes the extractable entropy and the practicality of the ‘spamming’ option. Section 4 compares our scheme to other debiasing schemes. In Sect. 5 we make some remarks on the use of min-entropy. Section 6 summarizes and suggests future work.

2 Debiasing based on subset selection

We will use the following notation. The set \(\{1,\ldots ,n\}\) is written as [n]. The notation \(X_{\mathcal {U}}\) means X restricted to the positions specified in \({\mathcal {U}}\). Set difference is written as ‘\(\setminus \).’ A string consisting of n zeroes is written as \(0^n\). The Hamming weight of X is denoted as w(X). We will consider a source \(X\in \{0,1\} ^n\) made up of i.i.d. bits \(X_i\) following a Bernoulli distribution with parameter p, i.e., \(\mathrm{Pr}[X_i=1]=p\). Without loss of generality we take \(p\in (0,{\textstyle \frac{1}{2}})\). In particular we are interested in the case \(p<\beta \) where direct application of the COM fails. The notation ‘\(\log \)’ stands for the base-2 logarithm. Information distance (Kullback–Leibler divergence) is denoted as \(D(p||q)=p\log \frac{p}{q}+(1-p)\log \frac{1-p}{1-q}\) for \(p,q\in (0,1)\).

2.1 The scheme

Below we present a barebones version of our proposed scheme. We omit details concerning the protection of the stored data. There are well-known ways to protect helper data, using either Public Key Infrastructure or one-way functions [6]. We also omit details that have to do with the verification of the reconstructed key. These details can be trivially added.

System setup

The following system parameters are fixed. An integer u satisfying \(np< u<n\), representing the size of \({\mathcal {U}}\); a list length L; a pseudorandom generator f that takes as input a seed \(\sigma \) and a counter j, and outputs a subset \(f(\sigma ,j)\subset [n]\) such that \(|f(\sigma ,j)| =u\); a Secure Sketch \((\mathtt{Gen},\mathtt{Rec})\) that acts on a source in \( \{0,1\} ^u\) and is able to handle bit error rate \(\beta \); a key derivation function \(\mathtt{KDF}: \{0,1\} ^u\times [L]\rightarrow \{0,1\} ^\ell \). All these parameters are public.

Enrollment

E1. Measure \(X\in \{0,1\} ^n\).

E2. Draw a random subset \({\mathcal {U}}\subset [n]\) of size u such that \(X_{\mathcal {U}}\) contains as many ‘1’s as possible.

E3. Compute \(Y=X_{\mathcal {U}}\) and \(W=\mathtt{Gen}(Y)\).

E4. Draw a random seed \(\sigma \).

E5. Draw a random \(z\in [L]\). Determine a permutation \(\pi :[n]\rightarrow [n]\) such that \(\pi ({\mathcal {U}})=f(\sigma ,z)\).

E6. Derive the secret key as \(S=\mathtt{KDF}(Y,z)\).

E7. Store \(\sigma ,\pi ,W\).

Reconstruction

R1. Read \(\sigma ,\pi ,W\).

R2. Measure \(X'\in \{0,1\} ^n\).

R3. Construct a set \({\mathcal {M}}'=\{i\in [n]:\; X'_i=1\}\). Compute \({\mathcal {M}}=\pi ({\mathcal {M}}')\).

R4. Construct the list \({\mathcal {L}}=(f(\sigma ,j))_{j=1}^L\).

R5. Determine the index \(\hat{z}\in [L]\) for which \({\mathcal {L}}_{\hat{z}}\) has the largest overlap with \({\mathcal {M}}\).

R6. Compute \(\hat{\mathcal {U}}=\pi ^\mathrm{inv}({\mathcal {L}}_{\hat{z}})\) and \(\hat{Y}=\mathtt{Rec}(X'_{\hat{\mathcal {U}}},W)\).

R7. Reconstruct the key as \(\hat{S}=\mathtt{KDF}(\hat{Y},\hat{z})\).

We will typically consider \(u\ge 2np\). With an exponentially small probability it may occur that \(w(X)\ge u\), leading to \(X_{\mathcal {U}}=1^u\) in step E2. Even if such an exceptional PUF is encountered, the scheme still works.
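To make the data flow concrete, here is a minimal Python sketch of steps E1–E7 and R1–R7. The pseudorandom subset generator, the key derivation function and especially the Secure Sketch are placeholders chosen for readability (the Secure Sketch stub stores Y verbatim and offers no protection whatsoever); the parameter values are likewise illustrative and not prescribed by the scheme.

```python
import hashlib
import os
import random

n, u, L = 640, 128, 1 << 10        # illustrative parameters

def f(sigma: bytes, j: int):
    """Pseudorandom size-u subset of {0,...,n-1}; stand-in for the generator f(sigma, j)."""
    rng = random.Random(hashlib.sha256(sigma + j.to_bytes(4, "big")).digest())
    return frozenset(rng.sample(range(n), u))

def kdf(y, z):
    """Toy key derivation function KDF(Y, z)."""
    return hashlib.sha256(bytes(y) + z.to_bytes(4, "big")).hexdigest()

# Placeholder Secure Sketch on {0,1}^u: Gen stores Y verbatim and Rec returns it.
# This is NOT a Secure Sketch (it leaks Y completely); a real design would use e.g. the COM.
def ss_gen(y):          return list(y)
def ss_rec(y_noisy, w): return list(w)

def enroll(x):
    ones  = [i for i in range(n) if x[i] == 1]                # E2: keep all rare symbols...
    zeros = [i for i in range(n) if x[i] == 0]
    random.shuffle(zeros)
    U = sorted((ones + zeros)[:u])                            # ...pad with random positions
    y = [x[i] for i in U]                                     # E3: Y = X_U
    w = ss_gen(y)
    sigma = os.urandom(16)                                    # E4
    z = random.randrange(L)                                   # E5: pi maps U onto f(sigma, z)
    target = sorted(f(sigma, z))
    pi = [None] * n
    for src, dst in zip(U, target):
        pi[src] = dst
    rest_src = [i for i in range(n) if i not in set(U)]
    rest_dst = [i for i in range(n) if i not in set(target)]
    for src, dst in zip(rest_src, rest_dst):
        pi[src] = dst
    return kdf(y, z), (sigma, pi, w)                          # E6, E7

def reconstruct(x_prime, helper):
    sigma, pi, w = helper                                     # R1
    M = {pi[i] for i in range(n) if x_prime[i] == 1}          # R2, R3
    z_hat = max(range(L), key=lambda j: len(f(sigma, j) & M)) # R4, R5
    pinv = [0] * n
    for i, t in enumerate(pi):
        pinv[t] = i
    U_hat = sorted(pinv[t] for t in f(sigma, z_hat))          # R6
    y_hat = ss_rec([x_prime[i] for i in U_hat], w)
    return kdf(y_hat, z_hat)                                  # R7
```

In a real instantiation, ss_gen/ss_rec would be replaced by an actual Secure Sketch handling bit error rate \(\beta\), and the permutation \(\pi\) would be stored in whatever compact form is convenient.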

2.2 Explanation of the scheme

The effect of steps E4 and E5 is to create a list of \({\mathcal {U}}\)-candidates, of which only entry z is correct. To an attacker (who knows u but does not know X or \(X'\)) the L candidates are indistinguishable.

Steps R3–R5 allow for a quick search to identify the index z of the correct \({\mathcal {U}}\)-candidate. Note that the reconstruction algorithm compares the permuted set \({\mathcal {M}}=\pi({\mathcal {M}}')\) to the \({\mathcal {L}}\)-entries, rather than comparing \({\mathcal {M}}'\) to inverse-permuted \({\mathcal {L}}\)-entries; this avoids applying \(\pi^{\mathrm{inv}}\) to every list entry and improves speed. To further optimize for speed, steps R4 and R5 can be combined into a loop that evaluates each \({\mathcal {L}}_j\) as soon as it is generated and keeps track of the best z-candidates on the fly.

Note that extremely fast pseudorandom generators exist that produce more than 8 bits of output per clock cycle [1, 2]. This makes it practical to work with large values of L, as long as not too many plausible z-candidates are generated. See Sect. 3.3. Even on CPU-constrained devices (clock speed of MHz order) it should be possible to achieve \(L=2^{10}\).

We did not explicitly specify how to map a seed to a size-u subset of [n]. A very straightforward algorithm is to pick u distinct pseudorandom locations in [n].

We did not specify an algorithm for determining the permutation \(\pi \), nor did we specify in which form \(\pi \) is stored. These are minor details and have no impact on the overall efficiency of the scheme, since steps E5 and R3 are performed only once. The computational bottleneck is R4, R5. For details about permutations we refer to [3, 11].

Note that inputting z into the key derivation function increases the entropy of S by \(\log L\) bits. If the PUF has ample entropy then \(L=1\) suffices, and one can skip all steps involving the seed \(\sigma \) and the permutation \(\pi \); the \({\mathcal {U}}\) itself is stored as helper data. This yields a scheme that is very fast and implementable on resource-constrained devices.

3 Analysis

3.1 Entropy after condensing

The Hamming weight of X carries very little information. Let us assume for the moment that \(T=w(X)\in \{0,\ldots ,u\}\) is known to the attacker, just to simplify the analysis.

Even if the attacker knows t and \({\mathcal {U}}\) (i.e., z), there are \({u\atopwithdelims ()t}\) equally probable possibilities for Y. Hence,

$$\begin{aligned}&\mathsf{H}(Y|Z=z,T=t) = \log {u\atopwithdelims ()t}\nonumber \\&\quad > uh\left( \frac{t}{u}\right) -\frac{1}{2}\log \frac{t(u-t)}{u} -\log \frac{e^2}{\sqrt{2\pi }}. \end{aligned}$$
(2)

The inequality follows from Stirling’s approximation.

As (2) does not depend on z, the entropy \(\mathsf{H}(Y|T=t)\) is also given by (2). A lower bound on \(\mathsf{H}(Y|T)\) is obtained by taking the expectation over t. This turns out to be rather messy, since the distribution of t is a truncated binomial. (It is given that \(t\le u\), while originally \(w(X)\in \{0,\ldots ,n\}\).) As t is approximately np on average, the result is \(\mathsf{H}(Y|T)\approx u h(\frac{np}{u})\). A more precise lower bound is given in Theorem 1 below.

Theorem 1

Let \(\delta \) be defined as

$$\begin{aligned} \delta =\min \left\{ e^{-2np^2\left( \frac{u}{np}-1\right) ^2}, e^{-np \frac{1}{3}\left( \frac{u}{np}-1\right) }, e^{-nD\left( \frac{u}{n}||p\right) }\right\} . \end{aligned}$$
(3)

Let \(\delta <p\). Let \(u\ge 2np/(1-\delta )\). Then, the entropy of Y given T can be lower bounded as

$$\begin{aligned}&\mathsf{H}(Y|T) > u h\left( \frac{np-n\delta }{u}\right) -\frac{1}{2}\log \frac{np}{1-\delta } \nonumber \\&\quad -\log \frac{e^2}{\sqrt{2\pi }} +\frac{1}{2\ln 2}\frac{np-n\delta }{u} \nonumber \\&\quad -u \left\{ \frac{1-p}{np}+\frac{2\delta }{p}+\delta +\frac{\delta ^3}{p^3} \right\} (1-\delta )^{-1}\left( 1-\frac{\delta }{p}\right) ^{-2}.\nonumber \\ \end{aligned}$$
(4)

Proof

The proof is rather tedious and can be found in ‘Appendix A.’ \(\square \)

The entropy of Y is obtained as follows,

$$\begin{aligned} \mathsf{H}(Y)=\mathsf{H}(YT)=\mathsf{H}(T)+\mathsf{H}(Y|T). \end{aligned}$$
(5)

The \(\mathsf{H}(T)\) is the entropy of the truncated binomial distribution.

Theorem 2

Let \(q_t={n\atopwithdelims ()t}p^t(1-p)^{n-t}\) denote the probabilities in the full binomial distribution. Let \(\delta \) be defined as in Theorem 1.

$$\begin{aligned} \mathsf{H}(T)> \log \frac{2\pi }{e} \sqrt{np(1-p)}+\log (1-\delta )-(n-u)q_u\log \frac{1}{q_u}. \end{aligned}$$

Proof

See ‘Appendix B.’ \(\square \)

Theorem 3

Let \(\delta \) be defined as in Theorem 1. Let \(\delta <p\). Let \(u\ge 2np/(1-\delta )\).

$$\begin{aligned}&\mathsf{H}(Y) > u h\left( \frac{np-n\delta }{u}\right) +\frac{1}{2\ln 2}\frac{np-n\delta }{u} -\log \frac{e^3}{(2\pi )^{3/2}} \nonumber \\&\quad +\frac{1}{2}\log (1-p)+\frac{3}{2}\log (1-\delta ) -(n-u)q_u\log \frac{1}{q_u} \nonumber \\&\quad -u \left[ \frac{1-p}{np}+\frac{2\delta }{p}+\delta +\frac{\delta ^3}{p^3} \right] (1-\delta )^{-1}\left( 1-\frac{\delta }{p}\right) ^{-2}. \nonumber \\ \end{aligned}$$
(6)

Proof

Follows directly from Theorems 1 and 2 by adding \(\mathsf{H}(T)+\mathsf{H}(Y|T)\). \(\square \)

The complicated expression (6) can be well approximated by \(uh(np/u)-u/np\). Note that the difference between the original source entropy \(\mathsf{H}(X)=nh(p)\) and the condensed form \(\mathsf{H}(Y)\approx uh(np/u)-u/np\) is considerable. For example, setting \(n=640\), \(p=0.1\), \(u=128\) yields \(\mathsf{H}(Y)\approx 126\) and \(\mathsf{H}(X)\approx 300\). A small part of this huge difference can be regained using the trick with the entropy of Z. The practical aspects are discussed in Sect. 3.3.
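A quick numerical check of the quoted numbers, using the approximation \(uh(np/u)-u/np\) and the parameter values mentioned in the text:

```python
from math import log2

def h(x):
    """Binary entropy function."""
    return -x*log2(x) - (1-x)*log2(1-x)

n, p, u = 640, 0.1, 128                                          # values quoted in the text
print(f"H(X) = n h(p) ~ {n*h(p):.0f} bits")                      # ~300
print(f"H(Y) ~ u h(np/u) - u/np ~ {u*h(n*p/u) - u/(n*p):.0f} bits")  # ~126
```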

Note also that our scheme outperforms simple von Neumann debiasing by a factor of at least 2. The von Neumann algorithm takes n / 2 pairs of bits; each pair has a probability \(2p(1-p)\) of generating a (uniform) output bit; hence, the extracted entropy is \(np(1-p)<np\). In our scheme, if we set \(u\approx 2np\) we get \(\mathsf{H}(Y)\approx 2np-2\). Furthermore, \(\mathsf{H}(Y)\) is an increasing function of u.

3.2 Fuzzy extraction after condensing

The code offset method applied to \(Y\in \{0,1\} ^u\) leaks at least \(u h(\beta )\) bits of information about Y. In case of a ‘perfect’ error-correcting code, the length of a noise-robust key reconstructed with the COM is \(\mathsf{H}(Y)-uh(\beta )\). In Fig. 2 we plot \(\mathsf{H}(Y)-uh(\beta )\) for some example parameter settings. (Note that in all cases shown here \(\beta \ge p\) holds; the COM acting on the original source X would be unable to extract any entropy.) Clearly, there is an optimal u for given \((n,p,\beta )\). In practice one is given p and \(\beta \) and has to find \((n,u)\) such that \(\mathsf{H}(Y)-uh(\beta )\) is large enough.

Fig. 2 The key length \(\mathsf{H}(Y)-uh(\beta )\) that the code offset method can reproducibly extract after the condensing step, plotted as a function of u for various parameter values. For \(\mathsf{H}(Y)\) the bound in Theorem 3 is used
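The shape of the curves in Fig. 2 can be reproduced approximately by sweeping u, here with the approximation \(uh(np/u)-u/np\) in place of the exact Theorem 3 bound; the parameter values are illustrative assumptions.

```python
from math import log2

def h(x):
    """Binary entropy function."""
    return 0.0 if x in (0.0, 1.0) else -x*log2(x) - (1-x)*log2(1-x)

n, p, beta = 640, 0.1, 0.1                       # illustrative parameters

def key_len(u):
    # approximate H(Y) - u*h(beta), using H(Y) ~ u*h(np/u) - u/np
    return u*h(n*p/u) - u/(n*p) - u*h(beta)

for u in (128, 160, 224, 320, 448):
    print(f"u = {u:3d}: extractable key length ~ {key_len(u):5.1f} bits")

u_opt = max(range(int(2*n*p), n), key=key_len)
print(f"optimum near u = {u_opt}")
```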

3.3 The list size L

We briefly discuss how large L can be made before the reconstruction procedure sketched in Sect. 2 starts to produce too many candidates for z. We define \(\tilde{p}=p+\beta -2p\beta \).

On the one hand, there is the number of ‘1’ symbols in \(X'_{\mathcal {U}}\) for the correct \({\mathcal {U}}\). The number of 1’s in \(X_{\mathcal {U}}\) is on average np. Of these, on average \(np(1-\beta )\) will be a ‘1’ in \(X'\). The \((u-np)\) zeroes in \(X_{{\mathcal {U}}}\) will generate on average \((u-np)\beta \) 1’s in \(X'\). Hence, the number of 1’s in \(X'_{\mathcal {U}}\) is expected to be around \(np(1-\beta )+(u-np)\beta =np+(u-2np)\beta \).

On the other hand, there is the number of 1’s for incorrect \({\mathcal {U}}\)-candidates. The number of 1’s in \(X'\) is approximately \(n\tilde{p}\). We pretend that \(n\tilde{p}\) is an integer, for notational simplicity. We denote by \(A\in \{0,\ldots ,n\tilde{p}\}\) the number of 1’s in \(X'_{{\mathcal {V}}}\) for a randomly chosen subset \({\mathcal {V}}\), with \(|{\mathcal {V}}|=u\). The random variable A follows a hypergeometric probability distribution

$$\begin{aligned} \mathrm{Pr}[A=a]=\frac{{n\tilde{p}\atopwithdelims ()a}{n-n\tilde{p}\atopwithdelims ()u-a}}{{n\atopwithdelims ()u}} = \frac{{u\atopwithdelims ()a}{n-u \atopwithdelims ()n\tilde{p}-a}}{{n\atopwithdelims ()n\tilde{p}}}. \end{aligned}$$
(7)

The first expression looks at the process of selecting u out of n positions with exactly a 1’s hitting the \(n\tilde{p}\) existing 1’s in \(X'\); the second expression looks at the process of selecting \(n\tilde{p}\) positions in \(X'\) such that exactly a of them lie in \({\mathcal {V}}\). We have \({\mathbb {E}}_a\,a=u\tilde{p}\) and \({\mathbb {E}}_a(a-u\tilde{p})^2=u\tilde{p}(1-\tilde{p})(n-u)/(n-1)<u\tilde{p}\). In other words, A is sharply peaked around \(u\tilde{p}\).

We can put a threshold \(\theta \) somewhere in the gap between \(u\tilde{p}\) and \(np+(u-2np)\beta \), and declare a \({\mathcal {U}}\)-candidate \({\mathcal {V}}\) to be bad if the Hamming weight of \(X'_{\mathcal {V}}\) is lower than \(\theta \). Let us set \(\theta =np+(u-2np)\beta -c\cdot \sqrt{np}\) with c sufficiently large to avoid false negatives (i.e., missing the correct \({\mathcal {U}}\)). In a way analogous to (3), we can bound the false positive probability as

$$\begin{aligned} \mathrm{Pr}[A\ge \theta ] < \min \left\{ e^{-2u\tilde{p}^2\left( \theta /u\tilde{p}-1\right) ^2},\; e^{-u\tilde{p}\frac{1}{3}\left( \theta /u\tilde{p}-1\right) },\; e^{-uD\left( \frac{\theta }{u}||\tilde{p}\right) } \right\} . \end{aligned}$$
(8)

These bounds are obtained by applying (3) and replacing \(u\rightarrow \theta \), \(n\rightarrow u\), \(p\rightarrow \tilde{p}\). Figure 3 shows how many bits of entropy (\(b\approx -\log \mathrm{Pr}[A>\theta ]\)) can be obtained from z without running into false positives in step R5. To extract b bits of entropy, a list length \(L=2^b\) is needed. Figure 3 serves just to show the orders of magnitude and is by no means an exhaustive treatment of the whole parameter space \(n,p,\beta ,u,\theta \). We remark that the curves depend quite strongly on the threshold parameter c.
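The false positive probability of step R5 can also be estimated directly by summing the hypergeometric distribution (7) above the threshold \(\theta\). The snippet below mirrors the setting of Fig. 3 (\(\beta=p\), \(u=2.25np\), \(c=4\)), but the value of n is a hypothetical example.

```python
from math import ceil, comb, log2

n, p = 2000, 0.1                  # hypothetical parameters
beta = p
pt = p + beta - 2*p*beta          # tilde p: effective 1-probability in X'
ones = round(n * pt)              # expected number of 1's in X'
u = round(2.25 * n * p)           # subset size, as in the Fig. 3 setting
theta = n*p + (u - 2*n*p)*beta - 4*(n*p)**0.5   # threshold with c = 4

# Pr[A >= theta] for a random size-u subset V, by direct summation of Eq. (7)
num = sum(comb(ones, a) * comb(n - ones, u - a)
          for a in range(ceil(theta), min(ones, u) + 1))
tail = num / comb(n, u)
print(f"-log2 Pr[A >= theta] ~ {-log2(tail):.0f} bits available from z")
```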

It is important to note that it is perfectly possible to make L extremely large. Then, many false positives occur, but this is not a fundamental problem. It requires extra work: one key reconstruction and one verification (e.g., of a key hash) per false positive. Depending on the available n, the computing platform, etc., this may be a viable option.

Fig. 3 Plots of \(-\log \mathrm{Pr}[A>\theta ]\) as a function of n for \(\beta =p\), \(u=2.25 np\), and \(\theta =np+(u-2np)\beta -c\sqrt{np}\) with \(c=4\). The \(\mathrm{Pr}[A>\theta ]\) was obtained by numerical summation of (7). The vertical axis represents approximately the number of bits \(b=\log L\) that can be extracted from z without generating false positives

4 Comparison to other debiasing schemes

The main algorithms to compare against are given in the CHES 2015 paper by Maes et al. [17]. They introduce several schemes that perform debiasing in the context of a helper data system: Classic Von Neumann (CVN), Pair-Output (2O-VN) and Multi-Pass Tuple Output (MP-TO-VN). The following figures of merit are important: (i) the amount of entropy retained in Y, from the original \(nh(p)\) contained in X; (ii) the amount of work required during the reconstruction phase to derive \(\hat{Y}\) from \(X'\).

Here, we will not include the additional entropy obtained from Z in our scheme. The procedure for reconstructing \(\hat{z}\) has no equivalent in [17], which makes comparison very difficult.

Scheme | Retained entropy | Reconstruction of \(\hat{Y}\)

Trivial debiasing (at \(u=2np\)) | \(\approx 2np-2\) | Take subset of \(X'\)

CVN | \(np(1-p)\) | Take subset of \(X'\); \(\approx np\) binary comp

2O-VN | \(np(1-p)\) | Take subset of \(X'\)

MP-TO-VN (two-pass) | \(np(1-p)+{\textstyle \frac{1}{2}} n\frac{p^2(1-p)^2}{p^2+(1-p)^2}\) | Take subset of \(X'\)

The reconstruction effort is practically the same in all these schemes and is very low.

The entropy estimates are obtained as follows. The result \(np(1-p)\) for the original von Neumann algorithm is discussed in Sect. 3.1. The CVN and 2O-VN retain exactly this amount. In the second VN pass as described in [17], there are \({\textstyle \frac{n}{2}}-np(1-p)\) bit pairs left after the first pass, and in each bit pair the two bits have the same value. In the second pass, this gives rise to \({\textstyle \frac{n}{4}}-{\textstyle \frac{1}{2}} np(1-p)={\textstyle \frac{n}{4}}[p^2+(1-p)^2]\) von Neumann comparisons. Each comparison yields an output bit with probability \(2\frac{p^2}{p^2+(1-p)^2}\frac{(1-p)^2}{p^2+(1-p)^2}\). Hence, the expected number of output bits in the second pass is \({\textstyle \frac{1}{2}} np^2(1-p)^2/[p^2+(1-p)^2]\).
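The retained-entropy column of the table is easy to evaluate numerically; the bias value below is a hypothetical example.

```python
n, p = 640, 0.1                                                   # hypothetical source parameters

cvn     = n * p * (1 - p)                                         # CVN and 2O-VN
mp_to   = cvn + 0.5 * n * p**2 * (1-p)**2 / (p**2 + (1-p)**2)     # two-pass MP-TO-VN
trivial = 2 * n * p - 2                                           # trivial debiasing at u = 2np

print(f"CVN/2O-VN: {cvn:.1f}   MP-TO-VN: {mp_to:.1f}   trivial debiasing: {trivial:.1f}")
```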

Note that the two-pass MP-TO-VN adds at most 25% to the CVN entropy, while trivial debiasing adds slightly more than 100%.

5 Some remarks on min-entropy

One could take the point of view that the relevant quantity to study is min-entropy instead of Shannon entropy, since we are deriving cryptographic keys. The min-entropy of the source is \(\mathsf{H}_\mathrm{min}(X)=n\log (1-p)^{-1}\), corresponding to the all-zero string. For small p this is significantly smaller than the Shannon entropy \(\mathsf{H}(X)=nh(p)\). On the other hand, the entropy loss is also smaller when computed in terms of min-entropy.
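For a concrete feel of the gap between the two measures, they can be compared directly; the parameter values below are illustrative assumptions.

```python
from math import log2

n, p = 640, 0.1                                   # illustrative parameters
h_min     = n * log2(1 / (1 - p))                 # min-entropy: the all-zero string is most likely
h_shannon = n * (-p*log2(p) - (1-p)*log2(1-p))    # Shannon entropy n h(p)
print(f"H_min(X) = {h_min:.0f} bits, H(X) = {h_shannon:.0f} bits")   # ~97 vs ~300
```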

Theorem 4

Consider a linear binary code with message length k and codeword length n that is able to correct t errors. Let \(X\in \{0,1\} ^n\) consist of i.i.d. bits that are Bernoulli-distributed with parameter \(p<{\textstyle \frac{1}{2}}\).

$$\begin{aligned}&\mathsf{H}_\mathrm{min}(X|\mathtt{Syn}\,X)>\mathsf{H}_\mathrm{min}(X)-(n-k) \nonumber \\&\quad +(t+1)\log \frac{1-p}{p}\nonumber \\&\quad -\frac{1}{2^{n-k}\ln 2}\sum _{a=0}^t{n\atopwithdelims ()a}\left\{ \left( \frac{1-p}{p}\right) ^{t-a+1}-1 \right\} . \end{aligned}$$
(9)

The proof is given in ‘Appendix C.’ For codes that are far from perfect, the last term in (9) is negligible.

However, there are strong arguments against using min-entropy in the context of biased PUFs. A situation where X has a Hamming weight far below the typical value np can be seen as a hardware error and is likely to occur only when the chip itself is malfunctioning. If we condition all our probabilities on the premise that the hardware is functioning correctly, then we are back in the typical regime; there min-entropy is almost identical to Shannon entropy.

6 Summary

We have introduced a method for source debiasing that can be used in Helper Data Systems to solve the ‘more noise than entropy’ problem. Our method applies the condense-then-fuzzy-extract idea [7] in a particularly simple way: the space \( \{0,1\} ^n\) is condensed to \( \{0,1\} ^u\) in such a way that all the rare symbols are kept; meanwhile, the noise entropy is reduced from \(nh(\beta )\) to \(uh(\beta )\). Theorem 3 gives a lower bound on the retained entropy \(\mathsf{H}(Y)\). Furthermore, there is the option of extracting additional entropy from the index z, which points at the real subset \({\mathcal {U}}\) among the fakes. Even in its bare form, without the fake subsets, our method outperforms basic von Neumann debiasing by a factor of at least 2.

Figure 2 shows that after the condensing step the code offset method can extract significant entropy in a situation where the bare COM fails. It also shows the trade-off between the reduction of source entropy and noise entropy as u varies.

In Sect. 3.3 we gave a preliminary analysis of the practicality of extracting information from the index z. More work is needed to determine how this works out for real-world parameter values \(n,p,\beta \) and to see how the computations in steps R4 and R5 can be optimized for speed.

The entropy analysis can be improved and extended in various ways, e.g., by considering different noise models such as asymmetric noise.