1 Introduction

Software obfuscation has a long tradition of aiming to protect against reverse engineering. For example, the first International Obfuscated C Code Contest (www.ioccc.org) was organized in 1984, and the 23rd event in this series took place in 2014. There are obfuscators for all popular programming languages today. For example, for Java, there are several open-source projects like ProGuard, ClassEncrypt, or JavaGuard, and an even larger number of commercial products. These approaches are usually based on heuristics and best practices, ranging from simple renaming of function and variable names to elaborate schemes, e.g. [19]. However, these practical obfuscators do not provide verifiable, proven security guarantees.

Provably secure obfuscation, in the sense that it is based on some reasonable cryptographic assumption, has long been a highly desirable yet hard-to-reach goal. Even worse, there have been devastating impossibility results for the natural notion of virtual black-box obfuscation [8] and only limited positive results for special cases like point functions [16]. A significant breakthrough came with the work by Garg et al. [28], indicating that the relaxed yet useful notion of indistinguishability obfuscation may be achievable for general circuits. This notion basically says that one cannot distinguish the obfuscated codes of two functionally equivalent circuit programs.

It is fair to say that the underlying cryptographic assumption, on which the security of the construction of Garg et al. [28] is based, is non-standard and not well analyzed (yet). This is also true for the alternative approach to build indistinguishability obfuscators proposed by Pass et al. [48]. This is complemented by yet other proposals of Gentry et al. [32] based on a more standard-like computational assumption about multilinear maps, and of Ananth and Jain [3] based on compact functional encryption. At the same time, recent attacks [18, 21–23, 31] on multilinear maps, albeit currently not known to break the aforementioned obfuscation candidates, testify that constructions may suddenly turn out to lack the desired security guarantees. New suggestions and attacks keep on appearing at high frequency [6, 30, 41, 46].

The above leaves us with multiple choices of candidates for building obfuscators, both in practice as well as in theory, and it is currently difficult to determine the best choice in terms of security. For the heuristic, practical obfuscators, it may be even harder to distinguish sound constructions from weak approaches, since the design strategies may be vague. A straightforward idea to boost confidence in obfuscator candidates, both in theory as well as in practice, is to interlock multiple solutions and approaches. This idea of failure-tolerant cryptographic designs has traditionally been subsumed under the notion of robust combiners.

1.1 Robust Combiners for Obfuscation

The notion of robust combiners was introduced by Harnik et al. [34] based on the idea of tolerant cryptographic designs by Herzberg [35–37]. Such combiners take several candidates for a cryptographic task and provide a secure solution if a quorum of the candidates is indeed secure. The idea has been successfully applied to several cryptographic primitives, including hash functions [14, 25, 27, 45, 47, 49, 50], encryption [24, 34], commitments [34–36], and oblivious transfer [34, 43, 44].

A robust combiner for obfuscation would take as input a program (abstractly in the form of a circuit or a Turing machine; see Footnote 1) and create an obfuscated version with the help of the candidate obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots ,{\mathcal {O}_{N}}\). As long as a sufficient number of candidate obfuscators are indeed secure, the combiner should also provide a secure obfuscator. In order to make formal claims about the robustness of the combiner, due to the lack of rigorous security properties for practical obfuscators, one inevitably needs to base the notion of security for the combiner on the various models in the cryptographic literature, such as virtual black-box obfuscation or indistinguishability obfuscation (Footnote 2).

What distinguishes the idea of combiners for obfuscation from the previous scenarios is that obfuscation combiners are higher-order combiners which are closely linked to the functionality of their inputs. Consider for instance the case of hash function combiners where it usually suffices that the combiner preserves the security property only, enabling solutions like the concatenation combiner \({\textsf {Comb}}^{H_1,H_2}(x)=H_1(x)||H_2(x)\) with longer output for collision resistance. Devising hash combiners with equal output size as \(H_1,H_2\), retaining this mild functional property, is conceivably hard [14, 49, 50]. An obfuscation combiner, in contrast, must provide a circuit which computes the same function as the input circuit; it cannot implement a different function with a larger output. Indeed, note that functional preservation and input hiding are conflicting requirements for obfuscation and one is easy to achieve without the other.
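The concatenation combiner mentioned above can be sketched in a few lines. This is only an illustration: SHA-256 and SHA3-256 are stand-ins chosen here for the abstract candidates \(H_1,H_2\), and the names `h1`, `h2`, and `comb` are hypothetical.

```python
import hashlib

def h1(x: bytes) -> bytes:
    # Stand-in for candidate H_1 (assumption: SHA-256).
    return hashlib.sha256(x).digest()

def h2(x: bytes) -> bytes:
    # Stand-in for candidate H_2 (assumption: SHA3-256).
    return hashlib.sha3_256(x).digest()

def comb(x: bytes) -> bytes:
    # Concatenation combiner: Comb(x) = H_1(x) || H_2(x).
    return h1(x) + h2(x)

digest = comb(b"example input")
```

Any collision for `comb` is simultaneously a collision for both candidates, which is why the output must grow to the sum of the two digest lengths. An obfuscation combiner has no such freedom, since its output circuit must compute exactly the function of the input circuit.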

As a concrete example consider combiners in the context of virtual black-box obfuscation. Herzberg and Shulman [38] show that the cascading construction \({\textsf {Comb}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}}}(\cdot )={\mathcal {O}_{2}}({\mathcal {O}_{1}}(\cdot ))\) of two candidate obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}}\) is robust for this notion as long as functional correctness of the candidates is guaranteed. If this is not granted and the inner obfuscator is corrupt then \({\mathcal {O}_{1}}\) may implement an arbitrary function, such that the combiner neither preserves functional correctness nor necessarily input hiding. The latter holds as one usually does not have any security guarantees for input circuits with diverging functionalities, even if obfuscator \({\mathcal {O}_{2}}\) is sound. Analogously, if the outer obfuscator is corrupt then the resulting cascade may no longer sustain functionality.
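The failure of the cascade under a functionally incorrect candidate can be illustrated with a toy model. The sketch below assumes circuits are plain Python callables on 4-bit integers and that an "obfuscator" is any map from circuits to circuits; the names `sound`, `corrupt`, `cascade`, and `agrees` are hypothetical, and the toy candidates provide no hiding at all. Only functional correctness is modeled.

```python
from typing import Callable

Circuit = Callable[[int], int]
Obfuscator = Callable[[Circuit], Circuit]

def sound(c: Circuit) -> Circuit:
    # Toy candidate that preserves the function (no real hiding).
    return lambda x: c(x)

def corrupt(c: Circuit) -> Circuit:
    # Toy corrupt candidate: flips the lowest output bit on input 0.
    return lambda x: c(x) ^ 1 if x == 0 else c(x)

def cascade(o1: Obfuscator, o2: Obfuscator, c: Circuit) -> Circuit:
    # Comb^{O1,O2}(C) = O2(O1(C))
    return o2(o1(c))

def agrees(a: Circuit, b: Circuit, n_bits: int = 4) -> bool:
    # Exhaustive functional-equality check over all n_bits-bit inputs.
    return all(a(x) == b(x) for x in range(2 ** n_bits))

C = lambda x: (3 * x + 1) % 16
```

With two sound candidates the cascade agrees with `C` everywhere; replacing either the inner or the outer candidate by `corrupt` breaks agreement on input 0.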

While functional correctness of an obfuscator is usually not based on unproven cryptographic assumptions, unlike the obfuscation property, there are two reasons why certifying functional correctness may still be hard. First, software implementations are error prone, and the complexity of previous theoretical proposals for obfuscation [28, 32, 48] seems to be inimical in this regard. Secondly, one may have little control over, or insight into, the actual obfuscation program. This is clearly true for commercial obfuscation programs; in fact, the programs of such obfuscators are often themselves obfuscated. The creation of a corrupt obfuscator, which intentionally leaks some information, is easy; to demonstrate this, we implemented demos of different types of corrupt obfuscators, including obfuscators which leak information even when used in cascade.

The concern about corrupt obfuscators may also emerge in theoretical solutions. As an example for the latter, in the universal-parameter generation setting [39] a trusted party publishes an obfuscated program which parties can use to generate common parameters. What if we now prefer to use several potentially untrusted authorities and combine their obfuscated programs?

1.2 Our Results

Our goal is to provide a general combiner for obfuscation. It should satisfy the formal requirements in order to allow for sound solutions both in theory and in practice. Ideally, the combiner should tolerate a large number of corrupt obfuscators, be very efficient, and ensure various notions of obfuscation simultaneously. Note that, while virtual black-box obfuscation may be impossible in general, for some functions and attack models [7, 15] the notion may still be achievable, such that our combiner should also comply with this notion.

On the positive side we present a 3-out-of-4 combiner which can tolerate a single corrupt obfuscator out of four candidates. It is depicted in Fig. 1 and consists of two layers. In the first layer we insert the input circuit C into three combinations of three of the obfuscators each; in each combination, we output a circuit that produces the majority of the three obfuscated circuits. We only require three of the four combinations of picking three of four obfuscators. Each unit ensures that if at most one candidate is corrupt then functional correctness is still preserved. In the next layer we then run each of the first-layer majority circuits through the complementary fourth obfuscation candidate and again take the majority to ensure correctness. Obfuscation follows as either all three candidates on the first layer are sound and thus hide the input circuit, or the fourth candidate on the second layer ensures this.

Fig. 1. Our 3-out-of-4 combiner. The MAJ circuit has three hardwired circuits \(C_1, C_2\) and \(C_3\) with equal input and output sizes, which also correspond to the input and output size of the MAJ circuit. For input x the MAJ circuit evaluates each of the three circuits for x and returns the bit-wise majority of the circuits’ outputs.
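The bit-wise majority computed by the MAJ circuit can be written down directly. The following sketch assumes that circuit outputs are encoded as non-negative integers of equal bit width; the function names `maj_bits` and `MAJ` are chosen here for illustration.

```python
def maj_bits(a: int, b: int, c: int) -> int:
    # Bit-wise majority: a result bit is 1 iff at least two input bits are 1.
    return (a & b) | (a & c) | (b & c)

def MAJ(c1, c2, c3):
    # Circuit with three hardwired circuits; evaluates all three on x
    # and returns the bit-wise majority of their outputs.
    return lambda x: maj_bits(c1(x), c2(x), c3(x))
```

If two of the three hardwired circuits agree on an input, the third cannot change any output bit, which is the property the combiner relies on for functional correctness.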

Our combiner indeed works for different notions of obfuscation such as virtual black-box and grey-box obfuscation (Footnote 3), indistinguishability obfuscation, and differing-inputs obfuscation. In total it requires twelve calls to obfuscators and has depth 2. The latter is important as obfuscation may cause a polynomial blow-up in size. Remarkably, while most theoretical solutions currently induce a significant size expansion, with a few exceptions [4, 13], obfuscators in practice only display a mild increase in code size. Note that devising combiners of depth 1 with a structure as above is impossible as the corrupt obfuscator may then leak information about the input circuit via the output.

We then show an impossibility result for 2-out-of-3 combiners. There are, of course, trivial combiners in this case, such as the combiner which simply uses the sound candidate only, and the (inefficient) combiner for indistinguishability obfuscation that evaluates the input circuit and then outputs the lexicographically smallest equivalent circuit. We thus focus on structural combiners that use a fixed pattern, independently of the status of the candidates, and do not semantically interpret the input circuit. Our 3-out-of-4 combiner is structural in this regard. We show that no 2-out-of-3 structural combiner may ensure both functional correctness and obfuscation. This holds for the weaker notion of indistinguishability obfuscation and therefore also for the stronger notions of black-box and grey-box obfuscation. Note that this also applies to any 1-out-of-2 combiner (Footnote 4).

We extend the positive result as well as the negative result to the case of \((2\gamma +1)\)-out-of-\((3\gamma +1)\) resp. \(2\gamma \)-out-of-\(3\gamma \) combiners. That is, we give a construction which can be seen as a less efficient generalization of our basic solution if one can corrupt at most \(\gamma \) out of \(3\gamma +1\) obfuscators. We then argue that one cannot have structural combiners if \(\gamma \) out of the \(3\gamma \) obfuscators can be corrupt. For both settings we can draw on the ideas and techniques from the basic cases.

The combiners above are correcting in the sense that they guarantee functional correctness if a quorum of input obfuscator candidates is secure. One can also envision a weaker notion of combiners, which output circuits that either compute the correct output of the input circuit or output, instead, a special error indicator \(\bot \). This error indicator should be output only when one of the component obfuscators is faulty; if all obfuscators are sound then the combiner must never output \(\bot \). In particular, such combiners cannot output false answers. We call them detecting combiners in analogy to coding theory.
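The difference between correcting and detecting can be illustrated at the level of output values. The sketch below is not a combiner construction: it only contrasts the two output behaviours and says nothing about hiding the input circuit. The names `BOT`, `correcting_eval`, and `detecting_eval` are hypothetical.

```python
BOT = None  # stands in for the error indicator "bottom"

def correcting_eval(outputs):
    # Majority over three candidate outputs, assuming at most one is wrong:
    # if a matches b or c it is the majority value, otherwise b == c is.
    a, b, c = outputs
    return a if a in (b, c) else b

def detecting_eval(outputs):
    # Agreement check over two candidate outputs: if at least one of them
    # is correct, the result is either the correct value or BOT.
    a, b = outputs
    return a if a == b else BOT
```

As in coding theory, detection gets by with less redundancy than correction: a disagreement suffices to raise the error indicator, whereas correction needs a majority of correct values.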

For detecting combiners we achieve slightly different bounds. That is, we show that one can have \((\gamma +1)\)-out-of-\((2\gamma +1)\) combiners for any \(\gamma \). For the case \(\gamma =1\) and 2-out-of-3 combiners we can again provide an optimized version similar to our original 3-out-of-4 combiner. Concerning lower bounds, we can apply the ideas of the other combiners to show that there cannot exist structural \(\gamma \)-out-of-\(2\gamma \) detecting combiners for any \(\gamma \). The reduced overhead of detecting combiners may make them an attractive option for practical implementations, where, once detection ability exists, the attack vector of providing a faulty obfuscator appears unlikely.
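For orientation, the positive bounds stated above instantiate as follows for small \(\gamma \). The rows merely evaluate the expressions \((2\gamma +1)\)-out-of-\((3\gamma +1)\) and \((\gamma +1)\)-out-of-\((2\gamma +1)\); they are not additional results.

```latex
\begin{array}{c|c|c}
\gamma & \text{correcting} & \text{detecting} \\ \hline
1 & 3\text{-out-of-}4 & 2\text{-out-of-}3 \\
2 & 5\text{-out-of-}7 & 3\text{-out-of-}5 \\
3 & 7\text{-out-of-}10 & 4\text{-out-of-}7
\end{array}
```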

While our main results follow the common approach in provably secure obfuscation, we stress that we view our approach to be equally well suited for practice. In Sect. 7 we therefore evaluate performance of our combiner when applied to practical obfuscators, and discuss the implications of our findings in this domain.

Concurrent Work. Independently of our work, Ananth et al. [2] also discuss the idea of obfuscation combiners. Their approach is fundamentally different from ours, and results in (non-structural) obfuscation combiners which are secure as long as a single candidate is secure. However, this comes at the cost of a significant overhead, and also requires additional cryptographic assumptions such as LWE or DDH, and indistinguishability obfuscation against sub-exponential adversaries.

2 Preliminaries

We exclusively treat circuits here; the approach can be transferred to the case of Turing machines straightforwardly. When speaking of circuits C from some class \(\mathcal {C}=(\mathcal {C}_\lambda )_{\lambda \in \mathbb {N}}\) we usually mean some arbitrary (but efficiently computable) description of the circuit. When considering specific encodings with dedicated properties, as required for our lower bounds, we usually write \(\left\langle C\right\rangle \) for the encoding of the circuit under scheme \(\left\langle \cdot \right\rangle \). If, on the other hand, we consider the function implemented by the circuit we usually write \(C(\cdot )\) instead, and C(x) for the output of circuit C on input x. When writing \(C(\cdot )= C'(\cdot )\) or \(C\equiv C'\) we refer to functional equality of circuits C and \(C'\), comprising input and output length, whereas \(C=C'\) or \(\left\langle C\right\rangle =\left\langle C'\right\rangle \) means equal descriptions (under the encoding in question).

2.1 Obfuscators

Barak et al. [8] defined several notions of obfuscators, with virtual black-box (VBB) obfuscators being the strongest one. This notion says that the adversary cannot learn anything from an obfuscated circuit beyond the circuit’s outputs for chosen inputs. While they also showed that this notion is in general unachievable, for specific cases such as point functions one may be able to attain this level of obfuscation. Below we mainly consider obfuscation of circuits, and we also consider the possibility that the obfuscator itself may be non-uniform and work specifically for different values of \(\lambda \). The latter allows corrupt (also called malicious) obfuscators to match the algorithm class of adversaries and distinguishers. All obfuscators here, sound and corrupt ones, are nonetheless considered to be stateless.

Definition 1 (Virtual Black-Box Obfuscation). A (possibly non-uniform) PPT algorithm \({\mathcal {O}}\) is a virtual black-box obfuscator for circuit class \(\mathcal {C}=(\mathcal {C}_\lambda )_{\lambda \in \mathbb {N}}\) if the following holds:

  • Functional Correctness: For any \(\lambda \in \mathbb {N}\), any circuit \(C\in \mathcal {C}_\lambda \), any obfuscated version \(O\leftarrow {\mathcal {O}}(1^\lambda ,C)\) we have \(C\equiv O\).

  • VBB Obfuscation: For any (possibly non-uniform) PPT algorithm \(\mathcal {A}\) there exists a (possibly non-uniform) PPT algorithm \(\mathcal {S}\) and a negligible function \(\epsilon (\lambda )\) such that for all circuits \(C\in \mathcal {C}_\lambda \) we have

    $$\begin{aligned} |\text {Prob}\,\big [{\mathcal {A}(1^\lambda ,{\mathcal {O}}(1^\lambda ,C))=1}\big ] - \text {Prob}\,\big [{\mathcal {S}^{C}(1^\lambda )=1}\big ] | \le \epsilon (\lambda ), \end{aligned}$$

    where the probabilities are over the randomness of \({\mathcal {O}}\) and \(\mathcal {A}\) resp. \(\mathcal {S}\).

 

Virtual grey-box (VGB) obfuscation [10] is defined analogously, only that the simulator above is computationally unbounded but can make at most a polynomial number of queries to its oracle circuit. Clearly, VBB obfuscation implies VGB obfuscation. A stronger notion is based on the extension to (dependent) auxiliary inputs [33] where both the adversary and the simulator receive a random sample \(\mathsf {aux}\) as additional input, where \(\mathsf {aux}\) may depend on any circuit \(C'\in \mathcal {C}_\lambda \) (Footnote 5). We will use this version for proving the security of our combiners for VBB and VGB obfuscation.

Another meaningful relaxation, implied by both notions above in the non-uniform setting, is indistinguishability obfuscation [8] which basically says that the obfuscations of two functionally equivalent circuits are indistinguishable:

Definition 2 (Indistinguishability Obfuscator). A (possibly non-uniform) PPT algorithm \({i\mathcal {O}}\) is called an indistinguishability obfuscator for a circuit class \(\mathcal {C}=(\mathcal {C}_\lambda )_{\lambda \in \mathbb {N}}\) if the following conditions hold:

  • Functional Correctness: For any \(\lambda \in \mathbb {N}\), any circuit \(C\in \mathcal {C}_\lambda \), any obfuscated version \(O\leftarrow {i\mathcal {O}}(1^\lambda ,C)\) we have \(C\equiv O\).

  • Indistinguishability: For any (possibly non-uniform) PPT distinguisher \(\mathcal {D}\), there exists a negligible function \(\epsilon (\lambda )\) such that for all circuits \(C_0,C_1\in \mathcal {C}_\lambda \) with \(C_0\equiv C_1\) we have

    $$\begin{aligned}&|\text {Prob}\,\big [{\mathcal {D}(1^\lambda ,C_0,C_1,{i\mathcal {O}}(1^\lambda ,C_0))=1}\big ] \\&\qquad - \text {Prob}\,\big [{\mathcal {D}(1^\lambda ,C_0,C_1,{i\mathcal {O}}(1^\lambda ,C_1))=1}\big ] | \le \epsilon (\lambda ), \end{aligned}$$

    where the probabilities are over the randomness of \({i\mathcal {O}}\) and \(\mathcal {D}\).
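The indistinguishability condition can be read as an experiment, sketched below under simplifying assumptions: the security parameter is omitted, circuits are represented by hypothetical description strings that are assumed (not checked) to be functionally equivalent, and the probabilities are estimated empirically. All names (`advantage`, `identity_obf`, `syntactic_d`, `C0`, `C1`) are illustrative.

```python
import random

def advantage(obf, distinguisher, c0, c1, trials=200, seed=0):
    # Empirical |Pr[D(C0, C1, obf(C0)) = 1] - Pr[D(C0, C1, obf(C1)) = 1]|.
    rng = random.Random(seed)
    ones = [0, 0]
    for b, c in enumerate((c0, c1)):
        for _ in range(trials):
            ones[b] += distinguisher(c0, c1, obf(c, rng))
    return abs(ones[0] - ones[1]) / trials

# Hypothetical circuit descriptions, assumed to compute the same function.
C0 = "AND(x, NOT(x))"
C1 = "CONST_0"

def identity_obf(c, rng):
    # Functionally correct but hides nothing: outputs its input unchanged.
    return c

def syntactic_d(c0, c1, o):
    # Outputs 1 exactly when the challenge matches the description of C0.
    return int(o == c0)

adv = advantage(identity_obf, syntactic_d, C0, C1)
```

The identity map is functionally correct, yet the purely syntactic distinguisher separates the two cases perfectly, so it is not an indistinguishability obfuscator. This is the sense in which functional correctness is easy to achieve without the obfuscation property.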

 

There are several variations of the above definitions. For one, we can allow for a negligible error in the functional correctness (over the random choices of the obfuscator). Both our positive and our negative result are robust with respect to such a change. That is, our 3-out-of-4 combiner uses a constant number of obfuscator calls such that the error would remain negligible; obfuscation would still hold, because the leakage due to incorrect obfuscator outputs has negligible probability. Similarly, our impossibility result about 2-out-of-3 combiners would still hold, even if the starting combiners would have perfect functional correctness, but the (fixed-size structural) combiner could have a negligible error. Alternatively, one may use the recent approach in [12] to eliminate the error first.

Finally, yet another version of obfuscation, called differing-inputs obfuscation [8], demands indistinguishability of two obfuscated circuits, but only if the input circuits \(C_0,C_1\) can be sampled such that finding inputs where \(C_0\) and \(C_1\) differ, is infeasible. More formally, we assume that there is a PPT algorithm \(\textsf {Sampler}\) associated to the circuit family \(\mathcal {C}\) such that for any PPT algorithm \(\mathcal {A}\) there exists a negligible function \(\epsilon (\lambda )\) such that the probability that \(C_0(x)\ne C_1(x)\), where \((C_0,C_1,\textsf {aux})\leftarrow \textsf {Sampler}(1^\lambda )\) and \(x\leftarrow \mathcal {A}(1^\lambda ,C_0,C_1,\textsf {aux})\), is at most \(\epsilon (\lambda )\). Note that we assume that \(\textsf {Sampler}(1^\lambda )\) only outputs circuits \(C_0,C_1\in \mathcal {C}_\lambda \).

A differing-inputs obfuscator \({\text {di}\mathcal {O}}\) for \(\mathcal {C}\) and \(\textsf {Sampler}\) is now defined analogously to an indistinguishability obfuscator, only that it is infeasible to distinguish outputs \({\text {di}\mathcal {O}}(1^\lambda ,C_0)\) and \({\text {di}\mathcal {O}}(1^\lambda ,C_1)\) for \((C_0,C_1,\textsf {aux})\leftarrow \textsf {Sampler}(1^\lambda )\), even if given \(\textsf {aux}\) as additional input. While the notion is also quite useful for the design of protocols [1], Garg et al. [29] argue that the notion may be hard to achieve.

2.2 Combiners for Obfuscators

Roughly, a combiner for obfuscators is a procedure which uses a set of obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots \) to turn an input circuit C into an obfuscated one, with the guarantee that if an (unspecified) quorum of the underlying obfuscators is secure, then so is the combiner. In the definition below we abstractly speak of o-obfuscators, leaving open which obfuscation category \(o\in \{\text {VBB, VGB, indistinguishability, differing-inputs}\}\) we refer to.

For combiners of primitives with multiple properties, such as functional correctness and obfuscation here, there are varying levels of combiners, called weak, mild, and strong [26, 27]. A strong combiner preserves security “property-wise”, i.e., for each property individually if sufficiently many candidates have this property then so does the combiner. A weak combiner only preserves all properties if there are enough candidates which are secure and thus have all properties simultaneously. The mild notion is in between where the candidates must somehow cover all properties but for each property possibly by different candidates. In [26, 27] it has been discussed that strong robustness implies mild robustness which in turn implies weak robustness, and that the implications are strict in case of hash functions for some properties.

Definition 3 (Robust Combiner for o-Obfuscation). Let \({{{\textsf {\textit{Comb}}}}}\) be a PPT oracle algorithm and let \({\mathcal {O}_{1}},\dots ,{\mathcal {O}_{N}}\) be o-obfuscator candidates. Then \({{{\textsf {\textit{Comb}}}}}\) is called a

  • strongly robust t-out-of-N combiner if for each of functional correctness and o-obfuscation, if at least t of the N candidates have this property, then so does the combiner \({{{\textsf {\textit{Comb}}}}}^{{\mathcal {O}_{1}},\dots ,{\mathcal {O}_{N}}}\);

  • mildly robust t-out-of-N combiner if, whenever functional correctness and o-obfuscation are each satisfied by at least t of the N candidates, then the combiner too has both properties;

  • weakly robust t-out-of-N combiner if the combiner is a functionally correct o-obfuscator whenever at least t out of the N candidates are simultaneously functionally correct and o-obfuscators.
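The three notions differ only in the premise under which the combiner must deliver. The following sketch encodes these premises for a list of candidates with known properties; it does not check the combiner itself, and the class and function names are chosen here for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    correct: bool   # functional correctness holds
    secure: bool    # o-obfuscation holds

def weak_premise(cands, t):
    # At least t candidates have both properties simultaneously.
    return sum(c.correct and c.secure for c in cands) >= t

def mild_premise(cands, t):
    # Each property is held by at least t candidates, possibly different ones.
    return (sum(c.correct for c in cands) >= t
            and sum(c.secure for c in cands) >= t)

def strong_premise(cands, t, prop):
    # Checked per property; prop is "correct" or "secure".
    return sum(getattr(c, prop) for c in cands) >= t

cands = [Candidate(True, True), Candidate(True, True),
         Candidate(True, False), Candidate(False, True)]
```

For `cands` and t = 3, only two candidates have both properties, so the weak premise fails, while each property individually is held by three candidates, so the mild and both strong premises hold. A strongly robust combiner must therefore deliver in strictly more situations than a weakly robust one.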

The definition assumes that the obfuscators and combiner all work for the same class \(\mathcal {C}\) of obfuscatable circuits. This neglects an important aspect, though: If the combiner calls obfuscators recursively then the candidates need to be able to handle obfuscated circuits, too. We assume that this is indeed the case (and discuss it more explicitly for our structural combiners below), making the implicit assumption that the candidates also allow for a superclass \({\mathcal {C}}^{{\textsf {Comb}}}\) of circuits which is rich enough to capture intermediate circuits created by the specific combiner. Still, the task for the combiner is to obfuscate the “core” class \(\mathcal {C}\) of circuits.

Fig. 2. Example of a unit (with pass-through version on the right-hand side).

As usual for combiners in general, there is always a secure obfuscation combiner, namely, the one which “obliviously” uses the secure obfuscator \({\mathcal {O}_{i}}\) and ignores the other ones in order to obfuscate the input circuit. However, this only provides an existential proof and says nothing about how to design an actual solution. Even worse, for indistinguishability obfuscation there is a trivial (non-efficient) combiner for obfuscators which can be described effectively [8]. The combiner takes as input (the description of) a circuit C and finds the (lexicographically) minimal circuit \(C_{\min }\) which computes the same functionality as C and outputs this circuit \(C_{\min }\). Then any two circuits \(C,C'\) with the same functionality yield the same obfuscated circuit \(C_{\min }\). This combiner ignores the candidate obfuscators and already constitutes an unconditionally secure obfuscator itself. It is even efficient relative to a \(\Sigma _2^p\) oracle. Hence, any lower bound for combiners would need to bypass this result and therefore need to implicitly show that \(\Sigma _2^p\ne P\).
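The idea behind the trivial combiner is that any canonical representative of a functionality hides everything except the functionality itself. The sketch below uses the truth table as a stand-in for the lexicographically minimal circuit \(C_{\min }\); this substitution is made here for simplicity and is not the construction from [8]. Like the original, it takes exponential time in the input length. The names `canonical_form`, `c0`, and `c1` are hypothetical.

```python
from itertools import product

def canonical_form(circuit, n_inputs):
    # Truth table over all 2^n inputs; depends only on the function computed,
    # not on how the circuit is written.
    return tuple(circuit(*bits) for bits in product((0, 1), repeat=n_inputs))

# Two syntactically different circuits computing XOR on two bits.
c0 = lambda a, b: a ^ b
c1 = lambda a, b: (a | b) & (1 - (a & b))
```

Both circuits map to the same canonical form, so no distinguisher can tell which one was the input. This is exactly why an unrestricted lower bound must exclude combiners that interpret the input circuit semantically.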

One option to circumvent the first problem is to require an effective means to turn attacks against the combiner into attacks against the candidate obfuscators. This option of so-called black-box combiners has been used for other lower bounds such as for hash function combiners [14, 49, 50]. Still, in our setting such black-box combiners would have to deal with the problem of the inefficient combiner.

An alternative path, which we also take here, is therefore to restrict how the combiner works. Whereas the above unconditional combiner approaches the circuit semantically by plotting its behavior, we look into what we call structural combiners here. Basically, these are combiners which have a prescribed structure with place-holder gates for the obfuscators, and they merely plug in the input circuit and derive the output circuit according to this fixed structure, without evaluating the circuits. It turns out that our 3-out-of-4 combiner is in fact structural.

2.3 Structural Combiners

A structural combiner for obfuscators is a circuit consisting of NAND gates and of obfuscator gates, where each one of the latter is labeled with one of the obfuscators \({\mathcal {O}_{i}}\). The layout is independent of the actual obfuscators and should thus work with any concrete obfuscator candidates, i.e., be black-box. The combiner is structured in so-called units. A unit is a sub circuit which takes as input the descriptions of circuits and itself describes a circuit. The unit first inserts the input circuits into some of the obfuscators, where we allow multiple appearances of obfuscators in a unit, and then processes the output circuits by a circuit consisting of NAND gates only. An example is given in the left part of Fig. 2. If the input circuit is given to obfuscators \(i_1,i_2,\dots \) then we call this an \(\{i_1,i_2,\dots \}\)-unit for the multiset \(\{i_1,i_2,\dots \}\). The example in Fig. 2 describes a \(\{1,3\}\)-unit. Furthermore, we can even let some input circuit be passed to the NAND-circuit completely, saying that the unit is pass-through in this case. Since it is irrelevant for our lower bound which circuit is passed through, we do not need to specify the identifier. The right hand side of Fig. 2 shows a pass-through version of a \(\{1,3\}\)-unit.

The output of a unit can itself serve again as the input for another unit. We can therefore nest units in a tree-like structure as in Fig. 3. In particular, analogously to the notion of depth of circuits, we can define the depth of a unit, starting with level-1 units, as well as paths from the input circuit to the final unit. We call the path of units from level-1 units to the final unit a full path. A unit which is level-1 always receives the combiner’s input circuit as input, but potentially also other unit circuits if it is simultaneously a higher-level unit. Every unit has at least one input circuit, and a unit can of course serve as multiple inputs to other units.

To complete the description of a structural combiner we need to specify the output of our combiner for some input circuit C, once the obfuscator candidates are determined. We call this the initialization of the combiner with C. Basically the output is again a circuit and it is derived by stepwise replacing the obfuscator gates in units (starting with level-1 units which receive C as input) with samples of the output of the corresponding obfuscator. Note that the structure of the combiner circuit remains, only the obfuscator gates are now filled in with concrete circuits. In case of pass-through units we additionally place the code of the unit’s input circuit inside the new circuit at the corresponding position. Once a unit has been initialized we can use it as input to a higher-level unit and initialize that unit, till we have eventually initialized the final unit. Instructively, the reader may think of this as a left-to-right pass in Fig. 3 to compute the final output circuit, denoted as \({\textsf {Comb}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots }(C)\). Note that this is a random variable, depending on the randomness of the obfuscators. A sample of this random variable can then be fed with inputs x to produce some output y.
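The initialization procedure can be sketched as a recursive pass over the unit structure. The model below is a simplification made for illustration: circuits are Python callables, the NAND-only part of a unit is represented by an arbitrary function `post` on the list of obfuscated circuits' outputs, pass-through inputs are omitted, and the class name `Unit` and its fields are hypothetical.

```python
class Unit:
    def __init__(self, labels, inputs, post):
        self.labels = labels  # obfuscator index per input, e.g. [1, 3]
        self.inputs = inputs  # per input: a Unit, or "C" for the combiner input
        self.post = post      # stands in for the NAND-only circuit

    def initialize(self, C, obfuscators, cache=None):
        # Replace each obfuscator gate by a sample of that obfuscator's output.
        # The cache ensures a unit feeding several units is sampled only once.
        cache = {} if cache is None else cache
        if id(self) not in cache:
            outs = []
            for label, src in zip(self.labels, self.inputs):
                if isinstance(src, Unit):
                    circ = src.initialize(C, obfuscators, cache)
                else:
                    circ = C
                outs.append(obfuscators[label](circ))
            cache[id(self)] = lambda x: self.post([o(x) for o in outs])
        return cache[id(self)]

# A {1,3}-unit on one-bit outputs whose post-processing is a single NAND gate.
nand = lambda ys: 1 - (ys[0] & ys[1])
unit_13 = Unit(labels=[1, 3], inputs=["C", "C"], post=nand)
toy = {i: (lambda c: (lambda x: c(x))) for i in (1, 2, 3)}
out = unit_13.initialize(lambda x: x & 1, toy)
```

With the function-preserving toy candidates in `toy`, the initialized unit computes the NAND of two copies of the input circuit's output bit, i.e. its negation. The structure (`labels`, `inputs`, `post`) is fixed before any candidate is chosen, which is the defining feature of a structural combiner.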

Fig. 3. Combiner circuit consisting of units.

The above assumes that the class of obfuscatable circuits for structural circuits is closed under recursive constructions of units. We note that for concrete constructions such as our 3-out-of-4 combiner in the next section it suffices that we can also obfuscate level-1 units of the original input circuits. Given a circuit class \(\mathcal {C}=(\mathcal {C}_\lambda )_{\lambda \in \mathbb {N}}\), some fixed structural combiner \({\textsf {Comb}}\), and fixed obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots \) we denote by \(\mathcal {C}^{{\textsf {Comb}}}=(\mathcal {C}_\lambda ^{{\textsf {Comb}}})_{\lambda \in \mathbb {N}}\) the class of circuits which, besides all circuits \(C\in \mathcal {C}_\lambda \), for any C also includes all possible initializations of all units of the combiner (except for the final unit) for the given obfuscators. It is understood that, when considering a specific combiner \({\textsf {Comb}}\), all candidate obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots \) must be able to handle the class \({\mathcal {C}}^{{\textsf {Comb}}}\), whereas the combiner only works for the “inner” class \(\mathcal {C}\). Instructively, one may think of \(\mathcal {C}\) as the class one would like to obfuscate, although the candidate obfuscators allow for broader classes.

3 Robust 3-out-of-4 Combiner for Obfuscators

In this section we present a 3-out-of-4 (structural) combiner for obfuscation, depicted in Fig. 1.

3.1 Construction

The idea is to first obfuscate the input circuit C by all combinations of 3 out of the 4 given obfuscators \({\mathcal {O}_{1}},\dots ,{\mathcal {O}_{4}}\) and for each combination take the majority of the outputs of the three obfuscated circuits. Note that since at least 2 of the 3 obfuscators in such a combination work properly, the majority decision provides a functionally correct output. Formally, for the majority circuit \(\text {MAJ}\) combining three input circuits by evaluating each one for a given input x and taking the bit-wise majority of the outputs, we thus build the circuits

$$\begin{aligned} O_{i_1,i_2,i_3}\leftarrow \text {MAJ}({\mathcal {O}_{i_1}}(C),{\mathcal {O}_{i_2}}(C),{\mathcal {O}_{i_3}}(C)), \quad 1\le i_1< i_2<i_3\le 4 \end{aligned}$$

for all 4 possible combinations of \(i_1,i_2,i_3\). Since we merely need an arbitrary 3 of these 4 circuits for the next stage, we take the combinations leaving out obfuscators 1, 2 and 3 (in this order).

Of course, a corrupt obfuscator among \({\mathcal {O}_{i_1}},{\mathcal {O}_{i_2}},{\mathcal {O}_{i_3}}\) in the majority combination could still reveal information about the input circuit C. We hence add another layer where we now combine three of the majority combinations as before, by running each combination \(O_{i_1,i_2,i_3}\) through the complementary obfuscator \({\mathcal {O}_{i_4}}\) and taking the majority of these circuits again. Put differently, we now build the circuit

$$\begin{aligned} \text {MAJ}({\mathcal {O}_{1}}(O_{2,3,4}),{\mathcal {O}_{2}}(O_{1,3,4}),{\mathcal {O}_{3}}(O_{1,2,4})). \end{aligned}$$

Functional correctness of our combiner is guaranteed because each of the input circuits \(O_{2,3,4},O_{1,3,4},O_{1,2,4}\) computes the correct function and at least two of the level-2 obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}},{\mathcal {O}_{3}}\) are correct. The obfuscation property holds because if one of the level-2 obfuscators, say, \({\mathcal {O}^*_{1}}\), is malicious, then the level-1 obfuscators generating \(O_{2,3,4}\) already hide the input circuit. Furthermore, the malicious obfuscator \({\mathcal {O}^*_{1}}\) cannot bias the functional correctness of the circuits \(O_{1,3,4}\) and \(O_{1,2,4}\) in the other branches, such that the sound second-layer obfuscators \({\mathcal {O}_{2}},{\mathcal {O}_{3}}\) also hide \(O_{1,3,4}\) and \(O_{1,2,4}\) and thus the input circuit C, even if \({\mathcal {O}^*_{1}}\) on the first level reveals information about C.
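The two-level structure can be illustrated with a small Python sketch. It is a toy model only: circuits are modelled as integer functions, the "obfuscators" `honest` and `corrupt` are hypothetical stand-ins that hide nothing, and all names are ours. The sketch therefore exercises functional correctness of the layout, not the obfuscation property.

```python
from typing import Callable, Sequence

Circuit = Callable[[int], int]          # toy model: a circuit maps an int to an int
Obfuscator = Callable[[Circuit], Circuit]


def maj(c1: Circuit, c2: Circuit, c3: Circuit, width: int = 8) -> Circuit:
    """Return a circuit computing the bit-wise majority of three circuits."""
    def circuit(x: int) -> int:
        a, b, c = c1(x), c2(x), c3(x)
        # A bit is set in the result iff it is set in at least two outputs.
        return ((a & b) | (a & c) | (b & c)) & ((1 << width) - 1)
    return circuit


def combine_3_out_of_4(obfs: Sequence[Obfuscator], circuit: Circuit) -> Circuit:
    """Two-level combiner: MAJ(O1(O_{2,3,4}), O2(O_{1,3,4}), O3(O_{1,2,4}))."""
    o1, o2, o3, o4 = obfs
    # Level 1: majority units leaving out obfuscators 1, 2 and 3, respectively.
    u234 = maj(o2(circuit), o3(circuit), o4(circuit))
    u134 = maj(o1(circuit), o3(circuit), o4(circuit))
    u124 = maj(o1(circuit), o2(circuit), o4(circuit))
    # Level 2: each unit is passed through its complementary obfuscator.
    return maj(o1(u234), o2(u134), o3(u124))


def honest(c: Circuit) -> Circuit:
    """Stand-in for a functionally correct obfuscator (hides nothing here)."""
    return lambda x: c(x)


def corrupt(c: Circuit) -> Circuit:
    """Stand-in for a malicious obfuscator that changes the function."""
    return lambda x: c(x) ^ 0xFF


def target(x: int) -> int:
    """Example 8-bit input circuit (an arbitrary choice for illustration)."""
    return (3 * x + 1) & 0xFF
```

Replacing any single entry of `[honest] * 4` by `corrupt` and comparing the combined circuit with `target` on all 8-bit inputs mirrors the correctness argument above.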

3.2 Security

We start by showing that the combiner is (strongly) robust for indistinguishability obfuscation. Recall that strong robustness refers to the fact that each property, functional correctness and obfuscation, is preserved individually. Note that for our combiner (and also the security proof) it suffices that the parties merely have black-box access to all obfuscators.

Theorem 1

The combiner in Fig. 1 is a strongly robust 3-out-of-4 combiner for indistinguishability obfuscation.

Proof

Functional correctness is straightforward, given that for each unit at least two obfuscators are functionally correct and since we apply the majority of the outputs.

We next show indistinguishability. Take an arbitrary distinguisher \(\mathcal {D}\) against our combiner. We need to show that there exists a negligible function \(\epsilon \) such that for an arbitrary pair \(C_0,C_1\in \mathcal {C}_\lambda \) of circuits, the distinguishing advantage of \(\mathcal {D}\) is smaller than \(\epsilon (\lambda )\). The idea is to show that one can gradually replace the input circuits \(C_0\) to the obfuscators in the combiner by circuit \(C_1\), taking some care with the single corrupt obfuscator.

For the gradual replacement fix the order of the nine level-1 obfuscators \({\mathcal {O}_{2}},{\mathcal {O}_{3}},{\mathcal {O}_{4}},\dots ,{\mathcal {O}_{1}},{\mathcal {O}_{2}},{\mathcal {O}_{4}}\) according to their appearance in Fig. 1 from top to bottom, with one exception: for a parameter \(k\in \{1,2,3,4\}\), representing the index of the corrupt obfuscator \({\mathcal {O}^*_{k}}\), we move all occurrences of this obfuscator to the very end of the list. For instance, for \(k=2\) we would have the order \({\mathcal {O}_{3}},{\mathcal {O}_{4}},{\mathcal {O}_{1}},\dots ,{\mathcal {O}_{4}},{\mathcal {O}^*_{2}},{\mathcal {O}^*_{2}}\). Let \(K=K(k)\in \{7,8\}\) be the first index of \({\mathcal {O}^*_{k}}\) in that list. Define now the random hybrid variables \(H^k_i(C_0,C_1)\) for \(i=0,1,\dots ,9\) as the output of our combiner if we pass circuit \(C_0\) for the first i obfuscators (according to our order) and \(C_1\) for the remaining \(9-i\) ones. Then, clearly \(H^k_9(C_0,C_1)\) corresponds to the distribution of our combiner for input \(C_0\), and \(H^k_0(C_0,C_1)\) to the one of our combiner for \(C_1\). It hence suffices to show for any i that \(\mathcal {D}\)’s probability of distinguishing adjacent \(H^k_{i-1},H^k_{i}\) is negligible.
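The ordering and the index \(K(k)\) can be recomputed with a few lines of Python. This is a sketch under the assumption that the level-1 units appear in Fig. 1 in the order \(O_{2,3,4},O_{1,3,4},O_{1,2,4}\) (the middle unit is inferred from Sect. 3.1); the function names are ours.

```python
from typing import List

# Level-1 units in the assumed order of Fig. 1 (top to bottom), as index triples.
LEVEL1_UNITS = [(2, 3, 4), (1, 3, 4), (1, 2, 4)]


def hybrid_order(k: int) -> List[int]:
    """Order of the nine level-1 obfuscator slots with O*_k moved to the end."""
    slots = [i for unit in LEVEL1_UNITS for i in unit]
    sound = [i for i in slots if i != k]
    return sound + [k] * (len(slots) - len(sound))


def first_corrupt_index(k: int) -> int:
    """K(k): 1-based position of the first occurrence of O*_k in the order."""
    return hybrid_order(k).index(k) + 1


def hybrid_inputs(i: int) -> List[str]:
    """Inputs in hybrid H^k_i: C0 for the first i slots, C1 for the rest."""
    return ["C0"] * i + ["C1"] * (9 - i)
```

For \(k=2\) the sketch yields the order 3, 4, 1, 3, 4, 1, 4, 2, 2 and hence \(K=8\); for \(k=4\) the corrupt obfuscator occupies three slots and \(K=7\).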

To bound the advantage of \(\mathcal {D}\) for each pair \((H^k_{i-1},H^k_i)\) we will wrap the algorithm into a sequence of distinguishers \(\mathcal {D}^k_{i}\) for \(i=1,2,\dots ,9\). The distinguisher \(\mathcal {D}^k_{i}\) works in two modes, depending on the status of the i-th obfuscator in our sequence:

  • If i is such that the i-th obfuscator is not corrupt, i.e., \(i< K(k)\), then \(\mathcal {D}_i^k\) expects as input a pair \(C_0,C_1\) and an obfuscated circuit \(O'\) generated by the i-th obfuscator in our order for \(C_b\), \(b\in \{0,1\}\). Algorithm \(\mathcal {D}_i^k\) computes the output of our combiner (with the given obfuscators) but, in line with the definition of the hybrids, inserts \(C_0\) as input in the first \(i-1\) level-1 obfuscators, \(O'\) as the output of the i-th level-1 obfuscator, and \(C_1\) as input in the final \(9-i\) slots. It completes the output O of the combiner for these data and lets \(\mathcal {D}\) run on \(C_0,C_1\) and O. Algorithm \(\mathcal {D}_i^k\) returns whatever \(\mathcal {D}\) outputs.

  • If the i-th obfuscator is corrupt, i.e., \(i\ge K(k)\), then \(\mathcal {D}_i^k\) expects as extra auxiliary input a pair \(C_0,C_1\) and a sample \(O'\) of one of the sound obfuscator candidates. Here the obfuscator \({\mathcal {O}_{j}}\) producing \(O'\) is determined by looking at the level-1 unit u in which the i-th (corrupt) obfuscator \({\mathcal {O}^*_{k}}\) appears. For this unit, and its three obfuscators, there exists the fourth, complementing obfuscator \({\mathcal {O}_{j}}\) to which the unit’s output is fed in the level-2 unit. For instance, if \(k=2\), \(K=8\), and \(i=8\), then the corresponding level-1 unit u is the top one in Fig. 1, and the complementing obfuscator is \({\mathcal {O}_{1}}\).

    The input to the complementing obfuscator \({\mathcal {O}_{j}}\) for deriving \(O'\) is either a sample of the level-1 unit where all honest obfuscators are initialized with \(C_0\) and the corrupt one with \(C_1\), or all of them are initialized with \(C_0\). By assumption, both samples are in the class \(\mathcal {C}_\lambda ^{{\textsf {Comb}}}\) such that the sample can be passed to \({\mathcal {O}_{j}}\). Algorithm \(\mathcal {D}_i^k\) now evaluates our combiner, in line with the definition of the hybrids, by giving input \(C_0\) to the obfuscators before index i and input \(C_1\) to those after index i, and by replacing the output of the complementing obfuscator \({\mathcal {O}_{j}}\) for unit u by \(O'\). It returns \(\mathcal {D}\)’s output bit on input \(C_0,C_1\) and the combiner’s output O.

Assume i is such that the i-th obfuscator in order is still different from \({\mathcal {O}^*_{k}}\), i.e., \(i<K\). Then if \(O'\) is the obfuscation of \(C_1\), then \(\mathcal {D}^k_i\) runs \(\mathcal {D}\) exactly on the distribution of the hybrid variable \(H^k_{i-1}(C_0,C_1)\). In particular, for \(i=1\) algorithm \(\mathcal {D}_i^k\) runs \(\mathcal {D}\) on a sample of our combiner’s output for \(C_1\). Analogously, for \(i=9\) and \(O'\) stemming from the complementing obfuscator for a sample of the level-1 unit with all \(C_0\) inputs, the input to \(\mathcal {D}\) is distributed like a sample of our combiner for \(C_0\) (and thus of \(H^k_9(C_0,C_1)\)).

Assume that the k-th obfuscator is indeed corrupt. For \(i<K\) it follows from the indistinguishability obfuscation of the sound obfuscators that there exist negligible functions \(\epsilon _i(\lambda )\) such that for any \(C_0,C_1\), the advantage of \(\mathcal {D}^k_i\) in distinguishing the two input cases is at most \(\epsilon _i(\lambda )\). For \(i\ge K\) this follows as the input circuits to the two sound obfuscators in unit u are already \(C_0\), such that the majority computation of the unit ensures that in both cases the unit circuit computes the function \(C_0(\cdot )\). It follows that both input circuits to the complementing obfuscator \({\mathcal {O}_{j}}\) compute the same function and we can again conclude from the security of the obfuscator that the advantage must be bounded by some negligible function \(\epsilon _i(\lambda )\). Note that here we take advantage of the fact that indistinguishability holds for all circuits and therefore in particular also for our partial combiner samples.

It therefore also holds for any i that the advantage of \(\mathcal {D}\) in distinguishing \(H^k_{i-1}(C_0,C_1)\) and \(H^k_i(C_0,C_1)\) for any \(C_0,C_1\) is at most \(\epsilon _i(\lambda )\), too. Hence, the overall advantage of \(\mathcal {D}\) is at most \(\epsilon (\lambda ):=\sum _{i=1}^9 \epsilon _i(\lambda )\) and thus negligible. It suffices that the proof provides an existential result (Footnote 6).
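Spelled out, the last step is the usual telescoping argument over the ten hybrids. Suppressing the additional inputs \(C_0,C_1\) of \(\mathcal {D}\), and writing the index k of the corrupt obfuscator explicitly, the triangle inequality gives

$$\begin{aligned} \left| \Pr [\mathcal {D}(H^k_9)=1]-\Pr [\mathcal {D}(H^k_0)=1]\right| \le \sum _{i=1}^{9}\left| \Pr [\mathcal {D}(H^k_i)=1]-\Pr [\mathcal {D}(H^k_{i-1})=1]\right| \le \sum _{i=1}^{9}\epsilon _i(\lambda )=\epsilon (\lambda ). \end{aligned}$$

Since a sum of a constant number of negligible functions is negligible, the bound on the overall advantage follows.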

The claim carries over to the case of differing-inputs obfuscation. Recall that the main difference to indistinguishability obfuscation is that, for the differing-inputs case, the circuits in question are generated by an algorithm \(\textsf {Sampler}\) such that the circuits may compute different functions, but \(\textsf {Sampler}\) ensures that finding differing inputs is infeasible. We can basically apply the same hybrid argument in this case as above. However, for the step \(i\ge K\), when using the obfuscation of our level-1 unit, we need to specify sampler \(\textsf {Sampler}'_k\) with oracle access to \({\mathcal {O}_{1}},\dots ,{\mathcal {O}_{4}}\) to generate the input circuit for the complementing obfuscator. Algorithm \(\textsf {Sampler}'_k\) first runs \(\textsf {Sampler}\) to get \((C_0,C_1,\textsf {aux})\), then generates two samples of the level-1 unit (one time using \(C_0\) for the honest obfuscators and \(C_1\) for \({\mathcal {O}^*_{k}}\), and the other time using \(C_0\) everywhere), and finally outputs these two samples and \(\textsf {aux}'=(C_0,C_1,\textsf {aux})\) as auxiliary data. Note that finding an input x where the two level-1 unit samples differ is impossible, as both implement the same function.

We next show that the claim remains true with respect to virtual black-box and grey-box obfuscation. For this we assume that the adversary and the simulator receive some circuit-dependent auxiliary input \(\mathsf {aux}\) as additional input, as explained in Sect. 2.

Proposition 1

The combiner in Fig. 1 is a strongly robust 3-out-of-4 combiner for virtual black-box and grey-box obfuscation with respect to dependent auxiliary input.

Proof

Functional correctness follows as in the case of indistinguishability obfuscation. We only discuss the VBB property here; the VGB property follows analogously.

Consider an adversary \(\mathcal {A}_0\) against VBB obfuscation. This adversary receives an output sample \(O'\) of our combiner as input and some auxiliary input \(\mathsf {aux}[0]=\mathsf {aux}[0](C)\). Let k be again the index of the malicious obfuscator and this time define \(L=L(k)\in \{3,5\}\) as follows. For \(k=4\) we would have the malicious obfuscator \({\mathcal {O}^*_{4}}\) only on first-level units and we only need to look at the \(L=3\) second-level obfuscators. For \(k\in \{1,2,3\}\), on the other hand, the malicious obfuscator appears in a second-level unit and we thus consider the \(L=5\) sound obfuscators, consisting of the 3 obfuscators leading to the second-level appearance of \({\mathcal {O}^*_{k}}\) and the remaining 2 honest level-two obfuscators.

Assume now that we change the auxiliary input to include the obfuscator results of our combiner for all L sound obfuscators defined above. Denote these intermediate results, ordered according to the obfuscator application, by \(O[1..L]=(O_{i_1},O_{i_2},O_{i_3},\dots ,O_{i_L})\), and let O[1..i] denote the first i entries in O[1..L]. Let \(\mathsf {aux}[0..i]\) denote the sample given by a sample of first i obfuscator outputs, together with the (independent) sample \(\mathsf {aux}[0]\) of \(\mathcal {A}_0\).

Instead of considering \(\mathcal {A}_0(1^\lambda ,O',\mathsf {aux}[0])\) we construct an algorithm \(\mathcal {A}_1\) which receives \(1^\lambda \) and \(\mathsf {aux}[0..L]\) as input, assembles a combiner output \(O'\) from \(\mathsf {aux}[1..L]\) by possibly evaluating the (level-2) malicious obfuscator, and runs adversary \(\mathcal {A}_0(1^\lambda ,O',\mathsf {aux}[0])\). Then, clearly, the output distributions of both algorithms are identical. We can now view \(\mathcal {A}_1\) as an algorithm which receives \(\mathsf {aux}[0..L-1]\) as auxiliary input, and the obfuscated circuit \(\mathsf {aux}[L]\) together with \(1^\lambda \) as regular input (Footnote 7). For this algorithm \(\mathcal {A}_1\), by assumption about the security of \({\mathcal {O}_{i_L}}\) producing \(O_{i_L}\), there exists a simulator \(\mathcal {S}^C_1(1^\lambda ,\mathsf {aux}[0..L-1])\) with negligibly close output distribution.

Given \(\mathcal {S}_1\) we construct an adversary \(\mathcal {A}_2\) which receives auxiliary input \(\mathsf {aux}[0..L-2]\), and \(1^\lambda \) and \(\mathsf {aux}[L-1]\) as regular input. It runs \(\mathcal {S}_1(1^\lambda ,\mathsf {aux}[0..L-2])\) and uses \(\mathsf {aux}[L-1]\) to answer oracle calls. Note that, by the functional correctness of \({\mathcal {O}_{i_{L-1}}}\), using \(\mathsf {aux}[L-1]\) to simulate the oracle C of \(\mathcal {S}_1\) is sound as both circuits compute the same function. We can continue this argument to eventually obtain a simulator \(\mathcal {S}_L^C(1^\lambda ,\mathsf {aux}[0])\), producing some output distribution which is negligibly close to the one of our initial adversary \(\mathcal {A}_0(1^\lambda ,O',\mathsf {aux}[0])\). This shows VBB obfuscation.

Fig. 4. Examples of (insecure) structural combiners

4 Lower Bounds for Combiners

To illustrate how we use the two required security properties, function preservation and indistinguishability, against each other to derive our general result, it is useful to demonstrate our technique for some toy examples. In the examples we use an unspecified notion of indistinguishability of the obfuscators as we merely highlight the issues; for the sake of concreteness the reader may think of the notion of indistinguishability obfuscation.

4.1 Simple Attempts that Fail

The first attempt to build a secure combiner consists of a single unit and is given in the left hand part of Fig. 4. It uses three obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}},{\mathcal {O}_{3}}\) and runs the input circuit through each of them. Then it combines the three obfuscated circuits by a majority circuit. Note that this means that this part takes the circuits and has some input wires for the input x, and it evaluates each circuit on x and outputs y as the bit-wise majority of the answers. While we cannot break the functional correctness of the combiner with a single corrupt obfuscator \({\mathcal {O}^*_{i}}\), we can easily break the indistinguishability property. To this end we take control of obfuscator \({\mathcal {O}^*_{1}}\) and let it simply output the input circuit in clear. Note that this means that the unit, after having been initialized, reveals the input circuit in clear as well, and this is easy to distinguish.
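The attack on the first attempt can be mimicked in a toy model. In the following Python sketch a circuit is a pair of a readable description and a function; `sound_obfuscator` is a hypothetical stand-in that merely replaces the description by random text (it is not a real obfuscator), and the circuits `E0`, `E1` are example choices of ours.

```python
from dataclasses import dataclass
from typing import Callable, Tuple
import secrets


@dataclass(frozen=True)
class Circuit:
    code: str                      # description that an observer can read
    func: Callable[[int], int]     # function the circuit computes


def sound_obfuscator(c: Circuit) -> Circuit:
    # Toy stand-in: replaces the description by fresh random text.
    return Circuit(code=secrets.token_hex(8), func=c.func)


def leaking_obfuscator(c: Circuit) -> Circuit:
    # Malicious O*_1: outputs the input circuit in the clear.
    return c


def single_majority_unit(c: Circuit, obfs) -> Tuple[Circuit, ...]:
    # The initialized unit contains the three obfuscated circuits; the
    # majority gate on top is omitted because the attack ignores it.
    return tuple(o(c) for o in obfs)


def distinguisher(unit: Tuple[Circuit, ...], e0: Circuit, e1: Circuit) -> int:
    # Guess b by reading the first slot, which O*_1 left unprotected.
    return 0 if unit[0].code == e0.code else 1


# Two functionally equivalent circuits with distinct descriptions.
E0 = Circuit("x + x", lambda x: x + x)
E1 = Circuit("2 * x", lambda x: 2 * x)
OBFS = (leaking_obfuscator, sound_obfuscator, sound_obfuscator)
```

In this model the distinguisher identifies the input circuit with certainty, although the unit still computes the correct function.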

In our second example we have two obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}}\), let the output of them be combined arbitrarily, and then input the derived circuit into obfuscator \({\mathcal {O}_{3}}\). Note that in this case it is unclear how to break the indistinguishability property by corrupting a single obfuscator only. If it is \({\mathcal {O}_{3}}\) then the obfuscators of the first unit already hide the input circuit; if we corrupt one of the obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}}\) then the final obfuscation hides the actual circuit.

We can, nonetheless, in the second example break the functional preservation property. Namely, assume that both \({\mathcal {O}_{1}}\) and \({\mathcal {O}_{2}}\) are secure and that there are two potential input circuits \(D_0,D_1\) computing different functions for the same input and output length. If we control \({\mathcal {O}_{1}}\), then we let it on any input circuit rather obfuscate \(D_1\). Vice versa, if we control \({\mathcal {O}_{2}}\) then we let it always obfuscate \(D_0\), independently of the actual input. It follows that the initialized combiners for \(D_0\) (with our malicious \({\mathcal {O}^*_{1}}\) and with genuine \({\mathcal {O}_{2}}\)) and for \(D_1\) (with genuine \({\mathcal {O}_{1}}\) and our malicious \({\mathcal {O}^*_{2}}\)) have the same distribution. For at least one of the two cases the computed function must then be incorrect, as the initialization samples for both input circuits \(D_0,D_1\) have the same distributions in both cases.
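The confusion argument for the second example can be checked mechanically in a toy setting. The sketch below assumes deterministic stand-in obfuscators that simply tag a circuit name with the obfuscator index, so that "same distribution" becomes plain equality; all function names are ours.

```python
def honest_obfuscator(index: int):
    # Deterministic toy stand-in: tags the circuit name with the obfuscator index.
    return lambda circuit_name: (index, circuit_name)


def always_obfuscate(index: int, forced_name: str):
    # Malicious variant: ignores its input and honestly obfuscates forced_name.
    honest = honest_obfuscator(index)
    return lambda circuit_name: honest(forced_name)


def first_unit_view(o1, o2, circuit_name: str):
    # What the rest of the combiner (including O_3) receives from the first unit.
    return (o1(circuit_name), o2(circuit_name))


# Case A: input D0, O_1 is malicious and always obfuscates D1.
view_a = first_unit_view(always_obfuscate(1, "D1"), honest_obfuscator(2), "D0")
# Case B: input D1, O_2 is malicious and always obfuscates D0.
view_b = first_unit_view(honest_obfuscator(1), always_obfuscate(2, "D0"), "D1")
```

Both views coincide, so whatever \({\mathcal {O}_{3}}\) and the remaining combiner do with them, the result cannot compute both \(D_0\) and \(D_1\).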

4.2 The General Case of 2-out-of-3 Combiners

The attacks in the simple case show the path for our general impossibility result for 2-out-of-3 structural combiners. If one of the three obfuscators appears in all units on the path from some level-1 unit to the final unit, then it can pass on information about the input circuit C to the final unit. This is done by forwarding some information about the input circuit in the output of the obfuscator. This would clearly violate the indistinguishability property. Hence, on all paths there must be a unit which only uses (at most) the same two obfuscators. But then we can “confuse” the combiner as we did in the second example above. The argument, however, requires some care to deal with the fact that we have many paths. Note also that the confusion strategy fails in our 3-out-of-4 construction because the majority of the three obfuscators in each unit yields the correct function.

For the sake of concreteness we use the notion of indistinguishability obfuscation for the obfuscators and the combiners. Recall that this means that for any functionally equivalent circuits \(C_0,C_1\) from class \(\mathcal {C}\) the combiner’s initializations with these two circuits must be computationally indistinguishable. To avoid trivial cases we assume that the class \(\mathcal {C}\) contains at least two distinct but functionally equivalent circuits \(E_0,E_1\), and that it also contains two circuits \(D_0,D_1\) computing different functions. We call such classes non-trivial.

Since we pass on circuits as inputs we need to fix some encoding. Let \(\left\langle \cdot \right\rangle \) denote such a function mapping circuits from the class \(\mathcal {C}\) to strings. We assume that the encoding is such that given an encoding of a unit (after initialization) one can reconstruct the circuits output by the obfuscators. That is, there exists an efficient algorithm \(\textsf {reconstruct}\) such that given any initialization \(V\leftarrow U(C_1,C_2,C_3,\dots )\) of a unit U, including obfuscated circuits \(O_i\leftarrow {\mathcal {O}_{i_j}}(C_i)\) and possibly pass-through circuits \(O_i=C_i\), we have \(\textsf {reconstruct}(\left\langle V\right\rangle )=(\left\langle O_1\right\rangle ,\left\langle O_2\right\rangle ,\left\langle O_3\right\rangle ,\dots )\). Furthermore, we assume that the encoding of no unit coincides with the encoding of our equivalent circuits \(E_0\) or \(E_1\) such that it is clear if each \(\left\langle O_i\right\rangle \) is the result of a unit initialization or rather one of the circuits \(E_0\) or \(E_1\). We call such encodings \(\left\langle \cdot \right\rangle \) admissible. Note that this means that the combiner itself cannot apply any obfuscation techniques beyond the ones provided by the obfuscators placed inside the unit.

We first show that on any full path (from level-1 units to the final unit), for each of the obfuscators there is a unit in which it does not appear (and which is not pass-through). This holds for any structural combiner, independently of the total number of obfuscators and the number of malicious ones:

Lemma 1

Let \({{{\textsf {\textit{Comb}}}}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots ,{\mathcal {O}_{N}}}\) be a structural combiner for a non-trivial circuit class \(\mathcal {C}\) with admissible encoding \(\left\langle \cdot \right\rangle \). Then for any full path of units of the combiner and for any \(i\in \{1,2,\dots ,N\}\) there must be a unit which is not pass-through and which is not an \(\{i,\dots \}\)-unit, or else the combiner cannot be an indistinguishability obfuscator.

Proof

Assume that there exists a full path of units and an \(i\in \{1,2,\dots ,N\}\) such that each unit on the path is an \(\{i,\dots \}\)-unit or that it is pass-through (or both). Then we show how to break indistinguishability obfuscation as follows. Let \(E_0,E_1\) be some functionally equivalent circuits in the class with distinct encodings under \(\left\langle \cdot \right\rangle \). We corrupt obfuscator \({\mathcal {O}_{i}}\) and let it, for each call, simply output the input circuit in clear, by duplicating the input description.

Since each unit on the path includes the i-th obfuscator (or is pass-through), a distinguisher can distinguish between a combiner obfuscation of \(E_0\) and \(E_1\) as follows. The distinguisher receives as input the initialization of the final unit U, and runs \(\textsf {reconstruct}(\left\langle U\right\rangle )\) to recover all (obfuscated or pass-through) input circuits \(O_1,O_2,\dots \). Since the distinguisher knows the layout of the combiner it can recursively apply the reconstruction algorithm to outputs of the i-th obfuscator resp. to passed circuits; both are initialized units. Following the full path in question, the distinguisher eventually obtains either \(E_0\) or \(E_1\) as the input circuit, and can thus distinguish the two cases easily.    \(\square \)
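The recursive use of \(\textsf {reconstruct}\) can be sketched as follows. The Python model is a simplification under our own assumptions: an initialized unit is a tagged tuple of its input circuits, circuits are plain names, and the two-level layout in `build_combiner` is hypothetical, with slot 0 of every unit holding the output of the corrupt (identity) obfuscator.

```python
from typing import List, Union

Encoded = Union[str, tuple]   # a plain circuit name or an initialized unit


def make_unit(children: List[Encoded]) -> tuple:
    # Admissible encoding: a unit is tagged so it cannot be confused with E0/E1.
    return ("unit", tuple(children))


def reconstruct(unit: tuple) -> tuple:
    # Recover the circuits output by the obfuscators (or passed through).
    tag, children = unit
    assert tag == "unit"
    return children


def follow_path(final_unit: tuple, path: List[int]) -> Encoded:
    # Descend from the final unit to a level-1 input along slot positions.
    current: Encoded = final_unit
    for slot in path:
        current = reconstruct(current)[slot]
    return current


def build_combiner(circuit: str) -> tuple:
    # Hypothetical two-level layout; slot 0 of every unit is the corrupt
    # obfuscator O*_i (identity), the other slots hold opaque placeholders.
    level1 = make_unit([circuit, "opaque"])
    return make_unit([level1, "opaque"])
```

Following the path `[0, 0]` from the final unit returns the input circuit name, which is exactly the information the distinguisher needs.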

We next show that, given that each full path contains a unit in which, say, obfuscator \({\mathcal {O}_{3}}\) does not appear, we can confuse the combiner. This time, the claim only holds for 2-out-of-3 combiners:

Lemma 2

Let \({{{\textsf {\textit{Comb}}}}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}},{\mathcal {O}_{3}}}\) be a structural combiner for a non-trivial circuit class \(\mathcal {C}\) with admissible encoding \(\left\langle \cdot \right\rangle \). Then the combiner cannot be perfectly correct.

In particular, if u denotes the number of units in the structural combiner and m the maximal number of obfuscator gates in a unit, then with probability at least \(2^{-u}(mu)^{-mu}\) (over the random choices of the obfuscators) the combiner’s function is different from the one of the input circuit. If u and m are constant, for instance, this means a constant error in functional preservation.

Proof

By Lemma 1 for each path from level-1 units to the final unit there exists a unit which does not contain, say, the obfuscator \({\mathcal {O}_{3}}\) and which is not pass-through either. Put differently, such a unit contains (at most) the obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}}\), each one possibly multiple times. Let \(U_1,U_2,\dots \) be the corresponding units which we call confusion units. In the example in Fig. 5 the confusion units on the three paths are marked by dotted lines.

We consider two cases, one time corrupting obfuscator \({\mathcal {O}_{1}}\), the other time corrupting obfuscator \({\mathcal {O}_{2}}\). Let us first consider the case that we corrupt obfuscator \({\mathcal {O}_{1}}\). Our version \({\mathcal {O}^*_{1}}\) of the obfuscator will internally hold (formally attributed to the non-uniformity) an initialization sample of \({\textsf {Comb}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}},{\mathcal {O}_{3}}}(D_1)\) with the genuine obfuscators for input circuit \(D_1\). In particular, for each confusion unit \(U_i\) it will include the \(j\le m\) circuit codes of \(O_i^j[D_1]\) which the original obfuscator \({\mathcal {O}_{1}}\) output in unit \(U_i\) in this sample. In order to make our obfuscator state-free we will guess the right insertion positions and injected circuits. That is, for each call (about some input circuit) our malicious obfuscator \({\mathcal {O}^*_{1}}\) tosses a coin. If it comes up heads, then the obfuscator proceeds as the genuine obfuscator would. If it comes up tails, then it picks one of the at most mu circuits \(O_i^j[D_1]\) at random, and returns this circuit. An example of a run with good guesses is given in the left part of Fig. 6.

Fig. 5. Confusion units in this example are marked by dotted lines.

Fig. 6. Confusion strategy with malicious obfuscator \({\mathcal {O}^*_{1}}\) injecting parts of the upper \(D_1\) initialization sample into the lower \(D_0\) initialization (left), and malicious obfuscator \({\mathcal {O}^*_{2}}\) injecting parts of the upper \(D_0\) initialization sample into the lower \(D_1\) initialization (right).

For the other case we corrupt \({\mathcal {O}_{2}}\) and include a sample of \({\textsf {Comb}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}},{\mathcal {O}_{3}}}(D_0)\) of the genuine obfuscators, this time for input circuit \(D_0\). Analogously to the other case denote the output of \({\mathcal {O}_{2}}\) in the confusion unit \(U_i\) by \(O_i^j[D_0]\). When called, the malicious obfuscator \({\mathcal {O}^*_{2}}\) also generates an honest answer with probability \(\tfrac{1}{2}\), and inserts one of the pre-sampled circuits \(O_i^j[D_0]\), the choice made at random, in the other case.

For the analysis we start with the case of a malicious obfuscator \({\mathcal {O}^*_{1}}\). Note that, if we let u denote the number of units in the combiner, then with probability \(2^{-u}\) we overwrite the obfuscator’s behavior exactly for the confusion units, since we predict the status of each unit (confusion or not) exactly with probability \(\tfrac{1}{2}\). If so, then we also inject the hardwired circuits \(O_i^j[D_1]\) “correctly” in confusion unit \(U_i\) with probability at least \((mu)^{-mu}\), since we have at most u units with at most m obfuscator gates and need to guess for each unit correctly among the at most mu possibilities among all \(O_i^j[D_1]\)’s. If this happens, and the combiner receives circuit \(D_0\) as input, then in the confusion units we have consistent samples for \({\mathcal {O}_{1}}\) gates (if present), as if the combiner’s input had been \(D_1\). See the left part of Fig. 6 for an example. Simultaneously, in the same unit, we have consistent samples for \({\mathcal {O}_{2}}\) gates (if present), as if the overall input had been circuit \(D_0\).

By symmetry, the same is true if we control obfuscator \({\mathcal {O}^*_{2}}\) and the combiner’s input is \(D_1\). Hence, with probability at least \(2^{-u}(mu)^{-mu}\) either case creates the same output distribution. If this happens, then on each path to the final unit the corresponding confusion unit produces the same distribution upon the single initialization in both cases. It follows that the combiner must implement an incorrect function in one of the cases, showing that functional preservation is not satisfied. It follows that the combiner cannot be perfectly correct.
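To get a feeling for the size of the bound \(2^{-u}(mu)^{-mu}\), the following Python helper evaluates it exactly; the function name and the sample parameters are ours.

```python
from fractions import Fraction


def confusion_probability_bound(u: int, m: int) -> Fraction:
    """Lower bound 2^{-u} (mu)^{-mu} on the probability of a good guess."""
    if u < 1 or m < 1:
        raise ValueError("u and m must be positive integers")
    # 2^{-u}: guessing the status of each of the u units correctly.
    # (mu)^{-mu}: picking the right pre-sampled circuit at each of <= mu gates.
    return Fraction(1, 2 ** u) * Fraction(1, (m * u) ** (m * u))
```

For instance, a combiner with \(u=3\) units and at most \(m=2\) obfuscator gates per unit gives the bound \(2^{-3}\cdot 6^{-6}=1/373248\), which is small but constant.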

Noting that the combiner cannot work even if 2 of the 3 obfuscators both have both properties simultaneously, the previous lemmas imply that there are not even weakly robust 2-out-of-3 combiners for indistinguishability obfuscation. It follows that there cannot exist stronger forms of structural combiners either, such as 1-out-of-2 combiners, strong combiners, or virtual grey-box combiners.

Theorem 2

For any \(o\in \{\text {VBB},\text {VGB},\text {indistinguishability},\text {differing-inputs}\}\) there is no structural weakly robust 2-out-of-3 o-obfuscation combiner \({{{\textsf {\textit{Comb}}}}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}},{\mathcal {O}_{3}}}\) for non-trivial circuit classes \(\mathcal {C}\) with admissible encoding \(\left\langle \cdot \right\rangle \).

5 The General Case of \((2\gamma {}[+1])\)-out-of-\((3\gamma {}[+1])\) Combiners

In this section we present a generalization of our 3-out-of-4 combiner to the case of \((2\gamma +1)\)-out-of-\((3\gamma +1)\) combiners for any fixed integer \(\gamma \). In fact, our combiner for \(\gamma =1\) in Sect. 3 can be seen as a special parallelized version of the general approach here. We then discuss that our lower bound for 2-out-of-3 structural combiners also carries over to the more general case of \(2\gamma \)-out-of-\(3\gamma \) combiners, showing that our general combiner here is optimal in this regard.

5.1 Robust \((2\gamma +1)\)-out-of-\((3\gamma +1)\) Combiners

Consider all subsets I of \(\{1,2,\dots ,3\gamma +1\}\) of size \(2\gamma +1\). For each such set I form the unit which, similar to our 3-out-of-4 case, first in parallel obfuscates the input circuit with each obfuscator \({\mathcal {O}_{i}}\) for \(i\in I\), and then computes the majority circuit over all these \(2\gamma +1\) obfuscated circuits. We write

$$\begin{aligned} O_I(\cdot )=\text {MAJ} \left\{ {\mathcal {O}_{i}}(\cdot ) \, \left| \, i\in I \right. \right\} \end{aligned}$$

for this unit. To obfuscate a circuit C compose each of these units for all the I’s sequentially, in arbitrary order. Let us denote this process by

$$\begin{aligned} (\prod _{I} O_I)(\cdot ) = O_{I_\ell }(\cdots O_{I_3}(O_{I_2}(O_{I_1}(\cdot )))\cdots ) \end{aligned}$$

for constant \(\ell =\left( {\begin{array}{c}3\gamma +1\\ 2\gamma +1\end{array}}\right) \). Call this the sequential-subset combiner for \(\gamma \).
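A minimal Python sketch of the sequential-subset combiner is given below. It again uses a toy model in which circuits are integer functions and obfuscators are arbitrary circuit-to-circuit maps; the names `subsets`, `majority_unit` and `sequential_subset_combiner` are ours, and the composition order is the (arbitrary) lexicographic order of `itertools.combinations`.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, List, Tuple

Circuit = Callable[[int], int]
Obfuscator = Callable[[Circuit], Circuit]


def subsets(gamma: int) -> List[Tuple[int, ...]]:
    """All index sets I of size 2*gamma+1 within {1, ..., 3*gamma+1}."""
    return list(combinations(range(1, 3 * gamma + 2), 2 * gamma + 1))


def number_of_units(gamma: int) -> int:
    """The constant l = binom(3*gamma+1, 2*gamma+1)."""
    return comb(3 * gamma + 1, 2 * gamma + 1)


def majority_unit(obfs: Dict[int, Obfuscator], index_set: Tuple[int, ...],
                  c: Circuit, width: int = 8) -> Circuit:
    """O_I: obfuscate c with every O_i, i in I, then take the bit-wise majority."""
    parts = [obfs[i](c) for i in index_set]

    def unit(x: int) -> int:
        outputs = [p(x) for p in parts]
        result = 0
        for bit in range(width):
            ones = sum((y >> bit) & 1 for y in outputs)
            if 2 * ones > len(outputs):
                result |= 1 << bit
        return result
    return unit


def sequential_subset_combiner(obfs: Dict[int, Obfuscator], gamma: int,
                               c: Circuit) -> Circuit:
    """Compose the units O_I sequentially over all index sets I."""
    for index_set in subsets(gamma):
        c = majority_unit(obfs, index_set, c)
    return c
```

For \(\gamma =1\) this yields \(\ell =4\) sequential units. Note that evaluating the toy circuit costs about \((2\gamma +1)^{\ell }\) calls of the input circuit, so the sketch is only practical for \(\gamma =1\).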

Intuitively, the sequential-subset combiner guarantees robustness as there exists a subset I such that this subset only uses the \(2\gamma +1\) uncorrupt obfuscators. At the same time each circuit \(O_I\) computes the correct function as the majority of the \(2\gamma +1\) obfuscators faithfully computes the correct function.

Theorem 3

For any constant \(\gamma \) the sequential-subset combiner is a strongly robust \((2\gamma +1)\)-out-of-\((3\gamma +1)\) combiner for indistinguishability obfuscation, for differing-inputs obfuscation, for virtual black-box obfuscation, and for grey-box obfuscation, the latter ones for dependent auxiliary inputs.

The proof is similar to our 3-out-of-4 combiner. Functional correctness follows from the fact that the majority computation in each \(O_{I_i}\) ensures that the at most \(\gamma \) corrupt obfuscators cannot bias the outcome. Obfuscation follows as before because there must exist one set I which exclusively contains non-malicious obfuscators.
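The two counting facts used in this proof sketch can be verified exhaustively for small \(\gamma \). The following Python check (with a function name of our choosing) enumerates every set of \(\gamma \) corrupt indices and confirms that honest obfuscators form a strict majority in every unit and that at least one unit is entirely honest.

```python
from itertools import combinations


def check_counting_argument(gamma: int) -> bool:
    """Exhaustively verify the majority and clean-unit facts for a given gamma."""
    n, size = 3 * gamma + 1, 2 * gamma + 1
    indices = range(1, n + 1)
    index_sets = [set(s) for s in combinations(indices, size)]
    for corrupt in combinations(indices, gamma):
        bad = set(corrupt)
        # (a) the honest obfuscators form a strict majority in every unit
        majority_ok = all(2 * len(s - bad) > size for s in index_sets)
        # (b) at least one unit consists of honest obfuscators only
        clean_unit = any(not (s & bad) for s in index_sets)
        if not (majority_ok and clean_unit):
            return False
    return True
```

Both facts follow from simple arithmetic: a unit of size \(2\gamma +1\) contains at least \(\gamma +1\) honest obfuscators, and the \(2\gamma +1\) honest indices themselves form one of the sets I.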

5.2 Impossibility for \(2\gamma \)-out-of-\(3\gamma \) Combiners

In this section we discuss that our lower bound for 2-out-of-3 structural combiners carries over to the more general case of \(2\gamma \)-out-of-\(3\gamma \) combiners.

Theorem 4

There is no structural weakly robust \(2\gamma \)-out-of-\(3\gamma \) obfuscation combiner \({{{\textsf {\textit{Comb}}}}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots ,{\mathcal {O}_{3\gamma }}}\) for non-trivial circuit classes \(\mathcal {C}\) with admissible encoding \(\left\langle \cdot \right\rangle \).

Proof

Recall the proof for the 2-out-of-3 case. There, in the first step we have shown that on each path from a level-1 unit to the output unit there must be a unit in which obfuscator \({\mathcal {O}_{3}}\) does not appear and which is not pass-through. We called these units confusion units.

The same argument now applies here as well for the \(\gamma \) obfuscators with indices \(2\gamma +1,\dots ,3\gamma \). Else, if there was a path in which one of these obfuscators appears in each unit (or if the unit is pass-through), then we could easily corrupt these obfuscators and forward information about the input circuit through the admissible encoding \(\left\langle \cdot \right\rangle \). Hence, in the case here there must be a confusion unit on each path, which only uses obfuscators with indices \(1,2,\dots ,2\gamma \) and which is not pass-through.

In the second step of the proof for the 2-out-of-3 case we then show that in the confusion units with obfuscators \({\mathcal {O}_{1}}\) and \({\mathcal {O}_{2}}\) we can confuse the combiner. One time we corrupt \({\mathcal {O}_{1}}\) and let it insert samples of circuit \(D_1\), and the other time we corrupt \({\mathcal {O}_{2}}\) and insert samples for \(D_0\), where \(D_0,D_1\) compute different functions. Then the combiner’s view when run on input \(D_0\) in the first case, and on \(D_1\) in the second case, has the same distribution and the combiner cannot provide functional correctness.

We apply the same argument here, one time corrupting the first \(\gamma \) obfuscators with indices \(1,\dots ,\gamma \) and inserting a sample for \(D_1\), and the other time corrupting the obfuscators with indices \(\gamma +1,\dots ,2\gamma \) and using a sample for \(D_0\). Then the combiner’s views in both cases (for input circuit \(D_0\) in the first case, and for \(D_1\) in the second case) are identical again such that the combiner cannot provide functional correctness.

As in the 2-out-of-3 case, the malicious obfuscators above insert the confusion samples at random positions, such that they only achieve confusion with the same bound as in the previous case. Note also that we took advantage of the fact that corrupt obfuscators are coordinated centrally by the adversary.

6 Detecting Combiners

The combiners in the previous section were correcting in the sense that they guaranteed functional correctness if a quorum of obfuscator candidates is secure. Here we consider combiners which should create circuits that either output the correct value or give some error output \(\bot \). We call them detecting combiners.

For detecting combiners we require a weaker correctness property, namely that for any circuit \(C\in \mathcal {C}\), for any \(O\leftarrow {\textsf {Comb}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots }(C)\) we have that \(O(x)\in \{ C(x),\bot \}\) for all \(x\in \{0,1\}^*\) in the domain of C. This means that the combiner may sometimes fail to compute the correct function value but then it signals this by outputting a special symbol \(\bot \). To prevent trivial solutions like the combiner which outputs the circuit that always returns \(\bot \) we assume that \(C\equiv O\) if all obfuscators are secure. Note that our assumption about the obfuscators \({\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots \) being able to deal with (intermediate) combiner outputs in \(\mathcal {C}^{{\textsf {Comb}}}\) implies that the obfuscators may now also receive circuits which occasionally output \(\bot \).
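To make the weaker correctness property concrete, the following minimal Python sketch checks \(O(x)\in \{C(x),\bot \}\) on a finite sample of inputs. It is an illustration only: circuits are modeled as Python callables, `BOT` (here `None`) stands in for \(\bot \), and the names `is_detecting_correct`, `is_equivalent`, `parity`, `sometimes_bot`, and `wrong_on_one_input` are hypothetical. A finite test of this kind can refute the property but does not prove it for all inputs.

```python
from typing import Callable, Iterable, Optional

BOT = None  # assumed stand-in for the error symbol ⊥; regular outputs are strings

Circuit = Callable[[str], Optional[str]]


def is_detecting_correct(original: Circuit, obfuscated: Circuit,
                         inputs: Iterable[str]) -> bool:
    """True if O(x) is either C(x) or ⊥ for every tested input x."""
    return all(obfuscated(x) in (original(x), BOT) for x in inputs)


def is_equivalent(original: Circuit, obfuscated: Circuit,
                  inputs: Iterable[str]) -> bool:
    """True if O(x) == C(x) for every tested input x (the all-honest case)."""
    return all(obfuscated(x) == original(x) for x in inputs)


def parity(x: str) -> str:
    """Toy input circuit C: parity of the number of '1' characters."""
    return str(x.count("1") % 2)


def sometimes_bot(x: str) -> Optional[str]:
    """Allowed for a detecting combiner: fails openly by returning ⊥."""
    return BOT if x.startswith("1") else parity(x)


def wrong_on_one_input(x: str) -> str:
    """Not allowed: silently returns a wrong value on input '00'."""
    return "1" if x == "00" else parity(x)


SAMPLE = ["", "0", "1", "00", "01", "10", "11"]
```

In this toy setting `sometimes_bot` satisfies the detecting property without being equivalent to `parity`, whereas `wrong_on_one_input` violates it.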

6.1 Robust \((\gamma +1)\)-out-of-\((2\gamma +1)\) Detecting Combiners

To build our \((\gamma +1)\)-out-of-\((2\gamma +1)\) combiner we follow the approach of our sequential-subset combiner. We can also straightforwardly give the optimized version for the case of a 1-out-of-3 combiner, akin to our 1-out-of-4 combiner, but omit this step here. To build the sequential-subset combiner, consider all subsets I of \(\{1,2,\dots ,2\gamma +1\}\) of size \(\gamma +1\). For each such set I we first obfuscate the input circuit with each obfuscator \({\mathcal {O}_{i}}\) for \(i\in I\). But instead of completing the computation by adding a majority subcircuit, we now use the detecting version which (a) either outputs the string on which all circuits agree as output (even if it is \(\bot \)), or (b) returns \(\bot \) if there is no such unanimous decision. Let

$$\begin{aligned} O_I(\cdot )=\text {UNAN}\{{\mathcal {O}_{i}}(\cdot )\mid {i\in I}\} \end{aligned}$$

denote this unit with the unanimity circuit at the end. For obfuscation of C now compute the sequential-subset combiner

$$\begin{aligned} (\prod _{I} O_I)(\cdot ) = O_{I_\ell }(\cdots O_{I_3}(O_{I_2}(O_{I_1}(\cdot )))\cdots ) \end{aligned}$$

as before for constant \(\ell =\left( {\begin{array}{c}2\gamma +1\\ \gamma +1\end{array}}\right) \).
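The construction can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an implementation from the paper: circuits are callables, an "obfuscator" is any function mapping a circuit to a circuit, `BOT` (here `None`) stands in for \(\bot \), and the names `unan`, `sequential_subset_combiner`, `honest`, `corrupt`, and `parity` are hypothetical. The toy obfuscators do not hide anything; they only model functional behavior.

```python
from itertools import combinations
from math import comb
from typing import Callable, Optional, Sequence

BOT = None  # assumed stand-in for the error symbol ⊥

Circuit = Callable[[str], Optional[str]]
Obfuscator = Callable[[Circuit], Circuit]


def unan(circuits: Sequence[Circuit]) -> Circuit:
    """UNAN unit: the common output if all circuits agree (even if ⊥), else ⊥."""
    def unit(x: str) -> Optional[str]:
        outputs = [c(x) for c in circuits]
        first = outputs[0]
        return first if all(o == first for o in outputs) else BOT
    return unit


def sequential_subset_combiner(obfuscators: Sequence[Obfuscator],
                               gamma: int) -> Obfuscator:
    """Chain one UNAN unit per index set I of size gamma + 1."""
    if len(obfuscators) != 2 * gamma + 1:
        raise ValueError("expected 2*gamma + 1 obfuscators")
    subsets = list(combinations(range(len(obfuscators)), gamma + 1))
    assert len(subsets) == comb(2 * gamma + 1, gamma + 1)  # this is ell

    def combine(circuit: Circuit) -> Circuit:
        current = circuit
        for subset in subsets:  # O_{I_1} innermost, O_{I_ell} outermost
            current = unan([obfuscators[i](current) for i in subset])
        return current
    return combine


def honest(circuit: Circuit) -> Circuit:
    """Toy honest obfuscator: preserves functionality (and handles ⊥ outputs)."""
    return lambda x: circuit(x)


def corrupt(circuit: Circuit) -> Circuit:
    """Toy malicious obfuscator: returns a wrong value on input '11'."""
    return lambda x: "bad" if x == "11" else circuit(x)


def parity(x: str) -> str:
    return str(x.count("1") % 2)


# gamma = 1: three candidates, one of them corrupt, ell = 3 units.
combined = sequential_subset_combiner([honest, honest, corrupt], gamma=1)(parity)
all_honest = sequential_subset_combiner([honest, honest, honest], gamma=1)(parity)
```

With one corrupt candidate, the combined circuit answers `BOT` on the manipulated input and the correct parity elsewhere; with three honest candidates it agrees with `parity` on all inputs.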

Theorem 5

For any constant \(\gamma \) the sequential-subset combiner is a strongly robust \((\gamma +1)\)-out-of-\((2\gamma +1)\) detecting combiner for indistinguishability obfuscation, for differing-inputs obfuscation, for virtual black-box obfuscation, and for grey-box obfuscation, the latter ones for dependent auxiliary inputs.

The proof is similar to the case of correcting combiners, except that we only guarantee the weaker functional correctness. This property holds because in each unit for index set I there is at least one honest obfuscator among the \(\gamma +1\) ones. Hence the unanimity circuit either outputs the value computed by the honest obfuscator (if all other circuits agree), which may be the correct function value for x or \(\bot \), or it returns the error message \(\bot \). It follows that the overall output of the combiner circuit either agrees with the input circuit’s output or is \(\bot \). The obfuscation properties follow as before, noting that the obfuscators are able to handle input circuits with output \(\bot \), and that there must exist an index set I which only contains good obfuscators.
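The two counting facts used in this argument can be checked exhaustively for small \(\gamma \): every index set of size \(\gamma +1\) contains at least one honest obfuscator, and some index set contains only honest ones. The sketch below assumes the worst case of exactly \(\gamma \) corrupted candidates among \(2\gamma +1\) (fewer corruptions only help); the function name `check_subset_properties` is hypothetical.

```python
from itertools import combinations


def check_subset_properties(gamma: int) -> bool:
    """Verify both counting facts for every choice of gamma corrupt indices."""
    indices = range(2 * gamma + 1)
    units = [set(unit) for unit in combinations(indices, gamma + 1)]
    for corrupt in combinations(indices, gamma):
        honest = set(indices) - set(corrupt)
        # (1) Correctness: each unit contains at least one honest obfuscator.
        every_unit_has_honest = all(unit & honest for unit in units)
        # (2) Security: at least one unit consists of honest obfuscators only.
        some_unit_all_honest = any(unit <= honest for unit in units)
        if not (every_unit_has_honest and some_unit_all_honest):
            return False
    return True
```

Fact (1) is what yields the detecting correctness, and fact (2) is what the obfuscation properties rely on.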

6.2 Impossibility of \(\gamma \)-out-of-\(2\gamma \) Detecting Combiners

The idea for the lower bound for correcting combiners carries over to detecting combiners, as follows.

Theorem 6

There is no structural weakly robust \(\gamma \)-out-of-\(2\gamma \) obfuscation combiner \({{{\textsf {\textit{Comb}}}}}^{{\mathcal {O}_{1}},{\mathcal {O}_{2}},\dots ,{\mathcal {O}_{2\gamma }}}\) for non-trivial circuit classes \(\mathcal {C}\) with admissible encoding \(\left\langle \cdot \right\rangle \).

Proof

As in the case of 2-out-of-3 combiners and \(2\gamma \)-out-of-\(3\gamma \) combiners, there must also be (non-pass-through) confusion units on each path from the input units to the final unit in which none of the obfuscators with indices \(\gamma +1,\dots ,2\gamma \) appears. Assume now that we corrupt the obfuscators with indices \(1,\dots ,\gamma \) and let these obfuscators insert intermediate samples of a circuit \(D_0\) of the combiner’s obfuscation in the confusion units, independently of the input. If the insertions happen at the right position with significant probability, then the combiner must output an obfuscated circuit as if the combiner had been run on \(D_0\) with honest obfuscators. In particular, the combiner’s circuit must then compute the function \(D_0\) on every input. This holds even if the original input circuit was \(D_1\), computing a different function than \(D_0\), i.e., \(D_0(z)\ne D_1(z)\) for some string z. But then the combiner’s circuit produces a false output \(D_0(z)\ne \bot \) for input z and cannot be detecting.

7 Implementation and Evaluation

Our formal results have been stated in terms of the common notion of circuit obfuscation. In practice, however, programs are usually considered to be better modeled as Turing machines. We stress that our results, especially for the majority-based combiner, hold for such Turing machine programs as well. Namely, our 3-out-of-4 combiner would then output the program implementing the nested majority computations.

Concerning provably secure instantiations for Turing machine obfuscation, we note that if the running time and the input length of the Turing machine are bounded then one can in principle transform such machines into corresponding circuits, albeit at the cost of increasing the complexity significantly. A more efficient solution is to use obfuscation techniques for Turing machines directly. Given the current state of constructions this is possible if the input length can be bounded [40] and, for other constructions, if the space is also bounded beforehand [11, 17].

To evaluate the suggested combiners for typical obfuscation programs in practice we implemented the PyObf Python package [9], which can be used to wrap existing obfuscators and to implement new combiners. Even though the conceptual construction of a combiner is not related to the concrete implementation of the underlying obfuscators, implementations for different programming languages might differ, since some constructions introduce new run-time parts (e.g., the MAJ circuit in our case) to the program. We chose JavaScript as the implementation programming language because of the relatively high number of available obfuscators.
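Wrapping an existing command-line obfuscator as a source-to-source function could look roughly as follows. This is a generic sketch and not the PyObf API, which is not described here: the names `command_obfuscator` and `placeholder_tool` are hypothetical, and the sketch assumes a tool that reads source code on standard input and writes the result to standard output (real obfuscators may require file arguments instead). A small Python one-liner that drops blank lines serves as a placeholder tool so that the example runs without any obfuscator installed.

```python
import subprocess
import sys
from typing import Callable, Sequence

SourceObfuscator = Callable[[str], str]


def command_obfuscator(argv: Sequence[str]) -> SourceObfuscator:
    """Wrap a command that maps source on stdin to transformed source on stdout."""
    def obfuscate(source: str) -> str:
        result = subprocess.run(
            list(argv), input=source, capture_output=True, text=True, check=True
        )
        return result.stdout
    return obfuscate


# Placeholder "tool": removes blank lines. It only stands in for a real obfuscator.
_STRIP_BLANK_LINES = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.strip():\n"
    "        sys.stdout.write(line)\n"
)
placeholder_tool = command_obfuscator([sys.executable, "-c", _STRIP_BLANK_LINES])
```

Once every candidate is available as a function from source text to source text, combiners can be expressed as ordinary function composition over these wrappers.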

7.1 Performance Evaluation

For performance evaluation we used Yahoo!’s YUICompressor v2.4.8 (footnote 8) as \({\mathcal {O}_{1}}\), a slightly randomized version of it as \({\mathcal {O}_{2}}\), Google’s Closure Compiler v20151015 (footnote 9) as \({\mathcal {O}_{3}}\), and jsPacker.pl v1.00b (footnote 10) as \({\mathcal {O}_{4}}\). Note that there is no essential difference between \({\mathcal {O}_{1}}\) and \({\mathcal {O}_{2}}\), especially in terms of obfuscation overhead, since the latter only uses a different, randomized symbol selection routine. The security of combiners usually relies on somewhat independent components, but since we are mainly interested in performance evaluation here, we opted for these related candidates. The evaluated combiners are:

$$\begin{aligned} C_1(.)&= {\mathcal {O}_{4}}({\mathcal {O}_{3}}({\mathcal {O}_{2}}({\mathcal {O}_{1}}(.)))) \\ C_2(.)&= {\mathcal {O}_{2}}({\mathcal {O}_{1}}({\mathcal {O}_{4}}({\mathcal {O}_{3}}(.)))) \\ C_3(.)&= \text {MAJ}({\mathcal {O}_{1}}(O_{2,3,4}), {\mathcal {O}_{2}}(O_{1,3,4}), {\mathcal {O}_{3}}(O_{1,2,4})) \\ C_4(.)&= \text {MAJ}({\mathcal {O}_{3}}(O_{4,1,2}), {\mathcal {O}_{4}}(O_{3,1,2}), {\mathcal {O}_{1}}(O_{3,4,2})) \end{aligned}$$
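The structure of these four combiners can be mirrored in a toy Python model. The sketch assumes that \(O_{i,j,k}\) denotes the majority unit over the outputs of obfuscators i, j, and k applied to the input, as in the 3-out-of-4 construction; circuits are modeled as callables and obfuscators as functions from circuits to circuits. The names `maj`, `cascade`, `three_out_of_four`, `honest`, `broken`, and `double` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple

Circuit = Callable[[str], str]
Obfuscator = Callable[[Circuit], Circuit]


def maj(circuits: Sequence[Circuit]) -> Circuit:
    """MAJ unit: the most frequent output (arbitrary if no value has a majority)."""
    def unit(x: str) -> str:
        return Counter(c(x) for c in circuits).most_common(1)[0][0]
    return unit


def cascade(obf: Dict[int, Obfuscator], order: Sequence[int]) -> Obfuscator:
    """Sequential composition; `order` lists the indices innermost first."""
    def combine(circuit: Circuit) -> Circuit:
        for i in order:
            circuit = obf[i](circuit)
        return circuit
    return combine


def three_out_of_four(obf: Dict[int, Obfuscator], outer: Sequence[int],
                      inner: Sequence[Tuple[int, int, int]]) -> Obfuscator:
    """MAJ over outer obfuscators, each applied to an inner majority unit."""
    def combine(circuit: Circuit) -> Circuit:
        def inner_unit(indices: Tuple[int, int, int]) -> Circuit:
            return maj([obf[i](circuit) for i in indices])
        return maj([obf[o](inner_unit(ins)) for o, ins in zip(outer, inner)])
    return combine


def honest(circuit: Circuit) -> Circuit:
    return lambda x: circuit(x)


def broken(circuit: Circuit) -> Circuit:
    """Toy faulty candidate: destroys functionality completely."""
    return lambda x: "?"


def double(x: str) -> str:
    return x + x


# Toy setting: candidate 4 is faulty, candidates 1 to 3 preserve functionality.
obf = {1: honest, 2: honest, 3: honest, 4: broken}
c1 = cascade(obf, [1, 2, 3, 4])(double)
c2 = cascade(obf, [3, 4, 1, 2])(double)
c3 = three_out_of_four(obf, [1, 2, 3], [(2, 3, 4), (1, 3, 4), (1, 2, 4)])(double)
c4 = three_out_of_four(obf, [3, 4, 1], [(4, 1, 2), (3, 1, 2), (3, 4, 2)])(double)
```

In this toy setting the two majority-based combiners still compute `double`, while both cascades inherit the fault of the broken candidate, which reflects why cascades need functionally correct components.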

The evaluated programs (with input sizes ranging from a few thousand bytes to roughly a million bytes) are Cookies.js v1.2.2 (footnote 11; 6,637 bytes), Highlight.js v9.0.0 (footnote 12; 22,604 bytes), jCarousel v0.3.4 (footnote 13; 46,007 bytes), Backbone.js v1.2.3 (footnote 14; 71,415 bytes), Chart.js v1.0.2 (footnote 15; 109,612 bytes), Epoch v0.8.4 (footnote 16; 115,940 bytes), Swig v1.4.2 (footnote 17; 143,975 bytes), PhysicsJS v0.7.0 (footnote 18; 171,847 bytes), jQuery v1.6.4 (footnote 19; 238,166 bytes), Raphaël v2.1.4 (footnote 20; 304,254 bytes), Dojo v1.10.4 (footnote 21; 629,481 bytes), Video.js v5.4.4 (footnote 22; 675,527 bytes) and AngularJS v1.4.5 (footnote 23; 1,052,336 bytes).

Note that the above circuit model describes a program as a function with input and output, in contrast to the common software design of JavaScript libraries that heavily depends on the JavaScript context (e.g., the window object). But this difference is irrelevant to performance evaluation.

Fig. 7. Overhead of individual obfuscators (time and output bloat ratio) for the various programs (in relation to their input sizes for the obfuscator). Note that we do not display obfuscator \({\mathcal {O}_{2}}\) here as its performance is essentially identical to the one of \({\mathcal {O}_{1}}\).

Fig. 8. Overhead of cascaded combiner (time and output bloat ratio).

Fig. 9. Overhead of 3-out-of-4 combiner (time and output bloat ratio).

Figure 7 shows the overhead of the obfuscators in terms of obfuscation time and output-bloat ratio. We show the figures in relation to the sizes of the various programs (from 6,637 bytes to 1,052,336 bytes) given as input to the obfuscators. Note that the results may depend heavily on the specific input programs, so we cannot expect perfectly monotonic behavior in the graphs. Also, as mentioned before, many practical obfuscators come with techniques for code size reduction such that the output bloat ratio can be, and often is, smaller than 1. Next, we compare these figures to the results of the suggested combiners, first to the cascade combiners in Fig. 8 and then to the 3-out-of-4 combiners in Fig. 9.

In summary, the proposed obfuscation combiners do not add significant run-time overhead compared to a single obfuscator. The factor is roughly proportional to the number of invoked instances, with some gains presumably due to the intermediate code optimization. Due to the advanced compression techniques, the code size of our cascaded combiners is of the same order as that of the individual obfuscators. For the 3-out-of-4 combiner we of course get an increased output size because of the tripling for each majority step, potentially also hampering some code reductions.
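The two quantities reported in the figures can be measured along the following lines. The sketch assumes that the output-bloat ratio is the size of the obfuscated program divided by the size of the input program, both in bytes, and that obfuscators are available as functions from source text to source text; the names `measure`, `strip_whitespace`, and `triple` are hypothetical stand-ins and not part of the evaluated tools.

```python
import time
from typing import Callable, Tuple


def measure(obfuscate: Callable[[str], str], source: str) -> Tuple[float, float]:
    """Return (elapsed seconds, output-bloat ratio) for one obfuscation run."""
    start = time.perf_counter()
    output = obfuscate(source)
    elapsed = time.perf_counter() - start
    # Assumed definition: bytes of output divided by bytes of input.
    ratio = len(output.encode("utf-8")) / len(source.encode("utf-8"))
    return elapsed, ratio


def strip_whitespace(source: str) -> str:
    """Toy size-reducing transformation (not a semantics-preserving minifier)."""
    return "".join(source.split())


def triple(source: str) -> str:
    """Toy stand-in for the tripling caused by one majority step."""
    return source * 3


SOURCE = "var total = 0;\nfor (var i = 0; i < 10; i++) { total += i; }\n"
```

A ratio below 1 corresponds to an obfuscator that also compresses its input, while each majority step contributes a factor of roughly 3 before any further size reduction.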

7.2 Security Evaluation

Due to the unclear situation about security properties of practical obfuscators we have proven robustness of our combiners with respect to the common theoretical notions of obfuscation in the literature. There are approaches to define metrics for practical obfuscators, though. A first approach is by Collberg et al. [20] who define notions for potency (the incomprehensibility of the transformed program for humans), resilience (the hardness of undoing the transformation through the joint effort of engineers and deobfuscation techniques), and cost (the overhead caused by the obfuscator). The measure of quality of an obfuscator is then given by a vector of these three metrics.

While the notion of cost in [20] even distinguishes the full range between exponential and constant overhead in execution resources, the metrics for potency and resilience in [20] are less rigorous. They are accompanied by suitable software complexity measures such as program length or cyclomatic complexity [42]. Anckaert et al. [5] later used software complexity measures, too, for establishing a benchmarking system for obfuscators for binary executables.
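As a brief illustration of one such measure, cyclomatic complexity is commonly computed from a control-flow graph as \(M = E - N + 2P\), where E is the number of edges, N the number of nodes, and P the number of connected components. The sketch below is a generic illustration and not the benchmarking procedure of [5] or [20]; the function name and the example graphs are hypothetical, and every node is assumed to appear in at least one edge.

```python
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def cyclomatic_complexity(edges: Iterable[Edge], components: int = 1) -> int:
    """McCabe's measure M = E - N + 2P for a control-flow graph."""
    edge_list = list(edges)
    nodes = {node for edge in edge_list for node in edge}
    return len(edge_list) - len(nodes) + 2 * components


# Straight-line code: a single edge from entry to exit.
STRAIGHT_LINE = [("entry", "exit")]

# One if/else branch: two paths from the condition to the exit.
IF_ELSE = [("cond", "then"), ("cond", "else"), ("then", "exit"), ("else", "exit")]

# One loop: the body jumps back to the loop condition.
LOOP = [("entry", "cond"), ("cond", "body"), ("body", "cond"), ("cond", "exit")]
```

Transformations that add branches or loops to the control flow increase this value, which is why such measures are used as a proxy for the effort needed to understand a program.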

While it is beyond the scope of our work here, it may be interesting to benchmark our obfuscation combiners according to the metrics in [5]. Note that their metrics focus on resilience and somewhat neglect the overhead. Since our combiners in principle increase the software complexity at the cost of incurring additional steps, one should expect that combinations of benchmarked obfuscators yield better values in this regard.

8 Conclusion

Our positive results about combiners, and also our lower bounds, indicate how to proceed both in theory and practice. If only two candidates are available, then the best solution appears to be the sequential composition \({\mathcal {O}_{2}}({\mathcal {O}_{1}}(\cdot ))\), provided one can somehow guarantee that the inner obfuscator provides functional correctness. For three candidates (out of which at least two are sound), our 2-out-of-3 detecting combiner should be the primary choice. To ensure correct output, our 3-out-of-4 combiner provides a secure solution.