figure a

1 Introduction

C remains central to our infrastructure, widely used for security-critical components of hypervisors, operating systems, language runtimes, and embedded systems. This has prompted much research on the verification of C code, but the semantics of C is remarkably complex, and important aspects of it are still unsettled, leaving programmers and verification tool builders on shaky ground. Here we are concerned with three aspects:

1. The Concurrency Memory Model. The 2011 versions of the ISO C++ and C standards adopted a new concurrency model [3, 12, 13], formalised during the development process [11], but the model is still in flux: various fixes have been found to be necessary [9, 14, 26]; the model still suffers from the “thin-air problem” [10, 15, 35]; and Linux kernel C code uses a different model, itself recently partially formalised [7].

2. The Memory Object Model. A priori, one might imagine C follows one of two language-design extremes: a concrete byte-array model with pointers that are simply machine words, or an abstract model with pointers combining abstract block IDs and structured offsets. In fact C is neither of these: it permits casts between pointer and integer types, and manipulation of their byte representations, to support low-level systems programming, but, while at runtime a C pointer will typically just be a machine word, compiler analyses and optimisations reason about abstract notions of the provenance of pointers [27, 29, 31]. This is a subject of active discussion in the ISO C and C++ committees and in compiler development communities.

3. The Thread-Local Sequential Semantics. Here, there are many aspects, e.g. the loosely specified evaluation order, the semantics of integer promotions, many kinds of undefined behaviour, and so on, that are (given an expert reading) reasonably well-defined in the standard, but that are nonetheless very complex and widely misunderstood. The standard, being just a prose document, is not executable as a test oracle; it is not a reference semantics usable for exploration or automated testing.

Each of these is challenging in isolation, but there are also many subtle interactions between them. For example, between (1) and (3), the pre-C11 ISO standard text was in terms of sequential stepwise execution of an (informally specified) abstract machine, while the C11 concurrency model is expressed as a predicate over complete candidate executions, and the two have never been fully reconciled – e.g. in the standard’s treatment of object lifetimes. Then there are fundamental issues in combining the ISO treatment of undefined behaviour with that axiomatic-concurrency-model style [10, §7]. Between (1) and (2), one has to ask about the relationships between the definition of data race and the treatment of uninitialised memory and padding. Between (2) and (3), there are many choices for what the C memory object model should be, and how it should be integrated with the standard, which are currently under debate. Between all three one has to consider the relationships between uninitialised and thin-air values and the ISO notions of unspecified values and trap representations. These are all open questions in what the C semantics and ISO standard are (or should be). We do not solve them here, but we provide a necessary starting point: a tool embodying a precise reference semantics that lets one explore examples and debate the alternatives.

We describe a tool, Cerberus-BMC, that for the first time lets one explore the allowed behaviours of C test programs that involve all three of the above. It is available via a web interface at http://cerberus.cl.cam.ac.uk/bmc.html.

For (1), Cerberus-BMC is parameterised on an axiomatic memory concurrency model: it reads in a definition of the model in a Herd-like format [6], and so can be instantiated with (substantial fragments of) either the C11 [3, 9, 12,13,14], RC11 [26], or Linux kernel [7] memory models. The model can be edited in the web interface. Then the user can load (or edit in the web interface) a small C program. The tool first applies the Cerberus compositional translation (or elaboration) into a simple Core language, as in [29, 31]; this elaboration addresses (3) by making many of the thread-local subtleties of C explicit, including the loose specification of evaluation order, arithmetic conversions, implementation-defined behaviour, and many kinds of undefined behaviour. Core computation is simply over mathematical integers, with explicit memory actions to interface with the concurrency and memory object models. However, there is a mismatch between the axiomatic style of the concurrency models for C (expressed as predicates on arbitrary candidate executions) with the operational style of the previous thread-local operational semantics for Core. We address this by replacing the latter with a new translation from Core into SMT problems. This is integrated with the concurrency model, also translated into SMT, following the ideas of [5]. These are furthermore integrated with an SMT version of parts of the PNVI (provenance-not-via-integers) memory object model of [29], the basis for ongoing work within the ISO WG14 C standards committee, addressing (2). The resulting SMT problems are passed to Z3 [32]. The web interface then provides a graphical view of the allowed concurrent executions for small test programs.

The Cerberus-BMC tool should be useful for programmers, compiler writers, verification tool builders, and members of the C/C++ standards committees. We emphasise that it is intended as an executable reference semantics for small test programs, not itself as a verification tool that can be applied to larger bodies of C: we have focussed on making it transparently based on principled semantics for all three aspects, without the complexities needed for a high-performance verification tool. But it should aid the construction of such.

Caveats and Limitations. Cerberus-BMC covers many features of 1–3, but far from all. With respect to the concurrency memory model, we support substantial fragments of the C11, RC11, and Linux kernel memory models. We omit locks and the (deprecated) C11/RC11 consume accesses. We only cover compare-exchange read-modify-write operations, and the fragment of RCU restricted to read_rcu_lock(), read_rcu_unlock(), and synchronize_rcu() used in a linear way, without control-flow-dependent calls to RCU, and without nesting.

With respect to the memory object model, we do not currently support dynamic allocation or manipulation of byte representations (such as with char* pointers), and we do not address issues such as subobject provenance (an open question within WG14).

With respect to the thread semantics, our translation to SMT does not currently cover arbitrary pointer type-casting, function pointers, multi-dimensional arrays, unions, floating point, bitwise operations, and variadic functions, and only covers simple structs. In addition, we inherit the limitations of the Cerberus thread semantics as per [29].

Related Work. There is substantial prior work on tools for concurrency semantics and for C semantics, but almost none that combines the two. On the concurrency semantics side, CppMem [1, 11] is a web-interface tool that computes the allowed concurrent behaviours of small tests with respect to variants (now somewhat outdated) of the C11 model, but it does not support other concurrency models or a memory object model, and it supports only a small fragment of C. Herd [6, 8] is a command-line tool that computes the allowed concurrent behaviours of small tests with respect to arbitrary axiomatic concurrency models expressed in its cat language, but without a memory object model and for tests which essentially just comprise memory events, without a C semantics. MemAlloy [38] and MemSynth [16] also support reasoning about axiomatic concurrency models, but again not integrated with a C language semantics.

On the C semantics side, several projects address sequential C semantics but without concurrency. We build here on Cerberus [28, 29, 31], a web-interface tool that computes the allowed behaviours (interactively or exhaustively) for moderate-sized tests in a substantial fragment of sequential C, incorporating various memory object models (an early version supported Nienhuis’s operational model for C11 concurrency [33], but that is no longer integrated). KCC and RV-Match [19, 21, 22] provide a command-line semantics tool for a substantial fragment of C, again without concurrency. Krebbers gives a Coq semantics for a somewhat smaller fragment [24].

Then there is another large body of work on model-checking tools for sequential and concurrent C. These are all optimised for model-checking performance, in contrast to the Cerberus-BMC emphasis on expressing the semantic envelope of allowed behaviour as clearly as we can (and, where possible, closely linked to the ISO standard). The former include tis-interpreter [18, 36], CBMC [17, 25], and ESBMC [20]. On the concurrent side, as already mentioned, we build on the approach of [5], which integrated various hardware memory concurrency models with CBMC. CDSChecker [34] supports something like the C/C++11 concurrency model, but subject to various limitations [34, §1.3]. It is implemented using a dynamically-linked shared library for the C and C++ atomic types, so implicitly adopts the C semantic choices of whichever compiler is used. RCMC [23], supports memory models that do not exhibit Load Buffering (LB), for an idealised thread-local language. Nidhugg [4] supports only hardware memory models: SC, TSO, PSO, and versions of POWER and ARM.

Fig. 1.
figure 1

Cerberus-BMC Screenshot: C11 Release/Acquire Message Passing. If the read of y is 1, then the last thread has to see the write of 1 to x.

Fig. 2.
figure 2

Linux kernel memory model RCU lock. Without synchronize_rcu(), the reads of x and y can see 0 and 1 (as shown), even though they are enclosed in an RCU lock. With synchronization, after reading x=1, the last thread has to see y=1.

2 Examples

We now illustrate some of what Cerberus-BMC can do, by example.

Concurrency Models. First, for C11 concurrency, Fig. 1 shows a screenshot for a classic message-passing test, with non-atomic writes and reads of x, synchronised with release/acquire writes and reads of y. The test uses an explicit parallel composition, written , to avoid the noise from the extra memory actions in pthread_create. The consistent race-free UB-free execution on the right shows the synchronisation working correctly: after the i read-acquire of y=1, the l non-atomic read of x has to read x=1 (there are no consistent executions in which it does not). As usual in C/C++ candidate execution graphs, rf are reads-from edges, sb is sequenced-before (program order), mo is modification order (the coherence order between atomic writes to the same address), and asw is additional-synchronised-with, between parent and child threads and vice versa. Read and write events (R/W) are annotated na for non-atomic and rel/acq for release/acquire.

For the Linux kernel memory model, the example in Fig. 2 shows an RCU (read-copy-update) synchronisation.

Memory Object Model. The example below illustrates a case where one cannot assume that C has a concrete memory object model: pointer provenance matters.

figure c

In some C implementations, x and y will happen to be allocated adjacent (the \(\mathtt{\_\,\_}\)BMC_ASSUME restricts attention to those executions). Then &x+1 will have the same numeric address as &y, but the write *p=11 is undefined behaviour rather than a write to y. This was informally described in the 2004 ISO WG14 C standards committee response to Defect Report 260 [37], but has never been incorporated into the standard itself. Cerberus-BMC correctly reports UB found: source.c:8:5-7, UB043_indirection_invalid_value following the PNVI (provenance-not-via-integers) memory object model of [29].

ISO Subtleties. Turning to areas where the ISO standard is clear to experts but widely misunderstood, in the example on the right ISO leaves it implementation-

figure d

defined whether is signed or unsigned. In the former case, the ISO integer promotion and conversion semantics will make the equality test false, leading to a division by 0, which is undefined behaviour.

The example below shows the correct treatment of the ISO standard’s loose specification of evaluation order, together with detection of the concurrency model’s unsequenced races (ur in the diagram): there are write and read accesses to x that are unrelated by sequenced-before (sb), and not otherwise synchronised and hence unrelated by happens-before, which makes this program undefined behaviour.

figure e

Treiber Stack. Finally, demonstrating the combination of all three aspects, we implemented a modified Treiber stack (the push() function is shown in Fig. 3) with relaxed accesses to struct fields. Although the Treiber stack is traditionally implemented by spinning on a compare-and-swap, as that can spin unboundedly, we instead use _ _BMC_ASSUME to restrict executions to those where the compare-and-swap succeed. Our tool correctly detects the different results from the concurrent relaxed-memory execution of threads concurrently executing the push and pop functions.

Fig. 3.
figure 3

Treiber stack push()

Fig. 4.
figure 4

Core program corresponding to . Core is essentially a typed, first-order lambda calculus with explicit memory actions such as create and store to interface with the concurrency and memory object models.

3 Implementation

After translating a C program into Core (see Fig. 4), Cerberus-BMC does a sequence of Core-to-Core rewrites in the style of bounded model checkers such as CBMC: it unwinds loops and inlines function calls (to a given bound), and renames symbols to generate an SSA-style program.

The explicit representation of memory operations in Core as first-order constructs allows the SMT translation to be easily separated into three components: the translation from Core to SMT, the memory object model constraints, and the concurrency model constraints.

1. Core to SMT. Each value in Core is represented as an SMT expression, with fresh SMT constants for memory actions such as create and store (e.g. lines 2 and 4), the concrete values of which are constrained by the memory object and concurrency models. The elaboration of C to Core makes thread-local undefined behaviour (as opposed to undefined behaviour from concurrency or memory layout), like signed integer overflow, explicit with a primitive undef construct. Undefined behaviour is then encoded in SMT as reachability of undef expressions, that is, satisfiability of the control-flow guards up to them.

2. Memory Object Model. As in the PNVI semantics [30], Cerberus-BMC represents pointers as pairs \((\pi , a)\) of a provenance \(\pi \) and an integer address a. The provenance of a pointer is taken into account when doing memory accesses, pointer comparisons, and casts between integer and pointer values. Our tool models address allocation nondeterminism by constraining address values based on allocations to be appropriately aligned and non-overlapping, but not constraining the addresses otherwise.

3. Concurrency Model. Cerberus-BMC statically extracts memory actions and computes an extended pre-execution containing relations such as program order. As control flow can not be statically determined, memory actions are associated with an SMT boolean guard representing the control flow conditions upon which the memory action is executed.

Cerberus-BMC reads in a model definition in a subset of the herd cat language large enough to express C11, RC11, and Linux, and generates a set of quantifier-free SMT expressions corresponding to the model’s constraints on relations. These constraints are based on a set of “built-in” relations defined in SMT such as rf. Cerberus-BMC then queries Z3 to extract all the executions, displaying the load/store values and computed relations for the user.

4 Validation

We validate correctness of the three aspects of Cerberus-BMC as follows, though, as ever, additional testing would be desirable. Performance data, demonstrating practical usability, is from a MacBook Pro 2.9 GHz Intel Core i5.

For C11 and RC11 concurrency, we check on 12 classic litmus tests. For Linux kernel concurrency, we hand-translated the 9 non-RCU tests and 4 of the RCU tests of [7] into C, and automatically translated the 40 tests of [2]. Running all the non-RCU tests takes less than 5 min; the RCU tests are slower, of the order of one hour, perhaps because of the recursive definitions involved.

For the memory object model, we take the supported subset (36 tests) of the provenance semantics test suite of [29]. These single-threaded tests each run in less than a second.

For the thread-local semantics, the Cerberus pipeline to Core has previously been validated using GCC Torture, Toyota ITC, KCC, and Csmith-generated test suites [29]. We check the mapping to BMC using 50 hand-written tests and the supported subset (400 tests) of the Toyota ITC test suite, each running in less than two minutes.

These test suites and the examples in the paper can be accessed via the CAV 2019 pop-up in the File menu of the tool.