
1 Introduction

In this paper we present our tool ParCoSS (Parallelized Compiled Symbolic Simulation) for the verification of cooperative multithreading programs available in the Extended Intermediate Verification Language (XIVL) format. The XIVL extends the SystemC IVL [11, 15], which has been designed to capture the simulation semantics of SystemC programs [2, 10, 13], with a small core of OOP features to facilitate the translation of C++ code [16]. For verification purposes the XIVL supports computations with symbolic expressions as well as the assume and assert functions with their usual semantics. Our tool and a set of XIVL examples are available at [1].

Verification of (cooperative) multithreading programs is difficult due to the large state space caused by all possible inputs and thread interleavings. Symbolic simulation, a combination of symbolic execution [4, 14] and Partial Order Reduction (POR) [8, 9], has been shown to be particularly effective in tackling state explosion [5, 6, 15]. Recently, Compiled Symbolic Simulation (CSS) has been proposed as a further improvement [12]. CSS works by integrating the symbolic execution engine and the POR-based scheduler, together with the multithreading program (e.g. available in the XIVL format), into a single C++ program. A standard C++ compiler is then used to generate a native binary whose execution performs exhaustive verification of the multithreading program. In contrast to traditional verification methods based on interpretation, CSS can provide significant simulation speed-ups, especially through native execution of concrete operations.

The implementation of our tool ParCoSS is based on CSS and additionally supports parallelization to further improve simulation performance. Compared to the original CSS approach, our tool uses a fork/join-based state space exploration instead of manually cloning execution states to handle non-deterministic choices arising from symbolic branches and scheduling decisions. A fork/join-based architecture most notably has the following advantages: (1) it allows more efficient code to be generated, and (2) it drastically simplifies the implementation.

Fig. 1. XIVL example program

In particular, we avoid the layer of indirection that is necessary for variable accesses when execution states are tracked manually, and we use native execution for all function calls by employing coroutines. Besides a very efficient context-switch implementation, coroutines allow a natural implementation of the algorithms without unwinding the native stack and without using state machines to resume execution on context switches. Additionally, manually cloning complex internal data structures is error prone and difficult to implement efficiently, whereas the fork system call is already very mature and highly optimized. Finally, our architecture allows for straightforward and efficient parallelization by leveraging the process scheduling and memory sharing capabilities of the underlying operating system.

2 Extended Intermediate Verification Language (XIVL)

An example cooperative multithreading program illustrating the core features of the XIVL is shown in Fig. 1. The program uses two threads to compute the sum of odd numbers up to the bound specified by the variable x, which is initialized with a symbolic expression of type int in Line 2 and constrained in Line 28. The threads synchronize using the wait and notify functions on the global event e. The XIVL syntax resembles C++: it supports integer and boolean data types with all arithmetic and logic operators, arrays and pointers, uses high-level control flow structures, and offers a small set of OOP features including classes, inheritance, and virtual methods with overrides and dynamic dispatch.

Fig. 2. Tool overview

3 Implementation Details

To simplify development and to facilitate code re-use and the translation process from XIVL to C++, we have implemented the PCSS (Parallel CSS) library, which provides common building blocks for parallel symbolic simulation. The PCSS library is linked with the C++ program during compilation. An overview of our tool is shown in Fig. 2. In the following we describe our PCSS library, provide more details on the fork/join-based exploration, and briefly sketch the translation process from XIVL to C++.

3.1 PCSS Library

The right-hand side of Fig. 2 shows the main components of the PCSS library and their interaction. Essentially, it consists of the following components: kernel, scheduler, SMT engine, fork engine, and some process-shared data.

The kernel provides a small set of functions which directly correspond to the kernel-related primitives of the XIVL (e.g. wait and notify) and makes it possible to simulate the SystemC event-driven simulation semantics. Furthermore, all thread functions of the XIVL model are registered in the kernel. The kernel allocates a coroutine with each thread function as entry point. Coroutines naturally implement context switches, as they allow execution to jump between arbitrarily nested functions while preserving local data. Our implementation uses the lightweight Boost.Context library, in particular the functions make_fcontext and jump_fcontext to create coroutines and to switch execution between them, respectively. The scheduler is responsible for selecting the next runnable thread inside the scheduling loop of the kernel. Our coroutine implementation makes it easy to switch execution between the scheduling loop and the chosen thread. POR is employed to reduce the number of explored interleavings. The POR dependency relation is statically generated from the XIVL model and encoded into the C++ program during translation; at runtime it is passed to the scheduler during initialization.

The SMT engine provides common functionality required for symbolic execution. It keeps track of the current path condition, handles the assume and assert functions, and checks the feasibility of symbolic branch conditions. Furthermore, the SMT engine provides SMT types and operations. Essentially, this is a lightweight layer around the underlying SMT solver, which allows the employed SMT solver to be swapped transparently.

The fork engine is responsible for splitting the execution process into two independent processes in case of a non-deterministic choice. This happens when both branch directions are feasible in the SMT engine, or when multiple thread choices are still available in the scheduler after applying POR. One of the forked processes continues exploration while the other is suspended until the first terminates. This approach simulates a depth-first search (DFS) of the state space. As an optimization, the fork engine allows up to N processes to run in parallel, where N is a command line parameter of the compiled C++ program. Parallelization is very efficient, as the processes explore disjoint state spaces independently.

Fig. 3. Symbolic branch execution

3.2 Fork/Join Based State Space Exploration

Executing Symbolic Branches. The on_branch function in the SMT engine, shown in Fig. 3, accepts a symbolic branch condition and returns a concrete decision, which is then used to control native C++ control flow structures. The check_branch_status function checks the feasibility of both branch directions by checking the satisfiability of the branch condition and of its negation. In case both branch directions are feasible, the execution forks (Line 4) into two independent processes, which update the path condition (pc) with the normal (Line 6) or the negated condition (Line 8), respectively. Note that the execution is not forked and the path condition is not extended when only a single branch direction is feasible.

Fig. 4. Implementation of parallelized forking

Parallelization. The forked processes communicate using anonymous shared memory, which is created during initialization in the first process using the mmap system call and is thus accessible from all forked child processes. The shared memory essentially contains three pieces of information: (1) a counter variable to ensure that no more than N processes run in parallel, (2) a shutdown flag to gracefully stop the simulation, e.g. when an assertion violation is detected, and (3) an unnamed semaphore to synchronize access. The semaphore is initialized and modified using the sem_init, sem_post and sem_wait functions. Furthermore, each process locally keeps track of the number of its forked child processes (num_children). Figure 4 shows an implementation of the fork function. First, the fork system call is executed. The child process (pid is zero) will never block, since executing only one process does not increase the number of active processes. The parent process, however, will first try to atomically check and increment the shared counter in Line 5. When this fails, i.e. the maximum number N of processes is already working, the parent process waits until a working process finishes, by either awaiting one of its own children (Line 7) or until some other process joins its children (Line 9).

3.3 XIVL to C++ Translation

We use the XIVL example from Fig. 1 to illustrate the XIVL to C++ translation process, which basically performs five steps: (1) Replace native data types (integer and boolean) and operations with SMT types and operations where necessary (here variable x). Variables which are never assigned a symbolic value (here variables i and sum) keep their native type and perform native operations. (2) Instrument control flow code to query the SMT engine for a concrete decision. The branch condition x > 0 is transformed into an SMT expression, e.g. smt->bv_gt(x, smt->bv_val(0)), and wrapped by the on_branch function of the smt_engine. (3) Generate static POR information for the scheduler. Essentially, a static analysis is employed to detect read/write and notify/wait dependencies. A flow- and context-insensitive pointer analysis is used to increase the precision. (4) Redirect builtin XIVL functions to the SMT engine (assume and assert) and to the kernel instance (e.g. wait_event and notify). (5) Add a new main function that initializes the PCSS library and calls the main function of the XIVL model. It initializes the global data of the XIVL model and then enters the scheduling loop of the kernel by calling the start_simulation function.

Table 1. Experiment results, T.O. denotes timeout (limit 750 s)

4 Evaluation and Conclusion

We have evaluated our tool on a set of SystemC benchmarks from the literature [3, 7, 15] and on the TLM model of the Interrupt Controller for Multiple Processors (IRQMP) of the LEON3-based virtual prototype SoCRocket [17]. All experiments have been performed on a Linux system using a 3.5 GHz Intel E3 quad-core with Hyper-Threading. We used Clang 3.5 for compilation of the C++ programs and Z3 v4.4.1 in the SMT layer. The time and memory limits have been set to 750 s and 6 GB, respectively; T.O. denotes that the time limit has been exceeded. The results are shown in Table 1. It shows the simulation time in seconds for Kratos [7], Interpreted Symbolic Simulation (ISS) [6, 15], and our tool ParCoSS with a single process (P-1) and parallelized with four (P-4) and eight (P-8) processes. Comparing P-4 with P-8 allows us to observe the effect of Hyper-Threading. The results demonstrate the potential of our tool and show that our parallelization approach can further improve the results. As expected, CSS can be considerably faster than ISS. On some benchmarks Kratos is faster due to its abstraction technique, but the difference is not significant. Furthermore, Kratos is not applicable to the irqmp benchmark due to missing C++ language features. For future work we plan to integrate dynamic information, for POR and for the selection of code blocks for native execution, into our CSS framework.