Program Verification by Coinduction
Abstract
We present a novel program verification approach based on coinduction, which takes as input an operational semantics. No intermediates like program logics or verification condition generators are needed. Specifications can be written using any state predicates. We implement our approach in Coq, giving a certifying language-independent verification framework. Our proof system is implemented as a single module imported unchanged into language-specific proofs. Automation is achieved by instantiating a generic heuristic with language-specific tactics. Manual assistance is also smoothly allowed at points the automation cannot handle. We demonstrate the power and versatility of our approach by verifying algorithms as complicated as Schorr-Waite graph marking and instantiating our framework for object languages in several styles of semantics. Finally, we show that our coinductive approach subsumes reachability logic, a recent language-independent sound and (relatively) complete logic for program verification that has been instantiated with operational semantics of languages as complex as C, Java and JavaScript.
1 Introduction
Formal verification is a powerful technique for ensuring program correctness, but it requires a suitable verification framework for the target language. Standard approaches such as Hoare logic [1] (or verification condition generators) require significant effort to adapt and prove sound and relatively complete for a given language, with few or no theorems or tools that can be reused between languages. To use a software engineering metaphor, Hoare logic is a design pattern rather than a library. This becomes literal when we formalize it in a proof assistant.
We present instead a single language-independent program verification framework, to be used with an executable semantics of the target programming language given as input. The core of our approach is a simple theorem which gives a coinduction principle for proving partial correctness.
To trust a nonexecutable semantics of a desired language, an equivalence to an executable semantics is typically proved. Executable semantics of programming languages abound in the literature. Recently, executable semantics of several real languages have been proposed, e.g. of C [2], Java [3], JavaScript [4, 5], Python [6], PHP [7], CAML [8], thanks to the development of executable semantics engineering frameworks like K [9], PLT-Redex [10], Ott [11], etc., which make defining a formal semantics for a programming language almost as easy as implementing an interpreter, if not easier. Our coinductive program verification approach can be used with any of these executable semantics or frameworks, and is correct-by-construction: no additional “axiomatic semantics”, “program logic”, or “semantics suitable for verification” with soundness proofs is needed.
RQ1. Is it feasible to have a sound and (relatively) complete verification infrastructure based on coinduction, which is language-independent and versatile, i.e., takes an arbitrary language as input, given by its operational semantics?
RQ2. Is it possible to match, or even exceed, the capabilities of existing language-independent verification approaches based on operational semantics?
To address RQ1, we make use of a key mathematical result, Theorem 1, which has been introduced in more general forms in the literature, e.g., in [12, 13] and in [14]. We mechanized it in Coq in a way that allows us to instantiate it with a transition relation corresponding to any target language semantics, thereby producing certifying program verification for that language. Using the resulting coinduction principle to show that a program meets a specification produces a proof which depends only on the operational semantics. We demonstrate that our proofs can be effectively automated, on examples including heap data structures and recursive functions, and describe the implemented proof strategy and how it can be reused across languages defined using a variety of operational styles.
To address RQ2, we show that our coinductive approach not only subsumes reachability logic [15], whose practicality has been demonstrated with languages like C, Java, and JavaScript, but also offers several specific advantages. Reachability logic consists of a sound and (relatively) complete proof system that takes a given language operational semantics as a theory and derives reachability properties about programs in that language. A mechanical procedure can translate any proof using reachability logic into a proof using our coinductive approach.
We first introduce our approach with a simple intuitive example, then prove its correctness. We then discuss mechanical verification experiments across different languages, show how reachability logic proofs can be translated into coinductive proofs, and conclude with related and future work. Our entire Coq formalization, proofs and experiments are available at [16].
2 Overview and Basic Notions
While our coinductive program verification approach is self-contained and thus can be presented without reliance on other verification approaches, we prefer to start by discussing the traditional Hoare logic approach, for two reasons. First, it will put our coinductive approach in context, showing also how it avoids some of the limitations of Hoare logic. Second, we highlight some of the subtleties of Hoare logic when related to operational semantics, which will help understand the reasons and motivations underlying our definitions and notations.
2.1 Intuitive Hoare Logic Proof
While one can prove Hoare triples valid directly using the step relation \(\rightarrow _R\) and induction, or coinduction like we propose in this paper, the traditional approach is to define a language-specific proof system for deriving Hoare triples from other triples, also known as a Hoare logic, or program logic, for the target programming language. Figure 2 shows such a program logic for IMP. Hoare logics are generally not executable, so testing cannot show whether they match the intended semantics of the language. Even for a simple language like IMP, if one mistakenly writes \(\mathtt{e}=1\) instead of \(\mathtt{e}\ne 0\) in rule (HL-while), then one gets an incorrect program logic. When trusted verification is desired, the program logic needs to be proved sound w.r.t. a reference executable semantics of the language, i.e., that each derivable Hoare triple is valid. This is a highly nontrivial task for complex languages (C, Java, JavaScript), in addition to defining a Hoare logic itself. Our coinductive approach completely avoids this difficulty by requiring no additional semantics of the programming language for verification purposes.
This example is not complicated, in fact it is very intuitive. However, it abstracts out a lot of details in order to make it easy for a human to understand. It is easy to see the potential difficulties that can arise in larger examples from needing to factor out the side effect, and from mixing both program variables and mathematical variables in Hoare logic specifications and proofs. With our coinduction verification framework, all of these issues are mitigated.
2.2 Intuitive Coinduction Proof
Since our coinductive approach is language-independent, we do not commit to any particular, language-specific formalism for specifying reachability claims, such as Hoare triples. Consequently, we will work directly with raw reachability claims/specifications \(S\subseteq C\times {{\mathcal P}}(C)\) consisting of sets of pairs (c, P) with \(c \in C\) and \(P \subseteq C\) as seen above. We show how to coinductively prove the claim for the example sum program in the form given in (1), relying on nothing but a general language-independent coinductive machinery and the trusted execution step relation \(\rightarrow _R\) of IMP. Recall that we drop the state frames (\(\sigma \)) in (1).
Intuitively, our approach consists of symbolic execution with the language step relation, plus coinductive reasoning for circular behaviors. Specifically, suppose that \(S_{ circ} \subseteq C \times {{\mathcal P}}(C)\) is a specification corresponding to some code with circular behavior, say some loop. Pairs \((c,P) \in S_{ circ}\) with \(c\in P\) are already valid, that is, \(c \Rightarrow _R P\) for those. “Execute” the other pairs \((c,P)\in S_{ circ}\) with the step relation \(\rightarrow _R\), obtaining a new specification \(S'\) containing pairs of the form (d, P), where \(c \rightarrow _R d\); since we usually have a mathematical description of the pairs in \(S_{ circ}\) and \(S'\), this step has the feel of symbolic execution. Note that \(S_{ circ}\) is valid if \(S'\) is valid. Do the same for \(S'\) obtaining a new specification \(S''\), and so on. If at any moment during this (symbolic) execution process we reach a specification S that is included in our original \(S_{ circ}\), then simply assume that S is valid. While this kind of cyclic reasoning may not seem sound, it is in fact valid, and justified by coinduction, which captures the essence of partial correctness, language-independently. Reaching something from the original specification shows we have reached some fixpoint, and coinduction is directly related to greatest fixpoints. This is explained in detail in Sect. 3.
In many examples it is useful to chain together individual proofs, similar to (HL-seq). Thus, we introduce the following sequential composition construct:
Definition 1
For \(S_1,S_2\subseteq C\times {{\mathcal P}}(C)\), let \(S_1 \mathbin{;} S_2 = \{(c,P) \mid \exists Q .\ (c,Q)\in S_1 \wedge \forall d\in Q,\ (d,P)\in S_2\}\). Also, we define \({{\mathrm{trans}}}(S)\) as \(S \mathbin{;} S\) (\({{\mathrm{trans}}}\) can be thought of as a transitivity proof rule).
If \(S_1\) and \(S_2\) are valid then \(S_1 \mathbin{;} S_2\) is also valid (Lemma 2).
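As a minimal sketch of how sequential composition behaves on finite sets of claims, the following Python fragment assumes the reading in which (c, P) belongs to the composition when some intermediate set Q has (c, Q) in the first specification and (d, P) in the second for every d in Q. The names `seq` and `trans`, the string configurations, and the restriction to targets that occur in the second specification are illustrative assumptions, not part of the Coq development.

```python
def seq(S1, S2):
    """Sequential composition of two finite sets of claims.

    A claim is a pair (c, P) of a configuration and a frozenset of targets.
    Only targets P that occur in S2 are enumerated (a finiteness shortcut;
    mathematically, an empty intermediate set would admit every P).
    """
    targets = {P for (_, P) in S2}
    return {
        (c, P)
        for (c, Q) in S1
        for P in targets
        if all((d, P) in S2 for d in Q)
    }


def trans(S):
    """Transitivity: compose a set of claims with itself."""
    return seq(S, S)


# Hypothetical configurations: "init" reaches "loop", and "loop" reaches "done".
S1 = {("init", frozenset({"loop"}))}
S2 = {("loop", frozenset({"done"}))}

composed = seq(S1, S2)      # chains the two claims into ("init", {"done"})
chained = trans(S1 | S2)    # the same chaining, expressed as transitivity
print(composed)
print(chained)
```

Here both calls produce the single chained claim from "init" to "done", which is the role (HL-seq) plays in a Hoare logic.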
2.3 Defining Execution Step Relations
Since our coinductive verification framework is parametric in a step relation, which also becomes the only trust base when certified verification is sought, it is imperative for its practicality to support a variety of approaches to define step relations. Ideally, it should not be confined to any particular semantic style that ultimately defines a step relation, and it should simply take existing semantics “off-the-shelf” and turn them into sound and relatively complete program verifiers for the defined languages. We briefly recall three of the semantic approaches that we experimented with in our Coq formalization [16].
Small-step structural operational semantics [18] (Fig. 3 top) is one of the most popular semantic approaches. It defines the transition relation inductively. This semantic style is easy to use, though it is often inconvenient for defining some features such as abrupt changes of control and true concurrency. Additionally, finding the next successor of a configuration may take longer than in other approaches. Reduction semantics with evaluation contexts [17], depicted in the middle of Fig. 3, is another popular approach. It allows us to elegantly and compactly define complex evaluation strategies and semantics of control-intensive constructs (e.g., call/cc), and it avoids a recursive definition of the transition relation. On the other hand, it requires an auxiliary definition of contexts along with splitting and plugging functions.
As discussed in Sect. 1, several large languages have been given formal semantics using K [9] (Fig. 3 bottom). K is more involved and less conventional than the other approaches, so it is a good opportunity to evaluate our hypothesis that we can just “plug-and-play” operational semantics in our coinductive framework. A K-style semantics extends the code in the configuration to a list of terms, and evaluates within subterms by having a transition that extracts the term to the front of the list, where it can be examined directly. This allows a nonrecursive definition of transition, whose cases can be applied by unification.
In practice, in our automation, we only need to modify how a successor for a configuration is found. Besides that, the proofs remain exactly the same.
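To illustrate the point that only the successor computation changes between semantic styles, here is a small Python sketch for a toy language of integer additions. It is an assumption-laden illustration rather than the IMP semantics of Fig. 3: the term encoding `("add", l, r)`, the frame tags `hole_left` and `hole_right`, and the function names are all hypothetical. `step_sos` recurses into subterms in the small-step style, while `step_k` keeps a list of tasks and inspects only its head, in the spirit of a K-style definition. The driver `run` is shared and never looks inside a configuration.

```python
def step_sos(e):
    """Small-step SOS style: recurse into the leftmost reducible subterm."""
    if isinstance(e, int):
        return None                      # values have no successor
    _, left, right = e
    if not isinstance(left, int):
        return ("add", step_sos(left), right)
    if not isinstance(right, int):
        return ("add", left, step_sos(right))
    return left + right


def step_k(k):
    """K style: the configuration is a list of tasks; only the head is inspected."""
    head, rest = k[0], k[1:]
    if isinstance(head, int):
        if not rest:
            return None                  # a lone value is final
        (tag, other), rest = rest[0], rest[1:]
        # Cooling: plug the value back into the frame that was waiting for it.
        plugged = ("add", head, other) if tag == "hole_left" else ("add", other, head)
        return [plugged] + rest
    _, left, right = head
    if not isinstance(left, int):
        return [left, ("hole_left", right)] + rest    # heating the left operand
    if not isinstance(right, int):
        return [right, ("hole_right", left)] + rest   # heating the right operand
    return [left + right] + rest


def run(step, config):
    """Style-independent driver: it only asks the semantics for a successor."""
    while True:
        nxt = step(config)
        if nxt is None:
            return config
        config = nxt


expr = ("add", ("add", 1, 2), ("add", 3, 4))
print(run(step_sos, expr))    # final term under the SOS-style step function
print(run(step_k, [expr]))    # final task list under the K-style step function
```

Both runs end in the same value, wrapped in a one-element list in the K-style configuration.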
3 Coinduction as Partial Correctness
The intuitive coinductive proof of the correctness of sum in Sect. 2.2 likely raised a lot of questions. We give formal details of that proof in this section, and go through some definitions and results of the underlying theory. All proofs, including our Coq formalization, are in [16].
3.1 Definitions and Main Theorem
First, we introduce a definition that we used intuitively in the previous section:
Definition 2
If \(R \subseteq C \times C\), let \({{\mathrm{valid}}}_R \subseteq C \times {\mathcal P}(C)\) be defined as \({{\mathrm{valid}}}_R = \{(c,P) \mid c \Rightarrow _R P \text{ holds }\}\).
Recall from Sect. 2.1 that \(c \Rightarrow _R P\) holds iff the initial state c can either reach a state in P or can take an infinite number of steps (with \(\rightarrow _R\)). Pairs \((c,P)\in C\times {\mathcal P}(C)\) are called claims or specifications, and our objective is to prove they hold, i.e., \(c \Rightarrow _R P\). Sets of claims \(S \subseteq C \times {\mathcal P}(C)\) are valid if \(S \subseteq {{\mathrm{valid}}}_R\). To show such inclusions by coinduction, we notice that \({{\mathrm{valid}}}_R\) is a greatest fixpoint, specifically of the following operator:
Definition 3
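Given \(R \subseteq C \times C\), let \({{\mathrm{step}}}_R : {\mathcal P}(C \times {\mathcal P}(C)) \rightarrow {\mathcal P}(C \times {\mathcal P}(C))\) be defined as \({{\mathrm{step}}}_R(S) = \{(c,P) \mid c \in P \vee \exists d .\ c \rightarrow _R d \wedge (d,P) \in S\}\).
Therefore, to prove \((c, P) \in {{\mathrm{step}}}_R(S)\), one must show either that \(c \in P\) or that \((succ(c), P) \in S\), where succ(c) is a resulting configuration after taking a step from c by the operational semantics.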
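The operator and its greatest fixpoint can be computed directly when the set of configurations is finite. The sketch below is an illustration under that assumption (the framework itself places no such restriction): the four-element transition system `R`, the helper names, and the brute-force enumeration of all claims are hypothetical. It iterates the operator downward from the set of all claims, which reaches the greatest fixpoint for a monotone operator on a finite lattice.

```python
from itertools import combinations

CONFIGS = [0, 1, 2, 3]
R = {(0, 1), (1, 0), (2, 3)}   # 0 and 1 loop forever; 2 steps to 3; 3 is stuck


def powerset(xs):
    """All subsets of xs, as frozensets."""
    return [frozenset(s) for r in range(len(xs) + 1) for s in combinations(xs, r)]


ALL_CLAIMS = {(c, P) for c in CONFIGS for P in powerset(CONFIGS)}


def step(S):
    """Claims (c, P) where c is already in P, or some successor d of c has (d, P) in S."""
    return {
        (c, P)
        for (c, P) in ALL_CLAIMS
        if c in P or any((d, P) in S for (src, d) in R if src == c)
    }


def greatest_fixpoint(F, top):
    """Iterate a monotone F downward from the full set until it stabilizes."""
    X = set(top)
    while F(X) != X:
        X = F(X)
    return X


VALID = greatest_fixpoint(step, ALL_CLAIMS)

print((0, frozenset()) in VALID)       # 0 diverges, so any target set is acceptable
print((2, frozenset({3})) in VALID)    # 2 reaches 3
print((2, frozenset()) in VALID)       # 2 gets stuck at 3 without reaching the target
```

The first two claims survive the iteration and the third is discarded, matching the partial-correctness reading of validity: a configuration either reaches the target set or runs forever.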
Definition 4
Given a monotone function \(F : {\mathcal P}(D) \rightarrow {\mathcal P}(D)\), let its F-closure \(F^{*} : {\mathcal P}(D) \rightarrow {\mathcal P}(D)\) be defined as \(F^{*}(X) = \mu Y.\ F(Y) \cup X\), where \(\mu \) is the least fixpoint operator. This is well-defined as \(Y\mapsto F(Y) \cup X\) is monotone for any X.
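For a finite carrier, the closure can be computed by iterating upward from the empty set. The following Python sketch assumes finiteness so that the iteration terminates; `least_fixpoint`, `closure`, and the toy operator `bump` are hypothetical names chosen for illustration.

```python
def least_fixpoint(F):
    """Iterate a monotone F upward from the empty set until it stabilizes."""
    X = frozenset()
    while F(X) != X:
        X = F(X)
    return X


def closure(F, X):
    """The least Y with Y = F(Y) | X, i.e. everything derivable from X using F."""
    X = frozenset(X)
    return least_fixpoint(lambda Y: frozenset(F(Y)) | X)


def bump(Y):
    """A toy monotone operator: successors of the elements below 5."""
    return {y + 1 for y in Y if y < 5}


print(sorted(closure(bump, {0})))   # everything reachable from 0 by repeated bumps
print(sorted(closure(bump, {3})))   # the same, starting from 3
```

Note that the starting set is always contained in its closure, whatever the operator is.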
The following lemma suffices for reachability verification:
Lemma 1
For any \(R \!\subseteq \! C \!\times \! C\) and \(S \!\subseteq \! C \!\times \! {\mathcal P}(C)\), we have \(S \subseteq {{\mathrm{step}}}_R({{\mathrm{step}}}_R^{*}(S))\) implies \(S \subseteq {{\mathrm{valid}}}_R\).
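The premise of Lemma 1 can be checked mechanically on a small finite system. The sketch below is an illustration only, with a hypothetical two-phase countdown loop (`head` and `body` configurations), a fixed bound `N`, and claims restricted to the target sets that occur in the specification. It shows a specification that is not closed under a single step but does satisfy the premise once the closure is used, and a bogus specification that lies trivially inside its own closure yet fails the premise.

```python
N = 3
EXIT = ("exit", 0)
CONFIGS = (
    [("head", n) for n in range(N + 1)]
    + [("body", n) for n in range(1, N + 1)]
    + [EXIT]
)


def succ(c):
    """Successors in a toy loop: head n -> body n -> head n-1, and head 0 -> exit."""
    pc, n = c
    if pc == "head":
        return [("body", n)] if n > 0 else [EXIT]
    if pc == "body":
        return [("head", n - 1)]
    return []                                   # exit is stuck


def step(S, targets):
    """The step operator, restricted to the given target sets."""
    return {
        (c, P)
        for c in CONFIGS
        for P in targets
        if c in P or any((d, P) in S for d in succ(c))
    }


def closure(S, targets):
    """Least Y with Y = step(Y) | S, computed by upward iteration."""
    Y = set()
    while True:
        nxt = step(Y, targets) | S
        if nxt == Y:
            return Y
        Y = nxt


def lemma1_premise(S):
    """Does S lie inside step(closure(S))?"""
    targets = {P for (_, P) in S}
    return S <= step(closure(S, targets), targets)


goal = frozenset({EXIT})
spec = {(("head", n), goal) for n in range(N + 1)}       # every loop head reaches exit
bogus = {(EXIT, frozenset({("head", N)}))}                # exit cannot reach head N

print(lemma1_premise(spec))
print(spec <= step(spec, {goal}))
print(bogus <= closure(bogus, {frozenset({("head", N)})}))
print(lemma1_premise(bogus))
```

The loop-head specification passes even though the intermediate `body` configurations are not mentioned in it, since they are picked up by the closure. The bogus claim is contained in its closure, as every set is, but it is rejected once the outer step is required.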
The intuition behind this lemma is captured in Sect. 2.2: we continue taking steps and once we reach a set of states already seen, we know our claim is valid. This would not be valid if \({{\mathrm{step}}}_R({{\mathrm{step}}}_R^{*}(S))\) were replaced simply with \({{\mathrm{step}}}_R^*(S)\), as \(X \subseteq F^{*}(X)\) holds trivially for any F and X. Lemma 1 (along with elementary set properties) replaces the entire program logic shown in Fig. 2. The only formal definition specific to the target language is the operational semantics. Lemma 1 does not need to be modified or reproven to use it with other languages or semantics. It generalizes into a more powerful result that can be used to derive a variety of coinductive proof principles:
Theorem 1
If \(F,G : {\mathcal P}(D) \rightarrow {\mathcal P}(D)\) are monotone and \(G(F(A)) \subseteq F(G^{*}(A))\) for any \(A \subseteq D\), then \(X \subseteq F(G^{*}(X))\) implies \(X \subseteq \nu F\) for any \(X \subseteq D\), where \(\nu F\) is the greatest fixpoint of F.
Proofs, including a verified proof in our Coq formalization, are in [16]. The proof can also be derived from [12, 13, 14], though techniques from these papers had previously not been applied to program verification. Lemma 1 is an easy corollary, with both F and G instantiated as \({{\mathrm{step}}}_R\), along with a proof that \(\nu {{\mathrm{step}}}_R = {{\mathrm{valid}}}_R\) (see [16]). However, instantiating F and G to be the same function is not always best. An interesting and useful G is the transitivity function \({{\mathrm{trans}}}\) in Definition 1, which satisfies the hypothesis in Theorem 1 when F is \({{\mathrm{step}}}_R\). [16] shows other sound instantiations of G.
3.2 Example Proof: Sum
We have to prove \(S \subseteq {{\mathrm{valid}}}_R\). Note that this specification is more general than the specifications in Sect. 2.2. Here, T represents the remainder of the code to be executed, while \(\sigma \) represents the remainder of the store, with \(\sigma [\bot /\texttt {s}]\) as \(\sigma \) restricted to \(Dom(\sigma )/\{\texttt {s}\}\). Thus, we write out the entire configuration here, which gives us freedom in expressing more complex specifications if needed.
Instead of proving this directly, we will prove two subclaims valid and connect them via sequential composition (Definition 1). First, we need the following:
Lemma 2
\(S_1 \mathbin{;} S_2 \subseteq {{\mathrm{valid}}}_R\) if \(S_1 \subseteq {{\mathrm{valid}}}_R\) and \(S_2 \subseteq {{\mathrm{valid}}}_R\).
3.3 Example Proof: Reverse
Then, the individual proofs for these specifications closely follow the same flavor as in the previous example: use \({{\mathrm{step}}}_R\) to execute the program via the operational semantics, use unions to case split as needed, and finish when we reach something in the target set or that was previously in our specification. The inherent similarity between these two examples hints that automation should not be too difficult. We go into detail regarding such automation in Sect. 4.
Reasoning with fixpoints and functions like \({{\mathrm{step}}}_R\) can be thought of as reasoning with proof rules, but ones which interact with the target programming language only through its operational semantics. The \({{\mathrm{step}}}_R\) operation corresponds, conceptually, to two such proof rules: taking an execution step and showing that the current configuration is in the target set. Sequential composition and the \({{\mathrm{trans}}}\) rule correspond to a transitivity rule used to chain together separate proofs. Unions correspond to case analysis. The fixpoint in the closure definition corresponds to iterative uses of these proof rules or to referring back to claims in the original specification.
4 Experiments
We first discuss the example languages and programs, and the reusable elements in specifications, especially an effective style of representation predicates for heap-allocated data structures. Then we show how we wrote specifications for example programs. Next we describe our proof automation, which was based on an overall heuristic applied unchanged for each language, though parameterized over subroutines which required somewhat more customization. Finally, we conclude with discussion of our verification of the Schorr-Waite graph-marking example and a discussion of our support for verification of divergent programs.
4.1 Languages
We discuss three languages following different paradigms, each defined operationally. Many language semantics are available, e.g., with the distributions of K [9], PLT-Redex [10], and Ott [11], but we believe these three languages are sufficient to illustrate the language-independence of our approach. Figure 4 shows a destructive linked list append function in each of the three languages.
HIMP (IMP with Heap) is an imperative language with (recursive) functions and a heap. The heap addresses are integers, to demonstrate reasoning about low-level representations, and memory allocation/deallocation are primitives. The configuration is a 5-tuple of current code, local variable environment mapping identifiers to values, call stack with frames as pairs of code and environment, heap, and a collection of functions as a map from function name to definition.
Stack is a Forth-like stack-based language, though, unlike in Forth, we do make control structures part of the grammar. A shared data stack is used both for local state and to communicate between function invocations, eliminating the store, formal parameters on function declarations, and the environment of stack frames. Stack’s configuration is also a 5-tuple, but instead of a current environment there is a stack of values, and stack frames do not store an environment.
4.2 Specifying Data Structures
Our coinductive verification approach is agnostic to how claims in \(C \times {\mathcal P}(C)\) are specified. In Coq, we can specify sets using any definable predicates. Within this design space, we chose matching logic [23] for our experiments, which introduces patterns that concisely generalize the formulae of first-order logic (FOL) and separation logic, as well as term unification. Symbols apply on patterns to build other patterns, just like terms, and patterns can be combined using FOL connectives, just like formulae. E.g., pattern \(P \wedge Q\) matches a value if P and Q both match it, [t] matches only the value t, \(\exists x . P\) matches if there is any assignment of x under which P matches, and \([\![\varphi ]\!]\) where \(\varphi \) is a FOL formula matches any value if \(\varphi \) holds, and no values otherwise (in [23] neither [t] nor \([\![\varphi ]\!]\) require a visible marker, but in Coq patterns are a distinct type, requiring explicit injections).
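One way to build intuition for these four pattern forms is to model a pattern as a predicate on values. The Python sketch below does this under simplifying assumptions: the combinator names (`lit`, `conj`, `exists`, `formula`) are hypothetical, and the existential ranges over an explicitly supplied finite domain, whereas in Coq it is a genuine logical quantifier.

```python
def lit(t):
    """[t]: matches exactly the value t."""
    return lambda v: v == t


def conj(p, q):
    """P /\\ Q: matches a value when both P and Q match it."""
    return lambda v: p(v) and q(v)


def exists(domain, body):
    """exists x. P: `body` maps a witness x to a pattern; `domain` is a finite
    set of candidate witnesses (an assumption made so the check is computable)."""
    return lambda v: any(body(x)(v) for x in domain)


def formula(holds):
    """[[phi]]: matches every value when phi holds, and no value otherwise."""
    return lambda v: bool(holds)


# "There is some x > 3 such that the value is the pair (x, x)."
pair_above_three = exists(range(10), lambda x: conj(lit((x, x)), formula(x > 3)))

print(pair_above_three((5, 5)))
print(pair_above_three((2, 2)))
print(pair_above_three((5, 6)))
```

Only the first value matches: the second fails the side condition and the third is not of the required shape.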
Example list specifications
4.3 Specifying Reachability Claims
The first parameter is the function definition. The second is the arguments. The heap effect is described as a pattern \(P_{ in }\) for the allowable initial states of the heap and function \(P_{ out }\) from returned values to corresponding heap patterns. For example, we specify the definition D of append in Fig. 4 by writing \( call (D,[x,y],(\mathrm {list}(a,x) * \mathrm {list}(b,y)), (\lambda r. \mathrm {list}(a{+\!\!+}b,r))) \), which is as compact and elegant as it can be. More specifications are given in Table 1. A number of specifications assert that part of the heap is left entirely unchanged by writing \([H]\wedge \ldots \) in the precondition to bind a variable H to a specific heap, and using the variable in the postcondition (just repeating a representation predicate might permit a function to reallocate internal nodes in a data structure to different addresses). The specifications Add and Add’ show that it can be a bit more complicated to assert that an input list is used undisturbed as a suffix of a result list. Specifications such as Length, Append, and Delete are written in terms of corresponding mathematical functions on the lists represented in the heap, separating those functional descriptions from details of memory layout.
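To make the shape of the append specification concrete, the following Python sketch checks the precondition and postcondition on one concrete heap. It tests a single execution and proves nothing; it is meant only to show what the list representation predicate and the separating conjunction demand. Everything in it is an assumption for illustration: the heap is a dictionary from integer addresses to (value, next) cells, address 0 plays the role of null, and `append` is a hypothetical stand-in for the code in Fig. 4.

```python
NULL = 0


def list_footprint(heap, p, values):
    """Return the set of addresses of a null-terminated list at p holding
    exactly `values`, or None if the heap does not contain such a list."""
    cells = set()
    for v in values:
        if p == NULL or p in cells or p not in heap or heap[p][0] != v:
            return None
        cells.add(p)
        p = heap[p][1]
    return cells if p == NULL else None


def pre(heap, a, x, b, y):
    """list(a, x) * list(b, y): two disjoint lists that together cover the heap."""
    fa, fb = list_footprint(heap, x, a), list_footprint(heap, y, b)
    return fa is not None and fb is not None and not (fa & fb) and (fa | fb) == set(heap)


def post(heap, a, b, r):
    """list(a ++ b, r): a single list covering the whole heap."""
    return list_footprint(heap, r, a + b) == set(heap)


def append(heap, x, y):
    """Hypothetical destructive append: link the last cell of x to y."""
    if x == NULL:
        return y
    p = x
    while heap[p][1] != NULL:
        p = heap[p][1]
    heap[p] = (heap[p][0], y)
    return x


heap = {10: (1, 20), 20: (2, NULL), 30: (3, NULL)}
a, b = [1, 2], [3]
holds_before = pre(heap, a, 10, b, 30)
r = append(heap, 10, 30)
holds_after = post(heap, a, b, r)
print(holds_before, holds_after)
```

Because the predicates require the lists to cover the heap exactly, a function that leaked or duplicated a cell would fail the postcondition check on this input.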
4.4 Proofs and Automation
The basic heuristic in our proofs, which is also the basis of our proof automation, is to attack a goal by preferring to prove that the current configuration is in the target set if possible, then trying to use claims in the specification by transitivity, and only last resorting to taking execution steps according to the operational semantics or making case distinctions. Each of these operations begins, as in the example proofs, with certain manipulations of the definitions and fixpoints in the language-independent core. Our heuristic is reusable, as a proof tactic parameterized over subtactics for the more specific operations. A prelude to the main loop begins by applying the main theorem to move from claiming validity to showing a coinduction-style inclusion, and breaking down a specification with several classes of claims into a separate proof goal for each family of claims.
Additionally, our automation leverages support offered by the proof assistant, such as handling conjuncts by trying to prove each case, existentials by introducing a unification variable, equalities by unification, and so on. Moreover, we added tactics for map equalities and numerical formulae, which are shared among all languages involving maps and integers. The current proof goal after each step is always a reachability claim. So even in proofs which are not completely automatic, the proof automation can give up by leaving subgoals for the user, who can reinvoke the proof automation after making some proof steps of their own as long as they leave a proof goal in the same form.
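The order of preferences in the heuristic (finish, then transitivity, then step) can be sketched as a recursive search over concrete configurations. The Python fragment below is a rough analogue and not the Coq tactic: it works on a hypothetical finite toy program, uses a `fuel` bound in place of real termination arguments, and permits claims from the specification only after at least one execution step, mirroring the outer step in the coinductive inclusion.

```python
N = 5
TAIL, END = ("tail", 0), ("end", 0)


def succ(c):
    """Toy program: count a loop down from n, then run a one-step tail."""
    pc, n = c
    if pc == "loop":
        return [("loop", n - 1)] if n > 0 else [TAIL]
    if pc == "tail":
        return [END]
    return []


def prove(c, P, spec, fuel, progressed=False):
    """Try to discharge the goal (c, P) using the three preferences in order."""
    if c in P:                                   # 1. already in the target set
        return True
    if fuel == 0:
        return False                             # give up; a user would take over
    if progressed:                               # 2. use a claim by transitivity
        for (c0, Q) in spec:
            if c0 == c and all(prove(d, P, spec, fuel - 1, True) for d in Q):
                return True
    # 3. last resort: take an execution step
    return any(prove(d, P, spec, fuel - 1, True) for d in succ(c))


def prove_all(spec, fuel):
    return all(prove(c, P, spec, fuel) for (c, P) in spec)


loop_claims = {(("loop", n), frozenset({TAIL})) for n in range(N + 1)}
whole_claim = {(("loop", N), frozenset({END}))}
spec = loop_claims | whole_claim

print(prove_all(spec, fuel=3))
print(prove(("loop", N), frozenset({END}), set(), fuel=3))
```

With the loop claims available, three units of fuel suffice for every goal, since each proof takes one step and then appeals to a claim. With an empty specification the same budget is exhausted inside the loop, which is why the heuristic tries transitivity before stepping.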
4.5 Other Data Structures
Matching logic allows us to concisely define many other important data structures. Besides lists, we also have proofs in Coq with trees, graphs, and stacks [16]. These data structures are all used for proving properties about the SchorrWaite algorithm. In the next section we go into more detail about these data structures and how they are used in proving the SchorrWaite algorithm.
4.6 SchorrWaite
Our experiments so far demonstrate that our coinductive verification approach applies across languages in different paradigms, and can handle usual heap programs with a high degree of automation. Here we show that we can also handle the famous Schorr-Waite graph marking algorithm [25], which is a well-known verification challenge: “The Schorr-Waite algorithm is the first mountain that any formalism for pointer aliasing should climb” [26]. To give the reader a feel for what it takes to mechanically verify such an algorithm, previous proofs in [27] and [28] required manually produced proof scripts of about 470 and, respectively, over 1400 lines, and they both used conventional Hoare logic. In comparison, our proof is 514 lines. Line counts are a crude measure, but we can at least conclude that the language independence and generality of our approach did not impose any great cost compared to using language-specific program logics.
4.7 Divergence
Our coinductive framework can also be used to verify that a program is divergent. Such verification is often a topic that is given its own treatment, as in [30, 31], though in our framework, no additional care is needed. To prove a program is divergent on all inputs, one verifies a set of claims of the form \((c, \emptyset )\), so that no configuration can be determined valid by membership in the final set of states. We have verified the divergence of a simple program under each style of IMP semantics in Fig. 3, as well as programs in each language from Sect. 4.1. These programs include the omega combinator and the sum program from Sect. 3.2 with true replacing the loop guard.
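With an empty target set, the step operator can only be satisfied by taking a step, so a set of such claims that is closed under stepping describes configurations that never get stuck. The short Python sketch below illustrates this on a hypothetical four-state spinning program; the configuration encoding and the name `diverges` are assumptions made for the example.

```python
EMPTY = frozenset()


def succ(c):
    """Toy semantics: "spin" configurations cycle forever; anything else is stuck."""
    kind, k = c
    if kind == "spin":
        return [("spin", (k + 1) % 4)]
    return []


def diverges(S):
    """Check that every claim has an empty target and steps to another claim in S."""
    return all(
        P == EMPTY and any((d, EMPTY) in S for d in succ(c))
        for (c, P) in S
    )


spinning = {(("spin", k), EMPTY) for k in range(4)}
halting = {(("halt", 0), EMPTY)}
partial = {(("spin", 0), EMPTY)}

print(diverges(spinning))   # closed under stepping, so every state runs forever
print(diverges(halting))    # a stuck configuration cannot satisfy an empty target
print(diverges(partial))    # not closed: the successor is missing from the set
```

The last case shows why the specification must cover the whole cycle, or else be completed by the closure used in Lemma 1.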
4.8 Summary of Experiments
Statistics are shown in Table 2. For each example, size shows the amount of code to be verified, the size of the specification, and the size of the proof script. If verifying an example required auxiliary definitions or lemmas specific to that example, the size of those definitions was counted with the specification or proof. Many examples were verified by a single invocation of our automatic proof tactic, giving one-line proofs. Other small proofs required human assistance only in the form of applying lemmas about the domain. Proofs are generally smaller than the specifications, which are usually about as large as the code. This is similar to the results for Bedrock [32], and good for a foundational verification system.
Proof statistics
The reported “Proof” time is the time for Coq to process the proof script, which includes running proof tactics and proof searches to construct a complete proof. If this run succeeds, it produces a proof certificate file which can be rechecked without that overhead. For an initial comparison with Bedrock we timed their SinglyLinkedList.v example, which verifies length, reverse, and append functions that closely resemble our example code. The total time to run the Bedrock proof script was 93 s, and 31 s to recheck the proof certificate, distinctly slower than our times in Table 2. To more precisely match the Bedrock examples we modified our programs to represent list nodes with fields at successive addresses rather than using HIMP’s records, but this only improved performance, down to 20 s to run the proof scripts, and 4 s to check the certificates.
5 Subsuming Reachability Logic
5.1 Advantages of Coinduction
A mechanical proof of our soundness theorem gives a more usable verification framework, since reachability logic requires operational semantics to be given as a set of rewrite rules, while our approach does not. Further, reachability logic fixes a set of syntactic proof rules, while in our approach the mathematical fixpoints and functions act as proof rules without explicitly requiring any. In fact, the generality of our approach allows introductions of other derived rules that do not compromise the soundness result. Similarly, the generality allows higher-order verification, which reachability logic cannot handle.
Further, we saw in Sect. 3 that the general proof of our theorem is entirely mathematical. We instantiate it with the \({{\mathrm{step}}}_R\) function to get a program verification framework. However, if we instantiate it with other functions, we could get frameworks for proving different properties, such as all-path validity or the “until” notion of validity previously mentioned. Reachability logic does not support any other notion of validity without changes to its proof system, which then require new proofs of soundness and relative completeness. For our framework, the proof of the main theorem does not need to be modified at all, and one only needs to prove that all-path validity is a greatest fixpoint (see Sect. 3). The same is true for any property. In this sense, this coinduction framework is much more general than the reachability logic proof system presented in [34].
5.2 Reachability Logic Proof System
The key construct in reachability logic is the notion of circularity. Circularities, represented as \(\mathcal C\) in Fig. 6, intuitively represent claims that are conjectured to be true but have not yet been proved true. These claims are proved using the Circularity rule, which is analogous in our coinductive framework to referring back to claims previously seen. Most of the other rules in Fig. 6 are not as interesting. Transitivity requires progress before the circularities are flushed as axioms. This corresponds to the outer \({{\mathrm{step}}}_R\) in our coinductive framework.
There are clear parallels between the reachability logic proof system and our coinductive framework. We have formalized and mechanically verified a detailed proof that reachability logic is an instance of our coinductive verification framework. One can refer to [16] for full details, but we briefly discuss the nature of the proof below.
5.3 Reachability Logic is Coinduction
Lemma 3
\(R_{\mathcal A} \vDash ^+ \mathcal A\) and if \(S_{\varphi \Rightarrow \varphi '} \subseteq {{\mathrm{valid}}}_{R_{\mathcal A}}\) then \(\mathcal A \vDash \varphi \Rightarrow \varphi '\).
This lemma suggests what to do: take any reachability logic proof of \(\mathcal A \vdash \varphi \Rightarrow \varphi '\) and any transition relation R such that \(R \vDash ^+ \mathcal A\), and produce a coinductive proof of \(S_{\varphi \Rightarrow \varphi '}\subseteq {{\mathrm{valid}}}_R\). This gives us not only a procedure to associate coinductive proofs to reachability logic proofs, but also an alternative method to prove the soundness of reachability logic. This is what we do below:
Theorem 2
If there is a reachability logic proof derivation for \(\mathcal A \vdash \varphi \Rightarrow \varphi '\) and a transition relation R such that \(R \vDash ^+ \mathcal A\), then \(S_{\varphi \Rightarrow \varphi '} \subseteq {{\mathrm{valid}}}_R\), and in particular this holds by applying Theorem 1 to an inclusion \(\overline{\mathcal C}\subseteq {{\mathrm{step}}}_R({{\mathrm{derived}}}_R^*(\overline{\mathcal C}))\). Here, \({{\mathrm{derived}}}_R\) is a particular function satisfying the conditions for G in Theorem 1 (see [16] for more details), and \(\mathcal C\) is a set of reachability rules consisting of \(\varphi \Rightarrow \varphi '\) along with those reachability rules which appear as conclusions of instances of the Circularity proof rule in the proof tree of \(\mathcal A \vdash \varphi \Rightarrow \varphi '\).
To prove Theorem 2, we apply the Set Circularity theorem of reachability logic [35], which states that any reachability logic claim \(\mathcal A \vdash \varphi \Rightarrow \varphi '\) is provable iff there is some set of claims \(\mathcal C\) such that \(\varphi \Rightarrow \varphi ' \in \mathcal C\) and for each \(\varphi _i \Rightarrow \varphi _i' \in \mathcal C\) there is a proof of \(\mathcal A \vdash _\mathcal C \varphi _i \Rightarrow \varphi _i'\) which does not use the Circularity proof rule. In the forward direction, we can take \(\mathcal C\) as defined in the statement of Theorem 2. The main idea is to convert proof trees into inclusions of sets of claims:
Lemma 4
Given a proof derivation of \(\mathcal A \vdash _\mathcal C \varphi _a \Rightarrow \varphi _b \) which does not use the Circularity proof rule (last rule in Fig. 6), if \(R \vDash ^+ \mathcal A\) and \(\mathcal C\) is nonempty then \(S_{\varphi _a \Rightarrow \varphi _b} \subseteq {{\mathrm{step}}}_R({{\mathrm{derived}}}_R^*(\overline{\mathcal C}))\).
This lemma is proven by strengthening the inclusion into one that can be proven by structural induction over the Reachability Logic proof rules besides Circularity.
Combining this lemma with Set Circularity shows that \(\overline{\mathcal C} = \cup _i S_{\varphi _i\Rightarrow \varphi _i'} \subseteq {{\mathrm{valid}}}_R\) which implies that \(S_{\varphi \Rightarrow \varphi '} \subseteq {{\mathrm{valid}}}_R\) exactly as desired. We have mechanized the proofs of Lemmas 3 and 4 in Coq [16]. This is a major result, constituting an independent soundness proof for Reachability Logic, and helps demonstrate the strength of our coinductive framework, despite its simplicity. Moreover, this allows proofs done using reachability logic as in [15] to be translated to mechanically verified proofs in Coq, immediately allowing foundational verification of programs written in any language.
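To make the shape of this argument concrete, the following is a minimal Python sketch of the plain coinduction principle on a finite transition system. It is not the Coq development of [16]. The definition of `step` is one reading of \({{\mathrm{step}}}_R\) from the earlier sections: a claim holds after one unfolding if its configuration is already in the target set, or if it can take a step and the claim for every successor is in the given set. The \({{\mathrm{derived}}}_R\) closure used in Theorem 2 is omitted, and all names (`successors`, `step`, `valid`, the toy relation `R`) are illustrative assumptions.

```python
def successors(R, c):
    """All configurations reachable from c in one step of R."""
    return {d for (a, d) in R if a == c}


def step(R, X, universe):
    """One unfolding of partial-correctness validity (assumed reading of step_R).

    A claim (c, P) from `universe` is kept when c is already in P, or when c
    has at least one successor and the claim (d, P) is in X for every successor d.
    """
    out = set()
    for (c, P) in universe:
        succ = successors(R, c)
        if c in P or (succ and all((d, P) in X for d in succ)):
            out.add((c, P))
    return out


def valid(R, universe):
    """Greatest fixpoint of step, by iterating downwards from all claims.

    Terminates because `universe` is finite and step is monotone.
    """
    X = set(universe)
    while True:
        nxt = step(R, X, universe)
        if nxt == X:
            return X
        X = nxt


# Toy system: a countdown 3 -> 2 -> 1 -> 0, a diverging state, a stuck
# error state, and a state 4 that may step to either 3 or the error state.
R = {(3, 2), (2, 1), (1, 0), ("loop", "loop"), (4, 3), (4, "err")}
states = {0, 1, 2, 3, 4, "loop", "err"}
P = frozenset({0})                      # target: the countdown has finished
universe = {(c, P) for c in states}

# Candidate set of claims, closed under the steps of the program.
X = {(0, P), (1, P), (2, P), (3, P), ("loop", P)}

# Coinductive proof obligation: X is contained in step(X) ...
assert X <= step(R, X, universe)
# ... and therefore X is contained in the greatest fixpoint.
assert X <= valid(R, universe)
```

In this toy system the diverging state satisfies its claim, as expected for partial correctness, while the claims for `"err"` and `4` are not in the greatest fixpoint. Note that the singleton `{(3, P)}` alone does not satisfy the inclusion, because its successor claim `(2, P)` is missing; the candidate set has to be closed under the steps of the program.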
6 Other Related Work
Here we discuss work other than reachability logic that is related to our coinductive verification system. We discuss commonly used program verifiers, including approaches based on operational semantics and Iris [36], an approach with some language independence. We also discuss related coinduction schemata.
6.1 Current Verification Tools
A number of prominent tools such as Why [37], Boogie [38, 39], and Bedrock [24, 32] provide program verification for a fixed language, and support other languages by translation if at all. For example, Frama-C and Krakatoa, respectively, attempt to verify C and Java by translation through Why. Also, Spec# and Havoc, respectively, verify C# and C by translation through Boogie. We are not aware of soundness proofs for these translations. Such proofs would be highly nontrivial, requiring formal semantics of both source and target languages.
All of these systems are based on a verification condition (VC) generator for their programming language. Bedrock is closest in architecture and guarantees to our system, as it is implemented in Coq and verification results in a Coq proof certificate that the specification is sound with respect to a semantics of the object language. Bedrock supports dynamically created code and modular verification of higher-order functions, for which our framework has preliminary support. Bedrock also makes more aggressive attempts at complete automation, at the cost of increased runtime. Most fundamentally, Bedrock is built around a VC generator for a fixed target language.
In sharp contrast to the above approaches, we demonstrated that a small-step operational semantics suffices for program verification, without a need to define any other semantics, or verification condition generators, for the same language. A language-independent, sound and (relatively) complete coinductive proof method then allows us to verify properties of programs directly using the operational semantics. As seen in Sect. 4.8, this language independence does not compromise other desirable properties. The required human effort and the performance of the verification task compare well with foundational program verifiers such as Bedrock, and we provide the same high confidence in correctness: the trust base consists of the operational semantics only.
6.2 Operational Semantics Based Approaches
Verifiable C [40] is a program verification tool for the C programming language based on an operational semantics for C defined in Coq. Hoare triples are then proved as lemmas about the operational semantics. However, in this approach and other similar approaches, it is necessary to prove such lemmas. Without them, verification of any nontrivial C program would be nearly impossible. In our approach, while we can also define and prove Hoare triples as lemmas, doing so is not needed to make program verification feasible, as demonstrated in the previous sections. We only need some additional domain reasoning in Coq, which logics like Verifiable C require in addition to Hoare logic reasoning. Thus, our approach automatically yields a program verification tool for any language with minimal additional reasoning, while approaches such as Verifiable C need over 40,000 lines of Coq to define the program logic. We believe this is completely unnecessary, and hope our coinductive framework will be the first step in eliminating such superfluous logics.
The work by the FLINT group [41, 42, 43] is another approach to program verification based on operational semantics. Languages developed use shallowly embedded state predicates in Coq, and inference rules are derived directly from the operational semantics. However, their work is not generic over operational semantics. For example, [43] is developed in the context of a particular machine model, with a fixed memory representation and register file. Even simple changes such as adding registers require updating soundness proofs. Our approach has a single soundness theorem that can be instantiated for any language.
Iris [36] is a concurrent separation logic that has language independence, with operational semantics formalized in Coq. Iris adds monoids and invariants to the program logic in order to facilitate verification. It also derives some Hoare-style rules for verification from the semantics of a language. However, there are still structural Hoare rules that depend on the language and must be added manually. Additionally, once proof rules are generated, they are specialized to that particular language. Further, the verification in the paper relies on Hoare-style reasoning, while in our approach we do not assume any such verification style, as we work directly with the mathematical specifications. Finally, the monoids used are not generated and are specific to the programming language used.
6.3 Other Coinduction Schemata
A categorical generalization of our key theorem was presented as a recursion scheme in [12, 13]. The titular result of the former is the dual of the \(\lambda \)-coiteration scheme of the latter, which specializes to preorder categories to give our Theorem 1. A more recent and more general result is [14], which also generalized other recent work on coinductive proofs such as [44]. These approaches were presented for showing bisimilarity. The novelty of our approach lies instead in using these techniques directly to show Hoare-style functional correctness claims, and in developing the associated machinery and automation that makes it work with a variety of languages, not in advancing the already solid mathematical foundations of coinduction. Various weaker coinduction schemes are folklore, such as the lemma coinduct3 in Isabelle/HOL’s standard library.
7 Conclusion and Future Work
We presented a language-independent program verification framework. Proofs can be as simple as with a custom Hoare logic, but only an operational semantics of the target language is required. We have mechanized a proof of the correctness of our approach in Coq. Combining this with a coinductive proof thus produces a Coq proof certificate concluding that the program meets the specification according to the provided semantics. Our approach is amenable to proof automation. Further automation may improve convenience and cannot compromise soundness of the proof system. A language designer need only give an authoritative semantics to enable program verification for a new language, rather than needing to have the experience and invest the effort to design and prove the soundness of a custom program logic.
One opportunity for future work is using our approach to provide proof certificates for reachability logic program verifiers such as K [9]. The K prover was used to verify programs in several real programming languages [15]. While the proof system is sound, trusting the results of these tools requires trusting the implementation of the K system. Our translation in Sect. 5 will allow us to produce proof objects in Coq for proofs done in K’s backend, which will make it sufficient to trust only Coq’s proof checker to rely on the results from K’s prover.
Another area for future work is verifying programs with higher-order specifications, where a specification can make reachability claims about values quantified over in the specification. This allows higher-order functions to have specifications that require functional arguments to themselves satisfy some specification. We have begun preliminary work on proving validity of such specifications using the notions of compatibility up-to presented in [14]. Combining this with more general forms of claims may allow modular verification of concurrent programs, as in RGsep [45]. See [16] for initial work in these areas.
Other areas for future work are evaluating the reusability of proof automation between languages, and using the ability to easily verify programs under a modified semantics, e.g. adding time costs to allow proving real-time properties.
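As a small illustration of the last point, the sketch below lifts a semantics to a time-annotated one without touching the original transitions, so that a time bound becomes an ordinary state predicate. This is only a sketch under stated assumptions: the semantics is given as a Python successor function, and `timed`, `countdown`, `reachable_terminals` and the constant cost of 2 per step are hypothetical names and values chosen for the example.

```python
def timed(step_fn, cost_fn):
    """Lift a successor function on configurations to (configuration, time) pairs."""
    def timed_step(state):
        c, t = state
        # Each original transition c -> d is kept and charged cost_fn(c, d).
        return {(d, t + cost_fn(c, d)) for d in step_fn(c)}
    return timed_step


def countdown(n):
    """Toy semantics (hypothetical): count down to 0, then stop."""
    return {n - 1} if n > 0 else set()


def reachable_terminals(step_fn, start):
    """All stuck states reachable from start.

    Assumes the reachable state space is finite, which holds here because
    the countdown terminates.
    """
    seen, stack, terminals = set(), [start], set()
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        succ = step_fn(s)
        if not succ:
            terminals.add(s)
        stack.extend(succ)
    return terminals


# Every step costs 2 time units (an assumption made for the example).
timed_countdown = timed(countdown, lambda c, d: 2)
finals = reachable_terminals(timed_countdown, (3, 0))

# The real-time property "finishes at 0 within 6 time units" is now an
# ordinary predicate on the final (configuration, time) pairs.
assert all(c == 0 and t <= 6 for (c, t) in finals)
```

The design choice is that the time cost lives in the state rather than in the proof system, so the same notion of reachability claim applies unchanged to the modified semantics.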
References
 1. Hoare, C.A.R.: An axiomatic basis for computer programming. Commun. ACM 12(10), 576–580 (1969). https://doi.org/10.1145/363235.363259
 2. Hathhorn, C., Ellison, C., Roşu, G.: Defining the undefinedness of C. In: PLDI, pp. 336–345. ACM (2015). https://doi.org/10.1145/2737924.2737979
 3. Bogdănaş, D., Roşu, G.: K-Java: a complete semantics of Java. In: POPL, pp. 445–456. ACM (2015). https://doi.org/10.1145/2676726.2676982
 4. Bodin, M., Chargueraud, A., Filaretti, D., Gardner, P., Maffeis, S., Naudziuniene, D., Schmitt, A., Smith, G.: A trusted mechanised JavaScript specification. In: POPL, pp. 87–100. ACM (2014). https://doi.org/10.1145/2535838.2535876
 5. Park, D., Stefănescu, A., Roşu, G.: KJS: a complete formal semantics of Javascript. In: PLDI, pp. 346–356. ACM (2015). https://doi.org/10.1145/2737924.2737991
 6. Politz, J.G., Martinez, A., Milano, M., Warren, S., Patterson, D., Li, J., Chitipothu, A., Krishnamurthi, S.: Python: the full monty. In: OOPSLA, pp. 217–232. ACM (2013). https://doi.org/10.1145/2509136.2509536
 7. Filaretti, D., Maffeis, S.: An executable formal semantics of PHP. In: Jones, R. (ed.) ECOOP 2014. LNCS, vol. 8586, pp. 567–592. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44202-9_23
 8. Owens, S.: A sound semantics for OCaml\(_{light}\). In: Drossopoulou, S. (ed.) ESOP 2008. LNCS, vol. 4960, pp. 1–15. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78739-6_1
 9. Roşu, G., Şerbănuţă, T.F.: An overview of the K semantic framework. J. LAP 79(6), 397–434 (2010). https://doi.org/10.1016/j.jlap.2010.03.012
 10. Klein, C., Clements, J., Dimoulas, C., Eastlund, C., Felleisen, M., Flatt, M., McCarthy, J.A., Rafkind, J., Tobin-Hochstadt, S., Findler, R.B.: Run your research: on the effectiveness of lightweight mechanization. In: POPL, pp. 285–296. ACM (2012). https://doi.org/10.1145/2103656.2103691
 11. Sewell, P., Nardelli, F.Z., Owens, S., Peskine, G., Ridge, T., Sarkar, S., Strnisa, R.: Ott: effective tool support for the working semanticist. In: ICFP. ACM (2007). https://doi.org/10.1017/S0956796809990293
 12. Uustalu, T., Vene, V., Pardo, A.: Recursion schemes from comonads. Nord. J. Comput. 8(3), 366–390 (2001)
 13. Bartels, F.: On generalised coinduction and probabilistic specification formats: distributive laws in coalgebraic modelling. Ph.D. thesis, Vrije Universiteit Amsterdam (2004)
 14. Pous, D.: Coinduction all the way up. In: LICS, pp. 307–316. IEEE (2016). https://doi.org/10.1145/2933575.2934564
 15. Ştefănescu, A., Park, D., Yuwen, S., Li, Y., Roşu, G.: Semantics-based program verifiers for all languages. In: OOPSLA, pp. 74–91. ACM (2016). https://doi.org/10.1145/2983990.2984027
 16. Moore, B., Peña, L., Rosu, G.: GitHub repository (2017). https://github.com/FormalSystemsLaboratory/coinduction. Source code
 17. Wright, A.K., Felleisen, M.: A syntactic approach to type soundness. Inf. Comput. 115(1), 38–94 (1992). https://doi.org/10.1006/inco.1994.1093
 18. Plotkin, G.D.: A structural approach to operational semantics. J. Log. Algebraic Program. 60–61, 17–139 (2004). https://doi.org/10.1016/j.jlap.2004.05.001
 19. Reynolds, J.C.: Separation logic: a logic for shared mutable data structures. In: LICS, pp. 55–74. IEEE (2002). https://doi.org/10.1109/LICS.2002.1029817
 20. O’Hearn, P.W., Pym, D.J.: The logic of bunched implications. Bull. Symbolic Log. 5(2), 215–244 (1999). https://doi.org/10.2307/421090
 21. Felleisen, M., Friedman, D.P.: A calculus for assignments in higher-order languages. In: POPL, p. 314. ACM (1987). https://doi.org/10.1145/41625.41654
 22. Felleisen, M.: The calculi of Lambda-\(\nu \)-cs conversion: a syntactic theory of control and state in imperative higher-order programming languages. Ph.D. thesis, Indiana University (1987)
 23. Roşu, G.: Matching logic – extended abstract. In: RTA, LIPIcs, pp. 5–21. Schloss Dagstuhl-LZI (2015). https://doi.org/10.4230/LIPIcs.RTA.2015.5
 24. Chlipala, A.: Mostly-automated verification of low-level programs in computational separation logic. In: PLDI, pp. 234–245. ACM (2011). https://doi.org/10.1145/1993498.1993526
 25. Schorr, H., Waite, W.M.: An efficient machine-independent procedure for garbage collection in various list structures. Commun. ACM 10(8), 501–506 (1967). https://doi.org/10.1145/363534.363554
 26. Bornat, R.: Proving pointer programs in Hoare logic. In: Backhouse, R., Oliveira, J.N. (eds.) MPC 2000. LNCS, vol. 1837, pp. 102–126. Springer, Heidelberg (2000). https://doi.org/10.1007/10722010_8
 27. Mehta, F., Nipkow, T.: Proving pointer programs in higher-order logic. Inf. Comput. 199(1–2), 200–227 (2005). https://doi.org/10.1016/j.ic.2004.10.007
 28. Hubert, T., Marche, C.: A case study of C source code verification: the Schorr-Waite algorithm. In: SEFM, pp. 190–199. IEEE (2005). https://doi.org/10.1109/SEFM.2005.1
 29. Gries, D.: The Schorr-Waite graph marking algorithm. Acta Informatica 11(3), 223–232 (1979). https://doi.org/10.1007/BF00289068
 30. Gupta, A., Henzinger, T.A., Majumdar, R., Rybalchenko, A., Xu, R.G.: Proving nontermination. In: POPL, pp. 147–158. ACM (2008). https://doi.org/10.1145/1328438.1328459
 31. Chen, H.-Y., Cook, B., Fuhs, C., Nimkar, K., O’Hearn, P.: Proving nontermination via safety. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 156–171. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_11
 32. Chlipala, A.: The Bedrock structured programming system: combining generative metaprogramming and Hoare logic in an extensible program verifier. In: ICFP, pp. 391–402. ACM (2013). https://doi.org/10.1145/2500365.2500592
 33. Roşu, G., Ştefănescu, A., Ciobâcă, Ş., Moore, B.M.: One-path reachability logic. In: LICS, pp. 358–367. IEEE (2013). https://doi.org/10.1109/LICS.2013.42
 34. Roşu, G., Ştefănescu, A.: From Hoare logic to matching logic reachability. In: Giannakopoulou, D., Méry, D. (eds.) FM 2012. LNCS, vol. 7436, pp. 387–402. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32759-9_32
 35. Roşu, G., Ştefănescu, A., Ciobâcă, Ş., Moore, B.M.: Reachability logic. Technical report, University of Illinois, July 2012. http://hdl.handle.net/2142/32952
 36. Jung, R., Swasey, D., Sieczkowski, F., Svendsen, K., Turon, A., Birkedal, L., Dreyer, D.: Iris: monoids and invariants as an orthogonal basis for concurrent reasoning. In: POPL, pp. 637–650. ACM (2015). https://doi.org/10.1145/2775051.2676980
 37. Filliâtre, J.-C., Paskevich, A.: Why3—where programs meet provers. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 125–128. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37036-6_8
 38. Leino, K.R.M.: This is Boogie 2. Technical report, Microsoft Research, June 2008
 39. Barnett, M., Chang, B.-Y.E., DeLine, R., Jacobs, B., Leino, K.R.M.: Boogie: a modular reusable verifier for object-oriented programs. In: de Boer, F.S., Bonsangue, M.M., Graf, S., de Roever, W.P. (eds.) FMCO 2005. LNCS, vol. 4111, pp. 364–387. Springer, Heidelberg (2006). https://doi.org/10.1007/11804192_17
 40. Appel, A.W., Dockins, R., Hobor, A., Beringer, L., Dodds, J., Stewart, G., Blazy, S., Leroy, X.: Program Logics for Certified Compilers. Cambridge University Press, New York (2014)
 41. Yu, D., Shao, Z.: Verification of safety properties for concurrent assembly code. In: ICFP, pp. 175–188. ACM (2004). https://doi.org/10.1145/1016850.1016875
 42. Feng, X., Shao, Z., Vaynberg, A., Xiang, S., Ni, Z.: Modular verification of assembly code with stack-based control abstractions. In: PLDI, pp. 401–414. ACM (2006). https://doi.org/10.1145/1133981.1134028
 43. Feng, X., Shao, Z., Guo, Y., Dong, Y.: Combining domain-specific and foundational logics to verify complete software systems. In: Shankar, N., Woodcock, J. (eds.) VSTTE 2008. LNCS, vol. 5295, pp. 54–69. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87873-5_8
 44. Hur, C.-K., Neis, G., Dreyer, D., Vafeiadis, V.: The power of parameterization in coinductive proof. In: POPL, pp. 193–206. ACM (2013)
 45. Vafeiadis, V.: Modular fine-grained concurrency verification. Ph.D. thesis, University of Cambridge (2008)
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.