Skip to main content

Modular, Correct Compilation with Automatic Soundness Proofs

  • Conference paper
  • First Online:
Leveraging Applications of Formal Methods, Verification and Validation. Modeling (ISoLA 2018)

Abstract

Formal verification of compiler correctness requires substantial effort. A particular challenge is lack of modularity and automation. Any change or update to the compiler can render existing proofs obsolete and cause considerable manual proof effort. We propose a framework for automatically proving the correctness of compilation rules based on simultaneous symbolic execution for the source and target language. The correctness of the whole system follows from the correctness of each compilation rule. To support a new source or target language it is sufficient to formalize that language in terms of symbolic execution, while the corresponding formalization of its counterpart can be re-used. The correctness of translation rules can be checked automatically. Our approach is based on a reduction of correctness assertions to formulas in a program logic capable of symbolic execution of abstract programs. We instantiate the framework for compilation from Java to LLVM IR and provide a symbolic execution system for a subset of LLVM IR.

This work was funded by the Hessian LOEWE initiative within the Software-Factory 4.0 project.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Compiler verification with interactive proof assistants requires this as well, even separate formalizations for the target and source languages.

  2. 2.

    When using Theorem 1 to automatically prove the soundness of translation rules with a deductive verification system such as , this aspect can and must be precisely modeled. This can be done by tracking for each abstract program an (abstract) set of variables it depends on.

References

  1. Ahrendt, W., Beckert, B., Bubel, R., Hähnle, R., Schmitt, P.H., Ulbrich, M.: Deductive Software Verification - The KeY Book. LNCS, vol. 10001. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49812-6

    Book  Google Scholar 

  2. Augustsson, L.: A compiler for lazy ML. In: Proceedings of LFP 1984. ACM (1984)

    Google Scholar 

  3. Breebaart, L.: Rule-based compilation of data parallel programs. Ph.D. thesis, Delft University of Technology (2003)

    Google Scholar 

  4. Bubel, R., Hähnle, R., Pelevina, M.: Fully abstract operation contracts. In: Margaria, T., Steffen, B. (eds.) ISoLA 2014. LNCS, vol. 8803, pp. 120–134. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45231-8_9

    Chapter  Google Scholar 

  5. Burstall, R.M.: Program proving as hand simulation with a little induction. In: Information Processing, pp. 308–312. Elsevier (1974)

    Google Scholar 

  6. Cadar, C., Dunbar, D., et al.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: 8th USENIX Conference on OSDI, pp. 209–224. USENIX Association, Berkeley (2008)

    Google Scholar 

  7. Hähnle, R., Schaefer, I., Bubel, R.: Reuse in software verification by abstract method calls. In: Bonacina, M.P. (ed.) CADE 2013. LNCS (LNAI), vol. 7898, pp. 300–314. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38574-2_21

    Chapter  Google Scholar 

  8. Ji, R.: Sound program transformation based on symbolic execution and deduction. Ph.D. thesis, Technische Universität Darmstadt (2014)

    Google Scholar 

  9. Ji, R., Bubel, R.: Program transformation and compilation. In: Ahrendt, W., Beckert, B., Bubel, R., Hähnle, R., Schmitt, P.H., Ulbrich, M. (eds.) Deductive Software Verification – The KeY Book. LNCS, vol. 10001, pp. 473–492. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49812-6_14

    Chapter  Google Scholar 

  10. Klein, G., Nipkow, T.: A machine-checked model for a Java-like language, virtual machine, and compiler. ACM Trans. PLS 28(4), 619–695 (2006)

    Google Scholar 

  11. Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of CGO 2004, p. 75. IEEE Computer Society (2004)

    Google Scholar 

  12. Le, V., Afshari, M., et al.: Compiler validation via equivalence modulo inputs. In: Proceedings of 35th ACM SIGPLAN Conference on PLDI, pp. 216–226. ACM (2014)

    Google Scholar 

  13. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)

    Article  Google Scholar 

  14. Menendez, D., Nagarakatte, S., Gupta, A.: Alive-FP: automated verification of floating point based peephole optimizations in LLVM. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 317–337. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_16

    Chapter  Google Scholar 

  15. Saabas, A., Uustalu, T.: A compositional natural semantics and Hoare logic for low-level languages. Theor. Comput. Sci. 373(3), 273–302 (2007)

    Article  MathSciNet  Google Scholar 

  16. Scheurer, D., Hähnle, R., Bubel, R.: A general lattice model for merging symbolic execution branches. In: Ogata, K., Lawford, M., Liu, S. (eds.) ICFEM 2016. LNCS, vol. 10009, pp. 57–73. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47846-3_5

    Chapter  Google Scholar 

  17. Steinhöfel, D., Wasser, N.: A new invariant rule for the analysis of loops with non-standard control flows. In: Polikarpova, N., Schneider, S. (eds.) IFM 2017. LNCS, vol. 10510, pp. 279–294. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66845-1_18

    Chapter  Google Scholar 

  18. Tan, Y.K., Myreen, M.O., et al.: A new verified compiler backend for CakeML. In: Proceedings of 21st International Conference on Functional Programming, pp. 60–73. ACM (2016)

    Google Scholar 

  19. Zhao, J., Nagarakatte, S., et al.: Formalizing the LLVM intermediate representation for verified program transformations. In: Proceedings of 39th ACM SIGPLAN-SIGACT Symposium on POPL, pp. 427–440. ACM (2012)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dominic Steinhöfel .

Editor information

Editors and Affiliations

Appendices

Appendix

This appendix provides additional material for the reader, including a sketch of the integration of state merging techniques into our approach, an extended compilation example, and a proof of Theorem 1.

A State Merging

Our SE transition relation defined in Sect. 3 outputs tree structures: all states, but the root, have exactly one predecessor. State Merging [16] has originally been invented to mitigate the state explosion problem of SE. It permits to merge symbolic states with identical program counter as arising, for example, after the execution of the then and else branch of an statement, thus reducing the state space. Definition 2 can be extended to transition relations with state merging by applying the same conditions reversely for “merging” transitions:

Fig. 7.
figure 7

State merging translation rule

Definition 8

(Soundness of SE with State Merging). Let \(\delta _m\) be an SE transition relation with state merging (some states have more than one predecessor). We call \(\delta _m\) sound iff the transition relation

$$ \begin{aligned} \delta ':=~&\{(i,o)\in \delta _m:\lnot \exists {}i'\ne {}i,(i',o)\in \delta _m\}\,\cup \\&\{(o,i):(i,o)\in \delta _m\wedge \exists {}i'\ne {}i,(i',o)\in \delta _m\} \end{aligned} $$

is sound in the sense of Definition 2.

Translation rule mergeTransl (Fig. 7) merges two branches with identical program counters. Without it, compiling programs with branching statements leads to duplicated code, for example, in rule ifElseTransl. The double horizontal line below the rule’s premise indicates its special status: it is the only rule with more than one conclusion. For constructing the merged symbolic store, there is more than one option [16]. In the simplest case, we create a fully precise value summary by means of an if-then-else term.

B Factorial Example

The program in Fig. 8 (p. x) computes the factorial of variable x, contained in variable res after termination. Figure 12 shows the SET of the dual SE states corresponding to the translation of the Java to LLVM IR code. Constructing the tree works by first symbolically executing the Java program. The Java SET has the same structure, without the LLVM IR parts and observable variables sets.

In the second phase, we apply translation rules from the leaves to the root, obtaining the dual SE states. The root of the tree contains the compiled LLVM IR program \(q_ result \) depicted in Fig. 9 (with dropped loop scopes). The tree is slightly simplified: we sometimes abbreviate programs with “\(\dots \)” and combine several translation rules into one, for example, in the translation of Java’s , which in the course of SE would be simplified to SSA form first. Two subtrees have been factored out for better readability (Figures 10 and 11).

The example also shows an artifact occurring due to our rule-based compilation: The basic block with label is never reached since it is not targeted by jumps, and the block before jumps outside the loop (this is the translation of the instruction). The instruction has been added because of the general definition of the translation rule; if the else block did not jump out of the loop, it would also make sense. Removal of such dead code, as well as other optimizations, are subject to subsequent post-processing steps and not within the scope of this paper.

Fig. 8.
figure 8

Factorial program in Java

Fig. 9.
figure 9

Factorial program in LLVM IR

Fig. 10.
figure 10

Subtree “then” for the example tree in Fig. 12

Fig. 11.
figure 11

Subtree “else” for the example tree in Fig. 12

Fig. 12.
figure 12

An example compilation tree (slightly simplified)

C Proofs

Proof

(Theorem 1). We must prove that a translation rule is sound (Definition 6) if \(\mathcal {F}=\big (\bigwedge _{i=1,\dots ,m} F ' {( pr_i )}\big )\rightarrow F ' {(c)}\) is valid. By Proposition 1, a dual SE state is valid iff its justifying formula is valid, and the shape of \(\mathcal {F}\) directly encodes Definition 6. Hence, it suffices to show that the validity of justifying formulas \( F ' {(s')}\) for abstract states \(s'\) implies the validity of \( F {(s)} \) for all concrete instantiations s of \(s'\). Let be an abstract SE state where , \(C^a\) and are symbolic and \(p_j^ abs \), \(p_ IR ^ abs \) contain abstract placeholder symbols. An instantiation s has the shape , where \(p_j\) and \(p_ IR \) result from \(p_j^ abs \) and \(p_ IR ^ abs \) by replacing placeholders with concrete programs. We have to show the following implication:

Since the validity of implies the validity of , it suffices to show the following, stronger property:

figure e

Since abstract execution is sound (Observation 1), the premise of (*) holds for all concrete contracts substituted for the abstract ones induced by the placeholders in the programs; this means that in particular, we can substitute \(p_j\)/\(p_ IR \) for \(p_j^ abs \)/\(p_ IR ^ abs \). Let , be the updates resulting from SE of the concrete programs (if SE splits, we can obtain summaries by state merging, see Appendix A).

Now there are two possibilities: Either, is an atom, or it is a guarded conjunction (and s constitutes the conclusion of a dual SE rule application). We focus on the more complicated second case. Let be the (concrete) set of observable variables of s, \(C_i\) the path conditions of the premisses of the rule that has s in its conclusion, and the variables occurring in any symbolic store or path condition of those premises. Then, after instantiation, (*) expands to:

Generally, can contain variables not occurring in , due to over-approximation. On the other hand, if contains variables not in , those are not in the scope of the currently investigated rule, and are therefore taken out of consideration here.Footnote 2 Additionally, we perform a strengthening by dropping the updates in the conclusion (this can be regarded as an abstraction step, since we discharge information). W.l.o.g., consider two different variables x, y and ; their right-hand sides in the updates and are , , etc. Under the assumption that x and y do not occur in the \(C_i\) (see Remark 1), the problem simplifies to:

figure f

For simplicity, we assume that the terms do not contain program variables. The equivalence in (\(\triangle \)) is valid iff for all interpretations \(\mathcal {I}\), it holds that

figure g

Let \(\mathcal {I} _0\) be an arbitrary interpretation; we show that (\(\Delta ^{ sem }\)) holds for \(\mathcal {I} _0\). All \(C_i\) are mutually exclusive due to the restriction built into our semantics (Definition 2), i.e. \(C_i\leftrightarrow \bigwedge _{k\ne i}\lnot C_k\) for all \(i=1,\dots ,n\). Since (\(\dagger \)) is valid, for any interpretation \(\mathcal {I}\) there is exactly one i such that

figure h

We choose an interpretation \(\mathcal {I} _1\) that (i) interprets in the same way as \(\mathcal {I} _0\), (ii) satisfies and, at the same time, (iii) satisfies (\(\dagger ^{{sem}}\)) for some i. By definition of \(\mathcal {I} _1\) we have that implies , and, similarly, for , . Hence, (\(\Delta ^{ sem }\)) holds in \(\mathcal {I} _0\).    \(\square \)

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Steinhöfel, D., Hähnle, R. (2018). Modular, Correct Compilation with Automatic Soundness Proofs. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification and Validation. Modeling. ISoLA 2018. Lecture Notes in Computer Science(), vol 11244. Springer, Cham. https://doi.org/10.1007/978-3-030-03418-4_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-03418-4_25

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03417-7

  • Online ISBN: 978-3-030-03418-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics