1 Introduction

The \(\mathbb {K}\) semantic framework is a program analysis environment based on term rewriting [1]. Users define the formal semantics of a target programming language, and the \(\mathbb {K}\) framework provides a series of formal analysis tools specialized for that language, such as a symbolic execution engine, a semantic debugger, a systematic checker for undesired behaviors (model checker), and even a fully fledged deductive program verifier. Our tool, RV-Match, is based on the \(\mathbb {K}\) framework instantiated with the publicly available C11 semantics [6, 7], a rigorous formalization of the current ISO C11 standard [10]. We have specially optimized RV-Match for executing C programs and detecting errors in them.

Unlike modern optimizing compilers, whose goal is to produce binaries that are as small and as fast as possible, even at the expense of accepting programs that may be semantically incorrect, RV-Match aims at mathematically rigorous dynamic checking of programs for strict conformance with the ISO C11 standard. A strictly-conforming program is one that does not rely on implementation-specific behaviors and is free of the most notorious feature of the C language: undefined behavior. Undefined behaviors are semantic holes left by the standard for implementations to fill in; they are the source of many subtle bugs and security issues [9].
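To make the distinction concrete, the following snippet (our own illustration, not an example from the paper) contains one implementation-defined construct and one undefined one; a strictly-conforming program would avoid both:

```c
#include <limits.h>
#include <stdio.h>

int main(void) {
  int n = INT_MAX;

  /* Implementation-defined: the result of right-shifting a negative
     value depends on the implementation (C11 6.5.7). */
  int shifted = -8 >> 1;

  /* Undefined: signed integer overflow (C11 6.5). The standard places
     no requirements on the program's behavior from this point on. */
  int overflow = n + 1;

  printf("%d %d\n", shifted, overflow);
  return 0;
}
```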

Running RV-Match. Users interface with RV-Match through the kcc executable, which behaves as a drop-in replacement for compilers like gcc and clang. Consider a file undef.c with contents:

[Listing: contents of undef.c (not reproduced here)]
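As a stand-in for the missing listing, a minimal undef.c exhibiting undefined behavior might look like the following (a hypothetical example, not necessarily the one used in the paper):

```c
int main(void) {
  int a[4] = {0, 1, 2, 3};
  return a[4];  /* undefined: reads one element past the end of a */
}
```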

We compile the program with kcc just as we would with gcc or clang. This produces an executable named a.out by default, which, for strictly-conforming, valid programs, behaves just like an executable produced by any other compiler. For undefined or invalid programs, however, kcc reports errors and exits if it cannot recover:

[Listing: kcc error report for undef.c (not reproduced here)]

In addition to location information and a stack trace, kcc also cites relevant sections of the standard [10].

2 Practical Semantics-Based Program Analysis

Unlike similar tools, we do not instrument an executable produced by a separate compiler. Instead, RV-Match directly interprets programs according to a formal operational semantics. The semantics treats the three main phases of a C implementation separately: compilation, linking, and execution. The first two phases together form the “translation” semantics, which we extract into an OCaml program executed by the kcc tool. The kcc tool then translates C programs according to the semantics, producing an abstract syntax tree as the result of the compilation and linking phases. This AST becomes the input to another OCaml program extracted from the execution semantics.

The tool on which we have based our work was originally created as a way of testing the correctness of the operational semantics from which it was extracted [7], but the performance and scalability limitations of that original version made it impractical for analyzing real programs. To address this, we have improved the tool on several fronts:

  • OCaml-based execution engine. We implemented a new execution engine that interprets programs according to a language semantics three orders of magnitude faster than our previous Java-based version. For this improvement in performance, we take advantage of the optimized pattern matching implemented by the OCaml compiler, a natural fit for \(\mathbb {K}\) framework semantics. In the course of this work, we uncovered and fixed a few limitations of the OCaml compiler itself in dealing with very large pattern-match expressions.

  • Native libraries. Previous versions of our tool required all libraries to be given semantics (or their C source code) before they could be interpreted. We now support linking against and calling native libraries, automatically marshalling data to and from the representation used in the semantics.

  • Expanded translation phase. In our C semantics, we now calculate the type of all terms, the values of initializers, and generally do more evaluation of programs during the translation phase. Previously, much of this work was duplicated during execution.

  • Error recovery and implementation-defined behavior. We have implemented error recovery and expanded support for implementation-defined behavior. Programs generated by older versions of kcc would halt when encountering undefined or implementation-defined behavior. Our new version of kcc gives semantics for many common undefined behaviors so the interpreter can continue with what was likely the expected behavior after reporting the error. Similarly, we have added support for implementation profiles, giving users an easy way to parameterize the semantics over the behaviors of common C implementations.

  • Scope of errors. We have also expanded the breadth of the errors reported by kcc to include bad practices and errors involving standard library functions.

These improvements have allowed kcc to build and analyze programs in excess of 300k lines of code, including the BIND DNS server.

Performance evaluation. To give an idea of the extent of the performance enhancements over previous versions of our tool, consider this simple program that calculates the sum of the integers between 0 and 10000:

[Listing: program summing the integers between 0 and 10000 (not reproduced here)]
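A sketch of such a program, assuming it simply accumulates the integers from 0 to 10000 in a loop and prints the total, might be:

```c
#include <stdio.h>

int main(void) {
  long sum = 0;  /* long avoids any risk of int overflow on small targets */
  for (int i = 0; i <= 10000; i++) {
    sum += i;
  }
  printf("%ld\n", sum);  /* prints 50005000 */
  return 0;
}
```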

In the table below, we compare the time in seconds to compile and run this program five times with an old version of our tool [9] against our new version using the OCaml execution engine. The first and second rows report the average time for the five compilations and the five runs, respectively, and the third reports the sum of all runs plus the average compilation time, simulating the case of a compiled test being run on different inputs.

[Table: compilation and execution times, in seconds, for the old and new versions of the tool]
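In symbols (our notation, not the paper's), the quantity in the third row is

\[ \overline{T}_{\text{compile}} + \sum_{i=1}^{5} T_{\text{run},i}, \]

where \(\overline{T}_{\text{compile}}\) is the average compilation time and \(T_{\text{run},i}\) are the individual run times.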

3 Evaluation

Of course, many other tools exist for analyzing C programs. In this section, we compare RV-Match with some popular C analyzers on a benchmark from Toyota ITC. We also briefly report on our experience running our tool on the SV-COMP benchmark. The other tools we consider are:

  • GrammaTech CodeSonar is a static analysis tool for identifying “bugs that can result in system crashes, unexpected behavior, and security breaches” [8].

  • MathWorks Polyspace Bug Finder is a static analyzer for identifying “run-time errors, concurrency issues, security vulnerabilities, and other defects in C and C++ embedded software” [11].

  • MathWorks Polyspace Code Prover is a tool based on abstract interpretation that “proves the absence of overflow, divide-by-zero, out-of-bounds array access, and certain other run-time errors in C and C++ source code” [12].

  • Clang UBSan, TSan, MSan, and ASan (version 3.7.1) are clang modules that instrument compiled binaries with mechanisms for detecting, respectively, undefined behavior, data races, uninitialized reads, and various memory issues [5].

  • Valgrind Memcheck and Helgrind (version 3.10.1, GCC version 4.8.4) are tools for instrumenting binaries for the detection of several memory and thread-related issues (illegal reads/writes, use of uninitialized or unaddressable values, deadlocks, data races, etc.) [13].

  • The CompCert C interpreter (version 2.6) uses an approach similar to our own. It executes programs according to the semantics used by the CompCert compiler [3] and reports undefined behavior.

  • Frama-C Value Analysis (version sodium-20150201), like Code Prover, is a tool based on static analysis and abstract interpretation for catching several forms of undefinedness [4].

Fig. 1. Comparison of tools on the 1,276 tests of the ITC benchmark. The numbers for the GrammaTech and MathWorks tools come from [14]. (Color figure online)

  • A highlighted entry indicates the best score in a category for a particular metric.

  • DR, \(\overline{\textsf{\textit{FPR}}}\), and PM are, respectively, the detection rate, the complement of the false positive rate, and the productivity metric.

  • The final average is weighted by the number of tests in each category.

  • Italics and a dash indicate categories for which a tool has no support.

The Toyota ITC benchmark [14]. This publicly-available benchmark consists of 1,276 tests, half with planted defects, meant to evaluate the defect detection capability of analysis tools, and the other half without defects, meant to evaluate the false positive rate. The tests are grouped into nine categories: static memory, dynamic memory, stack-related, numerical, resource management, pointer-related, concurrency, inappropriate code, and miscellaneous.

We evaluated RV-Match along with the tools mentioned above on this benchmark. Our results appear in Fig. 1, and the tools we used for our evaluation are available online. Following the method of [14], we report the values of three metrics: DR is the detection rate, the percentage of tests containing errors in which the error was detected; \(\overline{\textsf{\textit{FPR}}} = 100 - \textsf{\textit{FPR}}\), where FPR is the false positive rate; and PM is a productivity metric, \(\textsf{\textit{PM}} = \sqrt{\textsf{\textit{DR}} \cdot \overline{\textsf{\textit{FPR}}}}\), the geometric mean of DR and \(\overline{\textsf{\textit{FPR}}}\).
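As a concrete reading of these definitions (with illustrative numbers, not results from Fig. 1), a tool that detects 64% of the planted defects while reporting no false positives would score

\[ \textsf{\textit{DR}} = 64, \qquad \overline{\textsf{\textit{FPR}}} = 100 - 0 = 100, \qquad \textsf{\textit{PM}} = \sqrt{64 \cdot 100} = 80. \]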

Interestingly, and similarly to our experience with the SV-COMP benchmark mentioned below, running RV-Match on the Toyota ITC benchmark uncovered a number of flaws in the benchmark itself, both in the form of unintended undefined behavior and in the form of tests that were meant to contain a defect but were actually correct. Our fixes for these issues were accepted by the Toyota ITC authors, and we used the fixed version of the benchmark in our experiments. Unfortunately, we do not have access to the MathWorks and GrammaTech static analysis tools, so in Fig. 1 we have reproduced the results reported in [14]. Thus, it is possible that the metrics reported for those tools are off by some amount.

The SV-COMP benchmark suite. This suite consists of a large number of C programs used as verification tasks during the International Competition on Software Verification (SV-COMP) [2]. We used RV-Match to analyze 1,346 programs classified as correct and observed that 188 (14%) of them exhibited undefined behavior. The issues ranged from uses of uninitialized values in expressions, potentially invalid conversions, and incompatible declarations to more subtle strict-aliasing violations. Our detailed results are available online.
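As an illustration of the last category (our own example, not one of the SV-COMP programs), the following snippet violates C11's effective-type (strict aliasing) rules by accessing a float object through an incompatible int lvalue:

```c
#include <stdio.h>

/* Reinterprets the bytes of a float through an int pointer. Accessing
   an object through an lvalue of an incompatible type violates C11's
   strict aliasing rules, even on platforms where int and float happen
   to have the same size. */
static int bits_of(float *f) {
  return *(int *)f;
}

int main(void) {
  float x = 1.0f;
  printf("%d\n", bits_of(&x));
  return 0;
}
```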

4 Conclusion

We have presented RV-Match, a semantics-based ISO C11 compliance checker. It achieves a better detection rate than the other tools we considered, and it reports no false positives. Moreover, our experience of finding undefined behavior even in the presumed-correct programs of the above benchmarks demonstrates the tool's usefulness.

We do not claim, however, that our approach is simply better than those represented by the other tools; we see our technology as complementary. Static analysis tools, for example, are more forgiving in that they can analyze code that does not even compile, so they can help find errors earlier, and they typically analyze all of the code in a single run. Our tool, on the other hand, like all dynamic analysis tools, generally requires the program to actually execute in order to detect most errors, and it only examines the code that is actually executed, so it is best combined with existing testing infrastructure (e.g., by running unit tests with kcc).