The Termination and Complexity Competition
- 3.7k Downloads
The termination and complexity competition (termCOMP) focuses on automated termination and complexity analysis for various kinds of programming paradigms, including categories for term rewriting, integer transition systems, imperative programming, logic programming, and functional programming. In all categories, the competition also welcomes the participation of tools providing certifiable output. The goal of the competition is to demonstrate the power and advances of the state-of-the-art tools in each of these areas.
Termination and complexity analysis have attracted a lot of research since the early days of computer science. In particular, termination for the rewriting model of computation is essential for methods in equational reasoning: the word problem  asks for convertibility with respect to a rewrite system, and some instances can be solved by a completion procedure where termination needs to be checked in each step . Term rewriting is the basis of functional programming , which, in turn, is the basis of automated theorem proving . As early examples for the importance of termination in other domains and models of computation we mention that completion is used in symbolic computation for the construction of Gröbner Bases for polynomial ideals , and that boundedness of Petri Nets can be modeled by termination of vector addition systems, which is decidable .
Both termination and complexity (or resource consumption) are very relevant properties for many computation systems and keep being the focus of interest in newly emerging technologies. For instance, complexity analyzers are applied to analyze large Java programs in order to detect vulnerabilities .
Another particularly interesting example are smart contracts in blockchains, which are becoming very popular. Providing tools for analyzing their termination and bounding their resource consumption is critical . For example, transactions that run out-of-gas in the Ethereum blockchain platform throw an exception, their effect is reverted, and the gas consumed up to that point is lost.
Deciding the (uniform) termination problem is to determine whether a given program has only finite executions for all possible inputs. Termination is a well-known undecidable property for programs written in any Turing complete language, and any complexity analyzer must include termination analysis as well. Despite this challenging undecidable scenario, powerful automatic tools for many different formalisms are available nowadays.
History of termCOMP. After a tool demonstration at the Termination Workshop 2003 (Valencia) organized by Albert Rubio, the community decided to hold an annual termination competition and to collect benchmarks in order to spur the development of tools and new termination techniques. Since 2004 the competition, known as termCOMP, has been organized annually, with usually between 10 and 20 tools participating in the different categories on termination, complexity, and/or certification. The actual organizers of the competition have been Claude Marché (from 2004 to 2007), René Thiemann (from 2008 to 2013), Johannes Waldmann (from 2014 to 2017), and Akihisa Yamada (since 2018). Recent competitions have been executed live during the main conferences of the field (at FLoC 2018, FSCD 2017, WST 2016, CADE 2015, VSL 2014, RDP 2013, IJCAR 2012, RTA 2011, and FLoC 2010). Information on all termination and complexity competitions is available from http://termination-portal.org/.
Computational resources for the execution of the competition have been provided by LRI, Université Paris-Sud (from 2004 to 2007) and by the University of Innsbruck (from 2008 to 2013). Since 2014, the competition runs on StarExec, a cross-community service at the University of Iowa for the evaluation of automated tools based on formal reasoning. It provides a single piece of storage and computing infrastructure to the communities in computational logic developing such tools .
From 2014 to 2017, competition results were presented using a separate web application star-exec-presenter developed at HTWK Leipzig , giving both an aggregated view of results, as well as detailed results per category. Additionally, it provides options for sorting and selecting subsets of benchmarks and solvers according to various criteria, as well as for comparing results of various competitions and/or test runs. This helps to estimate progress and to detect inconsistencies. Since 2018, starexec-master  (the successor of star-exec-presenter) is in use (see Fig. 1 in Sect. 2).
Competition Benchmarks. The benchmark s used to run the competition on are collected in the Termination Problem Data Base (TPDB for short), which was originally created by Claude Marché, Albert Rubio, and Hans Zantema, and later on maintained, extended, and reorganized by René Thiemann, Johannes Waldmann, and Akihisa Yamada. Many researchers have contributed with new benchmark s over the years. The current version of TPDB (10.6) contains a total of 43,112 benchmark s and extends over 674 MByte (uncompressed).
The termination competitions started with categories on termination of string rewrite systems (SRSs) and term rewrite systems (TRSs). Apart from standard rewriting, there were also categories based on adding strategies and extensions like equational, innermost, or context-sensitive rewriting. Further categories were introduced afterwards, including, for instance, higher-order rewriting (since 2010) and cycle rewriting (since 2015). Categories on complexity analysis of rewrite systems were added in 2008.
Regarding analysis tools for programming languages, a category on termination of logic programs was already part of the competition in 2004. Categories for other programming paradigms were introduced later: since 2007 there is a category for functional (Haskell) programs, since 2009 termination of Java programs is also considered, and since 2014 C programs are handled as well. Moreover, back-end languages like integer transition systems (ITSs) or integer term rewriting are part of termCOMP since 2014. Last but not least, complexity analysis categories for some of these languages have also been included recently.
Finally, the first certification categories on rewriting were included in 2007 and have been extended to some other languages and formalisms over the years.
2 Organization of the Competition
In 2019 we plan to run the competition on StarExec again. Each tool will be run on all benchmark s of the categories it is registered for, with a wall-clock timeout of 300 s per example. Tools are expected to give an answer in the first line of their standard output, followed by a justification for their answer.
In termination categories, the expected answers are YES (indicating termination), NO (indicating nontermination), and MAYBE (indicating that the tool had to give up). Each YES or NO answer will score one point, unless it turns out to be incorrect. Each incorrect answer scores \(-10\) points.
In complexity categories, an answer specifies either or both upper- and lower-bound (worst-case) complexity. The score of an answer is the sum of the scores for the upper-bound and lower-bound, each of which depends on the number of other participants. Details of the answer format and scoring scheme are available at http://cbr.uibk.ac.at/competition/rules.php.
In contrast to previous years, we will not run the competition live but before the TACAS conference takes place. We reserve about two weeks for resolving technical issues, analyzing conflicting answers, and debugging. If participants fail to agree on the treatment of conflicts, the steering committee will finally decide which answer will be penalized.
Benchmarks are grouped in the TPDB according to the underlying computational model (rewriting or programming) and to the aim of the analysis (termination or complexity). This organization results in the following three meta categories since termCOMP 2014: termination of rewriting, termination of programs, and complexity analysis. (A further split of complexity analysis into two meta categories “complexity of rewriting” and “complexity of programs” might be considered in the future if there are categories concerning the complexity of several different programming languages.)
Roughly speaking, the two termination meta categories cover, on the one hand, termination of different flavors of rewriting according to various strategies (termination of rewriting), and on the other hand, termination of actual programming languages as well as related formalisms (termination of programs).
Which categories of a given meta category are actually run during a competition depends on the registered participants. Any category with at least two participants is run as part of its associated meta category. Of course, it is desirable to have as many participants as possible and therefore all developers of termination and complexity analysis tools are strongly encouraged to participate in the competition. In addition, as a special case, all those categories having only a single participant are collected into the auxiliary demonstration meta category. (While demonstration categories are not considered for computing scores and are thus not part of a competition in terms of awards or medals, this at least allows us to make unique tools visible to the outside world.)
Independent of their respective meta categories, there are several categories that come also in a special certified variant (marked by Open image in new window below). Before 2007, the standard approach of participating tools was to give some textual justification for their answers. However, there was no consensus on the format or the amount of detail for such justifications. Automated termination and complexity tools are rather complex programs. They are typically tuned for efficiency using sophisticated data structures and often have short release cycles facilitating the quick integration of new techniques. So, why should we trust such tools? Certification is the answer to this question. Tools that participate in certified categories are required to produce their justifications in a common format, the certification problem format, or CPF  for short. Justifications in this format are usually called certificates. To make sure that certificates are correct, certified categories employ a certifier—an automated tool that is able to rigorously validate a given certificate. For recent editions of termCOMP this certifier is CeTA [6, 49], short for “certified tool assertions”. Its reliability is due to the fact that its correctness has been established using the proof assistant Isabelle/HOL . In the past, other certifiers like CoLoR/Rainbow  and CiME/Coccinelle , formalized in Coq , were used as well.
3.1 Termination of Rewriting
There are many different flavors of term rewriting and strategies for applying rewrite rules. Many of those have their own categories.
For standard term rewrite systems, there are categories for plain rewriting (TRS StandardOpen image in new window ), relative rewriting (TRS RelativeOpen image in new window ), rewriting modulo equational theories (TRS EquationalOpen image in new window ), conditional term rewriting (TRS Conditional), context-sensitive rewriting (TRS Context Sensitive), innermost rewriting (TRS InnermostOpen image in new window ), and outermost rewriting (TRS OutermostOpen image in new window ). There is also a category for higher-order rewriting systems (HRS (union beta)).
Concerning string rewrite systems, there are categories for plain rewriting (SRS StandardOpen image in new window ), relative rewriting (SRS RelativeOpen image in new window ), and cycle rewriting (Cycle Rewriting).
3.2 Termination of Programs
Regarding programming languages and related formalisms, there are categories for C programs (C), C programs restricted to integers (C Integer), Java Bytecode (Java Bytecode), Prolog programs (Prolog), Haskell programs (Haskell), integer transition systems (Integer Transition Systems), and innermost rewriting with integer term rewrite systems (Integer TRS Innermost). Concerning termination of C programs, there is an “overlap” with the SV-COMP competition,1 where however the focus of the two competitions is different, since SV-COMP considers all kinds of verification tasks for C programs, whereas termCOMP considers termination of all kinds of programming languages. Usually, SV-COMP runs in winter and termCOMP runs in summer, such that in each of the competitions the new current state-of-the-art of C termination analysis is represented.
3.3 Complexity Analysis
With respect to complexity analysis, there are categories for integer transition systems (Complexity: ITS), C programs restricted to integers (Complexity: C Integer), runtime complexity of term rewrite systems (Runtime Complexity: TRSOpen image in new window ), runtime complexity of innermost rewriting (Runtime Complexity: TRS InnermostOpen image in new window ), and derivational complexity of term rewrite systems (Derivational Complexity: TRSOpen image in new window ).
4 Tools and Techniques
In this section, we give an overview on the tools that participated in the last edition, termCOMP 2018, of the competition and highlight the main techniques used by these tools.
4.1 Termination of Rewriting
In 2018, eight tools participated in categories devoted to term rewriting. On the one hand, some tools are specifically designed for certain variants of rewriting (e.g., MultumNonMulta only handles string rewrite systems, whereas Wanda, SOL, and SizeChangeTool are mainly designed for higher-order rewriting). On the other hand, the tools AProVE, Open image in new window , NaTT, and MU-TERM participated in categories for many different variants of term rewrite systems. To prove termination of TRSs, the tools use both classical reduction orderings as well as more recent powerful improvements like dependency pairs , matrix interpretations , match-bounds , etc. To generate the required orderings automatically, the tools typically apply existing SAT and SMT solvers.
More precisely, AProVE  and Open image in new window  implement the dependency pair framework [28, 30] which performs termination proofs in a modular way and allows the tool to apply different termination techniques for each sub-proof. NaTT  combines the dependency pair framework with the weighted path order . MU-TERM  is particularly suitable for TRSs with modified reduction relations (like innermost, context-sensitive, equational, or conditional rewriting). The goal of the tool MultumNonMulta  is to demonstrate the power of a few selected methods based on matrix interpretations for termination analysis of string rewrite systems. WANDA  implements higher-order termination techniques based on dependency pairs  and higher-order orderings , and applies an external first-order termination tool (AProVE) as a back-end . The tool SOL  uses an extended notion of reducibility  for termination proofs of rules derived from second-order algebraic theories. Finally, SizeChangeTool  extends the size-change termination principle  to higher-order rewriting.
4.2 Termination of Programs
In 2018, two tools (AProVE and UltimateAutomizer) participated in the category for termination of full C programs (which may include low-level memory operations). For C programs that only operate on integers, in addition to the two tools above, the tool VeryMax participated as well. The categories for termination of other programming languages (Java, Haskell, and Prolog) were only run as a demonstration, since in that year, only the tool AProVE analyzed their termination.
For all of these programming languages, AProVE uses an approach to transform the original program into a simple back-end language (an integer transition system or a combination of ITSs and TRSs) and to prove termination of the resulting back-end system instead . In contrast, the tool UltimateAutomizer  uses a generalization of program paths to Büchi Automata in order to remove terminating paths. VeryMax  is based on a framework which allows to combine conditional termination proofs obtained using Max-SMT solvers in order to generate an (unconditional) termination proof of the program.
Termination of ITSs was analyzed by the tools VeryMax, iRankFinder, and Ctrl. Moreover, Ctrl and AProVE also analyzed termination of systems that combine ITSs and TRSs. Here, iRankFinder  generates lexicographic combinations of ranking functions and ranks transitions incrementally . Ctrl  and AProVE prove termination of TRSs extended by built-in integers by suitable adaptions of termination techniques for ordinary TRSs [24, 36].
4.3 Complexity Analysis
Complexity of ITSs and of C programs on integers was analyzed by CoFloCo and AProVE, where AProVE applies two integrated sub-tools KoAT and LoAT to infer upper and lower runtime bounds for integer programs, respectively. The tool CoFloCo  uses a modeling with cost relations to infer amortized cost bounds, whereas KoAT  infers both upper runtime and size bounds for parts of the program in an alternating way. Lower bounds on the worst-case runtime are inferred by LoAT  by simplifying programs using an adaption of ranking functions for lower bounds, and by a subsequent inference of asymptotic lower bounds for the resulting simplified programs.
Runtime complexity of TRSs was analyzed by AProVE and Open image in new window . While runtime complexity only considers evaluations that start with basic terms (where “algorithms” are applied to “data objects”), Open image in new window also analyzed derivational complexity of arbitrary evaluations in corresponding demonstration categories. For complexity analysis, both AProVE and Open image in new window  use techniques which originate from termination analysis of TRSs and which are adapted in order to infer upper bounds on the number of evaluation steps [4, 44]. Moreover, the tools also infer lower bounds on the (worst-case) runtime using an extension of the concept of loops in order to detect rules that are guaranteed to result in certain asymptotic lower bounds .
In this short paper, we gave a brief summary of the termination and complexity competition (termCOMP), described its organization and its different categories, and presented an overview on recent tools that participated in the competition. The competition is always open to introduce new categories in order to reflect the continuing development in the area. It also welcomes the submission of new termination and complexity problems, especially problems that come from applications. Thus, it strives to remain the main competition in the field of automated termination and complexity analysis.
- 2.Albert, E., Gordillo, P., Rubio, A., Sergey, I.: GASTAP: a gas analyzer for smart contracts. CoRR abs/1811.10403 (2018). https://arxiv.org/abs/1811.10403
- 6.Avanzini, M., Sternagel, C., Thiemann, R.: Certification of complexity proofs using CeTA. In: Fernández, M. (ed.) RTA 2015, LIPIcs, vol. 36, pp. 23–39 (2015). https://doi.org/10.4230/LIPIcs.RTA.2015.23
- 10.Blanqui, F., Genestier, G.: Termination of \(\lambda \varPi \) modulo rewriting using the size-change principle. In: Lucas, S. (ed.) WST 2018, pp. 10–14 (2018). http://wst2018.webs.upv.es/wst2018proceedings.pdf
- 12.Borralleras, C., Brockschmidt, M., Larraz, D., Oliveras, A., Rodríguez-Carbonell, E., Rubio, A.: Proving termination through conditional termination. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 99–117. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_6CrossRefGoogle Scholar
- 16.Chen, Y., et al.: Advanced automata-based algorithms for program termination checking. In: Foster, J.S., Grossman, D. (eds.) PLDI 2018, pp. 135–150 (2018). https://doi.org/10.1145/3192366.3192405
- 17.Contejean, E., Courtieu, P., Forest, J., Pons, O., Urbain, X.: Automated certified proofs with CiME3. In: Schmidt-Schauß, M. (ed.) RTA 2011, LIPIcs, vol. 10, pp. 21–30 (2011). https://doi.org/10.4230/LIPIcs.RTA.2011.21
- 19.Doménech, J.J., Genaim, S.: iRankFinder. In: Lucas, S. (ed.) WST 2018, p. 83 (2018). http://wst2018.webs.upv.es/wst2018proceedings.pdf
- 31.Hofbauer, D.: MultumNonMulta at TermComp 2018. In: Lucas, S. (ed.) WST 2018, p. 80 (2018). http://wst2018.webs.upv.es/wst2018proceedings.pdf
- 34.Knuth, D.E., Bendix, P.: Simple word problems in universal algebras. In: Leech, J. (ed.) Computational Problems in Abstract Algebra, pp. 263–297 (1970)Google Scholar
- 35.Kop, C.: Higher order termination. Ph.D. thesis, VU University Amsterdam (2012). https://www.cs.ru.nl/~cynthiakop/phdthesis.pdf
- 38.Kop, C., van Raamsdonk, F.: Dynamic dependency pairs for algebraic functional systems. Logical Methods Comput. Sci. 8(2) (2012). https://doi.org/10.2168/LMCS-8(2:10)2012
- 40.von der Krone, S., Muhl, R., Waldmann, J.: star-exec-presenter (Software) (2014). https://github.com/jwaldmann/star-exec-presenter/
- 41.Lee, C.S., Jones, N.D., Ben-Amram, A.M.: The size-change principle for program termination. In: Hankin, C., Schmidt, D. (eds.) POPL 2001, pp. 81–92 (2001). https://doi.org/10.1145/360204.360210
- 45.Space/Time Analysis for Cybersecurity (STAC). https://www.darpa.mil/program/space-time-analysis-for-cybersecurity
- 46.Sternagel, C., Thiemann, R.: The certification problem format. In: Benzmüller, C., Woltzenlogel Paleo, B. (eds.) UITP 2014. EPTCS, vol. 167, pp. 61–72 (2014). https://doi.org/10.4204/EPTCS.167.8
- 50.Yamada, A.: starexec-master (Software) (2018). https://github.com/AkihisaYamada/starexec-master
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.