Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Computing or approximating the set of terms reachable by rewriting has more and more applications. For a Term Rewriting System (TRS) \({\mathcal {R}}\) and a set of terms \(L_0\subseteq \mathcal{T(F)}\), the set of reachable terms is . This set can be computed exactly for specific classes of \({\mathcal {R}}\) [10] but, in general, it has to be approximated. Applications of the approximation of \({\mathcal {R}}^*(L_0)\) are ranging from cryptographic protocol verification [1], to static analysis of various programming languages [5] or to TRS termination proofs [15]. Most of the techniques compute such approximations using tree automata as the core formalism to represent or approximate the (possibly) infinite set of terms \({\mathcal {R}}^*(L_0)\). Most of them also rely on a Knuth-Bendix completion-like algorithm completing a tree automaton \({\mathcal {A}}\) recognizing \(L_0\) into an automaton recognizing exactly, or over-approximating, the set \({\mathcal {R}}^*(L_0)\). As a result, these techniques can be refered as tree automata completion techniques [4, 8, 9, 13, 19, 22]. A strength of this algorithm, and at the same time a weakness, is that its precision is parameterized by a function [8] or a set of equations [13]. It is a strength because tuning the approximation function (or equations) permits to adapt the precision of completion to a specific goal to tackle. This is what made it successful for program and protocol verification. On the other hand, this is a weakness because it is difficult to guarantee its termination.

In this paper, we define a simple sufficient condition on the set of equations for the tree automata completion algorithm to terminate. This condition, which is strong in general, reveals to be natural and well adapted for the approximation of reachable terms when TRSs encode typed functional programs. We thus obtain a way to automatically over-approximate the set of all reachable program states of a functional program, or even restrict it to the set of all results. Thus we can over-approximate the image of a functional program.

2 Related Work

Tree automata completion. With regards to most papers about completion [4, 8, 9, 13, 19, 22], our contribution is to give the first criterion on the approximation for the completion to terminate. Note that it is possible to guarantee termination of the completion by inferring an approximation adapted to the TRS under concern, like in [20]. In this case, given a TRS, the approximation is fixed and unique. Our solution is more flexible because it lets the user change the precision of the approximation while keeping the termination guarantee. In [22], T. Takai have a completion parameterized by a set of equations. He also gives a termination proof for its completion but only for some restricted classes of TRSs. Here our termination proof holds for any left-linear TRS provided that the set of equations satisfy some properties.

Static analysis of functional programs. With regards to static analysis of functional programs using grammars or automata, our contribution is in the scope of data-flow analysis techniques, rather than control-flow analysis. More precisely, we are interested here in predicting the results of a function [21], rather than predicting the control flow [18]. Those two papers, as well as many other ones, deal with higher order functions using complex higher-order grammar formalisms (PMRS and HORS). Higher-order functions are not in the scope of the solution we propose here. However, we obtained some preliminary results suggesting that an extension to higher order functions is possible and gives relevant results (see Sect. 6). Furthermore, using equations, approximations are defined in a more declarative and flexible way than in [21], where they are defined by a dedicated algorithm. Besides, the verification mechanisms of [21] use automatic abstraction refinement. This can be also performed in the completion setting [3] and adapted to the analysis of functional programs [14]. Finally, using a simpler (first order) formalism, i.e. tree automata, makes it easier to take into account some other aspects like: evaluation strategies and built-ins types (see Sect. 6) that are not considered by those papers.

3 Background

In this section, we introduce some definitions and concepts that will be used throughout the rest of the paper (see also [2, 7]). Let \(\mathcal{F}\) be a finite set of symbols, each associated with an arity function, and let \(\mathcal{X}\) be a countable set of variables. \(\mathcal{T(F, X)}\) denotes the set of terms and \(\mathcal{T(F)}\) denotes the set of ground terms (terms without variables). The set of variables of a term \(t\) is denoted by \({\mathcal {V}}ar(t)\). A substitution is a function \(\sigma \) from \(\mathcal{X}\) into \(\mathcal{T(F, X)}\), which can be uniquely extended to an endomorphism of \(\mathcal{T(F, X)}\). A position \(p\) for a term \(t\) is a finite word over . The empty sequence \(\lambda \) denotes the top-most position. The set \({\mathcal {P}}os(t)\) of positions of a term \(t\) is inductively defined by \({\mathcal {P}}os(t)= \{ \lambda \} \) if \(t \in \mathcal{X}\) or \(t\) is a constant and \({\mathcal {P}}os(f(t_1,\dots ,t_n)) = \{ \lambda \} \cup \{i.p \mid 1 \le i \le n \text{ and } p \in {\mathcal {P}}os(t_i) \}\) otherwise. If \(p \in {\mathcal {P}}os(t)\), then \(t|_p\) denotes the subterm of \(t\) at position \(p\) and \(t[s]_p\) denotes the term obtained by replacement of the subterm \(t|_p\) at position \(p\) by the term \(s\).

A term rewriting system (TRS) \({\mathcal {R}}\) is a set of rewrite rules \(l \rightarrow r\), where \(l, r \in \mathcal{T(F, X)}\), \(l \not \in \mathcal{X}\), and \({\mathcal {V}}ar(l) \supseteq {\mathcal {V}}ar(r)\). A rewrite rule \(l \rightarrow r\) is left-linear if each variable of \(l\) occurs only once in \(l\). A TRS \({\mathcal {R}}\) is left-linear if every rewrite rule \(l \rightarrow r\) of \({\mathcal {R}}\) is left-linear. The TRS \({\mathcal {R}}\) induces a rewriting relation \(\rightarrow _{{\mathcal {R}}}\) on terms as follows. Let \(s, t\in \mathcal{T(F, X)}\) and \(l \rightarrow r \in {\mathcal {R}}\), \(s \rightarrow _{{\mathcal {R}}} t\) denotes that there exists a position \(p\in {\mathcal {P}}os(s)\) and a substitution \(\sigma \) such that \(s|_p= l\sigma \) and \(t=s[r\sigma ]_p\). Given a TRS \({\mathcal {R}}\), \(\mathcal{F}\) can be split into two disjoint sets \(\mathcal{C}\) and \(\mathcal{D}\). All symbols occurring at the root position of left-hand sides of rules of \({\mathcal {R}}\) are in \(\mathcal{D}\). \(\mathcal{D}\) is the set of defined symbols of \({\mathcal {R}}\), \(\mathcal{C}\) is the set of constructors. Terms in \(\mathcal{T(C)}\) are called data-terms. The reflexive transitive closure of \(\rightarrow _{{\mathcal {R}}}\) is denoted by and denotes that and \(t\) is irreducible by \({\mathcal {R}}\). The set of irreducible terms w.r.t. a TRS \({\mathcal {R}}\) is denoted by \({{\mathrm{I{\textsc {rr}}}}}({\mathcal {R}})\). The set of \({\mathcal {R}}\)-descendants of a set of ground terms \(I\) is . A TRS \({\mathcal {R}}\) is sufficiently complete if for all \(s\in \mathcal{T(F)}\), .

An equation set \(E\) is a set of equations \(l = r\), where \(l, r \in \mathcal{T(F, X)}\). The relation \(=_E\) is the smallest congruence such that for all substitution \(\sigma \) we have \(l\sigma =_E r\sigma \). Given a TRS \({\mathcal {R}}\) and a set of equations \(E\), a term \(s\in \mathcal{T(F)}\) is rewritten modulo \(E\) into \(t\in \mathcal{T(F)}\), denoted , if there exist \(s'\in \mathcal{T(F)}\) and \(t'\in \mathcal{T(F)}\) such that \(s=_Es'\rightarrow _{\mathcal {R}}t'=_E t\). The reflexive transitive closure of is defined as usual except that reflexivity is extended to terms equal modulo \(E\), i.e. for all \(s, t\in \mathcal{T(F)}\) if \(s=_E t\) then . The set of \({\mathcal {R}}\)-descendants modulo \(E\) of a set of ground terms \(I\) is

Let \({\mathcal {Q}}\) be a countably infinite set of symbols with arity \(0\), called states, such that \({\mathcal {Q}}\cap \mathcal{F}= \emptyset \). \(\mathcal{T(F \cup Q)}\) is called the set of configurations. A transition is a rewrite rule \(c \rightarrow q\), where \(c\) is a configuration and \(q\) is state. A transition is normalized when \(c = f(q_1, \ldots , q_n)\), \(f \in \mathcal{F}\) is of arity \(n\), and \(q_1, \ldots , q_n \in {\mathcal {Q}}\). An \(\epsilon \) -transition is a transition of the form \(q \rightarrow q'\) where \(q\) and \(q'\) are states. A bottom-up non-deterministic finite tree automaton (tree automaton for short) over the alphabet \(\mathcal{F}\) is a tuple \({\mathcal {A}}= \langle \mathcal{F}, {\mathcal {Q}}, {\mathcal {Q}}_F,\varDelta \rangle \), where \({\mathcal {Q}}_F\) is a finite subset of \({\mathcal {Q}}\), \(\varDelta \) is a finite set of normalized transitions and \(\epsilon \)-transitions. The transitive and reflexive rewriting relation on \(\mathcal{T(F \cup Q)}\) induced by the set of transitions \(\varDelta \) (resp. all transitions except \(\epsilon \)-transitions) is denoted by (resp. \(\rightarrow ^{\not \epsilon \, *}_{\varDelta }\)). When \(\varDelta \) is attached to a tree automaton \({\mathcal {A}}\) we also note those two relations and \(\rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}}\), respectively. A tree automaton \({\mathcal {A}}\) is complete if for all \(s\in \mathcal{T(F)}\) there exists a state \(q\) of \({\mathcal {A}}\) such that . The language (resp. -language) recognized by \({\mathcal {A}}\) in a state \(q\) is (resp. ). A state \(q\) of an automaton \({\mathcal {A}}\) is reachable (resp. -reachable) if \(\mathcal {L}({\mathcal {A}},q)\ne \emptyset \) (resp. ). We define \(\mathcal {L}({\mathcal {A}}) = \bigcup _{q \in {\mathcal {Q}}_F} \mathcal {L}{}({\mathcal {A}}, q)\). A set of transitions \(\varDelta \) is -deterministic if there are no two normalized transitions in \(\varDelta \) with the same left-hand side. A tree automaton \({\mathcal {A}}\) is -deterministic if its set of transitions is -deterministic. Note that if \({\mathcal {A}}\) is -deterministic then for all states \(q_1, q_2\) of \({\mathcal {A}}\) such that \(q_1\ne q_2\), we have .

4 Tree Automata Completion Algorithm

Tree Automata Completion algorithms were proposed in [9, 13, 16, 22]. They are very similar to a Knuth-Bendix completion except that they run on two distinct sets of rules: a TRS \({\mathcal {R}}\) and a set of transitions \(\varDelta \) of a tree automaton \({\mathcal {A}}\).

Starting from a tree automaton \({\mathcal {A}}_0=\langle \mathcal{F}, {\mathcal {Q}}, {\mathcal {Q}}_f, \varDelta _0\rangle \) and a left-linear TRS \({\mathcal {R}}\), the algorithm computes a tree automaton \({\mathcal {A}}'\) such that \(\mathcal {L}({\mathcal {A}}') ={\mathcal {R}}^{*}(\mathcal {L}({\mathcal {A}}_0))\) or \(\mathcal {L}({\mathcal {A}}') \supseteq {\mathcal {R}}^{*}(\mathcal {L}({\mathcal {A}}_0))\). The algorithm iteratively computes tree automata \(\mathcal{A}_{{\mathcal {R}}}^1\), \(\mathcal{A}_{{\mathcal {R}}}^2\), ... such that \(\forall i\ge 0: \mathcal {L}(\mathcal{A}_{{\mathcal {R}}}^i) \subseteq \mathcal {L}(\mathcal{A}_{{\mathcal {R}}}^{i+1})\) until we get an automaton \(\mathcal{A}_{{\mathcal {R}}}^k\) with and \(\mathcal {L}(\mathcal{A}_{{\mathcal {R}}}^k) = \mathcal {L}(\mathcal{A}_{{\mathcal {R}}}^{k+1})\). For all , if \(s \in \mathcal {L}(\mathcal{A}_{{\mathcal {R}}}^i)\) and , then \(t\in \mathcal {L}(\mathcal{A}_{{\mathcal {R}}}^{i+1})\). Thus, if \(\mathcal{A}_{{\mathcal {R}}}^k\) is a fixpoint then it also verifies \(\mathcal {L}(\mathcal{A}_{{\mathcal {R}}}^k) \supseteq {\mathcal {R}}^{*}(\mathcal {L}({\mathcal {A}}_0))\). To construct \(\mathcal{A}_{{\mathcal {R}}}^{i+1}\) from \(\mathcal{A}_{{\mathcal {R}}}^i\), we achieve a completion step which consists in finding critical pairs between \(\rightarrow _{{\mathcal {R}}}\) and \(\rightarrow _{\mathcal{A}_{{\mathcal {R}}}^i}\). A critical pair is a triple \((l\rightarrow r, \sigma , q)\) where \(l \rightarrow r \in {\mathcal {R}}\), \(\sigma :\mathcal{X}\mapsto {\mathcal {Q}}\) and \(q\in {\mathcal {Q}}\) such that \(l \sigma \rightarrow ^{*}_{\mathcal{A}_{{\mathcal {R}}}^{i}} q\) and \(r \sigma \not \rightarrow ^{*}_{\mathcal{A}_{{\mathcal {R}}}^{i}} q\). For \(r\sigma \) to be recognized by the same state and thus model the rewriting of \(l\sigma \) into \(r\sigma \), it is enough to add the necessary transitions to \(\mathcal{A}_{{\mathcal {R}}}^i\) to obtain \(\mathcal{A}_{{\mathcal {R}}}^{i+1}\) such that . In [13, 22], critical pairs are joined in the following way:

From an algorithmic point of view, there remains two problems to solve: find all the critical pairs \((l\rightarrow r, \sigma , q)\) and find the transitions to add to \(\mathcal{A}_{{\mathcal {R}}}^i\) to have . The first problem, called matching, can be efficiently solved using a specific algorithm [8, 10]. The second problem is solved using Normalization.

4.1 Normalization

The normalization function replaces subterms either by states of \({\mathcal {Q}}\) (using transitions of \(\varDelta \)) or by new states. A state \(q\) of \({\mathcal {Q}}\) is used to normalize a term \(t\) if \(t \rightarrow ^{\not \epsilon }_{\varDelta }q\). Normalizing by reusing states of \({\mathcal {Q}}\) and transitions of \(\varDelta \) permits to preserve the -determinism of \(\rightarrow ^{\not \epsilon }_{\varDelta }\). Indeed, \(\rightarrow ^{\not \epsilon }_{\varDelta }\) can be kept deterministic during completion though \(\rightarrow _{\varDelta }\) cannot.

Definition 1

(New state). Given a set of transitions \(\varDelta \), a new state (for \(\varDelta \)) is a state of \({\mathcal {Q}}\setminus {\mathcal {Q}}_f\) not occurring in left or right-hand sides of rules of \(\varDelta \) Footnote 1.

We here define normalization as a bottom-up process. This definition is simpler and equivalent to top-down definitions [13]. In the recursive call, the choice of the context \(C[\,]\) may be non deterministic but all the possible results are the equivalent modulo state renaming.

Definition 2

(Normalization). Let \(\varDelta \) be a set of transitions defined on a set of states \({\mathcal {Q}}\), \(t \in \mathcal{T(F \cup Q)}\setminus {\mathcal {Q}}\). Let \(C[\;]\) be a non empty context of \(\mathcal{T(F \cup Q)}\setminus {\mathcal {Q}}\), \(f\in \mathcal{F}\) of arity \(n\), and \(q,q',q_1, \ldots , q_n\in {\mathcal {Q}}\). The normalization function is inductively defined by:

  1. 1.

    \(Norm_{\varDelta }(f(q_1, \ldots , q_n) \rightarrow q) = \{ f(q_1, \ldots , q_n) \rightarrow q\}\)

  2. 2.

                                              

    where either (\(f(q_1, \ldots , q_n) \rightarrow q' \in \varDelta \)) or (\(q'\) is a new state for \(\varDelta \) and \(\forall q'' \in Q : f(q_1, \ldots , q_n) \rightarrow q'' \not \in \varDelta \)).

In the second case of the definition, if there are several states \(q'\) such that \(f(q_1,\ldots ,q_n) \rightarrow q' \in \varDelta \), we arbitrarily choose one of them. We illustrate the above definition on the normalization of a simple transition.

Example 1

Given \(\varDelta =\{b \rightarrow q_0\}\), \(Norm_\varDelta (f(g(a), b, g(a))\rightarrow q) = \{a \rightarrow q_1, g(q_1) \rightarrow q_2, b \rightarrow q_0, f(q_2, q_0, q_2) \rightarrow q\}.\)

4.2 One Step of Completion

A step of completion only consists in joining critical pairs. We first need to formally define the substitutions under concern: state substitutions.

Definition 3

(State substitutions, \(\varSigma ({\mathcal {Q}},\mathcal{X})\) ). A state substitution over an automaton \({\mathcal {A}}\) with a set of states \({\mathcal {Q}}\) is a function \(\sigma : \mathcal{X}\mapsto {\mathcal {Q}}\). We can extend this definition to a morphism \(\sigma : \mathcal{T(F, X)}\mapsto \mathcal {T}(\mathcal {F},\mathcal {Q})\). We denote by \(\varSigma ({\mathcal {Q}}, \mathcal{X})\) the set of state substitutions built over \({\mathcal {Q}}\) and \(\mathcal{X}\).

Definition 4

(Set of critical pairs). Let a TRS \({\mathcal {R}}\) and a tree automaton \({\mathcal {A}}=\langle \mathcal{F}, {\mathcal {Q}}, {\mathcal {Q}}_f, \varDelta \rangle \). The set of critical pairs between \({\mathcal {R}}\) and \({\mathcal {A}}\) is .

Recall that the completion process builds a sequence \(\mathcal{A}_{{\mathcal {R}}}^0,\mathcal{A}_{{\mathcal {R}}}^1,\ldots ,\mathcal{A}_{{\mathcal {R}}}^k\) of automata such that if \(s\in \mathcal {L}(\mathcal{A}_{{\mathcal {R}}}^i)\) and \(s\rightarrow _{{\mathcal {R}}} t\) then \(t\in \mathcal {L}(\mathcal{A}_{{\mathcal {R}}}^{i+1})\). One step of completion, i.e. the process computing \(\mathcal{A}_{{\mathcal {R}}}^{i+1}\) from \(\mathcal{A}_{{\mathcal {R}}}^i\), is defined as follows. Again, the following definition is a simplification of the definition of [13].

Definition 5

(One step of completion). Let \({\mathcal {A}}= \langle \mathcal{F}, {\mathcal {Q}}, {\mathcal {Q}}_f, \varDelta \rangle \) be a tree automaton, \({\mathcal {R}}\) be a left-linear TRS. The one step completed automaton is \(\mathcal{C}_{{\mathcal {R}}}({\mathcal {A}}) = \langle \mathcal{F}, {\mathcal {Q}}, {\mathcal {Q}}_f, Join^{CP({\mathcal {R}},{\mathcal {A}})}(\varDelta ) \rangle \) where \(Join^{S}(\varDelta )\) is inductively defined by:

  • \(Join^{\emptyset }(\varDelta )= \varDelta \)

  • \(Join^{\{(l\rightarrow r, q, \sigma )\} \cup S}(\varDelta )= Join^S(\varDelta \cup \varDelta ')\) where

    \(\varDelta '= \{q' \rightarrow q\}\) if there exists \(q'\in {\mathcal {Q}}\) s.t. \(r\sigma \rightarrow ^{\not \epsilon \, *}_{\varDelta }q'\), and otherwise

    \(\varDelta '= Norm_{\varDelta }(r\sigma \rightarrow q') \cup \{q' \rightarrow q\}\) where \(q'\) is a new state for \(\varDelta \)

Example 2

Let \({\mathcal {A}}\) be a tree automaton with \(\varDelta =\{f(q_1) \rightarrow q_0, a \rightarrow q_1, g(q_1) \rightarrow q_2\}\). If \({\mathcal {R}}=\{f(x) \rightarrow f(g(x))\}\) then \(CP({\mathcal {R}},{\mathcal {A}})= \{(f(x) \rightarrow f(g(x)), \sigma _3, q_0)\}\) with \(\sigma _3= \{x \mapsto q_1\}\), because and . We have \(f(g(x))\sigma _3= f(g(q_1))\) and there exists no state \(q\) such that \(f(g(q_1)) \rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}}q\). Hence, \(Join^{\{(f(x) \rightarrow f(g(x)),\sigma _3, q_0)\}}(\varDelta )= Join^{\emptyset }(\varDelta \cup Norm_{\varDelta }(f(g(q_1)) \rightarrow q_3) \cup \{q_3 \rightarrow q_0\})\). Since \(Norm_{\varDelta }(f(g(q_1)) \rightarrow q_3)= \{f(q_2) \rightarrow q_3, q(q_1) \rightarrow q_2\}\), we get that \(\mathcal{C}_{{\mathcal {R}}}({\mathcal {A}})= \langle \mathcal{F}, {\mathcal {Q}}\cup \{q_3\}, {\mathcal {Q}}_f,\varDelta \cup \{f(q_2) \rightarrow q_3, q_3 \rightarrow q_0\}\rangle \).

4.3 Simplification of Tree Automata by Equations

In this section, we define the simplification of tree automata \({\mathcal {A}}\) w.r.t. a set of equations \(E\). This operation permits to over-approximate languages that cannot be recognized exactly using tree automata completion, e.g. non regular languages. The simplification operation consists in finding \(E\)-equivalent terms recognized in \({\mathcal {A}}\) by different states and then by merging those states together. The merging of states is performed using renaming of a state in a tree automaton.

Definition 6

(Renaming of a state in a tree automaton). Let \({\mathcal {Q}},{\mathcal {Q}}'\) be set of states, \({\mathcal {A}}= \langle \mathcal{F}, {\mathcal {Q}}, {\mathcal {Q}}_f, \varDelta \rangle \) be a tree automaton, and \(\alpha \) a function \(\alpha : {\mathcal {Q}}\mapsto {\mathcal {Q}}'\). We denote by \({\mathcal {A}}\alpha \) the tree automaton where every occurrence of \(q\) is replaced by \(\alpha (q)\) in \({\mathcal {Q}}\), \({\mathcal {Q}}_f\) and in every left and right-hand side of every transition of \(\varDelta \).

If there exists a bijection \(\alpha \) such that \({\mathcal {A}}={\mathcal {A}}'\alpha \) then \({\mathcal {A}}\) and \({\mathcal {A}}'\) are said to be equivalent modulo renaming. Now we define the simplification relation which merges states in a tree automaton according to an equation. Note that it is not required for equations of \(E\) to be linear.

Definition 7

(Simplification relation). Let \({\mathcal {A}}= \langle \mathcal{F}, {\mathcal {Q}}, {\mathcal {Q}}_f, \varDelta \rangle \) be a tree automaton and \(E\) be a set of equations. For \(s=t \in E\), \(\sigma \in \varSigma ({\mathcal {Q}}, \mathcal{X})\), \(q_a, q_b \in {\mathcal {Q}}\) such that \(s\sigma \rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}}q_a\), \(t\sigma \rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}}q_b\), and \(q_a \ne q_b\) then \({\mathcal {A}}\) can be simplified into \({\mathcal {A}}'= {\mathcal {A}}\{q_b \mapsto q_a\}\), denoted by .

Example 3

Let \(E=\{s(s(x))=s(x)\}\) and \({\mathcal {A}}\) be the tree automaton with set of transitions \(\varDelta = \{ a \rightarrow q_0, s(q_0) \rightarrow q_1, s(q_1) \rightarrow q_2\}\). We can perform a simplification step using the equation \(s(s(x))=s(x)\) because we found a substitution \(\sigma =\{x \mapsto q_0\}\) such that: \(s(s(x))\sigma \rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}}q_2\) and \(s(x)\sigma \rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}}q_1\) Hence, Footnote 2

As stated in [13], simplification is a terminating relation (each step suppresses a state) and it enlarges the language recognized by a tree automaton, i.e. if then \(\mathcal {L}({\mathcal {A}}) \subseteq \mathcal {L}({\mathcal {A}}')\). Furthermore, no matter how simplification steps are performed, the obtained automata are equivalent modulo state renaming. In the following, denotes that and \({\mathcal {A}}'\) is irreducible by . We denote by \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \) any automaton \({\mathcal {A}}'\) such that .

Theorem 1

(Simplified Tree Automata [13]). Let \({\mathcal {A}},{\mathcal {A}}_1', {\mathcal {A}}'_2\) be tree automata and \(E\) be a set of equations. If and then \({\mathcal {A}}'_1\) and \({\mathcal {A}}'_2\) are equivalent modulo state renaming.

4.4 The Full Completion Algorithm

Definition 8

(Automaton completion). Let \({\mathcal {A}}\) be a tree automaton, \({\mathcal {R}}\) a left-linear TRS and \(E\) a set of equations.

  • \(\mathcal{A}_{{\mathcal {R}}, E}^0= {\mathcal {A}}\)

  • \(\mathcal{A}_{{\mathcal {R}}, E}^{n+1}= \mathcal {S}_{E}\left( \mathcal{C}_{{\mathcal {R}}}(\mathcal{A}_{{\mathcal {R}}, E}^n)\right) \), for \(n \ge 0\)

If there exists such that \(\mathcal{A}_{{\mathcal {R}}, E}^{k}=\mathcal{A}_{{\mathcal {R}}, E}^{k+1}\), then we denote \(\mathcal{A}_{{\mathcal {R}}, E}^k\) by \(\mathcal{A}_{{\mathcal {R}}, E}^*\).

In practice, checking if \(CP({\mathcal {R}},\mathcal{A}_{{\mathcal {R}}, E}^k)=\emptyset \) is sufficient to know that \(\mathcal{A}_{{\mathcal {R}}, E}^k\) is a fixpoint. However, a fixpoint cannot always be finitely reachedFootnote 3. To ensure termination, one can provide a set of approximating equations to overcome infinite rewriting and completion divergence.

Example 4

Let \({\mathcal {R}}=\{f(x, y) \rightarrow f(s(x), s(y))\}\), \(E=\{ s(s(x))=s(x) \}\) and \({\mathcal {A}}^0\) be the tree automaton with set of transitions \(\varDelta =\{f(q_a, q_b) \rightarrow q_0), a \rightarrow q_a, b \rightarrow q_b\}\), i.e. \(\mathcal {L}({\mathcal {A}}^0)=\{f(a,b)\}\). The completion ends after two completion steps on \(\mathcal{A}_{{\mathcal {R}}, E}^2\) which is a fixpoint. Completion steps are summed up in the following table. To simplify the presentation, we do not repeat the common transitions: \(\mathcal{A}_{{\mathcal {R}}, E}^i\) and \(\mathcal{C}_{\mathcal {R}}({\mathcal {A}}^{i})\) columns are supposed to contain all transitions of \({\mathcal {A}}^0,\ldots , \mathcal{A}_{{\mathcal {R}}, E}^{i-1}\). The automaton \(\mathcal{A}_{{\mathcal {R}}, E}^1\) is exactly \(\mathcal{C}_{{\mathcal {R}}}({\mathcal {A}}^0)\) since simplification by equations do not apply. Simplification has been applied on \(\mathcal{C}_{\mathcal {R}}(\mathcal{A}_{{\mathcal {R}}, E}^1)\) to obtain \(\mathcal{A}_{{\mathcal {R}}, E}^2\).

figure a

Now, we recall the lower and upper bound theorems. Tree automata completion of automaton \({\mathcal {A}}\) with TRS \({\mathcal {R}}\) and set of equations \(E\) is lower bounded by and upper bounded by . The lower bound theorem ensures that the completed automaton recognizes all \({\mathcal {R}}\)-reachable terms (but not all \({\mathcal {R}}/E\)-reachable terms). The upper bound theorem guarantees that all terms recognized by are only \({\mathcal {R}}/E\)-reachable terms.

Theorem 2

(Lower bound [13]). Let \({\mathcal {R}}\) be a left-linear TRS, \({\mathcal {A}}\) be a tree automaton and \(E\) be a set of equations. If completion terminates on then .

The upper bound theorem states the precision result of completion. It is defined using the \({\mathcal {R}}/E\)-coherence property. The intuition behind \({\mathcal {R}}/E\)-coherence is the following: in the tree automaton \(\epsilon \)-transitions represent rewriting steps and normalized transitions recognize \(E\)-equivalence classes. More precisely, in a \({\mathcal {R}}/E\)-coherent tree automaton, if two terms \(s,t\) are recognized into the same state \(q\) using only normalized transitions then they belong to the same \(E\)-equivalence class. Otherwise, if at least one \(\epsilon \)-transition is necessary to recognize, say, \(t\) into \(q\) then at least one step of rewriting was necessary to obtain \(t\) from \(s\).

Theorem 3

(Upper bound [13]). Let \({\mathcal {R}}\) be a left-linear TRS, \(E\) a set of equations and \({\mathcal {A}}\) a \({\mathcal {R}}/E\)-coherent tree automaton. For any : \(\mathcal {L}(\mathcal{A}_{{\mathcal {R}}, E}^i)\subseteq {\mathcal {R}}^*_E(\mathcal {L}({\mathcal {A}}))\) and \(\mathcal{A}_{{\mathcal {R}}, E}^i\) is \({\mathcal {R}}/E\)-coherent.

5 Termination Criterion for a Given Set of Equations

Given a set of equations \(E\), the effect of the simplification with \(E\) on a tree automaton is to merge two distinct states recognizing instances of the left and right-hand side for all the equations of \(E\). In this section, we give a sufficient condition on \(E\) and on the completed tree automata \(\mathcal{A}_{{\mathcal {R}}, E}^i\) for the tree automata completion to always terminate. The intuition behind this condition is simple: if the set of equivalence classes for \(E\), i.e. \(\mathcal{T(F)}/_{=_E}\), is finite then so should be the set of new states used in completion. However, this is not true in general because simplification of an automaton with \(E\) does not necessarily merge all \(E\)-equivalent terms.

Example 5

Let \({\mathcal {A}}\) be the tree automaton with set of transitions \(a \rightarrow q\), \({\mathcal {R}}=\{a \rightarrow c\}\) and let \(E=\{a=b,b=c\}\). The set of transitions of \(\mathcal{C}_{\mathcal {R}}({\mathcal {A}})\) is \(\{a \rightarrow q, c \rightarrow q', q' \rightarrow q\}\). We have \(a =_E c\), and but on the automaton \(\mathcal{C}_{\mathcal {R}}({\mathcal {A}})\), no simplification situation (as described by Definition 7), can be found because the term \(b\) is not recognized by \(\mathcal{C}_{\mathcal {R}}({\mathcal {A}})\). Hence, the simplified automaton is \(\mathcal{C}_{\mathcal {R}}({\mathcal {A}})\) where \(a\) and \(c\) are recognized by different states.

There is no simple solution to have a simplification algorithm merging all states recognizing \(E\)-equivalent terms (see Sect. 6). Having a complete automaton \({\mathcal {A}}\) solve the above problem but leads to rough approximations (see [11]). In the next section, we propose to give some simple restrictions on \(E\) to ensure that completion terminates. In Sect. 5.2, we will see how those restrictions can easily be met for “functional” TRS, i.e. a typed first-order functional program translated into a TRS.

5.1 General Criterion

What Example 5 shows is that, for a simplification with \(E\) to apply, it is necessary that both sides of the equation are recognized by the tree automaton. In the following, we will define a set \(E^c\) of contracting equations so that this property is true. What Example 5 does not show is that, by default, tree automata are not \(E\)-compatible. In particular, any non -deterministic automaton does not satisfy the reflexivity of \(=_E\). For instance, if an automaton \({\mathcal {A}}\) has two transitions \(a \rightarrow q_1\) and \(a \rightarrow q_2\), since \(a=_E a\) for all \(E\), for \({\mathcal {A}}\) to be \(E\)-compatible we should have \(q_1 = q_2\). To enforce -determinism by automata simplification, we define a set of reflexivity equations as follows.

Definition 9

(Set of reflexivity equations \(E^{r}\) ). For a given set of symbols \(\mathcal{F}\), \(E^{r}=\{f(x_1, \ldots , x_n)=f(x_1, \ldots , x_n) \; | \;f \in \mathcal{F}, \text{ and } \text{ arity } \text{ of } f \text{ is } n\}\), where \(x_1\ldots x_n\) are pairwise distinct variables.

Note that for all set of equations \(E\), the relation \(=_E\) is trivially equivalent to \(=_{E\cup E^{r}}\). Furthermore, simplification with \(E^{r}\) transforms all automaton into an -deterministic automaton, as stated in the following lemma.

Lemma 1

For all tree automaton \({\mathcal {A}}\) and all set of equation \(E\), if \(E\supseteq E^{r}\) and then \({\mathcal {A}}'\) is -deterministic.

Proof

Shown by induction on the height of terms (see [11] for details).\(\quad \square \)

We now define sets of contracting equations. Such sets are defined for a set of symbols \(\mathcal{K}\) which can be a subset of \(\mathcal{F}\). This will be used later to restrict contracting equations to the subset of constructor symbols of \(\mathcal{F}\).

Definition 10

(Sets of contracting equations for \(\mathcal{K}\), \(E^c_{\mathcal{K}}\) ). Let \(\mathcal{K}\subseteq \mathcal{F}\). A set of equations is contracting for \(\mathcal{K}\), denoted by \(E^c_{\mathcal{K}}\), if all equations of \(E^c_{\mathcal{K}}\) are of the form \(u= u|_p\) with \(u\in \mathcal{T}(\mathcal{K}, \mathcal{X})\) a linear term, \(p \ne \lambda \), and if the set of normal forms of \(\mathcal{T}(\mathcal{K})\) w.r.t. the TRS \(\overrightarrow{E^c_{\mathcal{K}}}=\{ u \rightarrow u|_p \, \; | \;\, u= u|_p \in E^c_{\mathcal{K}}\}\) is finite.

Contracting equations, if defined on \(\mathcal{F}\), define an upper bound on the number of states of a simplified automaton.

Lemma 2

Let \({\mathcal {A}}\) be a tree automaton and \(E^c_{\mathcal{F}}\) a set of contracting equations for \(\mathcal{F}\). If \(E\supseteq E^c_{\mathcal{F}}\cup E^{r}\) then the simplified automaton \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \) is an -deterministic automaton having no more states than terms in \({{\mathrm{I{\textsc {rr}}}}}(\overrightarrow{E^c_{\mathcal{F}}})\).

Proof

First, assume for all state \(q\) of \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \), . Then, for all terms \(s\) such that \(s \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q\), we know that \(s\) is not in normal form w.r.t. \(\overrightarrow{E^c_{\mathcal{F}}}\). As a result, the left-hand side of an equation of \(E^c_{\mathcal{F}}\) can be applied to \(s\). This means that there exists an equation \(u = u|_p\), a ground context \(C\) and a substitution \(\theta \) such that \(s=C[u\theta ]\). Furthermore, since \(s \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q\), we know that \(C[u\theta ]\rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q\) and that there exists a state \(q'\) such that \(C[q'] \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q\) and \(u\theta \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q'\). From \(u\theta \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q'\), we know that all subterms of \(u\theta \) are recognized by at least one state in \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \). Thus, there exists a state \(q''\) such that \(u|_p\theta \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q''\). We thus have a situation of application of the equation \(u = u|_p\) in the automaton. Since \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \) is simplified, we thus know that \(q'=q''\). As mentioned above, we know that \(C[q'] \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q\). Hence \(C[u|_p\theta ] \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }C[q'] \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q\). If \(C[u|_p\theta ]\) is not in normal form w.r.t. \(\overrightarrow{E^c_{\mathcal{F}}}\) then we can do the same reasoning on \(C[u|_p\theta ] \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q\) until getting a term that is in normal form w.r.t. \(\overrightarrow{E^c_{\mathcal{F}}}\) and recognized by the same state \(q\). Thus, this contradicts the fact that \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \) recognizes no term of \({{\mathrm{I{\textsc {rr}}}}}(\overrightarrow{E^c_{\mathcal{F}}})\).

Then, by definition of \(E^c_{\mathcal{F}}\), \({{\mathrm{I{\textsc {rr}}}}}(\overrightarrow{E^c_{\mathcal{F}}})\) is finite. Let \(\{t_1, \ldots , t_n\}\) be the subset of \({{\mathrm{I{\textsc {rr}}}}}(\overrightarrow{E^c_{\mathcal{F}}})\) recognized by \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \). Let \(q_1, \ldots , q_n\) be the states recognizing \(t_1, \ldots , t_n\) respectively. We know that there is a finite set of states recognizing \(t_1, \ldots , t_n\) because \(E \supseteq E^{r}\) and Lemma 1 entails that \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \) is -deterministic. Now, for all terms \(s\) recognized by a state \(q\) in \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \), i.e. \(s \rightarrow ^{\not \epsilon \, *}_{\mathcal {S}_{E}\left( {\mathcal {A}}\right) }q\), we can use a reasoning similar to the one carried out above and show that \(q\) is equal to one state of \(\{q_1, \ldots , q_n\}\) recognizing normal forms of \(\overrightarrow{E^c_{\mathcal{F}}}\) in \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \). Finally, there are at most \(card({{\mathrm{I{\textsc {rr}}}}}(\overrightarrow{E^c_{\mathcal{F}}}))\) states in \(\mathcal {S}_{E}\left( {\mathcal {A}}\right) \).\(\quad \square \)

Now it is possible to state the Theorem guaranteeing the termination of completion if the set of equations \(E\) contains a set of contracting equations \(E^c_{\mathcal{F}}\) for \(\mathcal{F}\) and a set of reflexivity equations.

Theorem 4

Let \({\mathcal {A}}\) be a tree automaton, \({\mathcal {R}}\) a left linear TRS and \(E\) a set of equations. If \(E\supseteq E^{r}\cup E^c_{\mathcal{F}}\), then completion of \({\mathcal {A}}\) by \({\mathcal {R}}\) and \(E\) terminates.

Proof

For completion to diverge it must produce infinitely many new states. This is impossible if \(E\) contains \(E^c_{\mathcal{F}}\) and \(E^{r}\) (see Lemma 2).\(\quad \square \)

5.2 Criterion for Functional TRSs

Now, we consider functional programs viewed as TRSs. We assume that such TRSs are left-linear, which is a common assumption on TRSs obtained from functional programs [2]. In this section, we will restrict ourselves to sufficiently complete TRSs obtained from functional programs and will refer to them as functional TRSs. For TRSs representing functional programs, defining contracting equations of \(E^c_{\mathcal{C}}\) on \(\mathcal{C}\) rather than on \(\mathcal{F}\) is enough to guarantee termination of completion. This is more convenient and also closer to what is usually done in static analysis where abstractions are usually defined on data and not on function applications. Since the TRSs we consider are sufficiently complete, any term of \(\mathcal{T(F)}\) can be rewritten into a data-term of \(\mathcal{T(C)}\). As above, using equations of \(E^c_{\mathcal{C}}\) we are going to ensure that the data-terms of the computed languages will be recognized by a bounded set of states. To lift-up this property to \(\mathcal{T(F)}\) it is enough to ensure that \(\forall s,t \in \mathcal{T(F)}\) if \(s \rightarrow _R t\) then \(s\) and \(t\) are recognized by equivalent states. This is the role of the set of equations \(E_R\).

Definition 11

( \(E_{\mathcal {R}}\) ). Let \({\mathcal {R}}\) be a TRS, the set of \({\mathcal {R}}\)-equations is \(E_{\mathcal {R}}= \{l=r \; | \;l \rightarrow r \in {\mathcal {R}}\}\).

Theorem 5

Let \({\mathcal {A}}_0\) be a tree automaton, \({\mathcal {R}}\) a sufficiently complete left-linear TRS and \(E\) a set of equations. If \(E \supseteq E^{r}\cup E^c_{\mathcal{C}}\cup E_{\mathcal {R}}\) with \(E^c_{\mathcal{C}}\) contracting then completion of \({\mathcal {A}}_0\) by \({\mathcal {R}}\) and \(E\) terminates.

Proof

Firstly, to show that the number of states recognizing terms of \(\mathcal{T(C)}\) is finite we can do a proof similar to the one of Lemma 2. Let \(G\subseteq \mathcal{T(C)}\) be the finite set of normal forms of \(\mathcal{T(C)}\) w.r.t. \(\overrightarrow{E^c_{\mathcal{C}}}\). Since \(E \supseteq E^{r}\cup E^c_{\mathcal{C}}\), like in the proof of Lemma 2, we can show that in any completed automaton, terms of \(\mathcal{T(C)}\) are recognized by no more states than terms in \(G\). Secondly, since \({\mathcal {R}}\) is sufficiently complete, for all terms \(s \in \mathcal{T(F)}\setminus \mathcal{T(C)}\) we know that there exists a term \(t \in \mathcal{T(C)}\) such that . The fact that \(E\supseteq E_{\mathcal {R}}\) guarantees that \(s\) and \(t\) will be recognized by equivalent states in the completed (and simplified) automaton. Since the number of states necessary to recognize \(\mathcal{T(C)}\) is finite, so is the number of states necessary to recognize terms of \(\mathcal{T(F)}\).\(\quad \square \)

Finally, to exploit the types of the functional program, we now see \(\mathcal{F}\) as a many-sorted signature whose set of sorts is \(\mathcal {S}\). Each symbol \(f\in \mathcal{F}\) is associated to a profile \(f : S_1 \times \ldots \times S_k \mapsto S\) where \(S_1, \ldots , S_k, S \in \mathcal {S}\) and \(k\) is the arity of \(f\). Well-sorted terms are inductively defined as follows: \(f(t_1, \ldots , t_k)\) is a well-sorted term of sort \(S\) if \(f : S_1 \times \ldots \times S_k \mapsto S\) and \(t_1, \ldots , t_k\) are well-sorted terms of sorts \(S_1, \ldots , S_k\), respectively. We denote by \(\mathcal{T(F, X)}^\mathcal {S}\), \(\mathcal{T(F)}^\mathcal {S}\) and \(\mathcal{T(C)}^\mathcal {S}\) the set of well-sorted terms, ground terms and constructor terms, respectively. Note that we have \(\mathcal{T(F, X)}^\mathcal {S}\subseteq \mathcal{T(F, X)}\), \(\mathcal{T(F)}^\mathcal {S}\subseteq \mathcal{T(F)}\) and \(\mathcal{T(C)}^\mathcal {S}\subseteq \mathcal{T(C)}\). We assume that \({\mathcal {R}}\) and \(E\) are sort preserving, i.e. that for all rule \(l \rightarrow r \in R\) and all equation \(u=v \in E\), \(l,r,u,v \in \mathcal{T(F, X)}^\mathcal {S}\), \(l\) and \(r\) have the same sort and so do \(u\) and \(v\). Note that well-typedness of the functional program entails the well-sortedness of \({\mathcal {R}}\). We still assume that the (sorted) TRS is sufficiently complete, which is defined in a similar way except that it holds only for well-sorted terms, i.e. for all \(s\in \mathcal{T(F)}^\mathcal {S}\) there exists a term \(t\in \mathcal{T(C)}^\mathcal {S}\) such that . We slightly refine the definition of contracting equations as follows. For all sort \(S\), if \(S\) has a unique constant symbol we note it \(c^S\).

Definition 12

(Set \(E^c_{{\mathcal{K},\mathcal {S}}}\) of contracting equations for \(\mathcal{K}\) and \(\mathcal {S}\) ). Let \(\mathcal{K}\subseteq \mathcal{F}\). The set of well-sorted equations \(E^c_{{\mathcal{K},\mathcal {S}}}\) is contracting (for \(\mathcal{K}\)) if its equations are of the form (a) \(u= u|_p\) with \(u\) linear and \(p \ne \varLambda \), or (b) \(u=c^S\) with \(u\) of sort \(S\), and if the set of normal forms of \(\mathcal{T}(\mathcal{K})^\mathcal {S}\) w.r.t. the TRS \(\overrightarrow{E^c_{{\mathcal{K},\mathcal {S}}}}=\) \(\{ u \rightarrow v \; | \;u=v \in E^c_{{\mathcal{K},\mathcal {S}}}\wedge (v=u|_p \vee v=c^S)\}\) is finite.

The termination theorem for completion of sorted TRSs is similar to the previous one except that it needs \({\mathcal {R}}/E\)-coherence of \({\mathcal {A}}_0\) to ensure that terms recognized by completed automata are well-sorted (see [11] for proof).

Theorem 6

Let \({\mathcal {A}}_0\) be a tree automaton recognizing well-sorted terms, \({\mathcal {R}}\) a sufficiently complete sort-preserving left-linear TRS and \(E\) a sort-preserving set of equations. If \(E \supseteq E^{r}\cup E^c_{\mathcal{C},\mathcal {S}}\cup E_{\mathcal {R}}\) with \(E^c_{\mathcal{C},\mathcal {S}}\) contracting and \({\mathcal {A}}_0\) is \({\mathcal {R}}/E\)-coherent then completion of \({\mathcal {A}}_0\) by \({\mathcal {R}}\) and \(E\) terminates.

5.3 Experiments

The objective of data-flow analysis is to predict the set of all program states reachable from a language of initial function calls, i.e. to over-approximate \({\mathcal {R}}^*(\mathcal {L}({\mathcal {A}}))\) where \({\mathcal {R}}\) represents the functional program and \({\mathcal {A}}\) the language of initial function calls. In this setting, we automatically compute an automaton over-approximating \({\mathcal {R}}^*(\mathcal {L}({\mathcal {A}}))\). But we can do more. Since we are dealing with left-linear TRS, it is possible to build \({\mathcal {A}}_{{{\mathrm{I{\textsc {rr}}}}}({\mathcal {R}})}\) recognizing \({{\mathrm{I{\textsc {rr}}}}}({\mathcal {R}})\). Finally, since tree automata are closed under all boolean operations, we can compute an approximation of all the results of the function calls by computing the tree automaton recognizing the intersection between and \({\mathcal {A}}_{{{\mathrm{I{\textsc {rr}}}}}({\mathcal {R}})}\).

Here is an example of application of those theorems. Completions are performed using Timbuk. All the \({\mathcal {A}}_{{{\mathrm{I{\textsc {rr}}}}}({\mathcal {R}})}\) automata and intersections were performed using Taml. Details can be found in [14].

figure b

In this example, the TRS \({\mathcal {R}}\) encodes the classical reverse and append functions. The language recognized by automaton \({\mathcal {A}}_0\) is the set of terms of the form \(rev([a, a, \ldots ,\) \(b, b, \ldots ])\). Note that there are at least one \(a\) and one \(b\) in the list. We assume that \(\mathcal {S}=\{T,list\}\) and sorts for symbols are the following: \(a:T\), \(b:T\), \(nil: list\), \(cons: T \times list \mapsto list\), \(append: list \times list \mapsto list\) and \(rev: list \mapsto list\). Now, to use Theorem 6, we need to prove each of its assumptions. The set \(E\) of equations contains \(E_{\mathcal {R}}\), \(E^{r}\) and \(E^c_{\mathcal{C},\mathcal {S}}\). The set of Equations \(E^c_{\mathcal{C},\mathcal {S}}\) is contracting because the automaton \({\mathcal {A}}_{{{\mathrm{I{\textsc {rr}}}}}(\overrightarrow{E^c_{\mathcal{C},\mathcal {S}}})}\) recognizes a finite language. This automaton can be computed using Taml: it is the intersection between the automaton \({\mathcal {A}}_{\mathcal{T(C)}^\mathcal {S}}\) Footnote 4 recognising \(\mathcal{T(C)}^\mathcal {S}\) and the automaton \({\mathcal {A}}_{{{\mathrm{I{\textsc {rr}}}}}(\{cons(X,cons(Y,Z)) \rightarrow cons(Y,Z)\})}\):

figure c

The language of \({\mathcal {A}}_0\) is well-sorted and \(E\) and \({\mathcal {R}}\) are sort preserving. We can prove sufficient completeness of \({\mathcal {R}}\) on \(\mathcal{T(F)}^\mathcal {S}\) using, for instance, Maude [6] or even Timbuk [9] itself. The last assumption of Theorem 6 to prove is that \({\mathcal {A}}_0\) is \({\mathcal {R}}/E\)-coherent. This can be shown by remarking that each state \(q\) of \({\mathcal {A}}_0\) recognizes at least one term and if \(s \rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}_0}q\) and \(t \rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}_0}q\) then \(s \equiv _E t\). For instance \(cons(b, cons(b,nil)) \rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}_0}q_{lb}\) and \(cons(b, nil) \rightarrow ^{\not \epsilon \, *}_{{\mathcal {A}}_0}q_{lb}\) and \(cons(b, cons(b,nil))\equiv _E cons(b, nil)\). Thus, completion is guaranteed to terminate: after 4 completion steps (7 ms) we obtain a fixpoint automaton with 11 transitions. To restrain the language to normal forms it is enough to compute the intersection with \({{\mathrm{I{\textsc {rr}}}}}(R)\). Since we are dealing with sufficiently complete TRSs, we know that \({{\mathrm{I{\textsc {rr}}}}}(R) \subseteq \mathcal{T(C)}^\mathcal {S}\). Thus, we can use again \({\mathcal {A}}_{\mathcal{T(C)}^\mathcal {S}}\) for the intersection that is:

figure d

which recognizes any (non empty) flat list of \(a\) and \(b\). Thus, our analysis preserved the property that the result cannot be the empty list but lost the order of the elements in the list. This is not surprising because the equation makes \(cons(a, cons(b, nil))\) equal to \(cons(a,nil)\). It is possible to refine by hand \(E^c_{\mathcal{C},\mathcal {S}}\) using the following equations: This set of equations avoids the previous problem. Again, \(E\) verifies the conditions of Theorem 6 and completion is still guaranteed to terminate. The result is the automaton having 19 transitions. This time, intersection with \({\mathcal {A}}_{\mathcal{T(C)}^\mathcal {S}}\) gives:

figure e

This automaton exactly recognizes lists of the form \([b,b, \ldots ,a, a, \ldots ]\) with at least one \(b\) and one \(a\), as expected. Hopefully, refinement of equations can be automatized in completion [3] and can be used here, see [14] for examples. More examples can be found in the Timbuk 3.1 source distribution.

6 Conclusion and Further Research

In this paper we defined a criterion on the set of approximation equations to guarantee termination of the tree automata completion. When dealing with, so called, functional TRS this criterion is close to what is generally expected in static analysis and abstract interpretation: a finite model for an infinite set of data-terms. This work is a first step to use completion for static analysis of functional programs. There remains some interesting points to address.

Dealing with higher-order functions. Higher-order functions can be encoded into first order TRS using a simple encoding borrowed from [17]: defined symbols become constants, constructor symbols remain the same, and an additional application operator ‘’ of arity 2 is introduced. On all the examples of [21], completion and this simple encoding produces exactly the same results [14].

Dealing with evaluation strategies. The technique proposed here, as well as [21], over-approximates the set of results for all evaluation strategies. As far as we know, no static analysis technique for functional programs can take into account evaluation strategies. However, it is possible to restrict the completion algorithm to recognize only innermost descendants [14], i.e. call-by-value results. If the approximation is precise enough, any non terminating program with call-by-value will have an empty set of results. An open research direction is to use this to prove non termination of functional programs by call-by-value strategy.

Dealing with built-in types. Values manipulated by real functional programs are not always terms or trees. They can be numerals or be terms embedding numerals. In [12], it has been shown that completion can compute over-approximations of reachable terms embedding built-in terms. The structural part of the term is approximated using tree automata and the built-in part is approximated using lattices and abstract interpretation.

Besides, there remain some interesting theoretical points to solve. In Sect. 5, we saw that having a finite \(\mathcal{T(F)}/_{=_E}\) is not enough to guarantee the termination of completion. This is due to the fact that the simplification algorithm does not merge all states recognizing \(E\)-equivalent terms. Having a simplification algorithm ensuring this property is not trivial. First, the theory defined by \(E\) has to be decidable. Second, even if \(E\) is decidable, finding all the \(E\)-equivalent terms recognized by the tree automaton is an open problem. Furthermore, proving that \(\mathcal{T(F)}/_{=_E}\) is finite, is itself difficult. This question is undecidable in general [23], but can be answered for some particular \(E\). For instance, if \(E\) can be oriented into a TRS \({\mathcal {R}}\) which is terminating, confluent and such that \({{\mathrm{I{\textsc {rr}}}}}({\mathcal {R}})\) is finite then \(\mathcal{T(F)}/_{=_E}\) is finite [23].