
1 Introduction

Model checking has become one of the main techniques for algorithmic verification of computer systems. The original applications were found in the context of finite-state systems, such as hardware circuits, where the behavior of the system can be captured by a finite state machine. In the last two decades, there has also been a large amount of work devoted to extending model checking so that it can handle models with infinite state spaces such as Petri nets, timed automata, push-down systems, counter automata, and channel machines. Recent works have considered systems that are infinite in multiple dimensions. For instance, many classes of timed protocols are parameterized (consist of unbounded numbers of components), and hence they can be naturally modeled by timed Petri nets [10]. Also, many message-passing protocols have behaviors that are constrained by timing conditions, giving rise to timed channel systems [5].

In particular, Push-Down Automata (Pda) have been studied extensively as a model for the analysis of recursive programs (e.g., [12, 23, 25, 33]). The model of Pda has been extended to allow quantitative reasoning with respect to time [1] and probabilities [24, 26]. However, all existing models assume finite-state control, which means that variables in the program are assumed to range over finite domains. In this paper, we consider an extension of Pda, which we call Pdad, that strengthens the model in two ways. First, in addition to the stack, a Pdad also operates on a number of variables ranging over the natural numbers. Second, each message inside the stack is equipped with a natural number which represents its “value”. Thus, we get a model that is possibly unbounded in two dimensions, namely, we have an unbounded number of messages inside the stack, each of which has an attribute that is a natural number. The operations allowed on the stack are the standard push and pop operations. However, when pushing a symbol to the stack, its value may be defined to be the value of a program variable. Also, when a message is popped, its value may be copied to a variable. A Pdad allows comparing the values of variables according to the gap-order constraint system, where two variables may be tested for equality, or tested for a minimal gap (defined by a natural number) between their values. Also, a variable may be assigned a new arbitrary value, the value of another variable, or a value that exceeds the value of another variable by more than some (given) natural number. In this manner, the model of Pdad subsumes two known models, namely that of Pda (which we get by removing the variables in the program and by neglecting the values of the symbols in the stack), and the model of Integral Relational Automata [15] (which we get by removing the stack).

In this paper, we show decidability of the control reachability problem for Pdads. Given a control (local) state of the automaton, we check whether the automaton reaches the state from its initial configuration. We solve the problem in two steps. We introduce a class of Context-Free Grammars with Data (Cfgd). In a Cfgd, each non-terminal has an arity. The grammar generates terms each of which is either a terminal or a non-terminal equipped with a tuple of natural numbers (as many as its arity). An application of a production rewrites a term to a set of terms. Such an application is constrained by the arguments of the involved non-terminals. The constraints are defined by gap-order conditions. For Cfgd, we solve a reachability problem in which we ask whether it is possible to derive a set of terms each of which is a terminal belonging to a given set of terminals. In the first step of our method, we give a reachability analysis algorithm that solves the above-mentioned problem for Cfgds.

The algorithm is based on a constraint representation of infinite sets of terms, and it is formulated within the framework of well structured transition systems [4, 6].

The second step of our method translates a given Pdad into a Cfgd so as to exploit the corresponding reachability analysis procedure to solve control state reachability for Pdads.

To our knowledge, our result yields a new decidable fragment of pushdown automata with data (see Section 10).

2 Preliminaries

In this section, we introduce some notations and definitions that we will use in the rest of the paper. We use \(\mathbb{N }\) to denote the set of natural numbers.

We fix a finite set \(\mathcal{V }\) of variables that range over \(\mathbb{N }\). A valuation is a mapping \({{ Val}}:{\mathcal{V }}\rightarrow {\mathbb{N }}\), i.e., it assigns a natural number to each variable. Given a variable \(x\in \mathcal{V }\), a natural number \(c\in \mathbb{N }\), and a valuation \({{ Val}}:{\mathcal{V }}\rightarrow {\mathbb{N }}\), we use \({ Val}\left[ {x}\leftarrow {c}\right] \) to denote the valuation \({ Val}'\) defined as follows: \({ Val}'(x)=c\), and \({ Val}'(y)={ Val}(y)\) for all \(y\in (\mathcal{V }\setminus \{x\})\).

A renaming is a mapping \({{ Ren}}:{\mathcal{V }}\rightarrow {\mathcal{V }}\), i.e., it renames each variable to another one. A renaming \({ Ren}\) does not need to be injective, i.e., several variables may be renamed to the same variable by \({ Ren}\). We say that \({ Ren}\) is a renaming for \(W\) if \({ Ren}\left( {x}\right) \in W\) for all \(x\in \mathcal{V }\).

For a set \(A\), we use \({A}^*\) to denote the set of finite words over \(A\). We use \(\epsilon \) to denote the empty word. For words \(\alpha _1,\alpha _2\in {A}^*\), we use \(\alpha _1\cdot \alpha _2\) to denote the concatenation of \(\alpha _1\) and \(\alpha _2\).

A transition system is a tuple \(\left\langle {\Upsilon ,\gamma _{ init},\overset{{}}{\longrightarrow }}\right\rangle \) where \(\Upsilon \) is a (potentially infinite) set of configurations, \(\gamma _{ init}\in \Upsilon \) is the initial configuration, and \(\overset{{}}{\longrightarrow }\subseteq \Upsilon \times \Upsilon \) is the transition relation. As usual, we write \(\gamma \overset{{}}{\longrightarrow }\gamma '\) to denote that \(\left\langle {\gamma ,\gamma '}\right\rangle \in \overset{{}}{\longrightarrow }\), and use \(\overset{{*}}{\longrightarrow }\) to denote the reflexive transitive closure of \(\overset{{}}{\longrightarrow }\). For a configuration \(\gamma \in \Upsilon \) and a set \(\Gamma \subseteq \Upsilon \) of configurations, we use \(\gamma \overset{{*}}{\longrightarrow }\Gamma \) to denote that \(\gamma \overset{{*}}{\longrightarrow }\gamma '\) for some \(\gamma '\in \Gamma \).
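The configuration spaces considered in this paper are infinite, so the following is only an illustration of the notation \(\gamma \overset{{*}}{\longrightarrow }\Gamma \): for a finite transition system given by a successor function (the names `reaches` and `succ` are hypothetical), breadth-first search decides it. Because \(\overset{{*}}{\longrightarrow }\) is reflexive, the starting configuration itself counts as reached.

```python
from collections import deque


def reaches(succ, gamma, targets):
    """Decide gamma ->* Gamma for a finite transition system.

    succ(g) returns the successors of configuration g; targets is the set
    Gamma. Since ->* is reflexive, gamma itself is checked first.
    """
    seen = {gamma}
    queue = deque([gamma])
    while queue:
        g = queue.popleft()
        if g in targets:
            return True
        for h in succ(g):
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return False


# Toy system: 0 -> 1 -> 2 and 3 -> 4.
edges = {0: [1], 1: [2], 2: [], 3: [4], 4: []}
print(reaches(edges.__getitem__, 0, {2}))  # True
print(reaches(edges.__getitem__, 0, {4}))  # False
```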

3 Push-Down Automata with Data

In this section, we introduce Push-Down Automata with Data (Pdad) that are extensions of the classical model of Push-Down Automata (Pda). First, we define the model, then we define the operational semantics, i.e., the transition system induced by a Pdad, and finally we introduce the reachability problem. As in the case of a Pda, a Pdad operates on an unbounded stack to which it can push (append) messages and from which it can pop (remove) messages in a last-in-first-out manner. The messages are chosen from a finite alphabet. Pdads extend Pdas in two ways. First, in addition to the stack, the automaton is equipped with a finite set of variables ranging over natural numbers. Second, each message inside the stack is equipped with a natural number that represents its “value”. The allowed operations on variables are defined by the gap-order constraint system [15, 31]. More precisely, the model allows non-deterministic value assignment, copying the value of one variable to another, and assignment of a value \(v\) to some variable such that \(v\) exceeds the current value of another variable by more than a given natural number. The transitions may be conditioned by tests that compare the values of two variables for equality, or that give the minimal allowed gap between two variables. A push operation may copy the value of a variable to the pushed message, and a pop operation may copy the value of the popped message to a variable.

Model. A Pdad \(\mathcal{A }\) is a tuple \(\left\langle {Q,q_{ init},A,\Delta }\right\rangle \) where \(Q\) is the finite set of states, \(q_{ init}\in Q\) is the initial state, \(A\) is the finite stack alphabet, and \(\Delta \) is the transition relation. We remark that the set of possible stack elements is nevertheless infinite, since each element is a pair \(\left\langle {a,\ell }\right\rangle \) where \(a\in A\) and \(\ell \) is a natural number. A transition \(\delta \in \Delta \) is a triple \(\left\langle {q_1,{ op},q_2}\right\rangle \) where \(q_1,q_2\in Q\) are states and \({ op}\) is an operation of one of the following forms: (i) \({ nop}\) is an empty operation that does not change the values of the variables or the content of the stack, (ii) \(x\leftarrow *\) assigns non-deterministically an arbitrary value in \(\mathbb{N }\) to the variable \(x\), (iii) \(y\leftarrow x\) copies the value of variable \(x\) to \(y\), (iv) \(y\leftarrow \left( >_cx\right) \) assigns non-deterministically to \(y\) a value that exceeds the current value of \(x\) by more than \(c\) (so the new value of \(y\) is \(>x+c\)), (v) \(y=x\) checks whether the value of \(y\) is equal to the value of \(x\), (vi) \(x<_{c}y\) checks whether the gap between the values of \(y\) and \(x\) is larger than \(c\), (vii) \({ push}\left( a\right) \left( x\right) \) pushes the symbol \(a\in A\) to the stack and assigns to it the value of \(x\), and (viii) \({ pop}\left( a\right) \left( x\right) \) pops the symbol \(a\in A\) (if \(a\) is the top-most symbol on the stack) and assigns its value to the variable \(x\).

Transition System. A Pdad induces a transition system as follows. A configuration \(\gamma \) is a triple \(\left\langle {q,{ Val},\alpha }\right\rangle \) where \(q\in Q\) is a state, \({ Val}:\mathcal{V }\mapsto \mathbb{N }\) is a valuation, and \(\alpha \in {\left( A\times \mathbb{N }\right) }^*\) defines the content of the stack (each element of the word is a pair \(\left\langle {a,c}\right\rangle \) where \(a\) is the symbol and \(c\) is its value).

We define the transition relation \(\overset{{}}{\longrightarrow }:=\cup _{\delta \in \Delta }\overset{{\delta }}{\longrightarrow }\), where \(\overset{{\delta }}{\longrightarrow }\) describes the effect of the transition \(\delta \). For configurations \(\gamma =\left\langle {q,{ Val},\alpha }\right\rangle \), \(\gamma '=\left\langle {q',{ Val}',\alpha '}\right\rangle \), and a transition \(\delta =\left\langle {q_1,{ op},q_2}\right\rangle \in \Delta \), we write \(\gamma \overset{{\delta }}{\longrightarrow }\gamma '\) to denote that \(q=q_1\), \(q'=q_2\), and one of the following conditions is satisfied:

  • \({ op}\) is \({ nop}\), \({ Val}'={ Val}\), and \(\alpha '=\alpha \). The values of the variables and the stack content are not changed.

  • \({ op}\) is \(x\leftarrow *\), \({ Val}'={ Val}\left[ {x}\leftarrow {c}\right] \) where \(c\in \mathbb{N }\), and \(\alpha '=\alpha \). The value of the variable \(x\) is changed non-deterministically to some natural number. The values of the other variables and the stack content are not changed.

  • \({ op}\) is \(y\leftarrow x\), \({ Val}'={ Val}\left[ {y}\leftarrow {{ Val}\left( {x}\right) }\right] \), and \(\alpha '=\alpha \). The value of the variable \(x\) is copied to the variable \(y\). The values of the other variables and the stack content are not changed.

  • \({ op}\) is \(y\leftarrow \left( >_cx\right) \), \({ Val}'={ Val}\left[ {y}\leftarrow {c'}\right] \), where \(c'>{ Val}\left( {x}\right) +c\), and \(\alpha '=\alpha \). The variable \(y\) is assigned non-deterministically a value that exceeds the value of \(x\) by more than \(c\). The values of the other variables and the stack content are not changed.

  • \({ op}\) is \(y=x\), \({ Val}\left( {y}\right) ={ Val}\left( {x}\right) \), \({ Val}'={ Val}\), and \(\alpha '=\alpha \). The transition is only enabled if the value of \(y\) is equal to the value of \(x\). The values of the variables and the stack content are not changed.

  • \({ op}\) is \(x<_{c}y\), \({ Val}\left( {y}\right) >{ Val}\left( {x}\right) +c\), \({ Val}'={ Val}\), and \(\alpha '=\alpha \). The transition is only enabled if the value of \(y\) is larger than the value of \(x\) by more than \(c\). The values of the variables and the stack content are not changed.

  • \({ op}\) is \({ push}\left( a\right) \left( x\right) \), \({ Val}'={ Val}\), and \(\alpha '=\left\langle {a,{ Val}\left( {x}\right) }\right\rangle \cdot \alpha \). The symbol \(a\) is pushed onto the stack with a value equal to that of \(x\).

  • \({ op}\) is \({ pop}\left( a\right) \left( x\right) \), \(\alpha =\left\langle {a,c}\right\rangle \cdot \alpha '\) for some \(c\in \mathbb{N }\), and \({ Val}'={ Val}\left[ {x}\leftarrow {c}\right] \). The symbol \(a\) is popped from the stack (if it is the top-most symbol), and its value is copied to the variable \(x\).
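As a sanity check of the rules above, the following Python sketch decides whether \(\gamma \overset{{\delta }}{\longrightarrow }\gamma '\) holds for a given transition \(\delta \) and a given pair of configurations. The tuple encoding of operations (tags such as "havoc" for \(x\leftarrow *\) and "gt" for \(y\leftarrow \left( >_cx\right) \)) and all function names are our own hypothetical choices. Checking a candidate successor, rather than enumerating successors, sidesteps the fact that \(x\leftarrow *\) and \(y\leftarrow \left( >_cx\right) \) have infinitely many outcomes.

```python
def updated(val, x, c):
    """Val[x <- c]: a copy of val in which x is mapped to c."""
    new = dict(val)
    new[x] = c
    return new


def is_step(delta, conf, conf2):
    """Check conf --delta--> conf2 for a Pdad transition delta = (q1, op, q2).

    A configuration is (q, val, stack): val maps every variable to a natural
    number, and stack is a tuple of (symbol, value) pairs, top at index 0.
    """
    q1, op, q2 = delta
    (q, val, stack), (q_, val2, stack2) = conf, conf2
    if q != q1 or q_ != q2:
        return False
    kind = op[0]
    if kind == "nop":
        return val2 == val and stack2 == stack
    if kind == "havoc":                      # x <- *
        _, x = op
        return stack2 == stack and val2 == updated(val, x, val2[x])
    if kind == "copy":                       # y <- x
        _, y, x = op
        return stack2 == stack and val2 == updated(val, y, val[x])
    if kind == "gt":                         # y <- (>_c x)
        _, y, c, x = op
        return (stack2 == stack and val2[y] > val[x] + c
                and val2 == updated(val, y, val2[y]))
    if kind == "eq":                         # test y = x
        _, y, x = op
        return val[y] == val[x] and val2 == val and stack2 == stack
    if kind == "lt":                         # test x <_c y
        _, x, c, y = op
        return val[y] > val[x] + c and val2 == val and stack2 == stack
    if kind == "push":                       # push(a)(x)
        _, a, x = op
        return val2 == val and stack2 == ((a, val[x]),) + stack
    if kind == "pop":                        # pop(a)(x)
        _, a, x = op
        return (len(stack) > 0 and stack[0][0] == a and stack2 == stack[1:]
                and val2 == updated(val, x, stack[0][1]))
    raise ValueError(f"unknown operation {op!r}")
```

For example, with \({ Val}(x)=2\), pushing \(a\) and later executing \({ pop}\left( a\right) \left( y\right) \) is accepted exactly when the new valuation maps \(y\) to \(2\).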

We define the initial configuration \(\gamma _{ init}:=\left\langle {q_{ init},{ Val}_{ init},\epsilon }\right\rangle \), where \({ Val}_{ init}(x)=0\) for all \(x\in \mathcal{V }\). In other words, we start from a configuration where the automaton is in its initial state, the values of all variables are equal to \(0\), and the stack is empty (the fact that we choose to initialize the variables to \(0\) is not crucial for solving the problem).

For a configuration \(\gamma \) and a state \(q\in Q\), we write \(\gamma \overset{{*}}{\longrightarrow }q\) to denote that \(\gamma \overset{{*}}{\longrightarrow } \gamma '=\left\langle {q,{ Val},\alpha }\right\rangle \) for some \({ Val}:\mathcal{V }\mapsto \mathbb{N }\) and \(\alpha \in {\left( A\times \mathbb{N }\right) }^*\).

In other words, from \(\gamma \) we can reach a configuration whose state is \(q\).

Reachability Problem. In the reachability problem Pdad-Reach, given a Pdad \(\mathcal{A }=\left\langle {Q,q_{ init},A,\Delta }\right\rangle \) and a state \(q_{ target}\in Q\), we ask whether \(\gamma _{ init}\overset{{*}}{\longrightarrow }q_{ target}\).

4 Context-Free Grammars with Data

In this section, we introduce Context-Free Grammars with Data (Cfgd) that are extensions of the classical model of Context-Free Grammars (Cfg) in which (terminal and non-terminal) symbols are defined by terms with free variables and productions have conditions defined by gap-order constraints. We define the model, the operational semantics, and the reachability problem.

Model. A Context-Free Grammar with Data (Cfgd) is a tuple \(\mathcal{G }=\left\langle {\mathcal{S },X_{ init},P}\right\rangle \), where \(\mathcal{S }\) is a finite set of symbols, \(X_{ init}\in \mathcal{S }\) is the start (or initial) symbol, and \(P\) is the set of productions. Each symbol \(X\) has an arity \(\rho \left( {X}\right) \in \mathbb{N }\). Without loss of generality, we assume that \(\rho \left( {X_{ init}}\right) =1\). A term has the form \(X(x_1,\ldots ,x_n)\) where \(X\in \mathcal{S }\), \(\rho \left( {X}\right) =n\) and \(x_1,\ldots ,x_n\in \mathcal{V }\) are variables. A ground term has the form \(X(c_1,\ldots ,c_n)\) where \(X\in \mathcal{S }\), \(\rho \left( {X}\right) =n\) and \(c_1,\ldots ,c_n\in \mathbb{N }\) are natural numbers. For a term \(\sigma \) of the form \(X(x_1,\ldots ,x_n)\), we define \({ Sym}\left( \sigma \right) =X\) and \({ Var}\left( \sigma \right) =\{x_1,\ldots ,x_n\}\). We define \({ Sym}\left( \sigma \right) \) for a ground term \(\sigma \) similarly. A (ground) sentence \(\alpha \) is a finite set \(\left\{ \sigma _1,\sigma _2,\ldots ,\sigma _n\right\} \), where each \(\sigma _i\) is a (ground) term. We define \({ Sym}\left( \alpha \right) :=\left\{ { Sym}\left( \sigma _1\right) ,\ldots ,{ Sym}\left( \sigma _n\right) \right\} \), i.e., it is the set of symbols that occur in \(\alpha \). For a term \(\sigma =X(x_1,\ldots ,x_n)\) and a valuation \({ Val}\), we define \({ Val}\left( {\sigma }\right) :=X({ Val}\left( {x_1}\right) ,\ldots ,{ Val}\left( {x_n}\right) )\) to be the ground term we get by substituting each variable \(x_i\) in \(\sigma \) by \({ Val}\left( {x_i}\right) \). For a sentence \(\alpha \), we define \({ Val}\left( {\alpha }\right) \) similarly.

A condition \(\theta \) is a finite conjunction of formulas of the forms: \(x<_{c}y\) or \(x=y\), where \(x,y\in \mathcal{V }\) and \(c\in \mathbb{N }\). Here \(x<_{c}y\) stands for \(x+c<y\). Sometimes, we treat a condition \(\theta \) as a set, and write e.g. \((x<_{c}y)\in \theta \) to indicate that \(x<_{c}y\) is one of the conjuncts in \(\theta \). For a valuation \({ Val}\), we use \({ Val}\left( {\theta }\right) \) to denote the result of substituting each variable \(x\) in \(\theta \) by \({ Val}\left( {x}\right) \). We use \({ Val}\models \theta \) to denote that \({ Val}\left( {\theta }\right) \) evaluates to true. We use \({ Var}\left( \theta \right) \) to denote the set of variables that occur in \(\theta \).

A production \(p\) is of the form \({\sigma }\leadsto {\alpha }\;:\;{\theta }\), where \(\sigma \) is a term, \(\alpha \) is a non-empty sentence, and \(\theta \) is a condition. We often use the notation \({\sigma }\leadsto {\sigma _1\cdots \sigma _n}\;:\;{\theta }\) to denote the production \({\sigma }\leadsto {\{\sigma _1,\ldots ,\sigma _n\}}\;:\;{\theta }\) (i.e. a sequence in the right-hand side denotes a set of terms). We use \(\mathcal{N }\) to denote the set of non-terminals consisting of symbols that occur in the left-hand side of a production (we say that they are defined by a production). We use \(\mathcal{T }\) to denote the set of terminals consisting of symbols that do not occur in the left-hand side of a production. Furthermore, we use \(\mathcal{A _T}\) to denote the set of ground terms with symbols in \(\mathcal{T }\).

Transition System. A configuration \(\gamma \) is a ground sentence. We define a transition relation \(\overset{{}}{\longrightarrow }_{\mathcal{G }}\) on the set of configurations by \(\overset{{}}{\longrightarrow }_{\mathcal{G }}:=\cup _{p\in P}\overset{{p}}{\longrightarrow }\) where \(\overset{{p}}{\longrightarrow }\) represents the effect of applying the production \(p\). More precisely, for a production \(p\in P\) of the form \({\sigma }\leadsto {\alpha }\;:\;{\theta }\), we have \(\gamma _1\overset{{p}}{\longrightarrow }\gamma _2\) if there are a valuation \({ Val}\models \theta \) and a ground sentence \(\alpha '\) such that \(\gamma _1=\alpha ' \cup \{{ Val}\left( {\sigma }\right) \}\) and \(\gamma _2=\alpha ' \cup { Val}\left( {\alpha }\right) \).
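The following sketch applies a production to a ground sentence for a given valuation. Terms are encoded as (symbol, argument-tuple) pairs and a condition as a list of atoms, ("lt", x, c, y) for \(x<_{c}y\) and ("eq", x, y) for \(x=y\); these encodings and all function names are assumptions of the sketch. Since sentences are sets, the definition also admits an \(\alpha '\) that still contains \({ Val}\left( {\sigma }\right) \); the sketch makes the choice \(\alpha '=\gamma _1\setminus \{{ Val}\left( {\sigma }\right) \}\).

```python
def satisfies(val, theta):
    """Val |= theta: ("lt", x, c, y) means x + c < y, ("eq", x, y) means x = y."""
    for atom in theta:
        if atom[0] == "lt":
            _, x, c, y = atom
            if not val[x] + c < val[y]:
                return False
        elif atom[0] == "eq":
            _, x, y = atom
            if val[x] != val[y]:
                return False
        else:
            raise ValueError(f"unknown atom {atom!r}")
    return True


def ground(val, term):
    """Val(sigma): substitute each variable of the term by its value."""
    sym, args = term
    return (sym, tuple(val[x] for x in args))


def apply_production(prod, gamma, val):
    """Return gamma2 with gamma --p--> gamma2 witnessed by val, or None.

    prod = (sigma, alpha, theta); gamma is a frozenset of ground terms.
    """
    sigma, alpha, theta = prod
    lhs = ground(val, sigma)
    if not satisfies(val, theta) or lhs not in gamma:
        return None
    rest = gamma - {lhs}                    # our choice of alpha'
    return rest | {ground(val, t) for t in alpha}


# S(x,y) ~> {S(x,z), a(z,t), S(t,y)} : x <_0 z, z <_0 t, t <_0 y
prod = (("S", ("x", "y")),
        [("S", ("x", "z")), ("a", ("z", "t")), ("S", ("t", "y"))],
        [("lt", "x", 0, "z"), ("lt", "z", 0, "t"), ("lt", "t", 0, "y")])
print(apply_production(prod, frozenset({("S", (0, 9))}),
                       {"x": 0, "z": 3, "t": 4, "y": 9}))
```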

For a set \(S\) of ground terms, we define \({ Pre}\left( {S}\right) \) to be the set of ground terms \(\sigma \) which can, through a single application of a production, generate a configuration \(\gamma \subseteq S\) (i.e., \({\sigma \overset{{}}{\longrightarrow }_{\mathcal{G }}\gamma }\)). Let \({ Pre}^*\left( {\cdot }\right) \) denote the transitive closure of \({ Pre}\left( {\cdot }\right) \).

We will use the following lemmata later in the paper.

Lemma 1.

Let \(\alpha \) be a ground sentence of \(\mathcal{G }\). Then, if for every ground term \(\sigma \in \alpha \), we have \(\sigma \overset{{*}}{\longrightarrow }_{\mathcal{G }} \alpha ''\) for some ground sentence \(\alpha ''\) such that \({ Sym}\left( \alpha ''\right) \subseteq \mathcal{T }\), then \(\alpha \overset{{*}}{\longrightarrow }_{\mathcal{G }} \alpha '\) for \(\alpha '\) such that \({ Sym}\left( \alpha '\right) \subseteq \mathcal{T }\).

Lemma 2.

Let \(S\) be a set of ground terms and \(\sigma \) be a ground term such that \(\sigma \in { Pre}^*\left( {S}\right) \). If \(\sigma \notin S\) then there is a ground term \(\sigma ' \in ({ Pre}\left( {S}\right) \setminus S)\).

Reachability Problem. In the reachability problem Cfgd-Reach, we are given a Cfgd \(\mathcal{G }=\left\langle {\mathcal{S },X_{ init},P}\right\rangle \) and we are asked the question whether \(X_{ init}(0)\overset{{*}}{\longrightarrow }_{\mathcal{G }}\alpha \) for some ground sentence \(\alpha \) such that \({ Sym}\left( \alpha \right) \subseteq \mathcal{T }\). In other words, we start from a configuration consisting of the start symbol with its parameter set to zero, and ask whether the system can reach a configuration where all its ground terms have symbols in \(\mathcal{T }\).

Cfgd vs Cfg. A Context-Free Grammar (Cfg) is defined by productions of the form \(S\rightarrow w\) where \(w\) is a word over terminal and non-terminal symbols. We can encode a Cfg as a Cfgd by associating to each terminal/non-terminal symbol \(X\) (except the initial one) a term \(X(a,b)\) in which \((a,b)\) are used to maintain an order in the right-hand side of a rule. For instance, the production \(S\rightarrow S a S\) is encoded via the Cfgd production \(S(x,y)\leadsto \{S(x,z),a(z,t),S(t,y)\}\;:\;x<z,z<t,t<y\).
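The encoding generalizes directly from the example. The sketch below produces, for a Cfg production \(X\rightarrow X_1\cdots X_k\) with \(k\ge 1\), the Cfgd production \(X(v_0,v_k)\leadsto \{X_1(v_0,v_1),\ldots ,X_k(v_{k-1},v_k)\}\;:\;v_0<v_1\wedge \cdots \wedge v_{k-1}<v_k\). The function name and the variable names \(v_i\) are our own illustrative choices, and the special arity-1 start symbol is not handled. Because the boundary variables are strictly increasing, two occurrences of the same symbol always receive different arguments, so the set semantics of Cfgd productions never merges them.

```python
def encode_cfg_production(lhs, rhs):
    """Encode the Cfg production lhs -> rhs (a non-empty list of symbols) as
    a Cfgd production (sigma, alpha, theta), where theta is a list of atoms
    ("lt", x, c, y) meaning x <_c y. Variable names v0..vk are illustrative."""
    if not rhs:
        raise ValueError("Cfgd productions need a non-empty right-hand side")
    k = len(rhs)
    v = [f"v{i}" for i in range(k + 1)]
    sigma = (lhs, (v[0], v[k]))
    alpha = [(sym, (v[i], v[i + 1])) for i, sym in enumerate(rhs)]
    theta = [("lt", v[i], 0, v[i + 1]) for i in range(k)]  # v_i <_0 v_(i+1)
    return sigma, alpha, theta


print(encode_cfg_production("S", ["S", "a", "S"]))
```

Up to the names of the variables, the output for \(S\rightarrow S a S\) is the production given above.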

Cfgd vs CMRS. Cfgds also differ from the CMRS model [7]. CMRS is obtained by combining multiset rewriting and gap-order constraints and it is aimed at modeling concurrent processes. CMRS rules have multiple heads and work over multisets of monadic terms (i.e., with a single argument, no nested terms). Differently from CMRS, Cfgd productions have a single term in the left-hand side and a set of terms in the right-hand side. This implies that multiple occurrences (with the same variables) of a term like \(p(x,y)\) are counted only once. Furthermore, non-terminal symbols have arbitrary finite arity.

5 Symbolic Encoding

In this section, we define the symbolic representation used in the definition of the reachability algorithm (Section 6). The algorithm operates on constraints, where each constraint \(\phi \) characterizes a (potentially) infinite set \(\left[\![\phi \right]\!]\) of ground terms. A constraint \(\phi \) is of the form \({\sigma }:{\theta }\) where \(\sigma \) is a term and \(\theta \) is a condition. We define \({ Sym}\left( \phi \right) ={ Sym}\left( \sigma \right) \) and \({ Var}\left( \phi \right) ={ Var}\left( \sigma \right) \cup { Var}\left( \theta \right) \).

Definition 3.

The constraint \(\phi \) characterizes a set of ground terms defined by \(\left[\![\phi \right]\!]= \left\{ \sigma '|\, {\exists { Val}.\;({ Val}\models \theta )\wedge (\sigma '={ Val}(\sigma )}\right\} \). For a finite set of constraints \(\Phi \), \(\left[\![\Phi \right]\!]=\bigcup\nolimits _{\phi \in \Phi }\left[\![\phi \right]\!]\).
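Deciding whether a single ground term belongs to \(\left[\![\phi \right]\!]\) (as in the test \(X_{ init}(0)\in \left[\![\phi \right]\!]\) used by the algorithm in Section 6) is straightforward when \({ Var}\left( \theta \right) \subseteq { Var}\left( \sigma \right) \): matching \(\sigma \) against the ground term fixes the valuation. The sketch below assumes that inclusion and reuses the atom encoding ("lt", x, c, y) for \(x<_{c}y\) and ("eq", x, y) for \(x=y\); the function names are hypothetical.

```python
def holds(val, atom):
    """Evaluate one atom: ("lt", x, c, y) is x + c < y, ("eq", x, y) is x = y."""
    if atom[0] == "lt":
        _, x, c, y = atom
        return val[x] + c < val[y]
    _, x, y = atom
    return val[x] == val[y]


def member(ground_term, constraint):
    """Decide whether ground_term is in [[sigma : theta]].

    Assumes Var(theta) is a subset of Var(sigma). A variable repeated in
    sigma must receive the same value at each of its occurrences.
    """
    (sym, args), theta = constraint
    gsym, values = ground_term
    if sym != gsym or len(args) != len(values):
        return False
    val = {}
    for x, c in zip(args, values):
        if val.setdefault(x, c) != c:
            return False
    return all(holds(val, atom) for atom in theta)


print(member(("Xinit", (0,)), (("Xinit", ("x",)), [])))  # True
```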

Without loss of generality, we can assume that \({ Var}\left( \theta \right) ={ Var}\left( \sigma \right) \), and that \(\theta \) is consistent (constraints with inconsistent conditions characterize empty sets of ground terms, and can therefore be safely discarded from the reachability analysis). A term \(X(x_1,\ldots ,x_n)\) is said to be pure if \(x_i\ne x_j\) whenever \(i\ne j\). A constraint \({\sigma }:{\theta }\) is said to be pure if \(\sigma \) is pure. We can assume without loss of generality that all constraints are pure. The reason is that if a variable \(x\) occurs (say) twice in \(\sigma \), then the two occurrences of \(x\) can be replaced by two different variables \(y_1\) and \(y_2\) provided that we add a new conjunct \(y_1=y_2\) to the condition \(\theta \). For constraints \(\phi _1,\phi _2\), we use \(\phi _1\sqsubseteq \phi _2\) to denote that \(\phi _1\) subsumes \(\phi _2\), i.e., \(\left[\![\phi _1\right]\!]\supseteq \left[\![\phi _2\right]\!]\). Then, it is easy to see that checking whether \(\phi _1\sqsubseteq \phi _2\) can be reduced to the satisfiability problem for an existential Presburger formula (which is known to be NP-complete [34]).
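The purification step can be sketched as follows. It keeps the first occurrence of each variable and ties every later copy back to it with an equality, which is equivalent to introducing two fresh variables \(y_1,y_2\) as described in the text. The function name and the fresh-name scheme x_1, x_2, ... are hypothetical, and fresh names are assumed not to clash with existing variables.

```python
def purify(term, theta):
    """Return an equivalent pure constraint (term', theta').

    theta is a list of atoms ("lt", x, c, y) / ("eq", x, y). Each later
    occurrence of a variable x is replaced by a fresh x_1, x_2, ... and the
    conjunct x = x_i is added.
    """
    sym, args = term
    seen = {}                     # variable -> number of extra occurrences
    new_args, new_theta = [], list(theta)
    for x in args:
        if x not in seen:
            seen[x] = 0
            new_args.append(x)
        else:
            seen[x] += 1
            fresh = f"{x}_{seen[x]}"
            new_args.append(fresh)
            new_theta.append(("eq", x, fresh))
    return (sym, tuple(new_args)), new_theta


print(purify(("X", ("x", "y", "x")), [("lt", "y", 0, "x")]))
```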

Lemma 4.

For constraints \(\phi _1,\phi _2\), the problem of checking whether \(\phi _1\sqsubseteq \phi _2\) is decidable.
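For conditions built only from \(x<_{c}y\) and \(x=y\), every conjunct is a difference constraint: \(x<_{c}y\) is \(y-x\ge c+1\), and \(x=y\) is \(y-x\ge 0\wedge x-y\ge 0\). The sketch below decides \(\phi _1\sqsubseteq \phi _2\) with an all-pairs longest-path computation (Floyd-Warshall) instead of the Presburger reduction mentioned above; this substitution is ours, not the authors' procedure. It assumes pure constraints with \({ Var}\left( \theta \right) ={ Var}\left( \sigma \right) \). Then \(\phi _1\sqsubseteq \phi _2\) holds iff \(\left[\![\phi _2\right]\!]\) is empty, or both constraints have the same symbol and \(\theta _2\) entails \(\theta _1\) after renaming. For a satisfiable \(\theta _2\), the tightest bound \(y-x\ge d\) it implies is the weight of a longest path from \(x\) to \(y\), and \(\theta _2\) is unsatisfiable exactly when there is a positive-weight cycle. Shifting all values by a constant preserves differences, so reasoning over \(\mathbb{N }\) and over the integers coincides here.

```python
NEG_INF = float("-inf")


def longest_paths(variables, theta):
    """Tightest lower bounds d[x][y] on y - x implied by theta.

    ("lt", x, c, y) gives y - x >= c + 1; ("eq", x, y) gives y - x >= 0 and
    x - y >= 0. Floyd-Warshall on longest paths; returns None when a
    positive cycle shows that theta is unsatisfiable.
    """
    d = {x: {y: (0 if x == y else NEG_INF) for y in variables} for x in variables}
    for atom in theta:
        if atom[0] == "lt":
            _, x, c, y = atom
            d[x][y] = max(d[x][y], c + 1)
        else:
            _, x, y = atom
            d[x][y] = max(d[x][y], 0)
            d[y][x] = max(d[y][x], 0)
    for k in variables:
        for i in variables:
            for j in variables:
                if d[i][k] + d[k][j] > d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    if any(d[x][x] > 0 for x in variables):
        return None
    return d


def subsumes(phi1, phi2):
    """phi1 ⊑ phi2, i.e. [[phi1]] ⊇ [[phi2]], for pure constraints
    ((symbol, vars), theta) whose conditions only mention those vars."""
    (sym1, xs), theta1 = phi1
    (sym2, ys), theta2 = phi2
    d = longest_paths(ys, theta2)
    if d is None:                 # [[phi2]] is empty: trivially subsumed
        return True
    if sym1 != sym2:
        return False
    ren = dict(zip(xs, ys))       # position-wise renaming of phi1 into phi2
    for atom in theta1:
        if atom[0] == "lt":
            _, x, c, y = atom
            if d[ren[x]][ren[y]] < c + 1:
                return False
        else:
            _, x, y = atom
            if d[ren[x]][ren[y]] < 0 or d[ren[y]][ren[x]] < 0:
                return False
    return True
```

For instance, \(X(a,b):a<_0b\) subsumes \(X(x,y):x<_3y\), but not conversely.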

The following lemma states that we can transform any constraint \(\phi \) of the form \({\sigma }:{\theta }\) to an equivalent constraint \( clean (\phi )\) of the form \({\sigma }:{\theta }'\) such that \({ Var}\left( \theta '\right) ={ Var}\left( \sigma \right) \) (i.e., we remove the extra variables \(({ Var}\left( \theta \right) \setminus { Var}\left( \sigma \right) )\) from \(\theta \) in order to satisfy the assumption that \({ Var}\left( \theta \right) ={ Var}\left( \sigma \right) \)).

Lemma 5.

[31] Given a constraint \(\phi \) of the form \({\sigma }:{\theta }\), we can construct a constraint \( clean (\phi )\) of the form \({\sigma }:{\theta }'\) such that \({ Var}\left( \theta '\right) ={ Var}\left( \sigma \right) \) and \(\left[\![ clean (\phi )\right]\!]=\left[\![\phi \right]\!]\).

Given two terms \(\sigma _1\) and \(\sigma _2\), we say that \(\sigma _1\) matches \(\sigma _2\) iff \({ Sym}\left( \sigma _1\right) = { Sym}\left( \sigma _2\right) \). For matching terms \(\sigma _1=X(x_1,\ldots ,x_n)\) and \(\sigma _2=X(y_1,\ldots ,y_n)\), where \(\sigma _2\) is pure, we define \({ Ren}^{\sigma _2}_{\sigma _1}\) to be a renaming such that \({ Ren}^{\sigma _2}_{\sigma _1}(y_i)=x_i\) for all \(i:1\le i\le n\). Consider a production \(p={\sigma }\leadsto {\sigma _1\cdots \sigma _n}\;:\;{\theta }\) and constraints \(\phi _1={\sigma '_1}:{\theta _1},\ldots , \phi _n={\sigma '_n}:{\theta _n}\) such that \(\sigma _i\) and \(\sigma '_i\) are matching, and such that \(\sigma '_i\) is pure for all \(i:1\le i\le n\). We define \(p\otimes \phi _1\otimes \cdots \otimes \phi _n\) to be the constraint \({\sigma }:{\theta \wedge { Ren}_{\sigma _1}^{\sigma '_1}(\theta _1) \wedge \cdots \wedge { Ren}_{\sigma _n}^{\sigma '_n}(\theta _n)}\). For a set \(\Phi \) of constraints, and a production \(p\in P\), we define \({ Pre}_{p}\left( {\Phi }\right) := \left\{ clean (\phi ')|\, {\exists \phi _1,\ldots ,\phi _n\in \Phi .\, \phi '=p\otimes \phi _1\otimes \cdots \otimes \phi _n}\right\} \). We define \({ Pre}\left( {\Phi }\right) :=\cup _{p\in P}{ Pre}_{p}\left( {\Phi }\right) \). Intuitively, \({ Pre}\left( {\Phi }\right) \) defines a finite set of constraints that characterize the terms which can, through a single application of a production, generate a set of terms each of which belongs to \(\left[\![\Phi \right]\!]\).
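The product \(p\otimes \phi _1\otimes \cdots \otimes \phi _n\) can be sketched directly from the definition. The encoding (terms as (symbol, variable-tuple) pairs, atoms ("lt", x, c, y) and ("eq", x, y)) and the function names are assumptions of the sketch. The right-hand side is kept as an ordered list only so that the \(i\)-th constraint can be paired with \(\sigma _i\); \( clean \) is not applied, so the result may still mention variables of the production that do not occur in \(\sigma \).

```python
def rename_atom(ren, atom):
    """Apply a renaming (dict) to one atom of a condition."""
    if atom[0] == "lt":
        _, x, c, y = atom
        return ("lt", ren[x], c, ren[y])
    _, x, y = atom
    return ("eq", ren[x], ren[y])


def combine(prod, phis):
    """p ⊗ phi_1 ⊗ ... ⊗ phi_n for p = (sigma, [sigma_1..sigma_n], theta).

    phis[i] = (sigma'_i, theta_i) must match sigma_i; sigma'_i is assumed
    pure and Var(theta_i) a subset of Var(sigma'_i). Returns the constraint
    sigma : theta ∧ Ren(theta_1) ∧ ... ∧ Ren(theta_n) as (sigma, atoms).
    """
    sigma, rhs, theta = prod
    if len(rhs) != len(phis):
        raise ValueError("need one constraint per right-hand-side term")
    result = list(theta)
    for (sym, xs), ((sym2, ys), theta_i) in zip(rhs, phis):
        if sym != sym2:
            raise ValueError(f"{sym} does not match {sym2}")
        ren = dict(zip(ys, xs))   # Ren(y_j) = x_j
        result.extend(rename_atom(ren, a) for a in theta_i)
    return sigma, result
```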

Lemma 6.

\(\bigcup\nolimits _{\phi '\in { Pre}\left( {\Phi }\right) }\left[\![\phi '\right]\!]= { Pre}\left( {\left[\![\Phi \right]\!]}\right) \).

For the set \(\mathcal{T }\) of terminals, we define

$$\begin{aligned} \Phi _\mathcal{T }:=\left\{ a(x_1,\ldots ,x_n):\mathsf{true} \,|\, {a\in \mathcal{T },\ \rho \left( {a}\right) =n}\right\} \end{aligned}$$

Notice that \(\left[\![\Phi _\mathcal{T }\right]\!]=\mathcal{A _T}\), i.e., \(\Phi _\mathcal{T }\) characterizes exactly the set of ground terms whose symbols are in \(\mathcal{T }\).

6 Reachability Analysis

In this section, we present an algorithm for solving the reachability problem for Cfgds, and prove its partial correctness. The algorithm (Algorithm 1) takes as input a Cfgd \(\mathcal{G }=\left\langle {\mathcal{S },X_{ init},P}\right\rangle \) and answers the question whether we can reach, from \(X_{ init}(0)\), a sentence where all the occurring terms are in \(\mathcal{A _T}\) (i.e., terms with symbols in \(\mathcal{T }\)). The algorithm maintains two sets of constraints: a set \(\mathtt {ToExplore}\), initialized to \(\Phi _\mathcal{T }\), of constraints that have not yet been analyzed; and a set \(\mathtt {Explored}\), initialized to the empty set, of constraints that have already been analyzed.

The algorithm preserves the following four invariants:

  1. For each \(\sigma \in \left[\![\mathtt {ToExplore}\cup \mathtt {Explored}\right]\!]\), \(\sigma \overset{{*}}{\longrightarrow }{\alpha }\) for some \(\alpha \) s.t. \({ Sym}\left( \alpha \right) \subseteq \mathcal{T }\).

  2. If \(X_{ init}(0)\overset{{*}}{\longrightarrow }\alpha \) for some \(\alpha \) s.t. \({ Sym}\left( \alpha \right) \subseteq \mathcal{T }\), then there is a ground term \(\sigma \in \left[\![\mathtt {ToExplore}\right]\!]\) such that \(\sigma \not \in \left[\![\mathtt {Explored}\right]\!]\).

  3. \(X_{ init}(0)\not \in \left[\![\mathtt {Explored}\right]\!]\).

  4. \(\left[\![\Phi _\mathcal{T }\right]\!]\subseteq \left[\![\mathtt {ToExplore}\cup \mathtt {Explored}\right]\!]\).

[Algorithm 1: reachability analysis for Cfgds (listing not reproduced)]

It is easy to see that the third and fourth invariants are preserved. More precisely, for the third invariant, \(\mathtt {Explored}\) is initially empty, and the condition at line 5 prevents adding to \(\mathtt {Explored}\) any constraint \(\phi \) with \(X_{ init}(0)\in \left[\![\phi \right]\!]\). The fourth invariant holds initially since \(\mathtt{ToExplore}\cup \mathtt {Explored}={\Phi _\mathcal{T }\cup \emptyset }={\Phi _\mathcal{T }}\). This invariant is preserved since each time we remove a constraint \(\phi \) from \(\mathtt {ToExplore}\) (line 4), either it is eventually moved to \(\mathtt {Explored}\) (line 9), or (in case it is discarded at line 6) there is already a constraint \(\phi '\in \mathtt {Explored}\) with \(\left[\![\phi '\right]\!]\supseteq \left[\![\phi \right]\!]\). Also, each time we remove a constraint \(\phi '\) from \(\mathtt {Explored}\) (line 9), we add the constraint \(\phi \) to \(\mathtt {Explored}\) where \(\left[\![\phi \right]\!]\supseteq \left[\![\phi '\right]\!]\).

Below, we show that the first two invariants are also preserved. Initially, the first invariant holds since \({\left( \mathtt {ToExplore}\cup \mathtt {Explored}\right) }={\Phi _\mathcal{T }}\), and every ground term \(\sigma \) with \({ Sym}\left( \sigma \right) \in \mathcal{T }\) satisfies it with \(\alpha =\{\sigma \}\), because \(\overset{{*}}{\longrightarrow }\) is reflexive. The second invariant also holds initially since \(\mathtt {Explored}=\emptyset \) and \(\left[\![\mathtt {ToExplore}\right]\!]=\left[\![\Phi _\mathcal{T }\right]\!]\ne \emptyset \). Due to the first two invariants, the following two conditions can be checked during each step of the algorithm:

  • From the second invariant, if \(\mathtt {ToExplore}\) becomes empty then the algorithm terminates with a negative answer.

  • From the first invariant, if a constraint \(\phi \) is detected such that \(X_{ init}(0)\in \left[\![\phi \right]\!]\), then the algorithm terminates with a positive answer.

If neither of the two conditions is satisfied, the algorithm proceeds by picking and removing a constraint \(\phi \) from \(\mathtt {ToExplore}\). Two cases arise:

  • If there exists a constraint \(\phi '\in \mathtt {Explored}\) with \(\phi '\sqsubseteq \phi \), then we discard \(\phi \). The first invariant is preserved since this operation does not add any new elements to \(\left[\![\mathtt {ToExplore}\cup \mathtt {Explored}\right]\!]\). If \(X_{ init}(0)\overset{{*}}{\longrightarrow }\alpha \) for some \(\alpha \) s.t. \({ Sym}\left( \alpha \right) \subseteq \mathcal{T }\), then the second invariant and the fact that \(\left[\![\phi \right]\!]\subseteq \left[\![\mathtt {Explored}\right]\!]\) imply that there is still some \(\sigma \in \left[\![\mathtt {ToExplore}\right]\!]\) such that \(\sigma \not \in \left[\![\mathtt {Explored}\right]\!]\). This means that the second invariant is also preserved by this step.

  • Otherwise, we compute the elements of \({ Pre}\left( \mathtt{Explored}\cup \left\{ \phi \right\} \right) \), add them to \(\mathtt {ToExplore}\), move \(\phi \) to \(\mathtt {Explored}\), and remove all constraints in \(\mathtt {Explored}\) that are subsumed by \(\phi \). Let \(\mathtt {Explored^{old}}\) and \(\mathtt {Explored^{new}}\) be the contents of the set \(\mathtt {Explored}\) before and after performing the operation, respectively. Define \(\mathtt {ToExplore^{old}}\) and \(\mathtt {ToExplore^{new}}\) analogously. The operation preserves the first invariant as follows. Pick any \(\sigma \in \left[\![\mathtt {ToExplore^{new}}\cup \mathtt {Explored^{new}}\right]\!]\). If \(\sigma \in \left[\![\mathtt {ToExplore^{old}}\cup \mathtt {Explored^{old}}\right]\!]\), then the result follows by the first invariant. Otherwise, we know that \(\sigma \in \left[\![{ Pre}\left( \mathtt{Explored^{old}}\cup \left\{ \phi \right\} \right) \right]\!]\), i.e., \(\sigma \overset{{}}{\longrightarrow }_{\mathcal{G }}\alpha \) where \(\alpha \subseteq \left[\![\mathtt {Explored^{old}}\cup \left\{ \phi \right\} \right]\!]\) (see Lemma 6). By the induction hypothesis and the first invariant, we know that for every ground term \(\sigma '\in \alpha \), \(\sigma '\overset{{*}}{\longrightarrow }_{\mathcal{G }} \alpha '\) for some \( \alpha '\) s.t. \({ Sym}\left( \alpha '\right) \subseteq \mathcal{T }\). Hence \(\alpha \overset{{*}}{\longrightarrow }_{\mathcal{G }}\alpha ''\) for some \( \alpha ''\) s.t. \({ Sym}\left( \alpha ''\right) \subseteq \mathcal{T }\) (see Lemma 1). In other words, \(\sigma \overset{{}}{\longrightarrow }_{\mathcal{G }}\alpha \overset{{*}}{\longrightarrow }_{\mathcal{G }}\alpha ''\) s.t. \({ Sym}\left( \alpha ''\right) \subseteq \mathcal{T }\). The operation also preserves the second invariant as follows. Assume that \(X_{ init}(0)\overset{{*}}{\longrightarrow }_{\mathcal{G }}\alpha \) for some \(\alpha \) s.t. \({ Sym}\left( \alpha \right) \subseteq \mathcal{T }\). There are two cases.
If there is a \(\sigma \in \left[\![\Phi _\mathcal{T }\right]\!]\) such that \(\sigma \not \in \left[\![\mathtt {Explored^{new}}\right]\!]\), then by the fourth invariant \(\sigma \in \left[\![\mathtt {ToExplore^{new}}\right]\!]\), and the invariant holds immediately. Otherwise, \(\left[\![\Phi _\mathcal{T }\right]\!]\subseteq \left[\![\mathtt {Explored^{new}}\right]\!]\). Since \(X_{ init}(0)\overset{{*}}{\longrightarrow }_{\mathcal{G }}\alpha \), we also have \(X_{ init}(0) \in { Pre}^*\left( {\left[\![\mathtt {Explored^{new}}\right]\!]}\right) \). By the third invariant, we know that \(X_{ init}(0)\not \in \left[\![\mathtt {Explored^{new}}\right]\!]\). By Lemma 2, there is a ground term \(\sigma \in ({ Pre}\left( {\left[\![\mathtt {Explored^{new}}\right]\!]}\right) \setminus \left[\![\mathtt {Explored^{new}}\right]\!])\). Since \(\left[\![\mathtt {Explored^{new}}\right]\!]=\left[\![\mathtt {Explored^{old}}\cup \left\{ \phi \right\} \right]\!]\), it follows that \(\sigma \in \left[\![{ Pre}\left( \mathtt{Explored^{old}}\cup \left\{ \phi \right\} \right) \right]\!]\) and hence \(\sigma \in \left[\![\mathtt {ToExplore^{new}}\right]\!]\).

This gives us the following theorem.

Theorem 7.

Assuming that it terminates, Algorithm 1 always returns the correct answer.
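The two cases in the proof of Theorem 7 describe a single iteration of Algorithm 1. As an illustration only, the following Python sketch renders such a worklist loop with subsumption; it is not the authors' pseudocode. Constraints are treated as opaque values, and the callables `pre` (computing Pre of a finite set of constraints), `subsumes` (deciding \(\phi '\sqsubseteq \phi \)) and `covers_init` (deciding whether \(X_{ init}(0)\) belongs to the denotation of a constraint) are hypothetical parameters that a concrete implementation would have to supply. The initial worklist plays the role of \(\Phi _\mathcal{T }\), and the stopping rule is an assumption of this sketch.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Constraint = Hashable  # opaque representation of a constraint phi


def backward_reach(
    terminal_constraints: Iterable[Constraint],
    pre: Callable[[List[Constraint]], Iterable[Constraint]],
    subsumes: Callable[[Constraint, Constraint], bool],
    covers_init: Callable[[Constraint], bool],
) -> bool:
    """Hypothetical worklist loop with subsumption.

    subsumes(a, b) should hold when a ⊑ b, i.e. [[a]] ⊇ [[b]].
    """
    to_explore: List[Constraint] = list(terminal_constraints)
    explored: List[Constraint] = []
    while to_explore:
        phi = to_explore.pop()
        if covers_init(phi):  # assumed stopping rule: X_init(0) is in [[phi]]
            return True
        # Case 1: an explored constraint already subsumes phi; discard phi.
        if any(subsumes(old, phi) for old in explored):
            continue
        # Case 2: add Pre(Explored ∪ {phi}) to the worklist, then move phi
        # to Explored and drop the explored constraints that phi subsumes.
        to_explore.extend(pre(explored + [phi]))
        explored = [old for old in explored if not subsumes(phi, old)]
        explored.append(phi)
    return False


def make_toy_pre(productions: Dict[str, List[List[str]]]):
    """Data-free toy Pre: a constraint is a plain symbol, so [[phi]] = {phi}."""
    def pre(symbols: List[Constraint]) -> List[Constraint]:
        have = set(symbols)
        return [lhs for lhs, rhss in productions.items()
                if any(all(s in have for s in rhs) for rhs in rhss)]
    return pre


grammar = {"S": [["A", "B"]], "A": [["a"]], "B": [["B", "b"], ["b"]]}
print(backward_reach(["a", "b"], make_toy_pre(grammar),
                     lambda x, y: x == y, lambda c: c == "S"))  # True
```

The toy instance has no data, so each constraint denotes a single symbol and subsumption is equality; the loop then reduces to the classical computation of productive nonterminals. Here `S` is productive because `B` can derive `b`.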

7 Termination

In this section, we show that Algorithm 1 is guaranteed to terminate. To do that, we first recall some basics of the theory of well and better quasi-orderings. Then, we introduce a new class of constraints that we call flat constraints and show that they are better quasi-ordered. We show that each condition can be translated into a finite set of flat constraints, and we use this to show that the set of conditions is well quasi-ordered under \(\sqsubseteq \) (i.e., under inclusion of denotations). This leads to the well quasi-ordering of the set of constraints (of Section 5). Finally, we show the termination of the algorithm.

Wqos and Bqos. A Quasi-Ordering (or Qo for short) is a pair \(\left\langle {A,\preceq }\right\rangle \) where \(\preceq \) is a reflexive and transitive binary relation on the set \(A\). A Qo \(\left\langle {A,\preceq }\right\rangle \) is a Well Quasi-Ordering (Wqo) if for each infinite sequence \(a_1,a_2,a_3,\dots \) of elements of \(A\), there are \(i<j\) such that \(a_i\preceq a_j\). The following lemma follows from the definition of a Wqo.

Lemma 8.

For Qos \(\preceq \) and \(\preceq '\) on some set \(A\), if \(\preceq \subseteq \preceq '\) and \(\preceq \) is a Wqo then \(\preceq '\) is a Wqo.
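For intuition, the Wqo condition can be tested on a finite prefix of a sequence. The helper below (a hypothetical name, written for illustration) returns the first pair of 0-based indices \(i<j\) with \(a_i\preceq a_j\), or `None` if the prefix contains no such pair. The example uses the componentwise order on \(\mathbb{N }^2\), which Lemma 9 below shows to be a Bqo and hence a Wqo.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_good_pair(seq: Sequence[T],
                    leq: Callable[[T, T], bool]) -> Optional[Tuple[int, int]]:
    """Return the first (i, j) with i < j and seq[i] ⪯ seq[j] (0-based), else None."""
    for j in range(len(seq)):
        for i in range(j):
            if leq(seq[i], seq[j]):
                return (i, j)
    return None


def pair_leq(p: Tuple[int, int], q: Tuple[int, int]) -> bool:
    """Componentwise order on N x N."""
    return p[0] <= q[0] and p[1] <= q[1]


print(first_good_pair([(3, 0), (2, 1), (1, 2), (0, 3), (1, 3)], pair_leq))  # (2, 4)
```

The first four elements form a finite bad prefix; the Wqo property only guarantees that no infinite continuation can avoid a good pair forever.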

Given a Qo \(\left\langle {A,\preceq }\right\rangle \), we define a Qo \(\left\langle {{A}^*,\preceq ^*}\right\rangle \) on the set of words \({A}^*\) such that \(a_1a_2\cdots a_m\preceq ^* a'_1a'_2\cdots a'_n\) if there is an injection \(h:\left\{ 1,\ldots ,m\right\} \mapsto \left\{ 1,\ldots ,n\right\} \) such that \(i<j\) implies \(h(i)<h(j)\) for all \(i,j:1\le i,j\le m\), and \(a_i\preceq a'_{h(i)}\) for each \(i:1\le i\le m\). We define the relation \(\preceq ^\mathcal{P }\) on the set \(\mathcal{P }\left( {A}\right) \) of finite subsets of \(A\), so that \(A_1\preceq ^\mathcal{P }A_2\) if \(\forall a_2\in A_2.\exists a_1\in A_1. a_1\preceq a_2\).

We define the relation \(\preceq ^{p}\) on the Cartesian product \(A_1\times \ldots \times A_n\) of Qos \(\left\langle {A_i,\le _i}\right\rangle \) for \(i:1,\ldots ,n\), so that \(\left\langle {a_1,\ldots ,a_n}\right\rangle \preceq ^{p}\left\langle {a_1',\ldots ,a_n'}\right\rangle \) if \(a_i\le _i a_i'\) for \(i:1,\ldots ,n\).
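The three derived orderings \(\preceq ^*\), \(\preceq ^\mathcal{P }\), and \(\preceq ^{p}\) can be checked mechanically on finite inputs. The following Python sketch (function names are illustrative, not from the paper) takes each base ordering as a predicate `leq(a, b)` meaning \(a\preceq b\). For \(\preceq ^*\), it maps each letter greedily to the earliest remaining position that is above it; greedy matching suffices, since any order-preserving injection can be shifted left position by position onto the greedy one.

```python
from typing import Callable, Collection, Sequence, Tuple, TypeVar

T = TypeVar("T")
Leq = Callable[[T, T], bool]


def word_leq(u: Sequence[T], v: Sequence[T], leq: Leq) -> bool:
    """u ⪯* v: some strictly increasing h has u[i] ⪯ v[h(i)] for every i."""
    j = 0
    for a in u:
        # Map a to the earliest remaining position of v that is above it.
        while j < len(v) and not leq(a, v[j]):
            j += 1
        if j == len(v):
            return False
        j += 1
    return True


def powerset_leq(a1: Collection[T], a2: Collection[T], leq: Leq) -> bool:
    """A1 ⪯^P A2: every element of A2 is above some element of A1."""
    return all(any(leq(x, y) for x in a1) for y in a2)


def product_leq(t1: Tuple, t2: Tuple, leqs: Sequence[Callable]) -> bool:
    """Componentwise ordering ⪯^p, with one ordering per coordinate."""
    return len(t1) == len(t2) == len(leqs) and all(
        leq(x, y) for leq, x, y in zip(leqs, t1, t2))
```

For example, with \(\le \) on naturals, the word \(1\,3\) embeds into \(2\,0\,5\) (mapping \(1\mapsto 2\) and \(3\mapsto 5\)), whereas \(3\,1\) does not embed into \(2\,5\,0\).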

In the following lemma, we state some properties of Bqos [10, 28].

Lemma 9.

  • Each Bqo is Wqo.

  • If \(A\) is finite, then \(\left\langle {A,=}\right\rangle \) is a Bqo, and \(\left\langle {\mathcal{P }\left( {A}\right) ,\subseteq }\right\rangle \) is a Bqo.

  • \(\left\langle {\mathbb{N },\le }\right\rangle \) is a Bqo.

  • If \(\left\langle {A_i,\le _i}\right\rangle \) is a Bqo for \(i:1,\ldots ,n\) then \(\left\langle {A_1\times \ldots \times A_n,\preceq ^{p}}\right\rangle \) is a Bqo.

  • If \(\left\langle {A,\preceq }\right\rangle \) is a Bqo, then \(\left\langle {\mathcal{P }\left( {A}\right) ,\preceq ^\mathcal{P }}\right\rangle \) is a Bqo.

Flat Constraints. Fix a set \(\mathcal{V }=\left\{ x_1,\ldots ,x_n\right\} \) of variables. A flat constraint \(\psi \) over \(\mathcal{V }\) is of the form \(A_0c_1A_1\cdots c_mA_m\), where \(c_1,\ldots ,c_m\in \mathbb{N }\), and \(A_0,A_1,\ldots ,A_m\) is a partitioning of \(\mathcal{V }\), i.e., \(\mathcal{V }=A_0\cup A_1\cup \cdots \cup A_m\), \(A_i\ne \emptyset \), and \(A_i\cap A_j=\emptyset \) if \(i\ne j\). In other words, a flat constraint is a word that alternately contains sets of variables and natural numbers, starting and ending with a set of variables. The flat constraint \(\psi \) characterizes an infinite set \(\left[\![\psi \right]\!]\) of vectors over \(\mathbb{N }\) of length \(n\), i.e., \(\left[\![\psi \right]\!]\subseteq \mathbb{N }^n\). More precisely, define \(h_\psi :\left\{ 1,\ldots ,n\right\} \mapsto \left\{ 0,\ldots ,m\right\} \) such that \(h_\psi (i)=k\) if \(x_i\in A_k\). Then \(v=\left\langle {d_1,\ldots ,d_n}\right\rangle \in \left[\![\psi \right]\!]\) iff the following conditions are satisfied for all \(i,j:1\le i,j\le n\):

  • \(d_i=d_j\) if \(h_\psi (i)=h_\psi (j)\).

  • If \(h_\psi (i)=k\) and \(h_\psi (j)=k+1\), then \(c_{k+1}<d_j-d_i\).

In other words, the variable \(x_i\) takes the value \(d_i\) in \(\psi \). If two variables belong to the same set, then their values must be identical. Furthermore, the natural numbers \(c_i\) are strict lower bounds on the differences between the values of variables in adjacent sets. For flat constraints \(\psi =A_0c_1A_1\cdots c_mA_m\) and \(\psi '=A'_0c'_1A'_1\cdots c'_mA'_m\) over \(\mathcal{V }\), we write \(\psi \preceq \psi '\) to denote that (i) \(A'_i=A_i\) for all \(i:0\le i\le m\), and (ii) \(c_i\le c'_i\) for all \(i:1\le i\le m\). The following lemma follows from the definitions.

Lemma 10.

\(\psi \preceq \psi '\) implies that \(\left[\![\psi \right]\!]\supseteq \left[\![\psi '\right]\!]\).
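To make the semantics of flat constraints concrete, the following Python sketch represents \(A_0c_1A_1\cdots c_mA_m\) as a pair (list of blocks, tuple of gaps) and checks membership in \(\left[\![\psi \right]\!]\) and the ordering \(\preceq \). The representation and the names `satisfies` and `flat_leq` are assumptions made for illustration. The usage lines illustrate Lemma 10: raising a gap can only shrink the denotation.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A flat constraint A_0 c_1 A_1 ... c_m A_m as (blocks, gaps), where
# blocks = [A_0, ..., A_m] (frozensets of variable names), gaps = (c_1, ..., c_m).
Flat = Tuple[Sequence[FrozenSet[str]], Tuple[int, ...]]


def satisfies(psi: Flat, val: Dict[str, int]) -> bool:
    """Check whether the vector given by val belongs to [[psi]]."""
    blocks, gaps = psi
    reps: List[int] = []
    for block in blocks:
        values = {val[x] for x in block}
        if len(values) != 1:  # variables in the same block must be equal
            return False
        reps.append(values.pop())
    # c_{k+1} < d_j - d_i whenever x_i is in A_k and x_j is in A_{k+1}
    return all(c < reps[k + 1] - reps[k] for k, c in enumerate(gaps))


def flat_leq(psi: Flat, psi2: Flat) -> bool:
    """psi ⪯ psi2: identical blocks and pointwise smaller or equal gaps."""
    (b1, g1), (b2, g2) = psi, psi2
    return list(b1) == list(b2) and len(g1) == len(g2) and all(
        c <= d for c, d in zip(g1, g2))


psi = ([frozenset({"x"}), frozenset({"y", "z"})], (2,))   # {x} 2 {y,z}
psi2 = ([frozenset({"x"}), frozenset({"y", "z"})], (3,))  # {x} 3 {y,z}
v = {"x": 0, "y": 3, "z": 3}
print(flat_leq(psi, psi2), satisfies(psi, v), satisfies(psi2, v))  # True True False
```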

From Lemma 9, we obtain the following.

Lemma 11.

\(\preceq \) is a Bqo on the set of flat constraints.

Proof.

We first observe that flat constraints can be viewed as tuples with at most \(K=|\mathcal{V }|\) sets of variables and at most \(|\mathcal{V }|-1\) constants, and we can always append finite sequences such as \(0\emptyset 0\ldots 0\emptyset \) so that we only need to consider tuples with exactly \(K\) sets. From Lemma 9, we know that \(\left\langle {\mathbb{N },\le }\right\rangle \) and \(\left\langle {\mathcal{P }\left( {\mathcal{V }}\right) , =}\right\rangle \) are Bqos. Thus, the Cartesian product \((\mathcal{P }\left( {\mathcal{V }}\right) \times \mathbb{N })^{K-1}\times \mathcal{P }\left( {\mathcal{V }}\right) \) with \(\preceq ^{p}\) is also a Bqo, and on such padded tuples \(\preceq ^{p}\) coincides with \(\preceq \).

Flattening. Consider a condition \(\theta \) with \({ Var}\left( \theta \right) =\left\{ x_1,\ldots ,x_n\right\} \) (recall the definitions of conditions and constraints from Section 5). We define \(\left[\![\theta \right]\!]\) to be the set of vectors \(v=\left\langle {d_1,\ldots ,d_n}\right\rangle \in \mathbb{N }^n\), such that there is a valuation \({ Val}\) with \({ Val}\models \theta \) and \({ Val}\left( {x_i}\right) =d_i\) for all \(i:1\le i\le n\). Furthermore, for two conditions on the same set of variables we define \(\theta \sqsubseteq \theta '\) iff \(\left[\![\theta \right]\!]\supseteq \left[\![\theta '\right]\!]\). A flattening of \(\theta \) is a flat constraint \(\psi \) over \({ Var}\left( \theta \right) \), of the form \(A_0c_1A_1\cdots c_mA_m\) where \(c_1,\ldots ,c_m\ge 0\) are minimal natural numbers such that the following conditions are satisfied:

  • If \((x=y)\in \theta \), then \(x,y\in A_i\) for some \(i:0\le i\le m\).

  • If \((x<_cy)\in \theta \), \(x\in A_i\), and \(y\in A_j\) then \(c\le \left( \sum\nolimits _{k=i+1}^j(c_k+1)-1\right) \).

Intuitively, variables that are required to be equal by \(\theta \) are put in the same \(A_i\). Also, variables that are ordered according to \(\theta \) are placed sufficiently far apart to cover the corresponding gap. We define \(\mathcal{F }\left( \theta \right) \) to be the set of flattenings of \(\theta \). In general, a condition induces a partial order between variables, and its flattenings are the linearizations of this order with minimal gaps (constants) between variables. Notice that this set is finite. As an example, consider the condition \(x<_2y,x<_1z\). Since there are no constraints relating \(y\) and \(z\), we have three different flattenings, where \(y<z\), \(y=z\), or \(y>z\), namely \(\{x\}2\{y\}0\{z\}\), \(\{x\}2\{y,z\}\), and \(\{x\}1\{z\}0\{y\}\).

We define an ordering \(\preceq \) on conditions such that \(\theta \preceq \theta '\) if for each \(\psi '\in \mathcal{F }\left( \theta '\right) \) there is a \(\psi \in \mathcal{F }\left( \theta \right) \) with \(\psi \preceq \psi '\). From Lemma 10 we get the following.

Lemma 12.

\(\theta \preceq \theta '\) implies that \(\left[\![\theta \right]\!]\supseteq \left[\![\theta '\right]\!]\).
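The flattening construction and the ordering \(\preceq \) on conditions can be prototyped by brute force. In the Python sketch below (all names hypothetical), a condition is given by a list of equalities \((x, y)\) and a list of gap constraints \((x, c, y)\) encoding \(x<_cy\). The sketch enumerates all ordered partitions of the variables, rejects those that separate equal variables, and keeps the gap vectors that satisfy the definition of a flattening and are pointwise minimal; reading "minimal" as pointwise minimality is an assumption. Gaps are searched in \(0,\ldots ,c_{max}\), where \(c_{max}\) is the largest constant in the condition; this suffices because lowering any larger gap to \(c_{max}\) keeps every constraint satisfied.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterator, List, Sequence, Tuple

Gap = Tuple[str, int, str]  # (x, c, y) encodes x <_c y
Flat = Tuple[Tuple[FrozenSet[str], ...], Tuple[int, ...]]


def ordered_partitions(vs: Sequence[str]) -> Iterator[List[FrozenSet[str]]]:
    """Enumerate every ordered partition of vs into non-empty blocks."""
    if not vs:
        yield []
        return
    first, rest = vs[0], vs[1:]
    for part in ordered_partitions(rest):
        for i in range(len(part)):  # put `first` into an existing block
            yield part[:i] + [part[i] | {first}] + part[i + 1:]
        for i in range(len(part) + 1):  # or into a new singleton block
            yield part[:i] + [frozenset({first})] + part[i:]


def gaps_ok(cs: Tuple[int, ...], pos: Dict[str, int], gaps: Sequence[Gap]) -> bool:
    """c <= sum_{k=i+1}^{j} (c_k + 1) - 1 for each x <_c y, x in A_i, y in A_j."""
    # cs[k - 1] stores c_k; an empty sum (j <= i) makes the constraint fail.
    return all(c <= sum(cs[k] + 1 for k in range(pos[x], pos[y])) - 1
               for x, c, y in gaps)


def flattenings(variables, eqs, gaps) -> List[Flat]:
    """Brute-force F(theta) for theta given by equalities and gap constraints."""
    bound = max((c for _, c, _ in gaps), default=0)
    result: List[Flat] = []
    for blocks in ordered_partitions(sorted(variables)):
        pos = {v: k for k, block in enumerate(blocks) for v in block}
        if any(pos[x] != pos[y] for x, y in eqs):  # equal variables share a block
            continue
        sols = [cs for cs in product(range(bound + 1), repeat=len(blocks) - 1)
                if gaps_ok(cs, pos, gaps)]
        for cs in sols:  # keep only pointwise-minimal gap vectors
            if not any(o != cs and all(a <= b for a, b in zip(o, cs)) for o in sols):
                result.append((tuple(blocks), cs))
    return result


def cond_leq(f1: List[Flat], f2: List[Flat]) -> bool:
    """theta ⪯ theta': each flattening of theta' is above one of theta."""
    return all(any(b1 == b2 and all(c <= d for c, d in zip(g1, g2))
                   for b1, g1 in f1) for b2, g2 in f2)


f = flattenings({"x", "y", "z"}, [], [("x", 2, "y"), ("x", 1, "z")])
for blocks, cs in f:
    print([sorted(b) for b in blocks], cs)
```

On the running example \(x<_2y,x<_1z\), the sketch returns exactly the three flattenings listed above, in some enumeration order.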

The following lemma follows from Lemma 9 and Lemma 11.

Lemma 13.

\(\preceq \) is a Bqo (and hence a Wqo) on the set of conditions.

From Lemma 13, Lemma 12, and Lemma 8 we get the following lemma.

Lemma 14.

The set of conditions is Wqo under \(\sqsubseteq \).

The following lemma then holds.

Lemma 15.

The set of constraints is Wqo under \(\sqsubseteq \).

Proof.

Consider an infinite sequence of constraints \(\phi _1,\phi _2,\phi _3,\ldots \). Since the set \(\mathcal{N }\cup \mathcal{T }\) is finite, there is an infinite sequence \(i_1<i_2<i_3<\cdots \) such that \({ Sym}\left( \phi _{i_1}\right) ={ Sym}\left( \phi _{i_2}\right) ={ Sym}\left( \phi _{i_3}\right) =\cdots \). If \({ Sym}\left( \phi _{i_j}\right) \in \mathcal{T }\), then the result follows immediately (since \(\left[\![\phi _{i_j}\right]\!]=\left\{ { Sym}\left( \phi _{i_j}\right) \right\} \) for all \(j\ge 1\)). Otherwise, we can assume, without loss of generality, that \(\phi _{i_j}\) is of the form \({X(x_1,\ldots ,x_n)}:{\theta _{i_j}}\). Notice that each \(\theta _{i_j}\) is a condition over \(\left\{ x_1,\ldots ,x_n\right\} \). By Lemma 14, there are \(j<k\) such that \(\theta _{i_j}\sqsubseteq \theta _{i_k}\), and hence \(\phi _{i_j}\sqsubseteq \phi _{i_k}\).

Termination. The algorithm always terminates because only finitely many constraints can be added to \(\mathtt {Explored}\). This can be explained as follows. By definition, a new element \(\phi \) is added to \(\mathtt {Explored}\) only if \(\phi '\not \sqsubseteq \phi \) for each \(\phi ^\prime \) already added to \(\mathtt {Explored}\). This means that the constraints added to \(\mathtt {Explored}\) form a sequence \(\phi _1,\phi _2,\phi _3,\ldots \) such that \(\phi _i\not \sqsubseteq \phi _j\) for all \(i < j\). Since \(\sqsubseteq \) is a Wqo (Lemma 15), this sequence is finite. Each iteration either discards a constraint or moves one to \(\mathtt {Explored}\), and in the latter case it adds only finitely many constraints to \(\mathtt {ToExplore}\); hence the total number of iterations is finite. This gives the following theorem.

Theorem 16.

Algorithm 1 is guaranteed to terminate.

8 Translation

Reachability with Empty Stacks. We consider a different variant of Pdad-Reach, which we call Pdad-Reach-Empty. An instance of Pdad-Reach-Empty is defined by a Pdad \(\mathcal{A }=\left\langle {Q,q_{ init},A,\Delta }\right\rangle \) and a state \(q_{ target}\in Q\), and we are asked whether \(\gamma _{ init}\overset{{*}}{\longrightarrow }\gamma \) for some \(\gamma \) of the form \(\left\langle {q_{ target},{ Val},\epsilon }\right\rangle \), i.e., we ask whether we reach \(q_{ target}\) at a configuration where the stack is empty. Given an instance of Pdad-Reach, defined by a Pdad \(\mathcal{A }=\left\langle {Q,q_{ init},A,\Delta }\right\rangle \) and a state \(q_{ target}\in Q\), we derive an equivalent instance of Pdad-Reach-Empty, defined by a new Pdad \(\mathcal{A }'\) and the target state \(q_{ new}\), as follows. We construct \(\mathcal{A }'\) from \(\mathcal{A }\) by adding a new state \(q_{ new}\) to \(Q\) and a transition labeled with \({ nop}\) from \(q_{ target}\) to \(q_{ new}\). For each member \(a\in A\) of the stack alphabet, we add a self-loop on \(q_{ new}\) that pops \(a\) (with any value). The two problem instances are equivalent for the following reasons. Suppose that \(q_{ new}\) is reachable with an empty stack in \(\mathcal{A }'\). Then, the run of \(\mathcal{A }'\) reaching \(q_{ new}\) must have passed through \(q_{ target}\) (since \(q_{ new}\) can only be reached from \(q_{ target}\)). Since the run up to that point uses only transitions of \(\mathcal{A }\), \(q_{ target}\) is reachable in \(\mathcal{A }\). On the other hand, suppose that \(q_{ target}\) is reachable in \(\mathcal{A }\). Then, \(\mathcal{A }'\) can simulate the run of \(\mathcal{A }\) until it reaches \(q_{ target}\). From there, it takes the transition to \(q_{ new}\) and executes the self-loops, popping all the symbols in the stack until the stack becomes empty.
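The construction of \(\mathcal{A }'\) is purely syntactic. The Python sketch below illustrates it under a hypothetical encoding in which a transition is a triple (source, operation, target) and an operation is a tuple such as `("nop",)` or `("pop", a)`. Variables and data values are left out, since the added transitions neither test nor update them; the class `Pdad` and the function `to_empty_stack` are illustrative names only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical encoding: a transition is (source, operation, target), and an
# operation is a tuple such as ("nop",) or ("pop", a); data handling is elided.
Transition = Tuple[str, Tuple[str, ...], str]


@dataclass(frozen=True)
class Pdad:
    states: FrozenSet[str]
    init: str
    alphabet: FrozenSet[str]
    delta: FrozenSet[Transition]


def to_empty_stack(aut: Pdad, q_target: str, q_new: str = "q_new") -> Tuple[Pdad, str]:
    """Build A' and its target state for the Pdad-Reach-Empty instance."""
    if q_new in aut.states:
        raise ValueError("q_new must be a fresh state")
    extra = {(q_target, ("nop",), q_new)}                        # q_target --nop--> q_new
    extra |= {(q_new, ("pop", s), q_new) for s in aut.alphabet}  # drain the stack
    return Pdad(aut.states | {q_new}, aut.init, aut.alphabet, aut.delta | extra), q_new


aut = Pdad(frozenset({"q0", "q1"}), "q0", frozenset({"a", "b"}),
           frozenset({("q0", ("push", "a"), "q1")}))
aut2, target = to_empty_stack(aut, "q1")
print(target, len(aut2.delta))  # q_new 4
```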

From Pdad to Cfgd. Suppose that we are given an instance of Pdad-Reach-Empty defined by a Pdad \(\mathcal{A }=\left\langle {Q,q_{ init},A,\Delta }\right\rangle \) and a state \(q_{ target}\in Q\). Let \(\left\{ x_1,\ldots ,x_n\right\} \) be the set of variables that occur in \(\mathcal{A }\). We derive an equivalent instance of Cfgd-Reach defined by a Cfgd \(\mathcal{G }=\left\langle {\mathcal{S },X_{ init},P}\right\rangle \). The set \(\mathcal{T }\) of terminals of \(\mathcal{G }\) is the singleton set \(\left\{ t\right\} \), and we assume that the arity of \(t\) is \(0\) (i.e., \(\rho \left( {t}\right) =0\)). The set \(\mathcal{N }\) of nonterminals of \(\mathcal{G }\) is defined as follows: for each pair of states \(q_1,q_2\in Q\) and symbol \(a\in A\cup \{\bot \}\), with \(\bot \notin A\), we have a nonterminal \(X_{(q_1,a,q_2)}\in \mathcal{N }\) with arity \(2n+1\). The symbol \(\bot \) is used to denote that the stack of \(\mathcal{A }\) is empty. The set \(\mathcal{N }\) also contains the initial symbol \(X_{ init}\) (by definition).

In the following, let \(\bar{y}\) denote a vector \(\left\langle {y_1,\ldots ,y_n}\right\rangle \) of length \(n\), and define \(\bar{y}[i]:=y_i\) for \(i:1\le i\le n\). For vectors \(\bar{z}=\left\langle {z_1,\ldots ,z_n}\right\rangle \) and \(\bar{y}=\left\langle {y_1,\ldots ,y_n}\right\rangle \), we use \(\bar{z}=\bar{y}\) (resp. \(\bar{z}\ne _j \bar{y}\) for some \(j : 1 \le j \le n\)) to denote the condition \(\bigwedge\nolimits _{1\le i \le n} z_i= y_i\) (resp. \(\bigwedge\nolimits _{(1\le i \le n) \wedge (i \ne j)} z_i= y_i\)). Furthermore, for brevity, we sometimes shorten a conjunction of conditions \(\theta _1\wedge \ldots \wedge \theta _n\) into a list \(\theta _1,\ldots ,\theta _n\).

Intuitively, a non-terminal of the form \({X_{(q_1,a,q_3)} (\bar{y},\bar{z},\ell )}\) represents a run of \({\mathcal{A }}\) from a configuration where the state is \(q_1\), the topmost stack symbol is \(a\) and its corresponding value is given by the value \(\ell \) (if \(a=\bot \) then the stack is empty), and the valuation of the shared variables of \(\mathcal{A }\) is given by the valuation of \(\bar{y}\), to a configuration with a stack content where \(a\) has been popped and where the state is \(q_3\) and the valuation of the shared variables of \(\mathcal{A }\) is given by the valuation of \(\bar{z}\).
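As a small check on the size of this construction, the following Python sketch (names hypothetical) lists the nonterminals \(X_{(q_1,a,q_2)}\) together with their arity \(2n+1\), corresponding to the arguments \(\bar{y}\), \(\bar{z}\), and \(\ell \). There are \(|Q|^2\cdot (|A|+1)\) of them; the initial symbol \(X_{ init}\) is not generated here.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

BOTTOM = "⊥"  # marks an empty stack; assumed not to occur in the stack alphabet


def triple_nonterminals(states: Iterable[str], alphabet: Iterable[str],
                        n: int) -> Dict[Tuple[str, str, str], int]:
    """Map each X_(q1,a,q2) to its arity 2n+1 (arguments y_1..y_n, z_1..z_n, l)."""
    qs, syms = list(states), list(alphabet)
    if BOTTOM in syms:
        raise ValueError("the bottom marker must not be a stack symbol")
    return {(q1, a, q2): 2 * n + 1
            for q1, a, q2 in product(qs, syms + [BOTTOM], qs)}


nts = triple_nonterminals(["q0", "q1"], ["a"], n=2)
print(len(nts), nts[("q0", BOTTOM, "q1")])  # 8 5
```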

Fig. 1. From transitions of pushdown with data to productions

The set \(P\) is derived from \(\Delta \), and it contains the productions of Fig. 1. Then the following property holds.

Proposition 17.

\(\gamma _{ init}\overset{{*}}{\longrightarrow }\gamma \) for some \(\gamma =\left\langle {q_{ target},{ Val},\epsilon }\right\rangle \) iff \(X_{ init}\overset{{*}}{\longrightarrow }_{\mathcal{G }}\alpha \) for some sentence \(\alpha \) such that \({ Sym}\left( \alpha \right) \subseteq \mathcal{T }\).

As an immediate consequence of the above Proposition, Theorem 7, and Theorem 16, we get:

Theorem 18.

The Pdad-Reach and Pdad-Reach-Empty problems are decidable for Pdads.

9 Extended Pdads

In this section, we present generalizations of the basic Pdad model for which the results presented in this paper still hold.

The first extension allows conditions of the form \(x=c\), \(x>c\), and \(x<c\) for a variable \(x\) and a constant value \(c\ge 0\). The resulting formulas correspond to the original Gap Order Constraints considered in [31].

The second extension consists of adding multiple data fields to each element pushed to the stack. For a fixed number \(k\ge 0\) of data fields, a configuration of Pdad \(_k\) becomes a triple \(\left\langle {q,{ Val},\alpha }\right\rangle \) where \(q\in Q\) is a state, \({ Val}:\mathcal{V }\mapsto \mathbb{N }\) is a valuation, and \(\alpha \in {\left( A\times \mathbb{N }^k\right) }^*\) defines the content of the stack (each element of the word is a tuple \(\left\langle {a,c_1,\ldots ,c_k}\right\rangle \) where \(a\) is the symbol and \(c_i\) is its value for the \(i\)-th field).

We now consider operations that manipulate the data fields. We first extend the push operation and consider \({ push}\left( a\right) \left( x_1,\ldots ,x_k\right) \), which pushes the symbol \(a\in A\) and assigns to the \(i\)-th field the value of \(x_i\) for \(i:1,\ldots ,k\). We also consider the operation \({ pop}\left( a\right) \left( x_1,\ldots ,x_k\right) \), which pops the symbol \(a\in A\) from the stack and assigns to \(x_i\) the value of the \(i\)-th field of the topmost stack element, for \(i:1,\ldots ,k\). The operational semantics can be naturally extended in order to cope with tuples of values instead of a single one.
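The following Python sketch illustrates the intended effect of the extended push and pop operations on the stack and the valuation. It assumes, for illustration only, that the stack is a list whose last element is the top and that a valuation is a dictionary from variable names to naturals; states and transition conditions are not modeled, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Stack = List[Tuple[str, Tuple[int, ...]]]  # (symbol, (c_1, ..., c_k)); top = last element
Val = Dict[str, int]                       # valuation of the shared variables


def push(stack: Stack, val: Val, a: str, xs: Sequence[str]) -> Tuple[Stack, Val]:
    """push(a)(x_1,...,x_k): the i-th field of the new top gets Val(x_i)."""
    return stack + [(a, tuple(val[x] for x in xs))], val


def pop(stack: Stack, val: Val, a: str,
        xs: Sequence[str]) -> Optional[Tuple[Stack, Val]]:
    """pop(a)(x_1,...,x_k): remove a from the top and copy its i-th field into x_i.

    Returns None when the top symbol is not a, i.e. the operation is disabled.
    """
    if not stack or stack[-1][0] != a:
        return None
    _, fields = stack[-1]
    new_val = dict(val)
    for x, d in zip(xs, fields):
        new_val[x] = d
    return stack[:-1], new_val


stack, val = push([], {"x": 3, "y": 7}, "a", ["x", "y"])
print(stack)                              # [('a', (3, 7))]
print(pop(stack, val, "a", ["y", "x"]))  # ([], {'x': 7, 'y': 3})
```

Popping with the variables listed in a different order than they were pushed swaps their values, which shows that the fields are matched by position.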

Finally, we consider operations that test and modify the data fields on the stack. We can use special identifiers \(topx_1,\ldots ,topx_k\) to denote such data fields and use them in conditions of transitions.

To encode the resulting model into Cfgd, we need to introduce non-terminals with extra arguments that represent both the current value and the (guessed) updated value of data fields. More specifically, we need non-terminals of the form \({X_{(q_1,a,q_2)}(\bar{x},\bar{y},\bar{z},\bar{u})}\) to represent a run of \({\mathcal{A }}_k\) from a configuration where the state is \(q_1\), the topmost stack symbol is \(a\) and its corresponding data field values are given by the vector \(\bar{z}\), and the valuation of the shared variables of \(\mathcal{A }\) is given by the valuation of \(\bar{x}\), to a configuration with the updated data fields \(\bar{u}\) and where the state is \(q_2\) and the valuation of the shared variables is given by the valuation of \(\bar{y}\).

We leave a detailed treatment of this extension for future work.

10 Related Work and Conclusion

Decidability and complexity of reachability problems for pushdown systems with or without data have been extensively studied in the literature. In [12] the authors present an algorithm to compute \(Post^*\) and \(Pre^*\) for pushdown automata and a regular set of their configurations (represented as automata). Symbolic versions of the algorithms have been studied, e.g., in [29]. In [11] the authors consider approximated verification methods for subclasses of pushdown systems called finite indices in which it is possible to handle counters without zero test (i.e., transitions of a Petri net). In [1, 2] the authors present decidability results for timed extensions of pushdown systems. In [14] the authors present decidability results for pushdown systems with either a well-quasi ordered set of control locations or of data values. In our model, we do not consider a well-quasi ordered data domain, but introduce a well-quasi ordered relation over values pushed to and popped from the stack in order to decide reachability. Our extension of pushdown systems with Gap Order is orthogonal to the above-mentioned models. Furthermore, it subsumes the model presented in [32], where the authors consider pushdown systems in which messages carry (object) identifiers that can be compared by equality. In addition to equality tests, Gap Order can be used to order messages in the stack.

Concerning our proof techniques, the algorithm for solving the Cfgd reachability problem is inspired by the seminal results on Datalog and context-free language reachability [30, 35] and by the evaluation of Datalog with Gap Order Constraints [31]. CLP programs with Gap Order constraints without conjunctions in the body have been used to model transition systems in [27]. The fixpoint semantics of CLP programs has been used to characterize model checking problems in [21] and applied to infinite-state systems in [16, 17, 18, 20]. In [15] extended automata with Gap Order conditions over variables are used as an approximated model of counter systems. The model, however, does not have recursion. The complexity of verification problems (expressed in temporal logic) for transition systems with Gap Order Constraints has been studied in [13]. Since they allow rules with sets of terms in the right-hand side, Cfgds are more general than the model in [13]. Multiset rewriting systems with Gap Order Constraints (i.e., systems with an arbitrary number of integral variables) have been introduced in [3] and applied to different types of systems in [8], extending the parameterized models described in [9, 22]. These systems are a subclass of multiset rewriting with (linear) constraints applied to infinite-state verification, e.g., in [19].

The evaluation procedure for Datalog with Gap Order Constraints in [31] and its termination depend on specific data structures (weighted graphs kept in normal form) used to represent relations between variables that occur in Datalog clauses. In the present paper, we formulate an algorithmic solution to Cfgd reachability as an instance of the general framework of well-structured transition systems and apply the theory of better quasi-orderings to naturally infer its termination. This approach has the great advantage of capturing the essential ingredients needed for extending the algorithm to other classes of grammars with data. For instance, under some restrictions on the arity of terms, a slightly modified algorithm can be applied to grammars with sets of terms in the left-hand side of a production. A more formal treatment of this kind of generalization, together with a deeper investigation of the complexity of the resulting algorithm, is part of our future work.