Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

A word, called ‘arrangement’ in [9], is an isomorphism type of a countable labeled linear order. Such words form a generalization of the classic notions of finite and \(\omega \)-words.

Finite automata on \(\omega \)-words have by now a vast literature, see [13] for a comprehensive treatment. Finite automata acting on well-ordered words longer than \(\omega \) have been investigated in [1, 6, 7, 16, 17], to mention a few references. In the last decade, the theory of automata on well-ordered words has been extended to automata on all countable words, including scattered and dense words. In [2, 3, 5], both operational and logical characterizations of the class of languages of countable words recognized by finite automata were obtained.

Context-free grammars generating \(\omega \)-words were introduced in [8] and subsequently studied in [4, 12]. Context-free grammars generating arbitrary countable words were defined in [10, 11]. Actually, two types of grammars were defined, context-free grammars with Büchi acceptance condition (BCFG), and context-free grammars with Muller acceptance condition (MCFG). These grammars generate the Büchi and the Muller context-free languages of countable words, abbreviated as BCFLs and MCFLs. Every BCFL is clearly an MCFL, but there exists an MCFL of well-ordered words that is not a BCFL, for example the set of all countable well-ordered words over some alphabet. In contrast, the set of all countable words over an alphabet is a BCFL.

In [11], it was shown that it is decidable (in polynomial time) whether a given MCFG generates well-ordered (or scattered) words only. This result was obtained by analysing the structure of a finite graph canonically associated with the grammar. In this note we establish a generic decidability result to the effect that whenever P is some property of countable words definable in monadic second-order logic (MSO), e.g., being well-ordered or scattered, then it is decidable whether an MCFG generates only words satisfying P. Of course for such a general setting one cannot hope for efficient algorithms since model checking MSO is nonelementary.

The main idea of the proof is that one can associate with each MCFG a regular tree in which every derivation tree of the grammar can be represented and moreover, the set of all derivation trees is MSO-definable.

2 Notation

Countable Words and Muller Context-Free Grammars. An alphabet is a finite nonempty set \(\varSigma \) of symbols, usually called letters. A word over \(\varSigma \) is a strict linear ordering \((I,<)\) equipped with a labeling function \(\lambda :I\rightarrow \varSigma \). The empty word, denoted \(\varepsilon \), is the unique word over the empty ordering. It is assumed that no alphabet contains \(\varepsilon \). When \(\varSigma \) is an alphabet, \(\varSigma _\varepsilon \) stands for \(\varSigma \cup \{\varepsilon \}\). An embedding of words is a mapping between the respective underlying linear orderings that preserves the order and the labeling; a surjective embedding is an isomorphism. We usually identify isomorphic words and denote by \(\varSigma ^\sharp \) the set of all countable words over the alphabet \(\varSigma \). As usual, we denote the collections of finite and \(\omega \)-words over \(\varSigma \) by \(\varSigma ^*\) and \(\varSigma ^\omega \), respectively. Sometimes we will also use the same notation for infinite sets.

Let \({\mathbb {N}}\) denote the set of positive integers. When \(u \in {\mathbb {N}}^*\) and \(i \in {\mathbb {N}}\), we usually write ui as \(u \cdot i\). A tree domain D is a prefix- and left-sibling closed nonempty (but possibly infinite) subset of \({\mathbb {N}}^*\). Thus, whenever \(u \cdot (i+1)\) is in D, where \(i \in {\mathbb {N}}\), then \(u\cdot i\) is also in D, and \(u\cdot i\in D\) implies \(u\in D\) as well. Elements of a tree domain D are also called the nodes of D. When u and \(u\cdot i\) are nodes of D, where \(u\in {\mathbb {N}}^*\) and \(i\in {\mathbb {N}}\), then \(u\cdot i\) is called a child of u. A descendant of a node u is a node of the form uv, or \(u\cdot v\), where \(v\in {\mathbb {N}}^*\). Nodes of D having no child are the leaves of D. The leaves, equipped with the order inherited from the lexicographic ordering \(\prec _\ell \) of \({\mathbb {N}}^*\) (that is, \(u\prec _\ell v\) iff \(u=wiw'\) and \(v=wjw''\) for some \(i<j\in {\mathbb {N}}\), \(w,w',w''\in {\mathbb {N}}^*\)) form the frontier of D, denoted \({\mathrm {fr}}(D)\). An inner node of D is a non-leaf node. A path of a tree domain D is a (finite or infinite) prefix-closed subset \(\pi \) of D such that each node of \(\pi \) has at most one child in \(\pi \). Given a tree domain D and some node \(u\in D\), the sub-tree domain of D rooted at u is the tree domain \(D|_u=\{v : uv\in D\}\).

A tree over some alphabet \(\varDelta \), or a \(\varDelta \)-tree for short, is a mapping \(t:{\mathrm {dom}}(t)\rightarrow \varDelta _\varepsilon \), where \({\mathrm {dom}}(t)\) is a tree domain, such that inner vertices are mapped to letters in \(\varDelta \). Notions such as nodes, paths etc. of tree domains are lifted to trees. When \(\pi \) is a path of the tree t, then \({\mathrm {labels}}(\pi )=\{t(u):u\in \pi \}\) is the set of labels occurring on \(\pi \) and \({\mathrm {infLabels}}(\pi )=\mathop \bigcap \limits _{u\in \pi }\{t(v):uv\in \pi \}\subseteq {\mathrm {labels}}(\pi )\) is the set of labels occurring infinitely often. Given a tree t and some node \(u\in {\mathrm {dom}}(t)\), the subtree of t rooted at u is the tree \(t|_u\) with domain \({\mathrm {dom}}(t|_u)={\mathrm {dom}}(t)|_u\) and labeling \(t|_u(v)=t(uv)\). A tree is regular if it has finitely many subtrees.

The frontier word \({\mathrm {lfr}}(t)\) of a tree t is determined by the leaves not labeled by \(\varepsilon \), which is equipped with the lexicographic ordering of \({\mathbb {N}}^*\) and the labeling function inherited from t. The root symbol of t is \(t(\varepsilon )\).

A Muller context-free grammar [11], or MCFG for short, is a system \(G=(V,\varSigma ,R,S,{\mathcal {F}})\), where V and \(\varSigma \) are the pairwise disjoint alphabets of nonterminals and terminals respectively, R is the finite set of productions of the form \(A\rightarrow \alpha \) with \(A\in V\) and \(\alpha \in (\varSigma \cup V)^*\), \(S\in V\) is the start symbol and \({\mathcal {F}}\subseteq P(V)\) is the set of accepting sets.

A \((V\cup \varSigma )\)-tree t is locally consistent with the above grammar G if it satisfies the following conditions:

  1. 1.

    The root symbol of t is S.

  2. 2.

    For each inner node u of t there exists a production \(A\rightarrow X_1\ldots X_n\) in R with \(t(u)=A\), \(X_i\in V\cup \varSigma \) such that:

    1. (a)

      either \(n>0\), the children of u are exactly \(u\cdot 1,\ldots ,u\cdot n\) and for each \(1\le i\le n\), \(t(u\cdot i)=X_i\);

    2. (b)

      or \(n=0\) and u has a single child \(u\cdot 1\) labeled \(\varepsilon \).

  3. 3.

    The leaves of t are labeled in \(\varSigma _\varepsilon \).

A derivation tree of the above grammar G is a locally consistent tree t satisfying the additional condition that for each infinite path \(\pi \) of t, \({\mathrm {infLabels}}(\pi )\) is an accepting set of G.

The language \(L(G)\subseteq \varSigma ^\sharp \) generated by G is the set of frontier words of derivation trees. A Muller context-free language, or MCFL for short, is a language generated by some MCFG.

Example 1

If \(G=(\{S,I\},\{a,b\},R,S,\{\{I\}\})\), with

$$R=\{S\rightarrow a,S \rightarrow b,S \rightarrow \varepsilon , S\rightarrow I,I\rightarrow SI\},$$

then L(G) consists of all the well-ordered words over \(\{a,b\}\).

Indeed, assume \(t_1,t_2,\ldots \) are derivation trees. Then so is the tree t depicted in Fig. 1 with frontier word \({\mathrm {lfr}}(t_1){\mathrm {lfr}}(t_2)\ldots \). Thus, L(G) contains the empty word (by \(S\rightarrow \varepsilon \)), the words of length 1 (by \(S\rightarrow a\) and \(S\rightarrow b\)), and is closed under taking “\(\omega \)-products”. Since the least class of order types which contains \(\mathbf {0}\), \(\mathbf {1}\) and which is closed under \(\omega \)-sums is the class of all countable ordinals (see e.g. [15]), L(G) contains all the well-ordered words over \(\{a,b\}\).

Fig. 1.
figure 1

Derivation tree corresponding to Example 1

For the other direction, assume t is a derivation tree having a frontier word containing an infinite descending chain \(u_1\succ _\ell u_2\succ _\ell \ldots \). Then let us define the path \(v_0,v_1,\ldots \) in t: \(v_0=\varepsilon \) and \(v_{i+1}\) is \(v_i\cdot 1\) if this node is an ancestor of infinitely many \(u_j\) and \(v_i\cdot 2\) otherwise (which happens if \(v_i\) corresponds to the production \(I\rightarrow SI\) and the node \(v_i \cdot 1\) (which is labeled S) has no descendant of the form \(u_j\) at all). Note that for each \(u_j\) there exists a unique \(v_{i_j}\) such that \(v_{i_j}\) is an ancestor of \(u_j\) and \(v_{i_j+1}\) is not, since the length of the words \(v_i\) grows without a bound. Now these nodes \(v_{i_j}\) correspond to the production \(I\rightarrow SI\) and \(v_{i_j+1}=v_{i_j}\cdot 1\), so that the successor of \(v_{i_j}\) along the path is labeled by S. Hence \(v_0,v_1,\ldots ,\) is a path \(\pi \) in t such that \({\mathrm {infLabels}}(\pi )\) contains S, which is a contradiction since the only accepting set is \(\{I\}\).

MSO on Trees and Words. Let \({\mathcal {X}}_1\) and \({\mathcal {X}}_2\) be fixed, countably infinite, disjoint sets of first-order and second-order variables, respectively. It is assumed that \({\mathcal {X}}_1\) and \({\mathcal {X}}_2\) are disjoint from alphabets, they do not contain \(\varepsilon \), etc.

Given an alphabet \(\varDelta \), the set of monadic second-order, or MSO-formulas (for trees over \(\varDelta \)) is the least set satisfying the following conditions:

  1. 1.

    When \({\varvec{x}}\) is a first-order variable and \(\delta \in \varDelta _\varepsilon \) is a symbol, then \(\delta ({\varvec{x}})\) is an MSO-formula.

  2. 2.

    When \({\varvec{x}}\) and \({\varvec{y}}\) are first-order variables, then \({\varvec{y}}={\varvec{x}}\cdot 1\) and \(\mathsf {sibling}({\varvec{x}},{\varvec{y}})\) are MSO-formulas.

  3. 3.

    When \({\varvec{x}}\) is a first-order and \({\varvec{X}}\) is a second-order variable, then \({\varvec{X}}({\varvec{x}})\), also written \({\varvec{x}}\in {\varvec{X}}\) is an MSO-formula.

  4. 4.

    When \(\varphi \) and \(\psi \) are MSO-formulas, then so are \((\varphi \vee \psi )\) and \((\lnot \varphi )\).

  5. 5.

    When \({\varvec{x}}\) (\({\varvec{X}}\), resp.) is a first-order (second-order, resp.) variable and \(\varphi \) is an MSO-formula, then so is \((\exists {\varvec{x}}\varphi )\) (\((\exists {\varvec{X}}\varphi )\), resp).

We also use the standard abbreviations of \(\varphi \wedge \psi =\lnot (\lnot \varphi \vee \lnot \psi )\), \(\varphi \rightarrow \psi =(\lnot \varphi )\vee \psi \), \(\forall {\varvec{x}}\varphi =\lnot \exists {\varvec{x}}\lnot \varphi \) etc., and omit some parentheses for the sake of readability. Formulas over \(\varDelta \) are interpreted on \(\varDelta \)-trees in the expected way. A structure is a triple \((t,\varPi _1,\varPi _2)\) where t is a \(\varDelta \)-tree, \(\varPi _1:{\mathcal {X}}_1\rightarrow {\mathrm {dom}}(t)\) assigns a node of t to each first-order variable, and \(\varPi _2:{\mathcal {X}}_2\rightarrow P({\mathrm {dom}}(t))\) assigns a set of nodes of t to each second-order variable. Then, the above structure satisfies the formula \(\varphi \), denoted \((t,\varPi _1,\varPi _2)\,\models \,\varphi \), if and only if one of the following conditions holds:

  1. 1.

    \(\varphi =\delta ({\varvec{x}})\) for \(\delta \in \varDelta _\varepsilon \) and \({\varvec{x}}\in {\mathcal {X}}_1\), and \(t(\varPi _1({\varvec{x}}))=\delta \).

  2. 2.

    \(\varphi =({\varvec{y}}={\varvec{x}}\cdot 1)\) for \({\varvec{x}},{\varvec{y}}\in {\mathcal {X}}_1\) and \(\varPi _1({\varvec{y}})=\varPi _1({\varvec{x}})\cdot 1\).

  3. 3.

    \(\varphi =\mathsf {sibling}({\varvec{x}},{\varvec{y}})\) for \({\varvec{x}},{\varvec{y}}\in {\mathcal {X}}_1\) and there exist \(u\in {\mathbb {N}}^*\), \(i\in {\mathbb {N}}\) with \(\varPi _1({\varvec{x}})=u\cdot i\), \(\varPi _1({\varvec{y}})=u\cdot (i+1)\).

  4. 4.

    \(\varphi ={\varvec{X}}({\varvec{x}})\) for \({\varvec{x}}\in {\mathcal {X}}_1\), \({\varvec{X}}\in {\mathcal {X}}_2\) and \(\varPi _1({\varvec{x}})\in \varPi _2({\varvec{X}})\).

  5. 5.

    \(\varphi =(\varphi _1\vee \varphi _2)\) and \((t,\varPi _1,\varPi _2)\) satisfies \(\varphi _1\) or \(\varphi _2\) (or both).

  6. 6.

    \(\varphi =(\lnot \varphi _1)\) and it is not the case that the structure satisfies \(\varphi _1\).

  7. 7.

    \(\varphi =(\exists {\varvec{x}}\varphi _1)\) and there is a structure \((t,\varPi _1',\varPi _2)\) satisfying \(\varphi _1\) such that \(\varPi _1({\varvec{y}})=\varPi _1'({\varvec{y}})\) for each first-order variable \({\varvec{y}}\ne {\varvec{x}}\).

  8. 8.

    \(\varphi =(\exists {\varvec{X}}\varphi _1)\) and there is a structure \((t,\varPi _1,\varPi _2')\) satisfying \(\varphi _1\) such that \(\varPi _2({\varvec{Y}})=\varPi _2'({\varvec{Y}})\) for each second-order variable \({\varvec{Y}}\ne {\varvec{X}}\).

It is clear that satisfaction depends on \(\varPi _1({\varvec{x}})\) or \(\varPi _2({\varvec{X}})\) only if the appropriate variable occurs freely in the formula (i.e. not within the scope of some \(\exists \) quantifier). Hence when \(\varphi \) is a sentence, a formula without free variable occurrences, it makes sense to write \(t\,\models \,\varphi \) instead of \((t,\varPi _1,\varPi _2)\,\models \,\varphi \).

In order to ease notation, when \(\varPi _1\) and \(\varPi _2\) are clear from the context, we write \({\underline{{\varvec{x}}}}\) and \({\underline{{\varvec{X}}}}\) for \(\varPi _1({\varvec{x}})\) and \(\varPi _2({\varvec{X}})\).

Example 2

One can define the i-th child relation \({\varvec{y}}={\varvec{x}}\cdot i\) for \(i\in {\mathbb {N}}\) inductively as \(\exists {\varvec{z}}({\varvec{z}}={\varvec{x}}\cdot (i-1)\ \wedge \ \mathsf {sibling}({\varvec{z}},{\varvec{y}}))\) (which is satisfied by a structure if and only if \({\underline{{\varvec{y}}}}={\underline{{\varvec{x}}}}\cdot i\)).

Consider the formula

$$\begin{aligned} \mathsf {child}({\varvec{x}},{\varvec{y}})= & {} \exists {\varvec{z}}({\varvec{z}}={\varvec{x}}\cdot 1)\wedge \\&\forall {\varvec{X}}\Bigl (\bigl (\forall {\varvec{z}}\bigl (({\varvec{z}}={\varvec{x}}\cdot 1)\rightarrow {\varvec{z}}\in {\varvec{X}}\bigr )\bigr )\ \wedge \ \\&\quad \quad \forall {\varvec{z}}\forall {\varvec{w}}({\varvec{z}}\in {\varvec{X}}\wedge \mathsf {sibling}({\varvec{z}},{\varvec{w}})\rightarrow {\varvec{w}}\in {\varvec{X}})\rightarrow {\varvec{y}}\in {\varvec{X}}\Bigr ). \end{aligned}$$

Then, \(\mathsf {child}({\varvec{x}},{\varvec{y}})\) holds in the structure iff \({\underline{{\varvec{x}}}}\) has a first child and if whenever a set \({\underline{{\varvec{X}}}}\) contains the first child of \({\underline{{\varvec{x}}}}\) and is closed under taking right siblings, then \({\underline{{\varvec{X}}}}\) contains \({\underline{{\varvec{y}}}}\) as well, that is, if and only if \({\underline{{\varvec{y}}}}={\underline{{\varvec{x}}}}\cdot i\) for some i.

As another example, the formula

$$\begin{aligned} \exists {\varvec{x}}({\varvec{x}}\in {\varvec{X}})\ \wedge \ \forall {\varvec{x}}\forall {\varvec{y}}({\varvec{x}}\in {\varvec{X}}\wedge \mathsf {child}({\varvec{y}},{\varvec{x}})\rightarrow {\varvec{y}}\in {\varvec{X}}) \end{aligned}$$

holds in a structure if \({\underline{{\varvec{X}}}}\) is a nonempty, prefix-closed subset of the nodes.

It is well-known [14] that given any regular tree t and MSO sentence \(\varphi \), it is decidable whether \(t\,\models \,\varphi \) holds.

For countable \(\varSigma \)-words, the syntax and semantics of MSO are slightly changed due to the differing relational structure: the atomic formulas are of the form \(a({\varvec{x}})\) for \(a\in \varSigma \) and \({\varvec{x}}\in {\mathcal {X}}_1\) and \({\varvec{x}}<{\varvec{y}}\) for \({\varvec{x}},{\varvec{y}}\in {\mathcal {X}}_1\), interpreted in the expected way. A property P of countable \(\varSigma \)-words is called MSO-definable if there exists an MSO sentence \(\varphi _P\) which is satisfied exactly by those \(\varSigma \)-words having property P.

3 Result

Let us fix a Muller context-free grammar \(G=(V,\varSigma ,R,S,{\mathcal {F}})\) for this section, with R being disjoint from \(V\cup \varSigma \). Without loss of generality we assume that each \(A\in V\) is the left-hand side of at least one production. We define the grammar tree associated with G as the unique derivation tree \({\mathcal {T}}\) of the following grammar \(G'=(V\cup R,\varSigma ,R',S,P(V\cup R))\) with \(R'\) consisting of productions of the following form:

  1. 1.

    When \(A\rightarrow \alpha _1\),...,\(A\rightarrow \alpha _k\) are all the productions of G having A on their left side in some fixed ordering of the productions, then \(A\rightarrow (A\rightarrow \alpha _1)(A\rightarrow \alpha _2)\ldots (A\rightarrow \alpha _k)\) is a production of \(G'\) (the right-hand side of this single production is in \(R^*\) while the left-hand side is in V).

  2. 2.

    For each production \(A\rightarrow X_1\ldots X_k\), the production \((A\rightarrow X_1\ldots X_k)\rightarrow X_1\ldots X_k\). (which is a production of the form \(r\rightarrow \alpha \) for some \(r\in R\) and \(\alpha \in (V\cup \varSigma )^*\)) is a production of \(G'\).

Note that each element of \(V\cup R\) is the left-hand side of exactly one production in \(R'\), hence there exists exactly one locally consistent tree of \(G'\). Also, since the acceptance condition is \(P(V\cup R)\), this tree is a valid derivation tree of the grammar, thus \({\mathcal {T}}\) is well-defined and has at most \(|V|+|R|+|\varSigma _\varepsilon |\) subtrees up to isomorphism. Hence it is a regular tree.

Example 3

For the MCFG of Example 1, this tree \({\mathcal {T}}\) is depicted in Fig. 2.

Fig. 2.
figure 2

Grammar tree of the MCFG of Example 1

Moreover, each locally consistent tree t of G can be embedded into \({\mathcal {T}}\) in the following sense: there exists a mapping \(h_t:{\mathrm {dom}}(t)\rightarrow {\mathrm {dom}}({\mathcal {T}})\) with \(h_t(\varepsilon )=\varepsilon \) and \(t(u)={\mathcal {T}}(h_t(u))\) for each \(u\in {\mathrm {dom}}(t)\), moreover \(h_t(u\cdot i)\) is a descendant (in particular, a grandchild) of \(h_t(u)\) in \({\mathcal {T}}\) for each \(u\cdot i\in {\mathrm {dom}}(t)\), moreover, when u and v are siblings in t, then so are \(h_t(u)\) and \(h_t(v)\) in \({\mathcal {T}}\). Indeed, assume \(u\cdot i\in {\mathrm {dom}}(t)\) and that \(u'=h_t(u)\) is already defined. Then since u is an inner node of t, we have \(t(u)=A\in V\). By \({\mathcal {T}}(u')=t(u)=A\), each production \(r=A\rightarrow \alpha \) occurs as \({\mathcal {T}}(u'\cdot k_r)\) for some \(k_r\in {\mathbb {N}}\). In particular, let \(r=A\rightarrow X_1\ldots X_n\) be the production corresponding to u, so that u has n children and \(t(u\cdot j)=X_j\) for each \(j=1,\ldots ,n\) (subsuming the case when \(n=0\) as \(t(u\cdot 1)=\varepsilon \)). Then, we define \(h_t(u\cdot j)\) as \(u'\cdot k_r\cdot j\).

We call a prefix-closed nonempty set \(T\subseteq {\mathrm {dom}}({\mathcal {T}})\) derivation-like iff it satisfies the following conditions:

  1. 1.

    For each \(u\in T\) with \({\mathcal {T}}(u)\in R\), each child of u is in T.

  2. 2.

    For each \(u\in T\) with \({\mathcal {T}}(u)\in V\), exactly one child of u is in T.

It is clear that derivation-like subsets of \({\mathrm {dom}}({\mathcal {T}})\) are in one-to-one correspondence with the locally consistent trees of G: with any locally consistent tree t of G we associate the derivation-like set \(T\subseteq {\mathrm {dom}}({\mathcal {T}})\) which is the closure of \(\mathrm {im}(h_t)\) with respect to the prefix relation. Given a derivation-like set T, the corresponding locally consistent tree is denoted t.

Example 4

Figure 3 shows a (part of a) derivation tree t of the grammar of Example 1 and the corresponding derivation-like subset T of \({\mathrm {dom}}({\mathcal {T}})\) (as nodes in boldface).

Fig. 3.
figure 3

A part of a derivation tree t of Example 1 and the corresponding derivation-like subset \({\varvec{T}}\) of \({\mathcal {T}}\)

Proposition 1

There is an MSO formula \(d({\varvec{X}})\) with the free variable \({\varvec{X}}\in {\mathcal {X}}_2\) such that \(({\mathcal {T}},\varPi _1,\varPi _2)\,\models \,d({\varvec{X}})\) if and only if \({\underline{{\varvec{X}}}}\) is a derivation-like set.

(In short, it is MSO-definable whether some set \({\underline{{\varvec{X}}}}\) is derivation-like.)

Proof

We can define \(d({\varvec{X}})\) as the conjunction of the formulas stating that \({\underline{{\varvec{X}}}}\) is nonempty and prefix-closed (see Example 2), the formula

$$ \forall {\varvec{x}}\Bigl (({\varvec{x}}\in {\varvec{X}}\wedge \mathop \bigvee \limits _{r\in R}r({\varvec{x}}))\ \rightarrow \ \forall {\varvec{y}}(\mathsf {child}({\varvec{x}},{\varvec{y}})\rightarrow {\varvec{y}}\in {\varvec{X}}) \Bigr ) $$

stating that all the children of the nodes labeled by productions are members of \({\underline{{\varvec{X}}}}\), and the formula

$$ \forall {\varvec{x}}\Bigl (({\varvec{x}}\in {\varvec{X}}\wedge \mathop \bigvee \limits _{A\in V}A({\varvec{x}}))\rightarrow \exists {\varvec{y}}\bigl ( \mathsf {child}({\varvec{x}},{\varvec{y}})\wedge \forall {\varvec{z}}(\mathsf {child}({\varvec{x}},{\varvec{z}})\wedge {\varvec{z}}\in {\varvec{X}}\leftrightarrow {\varvec{z}}={\varvec{y}}) \bigr )\Bigr ) $$

stating that exactly one child of the nodes labeled by nonterminals is in \({\underline{{\varvec{X}}}}\).    \(\square \)

Also, from a derivation-like set T determining a locally consistent tree t of G, one can select a subset of nodes corresponding to a path of t:

Proposition 2

There exists an MSO-formula \(p({\varvec{X}},{\varvec{Y}})\) with the free second-order variables \({\varvec{X}},{\varvec{Y}}\) such that \(({\mathcal {T}},\varPi _1,\varPi _2)\,\models \,p({\varvec{X}},{\varvec{Y}})\) if and only if \({\underline{{\varvec{X}}}}=T\) is a derivation-like subset of \({\mathcal {T}}\) with corresponding locally consistent tree t and \({\underline{{\varvec{Y}}}}=h_t(\pi )\) Footnote 1 for some infinite path \(\pi \) of t.

Proof

\(p({\varvec{X}},{\varvec{Y}})\) expresses the following:

  1. (i)

    \({\underline{{\varvec{X}}}}\) is a derivation-like subset of \({\mathcal {T}}\), i.e. \(d({\varvec{X}})\) holds,

  2. (ii)

    \({\underline{{\varvec{Y}}}}\subseteq {\underline{{\varvec{X}}}}\),

  3. (iii)

    each of the nodes of \({\underline{{\varvec{Y}}}}\) is labeled by some member of V,

  4. (iv)

    whenever v is a grandparent of some node \(u\in {\underline{{\varvec{Y}}}}\), then \(v\in {\underline{{\varvec{Y}}}}\) as well, and

  5. (v)

    each \(u\in {\underline{{\varvec{Y}}}}\) has exactly one grandchild in \({\underline{{\varvec{Y}}}}\).

These properties can clearly be defined in MSO.    \(\square \)

Proposition 3

There is an MSO formula \(d'({\varvec{X}})\) expressing that \({\underline{{\varvec{X}}}}\) is a derivation-like set corresponding to an actual derivation tree of G.

Proof

The descendant relation \({\varvec{x}}\preceq {\varvec{y}}\) can be defined in MSO by a formula expressing that whenever \({\underline{{\varvec{Y}}}}\) is a prefix-closed set containing \({\underline{{\varvec{y}}}}\), then \({\underline{{\varvec{Y}}}}\) contains \({\underline{{\varvec{x}}}}\) as well.

Then, for each \(F\in {\mathcal {F}}\) we can construct a formula \(m_F({\varvec{Y}})\) stating that if \({\underline{{\varvec{Y}}}}=h_t(\pi )\) for some path \(\pi \) of some locally consistent tree t of G, then \({\mathrm {infLabels}}(\pi )=F\):

$$\begin{aligned} m_F({\varvec{Y}})=\mathop \bigwedge \limits _{A\in F}i_A({\varvec{Y}})\ \wedge \ \mathop \bigwedge \limits _{A\notin F}\lnot i_A({\varvec{Y}}) \end{aligned}$$

where \(i_A({\varvec{Y}})\) is the formula

$$\begin{aligned} \forall {\varvec{x}}({\varvec{x}}\in {\varvec{Y}}\rightarrow \exists {\varvec{y}}({\varvec{x}}\preceq {\varvec{y}}\ \wedge \ {\varvec{y}}\in {\varvec{Y}}\ \wedge \ A({\varvec{y}}))) \end{aligned}$$

stating that A occurs infinitely many times on the path given by \({\underline{{\varvec{Y}}}}\).

Now, we can define \(d'({\varvec{X}})\) as

$$ d({\varvec{X}})\wedge \forall {\varvec{Y}}(p({\varvec{X}},{\varvec{Y}})\rightarrow \mathop \bigvee \limits _{F\in {\mathcal {F}}}m_F({\varvec{Y}})) $$

expressing that \({\underline{{\varvec{X}}}}\) is a derivation-like set corresponding to some locally consistent tree t of G such that any infinite path \(\pi \) of t satisfies the Muller acceptance condition.    \(\square \)

Now a set \({\underline{{\varvec{Y}}}}\subseteq {\mathrm {dom}}({\mathcal {T}})\) corresponds to a frontier word of some derivation tree of G (i.e. belongs to L(G)) if and only if there exists some \({\underline{{\varvec{X}}}}\subseteq {\mathrm {dom}}({\mathcal {T}})\) satisfying \(d'({\varvec{X}})\) such that a node \(v\in {\mathrm {dom}}({\mathcal {T}})\) is in \({\underline{{\varvec{Y}}}}\) if and only if \(v\in {\underline{{\varvec{X}}}}\) and is a \(\varSigma \)-labeled node of \({\mathcal {T}}\), which is an MSO-definable property:

$$ \exists {\varvec{X}}\bigl (d'({\varvec{X}})\wedge \forall y(y\in {\varvec{Y}}\leftrightarrow (y\in {\varvec{X}}\wedge \mathop \bigvee \limits _{a\in \varSigma }a(y)))\bigr ). $$

Moreover, the lexicographic ordering on these leaves can also be defined in MSO as \(u\prec _\ell v\) iff there exists some common ancestor w of u and v such that \(w\cdot i\) and \(w\cdot j\) are respectively ancestors of u and v for some \(i<j\):

$$ {\varvec{x}}\prec _\ell {\varvec{y}}\ =\ \exists {\varvec{z}}_1\exists {\varvec{z}}_2\bigl ({\varvec{z}}_1\preceq {\varvec{x}}\ \wedge \ {\varvec{z}}_2\preceq {\varvec{y}}\ \wedge \ \mathsf {sibling}^+({\varvec{z}}_1,{\varvec{z}}_2) \bigr ) $$

where \(\mathsf {sibling}^+({\varvec{x}},{\varvec{y}})\) is the transitive closure of \(\mathsf {sibling}\):

$$ {\varvec{x}}\ne {\varvec{y}}\ \wedge \ \forall {\varvec{X}}\bigl ({\varvec{x}}\in {\varvec{X}}\wedge \forall {\varvec{z}}_1\forall {\varvec{z}}_2({\varvec{z}}_1\in {\varvec{X}}\wedge \mathsf {sibling}({\varvec{z}}_1,{\varvec{z}}_2)\rightarrow {\varvec{z}}_2\in {\varvec{X}})\bigr )\rightarrow {\varvec{y}}\in {\varvec{X}}$$

Hence we have shown:

Proposition 4

For any MCFG G, there exists an effectively constructible MSO formula \(w({\varvec{Y}})\) such that \(({\mathcal {T}},\varPi _1,\varPi _2)\,\models \,w({\varvec{Y}})\) if and only \({\underline{{\varvec{Y}}}}\) is the set of \(\varSigma \)-labeled leaves of some derivation-like subset of \({\mathcal {T}}\) corresponding to a derivation tree of G.

As a corollary, we obtain the main result of this note.

Theorem 1

It is decidable for a given MCFL L and an MSO-definable property \(\varphi \) of words whether every member of L satisfies \(\varphi \).

Proof

The question can be reduced to checking whether \({\mathcal {T}}\) satisfies the formula \(\forall {\varvec{Y}}(w({\varvec{Y}})\rightarrow \varphi ({\varvec{Y}}))\) where \(\varphi ({\varvec{Y}})\) is obtained from \(\varphi \) by replacing all the subformulas of the form \(\exists {\varvec{x}}\varphi '\) and \(\exists {\varvec{X}}\varphi '\) respectively to \(\exists {\varvec{x}}({\varvec{x}}\in {\varvec{Y}}\wedge \varphi ')\) and \(\exists {\varvec{X}}(\forall {\varvec{x}}({\varvec{x}}\in {\varvec{X}}\rightarrow {\varvec{x}}\in {\varvec{Y}}))\wedge \varphi ')\) and substituting the formula defining the lexicographic ordering \({\varvec{x}}\prec _\ell {\varvec{y}}\) in place of the atomic formulas \({\varvec{x}}<{\varvec{y}}\). Since \({\mathcal {T}}\) is regular, model checking the resulting formula on \({\mathcal {T}}\) is decidable.    \(\square \)

(We remark that thus it is also decidable whether there exists a word in L satisfying an MSO formula \(\varphi \) since such a word exists if and only if not all members of L satisfy \(\lnot \varphi \).)

In particular, our former decidability results (without complexity bounds) regarding whether an MCFG generates scattered (or well-ordered) words are corollaries of this general theorem. However, since model-checking MSO formulas on regular trees has a high complexity in general (\(\mathrm {tower}(n)\) when n is the alternation depth of the second-order quantifiers), no polytime decision procedures follow from the present theorem. Nevertheless several interesting properties are decidable in polynomial time, including whether every word generated by an MFCG is scattered or well-ordered, cf. [11].

As another example, let \(\varSigma =\{a,b\}\) be an alphabet. The following formula \(\mathsf {segment}({\varvec{X}})\) expresses that \({\underline{{\varvec{X}}}}\) is a nonempty interval, or segment of a given word:

$$\begin{aligned} (\exists {\varvec{x}}\ {\varvec{x}}\in {\varvec{X}})\wedge \bigl (\forall {\varvec{x}}\forall {\varvec{y}}\forall {\varvec{z}}({\varvec{x}}<{\varvec{y}}\wedge {\varvec{y}}<{\varvec{z}}\wedge {\varvec{x}}\in {\varvec{X}}\wedge {\varvec{z}}\in {\varvec{X}})\rightarrow {\varvec{y}}\in {\varvec{X}}\bigr ). \end{aligned}$$

The following formula \(\mathsf {dense}({\varvec{X}})\) expresses that \({\underline{{\varvec{X}}}}\) is a dense subset (containing at least two elements) of the word:

$$\begin{aligned}&\exists {\varvec{x}}\exists {\varvec{y}}({\varvec{x}}<{\varvec{y}}\wedge {\varvec{x}}\in {\varvec{X}}\wedge {\varvec{y}}\in {\varvec{X}})\wedge \\&\forall {\varvec{x}}\forall {\varvec{y}}\Bigl ({\varvec{x}}<{\varvec{y}}\wedge {\varvec{x}}\in {\varvec{X}}\wedge {\varvec{y}}\in {\varvec{X}}\rightarrow \exists {\varvec{z}}({\varvec{x}}<{\varvec{z}}\wedge {\varvec{z}}<{\varvec{y}}\wedge {\varvec{z}}\in {\varvec{X}})\Bigr ). \end{aligned}$$

Thus, the property “there exists a dense segment X of the word such that for all \(x<y\) in X there exists \(z,z'\) with \(x<z,z'<y\) and z is labeled by a, \(z'\) is labeled by b” is also expressible in MSO as

$$\begin{aligned} \exists {\varvec{X}}\Bigl (&\mathsf {dense}({\varvec{X}})\wedge \mathsf {segment}({\varvec{X}})\wedge \\&\forall {\varvec{x}}\forall {\varvec{y}}\bigl ({\varvec{x}}<{\varvec{y}}\wedge {\varvec{x}}\in {\varvec{X}}\wedge {\varvec{y}}\in {\varvec{X}}\\&\quad \rightarrow \exists {\varvec{z}}\exists {\varvec{z}}' ({\varvec{x}}<{\varvec{z}}\wedge {\varvec{z}}<{\varvec{y}}\wedge {\varvec{x}}<{\varvec{z}}'\wedge {\varvec{z}}'<{\varvec{y}}\wedge a({\varvec{z}})\wedge b({\varvec{z}}')) \bigr )\Bigr ) \end{aligned}$$

In other words, \(w\in \{a,b\}^\sharp \) satisfies the above formula iff \(w=u\{a,b\}^\eta v\) for some words \(u,v\in \{a,b\}^\sharp \) where \(\{a,b\}^\eta \) is the so-called shuffle of a and b. Thus, it is also decidable for a MCFL L whether every word in L is of the form \(u\{a,b\}^\eta v\).

Another expressible property is that whether a word is the shuffle product of, say, a dense word and a scattered word consisting only of a’s, that is, whether the underlying linear order can be partitioned into two subsets such that the two subwords determined by the partitions satisfy the appropriate property:

$$ \exists {\varvec{X}}\Bigl (\mathsf {dense}({\varvec{X}})\wedge \forall {\varvec{x}}({\varvec{x}}\notin {\varvec{X}}\rightarrow a({\varvec{x}})) \wedge \lnot \exists {\varvec{Y}}\bigl (\forall {\varvec{x}}({\varvec{x}}\in {\varvec{Y}}\rightarrow {\varvec{x}}\notin {\varvec{X}})\ \wedge \ \mathsf {dense}({\varvec{Y}})\bigr )\Bigr ). $$

Thus, it is also decidable for a given MCFL L whether every member of L is a shuffle product of a dense word and a scattered one consisting only of a’s.

4 Conclusion

We have proved that there is an algorithm to decide for a Muller context-free language L generated by an MCFG and an MSO-definable property P of words whether every word in L has property P. We obtained this result by assigning a regular tree t to an MCFG such that the derivation trees of the grammar have an MSO-interpretation in t. We then used the fact that the MSO-theory of a regular tree is decidable.

There is an alternative method. First, we can prove that the MCFLs are exactly the frontier languages of the tree languages recognizable by Muller tree automata. This is similar to the well-known fact that ordinary context-free languages are the frontier languages of the languages of finite trees recognizable by finite tree automata. Also, there is an algorithm to decide, for a Muller tree automaton and an MSO-definable property of trees whether every tree in the language L recognized by the automaton has property P. This follows using the fact that every Muller tree automaton can be converted to an MSO-formula \(\varphi \), and if P is definable by the formula \(\psi \), then it holds that every tree in L satisfies P iff there is no tree satisfying \(\varphi \wedge \lnot \psi \), which is decidable.