Position Automaton Construction for Regular Expressions with Intersection

Broda, Sabine; Machiavelo, António; Moreira, Nelma; Reis, Rogério

doi:10.1007/978-3-662-53132-7_5


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9840)

Included in the following conference series:

International Conference on Developments in Language Theory


Abstract

Positions and derivatives are two essential notions in the conversion methods from regular expressions to equivalent finite automata. Partial derivative based methods have recently been extended to regular expressions with intersection. In this paper, we present a position automaton construction for those expressions. This construction generalizes the notion of position making it compatible with intersection. The resulting automaton is homogeneous and has the partial derivative automaton as its quotient.

This work was partially supported by CMUP (UID/MAT/00144/2013), which is funded by FCT (Portugal) with national (MEC) and European structural funds through the programs FEDER, under the partnership agreement PT2020.


1 Introduction

The position automaton, introduced by Glushkov [12], permits the conversion of a simple regular expression (involving only the sum, concatenation and star operations) into an equivalent nondeterministic finite automaton (NFA) without $\varepsilon $-transitions. The states in the position automaton ($\mathcal{A}_{\mathsf {pos}}$) correspond to the positions of letters in the corresponding regular expression plus an additional initial state. McNaughton and Yamada [15] also used the positions of a regular expression to define an automaton, however they directly computed a deterministic version of the position automaton. The position automaton has been well studied [3, 8] and is considered the standard automaton simulation of a regular expression [16]. Some of its interesting properties are: homogeneity, i.e. for each state, all in-transitions have the same label (letter); whenever deterministic, these automata characterize certain families of unambiguous regular expressions, and can be computed in quadratic time [4]; other automata simulations of regular expressions are quotients of the $\mathcal{A}_{\mathsf {pos}}$, e.g. partial derivative automata ($\mathcal{A}_{\mathsf {pd}}$) [9] and follow automata [14].

Many authors observed that the position automaton construction could not directly be extended to regular expressions with intersection [3, 6], as intersection (and also complementation) is not compatible with the notion of position. In fact, considering positions of letters in the expression $(ab^\star ) \cap a$, whose language is $\{ a\}$, we obtain the regular expression $(a_1b_2^\star ) \cap a_3$. Interpreting $a_1$ and $a_3$ as distinct alphabet symbols, the language described by this expression is empty and there is no longer a correspondence between the languages of $(ab^\star ) \cap a$ and $(a_1b_2^\star ) \cap a_3$, as is the case for expressions without intersection. However, the conversions from expressions to automata based on the notion of derivative or partial derivative can still be extended to regular expressions with intersection [2, 5, 7]. In this paper, we present a position automaton construction for regular expressions with intersection by generalizing the notion of position. Instead of positions, sets of positions are considered in such a way that marking a regular expression is made compatible with the intersection operation. We also show that the partial derivative automaton is a quotient of the position automaton.

2 Preliminaries

In this section we recall the basic definitions we use throughout this paper and the notation. For further details we refer to [13, 17].

Let $\varSigma $ be an alphabet (set of letters). A word over $\varSigma $ is a finite sequence of letters, where $\varepsilon $ is the empty word. The size of a word x, |x|, is the number of alphabet symbols in x. $\varSigma ^\star $ denotes the set of all words over $\varSigma $, and a language over $\varSigma $ is any subset of $\varSigma ^\star $. The concatenation of two languages $L_1$ and $L_2$ is defined by $L_1 \cdot L_2 = \{\; x y \mid x \in L_1, y \in L_2 \;\}$, and $L^\star $ denotes the set $\{\; x_1x_2 \cdots x_n \mid n \ge 0,\, x_i \in L \;\}$. The left quotient of a language $L \subseteq \varSigma ^\star $ w.r.t. a word $x \in \varSigma ^\star $ is the language $x^{-1}L = \{\; y \mid xy \in L \;\}$.

The set ${\mathsf {RE}}_{\cap } $ of regular expressions with intersection over $\varSigma $ is defined by the following grammar

$$\begin{aligned} \alpha , \beta := \emptyset \mid \varepsilon \mid a \in \varSigma \mid (\alpha + \beta ) \mid (\alpha \cdot \beta ) \mid (\alpha ^\star ) \mid (\alpha \cap \beta ), \end{aligned} \qquad (1)$$

where the concatenation operator $\cdot $ is often omitted. We consider ${\mathsf {RE}}_{\cap } $ expressions modulo the standard equations for $\emptyset $ and $\varepsilon $, i.e. $\alpha + \emptyset = \emptyset + \alpha = \alpha \cdot \varepsilon = \varepsilon \cdot \alpha = \alpha $, $\alpha \cdot \emptyset = \emptyset \cdot \alpha = \alpha \cap \emptyset = \emptyset \cap \alpha = \emptyset $, and $\emptyset ^\star =\varepsilon $. Throughout this paper we often refer to regular expressions with intersection just as regular expressions. The set of alphabet symbols with occurrences in $\alpha $ is denoted by $\varSigma _\alpha $. Expressions containing no occurrence of the operator $\cap $ are called simple regular expressions. A linear regular expression is a regular expression in which every alphabet symbol occurs at most once. We let $|\alpha |$, $|\alpha |_\varSigma $ and $|\alpha |_\cap $ denote for $\alpha \in {\mathsf {RE}}_{\cap } $ the number of symbols, the number of occurrences of alphabet symbols and the number of occurrences of the binary operator $\cap $, respectively. The language $\mathcal {L}(\alpha )$ for $\alpha \in {\mathsf {RE}}_{\cap } $ is defined as usual, with $\mathcal {L}(\alpha \cap \beta ) = \mathcal {L}(\alpha ) \cap \mathcal {L}(\beta )$. The language of $S\subseteq {\mathsf {RE}}_{\cap } $ is $\mathcal {L}(S)=\cup _{\alpha \in S}\mathcal {L}(\alpha )$. Given an expression $\alpha \in {\mathsf {RE}}_{\cap } $, we define $\varepsilon (\alpha ) = \varepsilon $ if $\varepsilon \in \mathcal {L}(\alpha )$, and $\varepsilon (\alpha ) = \emptyset $ otherwise. 
A recursive definition of $\varepsilon : {\mathsf {RE}}_{\cap } \longrightarrow \{\emptyset ,\varepsilon \}$ is given by the following: $ \varepsilon (a) = \varepsilon (\emptyset ) =\emptyset $, $\varepsilon (\varepsilon ) = \varepsilon (\alpha ^\star ) =\varepsilon $, $\varepsilon (\alpha +\beta ) = \varepsilon (\alpha ) + \varepsilon (\beta )$, and $\varepsilon (\alpha \beta ) = \varepsilon (\alpha \cap \beta ) = \varepsilon (\alpha ) \cdot \varepsilon (\beta )$.
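The recursive rules for $\varepsilon (\cdot )$ translate directly into code. The following is a minimal sketch, assuming a tuple encoding of expressions that is not part of the paper; the name `nullable` is chosen here for illustration, and the function returns a Boolean in place of the values $\varepsilon $ and $\emptyset $.

```python
# Expressions are nested tuples (an encoding assumed for this sketch):
#   ("empty",), ("eps",), ("sym", "a"),
#   ("alt", r, s), ("cat", r, s), ("star", r), ("and", r, s)

def nullable(e):
    """Return True iff the empty word belongs to L(e)."""
    tag = e[0]
    if tag in ("empty", "sym"):
        return False
    if tag in ("eps", "star"):
        return True
    if tag == "alt":
        # eps(alpha + beta) = eps(alpha) + eps(beta)
        return nullable(e[1]) or nullable(e[2])
    if tag in ("cat", "and"):
        # eps(alpha beta) = eps(alpha ∩ beta) = eps(alpha) . eps(beta)
        return nullable(e[1]) and nullable(e[2])
    raise ValueError(f"unknown node {tag!r}")

a, b = ("sym", "a"), ("sym", "b")
e1 = ("and", ("cat", a, ("star", b)), a)          # (a b*) ∩ a
e2 = ("and", ("star", a), ("alt", ("eps",), b))   # a* ∩ (eps + b)
print(nullable(e1), nullable(e2))
```

Concatenation and intersection share one rule because both require the empty word on each side.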

A nondeterministic finite automaton (NFA) is a tuple $\mathcal {A}= \langle S, \varSigma , S_0, \delta , F \rangle $, where S is a finite set of states, $\varSigma $ is a finite alphabet, $S_0 \subseteq S$ a set of initial states, $\delta : S \times \varSigma \longrightarrow \mathcal{P}(S)$ the transition function, and $F \subseteq S$ a set of final states. The extension of $\delta $ to sets of states and words is defined by $\delta (X,\varepsilon ) = X$ and $\delta (X,a x) = \displaystyle {\delta (\cup _{s \in X}\delta (s,a),x)}$. A word $x \in \varSigma ^\star $ is accepted by $\mathcal {A}$ if and only if $\delta (S_0,x) \cap F \ne \emptyset $. The language of $\mathcal {A}$, $\mathcal {L}(\mathcal {A})$, is the set of words accepted by $\mathcal {A}$. The right language of a state s, $\mathcal {L}_s$, is the language accepted by $\mathcal {A}$ if we take $S_0 =\{s\}$. An NFA is initially connected or accessible if each state is reachable from an initial state and it is trimmed if, moreover, the right language of each state is non-empty. Given $\mathcal {A}$, we denote by $\mathcal {A}^\mathsf {ac}$ and $\mathcal {A}^\mathsf {t}$ the result of removing unreachable states from $\mathcal {A}$ and trimming $\mathcal {A}$, respectively. It is clear that $\mathcal {L}(\mathcal {A})=\mathcal {L}(\mathcal {A}^\mathsf {ac})=\mathcal {L}(\mathcal {A}^\mathsf {t})$.
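The extension of $\delta $ to sets of states and words can be sketched as a simple loop. The dictionary encoding of $\delta $ and the small automaton below are assumptions made for illustration only.

```python
def step(delta, states, letter):
    """One step of the extended transition function on a set of states."""
    return {t for s in states for t in delta.get((s, letter), ())}

def accepts(delta, initial, final, word):
    """True iff delta(S_0, word) contains a final state."""
    current = set(initial)
    for letter in word:
        current = step(delta, current, letter)
    return bool(current & set(final))

# Hypothetical NFA for a b*, with states 0 and 1.
delta = {(0, "a"): {1}, (1, "b"): {1}}
initial, final = {0}, {1}
print(accepts(delta, initial, final, "abb"))
```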

We say that an equivalence relation $\equiv $ over S is right invariant w.r.t. $\mathcal {A}$ iff

1. $\forall s,t\in S,\;s\equiv t\,\wedge \,s\in F \implies t\in F$
2. $\forall s,t \in S, \forall a \in \varSigma ,\; s\equiv t \implies \forall s_1 \in \delta (s,a)\; \exists t_1 \in \delta (t,a), s_1 \equiv t_1$.

If $\equiv $ is right invariant, then we can define the quotient automaton $\mathcal {A}/_\equiv $ in the usual way, and $\mathcal {L}(\mathcal {A}/_\equiv ) = \mathcal {L}(\mathcal {A})$.
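As an illustration of the two conditions and of the quotient construction, here is a small sketch. It assumes that the equivalence is given as a dictionary `cls` mapping every state to a class representative; this encoding and the function names are hypothetical.

```python
def is_right_invariant(cls, delta, final, alphabet):
    """Check conditions 1 and 2 for the equivalence encoded by cls."""
    for s in cls:
        for t in cls:
            if cls[s] != cls[t]:
                continue
            if s in final and t not in final:
                return False          # condition 1 fails
            for a in alphabet:
                for s1 in delta.get((s, a), ()):
                    if not any(cls[s1] == cls[t1]
                               for t1 in delta.get((t, a), ())):
                        return False  # condition 2 fails
    return True

def quotient(cls, delta, initial, final):
    """Merge equivalent states; returns (delta, initial, final) on classes."""
    qdelta = {}
    for (s, a), targets in delta.items():
        qdelta.setdefault((cls[s], a), set()).update(cls[t] for t in targets)
    return qdelta, {cls[s] for s in initial}, {cls[s] for s in final}

# Hypothetical NFA for a b* in which states 1 and 2 behave identically.
delta = {(0, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {2}}
final = {1, 2}
cls = {0: 0, 1: 1, 2: 1}
qdelta, qinit, qfinal = quotient(cls, delta, {0}, final)
print(is_right_invariant(cls, delta, final, "ab"), qdelta)
```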

The notions of partial derivatives and partial derivative automata were introduced by Antimirov [1] for simple regular expressions. Bastos et al. [2] presented an extension of the Antimirov construction to ${\mathsf {RE}}_{\cap } $ expressions.

Definition 1

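For $\alpha \in {\mathsf {RE}}_{\cap } $ and $a \in \varSigma $, the set $\partial _a(\alpha )$ of partial derivatives of $\alpha $ w.r.t. a is defined by:

$$\begin{array}{rcl} \partial _a(\emptyset ) &{}=&{} \partial _a(\varepsilon ) = \emptyset , \\ \partial _a(b) &{}=&{} {\left\{ \begin{array}{ll} \{\varepsilon \}, &{} \text {if } b = a,\\ \emptyset , &{} \text {otherwise,} \end{array}\right. } \\ \partial _a(\alpha + \beta ) &{}=&{} \partial _a(\alpha ) \cup \partial _a(\beta ), \\ \partial _a(\alpha \beta ) &{}=&{} {\left\{ \begin{array}{ll} \partial _a(\alpha ) \odot \beta \cup \partial _a(\beta ), &{} \text {if } \varepsilon (\alpha ) = \varepsilon ,\\ \partial _a(\alpha ) \odot \beta , &{} \text {otherwise,} \end{array}\right. } \\ \partial _a(\alpha ^\star ) &{}=&{} \partial _a(\alpha ) \odot \alpha ^\star , \\ \partial _a(\alpha \cap \beta ) &{}=&{} \partial _a(\alpha ) \sqcap \partial _a(\beta ), \end{array}$$

where for $S,T \subseteq {\mathsf {RE}}_{\cap } $ and $\beta \in {\mathsf {RE}}_{\cap } $, $S\odot \beta = \{\; \alpha \beta \mid \alpha \in S\;\}$, $\beta \odot S = \{\; \beta \alpha \mid \alpha \in S\;\}$, and $S \sqcap T = \{\; \alpha \cap \beta \mid \alpha \in S, \beta \in T \;\}$.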

This definition is extended to any word w by $\partial _\varepsilon (\alpha ) = \{\alpha \}$, $\partial _{wa}(\alpha ) = \bigcup _{\alpha _i\in \partial _w(\alpha )}\partial _a(\alpha _i)$, and $\partial _w(R) = \bigcup _{\alpha _i \in R} \partial _w(\alpha _i)$, where $R \subseteq {\mathsf {RE}}_{\cap } $. The set of partial derivatives of an expression $\alpha $ is $\partial (\alpha ) = \bigcup _{w \in \varSigma ^\star } \partial _w(\alpha )$. As for simple regular expressions, the partial derivative automaton of an expression $\alpha \in {\mathsf {RE}}_{\cap } $ is defined by $\mathcal{A}_{\mathsf {pd}}(\alpha ) = \langle \partial (\alpha ), \varSigma , \{\alpha \}, \delta _\mathsf {pd}, F_\mathsf {pd}\rangle ,$ where $F_\mathsf {pd}=\{\; \gamma \in \partial (\alpha ) \mid \varepsilon (\gamma ) =\varepsilon \;\}$ and $\delta _\mathsf {pd}(\gamma , a) = \partial _a(\gamma )$. It follows that $\mathcal {L}(\mathcal{A}_{\mathsf {pd}}(\alpha ))$ is exactly $\mathcal {L}(\alpha )$ and, by construction, $\mathcal{A}_{\mathsf {pd}}(\alpha )$ is accessible. Bastos et al. [2] also showed that $|\partial (\alpha )|\le 2^{|\alpha |_\varSigma - |\alpha |_\cap -1}+1$ and that, asymptotically and on average, an upper bound for the number of states is $(1.056 +o(1))^n$, where n is the size of the expression.
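The construction of $\mathcal{A}_{\mathsf {pd}}$ can be sketched as a worklist exploration of partial derivatives. The code below is an illustration under stated assumptions, not the authors' implementation: it uses the usual Antimirov rules, takes the derivative of an intersection to be the set of pairwise intersections of the derivatives of both sides, and simplifies concatenations with $\varepsilon $ and $\emptyset $ on the fly. The tuple encoding and all function names are hypothetical.

```python
EMPTY, EPS = ("empty",), ("eps",)

def nullable(e):
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    if tag in ("cat", "and"):
        return nullable(e[1]) and nullable(e[2])
    return False  # "empty" and "sym"

def cat(r, s):
    """Concatenation modulo the equations for the empty set and epsilon."""
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    return r if s == EPS else ("cat", r, s)

def pd(e, a):
    """Set of partial derivatives of e w.r.t. the letter a."""
    tag = e[0]
    if tag == "sym":
        return {EPS} if e[1] == a else set()
    if tag == "alt":
        return pd(e[1], a) | pd(e[2], a)
    if tag == "cat":
        left = {cat(d, e[2]) for d in pd(e[1], a)}
        return left | pd(e[2], a) if nullable(e[1]) else left
    if tag == "star":
        return {cat(d, e) for d in pd(e[1], a)}
    if tag == "and":
        return {("and", r, s) for r in pd(e[1], a) for s in pd(e[2], a)}
    return set()  # "empty" and "eps"

def pd_automaton(e, alphabet):
    """Explore all partial derivatives reachable from e."""
    states, todo, delta = {e}, [e], {}
    while todo:
        q = todo.pop()
        for a in alphabet:
            targets = pd(q, a)
            if targets:
                delta[(q, a)] = targets
            for t in targets - states:
                states.add(t)
                todo.append(t)
    final = {q for q in states if nullable(q)}
    return states, delta, final

a, b = ("sym", "a"), ("sym", "b")
L = ("alt", ("cat", b, ("cat", ("star", a), b)), a)   # b a* b + a
R = ("star", ("alt", ("cat", a, a), b))               # (aa + b)*
alpha = ("and", L, R)
states, delta, final = pd_automaton(alpha, "ab")
print(len(states), len(final))
```

For this expression the exploration stops after a handful of states, well below the bound $2^{|\alpha |_\varSigma - |\alpha |_\cap -1}+1 = 33$. Note that a state such as $\varepsilon \cap a(aa+b)^\star $ is kept even though its language is empty, since the automaton is accessible but not necessarily trimmed.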

3 Indexed Expressions

Given an alphabet $\varSigma $ and a nonempty set of indexes $J \subseteq \mathbb {N}$, let $\varSigma _J = \{ \; a_j \mid a\in \varSigma , j \in J \;\}$. An indexed regular expression is a regular expression over the alphabet $\varSigma _J$ such that for all $a_i, b_j \in \varSigma _J$ occurring in the expression, $a \not =b$ implies $i \not = j$. We let $\rho ,\rho _1,\rho _2,\ldots $ denote indexed regular expressions. If $\rho $ is an indexed expression, then $\overline{\rho }$ is the regular expression over the alphabet $\varSigma $ obtained by removing the indexes. The set of all indexes occurring in $\rho $ is denoted by ${{\mathrm{\mathsf {ind}}}}(\rho ) = \{ \; i \mid a_i \in \varSigma _\rho \;\}$. Given an indexed expression $\rho $ and $i \in {{\mathrm{\mathsf {ind}}}}(\rho )$, $\ell _\rho (i)$ is the letter indexed by i in $\rho $. From now on, we will simply write $\ell (i)$ for $\ell _\rho (i)$ since it will always be clear that we are referring to a specific expression $\rho $. Given an indexed expression $\rho $, let

$$\mathcal{I}_\rho = \{\; \mathrm {I}\subseteq {{\mathrm{\mathsf {ind}}}}(\rho ) \mid \mathrm {I}\not = \emptyset \text { and } \forall i_1,i_2 \in \mathrm {I}, \ell (i_1) = \ell (i_2)\;\}.$$

For $\mathrm {I}\in \mathcal{I}_\rho $ we extend the definition of $\ell $ by $\ell (\mathrm {I}) = \ell (i)$, $i \in \mathrm {I}$. Finally, we say that $\rho $ is well-indexed if for all subterms of $\rho $ of the form $\rho _1\cap \rho _2$ one has ${{\mathrm{\mathsf {ind}}}}(\rho _1) \cap {{\mathrm{\mathsf {ind}}}}(\rho _2) = \emptyset $.

Example 2

For $\rho = a_1(a_4b_5^\star \cap a_4)$ one has $\overline{\rho }= a(ab^\star \cap a)$, ${{\mathrm{\mathsf {ind}}}}(\rho )=\{1,4,5\}$, $\ell (4)=\ell (\{1,4\}) = a$ and $\mathcal{I}_\rho = \{\{1\},\{4\}, \{5\}, \{1,4\}\}$. However, this expression is not well-indexed, since $a_4$ occurs on both sides of an intersection.
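The notions of this section can be checked mechanically on the expression of Example 2. The sketch below assumes a tuple encoding in which an indexed letter $a_i$ is written `("sym", "a", i)`; the function names are illustrative only.

```python
from itertools import combinations

def letters(rho):
    """Map each index occurring in rho to its letter (the function l)."""
    if rho[0] == "sym":
        return {rho[2]: rho[1]}
    out = {}
    for child in rho[1:]:
        out.update(letters(child))
    return out

def index_sets(rho):
    """Non-empty sets of indexes whose members all carry the same letter."""
    by_letter = {}
    for i, a in letters(rho).items():
        by_letter.setdefault(a, []).append(i)
    return {frozenset(c)
            for group in by_letter.values()
            for k in range(1, len(group) + 1)
            for c in combinations(group, k)}

def well_indexed(rho):
    """No index may occur on both sides of an intersection."""
    if rho[0] in ("empty", "eps", "sym"):
        return True
    if rho[0] == "and" and set(letters(rho[1])) & set(letters(rho[2])):
        return False
    return all(well_indexed(child) for child in rho[1:])

# rho = a_1 (a_4 b_5* ∩ a_4), as in Example 2
a1, a4, b5 = ("sym", "a", 1), ("sym", "a", 4), ("sym", "b", 5)
rho = ("cat", a1, ("and", ("cat", a4, ("star", b5)), a4))
print(sorted(map(sorted, index_sets(rho))), well_indexed(rho))
```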

Definition 3

Consider an indexed expression $\rho $. For $L \subseteq \mathcal{I}_\rho ^\star $ and $x = \mathrm {I}_1 \cdots \mathrm {I}_n \in L$, we define $\ell (x) = \ell (\mathrm {I}_1) \cdots \ell (\mathrm {I}_n)$ and $\ell (L) = \{\; \ell (x) \mid x \in L \;\}$. The indexed intersection of two words $x=\mathrm {I}_1 \cdots \mathrm {I}_m, y= \mathrm {J}_1 \cdots \mathrm {J}_n \in \mathcal{I}_\rho ^\star $ is defined by $x \cap _\mathcal{I}y = (\mathrm {I}_1 \cup \mathrm {J}_1) \cdots (\mathrm {I}_n \cup \mathrm {J}_n)$ if $\ell (x) = \ell (y)$ (which implies $m = n$), and undefined otherwise. Then, the indexed intersection of two languages $L_1,L_2 \subseteq \mathcal{I}_\rho ^\star $ is defined as follows:

$$ \begin{array}{lll} L_1 \cap _\mathcal{I}L_2= & {} \{\; x \cap _\mathcal{I}y \mid x \in L_1, y \in L_2 \; \}. \end{array}$$

We define the index-language $\mathcal {L}_\mathcal{I}(\rho ) \subseteq \mathcal{I}_\rho ^\star $ associated to $\rho $ as follows.

$$\begin{array}{lllll} \begin{array}{rcl} \mathcal {L}_\mathcal{I}(\emptyset ) &{}=&{} \emptyset , \\ \mathcal {L}_\mathcal{I}(\varepsilon ) &{}=&{} \{\varepsilon \}, \\ \end{array}&{} \quad \begin{array}{rcl} \mathcal {L}_\mathcal{I}(a_i) &{}=&{} \{ \{i\}\}, \\ \mathcal {L}_\mathcal{I}(\rho ^\star ) &{}=&{} \mathcal {L}_\mathcal{I}(\rho )^\star ,\\ \end{array} &{}\quad \quad \quad \begin{array}{rcl} \mathcal {L}_\mathcal{I}(\rho _1 + \rho _2) &{}=&{} \mathcal {L}_\mathcal{I}(\rho _1) \cup \mathcal {L}_\mathcal{I}(\rho _2), \\ \mathcal {L}_\mathcal{I}(\rho _1 \cdot \rho _2) &{}=&{} \mathcal {L}_\mathcal{I}(\rho _1) \cdot \mathcal {L}_\mathcal{I}(\rho _2), \\ \mathcal {L}_\mathcal{I}(\rho _1 \cap \rho _2) &{}=&{} \mathcal {L}_\mathcal{I}(\rho _1) \cap _\mathcal{I}\mathcal {L}_\mathcal{I}(\rho _2). \\ \end{array} \end{array}$$

Example 4

For $\rho = (a_1 a_2 + b_3 + a_4)^\star \cap (a_5 + b_6)^\star $, we have $\mathcal {L}_\mathcal{I}(\rho ) = \{\{4,5\}, \{3,6\},$ $\{1,5\}\{2,5\}, \{4,5\}\{4,5\}, \{4,5\}\{3,6\},\ldots \},$ and $\ell (\mathcal {L}_\mathcal{I}(\rho )) = \{a,b,aa,ab,\ldots \}$ (since $\ell (\{1,5\}\{2,5\}) = \ell (\{4,5\}\{4,5\}) = aa$).
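Since $\mathcal {L}_\mathcal{I}(\rho )$ is usually infinite, a bounded enumeration is a convenient way to experiment with Definition 3. The following sketch lists all words of $\mathcal {L}_\mathcal{I}(\rho )$ of length at most `n`; words are tuples of frozensets, and the encoding and names are assumptions made for this illustration.

```python
def letters(rho):
    if rho[0] == "sym":
        return {rho[2]: rho[1]}
    out = {}
    for child in rho[1:]:
        out.update(letters(child))
    return out

def index_language(rho, n):
    """Words of the index-language of rho with length <= n."""
    ell = letters(rho)

    def label(word):
        # every set in a word carries a single letter, so any member will do
        return tuple(ell[min(I)] for I in word)

    def go(r):
        tag = r[0]
        if tag == "eps":
            return {()}
        if tag == "sym":
            return {(frozenset({r[2]}),)} if n >= 1 else set()
        if tag == "alt":
            return go(r[1]) | go(r[2])
        if tag == "cat":
            return {x + y for x in go(r[1]) for y in go(r[2])
                    if len(x) + len(y) <= n}
        if tag == "star":
            base = {w for w in go(r[1]) if w}
            words, frontier = {()}, {()}
            while frontier:
                frontier = {x + y for x in frontier for y in base
                            if len(x) + len(y) <= n} - words
                words |= frontier
            return words
        if tag == "and":
            # indexed intersection: position-wise union when labels agree
            return {tuple(I | J for I, J in zip(x, y))
                    for x in go(r[1]) for y in go(r[2])
                    if label(x) == label(y)}
        return set()  # "empty"

    return go(rho)

def s(letter, i):
    return ("sym", letter, i)

# (a_1 b_2*) ∩ a_3, the marked expression from the Introduction
rho_intro = ("and", ("cat", s("a", 1), ("star", s("b", 2))), s("a", 3))
# (a_1 a_2 + b_3 + a_4)* ∩ (a_5 + b_6)*, as in Example 4
rho4 = ("and",
        ("star", ("alt", ("alt", ("cat", s("a", 1), s("a", 2)), s("b", 3)), s("a", 4))),
        ("star", ("alt", s("a", 5), s("b", 6))))
print(index_language(rho_intro, 3))
print(len(index_language(rho4, 2)))
```

For the expression from the Introduction the only word found is $\{1,3\}$, whose label is $a$, so the mismatch between $a_1$ and $a_3$ disappears once sets of positions are used.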

Proposition 5

Given an indexed expression $\rho $, one has $\ell (\mathcal {L}_\mathcal{I}(\rho )) = \mathcal {L}(\overline{\rho })$.

4 A Position Automaton for ${\mathsf {RE}}_{\cap } $ Expressions

Let $\alpha \in {\mathsf {RE}}_{\cap } $. We define the set of positions in $\alpha $ by $\mathsf {pos}(\alpha ) = \{1,\ldots , |\alpha |_\varSigma \}$. As usual, we let $\overline{\alpha }$ denote the expression obtained from $\alpha $ by indexing each letter with its position in $\alpha $. The same notation is used to remove the indexes, as already stated, thus, $\overline{\overline{\alpha }} = \alpha $. Note that for $\alpha \in {\mathsf {RE}}_{\cap } $, the indexed expression $\overline{\alpha }$ is always linear (thus well-indexed), and also $\mathsf {pos}(\alpha ) = {{\mathrm{\mathsf {ind}}}}(\overline{\alpha })$.

Given an indexed linear expression $\rho $ we define the following sets:

$$\begin{aligned} {{\mathrm{\mathsf {First}^\prime }}}(\rho )= & {} \{\; \mathrm {I}\mid \mathrm {I}x \in \mathcal {L}_\mathcal{I}(\rho ) \;\}, \\ {{\mathrm{\mathsf {Last}^\prime }}}(\rho )= & {} \{\; \mathrm {I}\mid x\mathrm {I}\in \mathcal {L}_\mathcal{I}(\rho ) \;\}, \\ {{\mathrm{\mathsf {Follow}^\prime }}}(\rho )= & {} \{\; (\mathrm {I},\mathrm {J}) \mid x\mathrm {I}\mathrm {J}y \in \mathcal {L}_\mathcal{I}(\rho ) \;\} . \end{aligned}$$

Then, given $\alpha \in {\mathsf {RE}}_{\cap } $, we define ${{\mathrm{\mathsf {First}}}}(\alpha )={{\mathrm{\mathsf {First}^\prime }}}(\overline{\alpha }),{{\mathrm{\mathsf {Last}}}}(\alpha )={{\mathrm{\mathsf {Last}^\prime }}}(\overline{\alpha })$, and ${{\mathrm{\mathsf {Follow}}}}(\alpha ) = {{\mathrm{\mathsf {Follow}^\prime }}}(\overline{\alpha })$.

Definition 6

The position automaton of an expression $\alpha \in {\mathsf {RE}}_{\cap } $ is

$$\mathcal{A}_{\mathsf {pos}}(\alpha ) = \langle S_\mathsf {pos},\varSigma ,\{\{0\}\},\delta _\mathsf {pos},F_\mathsf {pos}\rangle ,$$

$$\begin{aligned} \text {where }S_\mathsf {pos}= & {} \{\{0\}\} \cup \{ \mathrm {I}\in \mathcal{I}_{\overline{\alpha }} \mid x\mathrm {I}y \in \mathcal {L}_\mathcal{I}({\overline{\alpha }}) \text { for some } x, y \in \mathcal{I}_{\overline{\alpha }}^\star \;\},\\ \delta _\mathsf {pos}= & {} \{\;(\mathrm {I},\ell (\mathrm {J}),\mathrm {J}) \mid (\mathrm {I},\mathrm {J}) \in {{\mathrm{\mathsf {Follow}}}}(\alpha )\;\} \cup \{\; (\{0\},\ell (\mathrm {I}),\mathrm {I}) \mid \mathrm {I}\in {{\mathrm{\mathsf {First}}}}(\alpha )\;\},\\ F_\mathsf {pos}= & {} {\left\{ \begin{array}{ll} {{\mathrm{\mathsf {Last}}}}(\alpha ) \cup \{\{0\}\},&{}\text {if } \varepsilon (\alpha )=\varepsilon ;\\ {{\mathrm{\mathsf {Last}}}}(\alpha ),&{}\text {otherwise.} \end{array}\right. } \end{aligned}$$

Proposition 7

Given an expression $\alpha \in {\mathsf {RE}}_{\cap } $, one has $\mathcal {L}(\mathcal{A}_{\mathsf {pos}}(\alpha )) = \mathcal {L}(\alpha )$.

Note that for regular expressions without intersection (simple regular expressions) the automaton is, by the definition of $\mathcal {L}_\mathcal{I}$, isomorphic to the classic position automaton, with the difference that now states are labelled with singletons $\{ i \}$ instead of $i \in \mathsf {pos}(\alpha ) \cup \{0\}$. We now give definitions for recursively computing sets corresponding to ${{\mathrm{\mathsf {First}}}}$, ${{\mathrm{\mathsf {Last}}}}$ and ${{\mathrm{\mathsf {Follow}}}}$. These definitions lead to supersets of the corresponding sets, but we will prove that the extra elements can be discarded and that trimming the resulting NFA yields $\mathcal{A}_{\mathsf {pos}}$.

Definition 8

Given a well-indexed expression $\rho $, let $\mathsf {Fst}(\rho ) \subseteq \mathcal{I}_\rho $ be inductively defined as follows,

$$\begin{array}{ll} \begin{array}{rcl} \mathsf {Fst}(\emptyset ) &{}=&{} \mathsf {Fst}(\varepsilon ) = \emptyset \\ \mathsf {Fst}(a_i) &{}=&{} \{ \{i \}\} \\ \mathsf {Fst}(\rho ^\star ) &{}=&{} \mathsf {Fst}(\rho )\\ \end{array} &{}\quad \begin{array}{rcl} \mathsf {Fst}(\rho _1 + \rho _2) &{}=&{} \mathsf {Fst}(\rho _1) \cup \mathsf {Fst}(\rho _2) \\ \mathsf {Fst}(\rho _1\cdot \rho _2) &{}=&{} {\left\{ \begin{array}{ll} \mathsf {Fst}(\rho _1) \cup \mathsf {Fst}(\rho _2), &{} \text {if }\varepsilon (\rho _1) = \varepsilon \\ \mathsf {Fst}(\rho _1), &{} \text {otherwise} \end{array}\right. }\\ \mathsf {Fst}(\rho _1 \cap \rho _2) &{}=&{} \mathsf {Fst}(\rho _1) \otimes \mathsf {Fst}(\rho _2). \end{array} \end{array}$$

where for $F_1,F_2\subseteq \mathcal{I}_\rho $, $F_1 \otimes F_2 = \{\; \mathrm {I}_1 \cup \mathrm {I}_2 \mid \ell (\mathrm {I}_1) = \ell (\mathrm {I}_2) \text { and}\, \mathrm {I}_1 \in F_1, \mathrm {I}_2 \in F_2\;\}.$

By construction, all elements $\mathrm {I}\in \mathsf {Fst}(\rho )$ are non-empty and such that $\ell (i_1)=\ell (i_2)$ for all $i_1,i_2 \in \mathrm {I}$, guaranteeing that $\otimes $ is well defined and $\mathsf {Fst}(\rho ) \subseteq \mathcal{I}_\rho $.

Example 9

We have $\mathsf {Fst}(a_1^\star b_2^\star \cap a_3) = \mathsf {Fst}(a_1^\star b_2^\star ) \otimes \mathsf {Fst}(a_3) = \{\{1\},\{2\}\} \otimes \{\{3\}\} = \{\{1,3\}\}$.
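The inductive rules for $\mathsf {Fst}$ and the operation $\otimes $ can be sketched in a few lines and checked against Example 9. The tuple encoding, with $a_i$ written `("sym", "a", i)`, and the function names are assumptions of this sketch.

```python
def letters(rho):
    if rho[0] == "sym":
        return {rho[2]: rho[1]}
    out = {}
    for child in rho[1:]:
        out.update(letters(child))
    return out

def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag in ("cat", "and"):
        return nullable(r[1]) and nullable(r[2])
    return False  # "empty" and "sym"

def otimes(F1, F2, ell):
    """Unions of pairs of index sets that carry the same letter."""
    return {I | J for I in F1 for J in F2 if ell[min(I)] == ell[min(J)]}

def fst(r, ell):
    tag = r[0]
    if tag == "sym":
        return {frozenset({r[2]})}
    if tag == "star":
        return fst(r[1], ell)
    if tag == "alt":
        return fst(r[1], ell) | fst(r[2], ell)
    if tag == "cat":
        out = fst(r[1], ell)
        return out | fst(r[2], ell) if nullable(r[1]) else out
    if tag == "and":
        return otimes(fst(r[1], ell), fst(r[2], ell), ell)
    return set()  # "empty" and "eps"

# a_1* b_2* ∩ a_3, as in Example 9
a1, b2, a3 = ("sym", "a", 1), ("sym", "b", 2), ("sym", "a", 3)
rho = ("and", ("cat", ("star", a1), ("star", b2)), a3)
print(fst(rho, letters(rho)))
```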

Definition 10

Given a well-indexed expression $\rho $, the set $\mathsf {Lst}(\rho ) \subseteq \mathcal{I}_\rho $ is defined in the same way as $\mathsf {Fst}(\rho )$, with the difference that for concatenation we have:

$$\mathsf {Lst}(\rho _1 \cdot \rho _2) = {\left\{ \begin{array}{ll} \mathsf {Lst}(\rho _1) \cup \mathsf {Lst}(\rho _2), &{} \text {if }\varepsilon (\rho _2) = \varepsilon \\ \mathsf {Lst}(\rho _2), &{} \text {otherwise.} \end{array}\right. }$$

The set $\mathsf {Fol}(\rho ) \subseteq \mathcal{I}_\rho \times \mathcal{I}_\rho $ is inductively defined as follows,

$$\begin{array}{c} \begin{array}{lll} \mathsf {Fol}(\emptyset ) &{}=&{} \mathsf {Fol}(\varepsilon ) = \mathsf {Fol}(a_i) = \emptyset \\ \mathsf {Fol}(\rho ^\star ) &{}=&{} \mathsf {Fol}(\rho ) \cup \mathsf {Lst}(\rho ) \times \mathsf {Fst}(\rho ) \end{array} \quad \begin{array}{lll} \mathsf {Fol}(\rho _1 + \rho _2) &{}=&{} \mathsf {Fol}(\rho _1) \cup \mathsf {Fol}(\rho _2) \\ \mathsf {Fol}(\rho _1 \cap \rho _2) &{}=&{} \mathsf {Fol}(\rho _1) \otimes \mathsf {Fol}(\rho _2) \end{array}\\ \mathsf {Fol}(\rho _1 \cdot \rho _2) = \mathsf {Fol}(\rho _1) \cup \mathsf {Fol}(\rho _2) \cup \mathsf {Lst}(\rho _1) \times \mathsf {Fst}(\rho _2). \end{array}$$

where, for $S_1,S_2\subseteq \mathcal{I}_\rho \times \mathcal{I}_\rho $,

$$\begin{aligned}&S_1 \otimes S_2 = \{\; (\mathrm {I}_1\cup \mathrm {I}_2, \mathrm {J}_1\cup \mathrm {J}_2) \mid (\mathrm {I}_1,\mathrm {J}_1) \in S_1, (\mathrm {I}_2,\mathrm {J}_2) \in S_2\text { and } \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \,\,\ell (\mathrm {I}_1) = \ell (\mathrm {I}_2), \ell (\mathrm {J}_1) = \ell (\mathrm {J}_2) \;\}. \end{aligned}$$
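The operation $\otimes $ on sets of pairs can be illustrated in isolation. In the sketch below the two operands are the $\mathsf {Fol}$ sets of the two sides of the marked expression $(b_1a_2^\star b_3+a_4)\cap (a_5a_6+b_7)^\star $ from Example 16, worked out by hand from the rules above; the helper names are hypothetical.

```python
def otimes_pairs(S1, S2, ell):
    """Combine follow pairs whose first and second labels both agree."""
    def lab(I):
        return ell[min(I)]
    return {(I1 | I2, J1 | J2)
            for (I1, J1) in S1 for (I2, J2) in S2
            if lab(I1) == lab(I2) and lab(J1) == lab(J2)}

def fs(*xs):
    return frozenset(xs)

# letters of the positions in (b_1 a_2* b_3 + a_4) ∩ (a_5 a_6 + b_7)*
ell = {1: "b", 2: "a", 3: "b", 4: "a", 5: "a", 6: "a", 7: "b"}
# Fol(b_1 a_2* b_3 + a_4), computed by hand
S1 = {(fs(1), fs(2)), (fs(2), fs(2)), (fs(2), fs(3)), (fs(1), fs(3))}
# Fol((a_5 a_6 + b_7)*), computed by hand
S2 = {(fs(5), fs(6)), (fs(6), fs(5)), (fs(6), fs(7)),
      (fs(7), fs(5)), (fs(7), fs(7))}
result = otimes_pairs(S1, S2, ell)
for I, J in sorted((sorted(I), sorted(J)) for I, J in result):
    print(I, "->", J)
```

Of the twenty candidate combinations only five survive the two label conditions.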

In the next definition we will use the projection functions on the first and second coordinates, $\pi _1$ and $\pi _2$, respectively.

Definition 11

Given $\alpha \in {\mathsf {RE}}_{\cap } $, let $\mathcal{A}_{\mathsf {posi}}(\alpha ) = \langle S_\mathsf {posi}, \varSigma , \{\{0\}\}, \delta _\mathsf {posi}, F_\mathsf {posi}\rangle $ be the NFA where $S_\mathsf {posi}=\{\{0\}\}\cup \mathsf {Fst}(\overline{\alpha })\cup \mathsf {Lst}(\overline{\alpha })\cup \pi _1({\mathsf {Fol}(\overline{\alpha })})\cup \pi _2({\mathsf {Fol}(\overline{\alpha })})$, and $\delta _\mathsf {posi}$ and $F_\mathsf {posi}$ are defined as $\delta _\mathsf {pos}$ and $F_\mathsf {pos}$, substituting the functions ${{\mathrm{\mathsf {First}}}}$, ${{\mathrm{\mathsf {Last}}}}$ and ${{\mathrm{\mathsf {Follow}}}}$, by $\mathsf {Fst}$, $\mathsf {Lst}$ and $\mathsf {Fol}$, respectively.

We will now show that $\mathcal {L}(\mathcal{A}_{\mathsf {pos}}(\alpha )) = \mathcal {L}(\mathcal{A}_{\mathsf {posi}}(\alpha ))$, and that $\mathcal{A}_{\mathsf {pos}}(\alpha )$ is obtained by trimming $\mathcal{A}_{\mathsf {posi}}(\alpha )$, as the result of the two following lemmas. An example is presented at the end of this section.

Lemma 12

Given an indexed linear expression $\rho $, one has: $1) {{\mathrm{\mathsf {First}^\prime }}}(\rho ) \subseteq \mathsf {Fst}(\rho )$; $2) {{\mathrm{\mathsf {Last}^\prime }}}(\rho ) \subseteq \mathsf {Lst}(\rho )$; $3) {{\mathrm{\mathsf {Follow}^\prime }}}(\rho ) \subseteq \mathsf {Fol}(\rho )$.

Example 13

For $\rho = (a_1 \cap b_2) c_3 d_4$, we have $(\{3\},\{4\}) \in \mathsf {Fol}(\rho )$, but $(\{3\},\{4\}) \not \in {{\mathrm{\mathsf {Follow}}}}(\rho )$. Thus, $\mathsf {Fol}(\rho ) \not \subseteq {{\mathrm{\mathsf {Follow}}}}(\rho )$.

The previous Lemma shows that for any $\alpha \in {\mathsf {RE}}_{\cap } $, $\mathcal{A}_{\mathsf {pos}}(\alpha )$ is a subautomaton of $\mathcal{A}_{\mathsf {posi}}(\alpha )$, and thus $\mathcal {L}(\mathcal{A}_{\mathsf {pos}}(\alpha ))\subseteq \mathcal {L}(\mathcal{A}_{\mathsf {posi}}(\alpha ))$. We now show that both recognize the same language and can be made isomorphic by trimming $\mathcal{A}_{\mathsf {posi}}$.

Lemma 14

Given an indexed linear expression $\rho $ and some $n \ge 1$, if $\mathrm {I}_n \in \mathsf {Lst}(\rho )$ and there exist $\mathrm {I}_1,\ldots , \mathrm {I}_n \in \mathcal{I}_\rho $ such that

$$(\{0\},\ell (\mathrm {I}_1), \mathrm {I}_1), (\mathrm {I}_1 ,\ell (\mathrm {I}_2),\mathrm {I}_2), \ldots , (\mathrm {I}_{n-1},\ell (\mathrm {I}_n),\mathrm {I}_n) \in \delta _\mathsf {posi},$$

then $\mathrm {I}_1\cdots \mathrm {I}_n \in \mathcal {L}_\mathcal{I}(\rho )$.

Theorem 15

For any $\alpha \in {\mathsf {RE}}_{\cap } $, $\mathcal {L}(\mathcal{A}_{\mathsf {pos}}(\alpha ))=\mathcal {L}(\mathcal{A}_{\mathsf {posi}}(\alpha ))$.

From these results, it follows that if we trim the automaton $\mathcal{A}_{\mathsf {posi}}$ we obtain exactly $\mathcal{A}_{\mathsf {pos}}$.

Example 16

Consider $\alpha =(ba^\star b+a)\cap (aa+b)^\star $. Then $\overline{\alpha }=(b_1a_2^\star b_3+a_4)\cap (a_5a_6+b_7)^\star $, $\mathsf {Fst}(\overline{\alpha })=\{\{1,7\},\{4,5\}\}$, $\mathsf {Lst}(\overline{\alpha })=\{\{3,7\},\{4,6\}\}$, and $\mathsf {Fol}(\overline{\alpha }) = \{ (\{2,5\},\{2,6\}), (\{2,6\},\{2,5\}),(\{2,6\},\{3,7\}), (\{1,7\},\{2,5\}), (\{1,7\},\{3,7\})\}.$

The automaton $\mathcal{A}_{\mathsf {posi}}(\alpha )$ is represented in Fig. 1. The trimmed automaton, $\mathcal{A}_{\mathsf {posi}}(\alpha )^\mathsf {t}$, is obtained by removing the states labeled by $\{4,5\}$ and $\{4,6\}$, and the corresponding transitions.
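The whole pipeline of this section (computing $\mathsf {Fst}$, $\mathsf {Lst}$ and $\mathsf {Fol}$, assembling $\mathcal{A}_{\mathsf {posi}}$, and trimming) can be sketched as follows and run on Example 16. This is an illustration under assumed encodings, not the authors' implementation: a transition is stored as a pair $(\mathrm {I},\mathrm {J})$ because its label is always $\ell (\mathrm {J})$, and all function names are hypothetical.

```python
def letters(r):
    if r[0] == "sym":
        return {r[2]: r[1]}
    out = {}
    for child in r[1:]:
        out.update(letters(child))
    return out

def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag in ("cat", "and"):
        return nullable(r[1]) and nullable(r[2])
    return False

def ends(r, ell, first):
    """Fst(r) when first is True, Lst(r) otherwise."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return set()
    if tag == "sym":
        return {frozenset({r[2]})}
    if tag == "star":
        return ends(r[1], ell, first)
    A, B = ends(r[1], ell, first), ends(r[2], ell, first)
    if tag == "alt":
        return A | B
    if tag == "cat":
        keep, skippable = (A, r[1]) if first else (B, r[2])
        return A | B if nullable(skippable) else keep
    return {I | J for I in A for J in B if ell[min(I)] == ell[min(J)]}

def fol(r, ell):
    tag = r[0]
    if tag in ("empty", "eps", "sym"):
        return set()
    if tag == "star":
        body = r[1]
        return fol(body, ell) | {(I, J) for I in ends(body, ell, False)
                                 for J in ends(body, ell, True)}
    P, Q = fol(r[1], ell), fol(r[2], ell)
    if tag == "alt":
        return P | Q
    if tag == "cat":
        return P | Q | {(I, J) for I in ends(r[1], ell, False)
                        for J in ends(r[2], ell, True)}
    return {(I1 | I2, J1 | J2) for I1, J1 in P for I2, J2 in Q
            if ell[min(I1)] == ell[min(I2)] and ell[min(J1)] == ell[min(J2)]}

def posi_automaton(rho):
    ell, q0 = letters(rho), frozenset({0})
    first, last = ends(rho, ell, True), ends(rho, ell, False)
    edges = {(q0, J) for J in first} | fol(rho, ell)  # (I, J) is labelled l(J)
    states = {q0} | first | last | {I for e in edges for I in e}
    final = last | ({q0} if nullable(rho) else set())
    return states, edges, final, q0

def trim(states, edges, final, q0):
    """Keep the states that are reachable and have a non-empty right language."""
    def closure(seeds, step):
        seen, todo = set(seeds), list(seeds)
        while todo:
            for t in step(todo.pop()):
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen
    reach = closure({q0}, lambda s: [j for i, j in edges if i == s])
    coreach = closure(final, lambda s: [i for i, j in edges if j == s])
    keep = states & reach & coreach
    return keep, {(i, j) for i, j in edges if i in keep and j in keep}

def s(letter, i):
    return ("sym", letter, i)

left = ("alt", ("cat", s("b", 1), ("cat", ("star", s("a", 2)), s("b", 3))), s("a", 4))
right = ("star", ("alt", ("cat", s("a", 5), s("a", 6)), s("b", 7)))
rho = ("and", left, right)
states, edges, final, q0 = posi_automaton(rho)
kept, kept_edges = trim(states, edges, final, q0)
print(sorted(map(sorted, states - kept)))
```

On this input the state $\{4,5\}$ is reachable but cannot reach a final state, while $\{4,6\}$ is final but unreachable, which is why both disappear after trimming.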

5 A $\mathsf {c}$-Continuation Automaton for ${\mathsf {RE}}_{\cap } $ Expressions

In the case of simple regular expressions, Champarnaud and Ziadi [9] defined a nondeterministic automaton isomorphic to the position automaton, called the $\mathsf {c}$-continuation automaton, in order to show that the partial derivative automaton can be seen as a quotient of the position automaton. With the same purpose, in this section, we present a $\mathsf {c}$-continuation automaton for expressions with intersection. Moreover, instead of considering derivatives of regular expressions [5], we use partial derivatives to restate some known results for simple regular expressions.

The notion of continuation was defined by Berry and Sethi [3], and developed by Champarnaud and Ziadi [9], by Ilie and Yu [14], and by Chen and Yu [10]. Given $a \in \varSigma $ and a linear simple expression $\alpha $, the set of partial derivatives $\partial _{xa}(\alpha )$, for any word $x\in \varSigma ^\star $, is either $\emptyset $ or has a unique element $\gamma $ called the continuation of a in $\alpha $. Note that using partial derivatives, continuations and non-null $\mathsf {c}$-continuations coincide. Furthermore, the continuation can be obtained by some refinement of the inductive definition of partial derivatives, exploring the linearity of $\alpha $. In order to establish similar results for linear well-indexed expressions, we introduce the notion of partial index-derivative of a well-indexed expression $\rho $ w.r.t. an index $\mathrm {I}\in \mathcal{I}_\rho $.

Given a well-indexed expression $\rho $, a subexpression $\tau $ of $\rho $, and a set of indexes $\mathrm {I}\in \mathcal{I}_\rho $, let ${{\mathrm {I}}\big |_{{\tau }}}$ denote the set of indexes in $\mathrm {I}$ that occur in $\tau $. This definition is naturally extended to words $x = \mathrm {I}_1 \cdots \mathrm {I}_n \in \mathcal{I}_\rho ^\star $ by ${{x}\big |_{{\tau }}} = {{\mathrm {I}_{1}}\big |_{{\tau }}} \cdots {{\mathrm {I}_{n}}\big |_{{\tau }}}$, for $n \ge 0$.

Definition 17

The set of partial index-derivatives of a well-indexed expression $\rho $ by $\mathrm {I}\in \mathcal{I}_\rho \cup \{\emptyset \}$, $\partial _\mathrm {I}(\rho )$, is defined by

The set of partial index-derivatives of $\rho $ by a word $x \in \mathcal{I}_\rho ^\star $ is then inductively defined by $\partial _\varepsilon (\rho ) = \{\rho \}$ and $\partial _{x\mathrm {I}}(\rho ) = \bigcup _{\rho '\in \partial _x(\rho )}\partial _\mathrm {I}(\rho ')$. If S is a set of well-indexed expressions, $\partial _{x}(S)=\bigcup _{\rho \in S}\partial _x(\rho )$.

It is straightforward to see that $\partial _{\emptyset }(\rho ) = \emptyset $ for all $\rho $. Although $\emptyset \not \in \mathcal{I}_\rho $, the notion of partial index-derivative includes the derivative by an empty set of indexes, in order to guarantee that the derivative of an intersection is well-defined. Also note that the partial index-derivative of a well-indexed expression is still well-indexed. Finally, the set of partial index-derivatives of $\rho $ by all $\mathrm {I}\in \mathcal{I}_\rho $ can be calculated simultaneously using an extension of the linear form defined by Antimirov [1], i.e. considering pairs $(\mathrm {I}, \rho ')$ where $\rho '\in \partial _\mathrm {I}(\rho )$. The following lemma is proved by induction on n.

Lemma 18

If $x=\mathrm {I}_1\cdots \mathrm {I}_n$ and $\partial _{x}(\rho ) \not = \emptyset $, then $x={{x}\big |_{{\rho }}}$.

Example 19

We have .

Proposition 20

Consider a well-indexed expression $\rho $ and $\mathrm {I}\in \mathcal{I}_\rho $. Then,

Corollary 21

For every well-indexed expression $\rho \in {\mathsf {RE}}_{\cap } $ and word $x \in \mathcal{I}_\rho ^\star $, one has $x^{-1} \mathcal {L}_\mathcal{I}({\rho }) = \mathcal {L}_\mathcal{I}(\partial _x(\rho ))$ and $\mathcal {L}_\mathcal{I}({\rho })=\mathcal {L}_\mathcal{I}(\bigcup _{x\in \mathcal{I}^\star _\rho }( x \odot \partial _x(\rho ))\cup \varepsilon (\rho )).$

The following is an adaptation, for partial index-derivatives and intersection, of a result due to Berry and Sethi [3].

Proposition 22

Consider a linear indexed expression $\rho \in {\mathsf {RE}}_{\cap } $ and $x\mathrm {I}\in \mathcal{I}_\rho ^\star $, and let ${{\mathrm{\mathsf {suff}}}}(x)$ denote the set of all suffixes of x. The partial index-derivative $\partial _{x\mathrm {I}}(\rho )$ of $\rho $ satisfies:

The previous proposition implies that if $\partial _{x\mathrm {I}}({\rho }) \not = \emptyset $, then it has only one element for every $x\in \mathcal{I}_\rho ^\star $. This fact is proved in Proposition 24 and the unique element (if it exists) is defined below.

Definition 23

Given a linear indexed expression $\rho $ and a set of indexes $\mathrm {I}$, the c-continuation $\mathsf {c}_\mathrm {I}(\rho )$ of $\rho $ w.r.t. $\mathrm {I}$ is defined by the following rules.

$$\begin{array}{c} \begin{array}{lll} \mathsf {c}_\mathrm {I}(\emptyset ) &{}=&{} \mathsf {c}_\mathrm {I}(\varepsilon ) = \emptyset \\ \mathsf {c}_\mathrm {I}(a_i) &{}=&{} {\left\{ \begin{array}{ll} \varepsilon , &{} \text {if } \mathrm {I}= \{i\} \\ \emptyset , &{} \text {otherwise} \end{array}\right. }\\ \end{array} \quad \quad \quad \begin{array}{lll} \mathsf {c}_\mathrm {I}(\rho ^\star ) &{}=&{} \mathsf {c}_\mathrm {I}(\rho ) \rho ^\star \\ \mathsf {c}_\mathrm {I}(\rho _1 + \rho _2) &{}=&{} {\left\{ \begin{array}{ll} \mathsf {c}_\mathrm {I}(\rho _1), &{} \text {if } \mathsf {c}_\mathrm {I}(\rho _1) \not = \emptyset \\ \mathsf {c}_\mathrm {I}(\rho _2), &{} \text { otherwise} \end{array}\right. }\\ \end{array} \\ \begin{array}{lll} \mathsf {c}_\mathrm {I}(\rho _1 \cdot \rho _2) &{}=&{} {\left\{ \begin{array}{ll} \mathsf {c}_\mathrm {I}(\rho _1)\cdot \rho _2, &{} \text {if } \mathsf {c}_\mathrm {I}(\rho _1) \ne \emptyset \\ \mathsf {c}_\mathrm {I}(\rho _2), &{} \text {otherwise} \end{array}\right. }\\ \mathsf {c}_\mathrm {I}(\rho _1 \cap \rho _2) &{}=&{} {\left\{ \begin{array}{ll} \mathsf {c}_{{{\mathrm {I}}|_{{\rho _1}}}}(\rho _1) \cap \mathsf {c}_{{{\mathrm {I}}|_{{\rho _2}}}}(\rho _2), &{} \text {if } \mathrm {I}= {{\mathrm {I}}\big |_{{\rho _1}}} \cup {{\mathrm {I}}\big |_{{\rho _2}}}\\ \emptyset , &{} \text {otherwise.} \end{array}\right. } \end{array} \end{array}$$

It is easy to verify that $\mathsf {c}_{\mathrm {I}}(\rho ) \ne \emptyset $ implies $\mathrm {I}\subseteq {{\mathrm{\mathsf {ind}}}}(\rho )$, i.e. ${{\mathrm {I}}\big |_{{\rho }}} = \mathrm {I}$.
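
As a small worked instance of these rules, take the subexpression $b_1a_2^\star b_3$ that appears in Example 29, read here as $b_1\cdot (a_2^\star \cdot b_3)$, and $\mathrm {I}=\{2\}$. The derivation below is only an illustrative sketch; it assumes that $\varepsilon \cdot \rho $ is identified with $\rho $, which is what the values listed in Example 29 suggest.

$$\begin{array}{lll}
\mathsf {c}_{\{2\}}(b_1\cdot (a_2^\star \cdot b_3)) &=& \mathsf {c}_{\{2\}}(a_2^\star \cdot b_3) \quad \text {since } \mathsf {c}_{\{2\}}(b_1) = \emptyset \\
&=& \mathsf {c}_{\{2\}}(a_2^\star )\cdot b_3 \quad \text {since } \mathsf {c}_{\{2\}}(a_2^\star ) \ne \emptyset \\
&=& (\mathsf {c}_{\{2\}}(a_2)\, a_2^\star )\cdot b_3 \\
&=& (\varepsilon \, a_2^\star )\cdot b_3 = a_2^\star b_3.
\end{array}$$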

Proposition 24

Consider a linear indexed expression $\rho $ and $\mathrm {I}\in \mathcal{I}_\rho $. Then, for every $x \in \mathcal{I}_\rho ^\star $ such that $\partial _{x\mathrm {I}}({\rho }) \not = \emptyset $, one has $\partial _{x\mathrm {I}}({\rho }) = \{\mathsf {c}_\mathrm {I}(\rho )\}$ and $\mathsf {c}_\mathrm {I}(\rho ) \not =\emptyset $.

Proof

We proceed by induction on the structure of $\rho $. For $\emptyset $ and $\varepsilon $ the set of partial index-derivatives is $\emptyset $.

Let $\rho $ be $a_i$. We need to prove that $\forall \mathrm {I}\in \mathcal{I}_{a_i} \forall x\in \mathcal{I}_{a_i}^\star \; \left( \partial _{x\mathrm {I}}(a_i)\not =\emptyset \implies \partial _{x\mathrm {I}}(a_i) = \{\mathsf {c}_\mathrm {I}(a_i)\} \not = \{\emptyset \}\right) .$ If $\partial _{x\mathrm {I}}(a_i)\not =\emptyset $, then by Proposition 22, $\partial _{x\mathrm {I}}(a_i)=\{\varepsilon \}$ and $x\mathrm {I}=\{i\}$. Then $\mathrm {I}=\{i\}$ and $\mathsf {c}_\mathrm {I}(a_i)=\varepsilon $. Thus, we conclude that $\partial _{x\mathrm {I}}(a_i)= \{\mathsf {c}_\mathrm {I}(a_i)\} \not = \{\emptyset \}$.

Let us suppose that for $\rho _i$, $i=1,2$, we have $\forall \mathrm {I}\in \mathcal{I}_{\rho _i} \forall x\in \mathcal{I}_{\rho _i}^\star \;(\partial _{x\mathrm {I}}(\rho _i)\not =\emptyset \implies \partial _{x\mathrm {I}}(\rho _i)=\{\mathsf {c}_\mathrm {I}(\rho _i)\} \not = \{\emptyset \}).$

Let $\rho =\rho _1+\rho _2$ be such that $\partial _{x\mathrm {I}}(\rho _1+\rho _2)\not =\emptyset $. Then, $\partial _{x\mathrm {I}}(\rho _1+\rho _2)=\partial _{x\mathrm {I}}(\rho _i)$ with $x\mathrm {I}={{(x\mathrm {I})}\big |_{{\rho _i}}}$, for some $i \in \{1,2\}$. By the induction hypothesis, $\partial _{x\mathrm {I}}(\rho _i)= \{\mathsf {c}_\mathrm {I}(\rho _i)\}\not = \{\emptyset \}$. Thus, $\mathsf {c}_\mathrm {I}(\rho _i)\not =\emptyset $ and $\mathsf {c}_\mathrm {I}(\rho _1+\rho _2)=\mathsf {c}_\mathrm {I}(\rho _i)$.

Let $\rho =\rho _1\rho _2$. If $\partial _{x\mathrm {I}}(\rho _1\rho _2)\not =\emptyset $ then we have to consider two cases. In the first case, $\partial _{x\mathrm {I}}(\rho _1\rho _2)= \partial _{x\mathrm {I}}(\rho _1)\odot \rho _2$ and $x\mathrm {I}={{(x\mathrm {I})}\big |_{{\rho _1}}}$. Then, $\partial _{x\mathrm {I}}(\rho _1)\not =\emptyset $ and, by the induction hypothesis, $\partial _{x\mathrm {I}}(\rho _1)=\{\mathsf {c}_\mathrm {I}(\rho _1)\}$. We conclude that $\mathsf {c}_\mathrm {I}(\rho _1)\not =\emptyset $ and $\mathsf {c}_\mathrm {I}(\rho _1\rho _2)=\mathsf {c}_\mathrm {I}(\rho _1)\rho _2$. In the second case, $\partial _{x\mathrm {I}}(\rho _1\rho _2)= \partial _{z\mathrm {I}}(\rho _2)\not =\emptyset $, $x=yz$, $\varepsilon (\partial _{y}(\rho _1))=\varepsilon $ and $z\mathrm {I}={{(z\mathrm {I})}\big |_{{\rho _2}}}$. We conclude that $y={{y}\big |_{{\rho _1}}}$ and $\mathrm {I}={{\mathrm {I}}\big |_{{\rho _2}}}$. Then, $\mathsf {c}_\mathrm {I}(\rho _1)=\emptyset $ and $\mathsf {c}_\mathrm {I}(\rho _1\rho _2)=\mathsf {c}_\mathrm {I}(\rho _2)$. By the induction hypothesis, $\partial _{z\mathrm {I}}(\rho _2)=\{\mathsf {c}_\mathrm {I}(\rho _2)\}$ and the result follows.

Let $\rho =\rho _1^\star $. If $\partial _{x\mathrm {I}}(\rho _1^\star )\not =\emptyset $, we can write $\partial _{x\mathrm {I}}(\rho _1^\star )=\partial _{v_1\mathrm {I}}(\rho _1)\odot \rho _1^\star \cup \cdots \cup \partial _{v_n\mathrm {I}}(\rho _1)\odot \rho _1^\star $, with $n\ge 1$, such that for all $1 \le i\le n$, $x=u_iv_i$ and $\partial _{v_i\mathrm {I}}(\rho _1)\odot \rho _1^\star \not =\emptyset $. By the induction hypothesis, each nonempty set of partial index-derivatives $\partial _{v_i\mathrm {I}}(\rho _1)$ is equal to $\{\mathsf {c}_\mathrm {I}(\rho _1)\}\not =\{\emptyset \}$. Thus, $\partial _{x\mathrm {I}}(\rho _1^\star )=\{\mathsf {c}_\mathrm {I}(\rho _1)\rho _1^\star \}$.

Finally, let $\rho =\rho _1\cap \rho _2$ be such that $\partial _{x\mathrm {I}}(\rho _1\cap \rho _2)\not =\emptyset $. Then, $x\mathrm {I}= {{(x\mathrm {I})}\big |_{{\rho _1}}} \cap _\mathcal{I}{{(x\mathrm {I})}\big |_{{\rho _2}}}$ and $\partial _{{{(x\mathrm {I})}|_{{\rho _i}}}}({\rho _i})\not =\emptyset $, for $i=1,2$. Moreover, by the induction hypothesis, $\partial _{{{(x\mathrm {I})}|_{{\rho _i}}}}({\rho _i})=\{\mathsf {c}_{{\mathrm {I}}|_{{\rho _i}}}(\rho _i)\}$. The result then follows from the definition of $\mathsf {c}_\mathrm {I}(\rho _1\cap \rho _2)$. $\square $

This result guarantees that, given a linear indexed expression ${\rho }$ and $\mathrm {I}\in \mathcal{I}_{\rho }$, all sets of partial index-derivatives $\partial _{x\mathrm {I}}(\rho )$ different from $\emptyset $ are singletons containing the unique c-continuation $\mathsf {c}_\mathrm {I}(\rho )$ of $\rho $ w.r.t. $\mathrm {I}$.

Lemma 25

Consider a linear indexed expression $\rho $. Then, $\mathrm {I}\in \mathsf {Lst}(\rho )$ if and only if $\varepsilon (\mathsf {c}_\mathrm {I}(\rho )) = \varepsilon $.

Lemma 26

Consider a linear indexed expression $\rho $ and sets of indexes $\mathrm {I}, \mathrm {J}\in \mathcal{I}_\rho ^\star $. Then, $(\mathrm {I}, \mathrm {J}) \in \mathsf {Fol}(\rho )$ if and only if $\mathrm {J}\in \mathsf {Fst}(\mathsf {c}_\mathrm {I}(\rho ))$.

Definition 27

The c-continuation automaton of an expression $\alpha \in {\mathsf {RE}}_{\cap } $ is

$$\mathcal{A}_{\mathsf {c}}(\alpha ) = \langle S_\mathsf {c},\varSigma ,\{(\{0\},\mathsf {c}_{\{0\}}(\overline{\alpha }))\},\delta _\mathsf {c},F_\mathsf {c}\rangle ,$$

where $ S_\mathsf {c}= \{\; (\mathrm {I},\mathsf {c}_\mathrm {I}(\overline{\alpha })) \mid \mathrm {I}\in S_\mathsf {posi}\;\},\; F_\mathsf {c}= \{\; (\,\mathrm {I},\,\mathsf {c}_\mathrm {I}(\overline{\alpha })) \mid \varepsilon (\mathsf {c}_\mathrm {I}(\overline{\alpha })) = \varepsilon \;\},$ $\mathsf {c}_{\{0\}}(\overline{\alpha }) = \overline{\alpha },\, \delta _\mathsf {c}= \{\; ((\mathrm {I},\mathsf {c}_\mathrm {I}(\overline{\alpha })),\ell (\mathrm {J}), (\mathrm {J},\mathsf {c}_\mathrm {J}(\overline{\alpha }))) \mid \mathrm {J}\in \mathsf {Fst}(\mathsf {c}_\mathrm {I}(\overline{\alpha }))\;\}. $

By Lemmas 25 and 26, and considering $\varphi :S_\mathsf {c}\rightarrow S_\mathsf {posi}$ such that $\varphi ((\mathrm {I},\mathsf {c}_\mathrm {I}(\overline{\alpha })))=\mathrm {I}$, the following holds.

Theorem 28

For $\alpha \in {\mathsf {RE}}_{\cap } $, we have $\mathcal{A}_{\mathsf {posi}}(\alpha ) \simeq \mathcal{A}_{\mathsf {c}}(\alpha )$.

Example 29

Consider the expression $\overline{\alpha }=(b_1a_2^\star b_3+a_4)\cap (a_5a_6+b_7)^\star $, from Example 16, and let $\rho _2=(a_5a_6+b_7)^\star $. We have the following $\mathsf {c}$-continuations: $\mathsf {c}_{\{1,7\}}(\overline{\alpha })=a_2^\star b_3\cap \rho _2$, $\mathsf {c}_{\{4,5\}}(\overline{\alpha })=\varepsilon \cap a_6\rho _2$, $\mathsf {c}_{\{4,6\}}(\overline{\alpha })=\varepsilon \cap \rho _2$, $\mathsf {c}_{\{2,5\}}(\overline{\alpha })=a_2^\star b_3\cap a_6\rho _2$, $\mathsf {c}_{\{2,6\}}(\overline{\alpha })=a_2^\star b_3\cap \rho _2$, and $\mathsf {c}_{\{3,7\}}(\overline{\alpha })=\varepsilon \cap \rho _2$.
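
The values above can be recomputed mechanically from Definition 23. The following Python sketch is not part of the original construction: the tuple encoding, the helper names (`sym`, `cat`, `cap`, `ind`, `c`, `nullable`) and the simplifications $\varepsilon \cdot \rho = \rho $ and $\emptyset \cap \rho = \emptyset $ are assumptions made for illustration, and ${\mathrm {I}}|_{\rho }$ is taken to be $\mathrm {I}\cap \mathsf {ind}(\rho )$.

```python
# Expressions are nested tuples; letters carry their index, e.g. ("sym", "a", 2).
EMPTY, EPS = ("empty",), ("eps",)

def sym(letter, index):
    return ("sym", letter, index)

def star(r):
    return ("star", r)

def plus(r, s):
    return ("plus", r, s)

def cat(r, s):
    # Assumed simplifications: a product with empty is empty, and eps.r = r.
    if EMPTY in (r, s):
        return EMPTY
    return s if r == EPS else ("cat", r, s)

def cap(r, s):
    # Assumed simplification: an intersection with empty is empty.
    return EMPTY if EMPTY in (r, s) else ("cap", r, s)

def ind(r):
    """Set of indexes occurring in r."""
    tag = r[0]
    if tag == "sym":
        return frozenset({r[2]})
    if tag in ("empty", "eps"):
        return frozenset()
    if tag == "star":
        return ind(r[1])
    return ind(r[1]) | ind(r[2])

def c(I, r):
    """c-continuation of r w.r.t. the index set I, following Definition 23."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if I == frozenset({r[2]}) else EMPTY
    if tag == "star":
        return cat(c(I, r[1]), r)
    left = c(I, r[1]) if tag != "cap" else None
    if tag == "plus":
        return left if left != EMPTY else c(I, r[2])
    if tag == "cat":
        return cat(left, r[2]) if left != EMPTY else c(I, r[2])
    # Intersection: restrict I to each operand (assumed to be I & ind(operand)).
    I1, I2 = I & ind(r[1]), I & ind(r[2])
    return cap(c(I1, r[1]), c(I2, r[2])) if I == I1 | I2 else EMPTY

def nullable(r):
    """True when the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "plus":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # concatenation and intersection

# The marked expression of Example 29.
b1, a2, b3, a4 = sym("b", 1), sym("a", 2), sym("b", 3), sym("a", 4)
a5, a6, b7 = sym("a", 5), sym("a", 6), sym("b", 7)
RHO2 = star(plus(cat(a5, a6), b7))
ALPHA = cap(plus(cat(b1, cat(star(a2), b3)), a4), RHO2)

POSITIONS = [{1, 7}, {4, 5}, {4, 6}, {2, 5}, {2, 6}, {3, 7}]
for I in POSITIONS:
    k = c(frozenset(I), ALPHA)
    print(sorted(I), k, "nullable" if nullable(k) else "not nullable")
```

Under these assumptions the sketch returns the same expression for $\{1,7\}$ and $\{2,6\}$, and only the continuations of $\{4,6\}$ and $\{3,7\}$ are nullable, in line with the characterization of final states in Lemma 25.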

6 The $\mathcal{A}_{\mathsf {pd}}$ as a Quotient of $\mathcal{A}_{\mathsf {pos}}$

Using $\mathcal{A}_{\mathsf {c}}$ we show that the partial derivative automaton $\mathcal{A}_{\mathsf {pd}}$ is a quotient of $\mathcal{A}_{\mathsf {pos}}$. This extends the corresponding result for simple regular expressions, although the proof cannot use the same technique. Recall that, for a simple regular expression $\alpha $, one builds $\mathcal{A}_{\mathsf {pd}}(\overline{\alpha })$, and then shows that when its transitions are unmarked, the result $\overline{\mathcal{A}_{\mathsf {pd}}(\overline{\alpha })}$ is isomorphic to a quotient of $\mathcal{A}_{\mathsf {c}}(\alpha )$. However, with $\alpha \in {\mathsf {RE}}_{\cap } $, this method cannot be used because, as mentioned in the introduction, intersection does not commute with marking. For $\alpha \in {\mathsf {RE}}_{\cap } $, we will present a direct isomorphism between $\mathcal{A}_{\mathsf {pd}}(\alpha )$ and a quotient of $\mathcal{A}_{\mathsf {c}}(\alpha )$. The next lemmas will be needed to build that isomorphism.

Lemma 30

Consider a linear indexed expression $\rho $ and $\mathrm {I}\in \mathcal{I}_\rho $. If $\mathrm {I}\in \mathsf {Fst}(\rho )$, then $\mathsf {c}_\mathrm {I}(\rho ) \not = \emptyset $ and $\mathsf {c}_\mathrm {I}(\rho ) \in \partial _\mathrm {I}(\rho )$.

Lemma 31

Consider a linear indexed expression $\rho $ and $\mathrm {I}, \mathrm {J}\in \mathcal{I}_\rho $, such that $\mathrm {J}\in \mathsf {Fst}(\mathsf {c}_\mathrm {I}(\rho ))$. Then, $\mathsf {c}_\mathrm {J}(\rho )\in \partial _{\mathrm {J}}(\mathsf {c}_\mathrm {I}(\rho ))$.

Lemma 32

Consider well-indexed expressions $\rho ',\rho $ and $\mathrm {I}\in \mathcal{I}_\rho $, such that $\rho ' \in \partial _\mathrm {I}(\rho )$. Then, $\overline{\rho '}\in \partial _{\ell (\mathrm {I})}(\overline{\rho })$.

Lemma 33

Consider a well-indexed expression $\rho $, $a\in \varSigma $ and $\beta \in \partial _a(\overline{\rho })$. Then, there exist $\mathrm {I}\in \mathcal{I}_\rho $ and $\rho ' \in \partial _\mathrm {I}(\rho )$ with $\ell (\mathrm {I}) = a$ and $\overline{\rho '} = \beta $. Furthermore, for $x=a_1\cdots a_n\in \varSigma ^\star $, if $\beta \in \partial _x(\overline{\rho })$, there exist $\mathrm {I}_1\cdots \mathrm {I}_n\in \mathcal{I}_\rho ^\star $ and $\rho ' \in \partial _{\mathrm {I}_1\cdots \mathrm {I}_n}(\rho )$ with $\ell (\mathrm {I}_1\cdots \mathrm {I}_n) = x$ and $\overline{\rho '} = \beta $.

Given $\alpha \in {\mathsf {RE}}_{\cap } $, consider $\mathcal{A}_{\mathsf {c}}(\alpha )$ and the equivalence relation $\equiv _\ell $ on $S_\mathsf {c}$ given by $(\mathrm {I},\mathsf {c}_\mathrm {I}(\overline{\alpha })) \equiv _\ell (\mathrm {J},\mathsf {c}_\mathrm {J}(\overline{\alpha }))$ if and only if $\overline{\mathsf {c}_\mathrm {I}(\overline{\alpha })} = \overline{\mathsf {c}_\mathrm {J}(\overline{\alpha })}$, for $\mathrm {I},\mathrm {J}\in \mathcal{I}_{\overline{\alpha }}\cup \{\{0\}\}$.

Lemma 34

The relation $\equiv _\ell $ is right invariant w.r.t. $\mathcal{A}_{\mathsf {c}}$.

Theorem 35

For $\alpha \in {\mathsf {RE}}_{\cap } $, $\mathcal{A}_{\mathsf {pd}}(\alpha ) \simeq \mathcal{A}_{\mathsf {c}}(\alpha )^{\mathsf {ac}}/{\equiv _\ell }$.

Proof

Let $\mathcal{A}_{\mathsf {c}}(\alpha )^\mathsf {ac}/{\equiv _\ell }=(S_\ell ,\varSigma ,\delta _\ell ,[(\{0\},\overline{\alpha })],F_\ell )$. We define the map $\varphi :S_\ell \rightarrow \partial (\alpha )$ by $\varphi ([(\mathrm {I},\mathsf {c}_\mathrm {I}(\overline{\alpha }))])=\overline{\mathsf {c}_\mathrm {I}(\overline{\alpha })}.$ We have to show that: $1)\;\varphi $ is well-defined; $2)\;\varphi $ is bijective; $3) \;\varphi (\delta _\ell (s,a))=\delta _\mathsf {pd}(\varphi (s),a)$ for every $s\in S_\ell , a\in \varSigma $; $4)\;\varphi (F_\ell )=F_\mathsf {pd}$; $5)\;\varphi ([(\{0\},\mathsf {c}_{\{0\}}(\overline{\alpha }))])=\alpha $.

Claim 1 follows from Lemmas 30 and 31. Claims 4 and 5 are obvious. That $\varphi $ is injective follows from the definition of $\equiv _\ell $. Furthermore, if $\beta \in \partial (\alpha )$, then there are terms $\beta _0 = \alpha ,\beta _1, \ldots , \beta _n= \beta $ and letters $a_1, \ldots , a_n \in \varSigma $, with $n \ge 0$, such that $\beta _{i+1} \in \partial _{a_{i+1}}(\beta _i)$ for $0 \le i \le n-1$. It follows from Lemma 33 that there exist $\mathrm {I}_1\cdots \mathrm {I}_n\in \mathcal{I}_{\overline{\alpha }}^\star $ and $\rho ' \in \partial _{\mathrm {I}_1\cdots \mathrm {I}_n}(\overline{\alpha })$ with $\ell (\mathrm {I}_1\cdots \mathrm {I}_n) = a_1 \cdots a_n$ and $\overline{\rho '} = \beta $. Furthermore, by Proposition 24, we know that $\partial _{\mathrm {I}_1\cdots \mathrm {I}_n}(\overline{\alpha })=\{\mathsf {c}_{\mathrm {I}_n}(\overline{\alpha })\}$, with $\mathsf {c}_{\mathrm {I}_n}(\overline{\alpha }) \not = \emptyset $. Thus, $[(\mathrm {I}_n, \mathsf {c}_{\mathrm {I}_n}(\overline{\alpha }))] \in S_\ell $ and we conclude that $\varphi $ is surjective.

For 3) we consider both inclusions. Consider $\beta \in \varphi (\delta _\ell (s,a))$, for $s \in S_\ell $ and $a \in \varSigma $. Then, there exist $\mathrm {I},\mathrm {J}\in \mathcal{I}_{\overline{\alpha }}$ such that $[(\mathrm {I},\mathsf {c}_\mathrm {I}(\overline{\alpha }))] = s$, $\overline{\mathsf {c}_\mathrm {J}(\overline{\alpha })} = \beta $, $(\mathrm {J},\mathsf {c}_\mathrm {J}(\overline{\alpha })) \in \delta _\mathsf {c}((\mathrm {I},\mathsf {c}_\mathrm {I}(\overline{\alpha })),\ell (\mathrm {J}))$ and $\ell (\mathrm {J}) = a$, i.e. $\mathrm {J}\in \mathsf {Fst}(\mathsf {c}_\mathrm {I}(\overline{\alpha }))$. By Lemma 31, we have $\mathsf {c}_{\mathrm {J}}(\overline{\alpha })\in \partial _{\mathrm {J}}(\mathsf {c}_\mathrm {I}(\overline{\alpha }))$ and by Lemma 32, $\overline{\mathsf {c}_{\mathrm {J}}(\overline{\alpha })}\in \partial _{a}(\overline{\mathsf {c}_\mathrm {I}(\overline{\alpha })})$. Thus, $\overline{\mathsf {c}_{\mathrm {J}}(\overline{\alpha })}\in \delta _\mathsf {pd}(\overline{\mathsf {c}_\mathrm {I}(\overline{\alpha })},a)$.

Now, let $\beta \in \delta _\mathsf {pd}(\tau ,a)$, where $\tau =\overline{\mathsf {c}_\mathrm {I}(\overline{\alpha })}$, for some $\mathrm {I}\in \mathcal{I}_{\overline{\alpha }}$ and $a \in \varSigma $. Then, there is a sequence of terms $\tau _0 = \alpha , \tau _1, \ldots ,\tau _n=\tau $ and a sequence of letters $a_1,\ldots ,a_n \in \varSigma $ such that $\tau _{i+1} \in \partial _{a_{i+1}}(\tau _i)$, for $0 \le i \le n-1$, and $\beta \in \partial _a(\tau )$, i.e. $\beta \in \partial _{a_1 \cdots a_n a}(\alpha )$. By Lemma 33, there exist $\mathrm {J}_1, \ldots , \mathrm {J}_n, \mathrm {J}\in \mathcal{I}_{\overline{\alpha }}$, with $\ell (\mathrm {J}_1 \cdots \mathrm {J}_n \mathrm {J})=a_1 \cdots a_n a$, and $\rho ' \in \partial _{\mathrm {J}_1 \cdots \mathrm {J}_n \mathrm {J}}(\overline{\alpha })$ such that $\overline{\rho '}=\beta $. By Proposition 24, $\rho '=\mathsf {c}_{\mathrm {J}}(\overline{\alpha })$. On the other hand, it is straightforward to show by structural induction on a well-indexed expression $\rho $, that $\partial _\mathrm {J}(\rho ) \not = \emptyset $ implies $\mathrm {J}\in \mathsf {Fst}(\rho )$. Thus, $[(\mathrm {J},\mathsf {c}_\mathrm {J}(\overline{\alpha }))] \in \delta _\ell ([(\mathrm {I},\mathsf {c}_\mathrm {I}(\overline{\alpha }))],\ell (\mathrm {J}))$ and consequently $\beta = \overline{\mathsf {c}_\mathrm {J}(\overline{\alpha })} \in \varphi (\delta _\ell ([(\mathrm {I},\mathsf {c}_\mathrm {I}(\overline{\alpha }))],a))$. $\square $

Example 36

Consider $\alpha =(ba^\star b+a)\cap (aa+b)^\star $ from Examples 16 and 29. Set $\beta =(aa+b)^\star $. For the positions present in $\mathcal{A}_{\mathsf {c}}(\alpha )^{\mathsf {ac}}$, we have $\overline{\mathsf {c}_{\{4,5\}}(\overline{\alpha })}=\varepsilon \cap a\beta $, $\overline{\mathsf {c}_{\{3,7\}}(\overline{\alpha })}=\varepsilon \cap \beta $, $\overline{\mathsf {c}_{\{2,5\}}(\overline{\alpha })}=a^\star b \cap a \beta $, and $\overline{\mathsf {c}_{\{1,7\}}(\overline{\alpha })}=\overline{\mathsf {c}_{\{2,6\}}(\overline{\alpha })}=a^\star b\cap \beta $. Merging states $(\{1,7\},\mathsf {c}_{\{1,7\}}(\overline{\alpha }))$ and $(\{2,6\},\mathsf {c}_{\{2,6\}}(\overline{\alpha }))$ in $\mathcal{A}_{\mathsf {c}}(\alpha )^{\mathsf {ac}}$, one obtains an NFA isomorphic to $\mathcal{A}_{\mathsf {pd}}(\alpha )$, which is represented in Fig. 2.
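
The merging step of this example can be sketched in a few lines of Python. This is an illustration only: the string encoding of expressions (`eps` for $\varepsilon $, `&` for $\cap $, a digit suffix for the index of a letter) and the names `unmark`, `continuations` and `classes` are assumptions, and the c-continuations are copied from Example 29 rather than computed. The initial state $\{0\}$ is included with $\mathsf {c}_{\{0\}}(\overline{\alpha }) = \overline{\alpha }$.

```python
import re
from collections import defaultdict

def unmark(expr: str) -> str:
    """Drop the index attached to each letter, e.g. 'a2* b3' -> 'a* b'."""
    return re.sub(r"([a-z])\d+", r"\1", expr)

RHO2 = "(a5 a6 + b7)*"

# States (I, c_I) of the accessible part of the c-continuation automaton,
# with the c-continuations listed in Example 29.
continuations = {
    (0,): f"(b1 a2* b3 + a4) & {RHO2}",
    (1, 7): f"a2* b3 & {RHO2}",
    (2, 5): f"a2* b3 & a6 {RHO2}",
    (2, 6): f"a2* b3 & {RHO2}",
    (3, 7): f"eps & {RHO2}",
    (4, 5): f"eps & a6 {RHO2}",
}

# Two states are equivalent when their unmarked c-continuations coincide.
classes = defaultdict(list)
for index_set, cont in continuations.items():
    classes[unmark(cont)].append(index_set)

for expr, members in classes.items():
    print(expr, "<-", members)
```

With this encoding, the six states fall into five classes, and the only class with more than one member contains $\{1,7\}$ and $\{2,6\}$, which are the two states merged in the example.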

7 Final Remarks

For simple regular expressions of size n, the size of $\mathcal{A}_{\mathsf {pos}}(\alpha )$ is $O(n^2)$, and using $\mathcal{A}_{\mathsf {c}}(\alpha )$ it is possible to efficiently compute $\mathcal{A}_{\mathsf {pd}}(\alpha )$ [9]. For regular expressions with intersection, the conversion to NFAs has exponential computational complexity [11], and the sizes of both $\mathcal{A}_{\mathsf {pos}}$ and $\mathcal{A}_{\mathsf {pd}}$ can be exponential in the size of the regular expression. In the average case, however, the size of these automata seems to be much smaller [2], and thus feasible for practical applications. In this scenario, algorithms for building $\mathcal{A}_{\mathsf {pd}}$ using $\mathcal{A}_{\mathsf {pos}}$ seem worthwhile to develop.

Notes

1. Note that $\ell (x)=\ell (y)$ implies that $m=n$ and that $\ell (x \cap _\mathcal{I}y)=\ell (x)=\ell (y)$.

References

1. Antimirov, V.: Partial derivatives of regular expressions and finite automaton constructions. Theoret. Comput. Sci. 155(2), 291–319 (1996)
2. Bastos, R., Broda, S., Machiavelo, A., Moreira, N., Reis, R.: On the state complexity of partial derivative automata for regular expressions with intersection. In: Câmpeanu, C., Manea, F., Shallit, J. (eds.) DCFS 2016. LNCS, vol. 9777, pp. 45–59. Springer, Heidelberg (2016). doi:10.1007/978-3-319-41114-9_4
3. Berry, G., Sethi, R.: From regular expressions to deterministic automata. Theoret. Comput. Sci. 48, 117–126 (1986)
4. Brüggemann-Klein, A.: Regular expressions into finite automata. Theoret. Comput. Sci. 48, 197–213 (1993)
5. Brzozowski, J.: Derivatives of regular expressions. JACM 11(4), 481–494 (1964)
6. Caron, P., Champarnaud, J.-M., Mignot, L.: Partial derivatives of an extended regular expression. In: Dediu, A.-H., Inenaga, S., Martín-Vide, C. (eds.) LATA 2011. LNCS, vol. 6638, pp. 179–191. Springer, Heidelberg (2011)
7. Caron, P., Champarnaud, J.-M., Mignot, L.: A general framework for the derivation of regular expressions. RAIRO - Theor. Inf. Appl. 48(3), 281–305 (2014)
8. Caron, P., Ziadi, D.: Characterization of Glushkov automata. Theoret. Comput. Sci. 233(1–2), 75–90 (2000)
9. Champarnaud, J.-M., Ziadi, D.: Canonical derivatives, partial derivatives and finite automaton constructions. Theoret. Comput. Sci. 289, 137–163 (2002)
10. Chen, H., Yu, S.: Derivatives of regular expressions and an application. In: Dinneen, M.J., Khoussainov, B., Nies, A. (eds.) Computation, Physics and Beyond. LNCS, vol. 7160, pp. 343–356. Springer, Heidelberg (2012)
11. Gelade, W.: Succinctness of regular expressions with interleaving, intersection and counting. Theoret. Comput. Sci. 411(31–33), 2987–2998 (2010)
12. Glushkov, V.M.: The abstract theory of automata. Russ. Math. Surv. 16, 1–53 (1961)
13. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison Wesley, Reading (1979)
14. Ilie, L., Yu, S.: Follow automata. Inf. Comput. 186(1), 140–162 (2003)
15. McNaughton, R., Yamada, H.: Regular expressions and state graphs for automata. IEEE Trans. Elect. Comput. 9, 39–47 (1960)
16. Sakarovitch, J.: Elements of Automata Theory. Cambridge University Press, Cambridge (2009)
17. Yu, S.: Regular languages. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 1, pp. 41–110. Springer, Heidelberg (1997)

Author information

Authors and Affiliations

CMUP, Faculdade de Ciências da Universidade do Porto, Porto, Portugal
Sabine Broda, António Machiavelo, Nelma Moreira & Rogério Reis

Corresponding author

Correspondence to Rogério Reis.

Editor information

Editors and Affiliations

Université du Québec à Montréal, Montreal, Québec, Canada
Srečko Brlek
Dept Mathematiques, Univ du Quebec Montreal, Montreal, Québec, Canada
Christophe Reutenauer

About this paper

Cite this paper

Broda, S., Machiavelo, A., Moreira, N., Reis, R. (2016). Position Automaton Construction for Regular Expressions with Intersection. In: Brlek, S., Reutenauer, C. (eds) Developments in Language Theory. DLT 2016. Lecture Notes in Computer Science, vol 9840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53132-7_5

DOI: https://doi.org/10.1007/978-3-662-53132-7_5
Published: 21 July 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53131-0
Online ISBN: 978-3-662-53132-7
eBook Packages: Computer Science (R0)
