Keywords

1 Introduction

LaSer, [18], the formal language server, allows a user to enter a question about an independent language and provides an answer either in real time or by generating a program that can be executed at the user’s site. Typical examples of independent languages are codes in the classic sense, such as prefix codes and error-detecting languages, or DNA-computing related codes. Typical questions about independent languages are the satisfaction, maximality and construction questions. For example, the satisfaction question is to decide, given an independence \(\mathcal {I} \) and a regular language L, whether L is independent with respect to \(\mathcal {I} \).

We present some theoretical and implementational aspects of LaSer, as well as some ongoing progress and research plans.

2 Transducer Independences Allowed in LaSer

Let R be a binary relation, that is, a subset of \(\varSigma ^*\times \varSigma ^*\), where \(\varSigma \) is an alphabet. A language L is R -independent, [21, 22], if

$$\begin{aligned} (x,y)\in R \text { and } x,y\in L\> \text { implies } x=y. \end{aligned}$$
(1)

Examples of R-independent languages are error-detecting languages for various error combinations, variable length codes, such as prefix codes and suffix codes, as well as DNA-related languages [2, 5, 6, 10, 12,13,14, 23]. LaSer allows users to represent rational relations via transducersFootnote 1, and regular languages via NFAsFootnote 2. The satisfaction question is whether \(\mathtt {L} (\varvec{a})\) is \(\mathtt {R} (\varvec{t})\)-independent, given an NFA \(\varvec{a} \) and a transducer \(\varvec{t} \). If the answer is NO, a witness word pair (xy) is computed such that \(x\ne y\) and one of (xy) and (yx) is in \(\mathtt {R}({\varvec{t}}) \). The maximality question is whether \(\mathtt {L} (\varvec{a})\) is a maximal \(\mathtt {R} (\varvec{t})\)-independent language knowing that \(\mathtt {L} (\varvec{a})\) is \(\mathtt {R} (\varvec{t})\)-independent. If the answer is NO, a witness word \(x\notin \mathtt {L} (\varvec{a})\) is computed such that \(\mathtt {L} (\varvec{a})\cup \{x\}\) is \(\mathtt {R} (\varvec{t})\)-independent. The construction question is to make an n-element language (if possible) that is \(\mathtt {R} (\varvec{t})\)-independent, given transducer \(\varvec{t} \), integer \(n>0\) and the size of the alphabet.

If \(\varvec{t} \) is a transducer then the following language class

$$\mathcal {P} _{\varvec{t}}=\{L\mid L\, \text { satisfies~(1) for }R =\mathtt {R} (\varvec{t}) \} $$

is called the independence, or property, described by \(\varvec{t} \). In this case any language \(L\in \mathcal {P} _{\varvec{t}}\) is said to satisfy \(\mathcal {P} _{\varvec{t}}\). For example, the independence “prefix codes” is described by the transducer \(\mathtt {px}\) and the independence “2-synchronization-error detecting languages” is described by the transducer \(\mathtt {id}_{2}\) (see Fig. 1).

Fig. 1.
figure 1

Various transducers. An arrow with label a/a denotes multiple transitions: one with label a/a for each \(a\in \varSigma \), and similarly for labels \(a/\lambda \). An arrow with label \(a/a'\) denotes multiple transitions: one with label \(a/a'\) for all \(a,a'\in \varSigma \) with \(a\not =a'\). Let x be any word. We have: \(\mathtt {px} (x)\) = the set of proper prefixes of x, equivalently, \(\mathtt {R} (\mathtt {px})\) = the set of word pairs (xy) such that y is a proper prefix of x; \(\mathtt {sx} (x)\) = set of proper suffixes of x; \(\mathtt {sub}_{1} (x)\) = set of words resulting by substituting at most one symbol in x with another one; \(\mathtt {id}_{2} (x)\) = set of words resulting by inserting and/or deleting at most 2 symbols in x.

Transducer independences are closed under intersection, that is, if \(\mathcal {P} _{\varvec{t} _1}\) and \(\mathcal {P} _{\varvec{t} _2}\) are transducer independences, then also \(\mathcal {P} _{\varvec{t} _1}\cap \mathcal {P} _{\varvec{t} _2}\) is a transducer independence. This is useful when we are interested in languages L satisfying two or more properties, such as the combined property of being a prefix and 1-substitution detecting code. As prefix codes are described by the transducer \(\mathtt {px}\) and 1-substitution detecting codes are described by the transducer \(\mathtt {sub}_{1}\) then also the combined property is described by a transducer. This implies that LaSer can answer the satisfaction, maximality and construction questions for the combined property.

LaSer’s backend is based on the Python package FAdo [8], which implements automata, transducers, and independences described by transducer objects in the module codes.py [17]. The choice to use FAdo is based on the facts that its installation is very simple, it contains a rich set of easy to use methods, and is written in Python which in turn provides a rich availability of high level methods.

3 Rational Independence Expressions

Three examples of independences that are not of the form \(\mathcal {P} _{\varvec{t}}\) are the following.

  • The class of UD codes (uniquely decodable/decipherable codes). LaSer supports this class.

  • The class of comma-free codes: that is, all languages L satisfying the equation

    $$\begin{aligned} LL\cap \varSigma ^+L\varSigma ^+=\emptyset . \end{aligned}$$
    (2)
  • The class of language pairs \((L_1,L_2)\) satisfying the equation

    $$\begin{aligned} L_1{\mathop {\leftarrow }\limits ^{\mathrm {sdi}}}L_2=\emptyset . \end{aligned}$$
    (3)

    Here the site directed insertion operation \(x{\mathop {\leftarrow }\limits ^{\mathrm {sdi}}}y\) between words xy, introduced in [3], is such that \(z\in x{\mathop {\leftarrow }\limits ^{\mathrm {sdi}}}y\) if \(x=x_1uvx_2\), \(y=uwv\), \(z=x_1uwvx_2\), where uv are nonempty. This operation models site-directed mutagenesis, an important technique for introducing a mutation into a DNA sequence.

That “UD codes” and “comma-free codes” are not transducer independences can be shown using dependence theory: every transducer independence is a 2-independence, but the “comma-free codes” independence, for instance, is a 3-independence and not a 2-independence—see [12, 15]. In general, intersecting (combining) the above independences with a transducer independence results into a new independence that is not of the form \(\mathcal {P} _{\varvec{t}}\) for some transducer \(\varvec{t}\). LaSer does not handle non-transducer independences, with the exception of “UD codes” for which specific algorithms are employed. More specifically, for the satisfaction question LaSer uses the quadratic-time elegant algorithm of [9]. For the maximality question LaSer uses Schützenberger’s theorem that maximality of L is equivalent to the condition that every word in \(\varSigma ^*\) is subword (part) of some word of \(L^*\) [2]. A specific algorithm is also used in [23] for the satisfaction question of the combination of the UD code and two other DNA-related independences.

We now discuss possible approaches of representing non-transducer independences as finite objects in a way that these objects can be manipulated by algorithms which can answer questions about the independences being represented. Some approaches are discussed in [15, 19]. It is important to note that there is a distinction between the terms “property” and “independence”. A (language) property is simply a class (set) of languages. A (language) independence is a property \(\mathcal {P}\) for which the concept of maximality is defined, that is, \(\mathcal {P} \) must satisfy

      if \(L\in \mathcal {P} \) then also \(L'\in \mathcal {P} \) for all \(L'\subseteq L\)

(see [15] for details). A general type of independences can be defined via rational language equations \(\varphi (L)=\emptyset \), where \(\varphi (L)\) is an independence expression involving the variable L. An independence expression \(\varphi \) is defined inductively as follows: it is L or a language constant, or one of \(\varphi _1\varphi _2,\varphi _1\cup \varphi _2,\varphi _1\cap \varphi _2,(\varphi _1)^*,\varvec{t} (\varphi _1),\theta (\varphi _1)\), where \(\varphi _1,\varphi _2\) are independence expressions, \(\varvec{t}\) is a transducer constant, and \(\theta \) is an antimorphic permutationFootnote 3 constant. A rational language equation is an expression of the form \(\varphi (L)=\emptyset \), where \(\varphi (L)\) is an independence expression containing the variable L and the language constants occurring in \(\varphi (L)\) represent regular languages. A language L is \(\varphi \) -independent if it satisfies the equation \(\varphi (L)=\emptyset \). The independence \(\mathcal {P} _\varphi \) described by \(\varphi \) is the set of \(\varphi \)-independent languages.

Example 1

Every transducer independence \(\mathcal {P} _{\varvec{t}}\) such that \(\varvec{t}\) is an input-altering transducerFootnote 4 is described by the rational language equation

$$\begin{aligned} \varvec{t} (L)\cap L=\emptyset , \end{aligned}$$
(4)

where we have used the standard notation \(\varvec{t} (L)=\{y\mid (x,y)\in \mathtt {R} (\varvec{t}),\, x\in L\}\). The independence “comma-free codes” is described by the rational language equation (2). The independence “\(\theta \)-free languages”, [10], is described by the rational language equation

$$ LL\cap \varSigma ^+\theta (L)\varSigma ^+\>=\>\emptyset . $$

   \(\square \)

The satisfaction question for independences \(\mathcal {P} _\varphi \) is implemented in [19], where parsing of expressions \(\varphi (L)\) is implemented using Python’s lark library, and evaluation of \(\varphi (L)\) for given \(L=\mathtt {L} (\varvec{a})\) is implemented using the FAdo package.

4 Further Independences

What about independences described by equations like (3)? This is a non-transducer independence. Various general types of independences are defined in [5, 11, 12, 22]. Equation (3) however involves an independence expression with two language variables: \(L_1\) and \(L_2\). In analogy to transducer independences that are R-independences, for binary relations R, one can consider independences with respect to higher degree relationsFootnote 5. Consider the ternary relation \(\mathtt {SDI} =\{(x,y,z)\mid z\in x{\mathop {\leftarrow }\limits ^{\mathrm {sdi}}}y\}\), which is realized by the transducer \(\mathtt {sdi}\) in Fig. 2.

Fig. 2.
figure 2

The 3-tape transducer \(\mathtt {sdi}\) realizing the set of all word triples (xyz) such that \(x=x_1uvx_2\), \(y=uwv\) and \(z=x_1uwvx_2\), for some nonempty words uv; that is, \(z\in x{\mathop {\leftarrow }\limits ^{\mathrm {sdi}}}y\).

A language pair \((L_1,L_2)\) is \(\mathtt {SDI} \)-independent if

$$ \mathtt {sdi} (L_1,L_2:3)\>=\>\emptyset . $$

Above we have made the following notation: for any k-tape transducer \(\varvec{t}\), for any \(i\in \{1,\ldots ,k\}\), and for any list of \(k-1\) languages \(L_1,\ldots ,L_{k-1}\), the expression \(\varvec{t} (L_1,\ldots ,L_{k-1}:i)\) denotes the set of all words w that result if we consider the i-th tape as output tape and the rest \(k-1\) tapes as input tapes such that the input \(k-1\) words are from the \(k-1\) languages. Thus, we can talk about the independence described by the equation

$$ \varvec{t} (L_1,\ldots ,L_{k-1}:i) \>=\>\emptyset . $$

Given a k-tape transducer \(\varvec{t}\) and \(k-1\) NFAs accepting the languages \(L_1,\ldots ,L_{k-1}\), the satisfaction question can be decided if we construct the transducer resulting by intersecting \(\varvec{t}\) with the NFAs at the \(k-1\) positions other than i, and then testing whether the resulting transducer has a path from an initial to a final state.

5 Looking Ahead

We propose to investigate further the rational language equations defined here as well as in [15]. Topics of interest are maximality, embedding and expressibilityFootnote 6 as well as enhancement and implementation of algorithms involved. Some of these topics could be complex. For example, the maximality question for transducer independences can be answered by a simple algorithm, but the question is PSPACE hard. The embedding question is to construct a maximal independent language containing the given independent language \(\mathtt {L} (\varvec{a})\). For transducer independences described by Eq. (4), the embedding question is addressed in [16]. The maximality and embedding questions for non-transducer independences appear to be far more complex. For the satisfaction question of independences described by rational language equations, a useful question is to define and implement witnesses of non-satisfaction. For example, a witness of non-satisfaction for equation (2) would be a word triple (xyz) such that \(x,y,z\in L\) and \(xy=uzv\) for some nonempty words u and v.