Theoretical and Implementational Aspects of the Formal Language Server (LaSer)
 55 Downloads
Abstract
LaSer, the formal language server, allows a user to enter a question about an independent language and provides an answer either in real time or by generating a program that can be executed at the user’s site. Typical examples of independent languages are codes in the classic sense, such as prefix codes and errordetecting languages, or DNAcomputing related codes. Typical questions about independent languages are the satisfaction, maximality and construction questions. We present some theoretical and implementational aspects of LaSer, as well as some ongoing progress and research plans.
Keywords
Independent languages Regular languages Codes DNA codes Property satisfaction Maximality Implementation1 Introduction
LaSer, [18], the formal language server, allows a user to enter a question about an independent language and provides an answer either in real time or by generating a program that can be executed at the user’s site. Typical examples of independent languages are codes in the classic sense, such as prefix codes and errordetecting languages, or DNAcomputing related codes. Typical questions about independent languages are the satisfaction, maximality and construction questions. For example, the satisfaction question is to decide, given an independence \(\mathcal {I} \) and a regular language L, whether L is independent with respect to \(\mathcal {I} \).
We present some theoretical and implementational aspects of LaSer, as well as some ongoing progress and research plans.
2 Transducer Independences Allowed in LaSer
Transducer independences are closed under intersection, that is, if \(\mathcal {P} _{\varvec{t} _1}\) and \(\mathcal {P} _{\varvec{t} _2}\) are transducer independences, then also \(\mathcal {P} _{\varvec{t} _1}\cap \mathcal {P} _{\varvec{t} _2}\) is a transducer independence. This is useful when we are interested in languages L satisfying two or more properties, such as the combined property of being a prefix and 1substitution detecting code. As prefix codes are described by the transducer \(\mathtt {px}\) and 1substitution detecting codes are described by the transducer \(\mathtt {sub}_{1}\) then also the combined property is described by a transducer. This implies that LaSer can answer the satisfaction, maximality and construction questions for the combined property.
LaSer’s backend is based on the Python package FAdo [8], which implements automata, transducers, and independences described by transducer objects in the module codes.py [17]. The choice to use FAdo is based on the facts that its installation is very simple, it contains a rich set of easy to use methods, and is written in Python which in turn provides a rich availability of high level methods.
3 Rational Independence Expressions
Three examples of independences that are not of the form \(\mathcal {P} _{\varvec{t}}\) are the following.

The class of UD codes (uniquely decodable/decipherable codes). LaSer supports this class.
 The class of commafree codes: that is, all languages L satisfying the equation$$\begin{aligned} LL\cap \varSigma ^+L\varSigma ^+=\emptyset . \end{aligned}$$(2)
 The class of language pairs \((L_1,L_2)\) satisfying the equationHere the site directed insertion operation \(x{\mathop {\leftarrow }\limits ^{\mathrm {sdi}}}y\) between words x, y, introduced in [3], is such that \(z\in x{\mathop {\leftarrow }\limits ^{\mathrm {sdi}}}y\) if \(x=x_1uvx_2\), \(y=uwv\), \(z=x_1uwvx_2\), where u, v are nonempty. This operation models sitedirected mutagenesis, an important technique for introducing a mutation into a DNA sequence.$$\begin{aligned} L_1{\mathop {\leftarrow }\limits ^{\mathrm {sdi}}}L_2=\emptyset . \end{aligned}$$(3)
That “UD codes” and “commafree codes” are not transducer independences can be shown using dependence theory: every transducer independence is a 2independence, but the “commafree codes” independence, for instance, is a 3independence and not a 2independence—see [12, 15]. In general, intersecting (combining) the above independences with a transducer independence results into a new independence that is not of the form \(\mathcal {P} _{\varvec{t}}\) for some transducer \(\varvec{t}\). LaSer does not handle nontransducer independences, with the exception of “UD codes” for which specific algorithms are employed. More specifically, for the satisfaction question LaSer uses the quadratictime elegant algorithm of [9]. For the maximality question LaSer uses Schützenberger’s theorem that maximality of L is equivalent to the condition that every word in \(\varSigma ^*\) is subword (part) of some word of \(L^*\) [2]. A specific algorithm is also used in [23] for the satisfaction question of the combination of the UD code and two other DNArelated independences.
We now discuss possible approaches of representing nontransducer independences as finite objects in a way that these objects can be manipulated by algorithms which can answer questions about the independences being represented. Some approaches are discussed in [15, 19]. It is important to note that there is a distinction between the terms “property” and “independence”. A (language) property is simply a class (set) of languages. A (language) independence is a property \(\mathcal {P}\) for which the concept of maximality is defined, that is, \(\mathcal {P} \) must satisfy
if \(L\in \mathcal {P} \) then also \(L'\in \mathcal {P} \) for all \(L'\subseteq L\)
(see [15] for details). A general type of independences can be defined via rational language equations \(\varphi (L)=\emptyset \), where \(\varphi (L)\) is an independence expression involving the variable L. An independence expression \(\varphi \) is defined inductively as follows: it is L or a language constant, or one of \(\varphi _1\varphi _2,\varphi _1\cup \varphi _2,\varphi _1\cap \varphi _2,(\varphi _1)^*,\varvec{t} (\varphi _1),\theta (\varphi _1)\), where \(\varphi _1,\varphi _2\) are independence expressions, \(\varvec{t}\) is a transducer constant, and \(\theta \) is an antimorphic permutation^{3} constant. A rational language equation is an expression of the form \(\varphi (L)=\emptyset \), where \(\varphi (L)\) is an independence expression containing the variable L and the language constants occurring in \(\varphi (L)\) represent regular languages. A language L is \(\varphi \) independent if it satisfies the equation \(\varphi (L)=\emptyset \). The independence \(\mathcal {P} _\varphi \) described by \(\varphi \) is the set of \(\varphi \)independent languages.
Example 1
The satisfaction question for independences \(\mathcal {P} _\varphi \) is implemented in [19], where parsing of expressions \(\varphi (L)\) is implemented using Python’s lark library, and evaluation of \(\varphi (L)\) for given \(L=\mathtt {L} (\varvec{a})\) is implemented using the FAdo package.
4 Further Independences
5 Looking Ahead
We propose to investigate further the rational language equations defined here as well as in [15]. Topics of interest are maximality, embedding and expressibility^{6} as well as enhancement and implementation of algorithms involved. Some of these topics could be complex. For example, the maximality question for transducer independences can be answered by a simple algorithm, but the question is PSPACE hard. The embedding question is to construct a maximal independent language containing the given independent language \(\mathtt {L} (\varvec{a})\). For transducer independences described by Eq. (4), the embedding question is addressed in [16]. The maximality and embedding questions for nontransducer independences appear to be far more complex. For the satisfaction question of independences described by rational language equations, a useful question is to define and implement witnesses of nonsatisfaction. For example, a witness of nonsatisfaction for equation (2) would be a word triple (x, y, z) such that \(x,y,z\in L\) and \(xy=uzv\) for some nonempty words u and v.
Footnotes
 1.
 2.
NFA = Nondeterministic Finite Automaton.
 3.
An antimorphic permutation \(\theta \) maps the alphabet \(\varSigma \) onto \(\varSigma \) and extends to words antimorphically: \(\theta (xy)=\theta (y)\theta (x)\). A typical example of this is the DNA involution on the alphabet \(\{\mathtt a,\mathtt c, \mathtt g, \mathtt t\}\) such that \(\theta (\mathtt a)=\mathtt t\), \(\theta (\mathtt c)=\mathtt g\), \(\theta (\mathtt g)=\mathtt c\), \(\theta (\mathtt t)=\mathtt a\). In this case, \(\theta (\mathtt a\mathtt a\mathtt c)=\mathtt g\mathtt t\mathtt t\).
 4.
This is a transducer \(\varvec{t}\) such that \(w\notin \varvec{t} (w)\) for all words w. For example, \(\mathtt {px}\) and \(\mathtt {sx}\) in Fig. 1 are inputaltering transducers.
 5.
 6.
What independences are and are not describable by the independencedescribing method.
Notes
Acknowledgement
We thank the anonymous referees for constructive comments.
References
 1.Berstel, J.: Transductions and ContextFree Languages. B.G. Teubner, Stuttgart (1979)CrossRefGoogle Scholar
 2.Berstel, J., Perrin, D., Reutenauer, C.: Codes and Automata. Cambridge University Press, Cambridge (2009)Google Scholar
 3.Cho, D.J., Han, Y.S., Salomaa, K., Smith, T.J.: Sitedirected insertion: decision problems, maximality and minimality. In: Konstantinidis, S., Pighizzini, G. (eds.) DCFS 2018. LNCS, vol. 10952, pp. 49–61. Springer, Cham (2018). https://doi.org/10.1007/9783319946313_5CrossRefGoogle Scholar
 4.Choffrut, C.: Relations over words and logic: a chronology. Bull. EATCS 89, 159–163 (2006)MathSciNetzbMATHGoogle Scholar
 5.Domaratzki, M.: Trajectorybased codes. Acta Informatica 40, 491–527 (2004). https://doi.org/10.1007/s0023600401404MathSciNetCrossRefzbMATHGoogle Scholar
 6.Domaratzki, M.: Bondfree DNA language classes. Nat. Comput. 6, 371–402 (2007). https://doi.org/10.1007/s1104700690228MathSciNetCrossRefzbMATHGoogle Scholar
 7.Elgot, C.C., Mezei, J.E.: On relations defined by generalized finite automata. IBM J. Res. Dev. 9(1), 47–68 (1965). https://doi.org/10.1147/rd.91.0047MathSciNetCrossRefzbMATHGoogle Scholar
 8.FAdo: Tools for formal languages manipulation. http://fado.dcc.fc.up.pt/. Accessed Jan 2020
 9.Head, T., Weber, A.: Deciding code related properties by means of finite transducers. In: Capocelli, R., de Santis, A., Vaccaro, U. (eds.) Sequences II, Methods in Communication, Security, and Computer Science, pp. 260–272. Springer, Berlin (1993). https://doi.org/10.1007/9781461393238_19CrossRefGoogle Scholar
 10.Hussini, S., Kari, L., Konstantinidis, S.: Coding properties of DNA languages. Theoret. Comput. Sci. 290, 1557–1579 (2003). https://doi.org/10.1016/S03043975(02)000695MathSciNetCrossRefzbMATHGoogle Scholar
 11.Jürgensen, H.: Syntactic monoids of codes. Acta Cybern. 14, 117–133 (1999)MathSciNetzbMATHGoogle Scholar
 12.Jürgensen, H., Konstantinidis, S.: Codes. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 1, pp. 511–607. Springer, Berlin (1997). https://doi.org/10.1007/9783642591365_8
 13.Kari, L., Konstantinidis, S., Kopecki, S.: Transducer descriptions of DNA code properties and undecidability of antimorphic problems. Inf. Comput. 259(2), 237–258 (2018). https://doi.org/10.1016/j.ic.2017.09.004MathSciNetCrossRefzbMATHGoogle Scholar
 14.Konstantinidis, S.: An algebra of discrete channels that involve combinations of three basic error types. Inf. Comput. 167(2), 120–131 (2001). https://doi.org/10.1006/inco.2001.3035MathSciNetCrossRefzbMATHGoogle Scholar
 15.Konstantinidis, S.: Applications of transducers in independent languages, word distances, codes. In: Pighizzini, G., Câmpeanu, C. (eds.) DCFS 2017. LNCS, vol. 10316, pp. 45–62. Springer, Cham (2017). https://doi.org/10.1007/9783319602523_4CrossRefzbMATHGoogle Scholar
 16.Konstantinidis, S., Mastnak, M.: Embedding rationally independent languages into maximal ones. J. Automata Lang. Comb. 21(4), 311–338 (2017). https://doi.org/10.25596/jalc2016311
 17.Konstantinidis, S., Meijer, C., Moreira, N., Reis, R.: Symbolic manipulation of code properties. J. Automata Lang. Comb. 23(1–3), 243–269 (2018). https://doi.org/10.25596/jalc2018243. (This is the full journal version of the paper “Implementation of Code Properties via Transducers” in the Proceedings of CIAA 2016, LNCS 9705, pp 1–13, edited by YoSub Han and Kai Salomaa)
 18.LaSer: Independent LAnguage SERver. http://laser.cs.smu.ca/independence/. Accessed Apr 2020
 19.Rafuse, M.: Deciding rational property definitions, Honours Undergraduate Thesis, Department of Mathematics and Computing Science, Saint Mary’s University, Halifax, NS, Canada (2019)Google Scholar
 20.Sakarovitch, J.: Elements of Automata Theory. Cambridge University Press, Berlin (2009)CrossRefGoogle Scholar
 21.Shyr, H.J., Thierrin, G.: Codes and binary relations. In: Malliavin, M.P. (ed.) Séminaire d’Algèbre Paul Dubreil, Paris 1975–1976 (29ème Année). Lecture Notes in Mathematics, vol. 586, pp. 180–188. Springer, Heidelberg (1977). https://doi.org/10.1007/BFb0087133
 22.Yu, S.S.: Languages and Codes. Tsang Hai Book Publishing, Taichung (2005)Google Scholar
 23.Zaccagnino, R., Zizza, R., Zottoli, C.: Testing DNA code words properties of regular languages. Theoret. Comput. Sci. 608, 84–97 (2015). https://doi.org/10.1016/j.tcs.2015.08.034MathSciNetCrossRefzbMATHGoogle Scholar